Text as data: cleaning, pre-processing, and post processing

alcatraz47/tad

tad

Text as data:

    - cleaning;
    - pre-processing;
    - post-processing:
        - Topic Modelling: LDA, seeded LDA
        - Word2Vec and other static embeddings
        - RNN, LSTM, and Seq2Seq
        - Attention and Transformers

Python version: 3.7.16

Supported scikit-learn versions: 0.20 to 0.24, plus 1.0, 1.0.1, and 1.0.2

To install the packages:

pip install -r requirements.txt
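
After installing, it may help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `python_version_supported` and `sklearn_version_supported` are hypothetical, and it assumes that any 3.7.x interpreter is acceptable and that "0.20 to 0.24" covers every patch release in those series.

```python
import re
import sys


def python_version_supported(info=sys.version_info):
    """Return True for a 3.7.x interpreter.

    Assumption: any 3.7 patch release works, not only 3.7.16.
    """
    return tuple(info[:2]) == (3, 7)


def sklearn_version_supported(version):
    """Return True if a scikit-learn version string is in the listed ranges.

    Assumption: "0.20 to 0.24" means every patch release of 0.20.x
    through 0.24.x; for the 1.x line only 1.0, 1.0.1, and 1.0.2 count.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if major == 0:
        return 20 <= minor <= 24
    return major == 1 and minor == 0 and patch <= 2


if __name__ == "__main__":
    print("Python OK:", python_version_supported())
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed")
    else:
        print(
            "scikit-learn", sklearn.__version__,
            "OK:", sklearn_version_supported(sklearn.__version__),
        )
```

Running the script inside the project environment prints whether the interpreter and the installed scikit-learn fall within the ranges above.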

Ad-hoc materials:

Along with the lecture slides, we can also refer to the resources below:

  1. Speech and Language Processing:

  2. Text Processing:

  3. Fundamentals:

  4. Clustering:

  5. Topic Modelling:

  6. Word2Vec:

  7. Doc2Vec:

  8. RNN and LSTM:

  9. Attention and Transformers:
