Text as data:
- cleaning;
- pre-processing;
- post-processing;
- Topic Modelling: LDA, seeded LDA
- Word2Vec and other static embeddings
- RNN, LSTM, and Seq2Seq
- Attention and Transformers
Python Version: 3.7.16
Supported scikit-learn versions: 0.20 to 0.24, and 1.0, 1.0.1, and 1.0.2
To install the packages:
pip install -r requirements.txt
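After installing, a quick optional sanity check that the environment matches the versions above:
```python
import sys
import sklearn

# Expect Python 3.7.16 and scikit-learn 0.20-0.24 or 1.0/1.0.1/1.0.2.
print("Python:", sys.version.split()[0])
print("scikit-learn:", sklearn.__version__)
```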
Ad-hoc materials:
Along with the lecture slides, you can also refer to the resources below:
Speech and Language Processing:
Text Processing:
Fundamentals:
TF-IDF:
- Blog: TF-IDF
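A minimal TF-IDF sketch with scikit-learn; the corpus is a toy example invented for illustration:
```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix: documents x terms

# get_feature_names() matches the scikit-learn versions listed above;
# newer releases rename it to get_feature_names_out().
print(vectorizer.get_feature_names())
print(X.toarray().round(2))  # TF-IDF weight of each term in each document
```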
Zipf's Law:
- Blog: Zipf's Law
- YT: Zipf's Law
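A quick sketch of the rank-frequency relationship behind Zipf's law (frequency roughly proportional to 1/rank), on a toy text; a real corpus is needed to see the pattern clearly:
```python
from collections import Counter

tokens = "the cat sat on the mat and the dog sat on the log".split()

# Under Zipf's law, frequency * rank stays roughly constant.
ranked = sorted(Counter(tokens).items(), key=lambda kv: -kv[1])
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, word, freq, rank * freq)
```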
Heaps' Law:
- YT: Heaps' law
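Heaps' law says the number of distinct word types V grows with corpus length n roughly as V(n) = K * n^beta; a toy sketch of that growth curve:
```python
tokens = "the cat sat on the mat and the dog sat on the log".split()

vocab = set()
for n, token in enumerate(tokens, start=1):
    vocab.add(token)
    print(n, len(vocab))  # tokens seen vs. distinct word types so far
```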
Clustering:
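One possible document-clustering sketch (k-means over TF-IDF vectors is an assumption, not the only option); documents and cluster count are invented for illustration:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "the cat sat on the mat",
    "cats and dogs are pets",
    "stocks fell on market news",
    "investors watched the market",
]

X = TfidfVectorizer().fit_transform(docs)
km = KMeans(n_clusters=2, random_state=0).fit(X)
print(km.labels_)  # cluster assignment per document
```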
SVD:
- Basics, YT: Singular Value Decomposition (the SVD)
- High Level Overview, YT: Singular Value Decomposition (SVD): Overview
- Mathematical Overview, YT: Singular Value Decomposition (SVD): Mathematical Overview
- Rank R Approximation, YT: Singular Value Decomposition (SVD): Matrix Approximation
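A NumPy sketch of the rank-r approximation covered in the last video: keep only the top r singular triplets (Eckart-Young):
```python
import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 2
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]  # best rank-2 approximation of A
print(np.linalg.norm(A - A_r))               # Frobenius reconstruction error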
LSA:
- YT: LSA
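LSA is a truncated SVD of the (here TF-IDF weighted) term-document matrix; a minimal scikit-learn sketch on a toy corpus:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell on market news",
    "investors watched the market",
]

X = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0)  # truncated SVD = LSA
Z = lsa.fit_transform(X)                            # documents in latent space
print(Z.round(2))
```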
Topic Modelling:
Data Pre-processing for Topic Modelling:
LDA:
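A minimal plain-LDA sketch with scikit-learn's LatentDirichletAllocation; note LDA is fit on raw term counts, not TF-IDF. The toy corpus and topic count are invented for illustration (seeded LDA needs other packages):
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell on market news",
    "investors watched the market",
]

vec = CountVectorizer(stop_words="english")  # LDA expects raw counts
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names()  # get_feature_names_out() in newer scikit-learn
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-3:][::-1]          # indices of the 3 heaviest terms
    print("topic", k, [terms[i] for i in top])
```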
Word2Vec:
- YT: Word2Vec, GloVe, FastText- EXPLAINED!
- YT: Word Vector Representations: word2vec
- YT: GloVe: Global Vectors for Word Representation
- YT: Word2Vec Detailed Explanation, Train custom Word2Vec Model using gensim in Python
- YT: Coding Word2Vec: Natural Language Processing
- YT: Word Embedding and Word2Vec, Clearly Explained!!!
- Extra, YT: Word Embeddings - EXPLAINED!
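A minimal gensim Word2Vec sketch to go with the training videos above; parameter names assume gensim 4.x (3.x used size instead of vector_size), and the corpus is a toy stand-in:
```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["cats", "and", "dogs", "are", "pets"],
]

# sg=1 selects skip-gram; sg=0 (the default) is CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv["cat"][:5])           # first dimensions of the "cat" embedding
print(model.wv.most_similar("cat"))  # nearest neighbours in embedding space
```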
Doc2Vec:
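A matching Doc2Vec sketch, again assuming gensim 4.x (model.dv was model.docvecs in 3.x):
```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]
tagged = [TaggedDocument(words=d, tags=[i]) for i, d in enumerate(docs)]

model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)
print(model.dv[0][:5])  # embedding of the first document
```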
RNN and LSTM:
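A minimal LSTM sequence-classifier sketch. It assumes PyTorch, which is not in the requirements above; vocabulary size, dimensions, and the random batch are invented for illustration:
```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):      # (batch, seq_len)
        x = self.embed(token_ids)      # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)     # h_n: (1, batch, hidden_dim)
        return self.out(h_n[-1])       # logits: (batch, num_classes)

model = LSTMClassifier(vocab_size=100, embed_dim=16, hidden_dim=32, num_classes=2)
logits = model(torch.randint(0, 100, (4, 7)))  # batch of 4 sequences, length 7
print(logits.shape)                            # torch.Size([4, 2])
```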
Attention and Transformers:
- YT: The Attention Mechanism in LLMs, a High Level Overview
- YT: The math behind Attention: Keys, Queries, and Values matrices
- YT: Attention for Neural Networks, Clearly Explained!!!
- YT: What are Transformer Models and how do they work?
- YT: Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!!
- YT: Decoder-Only Transformers, ChatGPT's specific Transformer, Clearly Explained!!!
- Blog: A Gentle Introduction to Positional Encoding in Transformer Models, Part 1
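A NumPy sketch of scaled dot-product attention, the Queries/Keys/Values math the videos above walk through: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The random matrices are placeholders for learned projections:
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))  # 3 tokens, d_k = 4
print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 4)
```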