Explore some interesting NLP experiments with reddit comments data.

chocoluffy/redditQA

1-Predict-Topics

Run TF-IDF and LSI on existing subreddit comments; then, given a user's new comment, predict and recommend the best-matching subreddit.

Predict Topics
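As a rough illustration of the recommendation step, the sketch below weights terms with TF-IDF and picks the subreddit whose vector has the highest cosine similarity to a new comment. It is a minimal standard-library sketch, not the repository's code: the sample subreddits, the smoothed IDF formula, and the helper names (`build_tfidf`, `recommend`) are assumptions, and the LSI step (a truncated SVD of the TF-IDF matrix) is left out for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would also strip punctuation.
    return text.lower().split()


def build_tfidf(docs):
    """docs maps a subreddit name to its concatenated comment text."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed IDF (an assumption) so that every known term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in term_counts.items()}
        for name, term_counts in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(comment, vectors, idf):
    # Terms never seen in the training comments are ignored.
    tf = Counter(tokenize(comment))
    query = {t: c * idf[t] for t, c in tf.items() if t in idf}
    return max(vectors, key=lambda name: cosine(query, vectors[name]))


# Hypothetical toy data standing in for real subreddit comments.
subreddit_comments = {
    "python": "python code function library pandas import error",
    "cooking": "recipe oven bake butter flour dinner sauce",
    "nba": "game playoffs points team coach season trade",
}
vectors, idf = build_tfidf(subreddit_comments)
best = recommend("how do i import a library in python", vectors, idf)
```

Here `best` holds the name of the closest subreddit for the sample comment. Projecting both the subreddit vectors and the query into a low-rank LSI space before computing the cosine would let comments match on related terms rather than only on exact word overlap.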

2-PCA-Distribution-Plot

Build a document-term matrix from BigQuery data, run LDA to find the topic distribution of each subreddit, then apply t-SNE dimensionality reduction and visualize the result with matplotlib.

LDA visualization
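The first step of this pipeline, building the document-term matrix, can be sketched as follows. This is an illustrative sketch only: the `rows` dictionary is hypothetical data standing in for the BigQuery export, each subreddit is treated as one document, and the LDA and t-SNE steps are not shown.

```python
from collections import Counter


def document_term_matrix(docs):
    """docs maps a subreddit name to a list of comment strings.

    Returns (names, vocab, matrix), where matrix[i][j] is the number of
    times vocab[j] occurs in the comments of names[i].
    """
    counts = {
        name: Counter(word for comment in comments for word in comment.lower().split())
        for name, comments in docs.items()
    }
    vocab = sorted(set().union(*counts.values()))
    names = sorted(counts)
    matrix = [[counts[name][term] for term in vocab] for name in names]
    return names, vocab, matrix


# Hypothetical rows; the real data would come from a BigQuery query.
rows = {
    "aww": ["cute cat", "cute dog"],
    "cats": ["cat cat nap"],
}
names, vocab, matrix = document_term_matrix(rows)
```

Each row of `matrix` is the bag-of-words vector for one subreddit, which is the input shape an LDA implementation typically expects.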

3-Bipartite-Graph

Construct a bipartite graph between authors and topics, and propagate labels back and forth to identify generalists and specialists among reddit authors in different communities.

Bipartite Graph
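To make the generalist/specialist distinction concrete, the sketch below scores each author by the normalized entropy of their comment counts across topics. This is a simpler proxy chosen for illustration, not the label propagation described above: an author concentrated on one topic scores 0, and an author spread evenly across all topics scores 1. The edge list and the `generality` helper are hypothetical.

```python
import math
from collections import Counter, defaultdict


def generality(topic_counts, n_topics):
    """Normalized entropy in [0, 1] of an author's comment counts over topics."""
    total = sum(topic_counts.values())
    if total == 0 or n_topics < 2:
        return 0.0
    probs = [c / total for c in topic_counts.values() if c > 0]
    entropy = sum(p * math.log(1.0 / p) for p in probs)
    return entropy / math.log(n_topics)


# Hypothetical author-topic edges; each pair is one comment by an author on a topic.
edges = [
    ("alice", "sports"), ("alice", "sports"), ("alice", "sports"),
    ("bob", "sports"), ("bob", "music"), ("bob", "politics"),
]

# Adjacency of the bipartite graph, stored from the author side.
author_topics = defaultdict(Counter)
for author, topic in edges:
    author_topics[author][topic] += 1

n_topics = len({topic for _, topic in edges})
scores = {a: generality(c, n_topics) for a, c in author_topics.items()}
```

In this toy graph `alice` comes out as a specialist and `bob` as a generalist. A propagation-based method would additionally let topic-level scores feed back into the author scores over several rounds.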

4-LDA-On-TFIDF

Fine-tune the model from week3 by applying TF-IDF weights to the BOW matrix while keeping the weighted values at the same magnitude as the original counts.

Improved LDA
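One possible reading of "same magnitude" is to rescale each TF-IDF-weighted row so that it sums to the row's original word count. That keeps the input count-like, which matters because LDA is usually formulated over word counts. The sketch below follows that reading; the rescaling rule, the smoothed IDF formula, and the function name are assumptions rather than the repository's implementation.

```python
import math


def tfidf_same_magnitude(bow):
    """bow is a list of rows of term counts (documents x terms).

    Returns TF-IDF-weighted rows, each rescaled to its original row sum.
    """
    n_docs = len(bow)
    n_terms = len(bow[0]) if bow else 0
    doc_freq = [sum(1 for row in bow if row[j] > 0) for j in range(n_terms)]
    # Smoothed IDF (an assumption): terms found in every document get weight 1.0.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in doc_freq]

    reweighted = []
    for row in bow:
        weighted = [count * w for count, w in zip(row, idf)]
        weighted_total = sum(weighted)
        scale = sum(row) / weighted_total if weighted_total else 0.0
        reweighted.append([x * scale for x in weighted])
    return reweighted


# Toy BOW matrix: term 0 appears in both documents, terms 1 and 2 in one each.
bow = [[2, 1, 0], [2, 0, 3]]
weighted_bow = tfidf_same_magnitude(bow)
```

After reweighting, each row still sums to its original total, but mass shifts from the term shared by both documents toward the terms that are specific to one document.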

5-Model-Inspection

Examine the validity of the models obtained in week4, and refine them by tuning hyperparameters.

Model Inspection

6-Word2Vec

Apply non-semantic techniques (finding overlapping commenters) and semantic techniques (such as LSA and word2vec) to examine the similarity between subreddits.

Word2vec
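The non-semantic approach can be illustrated with a Jaccard index over the sets of commenters in each subreddit. This is a minimal sketch under assumptions: the choice of Jaccard as the overlap measure, the subreddit names, and the user names are all hypothetical and not taken from the repository.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of commenters that two subreddits have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical commenter sets per subreddit.
commenters = {
    "python": {"ann", "bob", "cy"},
    "learnpython": {"bob", "cy", "dee"},
    "cooking": {"eve", "ann"},
}

# Similarity for every unordered pair of subreddits.
pairs = {
    (x, y): jaccard(commenters[x], commenters[y])
    for x, y in combinations(sorted(commenters), 2)
}
most_similar = max(pairs, key=pairs.get)
```

The semantic techniques would instead compare subreddits by vectors learned from the comment text, so two communities can look similar even when they share no commenters.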
