Author: Shuang Yuan
University of Southern California
0.1 reddit_praw/: extract data
0.2 reddit_praw/: integrate data into the data/raw folder
0.3 model/: split data.csv into test.csv and train.csv
0.4 model/: train word2vec
0.5 model/reddit_tc_*.py: models
Tool for scraping titles of subreddits from
Run the file. It reads labels from the 'labels' file and writes results to the '../data/raw/' folder.
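The read-labels, scrape, write-results flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the repository's actual script: the function names `read_labels` and `scrape_titles`, the one-subreddit-per-line layout of the 'labels' file, the per-label CSV output files, and the choice of the "hot" listing are all hypothetical. The `reddit` argument is assumed to be a `praw.Reddit` instance (or any object with the same `subreddit(name).hot(limit=...)` interface).

```python
import csv
from pathlib import Path


def read_labels(path):
    """Return one subreddit name per non-empty line (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def scrape_titles(reddit, labels, out_dir, limit=100):
    """Write (title, label) rows for each subreddit to <out_dir>/<label>.csv.

    `reddit` is assumed to behave like praw.Reddit: subreddit(name).hot(limit=n)
    yields submissions that expose a `title` attribute.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for label in labels:
        submissions = reddit.subreddit(label).hot(limit=limit)
        rows = [(s.title, label) for s in submissions]
        # Hypothetical output layout: one CSV file per label.
        with open(out_dir / f"{label}.csv", "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        counts[label] = len(rows)
    return counts
```

With PRAW installed, the client would be created with `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` using your own API credentials, then passed in as `scrape_titles(reddit, read_labels("labels"), "../data/raw/")`.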
data.csv: all raw data, including both the data and the labels.
train.csv: training data.
test.csv: testing data.
raw folder: all raw data.
word2vec: word embeddings, including GloVe and the word vectors that we trained. Note: because the files are too large to upload, we have deleted the data in them.
The model folder contains the model files and the tools for preprocessing data.
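One of the preprocessing steps is splitting data.csv into train.csv and test.csv (step 0.3). A minimal sketch of such a split, using only the standard library, might look like this. The function name `split_csv`, the 80/20 ratio, the fixed seed, and the assumption that data.csv has no header row are illustrative choices, not taken from the repository.

```python
import csv
import random


def split_csv(data_path, train_path, test_path, test_ratio=0.2, seed=0):
    """Shuffle the rows of data_path and write them to train_path and test_path.

    Assumes data_path has no header row. Returns (n_train, n_test).
    """
    with open(data_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)

    n_test = int(len(rows) * test_ratio)
    parts = ((test_path, rows[:n_test]), (train_path, rows[n_test:]))
    for path, part in parts:
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(part)
    return len(rows) - n_test, n_test
```

For example, `split_csv("data.csv", "train.csv", "test.csv")` would write both output files next to the input. If the classes are imbalanced, a per-label (stratified) split may be a better choice than this plain shuffle.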