Analyze Hacker News! Get some topics!
Python
figure
models
serialized_model
.gitignore
README.md
gensim_example.py
hn-analysis.Rmd
hn-analysis.Rproj
hn_dictionary.pkl
hn_dictionaryMay13_2152.pkl
hn_dictionaryMay14_0005.pkl
hn_dictionaryMay14_0240.pkl
lda_100topic_10pass.gensim
lda_100topic_10pass.gensim.expElogbeta.npy
lda_100topic_10pass.gensim.id2word
lda_100topic_10pass.gensim.state
model_100topics_10passMay13_2159.gensim
model_100topics_10passMay13_2159.gensim.expElogbeta.npy
model_100topics_10passMay13_2159.gensim.id2word
model_100topics_10passMay13_2159.gensim.state
model_100topics_10passMay14_0259.gensim
model_100topics_10passMay14_0259.gensim.expElogbeta.npy
model_100topics_10passMay14_0259.gensim.id2word
model_100topics_10passMay14_0259.gensim.state
model_topics.py
other_topics.csv
predict_topics.py
supervised_topics.csv
text_tagger.py
topics.csv
utils.py

README.md

Code to classify Hacker News articles into a few categories.

The code is a bit messy at the moment, but it works in roughly two phases:

  • Phase 1: Train an LDA model with Gensim to reduce each article's text to a 100-dimensional topic vector.
  • Phase 2: Train a logistic regression or random forest classifier on a labeled dataset to map those vectors to topics.