CS 572 - Information Retrieval final project, Emory University 2016
- L6 dataset (Yahoo! Answers Comprehensive Questions and Answers) from Yahoo Webscope
- Wikipedia Dump
- Reddit Comment Dump (need to figure out how to link comments with posts)
- Serialize documents in Yahoo QA XML files as an SQLite database
- Build HTTP server for receiving questions for the competition (see the Java example on the webpage)
- Make interface for adding generic prediction models
- Write mixture-of-experts predictor that combines all prediction models
- Article summarizer (use on Wikipedia, Reddit to add more recent data)
- TF-IDF or BM25 - Gensim
- Latent Semantic Indexing - Gensim
- Latent Dirichlet Allocation - Gensim
- Word2Vec - Gensim
- Neural network model - Some success using Keras
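
A minimal sketch of how the generic prediction-model interface and the mixture-of-experts predictor from the list above might fit together. Everything here is an assumption for illustration, not the project's actual code: the names `Predictor`, `OverlapPredictor`, `LengthPredictor`, and `MixtureOfExperts` are hypothetical, the two toy experts stand in for the real Gensim and Keras models, and the combiner uses fixed weights, which is a simplification of a true mixture of experts with a learned, input-dependent gate.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence


class Predictor(ABC):
    """Hypothetical common interface: score each candidate answer for a question."""

    @abstractmethod
    def score(self, question: str, candidates: Sequence[str]) -> List[float]:
        """Return one score per candidate, higher meaning a better answer."""


class OverlapPredictor(Predictor):
    """Toy expert: fraction of question tokens that also occur in the candidate."""

    def score(self, question, candidates):
        q_tokens = set(question.lower().split())
        if not q_tokens:
            return [0.0] * len(candidates)
        return [
            len(q_tokens & set(c.lower().split())) / len(q_tokens)
            for c in candidates
        ]


class LengthPredictor(Predictor):
    """Toy expert: prefers longer candidates, scaled so the longest scores 1.0."""

    def score(self, question, candidates):
        lengths = [len(c.split()) for c in candidates]
        longest = max(lengths, default=0)
        return [n / longest if longest else 0.0 for n in lengths]


class MixtureOfExperts(Predictor):
    """Fixed-weight combination of experts (assumed simplification, no learned gate)."""

    def __init__(self, experts: Sequence[Predictor],
                 weights: Optional[Sequence[float]] = None):
        if not experts:
            raise ValueError("at least one expert is required")
        self.experts = list(experts)
        raw = list(weights) if weights is not None else [1.0] * len(self.experts)
        if len(raw) != len(self.experts):
            raise ValueError("need exactly one weight per expert")
        total = sum(raw)
        # Normalize so the weights sum to 1.
        self.weights = [w / total for w in raw]

    def score(self, question, candidates):
        combined = [0.0] * len(candidates)
        for expert, weight in zip(self.experts, self.weights):
            for i, s in enumerate(expert.score(question, candidates)):
                combined[i] += weight * s
        return combined

    def predict(self, question: str, candidates: Sequence[str]) -> str:
        """Return the candidate with the highest combined score."""
        scores = self.score(question, candidates)
        best = max(range(len(candidates)), key=scores.__getitem__)
        return candidates[best]


if __name__ == "__main__":
    moe = MixtureOfExperts([OverlapPredictor(), LengthPredictor()], weights=[3, 1])
    answers = [
        "bake bread in a hot oven",
        "the weather is nice today and tomorrow and the day after",
    ]
    print(moe.predict("how do I bake bread", answers))
```

Because `MixtureOfExperts` is itself a `Predictor`, the HTTP server would only need to hold a single object of that type, and new models could be added by subclassing `Predictor` without touching the server code.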