A project that pulls in headlines, grades their sentiment, and outputs two webpages. I am hoping to observe the effect that our constant inundation with negative news is having.
A WIP version is available at everything-is-x.herokuapp.com, but the current sentiment classification is done through TextBlob and is, in my opinion, insufficient. As of March 5th, 2017, manually collected and classified training data has been brought in from the corpus the app collected in February and from the subreddits /r/UpliftingNews and /r/feelbadnews.
## TODO
Functional TODOS:
- Manually classify the existing local corpus I have for use by the SVM
- Build a training set from the manually classified corpus
- Pull in extra training set data from manually filtered sources such as /r/UpliftingNews, /r/MorbidReality, etc.
- Implement training functionality for an sklearn SVM
- Implement testing functionality for the SVM
- Add a unit testing framework (tox/nosetest, needs research)
- Implement classification pipelining
- Implement writing the results of the SVM to the database
- Schedule ML classification
- Consume the ML rating in the UI

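
As a rough starting point for the training and pipelining items above, here is a minimal sketch, assuming scikit-learn is installed. It is not the project's implementation: the sample headlines, the `good`/`bad` labels, and the function names `train_classifier` and `classify` are all made up for illustration, and it uses `LinearSVC` on TF-IDF features as one reasonable choice of SVM setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical hand-labelled headlines. The real training data would come
# from the manually classified corpus and the subreddit-derived CSVs.
TRAINING_HEADLINES = [
    ("Volunteers rebuild playground after storm", "good"),
    ("Rescue team saves stranded hikers", "good"),
    ("Community raises funds for new library", "good"),
    ("Shelter finds homes for every dog", "good"),
    ("Factory fire leaves dozens injured", "bad"),
    ("Flooding destroys homes across region", "bad"),
    ("Layoffs announced at local plant", "bad"),
    ("Crash closes highway, several hurt", "bad"),
]


def train_classifier(labelled_headlines):
    """Fit a TF-IDF + linear SVM pipeline on (headline, label) pairs."""
    texts = [text for text, _ in labelled_headlines]
    labels = [label for _, label in labelled_headlines]
    # The pipeline vectorizes raw text and feeds the features to the SVM,
    # so callers can pass plain strings to fit() and predict().
    model = make_pipeline(TfidfVectorizer(lowercase=True), LinearSVC())
    model.fit(texts, labels)
    return model


def classify(model, headlines):
    """Return (headline, predicted_label) pairs for new headlines."""
    return list(zip(headlines, model.predict(headlines)))


if __name__ == "__main__":
    svm_model = train_classifier(TRAINING_HEADLINES)
    for headline, label in classify(svm_model, ["Storm leaves town without power"]):
        print(label, "-", headline)
```

Keeping the vectorizer and the SVM in one pipeline object means the same object can later be scheduled, tested, and used when writing predictions to the database. With only a handful of examples the predictions are not meaningful; the point is the shape of the code.
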
Aesthetic TODOS:
- Present TextBlob, Manual Classification, and ML scores on blur when rolling over a story
- Add an "About" Section to the project
- Add a Homepage to the project
- Consolidate stylesheets, Markup, and JS for Goodnews and Badnews routes into one page
- Optimize page speed
- Play with colors

Potential Extras I Would Like To Pursue:
- Pagination
- Ordering by Date/SVM Classification/Sentiment etc.
- Manual feedback from users on whether a story is misclassified by the learning machine
  - This will involve adding some sort of manual review workflow
  - This will involve updating the UI to include a feedback mechanism, a route to update a CSV of disputed classifications, and a mechanism to update the training CSVs
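
The "CSV of disputed classifications" piece of the feedback idea could look something like the sketch below. It uses only the Python standard library. The column names, the `record_dispute` and `load_disputes` helpers, and the file layout are assumptions made for illustration; the web route that would call them is not shown.

```python
import csv
from pathlib import Path

# Hypothetical column layout for the disputed-classifications CSV.
DISPUTE_FIELDS = ["headline", "predicted_label", "suggested_label"]


def record_dispute(csv_path, headline, predicted_label, suggested_label):
    """Append one user dispute to the CSV, writing a header if the file is new."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=DISPUTE_FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(
            {
                "headline": headline,
                "predicted_label": predicted_label,
                "suggested_label": suggested_label,
            }
        )


def load_disputes(csv_path):
    """Read all recorded disputes back as a list of dicts for manual review."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A feedback route would call `record_dispute` with the story's headline, the label the classifier assigned, and the label the user suggested. A reviewer could then go through `load_disputes` and move accepted rows into the training CSVs.
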
## Acknowledgments
- Jared for all his help with styling
- Jason for all his Python mentoring
- Shruti for the idea to use subreddits for training data