Hacker News Story Recommendations

Introduction

This library uses Naive Bayes filtering to find stories a Hacker News user would probably like to read based on stories they've already upvoted.
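For anyone unfamiliar with the technique, here is a minimal sketch of Naive Bayes classification over story titles. It is an illustration only, not the code in this repository: the whitespace tokenizer, the add-one smoothing, the function names, and the toy titles are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it on whitespace (a deliberately crude tokenizer)."""
    return title.lower().split()


def train(titles, labels):
    """Count word occurrences per class (True = upvoted, False = not upvoted)."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for title, label in zip(titles, labels):
        class_counts[label] += 1
        word_counts[label].update(tokenize(title))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def upvote_probability(title, model):
    """Return P(upvoted | title) using add-one (Laplace) smoothing.

    Assumes the training data contained at least one example of each class.
    """
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        score = math.log(class_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(title):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Turn the two log scores into a normalized probability.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp_scores[True] / (exp_scores[True] + exp_scores[False])


# Toy training data, invented for this example.
model = train(
    [
        "Show HN: A Rust parser",
        "Rust compiler internals",
        "Celebrity gossip roundup",
        "Startup funding gossip",
    ],
    [True, True, False, False],
)
print(upvote_probability("Rust parser tricks", model))
```

Titles that share words with upvoted stories score closer to 1, and titles that share words with stories the user skipped score closer to 0.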

Installation

Linux

Before installing this tool, use HackerNewsToJSON to export your upvoted stories. You'll need a decent number of them to train the classifier. I personally have over 3000, so I'm not entirely sure how many are needed, but I would be surprised if it's more than 500.

Once that's done, you're ready to use this.

Clone this repository:

git clone git@github.com:JD-P/HNStoryRecommendations.git

Copy the JSON file you got from HackerNewsToJSON into the repository. You should probably set up a .gitignore so that you can't accidentally upload your .json or .pickle files.

Set up a virtual environment for Python 3:

virtualenv --python=python3 recommend_env

Activate the virtual environment:

source recommend_env/bin/activate

Install the dependencies:

pip install nltk numpy requests

This part is really hacky, and I plan to turn this library into a proper PyPI package later, but for now you need to clone py-search-hn and copy its "get_user_comments.py" and "search_hn.py" files into this repository.

You should now be ready to run the program that grabs training data. It grabs 100 stories from 3 hours before and 3 hours after each of your upvoted stories. That is, it collects stories you didn't upvote from the same timeframe as stories you did upvote.

python3 get_training_set.py your_stories_file.json
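The sampling idea can be sketched roughly as follows. This is not the code in get_training_set.py: the dictionary keys ("id", "created_at"), the function names, and the reading of "100 stories" as a cap per window are assumptions for illustration, and the call to the search API is left out entirely.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=3)   # 3 hours on either side of an upvote
MAX_STORIES = 100             # assumed cap per upvoted story


def window_for(upvoted_at):
    """Return the (start, end) of the timeframe around one upvoted story."""
    return upvoted_at - WINDOW, upvoted_at + WINDOW


def negatives_for(upvoted_at, upvoted_ids, candidates):
    """Pick stories from the same timeframe that the user did not upvote."""
    start, end = window_for(upvoted_at)
    picked = [
        story
        for story in candidates
        if start <= story["created_at"] <= end and story["id"] not in upvoted_ids
    ]
    return picked[:MAX_STORIES]


# Hypothetical data: one upvoted story at noon UTC and three candidates.
noon = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
candidates = [
    {"id": 1, "created_at": noon},                          # the upvoted story itself
    {"id": 2, "created_at": noon + timedelta(hours=2)},     # inside the window
    {"id": 3, "created_at": noon - timedelta(hours=5)},     # outside the window
]
print(negatives_for(noon, {1}, candidates))
```

Sampling negatives from the same timeframe means the classifier compares each upvoted story against stories the user plausibly could have seen at the time.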

Buckle in because it'll take a little while to grab all the stories we want from the search API. This tool will output the stories to a .pickle file you'll use for the next step.
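If you want to inspect the output before training, the standard library's pickle module can read it back. The sketch below is a round trip on made-up data: the file name and the list-of-dictionaries layout are assumptions, since the actual structure of the file written by get_training_set.py isn't documented here. Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Made-up stand-in for the training stories.
stories = [{"id": 1, "title": "Example story"}]

# Write the stories to a pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "training_stories.pickle")
with open(path, "wb") as handle:
    pickle.dump(stories, handle)

# Read them back, as a later step would.
with open(path, "rb") as handle:
    loaded = pickle.load(handle)

print(len(loaded), "stories loaded")
```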

Finally, run the training program, which trains the model and outputs the stories above a 15% upvote likelihood threshold:

python3 train_model.py your_stories_file.json your_training_stories.pickle
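The thresholding step amounts to something like the sketch below. The function name, the (title, probability) pairs, and the sort order are assumptions for illustration; train_model.py may format its output differently.

```python
THRESHOLD = 0.15  # the 15% upvote likelihood cutoff mentioned above


def recommend(scored, threshold=THRESHOLD):
    """Keep stories scoring above the threshold, most likely upvotes first."""
    keep = [(title, p) for title, p in scored if p > threshold]
    return sorted(keep, key=lambda pair: pair[1], reverse=True)


# Hypothetical classifier output: (title, estimated upvote probability).
scored = [
    ("Story A", 0.05),
    ("Story B", 0.40),
    ("Story C", 0.16),
]
for title, probability in recommend(scored):
    print(f"{probability:.0%}  {title}")
```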
