Hacker News Scraper

A scraper for Hacker News (duh)

Usage:

Run hn_scrape.py. To change the last item ID to parse, modify end_id in the main function call.

Filtering Details

Parallelism details

Generating comment trees can be expensive, so we use two levels of parallelism. Each item is treated as either a story or a comment, and we are given the total number of items on Hacker News (~25,000,000). One pool of workers checks whether each item is a story. For each story, a worker starts a second pool of workers that traverses the n comment trees (n is the number of good top-level comments) and builds a single comment chain for each top-level comment.
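The two-level scheme described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the repository: the names fetch_item, build_chain, process_item, and scrape are hypothetical, the item store is an in-memory stand-in for real network requests, every top-level comment is treated as "good" because the filtering rules are not specified here, and a "chain" is assumed to mean following the first reply at each level.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the item store. A real run would
# fetch each item over the network instead.
FAKE_ITEMS = {
    1: {"id": 1, "type": "story", "title": "Example story", "kids": [2, 3]},
    2: {"id": 2, "type": "comment", "text": "first", "kids": [4]},
    3: {"id": 3, "type": "comment", "text": "second"},
    4: {"id": 4, "type": "comment", "text": "reply to first"},
    5: {"id": 5, "type": "comment", "text": "orphan"},
}


def fetch_item(item_id):
    """Return the item dict for item_id, or None if it does not exist."""
    return FAKE_ITEMS.get(item_id)


def build_chain(comment_id, fetch=fetch_item):
    """Build one linear chain from a top-level comment.

    Assumption: the chain follows the first reply at each level.
    """
    chain = []
    current = fetch(comment_id)
    while current is not None and current.get("type") == "comment":
        chain.append(current.get("text", ""))
        kids = current.get("kids") or []
        current = fetch(kids[0]) if kids else None
    return chain


def process_item(item_id, fetch=fetch_item, inner_workers=4):
    """Outer-level task: skip non-stories, fan out over top-level comments."""
    item = fetch(item_id)
    if item is None or item.get("type") != "story":
        return None
    kids = item.get("kids") or []
    # Second level of parallelism: one task per top-level comment.
    with ThreadPoolExecutor(max_workers=inner_workers) as inner:
        chains = list(inner.map(lambda k: build_chain(k, fetch), kids))
    return {"id": item["id"], "title": item.get("title"), "chains": chains}


def scrape(start_id, end_id, outer_workers=4):
    """First level of parallelism: one task per item ID in the range."""
    with ThreadPoolExecutor(max_workers=outer_workers) as outer:
        results = outer.map(process_item, range(start_id, end_id + 1))
        return [r for r in results if r is not None]
```

Calling scrape(1, 5) on the stand-in data returns a single story record whose chains are one list per top-level comment. Threads are used here because the work is dominated by waiting on requests; a process pool would be a reasonable alternative if parsing becomes the bottleneck.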

Example:

TODO:

[] Add in flags for better data management
[] Fix requests issue
