josh-works/ruby_web_scraping

I wanted a little learning project, so I decided on:

Scrape all links from Ask HN: What is your blog and why should I read it?

Publish it online, so that when you visit, you get a random blog or a random blog post.

So far, I've got a web scraper together that scrapes the top-level comments of the above thread and saves their links to links.txt.

To visualize, this tool shows the links from just the top-level comments on the above thread:

top-level comments

Next, I'll get some basic routing in place with Sinatra, and put it on Heroku.

Should be a cool little thing.


Misc project notes

These are notes I've taken, ordered by when each thought occurred to me, that I'll use to guide myself in building additional resources/drills.

Do this kind of scraping three times total, saving the output to a text file or database.

Tutorials and Guides I Have Created as a Result of This Project

Nokogiri was a big part of this project, but I knew so little about it that I ended up creating this guide, which will eventually be one of many pieces of intermediate_ruby.

I used my new Nokogiri knowledge to get this list of links:

links!

A related effort by someone else:

https://www.dannysalzman.com/2020/04/08/analyzing-hn-readers-personal-blogs

Potential Extensions
Sinatra Usage

Boot the app with `rerun 'ruby app.rb'`

“No Procfile detected” when pushing the Sinatra app to Heroku
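One common fix for that warning is to add a Procfile at the repo root telling Heroku how to start the web process. Assuming a classic-style Sinatra app in app.rb (the filename here is my assumption), something like:

```
web: bundle exec ruby app.rb -p $PORT
```

Heroku sets `$PORT` at dyno startup, and classic Sinatra's `-p` flag binds the server to that port.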

About

A humble side project about Hacker News, Sinatra, and webscraping.
