For this project I used the Requests library to download the website’s HTML, then parsed it with Beautiful Soup to pull out just the components I wanted. With these tools I built a web scraper that reads the first two pages of the news website Hacker News and collects the headline, link, and vote count of every article with more than 100 votes.
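The approach described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names `parse_stories` and `scrape` are hypothetical, and the CSS selectors (`tr.athing`, `span.titleline`, `span.score`) and the `p` page parameter are assumptions about Hacker News's markup that may need adjusting if the site changes.

```python
import requests
from bs4 import BeautifulSoup

# Assumed listing URL; the page number is passed as the "p" query parameter.
BASE_URL = "https://news.ycombinator.com/news"


def parse_stories(html, min_votes=100):
    """Return stories in the given HTML that have more than min_votes votes."""
    soup = BeautifulSoup(html, "html.parser")
    stories = []
    # Assumption: each story is a <tr class="athing"> row, and its score
    # lives in the <tr> that immediately follows it.
    for row in soup.select("tr.athing"):
        link = row.select_one("span.titleline > a")
        subtext_row = row.find_next_sibling("tr")
        score = subtext_row.select_one("span.score") if subtext_row else None
        if link is None or score is None:
            continue  # e.g. job postings, which carry no score
        votes = int(score.get_text().split()[0])  # "250 points" -> 250
        if votes > min_votes:
            stories.append(
                {"title": link.get_text(), "link": link.get("href"), "votes": votes}
            )
    return stories


def scrape(pages=2, min_votes=100):
    """Download the first `pages` listing pages and collect matching stories."""
    stories = []
    for page in range(1, pages + 1):
        response = requests.get(BASE_URL, params={"p": page}, timeout=10)
        response.raise_for_status()
        stories.extend(parse_stories(response.text, min_votes))
    return stories


if __name__ == "__main__":
    for story in scrape():
        print(f"{story['votes']:>5}  {story['title']}  {story['link']}")
```

Keeping the parsing in its own function means it can be checked against a saved HTML snippet without touching the network.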
To use this code:
- Install Beautiful Soup 4 (`pip install beautifulsoup4`), along with Requests if it is not already installed
- Run the code from the command prompt