# Hacker News Web Scraping
<hr>

Through the following code blocks, we will access the Hacker News homepage, _https://news.ycombinator.com_, and collect the stories with at least 100 votes.

We start by importing the `requests` library to send HTTP requests and `BeautifulSoup` to parse the returned HTML.

In [3]:
import requests as req
from bs4 import BeautifulSoup as bs


Once the necessary libraries are imported, we create a function that fetches the title text and link from each story's anchor tag, together with the vote count from the `score` class in the HTML page.<br>
It appends to the **_hn_** list only those stories that have at least 100 votes.

In [4]:
def create_custom_hn(links,subtext):
    hn=[]
    for idx,item in enumerate(links):
        title=item.getText()
        href=item.select('a')[0].get('href',None)
        # Some rows carry no score element, in which case this list is empty
        votes=subtext[idx].select('.score')
        if len(votes):
            # Take the leading number so both "1 point" and "123 points" parse
            points = int(votes[0].getText().split()[0])
            if points>=100:
                hn.append({'title':title,'link':href,'votes':points})
    return hn
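
To see what the two selectors match without touching the network, the same extraction can be tried on a small inline snippet. This is only a sketch: `SAMPLE_HTML` is hypothetical, simplified markup that imitates two story rows, not the real page source, and the example URLs and names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup imitating two Hacker News rows.
SAMPLE_HTML = """
<table>
  <tr><td><span class="titleline"><a href="https://example.com/a">Story A</a></span></td></tr>
  <tr><td class="subtext"><span class="score">150 points</span> by alice</td></tr>
  <tr><td><span class="titleline"><a href="https://example.com/b">Story B</a></span></td></tr>
  <tr><td class="subtext"><span class="score">42 points</span> by bob</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
links = soup.select('.titleline')    # one element per story headline
subtext = soup.select('.subtext')    # one element per story sub-header

popular = []
for item, sub in zip(links, subtext):
    score = sub.select_one('.score')
    if score is None:                # row without a score element
        continue
    points = int(score.get_text().split()[0])
    if points >= 100:
        popular.append({'title': item.get_text(),
                        'link': item.a.get('href'),
                        'votes': points})

print(popular)
```

Only "Story A" clears the 100-vote threshold here, so `popular` ends up with a single dictionary.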

We will create another function that sorts the dictionaries in the **_hn_** list in descending order of votes received.

In [5]:
def sort_by_votes(hnlist):
    return sorted(hnlist,key=lambda k:k['votes'],reverse=True)
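
As a quick check of the ordering, here is a minimal sketch that runs the sort on made-up data. The `sample` stories are hypothetical, and `operator.itemgetter` is swapped in for the lambda purely as an equivalent alternative.

```python
from operator import itemgetter

def sort_by_votes(hnlist):
    # itemgetter('votes') is equivalent to lambda k: k['votes']
    return sorted(hnlist, key=itemgetter('votes'), reverse=True)

# Hypothetical stories, for illustration only.
sample = [
    {'title': 'Story A', 'link': 'https://example.com/a', 'votes': 120},
    {'title': 'Story B', 'link': 'https://example.com/b', 'votes': 300},
    {'title': 'Story C', 'link': 'https://example.com/c', 'votes': 120},
]

ranked = sort_by_votes(sample)
print([story['title'] for story in ranked])  # ['Story B', 'Story A', 'Story C']
```

Because `sorted` is stable, stories with equal votes keep their original relative order, and the input list itself is left unchanged.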

Next comes the main program, which scrapes the website and calls the functions above.
To keep the program dynamic, it asks the user how many pages to scrape. If the user does not enter a valid page number, __*user_page_input*__ defaults to 1.

In [6]:
try:
    user_page_input=int(input("Enter the number of pages you want to scrape:"))
except ValueError:
    user_page_input = 1
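
The fallback logic can also be isolated in a small helper so it is easy to test without typing at a prompt. The sketch below is one possible shape; `parse_page_count` is a hypothetical name, and treating zero or negative numbers as invalid is an added assumption that the original cell does not make.

```python
def parse_page_count(raw, default=1):
    """Turn raw user input into a positive page count, or return default."""
    try:
        pages = int(raw.strip())
    except ValueError:
        # Empty or non-numeric input falls back to the default
        return default
    # Assumption: a page count below 1 is treated like invalid input
    return pages if pages >= 1 else default

print(parse_page_count("3"))    # 3
print(parse_page_count(""))     # 1
print(parse_page_count("abc"))  # 1
```

With this helper, the cell above could read `user_page_input = parse_page_count(input(...))`.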

Next, we loop over each page with a _for loop_, fetching it with the `requests` library and parsing the HTML with the `Beautiful Soup` package.
We assign the story headlines to **_links_** and the sub-headers to **_subtext_**.  <br>
The stories returned for each page are added to the final **_listing_** list, which is then sorted by votes.

In [7]:
print(f"Displaying News from {user_page_input} page(s) of Hacker News....")
listing=[]
for i in range(1,user_page_input+1):
    res=req.get(f'https://news.ycombinator.com/?p={i}')
    soup=bs(res.text,'html.parser')
    links=soup.select('.titleline')
    subtext=soup.select('.subtext')
    listing.extend(create_custom_hn(links,subtext))
listing=sort_by_votes(listing)
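
The loop above assumes every request succeeds. A slightly more defensive fetch, sketched below, adds a timeout and raises on HTTP error codes. `fetch_page`, `BASE_URL`, and the 10-second timeout are illustrative choices, not part of the original notebook.

```python
import requests

BASE_URL = "https://news.ycombinator.com/"

def fetch_page(page, session=None, timeout=10):
    """Return the HTML of one listing page, raising on HTTP errors."""
    # Use the given session if there is one, else the module-level requests.get
    client = session if session is not None else requests
    # params={'p': page} produces the same ?p=<page> query string as the f-string
    res = client.get(BASE_URL, params={"p": page}, timeout=timeout)
    res.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return res.text
```

Inside the loop, `res.text` would then be replaced by `fetch_page(i)`. Passing a shared `requests.Session()` as `session` reuses the connection across pages, and a short `time.sleep` between iterations keeps the request rate polite.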

Displaying News from 3 page(s) of Hacker News....


We will loop through **_listing_** and print the data stored in each dictionary in a presentable format.

In [8]:
for story in listing:
    print("-"*10)
    for key,value in story.items():
        print(key,':',value)
    print("-"*10)
    print()

----------
title : Nvidia Warp: A Python framework for high performance GPU simulation and graphics (github.com/nvidia)
link : https://github.com/NVIDIA/warp
votes : 394
----------

----------
title : Start presentations on the second slide (tidyfirst.substack.com)
link : https://tidyfirst.substack.com/p/start-presentations-on-the-second
votes : 365
----------

----------
title : I found a 55 year old bug in the first Lunar Lander game (martincmartin.com)
link : https://martincmartin.com/2024/06/14/how-i-found-a-55-year-old-bug-in-the-first-lunar-lander-game/
votes : 364
----------

----------
title : The sun's magnetic field is about to flip (space.com)
link : https://www.space.com/sun-magnetic-field-flip-solar-maximum-2024
votes : 300
----------

----------
title : Mouth-based touchpad enables people living with paralysis to use computers (news.mit.edu)
link : https://news.mit.edu/2024/mouth-based-touchpad-augmental-0605
votes : 294
----------

----------
title : My thoughts on Pytho