Scrape the web using a Python web scraper!
---
Before running the web scraper
Install the required libraries from the requirements.txt file, or just copy and paste the following command in your terminal:
pip install -r requirements.txt
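If you don't have a requirements.txt yet, a minimal one for this script needs just a single line, since requests is its only third-party dependency:

requests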
Let's break down the code
First, we will import the requests library:
import requests as req
After this, we will define a function named web_scraper(), which will scrape the given URL and
save the response to a file named response.txt, as follows -
def web_scraper(url):
    try:
        response = req.get(url)  # Send a GET request to the URL entered by the user
        response.raise_for_status()  # Raise an exception for any 4xx/5xx status code
        with open("response.txt", "w", encoding="utf-8") as file:
            file.write(format_response(url, response.text))  # Save the response to the response.txt file
        print(f"Successfully scraped {url} and saved to response.txt")  # Print a success message
    except req.exceptions.RequestException as err:
        print(f"Error scraping {url}: {err}")  # If a request exception occurs, print it
Now it's time to define another function, which will take the URL input from the user and
then scrape it!
def take_input():
    url = input("Enter the URL: ")
    if url == "":
        print("Cannot scrape an empty URL!")
    else:
        web_scraper(url)
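The empty-string check above is the bare minimum. If you want slightly stronger validation, here is a minimal sketch (an optional extension, not part of the original script) using Python's standard urllib.parse to check that the input at least has an http/https scheme and a host:

from urllib.parse import urlparse

def is_valid_url(url):
    """Hypothetical helper: True if the URL has an http/https scheme and a network location."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)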
Who doesn't like nicely formatted output! So, to format it, let's make another function named format_response()
def format_response(url, text):
    """Formats the response!"""
    formatted_response = f"Scraped from {url}:\n\n{text}"
    return formatted_response
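For example, an illustrative call (the URL and HTML snippet here are made up for demonstration):

>>> format_response("https://example.com", "<html>Hello</html>")
'Scraped from https://example.com:\n\n<html>Hello</html>'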
Lastly, we will run this script using the standard entry-point guard -
if __name__ == "__main__":
    take_input()  # Run our take_input() function, which takes the URL input from the user
Now, after putting all the pieces together,
we get the following script -
import requests as req  # Import the requests library as req

# Our main scraping function starts here
def web_scraper(url):
    try:
        response = req.get(url)  # Send a GET request to the URL entered by the user
        response.raise_for_status()  # Raise an exception for any 4xx/5xx status code
        with open("response.txt", "w", encoding="utf-8") as file:
            file.write(format_response(url, response.text))  # Save the response to the response.txt file
        print(f"Successfully scraped {url} and saved to response.txt")  # Print a success message
    except req.exceptions.RequestException as err:
        print(f"Error scraping {url}: {err}")  # If a request exception occurs, print it

def take_input():
    url = input("Enter the URL: ")
    if url == "":
        print("Cannot scrape an empty URL!")
    else:
        web_scraper(url)

def format_response(url, text):
    """Formats the response!"""
    formatted_response = f"Scraped from {url}:\n\n{text}"
    return formatted_response

if __name__ == "__main__":
    take_input()  # Run our take_input() function, which takes the URL input from the user
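To try it out, save the script (assumed here to be named web_scraper.py - any filename works) and run it from your terminal. A sample session might look like this:

$ python web_scraper.py
Enter the URL: https://example.com
Successfully scraped https://example.com and saved to response.txt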
That's all for this simple web scraper!
Thanks for reading :)
Have an absolutely fantastic day ahead :D