# Project: Web Scraping Using Python

Pradeep Kumar Singh  
22/04/2023

## Website used:
Quotes to Scrape (http://quotes.toscrape.com/).

## Project details:
This project scrapes quotes, tags, and author names from the website "Quotes to Scrape".

**Steps involved**
* Downloading each page of the site.
* Scraping the required data (quotes, tags, authors).
* Converting the data into a DataFrame.
* Saving the data in CSV format.
* Validating the data by reading the CSV file back.
    

In [2]:
# Installing the required libraries
# (the bs4 module is provided by the beautifulsoup4 package)

! pip install beautifulsoup4
! pip install requests
! pip install pandas



In [3]:
# importing required libraries

import requests
import pandas as pd
from bs4 import BeautifulSoup


In [4]:
# Downloading the first page and parsing it.

url = "http://quotes.toscrape.com/page/1/"
response = requests.get(url)
html = response.content  # raw bytes; BeautifulSoup works out the encoding
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())


<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Quotes to Scrape
  </title>
  <link href="/static/bootstrap.min.css" rel="stylesheet"/>
  <link href="/static/main.css" rel="stylesheet"/>
 </head>
 <body>
  <div class="container">
   <div class="row header-box">
    <div class="col-md-8">
     <h1>
      <a href="/" style="text-decoration: none">
       Quotes to Scrape
      </a>
     </h1>
    </div>
    <div class="col-md-4">
     <p>
      <a href="/login">
       Login
      </a>
     </p>
    </div>
   </div>
   <div class="row">
    <div class="col-md-8">
     <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
      <span class="text" itemprop="text">
       “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
      </span>
      <span>
       by
       <small class="author" itemprop="author">
        Albert Einstein
       </small>
       <a href="/author/Albert

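The download step above does not check whether the request succeeded. A minimal sketch of a more defensive fetch follows; the helper name `fetch_page`, the optional `session` argument, and the 10-second timeout are assumptions made for illustration and are not part of the original notebook.

```python
import requests


def fetch_page(url, session=None, timeout=10):
    """Return the raw bytes of a page, raising on HTTP errors.

    `session` is a hypothetical hook: pass a requests.Session to reuse
    connections, or leave it as None to use the requests module directly.
    """
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.content
```

With this helper, the parsing line could become `soup = BeautifulSoup(fetch_page(url), 'html.parser')`.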
In [5]:
# Scraping the data from all 10 pages of the website

quotes = []
tags = []
authors = []

for page in range(1, 11):
    url = f"http://quotes.toscrape.com/page/{page}/"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Each quote sits in its own <div class="quote"> block.
    for content in soup.find_all('div', class_='quote'):
        # Quote text: <span class="text" itemprop="text">
        quotes.append(content.find('span', class_='text').text)
        # Tags: comma-separated string in the "content" attribute of the <meta> tag
        tags.append(content.find('meta')['content'])
        # Author name: <small class="author">
        authors.append(content.find('small', class_='author').text)
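The extraction logic can also be separated from the download so that it can be tried without a network connection. The sketch below is one way to do that; the function name `parse_quotes` and the `SAMPLE_HTML` markup are hypothetical, hand-written to imitate the page structure the scraping cell relies on (a `div.quote` block holding a `span.text`, a `small.author`, and a `meta` tag with the tags).

```python
from bs4 import BeautifulSoup

# Hand-written imitation of the site's markup (an assumption, not real data).
SAMPLE_HTML = """
<div class="container">
  <div class="quote">
    <span class="text" itemprop="text">“Sample quote one.”</span>
    <span>by <small class="author" itemprop="author">Author One</small></span>
    <div class="tags">
      <meta class="keywords" itemprop="keywords" content="alpha,beta"/>
    </div>
  </div>
  <div class="quote">
    <span class="text" itemprop="text">“Sample quote two.”</span>
    <span>by <small class="author" itemprop="author">Author Two</small></span>
    <div class="tags">
      <meta class="keywords" itemprop="keywords" content="gamma"/>
    </div>
  </div>
</div>
"""


def parse_quotes(html):
    """Return one dict per quote block found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="quote"):
        records.append({
            "Quotes": block.find("span", class_="text").get_text(strip=True),
            "Tags": block.find("meta")["content"],
            "Authors": block.find("small", class_="author").get_text(strip=True),
        })
    return records


records = parse_quotes(SAMPLE_HTML)
print(len(records))  # number of quote blocks found in the sample
```

Keeping the three fields of a quote together in one record means they cannot drift out of step, which can happen with three separately filled lists.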


In [22]:
# Zipping all three lists and converting them into a DataFrame.

df = pd.DataFrame(list(zip(quotes, tags, authors)), columns=["Quotes", "Tags", "Authors"])

df

|  | Quotes | Tags | Authors |
|---|---|---|---|
| 0 | “The world as we have created it is a process ... | change,deep-thoughts,thinking,world | Albert Einstein |
| 1 | “It is our choices, Harry, that show what we t... | abilities,choices | J.K. Rowling |
| 2 | “There are only two ways to live your life. On... | inspirational,life,live,miracle,miracles | Albert Einstein |
| 3 | “The person, be it gentleman or lady, who has ... | aliteracy,books,classic,humor | Jane Austen |
| 4 | “Imperfection is beauty, madness is genius and... | be-yourself,inspirational | Marilyn Monroe |
| ... | ... | ... | ... |
| 95 | “You never really understand a person until yo... | better-life-empathy | Harper Lee |
| 96 | “You have to write the book that wants to be w... | books,children,difficult,grown-ups,write,write... | Madeleine L'Engle |
| 97 | “Never tell the truth to people who are not wo... | truth | Mark Twain |
| 98 | “A person's a person, no matter how small.” | inspirational | Dr. Seuss |
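One thing to keep in mind with the zipping step is that `zip` stops at the shortest input, so if one list ended up shorter than the others, rows would be dropped without any warning. A minimal sketch of a guard is shown below; the helper name `build_quotes_frame` is an assumption, and the sample values are made up for illustration.

```python
import pandas as pd


def build_quotes_frame(quotes, tags, authors):
    """Build the DataFrame only if all three lists have the same length."""
    lengths = {len(quotes), len(tags), len(authors)}
    if len(lengths) != 1:
        raise ValueError(
            f"list lengths differ: quotes={len(quotes)}, "
            f"tags={len(tags)}, authors={len(authors)}"
        )
    return pd.DataFrame({"Quotes": quotes, "Tags": tags, "Authors": authors})


# Made-up sample values, not scraped data.
frame = build_quotes_frame(
    ["“Sample quote one.”", "“Sample quote two.”"],
    ["alpha,beta", "gamma"],
    ["Author One", "Author Two"],
)
print(frame.shape)
```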


In [25]:
# Saving the DataFrame in CSV format.

df.to_csv('Quotes_Scrapped.csv')
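By default, `to_csv` also writes the DataFrame index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back. As a hedged alternative, passing `index=False` leaves the index out. The sketch below uses an in-memory buffer and a made-up one-row frame instead of the real file, purely for illustration.

```python
import io

import pandas as pd

# Made-up one-row frame standing in for the scraped data.
sample = pd.DataFrame({
    "Quotes": ["“Sample quote.”"],
    "Tags": ["alpha,beta"],
    "Authors": ["Author One"],
})

buffer = io.StringIO()
sample.to_csv(buffer, index=False)  # index=False: do not write the row index
buffer.seek(0)

restored = pd.read_csv(buffer)
print(list(restored.columns))  # ['Quotes', 'Tags', 'Authors']
```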

In [47]:
# Validating the CSV file by reading it back into a DataFrame.

csv = pd.read_csv('Quotes_Scrapped.csv')
csv

|  | Unnamed: 0 | Quotes | Tags | Authors |
|---|---|---|---|---|
| 0 | 0 | “The world as we have created it is a process ... | change,deep-thoughts,thinking,world | Albert Einstein |
| 1 | 1 | “It is our choices, Harry, that show what we t... | abilities,choices | J.K. Rowling |
| 2 | 2 | “There are only two ways to live your life. On... | inspirational,life,live,miracle,miracles | Albert Einstein |
| 3 | 3 | “The person, be it gentleman or lady, who has ... | aliteracy,books,classic,humor | Jane Austen |
| 4 | 4 | “Imperfection is beauty, madness is genius and... | be-yourself,inspirational | Marilyn Monroe |
| ... | ... | ... | ... | ... |
| 95 | 95 | “You never really understand a person until yo... | better-life-empathy | Harper Lee |
| 96 | 96 | “You have to write the book that wants to be w... | books,children,difficult,grown-ups,write,write... | Madeleine L'Engle |
| 97 | 97 | “Never tell the truth to people who are not wo... | truth | Mark Twain |
| 98 | 98 | “A person's a person, no matter how small.” | inspirational | Dr. Seuss |


In [48]:
# Dropping the extra 'Unnamed: 0' column.

print(csv.columns)
csv.drop(['Unnamed: 0'],axis=1,inplace=True)
csv


Index(['Unnamed: 0', 'Quotes', 'Tags', 'Authors'], dtype='object')


|  | Quotes | Tags | Authors |
|---|---|---|---|
| 0 | “The world as we have created it is a process ... | change,deep-thoughts,thinking,world | Albert Einstein |
| 1 | “It is our choices, Harry, that show what we t... | abilities,choices | J.K. Rowling |
| 2 | “There are only two ways to live your life. On... | inspirational,life,live,miracle,miracles | Albert Einstein |
| 3 | “The person, be it gentleman or lady, who has ... | aliteracy,books,classic,humor | Jane Austen |
| 4 | “Imperfection is beauty, madness is genius and... | be-yourself,inspirational | Marilyn Monroe |
| ... | ... | ... | ... |
| 95 | “You never really understand a person until yo... | better-life-empathy | Harper Lee |
| 96 | “You have to write the book that wants to be w... | books,children,difficult,grown-ups,write,write... | Madeleine L'Engle |
| 97 | “Never tell the truth to people who are not wo... | truth | Mark Twain |
| 98 | “A person's a person, no matter how small.” | inspirational | Dr. Seuss |
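Looking at the table is one form of validation; a few programmatic checks can complement it. The sketch below is only an illustration: the helper name `find_problems` and the specific checks are assumptions, and the `Tags` column is left out of the missing-value check on the assumption that a quote may legitimately have no tags.

```python
import pandas as pd

EXPECTED_COLUMNS = ["Quotes", "Tags", "Authors"]


def find_problems(frame):
    """Return a list of human-readable problems found in the frame."""
    problems = []
    if list(frame.columns) != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {list(frame.columns)}")
        return problems  # the remaining checks need the expected columns
    if frame[["Quotes", "Authors"]].isna().any().any():
        problems.append("missing quote or author")
    if frame["Quotes"].duplicated().any():
        problems.append("duplicate quotes")
    return problems


# Made-up sample values, not scraped data.
good = pd.DataFrame({
    "Quotes": ["“Sample quote one.”", "“Sample quote two.”"],
    "Tags": ["alpha,beta", "gamma"],
    "Authors": ["Author One", "Author Two"],
})
print(find_problems(good))  # []
```

In the notebook, the same function could be called on the DataFrame read back from `Quotes_Scrapped.csv` after the extra column has been dropped.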
