### Importing required libraries

- **BeautifulSoup**: Allows us to pull data out of HTML and XML documents in organized way.
- **requests**: Allow us to send HTTP/1.1 requests using Python.
- **pandas**: Allow us to manupulate, import, export of a dataset.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

### Saving the website url in  variable `"url"`

In [2]:
url = "http://quotes.toscrape.com/page/1/"

### Take out the contents from the page in variable `"c"`

In [3]:
# checking the response usiing requests liabrary
r = requests.get(url)
r

<Response [200]>

In [4]:
c = r.content

In [5]:
# Checking the contents in variable 'c'
c

b'<!DOCTYPE html>\n<html lang="en">\n<head>\n\t<meta charset="UTF-8">\n\t<title>Quotes to Scrape</title>\n    <link rel="stylesheet" href="/static/bootstrap.min.css">\n    <link rel="stylesheet" href="/static/main.css">\n</head>\n<body>\n    <div class="container">\n        <div class="row header-box">\n            <div class="col-md-8">\n                <h1>\n                    <a href="/" style="text-decoration: none">Quotes to Scrape</a>\n                </h1>\n            </div>\n            <div class="col-md-4">\n                <p>\n                \n                    <a href="/login">Login</a>\n                \n                </p>\n            </div>\n        </div>\n    \n\n<div class="row">\n    <div class="col-md-8">\n\n    <div class="quote" itemscope itemtype="http://schema.org/CreativeWork">\n        <span class="text" itemprop="text">\xe2\x80\x9cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\xe2\x80\

### Reading the data using BeautifulSoup library

In [6]:
html = BeautifulSoup(c,'html.parser')
# 'htmi.parser' helps to arrange contents just like HTML page.

### Creating variable `con` to find the contents available in 'container' class.

In [7]:
con = html.find(class_ = 'container')
con

<div class="container">
<div class="row header-box">
<div class="col-md-8">
<h1>
<a href="/" style="text-decoration: none">Quotes to Scrape</a>
</h1>
</div>
<div class="col-md-4">
<p>
<a href="/login">Login</a>
</p>
</div>
</div>
<div class="row">
<div class="col-md-8">
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
            Tags:
            <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
<a class="tag" href="/tag/change/page/1/">change</a>
<a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
<a class="tag" href="/tag/thinking/page/1/">thinking</a>
<a class="tag" href="/tag/world/page/1/">world</a>
<

### Creating variable 'C' to find out the contents present in 'quote' class.

In [8]:
C = con.find_all('div', class_ = 'quote')

# extracting items available in text class

for items in C:
    q = items.find('span', class_ = 'text').text
    print(q)

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
“Try not to become a man of success. Rather become a man of value.”
“It is better to be hated for what you are than to be loved for what you are not.”
“I have not failed. I've just found 10,000 ways that won't work.”
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
“A day without sunshine is like, you know, night.”


In [9]:
# extracting items available in author class

for items in C:
    aut = items.find('small', class_ = 'author').text
    print(aut)

Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin


In [10]:
# extracting tags from content

for items in C:
    t = items.find('meta')
    tag = t.attrs['content']
    print(tag)

change,deep-thoughts,thinking,world
abilities,choices
inspirational,life,live,miracle,miracles
aliteracy,books,classic,humor
be-yourself,inspirational
adulthood,success,value
life,love
edison,failure,inspirational,paraphrased
misattributed-eleanor-roosevelt
humor,obvious,simile


### Creating variable `Contents` to save the contents in a list.

In [11]:
C = con.find_all('div', class_ = 'quote')

Contents = []

for items in C:
    q = items.find('span', class_ = 'text').text
    aut = items.find('small', class_ = 'author').text
    t = items.find('meta')
    tag = t.attrs['content']
    Contents.append([q,aut,tag])

print(Contents)

[['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'Albert Einstein', 'change,deep-thoughts,thinking,world'], ['“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'J.K. Rowling', 'abilities,choices'], ['“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'Albert Einstein', 'inspirational,life,live,miracle,miracles'], ['“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'Jane Austen', 'aliteracy,books,classic,humor'], ["“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'Marilyn Monroe', 'be-yourself,inspirational'], ['“Try not to become a man of success. Rather become a man of value.”', 'Albert Einstein', 'adulthood,success,value'], ['“It is better to be hated for what you are than t

### Extracting contents from all 10 pages available in the website.

In [12]:
Contents = []

for i in range (1,10):
    url = f"http://quotes.toscrape.com/page/{i}/"
    r = requests.get(url)
    c = r.content
    html = BeautifulSoup(c,'html.parser')
    con = html.find(class_ = 'container')
    C = con.find_all('div', class_ = 'quote')
    for items in C:
        q = items.find('span', class_ = 'text').text
        aut = items.find('small', class_ = 'author').text
        t = items.find('meta')
        tag = t.attrs['content']
        Contents.append([q,aut,tag])

### Reading the data using pandas liabrary

In [13]:
df = pd.DataFrame(Contents, columns = ['Quote', 'Author', 'Tag'])
df

Unnamed: 0,Quote,Author,Tag
0,“The world as we have created it is a process ...,Albert Einstein,"change,deep-thoughts,thinking,world"
1,"“It is our choices, Harry, that show what we t...",J.K. Rowling,"abilities,choices"
2,“There are only two ways to live your life. On...,Albert Einstein,"inspirational,life,live,miracle,miracles"
3,"“The person, be it gentleman or lady, who has ...",Jane Austen,"aliteracy,books,classic,humor"
4,"“Imperfection is beauty, madness is genius and...",Marilyn Monroe,"be-yourself,inspirational"
...,...,...,...
85,“Some day you will be old enough to start read...,C.S. Lewis,"age,fairytales,growing-up"
86,“We are not necessarily doubting that God will...,C.S. Lewis,god
87,“The fear of death follows from the fear of li...,Mark Twain,"death,life"
88,“A lie can travel half way around the world wh...,Mark Twain,"misattributed-mark-twain,truth"


### Saving extracted data in local system in excel format using pandas

In [14]:
df.to_excel('Quotes_to_Scrape.xlsx')

## ***End of Assignment***