<font face="verdana" size="6" color="blue">Introducing Web Scraping</font> ![](https://miro.medium.com/max/990/1*AaAIETIq7XNlLrFQW7BtZg.png)

1. <font color="blue" size=4>What is Web scraping?</font>
    - Web scraping is a technique for extracting data from websites and storing it locally or in a database. It's also known as web harvesting, data scraping or data crawling. Plenty of software exists that you can install and use to scrape the web. Today, I'm going to introduce you to a friendly, free Python package called Beautiful Soup; Scrapy is another popular free package.
    
2. <font color="blue" size=4>Why do we need to web scrape?</font>
    - Most websites only let you view their data through a web browser and offer no way to save a copy of it. Manually copying the data could take a large amount of time. Scraping software automates the process and completes the task in a fraction of the time.
    
3. <font color="blue" size=4>What are some common uses of web scraping?</font>
    - **E-commerce Websites:** Web scrapers can collect data, especially the price of a specific product, from various e-commerce websites for comparison.
    - **Content Aggregators:** Content aggregators, such as news and job aggregators, use web scraping widely to provide up-to-date data to their users.
    - **Marketing and Sales Campaigns:** Web scrapers can gather data such as email addresses and phone numbers for sales and marketing campaigns.
    - **Data for Machine Learning Projects:** Machine learning projects often rely on web scraping to retrieve their data.
    
4. <font color="blue" size=4>How do we scrape?</font>

    What are the basic components of a web page? There are typically four:

    - HTML - contains the main content of a page
    - CSS - adds styling to make the page pretty
    - JS - JavaScript files add interactivity to pages
    - JPG & PNG - image formats used to show pictures

<font face="verdana" size="6" color="blue">HTML</font>

Hypertext Markup Language (HTML) is the language that web pages are written in.
Unlike Python, however, it has no ability to express logic. It can make text italic or bold and it can create paragraphs, but it cannot perform recursion.

```html
<!DOCTYPE html>  
<html>  
    <!-- This is the syntax for adding helpful comments that will not be rendered to the browser -->
    <head>   
        
    </head>
    
    <body>
        <!-- The following are html elements. There is a good resource for html documentation at the end of this notebook --> 

        <h1>My Heading</h1>
        <p>My Paragraph</p>
        
    </body>

</html>
```
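
To make the structure concrete, here is a minimal sketch that parses the sample page above with Beautiful Soup and reads back its two elements. It assumes the `bs4` package is installed and uses Python's built-in `html.parser` backend; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# The same sample page as above, held in a string instead of fetched from a URL.
sample_html = """<!DOCTYPE html>
<html>
    <head>
    </head>
    <body>
        <h1>My Heading</h1>
        <p>My Paragraph</p>
    </body>
</html>"""

# Parse the markup into a tree of elements that can be searched by tag name.
sample_soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching element; .text gives the text inside it.
heading = sample_soup.find("h1").text
paragraph = sample_soup.find("p").text

print(heading)
print(paragraph)
```

The HTML only describes what is on the page; the searching and extracting is done by the Python code.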

**To extract data with web scraping in Python, follow these basic steps:**

1. <font color="blue">Find the URL that you want to scrape</font>
2. <font color="blue">Inspect the Page</font>
3. <font color="blue">Find the data you want to extract</font>
4. <font color="blue">Build a simple scraper</font>
5. <font color="blue">Modify the scraper to strip out the HTML and extract all the useful data</font>
6. <font color="blue">Store the data in the required format</font>
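
Steps 4 through 6 can be sketched end to end without touching the network. The snippet below is an illustration under assumptions, not a fixed recipe: `page_html` is a hand-written stand-in for a page you have already found and inspected, the `book`, `title` and `price` class names are hypothetical, and the result is written as CSV to an in-memory buffer rather than to a file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for a page located and inspected in steps 1-3.
page_html = """
<ul>
  <li class="book"><span class="title">Emma</span> <span class="price">7.50</span></li>
  <li class="book"><span class="title">Persuasion</span> <span class="price">6.25</span></li>
</ul>
"""

# Step 4: build a simple scraper by parsing the markup.
page_soup = BeautifulSoup(page_html, "html.parser")

# Step 5: drop the HTML and keep only the useful values.
rows = []
for item in page_soup.find_all("li", class_="book"):
    rows.append({
        "title": item.find("span", class_="title").text,
        "price": float(item.find("span", class_="price").text),
    })

# Step 6: store the data in the required format (CSV, held in memory here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```

Swapping the buffer for `open("books.csv", "w", newline="")` would write the same rows to disk, and a list of dictionaries like `rows` can also be passed to `pandas.DataFrame` if a table is more convenient.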


In [1]:
```python
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
```

<font face="verdana" size="4" color="blue">Let's scrape some quotes from this website - http://quotes.toscrape.com/</font>



In [11]:
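```python
# Build a simple soup
url = 'http://quotes.toscrape.com/'

# Download the page and stop early if the request failed
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML with the lxml parser (requires the lxml package)
soup = BeautifulSoup(response.text, 'lxml')

# Each quote's text sits in a <span class="text"> element
quotes = soup.find_all('span', class_='text')

for quote in quotes:
    print(quote.text)
```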


“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
“Try not to become a man of success. Rather become a man of value.”
“It is better to be hated for what you are than to be loved for what you are not.”
“I have not failed. I've just found 10,000 ways that won't work.”
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
“A day without sunshine is like, you know, night.”


<font face="verdana" size="4" color="blue">Activity - Break off into groups and get the Authors</font>


In [29]:
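```python
# First attempt: print the tag links inside each quote's tag container
tags = soup.find_all('div', class_='tags')

for tag in tags:
    # find_all() returns a ResultSet (a list of elements), not a single element
    tag = tag.find_all('a', class_='tag')

    # This line raises the AttributeError: a ResultSet has no .text attribute
    print(tag.text)
```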




AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
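
The error comes down to return types: `find_all()` returns a `ResultSet`, which behaves like a list, while `find()` returns a single element (or `None` when nothing matches). The offline sketch below shows the difference. The HTML fragment is a hand-written stand-in for one quote's tag container, and the built-in `html.parser` backend is an assumption made to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Hand-written fragment standing in for one quote's tag container.
fragment = """
<div class="tags">
    <a class="tag">change</a>
    <a class="tag">world</a>
</div>
"""

tag_block = BeautifulSoup(fragment, "html.parser").find("div", class_="tags")

# find() returns one element, so .text works directly.
first_tag = tag_block.find("a", class_="tag").text

# find_all() returns a list-like ResultSet; read .text from each element in it.
all_tags = [link.text for link in tag_block.find_all("a", class_="tag")]

# Asking the ResultSet itself for .text is what raises the AttributeError.
try:
    tag_block.find_all("a", class_="tag").text
    raised = False
except AttributeError:
    raised = True

print(first_tag)
print(all_tags)
print(raised)
```

Whenever a page element can occur more than once, loop over the result of `find_all()` and read `.text` from each item.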

In [33]:
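```python
# Authors, quotes and tag containers appear in the same order on the page
authors = soup.find_all('small', class_='author')
tags = soup.find_all('div', class_='tags')

# Walk the three lists in parallel, one quote at a time
for author, quote, tag_block in zip(authors, quotes, tags):
    print(author.text)
    print(quote.text)

    # Each tag container holds one <a class="tag"> link per tag
    for quote_tag in tag_block.find_all('a', class_='tag'):
        print(quote_tag.text)
```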

Albert Einstein
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
change
deep-thoughts
thinking
world
J.K. Rowling
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
abilities
choices
Albert Einstein
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
inspirational
life
live
miracle
miracles
Jane Austen
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
aliteracy
books
classic
humor
Marilyn Monroe
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
be-yourself
inspirational
Albert Einstein
“Try not to become a man of success. Rather become a man of value.”
adulthood
success
value
André Gide
“It is better to be hated for what you are than to be loved for what you are not.”
life
love
Thomas A. Edis