# Web Scraping with Beautiful Soup
Scraping [Books to Scrape](https://books.toscrape.com/) at toscrape.com

**Objective:** Books to Scrape lists 1000 books across 50 web pages. My objective is to scrape all 50 pages and save the data as a CSV file.

**List of Contents:**
1. Importing the Libraries
2. Scraping first page
3. Scraping all the 50 pages
4. Creating the 'csv' file
5. Result     

**Result:** [Books_to_Scrape.csv](https://github.com/abjta1029/sample_data_files/blob/main/Books_to_Scrape.csv), the final CSV file.

#### 1. Importing the Libraries:

In [1]:
import requests

In [2]:
import bs4
from bs4 import BeautifulSoup

In [3]:
import csv

#### 2. Scraping first page:
The first page serves as an example: the lookups worked out here are then applied to the rest of the pages.

In [4]:
res_example=requests.get('https://books.toscrape.com/catalogue/page-1.html') # use requests to fetch the page

In [5]:
soup_example=bs4.BeautifulSoup(res_example.text,'html.parser')

In [6]:
product_example=soup_example(class_='product_pod') # calling the soup object is shorthand for find_all()

In [7]:
product_example[0]

<article class="product_pod">
<div class="image_container">
<a href="a-light-in-the-attic_1000/index.html"><img alt="A Light in the Attic" class="thumbnail" src="../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg"/></a>
</div>
<p class="star-rating Three">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>
<h3><a href="a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
<div class="product_price">
<p class="price_color">Â£51.77</p>
<p class="instock availability">
<i class="icon-ok"></i>
    
        In stock
    
</p>
<form>
<button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
</form>
</div>
</article>

**The `product_pod` element contains the book title and the book rating.**

In [8]:
product_example[0].h3.a.attrs['title'] # title of the book

'A Light in the Attic'

In [9]:
product_example[0].find('p').get('class')[1] # rating of the book

'Three'
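
The rating comes back as a word ('Three') because the site stores it as the second CSS class on the `star-rating` paragraph. If a numeric rating is more convenient for later analysis, a small lookup table can translate it. The sketch below is an optional addition, not part of the original notebook; `RATING_WORDS` and `rating_to_int` are hypothetical names.

```python
# Hypothetical helper: map the rating word taken from the CSS class to an integer.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

def rating_to_int(word):
    """Return the star count for a rating word, or None if the word is unknown."""
    return RATING_WORDS.get(word)

print(rating_to_int("Three"))
```

Returning `None` for an unknown word keeps the scrape running if a page ever carries an unexpected class name.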

**The `product_price` element contains the book price and the in-stock availability.**

In [10]:
price_example=soup_example(class_='product_price')

In [11]:
price_example[0]

<div class="product_price">
<p class="price_color">Â£51.77</p>
<p class="instock availability">
<i class="icon-ok"></i>
    
        In stock
    
</p>
<form>
<button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
</form>
</div>

In [12]:
price_example[0].find(class_='price_color').text.strip('Â') #price of the book

'£51.77'
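
The stray 'Â' is most likely an encoding artifact rather than part of the page. In UTF-8 the pound sign is two bytes (0xC2 0xA3), and decoding those bytes as ISO-8859-1 yields 'Â£'. requests falls back to ISO-8859-1 for text responses whose headers declare no charset, which appears to be the case here. The sketch below reproduces the effect with plain strings and reverses it. In the notebook itself, an alternative to `.strip('Â')` would be to set `res_example.encoding = 'utf-8'` before reading `.text`, or to pass `res_example.content` (raw bytes) to BeautifulSoup. The helper name `fix_mojibake` is hypothetical.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as ISO-8859-1."""
    return text.encode("iso-8859-1").decode("utf-8")

price_bytes = "£51.77".encode("utf-8")       # bytes as stored in a UTF-8 page
garbled = price_bytes.decode("iso-8859-1")   # wrong decoding adds the stray character
repaired = fix_mojibake(garbled)             # round-trip back to the intended text

# With the symbol removed, the price can be used as a number.
price_value = float(repaired.lstrip("£"))
print(price_value)
```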

In [13]:
price_example[0].find(class_='instock availability').get_text(strip=True) # instock availability of the book

'In stock'

#### 3. Scraping all the 50 pages:

In [14]:
url='https://books.toscrape.com/catalogue/page-{}.html'

In [15]:
url.format(5) # to fit the page number in url

'https://books.toscrape.com/catalogue/page-5.html'

In [16]:
my_scrape_file=[] # one [name, price, rating, availability] row per book

for n in range(1,51):
    scrape_url=url.format(n)
    res=requests.get(scrape_url) # use requests to fetch the page

    soup=bs4.BeautifulSoup(res.text,'html.parser')

    products=soup(class_='product_pod') # one element per book
    prices=soup(class_='product_price') # price blocks, in the same order as the books

    for item1,item2 in zip(products,prices):
        name=item1.h3.a.attrs['title']
        price=item2.find(class_='price_color').text.strip('Â')
        rating=item1.find('p').get('class')[1]
        availability=item2.find(class_='instock availability').get_text(strip=True)

        my_scrape_file.append([name,price,rating,availability])

In [17]:
my_scrape_file[0:5] 

[['A Light in the Attic', '£51.77', 'Three', 'In stock'],
 ['Tipping the Velvet', '£53.74', 'One', 'In stock'],
 ['Soumission', '£50.10', 'One', 'In stock'],
 ['Sharp Objects', '£47.82', 'Four', 'In stock'],
 ['Sapiens: A Brief History of Humankind', '£54.23', 'Five', 'In stock']]
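
The loop above pairs two separate lists with `zip`, which relies on both lists having the same length and order. An alternative sketch, not the notebook's method, looks up the price and availability inside each `product_pod` element, since the `product_price` block is nested within it (see the HTML in step 2). `parse_page` is a hypothetical name, and the inline HTML is a trimmed copy of the first book's markup so the example runs without a network request.

```python
from bs4 import BeautifulSoup

def parse_page(html):
    """Return one [title, price, rating, availability] row per book in the page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for pod in soup.find_all(class_="product_pod"):
        title = pod.h3.a["title"]
        # The rating word is the second class on the star-rating paragraph.
        rating = pod.find("p", class_="star-rating")["class"][1]
        price = pod.find(class_="price_color").get_text(strip=True)
        availability = pod.find(class_="availability").get_text(strip=True)
        rows.append([title, price, rating, availability])
    return rows

# Trimmed sample of one book's markup, used here instead of a live request.
sample_html = """
<article class="product_pod">
<p class="star-rating Three"></p>
<h3><a href="a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
<div class="product_price">
<p class="price_color">£51.77</p>
<p class="instock availability">
    In stock
</p>
</div>
</article>
"""

rows = parse_page(sample_html)
print(len(rows))
```

Because every field is read from the same element, a book with a missing price block would raise an error at that book instead of silently shifting later rows.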

#### 4. Creating the 'csv' file:

In [18]:
fields=['Title of the Book','Price','Rating','Instock Availability'] # headers of the columns

In [19]:
filename='Books_to_Scrape.csv' # filename of the csv file

In [20]:
with open(filename, 'w', encoding='utf-8', newline='') as csvfile:
    csvwriter = csv.writer(csvfile) # creating a csv writer object
    csvwriter.writerow(fields) # writing the header
    csvwriter.writerows(my_scrape_file) # writing the data rows
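
A quick way to confirm the file was written as intended is to read it back with the same `csv` module. The sketch below writes two of the rows shown earlier to a temporary file and reads them back, so it runs on its own; the temporary path and the names `sample_rows` and `read_back` are assumptions for illustration only.

```python
import csv
import os
import tempfile

fields = ['Title of the Book', 'Price', 'Rating', 'Instock Availability']
sample_rows = [
    ['A Light in the Attic', '£51.77', 'Three', 'In stock'],
    ['Sapiens: A Brief History of Humankind', '£54.23', 'Five', 'In stock'],
]

# Write to a throwaway location so the real output file is left untouched.
path = os.path.join(tempfile.mkdtemp(), 'sample_books.csv')

with open(path, 'w', encoding='utf-8', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(fields)
    csvwriter.writerows(sample_rows)

# Read the file back and compare it with what was written.
with open(path, encoding='utf-8', newline='') as csvfile:
    read_back = list(csv.reader(csvfile))

print(len(read_back) - 1, 'data rows')
```

Opening the file with `newline=''` leaves line endings to the `csv` module, and `encoding='utf-8'` keeps the pound sign intact on both the write and the read.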

#### 5. Result:

In [21]:
import pandas as pd

In [22]:
Books_to_Scrape=pd.read_csv('Books_to_Scrape.csv')

In [23]:
Books_to_Scrape.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Title of the Book     1000 non-null   object
 1   Price                 1000 non-null   object
 2   Rating                1000 non-null   object
 3   Instock Availability  1000 non-null   object
dtypes: object(4)
memory usage: 31.4+ KB


In [24]:
Books_to_Scrape.head()

| | Title of the Book | Price | Rating | Instock Availability |
|---|---|---|---|---|
| 0 | A Light in the Attic | £51.77 | Three | In stock |
| 1 | Tipping the Velvet | £53.74 | One | In stock |
| 2 | Soumission | £50.10 | One | In stock |
| 3 | Sharp Objects | £47.82 | Four | In stock |
| 4 | Sapiens: A Brief History of Humankind | £54.23 | Five | In stock |
