Web Scraping Task: Books to Scrape
Objective:
The goal of this task is to scrape book data from the Books to Scrape website (https://books.toscrape.com/) and store it in a structured format (CSV or JSON).
Task Breakdown:
Data to Scrape: From each book listed on the website, scrape the following information:
Book Title: The title of the book.
Price: The price of the book.
Availability: Whether the book is in stock or out of stock.
Rating: The star rating of the book (e.g., 3 stars, 5 stars, etc.).
Book URL: The URL of the detailed book page.
Book Image URL: The URL of the book's cover image.
Pagination: The website contains multiple pages of books. You will need to scrape all books by:
Iterating through each page and collecting the above information for all books.
Final Output: Save the scraped data in a structured format:
CSV: Save as books_data.csv, with columns for Title, Price, Availability, Rating, Book URL, and Book Image URL.
Alternatively, you can use JSON if you prefer. Ensure the file is well-structured.

In [79]:
import requests
data=requests.get('https://books.toscrape.com/')
page_contents=data.text

In [80]:
page_contents



In [81]:
from bs4 import BeautifulSoup
doc = BeautifulSoup(page_contents,'html.parser')

In [82]:
doc

<!DOCTYPE html>

<!--[if lt IE 7]>      <html lang="en-us" class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]>         <html lang="en-us" class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]>         <html lang="en-us" class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-us"> <!--<![endif]-->
<head>
<title>
    All products | Books to Scrape - Sandbox
</title>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<meta content="24th Jun 2016 09:29" name="created"/>
<meta content="" name="description"/>
<meta content="width=device-width" name="viewport"/>
<meta content="NOARCHIVE,NOCACHE" name="robots"/>
<!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
<!--[if lt IE 9]>
        <script src="//html5shim.googlecode.com/svn/trunk/html5.js"></script>
        <![endif]-->
<link href="static/oscar/favicon.ico" rel="shortcut icon"/>
<link href="static/oscar/css/styles.css" rel="stylesheet" type="text/css"/>
<link href="s

# Fetching Book Title

In [83]:
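def get_book_titles(doc):
    # Each book card's title sits in an <h3>; long titles are shown
    # truncated with "..." in this visible text.
    book_title_tags = doc.find_all('h3')
    book_titles = []
    for tag in book_title_tags:
        book_titles.append(tag.text)
    return book_titles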

In [84]:
get_book_titles(doc)

['A Light in the ...',
 'Tipping the Velvet',
 'Soumission',
 'Sharp Objects',
 'Sapiens: A Brief History ...',
 'The Requiem Red',
 'The Dirty Little Secrets ...',
 'The Coming Woman: A ...',
 'The Boys in the ...',
 'The Black Maria',
 'Starving Hearts (Triangular Trade ...',
 "Shakespeare's Sonnets",
 'Set Me Free',
 "Scott Pilgrim's Precious Little ...",
 'Rip it Up and ...',
 'Our Band Could Be ...',
 'Olio',
 'Mesaerion: The Best Science ...',
 'Libertarianism for Beginners',
 "It's Only the Himalayas"]

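The titles above are cut short ("A Light in the ...") because the visible link text in each <h3> is abbreviated. On this site the full title appears to be stored in the link's title attribute; treat that as an assumption worth confirming in the page source. A minimal sketch, using a hypothetical one-card HTML sample and a hypothetical helper name, get_full_book_titles:

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on one book card; not copied from the live site.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
         title="A Light in the Attic">A Light in the ...</a></h3>
</article>
"""


def get_full_book_titles(doc):
    """Return the title attribute of each <h3> link, falling back to its text."""
    titles = []
    for h3 in doc.find_all('h3'):
        link = h3.find('a')
        if link is None:
            titles.append(h3.text.strip())
        else:
            titles.append(link.get('title', link.text.strip()))
    return titles


# A separate name so the notebook's own `doc` is not overwritten.
sample_doc = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(get_full_book_titles(sample_doc))  # ['A Light in the Attic']
```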
# Fetching Book Price

In [85]:
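def get_Book_price(doc):
    # Prices live in <p class="price_color">. The stray 'Â' most likely comes
    # from the UTF-8 '£' being decoded as Latin-1, so it is stripped here.
    price_title_tags = doc.find_all('p', class_='price_color')
    book_price = []
    for price in price_title_tags:
        book_price.append(price.text.replace('Â', ''))
    return book_price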

In [86]:
get_Book_price(doc)

['£51.77',
 '£53.74',
 '£50.10',
 '£47.82',
 '£54.23',
 '£22.65',
 '£33.34',
 '£17.93',
 '£22.60',
 '£52.15',
 '£13.99',
 '£20.66',
 '£17.46',
 '£52.29',
 '£35.02',
 '£57.25',
 '£23.88',
 '£37.59',
 '£51.33',
 '£45.17']
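The 'Â' that get_Book_price strips most likely comes from decoding: '£' is two bytes in UTF-8 (0xC2 0xA3), and if requests falls back to Latin-1 (ISO-8859-1) because the server's Content-Type header omits a charset, those two bytes become 'Â£'. The sketch below reproduces that and then turns the cleaned string into a number; parse_price is a hypothetical helper, and Decimal is chosen over float so currency values stay exact.

```python
from decimal import Decimal

# Reproduce the mojibake: UTF-8 bytes for '£' read back as Latin-1.
garbled = '£51.77'.encode('utf-8').decode('latin-1')
print(garbled)  # Â£51.77


def parse_price(text):
    """Turn a scraped price such as '£51.77' (or 'Â£51.77') into a Decimal."""
    cleaned = text.replace('Â', '').strip().lstrip('£')
    return Decimal(cleaned)


print(parse_price(garbled))  # 51.77
```

Fixing the decoding at the source is an alternative: set response.encoding = 'utf-8' before reading response.text, or pass response.content (bytes) to BeautifulSoup so it can use the charset declared in the page's <meta> tag.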

# Book Availability

In [87]:
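def get_book_availability(doc):
    # A single class_ string must equal the whole class attribute
    # ("instock availability"); .strip() trims the surrounding whitespace.
    book_availability_tags = doc.find_all('p', class_='instock availability')
    book_available = []
    for avail in book_availability_tags:
        book_available.append(avail.text.strip())
    return book_available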


In [88]:
get_book_availability(doc)

['In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock',
 'In stock']
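For analysis, a yes/no flag is often handier than the raw text. The sketch below assumes the listing shows either 'In stock' or a label such as 'Out of stock' (that second label is hypothetical; every book in the output above is in stock). in_stock is a hypothetical helper name. Separately, class_='instock availability' only matches that exact class string, whereas doc.select('p.availability') would match any <p> whose classes include availability.

```python
def in_stock(text):
    """Map availability text to a boolean.

    Whitespace is collapsed first because the raw tag text can contain
    newlines and indentation around the words.
    """
    normalized = ' '.join(text.split()).lower()
    return normalized.startswith('in stock')


labels = ['In stock', '\n\n    In stock\n', 'Out of stock']
print([in_stock(label) for label in labels])  # [True, True, False]
```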

# Fetching Book Rating

In [89]:
def get_book_rating(doc):
    book_rating = []
    for book in doc.find_all('article', class_='product_pod'):
        # The rating is the second class on the tag, e.g. class="star-rating Three".
        rating = book.find('p', class_='star-rating')['class'][1]
        book_rating.append(rating)
    return book_rating

In [90]:
get_book_rating(doc)

['Three',
 'One',
 'One',
 'Four',
 'Five',
 'One',
 'Four',
 'Three',
 'Four',
 'One',
 'Two',
 'Four',
 'Five',
 'Five',
 'Five',
 'Three',
 'One',
 'One',
 'Two',
 'Two']
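The rating arrives as a class-name word. Mapping it to an integer makes sorting and averaging possible; RATING_WORDS and rating_to_int are hypothetical names.

```python
# Lookup from the class-name word to a number of stars.
RATING_WORDS = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}


def rating_to_int(word):
    """Convert a rating word such as 'Three' to 3; unknown words give None."""
    return RATING_WORDS.get(word)


print([rating_to_int(w) for w in ['Three', 'One', 'Five']])  # [3, 1, 5]
```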

# Book URL

In [91]:
def get_book_url(doc):
    book_url_tags = []
    for book in doc.find_all('article', class_='product_pod'):
        # The <h3> link points at the book's detail page (a relative URL).
        product_page_url = book.h3.a.get('href')
        book_url_tags.append(product_page_url)
    return book_url_tags


In [92]:
get_book_url(doc)

['catalogue/a-light-in-the-attic_1000/index.html',
 'catalogue/tipping-the-velvet_999/index.html',
 'catalogue/soumission_998/index.html',
 'catalogue/sharp-objects_997/index.html',
 'catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
 'catalogue/the-requiem-red_995/index.html',
 'catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/index.html',
 'catalogue/the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html',
 'catalogue/the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html',
 'catalogue/the-black-maria_991/index.html',
 'catalogue/starving-hearts-triangular-trade-trilogy-1_990/index.html',
 'catalogue/shakespeares-sonnets_989/index.html',
 'catalogue/set-me-free_988/index.html',
 'catalogue/scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html',
 'catalogue/rip-it-up-and-start-again_986/index.html',
 'catalogue/our-band-could-be-your-life-scene
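These hrefs are relative to the page they were scraped from: on the home page they start with catalogue/, while on catalogue/page-N.html pages they are relative to the catalogue/ directory (compare the BOOKURL column in the multi-page output further down). urllib.parse.urljoin resolves both forms to the same absolute URL; absolutize is a hypothetical helper.

```python
from urllib.parse import urljoin


def absolutize(page_url, hrefs):
    """Resolve relative hrefs against the URL of the page they came from."""
    return [urljoin(page_url, href) for href in hrefs]


home = 'https://books.toscrape.com/'
page1 = 'https://books.toscrape.com/catalogue/page-1.html'

# Both calls print the same absolute detail-page URL.
print(absolutize(home, ['catalogue/a-light-in-the-attic_1000/index.html']))
print(absolutize(page1, ['a-light-in-the-attic_1000/index.html']))
```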

# Book Image URL

In [93]:
def get_image_url(doc):
    image_url_tags = []
    for book in doc.find_all('article', class_='product_pod'):
        # The first <img> in each card is the cover thumbnail; src is relative.
        image_url = book.img.get('src')
        image_url_tags.append(image_url)
    return image_url_tags



In [94]:
get_image_url(doc)

['media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg',
 'media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg',
 'media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg',
 'media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg',
 'media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg',
 'media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg',
 'media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg',
 'media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg',
 'media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg',
 'media/cache/58/46/5846057e28022268153beff6d352b06c.jpg',
 'media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg',
 'media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg',
 'media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg',
 'media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg',
 'media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg',
 'media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg',
 'media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg
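Each helper above builds its own list, and the lists are later combined into DataFrame columns by position. That works as long as every card yields exactly one value per field; if one card were missing a tag, the columns would drift out of step. A per-card alternative, sketched below with a hypothetical parse_book_cards helper and a hypothetical one-card sample, keeps each book's fields together and also resolves the ../media/... image path against the page URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical single-card sample modelled on a catalogue page.
CARD_HTML = """
<article class="product_pod">
  <div class="image_container">
    <a href="a-light-in-the-attic_1000/index.html">
      <img src="../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg" alt="A Light in the Attic"></a>
  </div>
  <p class="star-rating Three"></p>
  <h3><a href="a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
  <div class="product_price">
    <p class="price_color">£51.77</p>
    <p class="instock availability">In stock</p>
  </div>
</article>
"""


def parse_book_cards(doc, page_url):
    """Build one dict per book card so all of a book's fields stay aligned."""
    records = []
    for card in doc.find_all('article', class_='product_pod'):
        link = card.h3.a
        records.append({
            'TITLE': link.get('title', link.text),
            'PRICE': card.find('p', class_='price_color').text.replace('Â', ''),
            # class_='availability' matches any <p> whose classes include it.
            'AVAILABILITY': card.find('p', class_='availability').text.strip(),
            'RATING': card.find('p', class_='star-rating')['class'][1],
            'BOOKURL': urljoin(page_url, link['href']),
            'IMAGEURL': urljoin(page_url, card.img['src']),
        })
    return records


page_url = 'https://books.toscrape.com/catalogue/page-1.html'
cards = parse_book_cards(BeautifulSoup(CARD_HTML, 'html.parser'), page_url)
print(cards[0]['IMAGEURL'])
```

The resulting list of dicts can be passed straight to pd.DataFrame(records).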

# Scrape Multiple Pages

In [95]:
import requests
from bs4 import BeautifulSoup
import pandas as pd


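def get_doc(url):
    """Download a page and return it parsed; raise if the request fails."""
    response = requests.get(url, timeout=10)
    # Check the status before parsing so an error page is never scraped.
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(response))
    # Raw bytes let BeautifulSoup honour the page's UTF-8 <meta> charset.
    doc = BeautifulSoup(response.content, 'html.parser')
    return doc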

def scrape_multiple_pages(n):
    """Scrape catalogue pages 1..n and return one DataFrame row per book."""
    URL = 'https://books.toscrape.com/catalogue/page-'
    b_title, b_price, b_stock, b_rating, b_url, b_imgurl = [], [], [], [], [], []
    for page in range(1, n + 1):
        doc = get_doc(URL + str(page) + '.html')
        # Each helper returns one value per book card on the page.
        b_title.extend(get_book_titles(doc))
        b_price.extend(get_Book_price(doc))
        b_stock.extend(get_book_availability(doc))
        b_rating.extend(get_book_rating(doc))
        b_url.extend(get_book_url(doc))
        b_imgurl.extend(get_image_url(doc))

    book_dict1 = {
        'TITLE': b_title,
        'PRICE': b_price,
        'AVAILABILITY': b_stock,
        'RATING': b_rating,
        'BOOKURL': b_url,
        'IMAGEURL': b_imgurl,
    }
    return pd.DataFrame(book_dict1)



In [98]:
# The catalogue has 50 pages, so requesting page 51 returns a 404.
scrape_multiple_pages(51)

Exception: Failed to load page <Response [404]>
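The 404 for page 51 shows the catalogue ends at page 50. Rather than hard-coding the page count, a loop can stop at the first 404. The sketch keeps networking out of the function by taking two hypothetical callables, fetch_page(url) returning (status_code, html) and parse_page(html) returning a list of records, so it can be exercised with a fake fetcher; max_pages is a safety cap in case the 404 never comes.

```python
def scrape_all_pages(fetch_page, parse_page, max_pages=1000):
    """Walk catalogue pages 1, 2, 3, ... until the site answers 404."""
    base = 'https://books.toscrape.com/catalogue/page-{}.html'
    records = []
    for page in range(1, max_pages + 1):
        status, html = fetch_page(base.format(page))
        if status == 404:  # ran past the last page
            break
        if status != 200:
            raise RuntimeError(f'Unexpected status {status} for page {page}')
        records.extend(parse_page(html))
    return records


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP call: pages 1-3 exist, page 4 is 404."""
    page = int(url.rsplit('-', 1)[1].split('.')[0])
    return (200, f'page {page}') if page <= 3 else (404, '')


result = scrape_all_pages(fake_fetch, lambda html: [html])
print(result)  # ['page 1', 'page 2', 'page 3']
```

With requests, a matching fetcher could be `def fetch(url): r = requests.get(url, timeout=10); return r.status_code, r.content`.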

In [96]:
scrape_multiple_pages(5)

,TITLE,PRICE,AVAILABILITY,RATING,BOOKURL,IMAGEURL
0,A Light in the ...,£51.77,In stock,Three,a-light-in-the-attic_1000/index.html,../media/cache/2c/da/2cdad67c44b002e7ead0cc356...
1,Tipping the Velvet,£53.74,In stock,One,tipping-the-velvet_999/index.html,../media/cache/26/0c/260c6ae16bce31c8f8c95dadd...
2,Soumission,£50.10,In stock,One,soumission_998/index.html,../media/cache/3e/ef/3eef99c9d9adef34639f51066...
3,Sharp Objects,£47.82,In stock,Four,sharp-objects_997/index.html,../media/cache/32/51/3251cf3a3412f53f339e42cac...
4,Sapiens: A Brief History ...,£54.23,In stock,Five,sapiens-a-brief-history-of-humankind_996/index...,../media/cache/be/a5/bea5697f2534a2f86a3ef27b5...
...,...,...,...,...,...,...
95,Lumberjanes Vol. 3: A ...,£19.92,In stock,Two,lumberjanes-vol-3-a-terrible-plan-lumberjanes-...,../media/cache/5f/b1/5fb1bf88dcfda795606745ce3...
96,"Layered: Baking, Building, and ...",£40.11,In stock,One,layered-baking-building-and-styling-spectacula...,../media/cache/98/d1/98d1c979c4bac9e147a671894...
97,Judo: Seven Steps to ...,£53.90,In stock,Two,judo-seven-steps-to-black-belt-an-introductory...,../media/cache/5f/52/5f52b1bc6d45daab2e330c744...
98,Join,£35.67,In stock,Five,join_902/index.html,../media/cache/93/63/9363f0065fbad5689f44fcf6e...


In [97]:
scrape_multiple_pages(50).to_csv("Bookscrape.csv")

In [99]:
scrape_multiple_pages(50).to_json("Bookscrape.json")

In [6]:
import sqlite3
import pandas as pd

df = pd.read_csv("C:\\Users\\D E L L\\Desktop\\shenll Technology\\Bookscrape.csv")
print(df)

     Unnamed: 0                             TITLE   PRICE AVAILABILITY RATING  \
0             0                A Light in the ...  £51.77     In stock  Three   
1             1                Tipping the Velvet  £53.74     In stock    One   
2             2                        Soumission  £50.10     In stock    One   
3             3                     Sharp Objects  £47.82     In stock   Four   
4             4      Sapiens: A Brief History ...  £54.23     In stock   Five   
..          ...                               ...     ...          ...    ...   
995         995  Alice in Wonderland (Alice's ...  £55.53     In stock    One   
996         996    Ajin: Demi-Human, Volume 1 ...  £57.06     In stock   Four   
997         997         A Spy's Devotion (The ...  £16.97     In stock   Five   
998         998           1st to Die (Women's ...  £53.98     In stock    One   
999         999           1,000 Places to See ...  £26.08     In stock   Five   
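The Unnamed: 0 column appears because DataFrame.to_csv writes the index by default and read_csv then treats it as an ordinary column. Saving with index=False (or reading with index_col=0) avoids it. The two export cells above also call scrape_multiple_pages(50) separately, so the site is crawled twice; storing the DataFrame once and exporting it to both formats avoids that. A small sketch with a hypothetical two-row frame:

```python
import io
import json

import pandas as pd

# Hypothetical two-row frame standing in for the scraped DataFrame.
books = pd.DataFrame({'TITLE': ['Soumission', 'Olio'],
                      'PRICE': ['£50.10', '£23.88']})

# Write without the index so no "Unnamed: 0" column appears on re-read.
buffer = io.StringIO()
books.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(list(round_trip.columns))  # ['TITLE', 'PRICE']

# orient='records' gives one JSON object per book; force_ascii=False keeps '£'.
records_json = books.to_json(orient='records', force_ascii=False)
parsed = json.loads(records_json)
print(parsed[0])
```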

                           

In [7]:
conn = sqlite3.connect("BookstoScrape.db")

create_sql = """CREATE TABLE IF NOT EXISTS books (
    TITLE TEXT NOT NULL, PRICE TEXT, AVAILABILITY TEXT,
    RATING TEXT, BOOKURL TEXT, IMAGEURL TEXT);"""

cursor = conn.cursor()
cursor.execute(create_sql)

<sqlite3.Cursor at 0x243fc0ffb40>
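The cell above only creates the table. A sketch of the next step inserts rows with parameterised ? placeholders via executemany and then commits; insert_books is a hypothetical helper, and the demo uses an in-memory database so it does not touch BookstoScrape.db.

```python
import sqlite3


def insert_books(conn, rows):
    """Insert (TITLE, PRICE, AVAILABILITY, RATING, BOOKURL, IMAGEURL) tuples.

    The ? placeholders let sqlite3 quote values itself, so a title such as
    "Shakespeare's Sonnets" needs no manual escaping.
    """
    conn.executemany(
        'INSERT INTO books (TITLE, PRICE, AVAILABILITY, RATING, BOOKURL, IMAGEURL) '
        'VALUES (?, ?, ?, ?, ?, ?)',
        rows,
    )
    conn.commit()


# Demo on an in-memory database that mirrors the books table schema.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE books (TITLE TEXT NOT NULL, PRICE TEXT, AVAILABILITY TEXT, '
             'RATING TEXT, BOOKURL TEXT, IMAGEURL TEXT)')
insert_books(demo, [(
    "Shakespeare's Sonnets", '£20.66', 'In stock', 'Four',
    'catalogue/shakespeares-sonnets_989/index.html',
    'media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg',
)])
stored = demo.execute('SELECT TITLE, RATING FROM books').fetchall()
print(stored)  # [("Shakespeare's Sonnets", 'Four')]
```

With the DataFrame read from the CSV, rows could be supplied as df[['TITLE', 'PRICE', 'AVAILABILITY', 'RATING', 'BOOKURL', 'IMAGEURL']].itertuples(index=False, name=None). pandas' df.to_sql('books', conn, if_exists='append', index=False) is another route, but it writes every column the frame holds, so drop Unnamed: 0 first.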