## Web Scraping for a Bookstore Website

In [23]:
import requests
from bs4 import BeautifulSoup
import pandas as pd


I imported `requests` for making HTTP requests, `BeautifulSoup` from the `bs4` package for parsing HTML, and `pandas` for data manipulation.

In [7]:

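books = []

# Catalogue pages 1 to 19 (the upper bound of range() is exclusive)
for i in range(1, 20):
    url = f"https://books.toscrape.com/catalogue/page-{i}.html"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    # Pass the raw bytes and let BeautifulSoup work out the encoding
    soup = BeautifulSoup(response.content, 'html.parser')
    ol = soup.find('ol')
    articles = ol.find_all('article', class_='product_pod')
    for article in articles:
        title = article.find('img')['alt']  # the cover image's alt text holds the full title
        star = article.find('p', class_='star-rating')['class'][1]  # e.g. "Three"
        price = article.find('p', class_='price_color').text  # e.g. "£51.77"
        price = float(price.lstrip('Â£'))  # drop the currency sign (and a stray "Â" if mis-decoded)
        books.append([title, star, price])

df = pd.DataFrame(books, columns=['Title', 'Star Rating', 'Price'])
df.to_csv('books.csv', index=False)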

I looped through catalogue pages 1 to 19 of books.toscrape.com, extracted each book's title, star rating, and price, and stored them as rows in a list. I then created a pandas DataFrame from that list and saved it as `books.csv`, so the data can be analysed later without scraping the site again.
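The per-book extraction can be checked offline, without hitting the site. The sketch below runs the same lookups against a small, hypothetical snippet modelled on one `<article class="product_pod">` entry (it is not copied from the live page), wrapped in a hypothetical helper `parse_books`. It shows why `['class'][1]` yields the rating word: BeautifulSoup returns the `class` attribute as a list such as `["star-rating", "Three"]`.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on one catalogue entry
SAMPLE_HTML = """
<ol class="row">
  <li>
    <article class="product_pod">
      <div class="image_container">
        <img src="cover.jpg" alt="A Light in the Attic" class="thumbnail">
      </div>
      <p class="star-rating Three"></p>
      <h3><a href="#" title="A Light in the Attic">A Light in the ...</a></h3>
      <div class="product_price">
        <p class="price_color">£51.77</p>
      </div>
    </article>
  </li>
</ol>
"""


def parse_books(html):
    """Hypothetical helper: return [title, rating_word, price] rows from one page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article", class_="product_pod"):
        title = article.find("img")["alt"]
        # class_ matches any one of the element's classes; ["class"] is a list
        star = article.find("p", class_="star-rating")["class"][1]
        price_text = article.find("p", class_="price_color").get_text(strip=True)
        # Remove the pound sign (and a stray "Â" if the bytes were mis-decoded)
        price = float(price_text.lstrip("Â£"))
        rows.append([title, star, price])
    return rows


print(parse_books(SAMPLE_HTML))
```

Keeping the parsing in a function like this makes it easy to test against saved HTML before running the network loop.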

In [8]:
print(df)

                                                 Title Star Rating  Price
0                                 A Light in the Attic       Three  51.77
1                                   Tipping the Velvet         One  53.74
2                                           Soumission         One  50.10
3                                        Sharp Objects        Four  47.82
4                Sapiens: A Brief History of Humankind        Five  54.23
5                                      The Requiem Red         One  22.65
6    The Dirty Little Secrets of Getting Your Dream...        Four  33.34
7    The Coming Woman: A Novel Based on the Life of...       Three  17.93
8    The Boys in the Boat: Nine Americans and Their...        Four  22.60
9                                      The Black Maria         One  52.15
10      Starving Hearts (Triangular Trade Trilogy, #1)         Two  13.99
11                               Shakespeare's Sonnets        Four  20.66
12                                    

In [9]:

pd.set_option('display.max_rows', None)
print(df)


                                                 Title Star Rating  Price
0                                 A Light in the Attic       Three  51.77
1                                   Tipping the Velvet         One  53.74
2                                           Soumission         One  50.10
3                                        Sharp Objects        Four  47.82
4                Sapiens: A Brief History of Humankind        Five  54.23
5                                      The Requiem Red         One  22.65
6    The Dirty Little Secrets of Getting Your Dream...        Four  33.34
7    The Coming Woman: A Novel Based on the Life of...       Three  17.93
8    The Boys in the Boat: Nine Americans and Their...        Four  22.60
9                                      The Black Maria         One  52.15
10      Starving Hearts (Triangular Trade Trilogy, #1)         Two  13.99
11                               Shakespeare's Sonnets        Four  20.66
12                                    

## Transformation of Data

In [10]:
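# Convert the rating words ("One" to "Five") into integers 1 to 5
df['Star Rating'] = df['Star Rating'].map({'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5})
# Price is already a float from the scraping step; this just guarantees a numeric dtype
df['Price'] = pd.to_numeric(df['Price'])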
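One behaviour worth knowing when mapping with a dictionary: `Series.map` returns `NaN` for any value missing from the dict, and `pd.to_numeric` raises on unparseable strings unless `errors="coerce"` is passed. The sketch below uses hypothetical rows, with `"Zero"` and `"n/a"` standing in for unexpected values, to show both cases.

```python
import pandas as pd

RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

# Hypothetical rows; "Zero" and "n/a" are deliberately unexpected values
sample = pd.DataFrame(
    {"Star Rating": ["Three", "One", "Zero"], "Price": ["51.77", "53.74", "n/a"]}
)

# Keys missing from the dict become NaN, so the column turns into float64
sample["Star Rating"] = sample["Star Rating"].map(RATING_WORDS)
# errors="coerce" turns unparseable strings into NaN instead of raising
sample["Price"] = pd.to_numeric(sample["Price"], errors="coerce")

print(sample)
print(sample.isna().sum().to_dict())
```

Counting `NaN`s after the conversion is a quick check that no rating word or price slipped through unconverted.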


In [11]:
print(df)

                                                 Title  Star Rating  Price
0                                 A Light in the Attic            3  51.77
1                                   Tipping the Velvet            1  53.74
2                                           Soumission            1  50.10
3                                        Sharp Objects            4  47.82
4                Sapiens: A Brief History of Humankind            5  54.23
5                                      The Requiem Red            1  22.65
6    The Dirty Little Secrets of Getting Your Dream...            4  33.34
7    The Coming Woman: A Novel Based on the Life of...            3  17.93
8    The Boys in the Boat: Nine Americans and Their...            4  22.60
9                                      The Black Maria            1  52.15
10      Starving Hearts (Triangular Trade Trilogy, #1)            2  13.99
11                               Shakespeare's Sonnets            4  20.66
12                       

In [13]:
expensive_books = df[df['Price'] > 50]  # keep only books priced above £50



In [14]:
print(expensive_books)

                                                 Title  Star Rating  Price
0                                 A Light in the Attic            3  51.77
1                                   Tipping the Velvet            1  53.74
2                                           Soumission            1  50.10
4                Sapiens: A Brief History of Humankind            5  54.23
9                                      The Black Maria            1  52.15
13   Scott Pilgrim's Precious Little Life (Scott Pi...            5  52.29
15   Our Band Could Be Your Life: Scenes from the A...            3  57.25
18                        Libertarianism for Beginners            2  51.33
25                       Birdsong: A Story in Pictures            3  54.64
27                      Aladdin and His Wonderful Lamp            3  53.13
38                                 Behind Closed Doors            4  52.22
40                      Slow States of Collapse: Poems            3  57.31
46                       
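The filter above uses a single condition. Conditions can be combined with `&` (and) or `|` (or), but each comparison needs its own parentheses because these operators bind more tightly than `>`. The sketch below uses four rows copied from the output above; `DataFrame.query` expresses the same filter, with backticks around the column name that contains a space.

```python
import pandas as pd

# A few rows copied from the printed DataFrame above
books = pd.DataFrame(
    {
        "Title": [
            "A Light in the Attic",
            "Tipping the Velvet",
            "Sapiens: A Brief History of Humankind",
            "The Requiem Red",
        ],
        "Star Rating": [3, 1, 5, 1],
        "Price": [51.77, 53.74, 54.23, 22.65],
    }
)

# Parentheses are required around each comparison when combining with &
pricey_and_well_rated = books[(books["Price"] > 50) & (books["Star Rating"] >= 4)]

# Equivalent filter as a query string
same_result = books.query("Price > 50 and `Star Rating` >= 4")

print(pricey_and_well_rated)
```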

In [20]:
sorted_df = df.sort_values(by='Price', ascending=False)  # most expensive first
print(sorted_df)

                                                 Title  Star Rating  Price
366                          The Diary of a Young Girl            3  59.90
133  Thomas Jefferson and the Tripoli Pirates: The ...            1  59.64
362  The Gray Rhino: How to Recognize and Act on th...            4  59.15
301                           Hamilton: The Revolution            3  58.79
379  How to Speak Golf: An Illustrated Guide to Lin...            5  58.32
337  Aristotle and Dante Discover the Secrets of th...            4  58.14
68        The Death of Humanity: and the Case for Life            4  58.11
135  The White Cat and the Monk: A Retelling of the...            4  58.08
309                                           El Deafo            5  57.62
100  Immunity: How Elie Metchnikoff Changed the Cou...            5  57.36
186  I Had a Nice Time And Other Lies...: How to fi...            4  57.36
40                      Slow States of Collapse: Poems            3  57.31
15   Our Band Could Be Yo
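Several books in the sorted output share a price (two at 57.36, for example), and their relative order is then not controlled by the price alone. A sketch with hypothetical titles passes a list of columns to `sort_values` so that ties on price are broken by star rating, and uses `reset_index(drop=True)` to renumber the rows after sorting.

```python
import pandas as pd

# Hypothetical rows; two books share a price to show tie-breaking
books = pd.DataFrame(
    {
        "Title": ["Book A", "Book B", "Book C"],
        "Star Rating": [4, 5, 3],
        "Price": [57.36, 57.36, 59.90],
    }
)

# Sort by price (high to low), then by rating (high to low) within equal prices
ranked = books.sort_values(by=["Price", "Star Rating"], ascending=[False, False])

# Drop the old index so rows are numbered 0..n-1 in sorted order
print(ranked.reset_index(drop=True))
```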

In [22]:
avg_price_by_rating = df.groupby('Star Rating')['Price'].mean()
print(avg_price_by_rating)

Star Rating
1    34.876585
2    33.656329
3    34.431600
4    36.850145
5    34.315067
Name: Price, dtype: float64
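The mean alone hides how many books fall into each rating group. A sketch using pandas named aggregation produces the count, mean, and maximum price per rating in one table; the six rows are the first six from the printed DataFrame above, so the numbers will differ from the full dataset.

```python
import pandas as pd

# First six rows of the scraped data shown above
books = pd.DataFrame(
    {
        "Star Rating": [3, 1, 1, 4, 5, 1],
        "Price": [51.77, 53.74, 50.10, 47.82, 54.23, 22.65],
    }
)

# Named aggregation: output_column=(source_column, function)
summary = books.groupby("Star Rating").agg(
    count=("Price", "count"),
    mean_price=("Price", "mean"),
    max_price=("Price", "max"),
)

print(summary.round(2))
```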


In [16]:
costliest_books = df.nlargest(1, 'Price')  # increase 1 to list the top n most expensive books
print(costliest_books)


                         Title  Star Rating  Price
366  The Diary of a Young Girl            3   59.9
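`nlargest` defaults to `keep="first"`, so when several books tie at the cutoff price only the first of them is returned. Since the sorted output above already contains tied prices, a sketch with hypothetical titles compares the default with `keep="all"`, which keeps every tied row even if that returns more than `n` rows.

```python
import pandas as pd

# Hypothetical rows with a tie at the second-highest price
books = pd.DataFrame(
    {
        "Title": ["Book A", "Book B", "Book C", "Book D"],
        "Price": [59.90, 57.36, 57.36, 40.00],
    }
)

top_two = books.nlargest(2, "Price")  # keep="first": exactly 2 rows
top_two_with_ties = books.nlargest(2, "Price", keep="all")  # tied rows at the cutoff are kept

print(top_two)
print(top_two_with_ties)
```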
