In order to perform web scraping, the first step is to install the **requests** library, which is used for making HTTP requests in Python, and the **beautifulsoup4** library, which is used for parsing HTML and XML documents. BeautifulSoup is used to scrape and extract data from web pages by parsing their HTML.

In [8]:
!pip install requests==2.32.3
!pip install beautifulsoup4



Now we import the libraries: 'requests', 'BeautifulSoup' (from the 'bs4' package), and 'urlopen', which is used to get the raw HTML data.

In [3]:
import requests
from bs4 import BeautifulSoup as bs
from urllib.request import urlopen

Define a variable named 'nyu_times_url' and assign it the link of the website that we are going to scrape.

In [5]:
nyu_times_url = "https://www.nytimes.com/books/best-sellers/combined-print-and-e-book-fiction/" 
nyu_times_url

'https://www.nytimes.com/books/best-sellers/combined-print-and-e-book-fiction/'

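Now 'urlopen' opens a connection to the URL stored in the 'nyu_times_url' variable, and the response object it returns is assigned to the variable 'urlclient'. Calling 'read' on that response returns the raw HTML of the page, which is stored in 'nyu_times_page'.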

In [6]:
urlclient = urlopen(nyu_times_url)

In [7]:
nyu_times_page = urlclient.read()
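
Some sites respond differently to clients that do not send a browser-like User-Agent header. As a hedged sketch, a 'Request' object with explicit headers can be passed to 'urlopen' in place of the bare URL. The helper names 'build_request' and 'fetch_html' and the User-Agent string below are illustrative assumptions, not part of the original notebook.

```python
from urllib.request import Request, urlopen

# Hypothetical User-Agent string, chosen only for illustration.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-scraper/0.1)"

def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request carrying an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})

def fetch_html(url, timeout=10):
    """Open the URL and return the raw response body as bytes."""
    # The 'with' block closes the connection once the body has been read.
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()

# Building the request does not touch the network; only fetch_html does.
request = build_request(
    "https://www.nytimes.com/books/best-sellers/combined-print-and-e-book-fiction/"
)
print(request.full_url)
```

Calling 'fetch_html(nyu_times_url)' would then play the role of the 'urlopen' and 'read' cells above, with a timeout so that a stalled connection does not hang the notebook.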

Now we initialize the BeautifulSoup object, using 'html.parser' to parse the HTML web page.

In [8]:
nyu_times_html = bs(nyu_times_page, 'html.parser')

This code uses BeautifulSoup's 'find_all' function to search the parsed HTML for all '\<li>' elements with the target class 'css-1m0jikr'.
It returns a list of these elements, one for each fiction book on that New York Times Best Sellers web page.
The result is stored in the 'fiction_books' variable.

In [14]:
fiction_books = nyu_times_html.find_all("li", {"class":"css-1m0jikr"})
fiction_books


[<li class="css-1m0jikr" id="QmVzdFNlbGxlckJvb2s6MTY2ODAyNjI1Mi05NzgxNjY4MDI2MjUw"><article class="css-1u6k25n" itemprop="itemListElement" itemscope="" itemtype="https://schema.org/Book"><div class="css-xe4cfy"><a><p class="css-1o26r9v">New this week</p><h3 class="css-5pe77f" itemprop="name">DAYDREAM</h3><p class="css-hjukut" itemprop="author">by Hannah Grace</p><p class="css-heg334" itemprop="publisher">Atria</p><p class="css-14lubdp" itemprop="description">The third book in the Maple Hills series. A college student with writer’s block offers to tutor the captain of the hockey team.</p></a><div class="css-1cpfh5o"><div class="css-fugswm"><div class="css-79elbk"><button aria-expanded="false" aria-haspopup="true" class="css-80zux2" type="button">Buy<span aria-hidden="true"> ▾</span></button><ul aria-label="Links to Book Retailers" class="css-8dud8s" hidden=""><li><a class="css-114t425" href="https://www.amazon.com/dp/1668026252?tag=thenewyorktim-20" rel="noopener noreferrer" target="_bl
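
Class names such as 'css-1m0jikr' look auto-generated and may change when the site is rebuilt, while the output above also carries 'itemprop' and 'itemtype' attributes. The following is a minimal sketch of selecting on those attributes instead. The HTML string is a hand-written sample modeled on the output above and trimmed for brevity, not the live page.

```python
from bs4 import BeautifulSoup

# Hand-written sample modeled on the markup shown above (trimmed for brevity).
sample_html = """
<ol>
  <li class="css-1m0jikr"><article itemtype="https://schema.org/Book">
    <h3 itemprop="name">DAYDREAM</h3>
    <p itemprop="author">by Hannah Grace</p>
  </article></li>
  <li class="css-1m0jikr"><article itemtype="https://schema.org/Book">
    <h3 itemprop="name">IT ENDS WITH US</h3>
    <p itemprop="author">by Colleen Hoover</p>
  </article></li>
</ol>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Match on the schema.org attributes instead of the CSS class names.
articles = soup.find_all("article", attrs={"itemtype": "https://schema.org/Book"})
titles = [a.find(attrs={"itemprop": "name"}).get_text(strip=True) for a in articles]
print(titles)
```

Whether the attribute-based selectors are more stable than the class names on the real page is an assumption worth checking against the current markup.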

Then, we need to extract the book name, author name, publisher name, and description from each '\<li>' element in 'fiction_books'.
The 'find' function of BeautifulSoup is used to get the specific HTML tag with the designated class, and each extracted text is appended to its corresponding list.


In [23]:
book_names = []
author_names = []
publisher_names = []
description_details = []

for fiction in fiction_books:
    book_name = fiction.find("h3", {"class": "css-5pe77f"}).text
    author_name = fiction.find("p",{"class":"css-hjukut"}).text
    publisher_name = fiction.find("p",{"class":"css-heg334"}).text
    description_detail = fiction.find("p",{"class":"css-14lubdp"}).text
    book_names.append(book_name)
    author_names.append(author_name)
    publisher_names.append(publisher_name)
    description_details.append(description_detail)

print(book_names)
print(author_names)
print(publisher_names)
print(description_details)

['DAYDREAM', 'IT ENDS WITH US', 'IT STARTS WITH US', 'THE WOMEN', 'A COURT OF THORNS AND ROSES', 'THE HOUSEMAID', 'IRON FLAME', 'DEMON COPPERHEAD', 'THE DARK WIVES', 'BY ANY OTHER NAME', 'THE GOD OF THE WOODS', 'A COURT OF MIST AND FURY', 'THE WEDDING PEOPLE', 'THE HOUSEMAID IS WATCHING', "THE HOUSEMAID'S SECRET"]
['by Hannah Grace', 'by Colleen Hoover', 'by Colleen Hoover', 'by Kristin Hannah', 'by Sarah J. Maas', 'by Freida McFadden', 'by Rebecca Yarros', 'by Barbara Kingsolver', 'by Ann Cleeves', 'by Jodi Picoult', 'by Liz Moore', 'by Sarah J. Maas', 'by Alison Espach', 'by Freida McFadden', 'by Freida McFadden']
['Atria', 'Atria', 'Atria', "St. Martin's", 'Bloomsbury', 'Grand Central', 'Red Tower', 'Harper Perennial', 'Minotaur', 'Ballantine', 'Riverhead', 'Bloomsbury', 'Holt', 'Poisoned Pen', 'Mobius']
['The third book in the Maple Hills series. A college student with writer’s block offers to tutor the captain of the hockey team.', 'A battered wife raised in a violent home attempt
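
'find' returns 'None' when no matching tag exists, so calling '.text' on the result raises an 'AttributeError' if any entry lacks one of the fields. A hedged sketch of a more defensive extraction follows. The helper 'text_or_default', the 'book' class, and the sample HTML are hypothetical and exist only for illustration.

```python
from bs4 import BeautifulSoup

def text_or_default(parent, tag, class_name, default=""):
    """Return the stripped text of the first matching child, or a default."""
    node = parent.find(tag, {"class": class_name})
    return node.get_text(strip=True) if node is not None else default

# Hand-written sample: the second entry deliberately lacks a publisher tag.
sample_html = """
<ul>
  <li class="book"><h3 class="css-5pe77f">DAYDREAM</h3><p class="css-heg334">Atria</p></li>
  <li class="book"><h3 class="css-5pe77f">THE WOMEN</h3></li>
</ul>
"""

items = BeautifulSoup(sample_html, "html.parser").find_all("li", {"class": "book"})
publishers = [text_or_default(li, "p", "css-heg334", default="N/A") for li in items]
print(publishers)
```

Using a placeholder keeps all four lists the same length, which matters later when they are combined into a single table.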

In [26]:
modified_author_names = [i.replace('by ','') for i in author_names]
modified_author_names

['Hannah Grace',
 'Colleen Hoover',
 'Colleen Hoover',
 'Kristin Hannah',
 'Sarah J. Maas',
 'Freida McFadden',
 'Rebecca Yarros',
 'Barbara Kingsolver',
 'Ann Cleeves',
 'Jodi Picoult',
 'Liz Moore',
 'Sarah J. Maas',
 'Alison Espach',
 'Freida McFadden',
 'Freida McFadden']
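
Note that 'replace' removes every occurrence of 'by ' in the string, not only the leading one. It works for the names above, but a name that happens to contain 'by ' would be damaged. A minimal sketch of a stricter alternative uses 'str.removeprefix' (available from Python 3.9). The helper name 'strip_by_prefix' and the second sample name are made up for illustration.

```python
def strip_by_prefix(author):
    """Remove a leading 'by ' only; leave the rest of the name untouched."""
    return author.removeprefix("by ")

# The second name is invented to show the difference between the two approaches.
sample_authors = ["by Hannah Grace", "by Abby Smith"]

replaced = [a.replace("by ", "") for a in sample_authors]
stripped = [strip_by_prefix(a) for a in sample_authors]

print(replaced)  # the inner 'by ' of 'Abby Smith' is removed as well
print(stripped)  # only the leading 'by ' is removed
```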

Now these lists are combined into a dictionary so that they can be converted into a DataFrame.

In [28]:
fiction = {'Book Name': book_names, 'Author': modified_author_names, 'Publisher': publisher_names, 'Description': description_details}
fiction

{'Book Name': ['DAYDREAM',
  'IT ENDS WITH US',
  'IT STARTS WITH US',
  'THE WOMEN',
  'A COURT OF THORNS AND ROSES',
  'THE HOUSEMAID',
  'IRON FLAME',
  'DEMON COPPERHEAD',
  'THE DARK WIVES',
  'BY ANY OTHER NAME',
  'THE GOD OF THE WOODS',
  'A COURT OF MIST AND FURY',
  'THE WEDDING PEOPLE',
  'THE HOUSEMAID IS WATCHING',
  "THE HOUSEMAID'S SECRET"],
 'Author': ['Hannah Grace',
  'Colleen Hoover',
  'Colleen Hoover',
  'Kristin Hannah',
  'Sarah J. Maas',
  'Freida McFadden',
  'Rebecca Yarros',
  'Barbara Kingsolver',
  'Ann Cleeves',
  'Jodi Picoult',
  'Liz Moore',
  'Sarah J. Maas',
  'Alison Espach',
  'Freida McFadden',
  'Freida McFadden'],
 'Publisher': ['Atria',
  'Atria',
  'Atria',
  "St. Martin's",
  'Bloomsbury',
  'Grand Central',
  'Red Tower',
  'Harper Perennial',
  'Minotaur',
  'Ballantine',
  'Riverhead',
  'Bloomsbury',
  'Holt',
  'Poisoned Pen',
  'Mobius'],
 'Description': ['The third book in the Maple Hills series. A college student with writer’s block of
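
Building a DataFrame from a dictionary of lists requires every list to have the same length; pandas raises a 'ValueError' otherwise. As a sketch, a small check before the conversion can report which column is short. The function name 'check_equal_lengths' and the two-row sample are assumptions for illustration.

```python
def check_equal_lengths(columns):
    """Raise ValueError if the lists in a column dictionary differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return columns

# Two-row sample taken from the lists above.
sample = {
    "Book Name": ["DAYDREAM", "THE WOMEN"],
    "Author": ["Hannah Grace", "Kristin Hannah"],
}
checked = check_equal_lengths(sample)
print({name: len(values) for name, values in checked.items()})
```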

Converting the dictionary to a DataFrame and then saving the DataFrame to a '.csv' file.

In [30]:
!pip install pandas
import pandas as pd
df_fiction = pd.DataFrame(fiction)
df_fiction

,Book Name,Author,Publisher,Description
0,DAYDREAM,Hannah Grace,Atria,The third book in the Maple Hills series. A co...
1,IT ENDS WITH US,Colleen Hoover,Atria,A battered wife raised in a violent home attem...
2,IT STARTS WITH US,Colleen Hoover,Atria,"In the sequel to “It Ends With Us,” Lily deals..."
3,THE WOMEN,Kristin Hannah,St. Martin's,"In 1965, a nursing student follows her brother..."
4,A COURT OF THORNS AND ROSES,Sarah J. Maas,Bloomsbury,"After killing a wolf in the woods, Feyre is ta..."
5,THE HOUSEMAID,Freida McFadden,Grand Central,Troubles surface when a woman looking to make ...
6,IRON FLAME,Rebecca Yarros,Red Tower,The second book in the Empyrean series. Violet...
7,DEMON COPPERHEAD,Barbara Kingsolver,Harper Perennial,Winner of a 2023 Pulitzer Prize for fiction. A...
8,THE DARK WIVES,Ann Cleeves,Minotaur,The 11th book in the Vera Stanhope series. Ver...
9,BY ANY OTHER NAME,Jodi Picoult,Ballantine,A young woman’s play about her ancestor Emilia...


In [31]:
df_fiction.to_csv('Fiction_books.csv', index=False)

In [34]:
fiction_data = pd.read_csv("Fiction_books.csv")
fiction_data
                

,Book Name,Author,Publisher,Description
0,DAYDREAM,Hannah Grace,Atria,The third book in the Maple Hills series. A co...
1,IT ENDS WITH US,Colleen Hoover,Atria,A battered wife raised in a violent home attem...
2,IT STARTS WITH US,Colleen Hoover,Atria,"In the sequel to “It Ends With Us,” Lily deals..."
3,THE WOMEN,Kristin Hannah,St. Martin's,"In 1965, a nursing student follows her brother..."
4,A COURT OF THORNS AND ROSES,Sarah J. Maas,Bloomsbury,"After killing a wolf in the woods, Feyre is ta..."
5,THE HOUSEMAID,Freida McFadden,Grand Central,Troubles surface when a woman looking to make ...
6,IRON FLAME,Rebecca Yarros,Red Tower,The second book in the Empyrean series. Violet...
7,DEMON COPPERHEAD,Barbara Kingsolver,Harper Perennial,Winner of a 2023 Pulitzer Prize for fiction. A...
8,THE DARK WIVES,Ann Cleeves,Minotaur,The 11th book in the Vera Stanhope series. Ver...
9,BY ANY OTHER NAME,Jodi Picoult,Ballantine,A young woman’s play about her ancestor Emilia...
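
The 'index=False' argument in 'to_csv' keeps the row index out of the file. The sketch below illustrates the difference by writing a tiny two-row frame to an in-memory buffer and reading it back. The helper 'round_trip' and the sample frame are assumptions for illustration; an 'io.StringIO' buffer stands in for the file on disk.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"Book Name": ["DAYDREAM", "THE WOMEN"], "Publisher": ["Atria", "St. Martin's"]}
)

def round_trip(frame, index):
    """Write the frame as CSV to an in-memory buffer and read it back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, index=index)
    buffer.seek(0)
    return pd.read_csv(buffer)

without_index = round_trip(df, index=False)
with_index = round_trip(df, index=True)

print(list(without_index.columns))
print(list(with_index.columns))  # includes an extra 'Unnamed: 0' column
```

With 'index=True' the old row index is written as an unnamed first column, which 'read_csv' then loads as 'Unnamed: 0'.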
