# <center>Introduction to Web Scraping with Python </center>

Lead Club Data Science and AI: Christophe **HOUNWANOU**

<center > <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.6UIaApn54TOkhOQ607z-cwHaBx%26pid%3DApi&f=1&ipt=e942967549d97d5237e8669e8dbe8efece6c46a6dac545ca6d81f3ea96b93609&ipo=images" /></center>

## Why Python for Web Scraping?

Python is a versatile programming language that can be used for a wide range of tasks, including web scraping. It has a large and active community that has developed a wide range of libraries and frameworks for web scraping, such as Beautiful Soup and Scrapy. These libraries make it easy to extract data from websites, even if you have little to no experience with programming.

Python also has powerful built-in tools that are useful for web scraping, such as regular expressions and the requests library. Regular expressions are a way to search for patterns in text, and they can be used to extract specific data from websites. The requests library is a way to send HTTP requests in Python, and it is used to access the HTML code of a website.



## Introdcution to Web Scraping with Python

Web scraping refers to the process of extracting data from websites. This technique allows us to gather information from various sources on the internet and use it for analysis, research, or other applications. Python, with its rich ecosystem of libraries, is a popular choice for web scraping due to its simplicity and flexibility.

In this tutorial, we will guide you through the process of creating a web scraping project from scratch using Python. We will cover the necessary Python libraries, setting up the project environment, foundational steps, advanced functionalities, and best practices related to web scraping with Python.

### Prerequisites

Before we begin, make sure you have Python installed on your system. Additionally, we will be using the following libraries, so ensure they are installed as well:

- requests
- beautifulsoup4
- lxml

You can install these libraries using pip:

```python
pip install requests beautifulsoup4 lxml
```

Now that we have the prerequisites in place, let’s dive into the steps for creating a web scraping project using Python.

<img src = "https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.TIQ6uAElt4xpseu4vtJHxQHaEq%26pid%3DApi&f=1&ipt=d810af30fdcb6fd1adf6da5282d3a821c16d53c61c9e42bfa2b764644630b3cf&ipo=images" />


### Step 1: Making a Request to a Website

In this step, we will use the requests library to make a GET request to a website and retrieve its HTML content. This will serve as the basis for our web scraping.

<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse4.mm.bing.net%2Fth%3Fid%3DOIP.dFmTpTvN6HaWwPgyISNuIAHaC2%26pid%3DApi&f=1&ipt=71ba1109ccd1ed258b894d104aba43e1fd7d0d72b4119722da4cd62665aaa18f&ipo=images" />

```python
import requests
url = "http://www.pdfdrive.com/category/1"

response = requests.get(url)

if response.status_code == 200:
    html_content = response.content
    # Now we have the HTML content of the website
else:
    print('Failed to retrieve the page')

```

In [1]:
import requests
url = "http://www.pdfdrive.com/category/1"

response = requests.get(url)
if response.status_code == 200:
    html_content = response.content
    print(html_content)
    # Now we have the HTML content of the website
else:
    print('Failed to retrieve the page')

b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>\n<meta content="width=device-width,initial-scale=1.0,minimum-scale=1.0,maximum-scale=1.0,user-scalable=no,target-densitydpi=160" name="viewport"/>\n<meta name="google-site-verification" content="CEhb1NNHLQ4klcuczJvERcX-C7xpyXPcwAB29LLiQgY"/>\n<title>Free Art Books - PDF Drive</title>\n<link rel="dns-prefetch" href="//cdn.pdfdrive.com"/>\n<meta name="yandex-verification" content="587d3e9c8ef97ef2"/>\n<link href="https://stackpath.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">\n<link href="/assets/css/A.main.css,,qv3.84+responsive.css,,qv3.84+3rdparty.css,,qv3.84,Mcc.jmwi_jqbNr.css.pagespeed.cf.ipI658D9Ma.css" re

### Step 2: Parsing HTML Content with BeautifulSoup

Once we have the HTML content of the website, we can use the beautifulsoup4 library to parse the HTML and extract the data we need.

```python
from bs4 import BeautifulSoup

# Parse the HTML content
soup = BeautifulSoup(html_content, 'lxml')

# Find elements by tag name
titles = soup.find_all('h2')
for title in titles:
    print(title.text)
```

In [6]:
from bs4 import BeautifulSoup

# Parse the HTML content
soup = BeautifulSoup(html_content, 'lxml')
title = soup.find('h2')
print(title.text)


# Find elements by tag name
titlesbook = soup.find_all('h2')
for title in titlesbook:
    print(title.text)

booksizes = soup.find_all('span', attrs={"class": "fi-pagecount "})
for booksize in booksizes:
    print(booksize.text)

Drawing the Head and Hands by Andrew Loomis
Drawing the Head and Hands by Andrew Loomis
Building Construction Handbook
Drawing Cartoons & Comics for Dummies
Art of Drawing the Human Body
How to Draw and Paint Anatomy
Mastering Photoshop for Web Design
Pencil Drawing Techniques
Children's Illustrated Dictionary
Digital Colour in Graphic Design
figure drawing – design and invention
Art Models 6: The Female Figure in Shadow and Light
Textbook of Engineering Drawing
Piano for Beginners 6th ED
Drawing Comics the Marvel Way
Draw 50 Animals
Estimating in Building Construction
Art Models 8: Practical Poses for the Working Artist
500 Poses for Photographing Women
Perspective Made Easy
Fun With A Pencil by Andrew Loomis - Alex Hays


### Step 3: Extracting Data

Now that we can parse the HTML content, we can extract specific data such as links, images, or text.


In [7]:
# getting all items from website

for book in soup.find_all("div", attrs={"class": "row"}):
    title = book.find("h2").get_text()
    url_book = "http://www.pdfdrive.com" + book.find("a", href=True)["href"]
    same_book = BeautifulSoup(requests.get(url_book).content, "lxml")
    size_book = book.find("span", {"class": "fi-size"}).get_text()
    year = book.find("span", {"class":"fi-year"}).get_text()
    number_pages = book.find("span", {"class": "fi-pagecount"}).get_text()
    img_book = book.find("img", {"class": "img-zoom"})["src"]
    langage_book = same_book.find_all("span", {"class":"info-green"})[-1].get_text()
    tags = [tag.get_text().split("?>")[-1] for tag in same_book.find_all('div', {'class': 'ebook-tags'})[0].find_all('a')]
    print(f"title: {title}, url_book: {url_book}, size_book: {size_book}, year: {year}, number_pages: {number_pages}, img_book: {img_book}, langage_book: {langage_book}, tags: {tags}")
    print("-"*114)


title: Drawing the Head and Hands by Andrew Loomis, url_book: http://www.pdfdrive.com/drawing-the-head-and-hands-by-andrew-loomis-e10182633.html, size_book: 36.13 MB, year: 2002, number_pages: 141 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/efb/efb1e479d41d511bdafbf9899f1bfdb4-s.jpg, langage_book: English, tags: ['Painting & Drawing']
------------------------------------------------------------------------------------------------------------------
title: Building Construction Handbook, url_book: http://www.pdfdrive.com/building-construction-handbook-e26240750.html, size_book: 45.32 MB, year: 2010, number_pages: 841 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/3cc/3cc1442d279dab955e40ad9e9e890b46-s.jpg, langage_book: English, tags: ['Most Popular', 'Architecture']
------------------------------------------------------------------------------------------------------------------
title: Drawing Cartoons & Comics for Dummies, url_book: http://www.pdfdrive.com/drawing-

### Step 4: Handling Pagination


In many cases, the data you want to extract from a website is spread across multiple pages using pagination. This part of the guide will cover how to efficiently handle pagination during web scraping.

The first step is identifying the links to the next pages. To do this, use selectors to locate the pagination element in the HTML structure, and then extract the link for the next page. Note that pagination structures can vary significantly from one website to another. In this guide, we'll cover the essential techniques you need to manage pagination effectively across different sites.


In [8]:
# get the next page url
datas = []
while True:    
    next_page = soup.find("a", {"rel":"next"}).get("href") #get the next page in pagination
    for book in soup.find_all("div", attrs={"class": "row"}):
        title = book.find("h2").get_text()
        url_book = "http://www.pdfdrive.com" + book.find("a", href=True)["href"]
        same_book = BeautifulSoup(requests.get(url_book).content, "lxml")
        size_book = book.find("span", {"class": "fi-size"}).get_text()
        year = book.find("span", {"class":"fi-year"}).get_text()
        number_pages = book.find("span", {"class": "fi-pagecount"}).get_text()
        img_book = book.find("img", {"class": "img-zoom"})["src"]
        langage_book = same_book.find_all("span", {"class":"info-green"})[-1].get_text()
        tags = [tag.get_text().split("?>")[-1] for tag in same_book.find_all('div', {'class': 'ebook-tags'})[0].find_all('a')]
        print(f"title: {title}, url_book: {url_book}, size_book: {size_book}, year: {year}, number_pages: {number_pages}, img_book: {img_book}, langage_book: {langage_book}, tags: {tags}")
        print("-"*114)
        data = {"title": title, "url_book": url_book, "number_pages": number_pages, "img_book": img_book, "langage_book": langage_book, "tags": tags}
        datas.append(data)
    if next_page == "javascript:void(0)": #check if it's last page and stop
        break
    url_next_page =  "http://www.pdfdrive.com" + next_page # contruct the url of next page for scraping
    soup = BeautifulSoup(requests.get(url_next_page).content, "lxml") #build the soup of the next page

title: Drawing the Head and Hands by Andrew Loomis, url_book: http://www.pdfdrive.com/drawing-the-head-and-hands-by-andrew-loomis-e10182633.html, size_book: 36.13 MB, year: 2002, number_pages: 141 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/efb/efb1e479d41d511bdafbf9899f1bfdb4-s.jpg, langage_book: English, tags: ['Painting & Drawing']
------------------------------------------------------------------------------------------------------------------
title: Building Construction Handbook, url_book: http://www.pdfdrive.com/building-construction-handbook-e26240750.html, size_book: 45.32 MB, year: 2010, number_pages: 841 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/3cc/3cc1442d279dab955e40ad9e9e890b46-s.jpg, langage_book: English, tags: ['Most Popular', 'Architecture']
------------------------------------------------------------------------------------------------------------------
title: Drawing Cartoons & Comics for Dummies, url_book: http://www.pdfdrive.com/drawing-

title: Typography, url_book: http://www.pdfdrive.com/typography-e19637783.html, size_book: 11.98 MB, year: 2011, number_pages: 242 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/93a/93a69d8669e0be1ded1afdb0494153ff-s.jpg, langage_book: English, tags: ['Graphic Design']
------------------------------------------------------------------------------------------------------------------
title: Portrait Photography, url_book: http://www.pdfdrive.com/portrait-photography-e30616734.html, size_book: 10.93 MB, year: 2012, number_pages: 329 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/476/4764a52395552bc95b5292341a12ce7b-s.jpg, langage_book: English, tags: ['Photography']
------------------------------------------------------------------------------------------------------------------
title: Alfred's Essentials of Music Theory, url_book: http://www.pdfdrive.com/alfreds-essentials-of-music-theory-e19040643.html, size_book: 16.79 MB, year: 2013, number_pages: 117 Pages, img_book

title: The Green Beauty Guide, url_book: http://www.pdfdrive.com/the-green-beauty-guide-e19440647.html, size_book: 1.99 MB, year: 2013, number_pages: 290 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/0b2/0b2e14396d533a7110a2a1ee19a9d498-s.jpg, langage_book: English, tags: ['Fashion & Beauty', "Editor's Picks"]
------------------------------------------------------------------------------------------------------------------
title: Handbook of Medicinal Herbs, url_book: http://www.pdfdrive.com/handbook-of-medicinal-herbs-e50341.html, size_book: 7.72 MB, year: 2006, number_pages: 893 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/e41/e419ced29b4cebed221038a004c9744f-s.jpg, langage_book: English, tags: ['Fashion & Beauty', 'Food & Nutrition', 'Medical']
------------------------------------------------------------------------------------------------------------------
title: Photoshop for Dummies, url_book: http://www.pdfdrive.com/photoshop-for-dummies-e7451134.html, size_

title: Bobbi Brown Makeup Manual, url_book: http://www.pdfdrive.com/bobbi-brown-makeup-manual-e6484073.html, size_book: 9.04 MB, year: 2012, number_pages: 156 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/d78/d787294d1bcbae3caa913561498d967a-s.jpg, langage_book: English, tags: ['Fashion & Beauty']
------------------------------------------------------------------------------------------------------------------
title: Graphic Design Basics, url_book: http://www.pdfdrive.com/graphic-design-basics-e2036342.html, size_book: 4.58 MB, year: 2002, number_pages: 74 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/6a3/6a30cde430c2912d7fb2d1588ccfc681-s.jpg, langage_book: German, tags: ['Graphic Design']
------------------------------------------------------------------------------------------------------------------
title: Sacred Mathematics: Japanese Temple Geometry, url_book: http://www.pdfdrive.com/sacred-mathematics-japanese-temple-geometry-e13737972.html, size_book: 9.93 M

title: Keyboard Master Class - Tom Brooks Music, url_book: http://www.pdfdrive.com/keyboard-master-class-tom-brooks-music-e7749127.html, size_book: 15.66 MB, year: 2012, number_pages: 188 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/107/107e5a6733aa8959919d5cd372266882-s.jpg, langage_book: English, tags: ['Music']
------------------------------------------------------------------------------------------------------------------
title: Yoga as Therapeutic Exercise, url_book: http://www.pdfdrive.com/yoga-as-therapeutic-exercise-e18918868.html, size_book: 36.36 MB, year: 2011, number_pages: 250 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/a23/a231f30f0ed9b8cc1dd2b8f3c8af21ee-s.jpg, langage_book: English, tags: ['Most Popular', 'Art']
------------------------------------------------------------------------------------------------------------------
title: Graphic Design & Printing Technology, url_book: http://www.pdfdrive.com/graphic-design-printing-technology-e33533658

title: The New Typography, url_book: http://www.pdfdrive.com/the-new-typography-e33462790.html, size_book: 12.16 MB, year: 2016, number_pages: 280 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/f8f/f8f17c10730e0779421156cf16f8c1a6-s.jpg, langage_book: English, tags: ['Graphic Design']
------------------------------------------------------------------------------------------------------------------
http://www.pdfdrive.com/category/1/p7/
/category/1/p8/
title: Architectural Design, url_book: http://www.pdfdrive.com/architectural-design-e24757072.html, size_book: 14.39 MB, year: 2010, number_pages: 186 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/8e6/8e6bab0df15c9da987447186491c7f11-s.jpg, langage_book: English, tags: ['Architecture']
------------------------------------------------------------------------------------------------------------------
title: Building Acoustics, url_book: http://www.pdfdrive.com/building-acoustics-e33493000.html, size_book: 4.64 MB, year: 2

title: Guide to Head and Shoulders Portrait Photography, url_book: http://www.pdfdrive.com/guide-to-head-and-shoulders-portrait-photography-e14826955.html, size_book: 8.19 MB, year: 2009, number_pages: 128 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/099/099c2f49f2a1886ab12d24327121760d-s.jpg, langage_book: English, tags: ['Art']
------------------------------------------------------------------------------------------------------------------
title: Close-Up and Macro Photography, url_book: http://www.pdfdrive.com/close-up-and-macro-photography-e27530242.html, size_book: 11.54 MB, year: 2014, number_pages: 317 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/c7f/c7ffbadc232b1faaf14942e784d77e32-s.jpg, langage_book: English, tags: ['Photography']
------------------------------------------------------------------------------------------------------------------
http://www.pdfdrive.com/category/1/p8/
/category/1/p9/
title: Introduction The Fashion Business: Theory, Practi

title: Basic Woodworking, url_book: http://www.pdfdrive.com/basic-woodworking-e3066688.html, size_book: 1.58 MB, year: 2013, number_pages: 76 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/f5d/f5d1c1b4bc269d81c68254a52544158c-s.jpg, langage_book: English, tags: ["Editor's Picks", 'Craft & Hobbies']
------------------------------------------------------------------------------------------------------------------
title: Pencil Sketching 2nd Edition, url_book: http://www.pdfdrive.com/pencil-sketching-2nd-edition-e4264513.html, size_book: 6.37 MB, year: 2001, number_pages: 129 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/7c7/7c783c9693a86be630b0a8d42efb395e-s.jpg, langage_book: English, tags: ['Painting & Drawing']
------------------------------------------------------------------------------------------------------------------
title: Fine Paintings and Sculpture, url_book: http://www.pdfdrive.com/fine-paintings-and-sculpture-e23526303.html, size_book: 16.14 MB, year: 2

title: Food Styling for Photographers, url_book: http://www.pdfdrive.com/food-styling-for-photographers-e18758944.html, size_book: 19.57 MB, year: 2008, number_pages: 272 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/b7c/b7c7cda5d2531616b3df4c4e5bdb4db0-s.jpg, langage_book: English, tags: ['Food & Nutrition', 'Photography']
------------------------------------------------------------------------------------------------------------------
title: European Drawings - 1, Catalogue of the Collections, url_book: http://www.pdfdrive.com/european-drawings-1-catalogue-of-the-collections-e12580699.html, size_book: 33.32 MB, year: 2013, number_pages: 370 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/eb8/eb8dec0d67d4e551310e190350be731e-s.jpg, langage_book: English, tags: ['Painting & Drawing']
------------------------------------------------------------------------------------------------------------------
title: The Drawings of Michelangelo and His Followers in the Ashmolean M

title: The Theory and Technique of Electronic Music, url_book: http://www.pdfdrive.com/the-theory-and-technique-of-electronic-music-e24439110.html, size_book: 1.19 MB, year: 2014, number_pages: 337 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/e09/e095bef58045602952bf223e0e32de08-s.jpg, langage_book: English, tags: ['Music']
------------------------------------------------------------------------------------------------------------------
title: Paintings, Prints, Drawings and Sculpture, url_book: http://www.pdfdrive.com/paintings-prints-drawings-and-sculpture-e21415789.html, size_book: 6.24 MB, year: 2016, number_pages: 50 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/28b/28b731d40b13565da8d92b547bd6c99c-s.jpg, langage_book: English, tags: ['Painting & Drawing']
------------------------------------------------------------------------------------------------------------------
title: Architectural and Engineering Design Standards, url_book: http://www.pdfdrive.com/arc

title: Drawing Cartoons & Comics for Dummies, url_book: http://www.pdfdrive.com/drawing-cartoons-comics-for-dummies-e34313677.html, size_book: 8.17 MB, year: 2009, number_pages: 363 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/fc7/fc77208b8ad1ab749e3c805612a4ff83-s.jpg, langage_book: English, tags: ['Most Popular', 'Painting & Drawing']
------------------------------------------------------------------------------------------------------------------
title: Designing an Aquaponic Greenhouse for an Urban Food , url_book: http://www.pdfdrive.com/designing-an-aquaponic-greenhouse-for-an-urban-food-e17513662.html, size_book: 4.49 MB, year: 2015, number_pages: 126 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/68a/68a2e1ebee3ede43f5e2171e6d9a31e4-s.jpg, langage_book: English, tags: ['Architecture']
------------------------------------------------------------------------------------------------------------------
title: Make Electronics, url_book: http://www.pdfdrive.com/ma

title: Architectural Thought : The Design Process and and the Expectant , url_book: http://www.pdfdrive.com/architectural-thought-the-design-process-and-and-the-expectant-e11462285.html, size_book: 8.85 MB, year: 2007, number_pages: 191 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/4b9/4b913261d52b58ec35748c3cc1ea7855-s.jpg, langage_book: English, tags: ['Architecture']
------------------------------------------------------------------------------------------------------------------
title: Soap Making Made Easy 2nd edition, url_book: http://www.pdfdrive.com/soap-making-made-easy-2nd-edition-e4265800.html, size_book: 3.21 MB, year: 2012, number_pages: 86 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/6da/6da5a52c77a127cc5cba8cc7a5d83903-s.jpg, langage_book: English, tags: ['Craft & Hobbies']
------------------------------------------------------------------------------------------------------------------
title: The Essential Guide to Digital Photography, url_book: htt

title: Photography and Cinema, url_book: http://www.pdfdrive.com/photography-and-cinema-e29875530.html, size_book: 2.45 MB, year: 2010, number_pages: 161 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/16c/16c7e5c24dded48c5c290055aff5fd31-s.jpg, langage_book: English, tags: ['Photography']
------------------------------------------------------------------------------------------------------------------
title: Understanding Your Dog For Dummies, url_book: http://www.pdfdrive.com/understanding-your-dog-for-dummies-e20478966.html, size_book: 3.19 MB, year: 2007, number_pages: 290 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/e63/e6392335a02a26e7085b34c48812385a-s.jpg, langage_book: English, tags: ['Art']
------------------------------------------------------------------------------------------------------------------
title: The Cognitive Neuroscience of Music, url_book: http://www.pdfdrive.com/the-cognitive-neuroscience-of-music-e33510741.html, size_book: 7.37 MB, year: 

title: Music, Philosophy And Modernity, url_book: http://www.pdfdrive.com/music-philosophy-and-modernity-e21956232.html, size_book: 1.77 MB, year: 2007, number_pages: 444 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/8e0/8e092789ea71e06dee4332fcd5ed2070-s.jpg, langage_book: English, tags: ['Music']
------------------------------------------------------------------------------------------------------------------
title: Contemporary American Painting and Sculpture, url_book: http://www.pdfdrive.com/contemporary-american-painting-and-sculpture-e22533515.html, size_book: 12.04 MB, year: 2008, number_pages: 250 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/b06/b06662dd6328534b0d1e4f927329036d-s.jpg, langage_book: English, tags: ['Painting & Drawing']
------------------------------------------------------------------------------------------------------------------
title: Lighting for Digital Photography: From Snapshots to Great Shots, url_book: http://www.pdfdrive.com/lig

title: The Collector's Guide To Emerging Art Photography, url_book: http://www.pdfdrive.com/the-collectors-guide-to-emerging-art-photography-e28843651.html, size_book: 7.16 MB, year: 2008, number_pages: 172 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/b39/b39d63770edee1fb58b6c2560b2ed907-s.jpg, langage_book: English, tags: ['Photography']
------------------------------------------------------------------------------------------------------------------
title: Guide to Making Jewelry with Beads, url_book: http://www.pdfdrive.com/guide-to-making-jewelry-with-beads-e4713867.html, size_book: 3.09 MB, year: 2012, number_pages: 27 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/f59/f59b6f681a2dbfe380fc4dc3afdaea09-s.jpg, langage_book: English, tags: ['Craft & Hobbies']
------------------------------------------------------------------------------------------------------------------
title: The architecture of humanism, url_book: http://www.pdfdrive.com/the-architecture-of-hu

title: MATLAB Creating Graphical User Interfaces, url_book: http://www.pdfdrive.com/matlab-creating-graphical-user-interfaces-e318415.html, size_book: 6.32 MB, year: 2016, number_pages: 508 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/2e6/2e68d328ce043484d7b49dfae9564648-s.jpg, langage_book: English, tags: ['Art']
------------------------------------------------------------------------------------------------------------------
title: Historic Architectural Survey, url_book: http://www.pdfdrive.com/historic-architectural-survey-e26021961.html, size_book: 5.12 MB, year: 2006, number_pages: 284 Pages, img_book: https://cdn.pdfdrive.com/assets/thumbs/4ee/4ee741fb8defecc2bcf596ed44fb0c1a-s.jpg, langage_book: English, tags: ['Art', 'History']
------------------------------------------------------------------------------------------------------------------
title: Painted Wood: History and Conservation, url_book: http://www.pdfdrive.com/painted-wood-history-and-conservation-e1928070

### Step 5: Storing Extracted Data


Once you have successfully extracted data from a website and handled pagination, the next step is to store your data in persistent storage. There are several options for this, including databases, CSV files, JSON files, and more. In this section, we will focus on storing data in CSV and JSON formats.


In [9]:
import json
import csv

# save the datas into the csv file
with open("data.json", "w") as json_file:
    json.dump(datas, json_file, indent=4)

# save the datas into the json file
with open("data.csv", "w") as csv_file:
    spamwriter = csv.writer(csv_file, delimiter=';')
    for data in datas:
        spamwriter.writerow(data.values())

 # Conclusion
 
In this introductory session, we’ll dive into extracting valuable information from websites using BeautifulSoup, a popular and powerful Python library for web scraping. We'll cover the essential basics, providing a solid foundation to help you get started with web scraping. In future sessions, we’ll take things further by exploring advanced topics, including frameworks like __Scrapy__, __selenium__ and other specialized packages that make web scraping more efficient and robust.
 
 <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.VE7qFLsdSs2uYnb0Lie1ewHaEW%26pid%3DApi&f=1&ipt=84a1ca00df40cfbbe46846d0a417b81f4c8e5cdbfe85d71e64918f6d502f41db&ipo=images" />