**Web scraping and APIs are two different methods of retrieving information from the web. Web scraping extracts data directly from websites by parsing the HTML or XML structure of web pages. APIs, on the other hand, provide a structured way to access data from web services by making requests to specific endpoints. Both are effective ways of retrieving the data you need; they automate a task that would be very time-consuming if done manually.**


## Web Scraping:
<br>
<img src="https://github.com/EDGE-Programe/Python-Basics/blob/master/Python_edge_program/logo_images/web_scraping.png?raw=1" alt="python" width=85% height=71% title="Web Scraping">
<br>

Web scraping is the process of automatically extracting data from websites. It involves parsing the **`HTML or XML`** structure of web pages to locate and extract specific information, such as **`text, images, tables, or links`**.

When you visit a website, your browser retrieves the HTML code that represents the content and structure of that webpage. Web scraping tools and libraries allow you to programmatically access and extract data from this HTML code. By analyzing the structure of the page and using various techniques, you can target specific elements or patterns in the HTML to extract the desired data.

Web scraping can be performed using programming languages such as Python, along with libraries like **`BeautifulSoup or Scrapy`**. These libraries provide functionalities to navigate the HTML structure, select elements based on CSS selectors or XPath expressions, and extract the relevant data.

Web scraping can be useful in various scenarios, such as:

- **Data collection:** Extracting data from multiple websites for analysis, research, or building datasets.
<br>

- **Price comparison:** Scraping e-commerce websites to compare prices and find the best deals.
<br>

- **Content aggregation:** Collecting news articles, blog posts, or other content from various sources.
<br>

- **Monitoring and tracking:** Scraping websites to track changes in data, such as stock prices, weather forecasts, or job postings.
<br>

- **Lead generation:** Extracting contact information from websites for sales and marketing purposes.

It's important to note that while web scraping can be a powerful tool for data extraction, it's crucial to respect website terms of service, adhere to legal and ethical guidelines, and not overload servers with excessive requests.
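One way to act on this advice is to read a site's `robots.txt` rules before scraping and to pause between requests. The sketch below uses only the standard library; the `robots.txt` content and the `example.com` URLs are made-up placeholders so the example runs offline, and `build_robot_rules` and `polite_delay` are hypothetical helper names.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_robot_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser, user_agent="*", default_seconds=1.0):
    """Return the crawl delay the site asks for, or a default when none is given."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default_seconds


rules = build_robot_rules(SAMPLE_ROBOTS_TXT)
print(rules.can_fetch("*", "https://example.com/listings/page-1"))  # allowed path
print(rules.can_fetch("*", "https://example.com/private/data"))     # disallowed path
print(polite_delay(rules))

# Between real requests you would pause with: time.sleep(polite_delay(rules))
```

For a live site you would typically call `set_url()` and `read()` on the parser instead of passing the text in by hand.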

**`Steps to follow for a web-scraping task`:**

- **Identify the website:** Determine the website from which you want to extract data.

- **Inspect the page:** Analyze the HTML structure of the web page to identify the relevant elements and data you want to extract.

- **Choose a scraping tool/library:** Select a suitable programming language (such as Python) and a web scraping library (e.g., BeautifulSoup or Scrapy).

- **Write code for scraping:** Use the selected library to write code that navigates the website's structure, selects the desired elements, and extracts the required data.

- **Handle anti-scraping measures:** Some websites implement anti-scraping measures, such as CAPTCHAs or rate limiting. You may need additional techniques to handle these measures, such as slowing down your requests, while staying within the website's terms of service.

- **Parse and process the data:** Once the data is extracted, you can parse and process it according to your requirements.



We will use the **`BeautifulSoup`** library to build our web scraping script. <br>
<br>
<img src="https://github.com/EDGE-Programe/Python-Basics/blob/master/Python_edge_program/logo_images/bs.webp?raw=1" alt="python" width=76% height=58% title="BeautifulSoup logo">
<br>

### Step 1: Installing BeautifulSoup Library

In [None]:
# using pip -- python package manager to install BeautifulSoup
!pip install beautifulsoup4



### Step 2: Decide which WebSite to Scrape Data From

For this lecture we will scrape information about apartments listed on the website https://www.bproperty.com/

### Step 3: Start Writing The Web-Scraper Script

In [None]:
# bs4 is short for BeautifulSoup4
from bs4 import BeautifulSoup as soup #BeautifulSoup4
from urllib.request import urlopen as ureq #urlopen for accessing the url

In [None]:
# Let's scrape information on apartments listed for sale on page-4
my_url = 'https://www.bproperty.com/en/dhaka/apartments-for-sale/page-4/' # web-page url

web_client = ureq(my_url) # opening the web url using urlopen

html_data = web_client.read() # reading the content of the web page

web_client.close() # terminating connection after reading the data

#### Terminating the connection after reading the data is good practice
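A `with` block is a common way to make sure the connection is closed even when reading fails. The sketch below is one possible helper, not part of the original script: `fetch_html` is a hypothetical name, and the `User-Agent` string is a placeholder you would replace with something that identifies your own script.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes; the connection closes automatically."""
    # Placeholder User-Agent value (an assumption, not required by the original code).
    request = Request(url, headers={"User-Agent": "edge-course-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


# A data: URL lets the helper be tried without touching the network.
print(fetch_html("data:text/plain,hello%20world"))
```

Calling `fetch_html(my_url)` would return the same kind of bytes object that `web_client.read()` produces above.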

In [None]:
data_soup = soup(html_data, "html.parser") # data_soup --> html data parser object using BeautifulSoup

contents = data_soup.findAll("li", {"class": "ef447dde"}) # find every list item <li> where "class = ef447dde"

We didn't find **`"li"`** and **`"class": "ef447dde"`** by magic; you need to find these values yourself. They change depending on the webpage you're scraping and the data you want to scrape. For our current web scraping task on **`url:https://www.bproperty.com/en/dhaka/apartments-for-sale/page-4/`**, we found these values by using **`inspect element`**. Alternatively, you can **`print()`** the data in **`html_data`** and read through it to find out where your desired data lies within **`html_data`**.
<br>
<img src="https://github.com/EDGE-Programe/Python-Basics/blob/master/Python_edge_program/logo_images/bproperty_inspect.png?raw=1" alt="python" width=94% height=85% title="BPrope">
<br>

Alternatively, as mentioned above, we can scroll through **`html_data`** to find the parameters required to retrieve the desired data. <br>

In [None]:
print(html_data)

b'<!DOCTYPE html><html lang="en" dir="ltr" itemscope="" whatsappbutton="original" recommenderthreelistingsondetailpages="recommended_only" maptoggle="original" itemType="http://schema.org/WebPage"><head><meta charSet="UTF-8"/><meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=0"/><link rel="dns-prefetch" href="http://LL8IZ711CS-dsn.algolia.net"/><link rel="dns-prefetch" href="https://www.googletagmanager.com"/><link rel="dns-prefetch" href="https://www.google-analytics.com"/><link rel="dns-prefetch" href="https://images.bayut.com"/><script id="runtimeSettings">window.CONFIG={runtime: Object.assign({"CANONICAL_DOMAIN":"www.bproperty.com","CANONICAL_DOMAIN_EXCEPTION":["strat.bproperty.com","external.bproperty.com","api-legacy.bproperty.com"],"DISABLE_AUTO_PLAY_CAROUSELS":false,"EXPERIMENTS_OVERRIDE_VARIANT":"{\\"RecommenderThreeListingsOnDetailPages\\":\\"recommended_only\\", \\"MapToggle\\":\\"original\\"}","DISABLE_MAPBOX_API":false,"DISABLE_FACEBOOK_PL

Here we can see the **`key and value`** pairs inside **`html_data`**. We can see **`category_1_id`**, **`category_1_name`**, **`loc_city_name`**, **`loc_name`**, **`price_max`** and much more information about the apartments listed for sale. As we can see, the information is under the **`.ef447dde`** class.<br>

In [None]:
print(contents) # print the data in contents


[<li aria-label="Listing" class="ef447dde" role="article"><article class="ca2f5674"><script type="application/ld+json">{"@context":"https://schema.org","@type":"Apartment","name":"A 1438 Sq Ft Ready Flat Is Here For Sale At Baitul Aman Road, Adabor","url":"https://www.bproperty.com/en/property/details-5576976.html","geo":{"@type":"GeoCoordinates","latitude":23.773295,"longitude":90.358212},"floorSize":{"@type":"QuantitativeValue","value":"1,438","unitText":"SQFT"},"numberOfRooms":{"@type":"QuantitativeValue","name":"Bedroom(s)","value":"3"},"numberOfBathroomsTotal":"3","image":"https://images-cdn.bproperty.com/thumbnails/1584797-400x300.jpeg","address":{"@type":"PostalAddress","addressCountry":"Bangladesh","addressRegion":"Dhaka","addressLocality":"Adabor"}}</script><div class="_4041eb80"><a aria-label="Listing link" class="_287661cb" href="/en/property/details-5576976.html" index="0" title="A 1438 Sq Ft Ready Flat Is Here For Sale At Baitul Aman Road, Adabor"><div class="_1e33cd36"></

In [None]:
print(type(contents)) # print data type of contents

print(len(contents)) # print how many apartment data inside contents

<class 'bs4.element.ResultSet'>
24


So, we can see that **`contents`** is a **`bs4.element.ResultSet`** and it contains information about the 24 apartments on that webpage, each listed under **`class = ef447dde`**.
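The printed listing also embeds a `<script type="application/ld+json">` block that carries the same details as structured JSON. As a sketch, and assuming that block is present in every listing, it can be read without relying on the generated class names. The HTML below is a trimmed, hand-written sample modelled on the output above, and `listing_metadata` is a hypothetical helper name.

```python
import json

from bs4 import BeautifulSoup

# Trimmed, hand-written HTML modelled on the listing markup printed above.
sample_html = """
<li class="ef447dde" role="article">
  <script type="application/ld+json">
  {"@type": "Apartment",
   "name": "A 1438 Sq Ft Ready Flat Is Here For Sale At Baitul Aman Road, Adabor",
   "floorSize": {"value": "1,438", "unitText": "SQFT"},
   "numberOfRooms": {"name": "Bedroom(s)", "value": "3"},
   "numberOfBathroomsTotal": "3"}
  </script>
</li>
"""


def listing_metadata(container):
    """Return the JSON-LD dictionary embedded in a listing, or None if absent."""
    script = container.find("script", attrs={"type": "application/ld+json"})
    if script is None or not script.string:
        return None
    return json.loads(script.string)


page = BeautifulSoup(sample_html, "html.parser")
for container in page.find_all("li", class_="ef447dde"):
    meta = listing_metadata(container)
    print(meta["name"])
    print(meta["floorSize"]["value"], meta["floorSize"]["unitText"])
    print(meta["numberOfRooms"]["value"], meta["numberOfBathroomsTotal"])
```

Note that every value arrives as a string here (for example `"1,438"`), so numeric fields still need converting before analysis.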

In [None]:
# retrieve the required information from each listing

for container in contents:
    Ad_Title = container.a["title"] # "title" attribute of the first anchor tag <a>
    price_data = container.findAll("span", {"class": "f343d9ce"}) # all <span class="f343d9ce"> elements
    price = price_data[0].text # text content of the first match
    Advert_area = container.findAll("div", {"class": "_00a37089"}) # all <div class="_00a37089"> elements
    property_area = Advert_area[0].text
    apartment_size = container.findAll("span", {"class": "_4bdd430c"})
    apartment_sqft = apartment_size[0].text
    room_info = container.findAll("span", {"class": "_6acf48d3 fd8e0e18"})
    bed_rooms = room_info[1].text # the second match is used for bedrooms
    bath_rooms = room_info[2].text # the third match is used for bathrooms
    print(bath_rooms)

3
4
3
3
2
3
3
3
3
3
3
3
3
4
3
3
3
3
3
3
3
3
3
3


The code snippet performs the following steps:

1. It iterates over a collection called **`"contents"`**.
<br>

2. Within each iteration, it extracts the data associated with the "title" key from the anchor tag **`<a>`** within the container and assigns it to the variable **`Ad_Title`**.
<br>

3.  It finds all **`<span>`** elements with the class **`"f343d9ce"`** within the container and stores them in the **`price_data`** variable as a list.


4. Assuming the desired information is in the first item of the **`price_data`** list, it extracts the text content of that element using the **`.text`** attribute and assigns it to the variable price.
<br>

5. It finds all **`<div>`** elements with the class **`"_00a37089"`** within the container and stores them in the **`Advert_area`** variable as a list.
<br>
6. Assuming the desired information is in the first item of the **`Advert_area`** **list**, it extracts the **text** content of that element using the **`.text`** attribute and assigns it to the variable **`property_area`**.

Note that the class attribute values used in the **`findAll`** method will differ based on the HTML structure of the webpage being scraped.
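Because these class names can change, indexing `[0]` on an empty result raises an `IndexError`. The sketch below shows one defensive alternative; `first_text` is a hypothetical helper, and the sample HTML and its values are invented for illustration, with only the class names copied from the cell above.

```python
from bs4 import BeautifulSoup


def first_text(container, tag, css_class, default=None):
    """Return the stripped text of the first matching element, or a default."""
    element = container.find(tag, class_=css_class)
    if element is None:
        return default
    return element.get_text(strip=True)


# Hand-written sample; the text values are invented for illustration.
sample_html = """
<li class="ef447dde">
  <span class="f343d9ce">85 Lakh</span>
  <div class="_00a37089">Adabor, Dhaka</div>
</li>
"""
container = BeautifulSoup(sample_html, "html.parser").li

print(first_text(container, "span", "f343d9ce"))
print(first_text(container, "div", "_00a37089"))
# This class is missing from the sample, so the default is returned instead of an error.
print(first_text(container, "span", "_4bdd430c", default="n/a"))
```

Returning a default keeps the loop running when one listing lacks a field, at the cost of having to check for that default later.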

Scraped data is most often stored in a file **(text, csv, etc.)** or sometimes directly in a database **(sqlite, sql, mongodb, etc.)**. For now, let's save our data into a **`csv`** file.

In [None]:
import csv # import csv module for python

# folder path to save the csv file
csv_file_path = 'additional_files/web_scraping/bproperty_apartment_sell.csv'

# Writing the data to the CSV file
with open(csv_file_path, mode='w', newline='', encoding='utf-8') as csv_file:
    fieldnames = ['Ad_Title', 'Price', 'Advert_Area', 'Apartment_Size', 'Bed_Rooms', 'Bath_Rooms']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()

    for container in contents:
        Ad_Title = container.a["title"]
        price_data = container.findAll("span", {"class": "f343d9ce"})
        price = price_data[0].text
        Advert_area = container.findAll("div", {"class": "_00a37089"})
        property_area = Advert_area[0].text
        apartment_size = container.findAll("span", {"class": "_4bdd430c"})
        apartment_sqft = apartment_size[0].text
        room_info = container.findAll("span", {"class": "_6acf48d3 fd8e0e18"})
        bed_rooms = room_info[1].text
        bath_rooms = room_info[2].text

        writer.writerow({
                'Ad_Title': Ad_Title,
                'Price': price,
                'Advert_Area': property_area,
                'Apartment_Size': apartment_sqft,
                'Bed_Rooms': bed_rooms,
                'Bath_Rooms': bath_rooms
            })

This code will write the scraped data into a **CSV** file named **`'bproperty_apartment_sell.csv'`** with the specified columns. Each row will represent a single container's data.
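To check what was written, the file can be read back with `csv.DictReader`. The sketch below writes one sample row to a temporary file first so it runs on its own; the price, area and size values are invented, and `read_rows` is a hypothetical helper name.

```python
import csv
import os
import tempfile

fieldnames = ['Ad_Title', 'Price', 'Advert_Area', 'Apartment_Size', 'Bed_Rooms', 'Bath_Rooms']

# One sample row; the price, area and size values are invented for illustration.
sample_rows = [{
    'Ad_Title': 'A 1438 Sq Ft Ready Flat Is Here For Sale At Baitul Aman Road, Adabor',
    'Price': '85 Lakh',
    'Advert_Area': 'Adabor, Dhaka',
    'Apartment_Size': '1,438 sqft',
    'Bed_Rooms': '3',
    'Bath_Rooms': '3',
}]


def read_rows(path):
    """Read a CSV file into a list of dictionaries keyed by the header row."""
    with open(path, newline='', encoding='utf-8') as csv_file:
        return list(csv.DictReader(csv_file))


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.csv')
    with open(path, mode='w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(sample_rows)
    rows = read_rows(path)

print(len(rows))
print(rows[0]['Ad_Title'])
print(rows[0]['Bed_Rooms'])
```

The title contains a comma, which the `csv` module quotes automatically on writing and restores on reading. Every value comes back as a string.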

Sometimes it's more convenient to store the scraped **`data`** directly in a **`database`**. We're going to use **`sqlite3`** as our database engine. **`SQLite3`** is the default database engine for many web development frameworks, it has an easy-to-use Python package (part of the standard library), and it is best suited for **`small to mid`** sized databases.
<br>
<img src="https://github.com/EDGE-Programe/Python-Basics/blob/master/Python_edge_program/logo_images/sqlite.png?raw=1" alt="python" width=31% height=21% title="SQLite logo">
<br>

In [None]:
# importing the sqlite module
import sqlite3

# path to empty database
database = "additional_files/web_scraping/bproperty.sqlite3"

# connect to sqlite3 database
conn = sqlite3.connect(database)


# create table to store the data that we've scraped
create_table_query = '''
    CREATE TABLE IF NOT EXISTS bproperty_adverts (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "Ad_Title" TEXT NOT NULL,
    "Price" REAL NOT NULL,
    "Numerical_Suffix" TEXT,
    "Advert_Area" TEXT,
    "Apartment_Size" TEXT,
    "Bed_Rooms" INTEGER NOT NULL,
    "Bath_Rooms" INTEGER NOT NULL
);
'''

cursor = conn.cursor()
cursor.execute(create_table_query)

# scrape required data and change data type if required
for container in contents:
        Ad_Title = container.a["title"]
        price_data = container.findAll("span", {"class": "f343d9ce"})
        price_text = price_data[0].text
        Price = float(price_text.split(" ")[0])
        price_suffix = price_text.split(" ")[1]
        # print(type(Price))
        Advert_area = container.findAll("div", {"class": "_00a37089"}) # name must match the variable read on the next line
        property_area = Advert_area[0].text
        apartment_size = container.findAll("span", {"class": "_4bdd430c"})
        apartment_sqft = apartment_size[0].text
        room_info = container.findAll("span", {"class": "_6acf48d3 fd8e0e18"})
        bed_rooms = int(room_info[1].text)
        bath_rooms = int(room_info[2].text)

        # insert data into sqlite3 database
        sqlite_insert = """INSERT INTO bproperty_adverts
                         (Ad_Title, Price, Numerical_Suffix, Advert_Area, Apartment_Size, Bed_Rooms,
                          Bath_Rooms)
                          VALUES (?, ?, ?, ?, ?, ?, ?);"""

        data_tuple = (Ad_Title, Price, price_suffix, property_area, apartment_sqft, bed_rooms, bath_rooms)

        cursor.execute(sqlite_insert, data_tuple)
        conn.commit()

cursor.close()
conn.close()

### Code Explanation
- **`Importing the sqlite3 module:`**

```python
import sqlite3
```
This line imports the **`sqlite3`** module, which provides an interface to interact with **`SQLite`** databases using Python.

- **`Specifying the path to the SQLite database file:`**

```python
database = "additional_files/web_scraping/bproperty.sqlite3"
```
This line assigns the file path of the SQLite database to the variable database. The database file will be created at this location if it doesn't exist, provided the containing folder already exists.

- **`Connecting to the SQLite database:`**

```python
conn = sqlite3.connect(database)
```
This line **`establishes a connection`** to the SQLite database using the file path specified in the **`database`** variable.

- **`Creating the table to store scraped data:`**

```python
create_table_query = '''
    CREATE TABLE IF NOT EXISTS bproperty_adverts (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "Ad_Title" TEXT NOT NULL,
    "Price" REAL NOT NULL,
    "Numerical_Suffix" TEXT,
    "Advert_Area" TEXT,
    "Apartment_Size" TEXT,
    "Bed_Rooms" INTEGER NOT NULL,
    "Bath_Rooms" INTEGER NOT NULL
);
'''
cursor = conn.cursor()
cursor.execute(create_table_query)
```
These lines define the **`CREATE TABLE`** SQL statement to create the table named **bproperty_adverts**. The table has several **`columns`** with different data types, including **`id`**, **`Ad_Title`**, **`Price`**, **`Numerical_Suffix`**, **`Advert_Area`**, **`Apartment_Size`**, **`Bed_Rooms`**, and **`Bath_Rooms`**.

The **`IF NOT EXISTS`** clause ensures that the table is only created if it doesn't already exist in the database. After defining the **`SQL statement`**, a **`cursor`** object is created, and the execute method is used to execute the SQL query, creating the table.


- **`Scraping data and inserting it into the database:`**

```python
for container in contents:
    # ... (code to scrape data and assign values to variables)

    # insert data into sqlite3 database
    sqlite_insert = """INSERT INTO bproperty_adverts
                       (Ad_Title, Price, Numerical_Suffix, Advert_Area, Apartment_Size, Bed_Rooms,
                        Bath_Rooms)
                        VALUES (?, ?, ?, ?, ?, ?, ?);"""

    data_tuple = (Ad_Title, Price, price_suffix, property_area, apartment_sqft, bed_rooms, bath_rooms)
    
    cursor.execute(sqlite_insert, data_tuple)
    conn.commit()
```
This loop iterates through the **`contents`** result set, which holds the scraped listings. The code inside the loop extracts the required information, assigns values to variables (Ad_Title, Price, price_suffix, property_area, apartment_sqft, bed_rooms, bath_rooms), and then inserts this data into the bproperty_adverts table using SQL's **`INSERT INTO statement`**.

The **`sqlite_insert`** query uses parameter **placeholders** **`(?)`** to safely **insert data** into the database, preventing SQL injection vulnerabilities. The **`data_tuple`** contains the values to be inserted, and the **`cursor.execute`** method executes the query with the provided data.

The **`conn.commit()`** statement is used to commit the changes to the database, making the inserted data permanent.

- **`Closing the cursor and the database connection:`**

```python
cursor.close()
conn.close()
```
After the loop finishes executing, the cursor is closed using the **`cursor.close()`** method. Finally, the connection to the database is closed using the **`conn.close()`** method to **release any resources and ensure proper cleanup**.
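As a variation on the loop above, rows can be collected first and inserted in one batch with a single commit. The sketch below uses an in-memory database and two invented sample rows so it runs on its own; using the connection as a context manager commits on success and rolls back if an exception is raised.

```python
import sqlite3

# Invented sample rows in the same column order as the INSERT statement.
rows = [
    ("Sample flat A", 1.2, "Crore", "Adabor, Dhaka", "1,438 sqft", 3, 3),
    ("Sample flat B", 85.0, "Lakh", "Mirpur, Dhaka", "1,100 sqft", 3, 2),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute('''
    CREATE TABLE IF NOT EXISTS bproperty_adverts (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "Ad_Title" TEXT NOT NULL,
    "Price" REAL NOT NULL,
    "Numerical_Suffix" TEXT,
    "Advert_Area" TEXT,
    "Apartment_Size" TEXT,
    "Bed_Rooms" INTEGER NOT NULL,
    "Bath_Rooms" INTEGER NOT NULL
);
''')

with conn:  # one commit for the whole batch, rollback on error
    conn.executemany(
        """INSERT INTO bproperty_adverts
           (Ad_Title, Price, Numerical_Suffix, Advert_Area, Apartment_Size, Bed_Rooms, Bath_Rooms)
           VALUES (?, ?, ?, ?, ?, ?, ?);""",
        rows,
    )

# Read the rows back to confirm what was stored.
stored = conn.execute(
    "SELECT Ad_Title, Price, Numerical_Suffix FROM bproperty_adverts ORDER BY id"
).fetchall()
conn.close()
print(stored)
```

Committing once per batch is usually faster than committing after every row, although for 24 listings the difference is negligible.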

## Web Scraping For Image

Image scraping is one of the most popular use cases for web scraping; many public machine learning datasets were created by scraping images from the internet. Let's scrape the images of the properties listed on bproperty.

In [None]:
# the variable image will hold the value of the "src" attribute of the "img" tag
for container in contents:
        image = container.img["src"]
        print(image)

https://images-cdn.bproperty.com/thumbnails/1584797-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1586649-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1586263-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1587862-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1586188-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1587646-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1587658-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1587106-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1587107-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1586317-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1579914-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1582202-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1586669-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1583891-400x300.jpeg
https://images-cdn.bproperty.com/thumbnails/1587272-400x300.jpeg
https://images-cdn.bprope

Now let's save these images into a database using **`sqlite3`**.

In [None]:
!pip install requests


In [None]:
# importing the sqlite module
import sqlite3
import requests


# function to retrieve image byte data from an image url
def image_link_to_blob(image_link):
    response = requests.get(image_link)
    if response.status_code == 200:
        # Get the image content as binary data
        image_data = response.content

        return image_data
    # any other status code: return None, which sqlite3 stores as NULL
    return None


# path to empty database
database = "additional_files/web_scraping/bproperty_with_images.sqlite3"

# connect to sqlite3 database
conn = sqlite3.connect(database)


# create table to store the data that we've scraped
create_table_query = '''
    CREATE TABLE IF NOT EXISTS bproperty_adverts (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "Ad_Title" TEXT NOT NULL,
    "Price" REAL NOT NULL,
    "Numerical_Suffix" TEXT,
    "Advert_Area" TEXT,
    "Apartment_Size" TEXT,
    "Bed_Rooms" INTEGER NOT NULL,
    "Bath_Rooms" INTEGER NOT NULL,
    "Picture_Link" TEXT,
    "Picture_Blob" BLOB
);
'''

cursor = conn.cursor()
cursor.execute(create_table_query)

# scrape required data and change data type if required
for container in contents:
        Ad_Title = container.a["title"]
        price_data = container.findAll("span", {"class": "f343d9ce"})
        price_text = price_data[0].text
        Price = float(price_text.split(" ")[0])
        price_suffix = price_text.split(" ")[1]
        # print(type(Price))
        Advert_area = container.findAll("div", {"class": "_00a37089"}) # name must match the variable read on the next line
        property_area = Advert_area[0].text
        apartment_size = container.findAll("span", {"class": "_4bdd430c"})
        apartment_sqft = apartment_size[0].text
        room_info = container.findAll("span", {"class": "_6acf48d3 fd8e0e18"})
        bed_rooms = int(room_info[1].text)
        bath_rooms = int(room_info[2].text)
        image_link = container.img["src"]
        image_blob = image_link_to_blob(image_link)

        # insert data into sqlite3 database
        sqlite_insert = """INSERT INTO bproperty_adverts
                         (Ad_Title, Price, Numerical_Suffix, Advert_Area, Apartment_Size, Bed_Rooms,
                          Bath_Rooms, Picture_Link, Picture_Blob)
                          VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?);"""

        data_tuple = (Ad_Title, Price, price_suffix, property_area, apartment_sqft, bed_rooms, bath_rooms, image_link, image_blob)

        cursor.execute(sqlite_insert, data_tuple)
        conn.commit()

cursor.close()
conn.close()

We can store the image **url** as **`text`**, or alternatively we can download the content of the image from its **`url`** and store it in our database as **`BLOB`** data. If we store only the **`url`**, we can use the **`Picture_Link`** database column to download the images later. Each method has its own pros and cons: <br>

**`Storing Image URLs:`**

- **Pros:**
1. Smaller Database Size: Storing image URLs keeps the database size smaller since the actual image data is not stored in the database. This can be beneficial for databases with limited storage capacity or for optimizing performance.
<br>

2. Faster Database Operations: Retrieving image URLs from the database is generally faster compared to fetching Blobs, especially for large images.
<br>

3. Separation of Concerns: Storing URLs allows you to separate image storage from the database, simplifying database management and backups.
<br>

- **Cons:**
        
1. Dependency on External Storage: Image URLs are dependent on external servers or resources. If the image hosting server goes down or the URL changes, the images may become unavailable.
<br>
2. Extra Request Overhead: Each time you need to display the image, you need to fetch it from the URL, which adds some network overhead and latency.
<br>
3. Potential Image Changes: Images may change or be deleted from the external source, affecting the image displayed in your application.
<br>


**`Storing Images as Blobs:`**

- **Pros:**

1. Data Integrity: Storing images as Blobs ensures that the image data is directly associated with the database record. This ensures data integrity, as the image is stored along with the rest of the data in the same transaction.
<br>
        
2. Offline Access: Blobs provide offline access to images, even if the image hosting server is temporarily unavailable.
<br>

3. Simplified Image Management: All image data is self-contained within the database, making it easier to manage and maintain.
<br>
    
- **Cons:**
        
1. Database Size: Storing images as Blobs can significantly increase the database size, which may become a concern for large numbers of images or databases with limited storage capacity.
<br>

2. Slower Database Operations: Fetching Blobs can be slower compared to retrieving image URLs, especially for large images, which might affect the application's performance.
<br>
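To make the BLOB option concrete, the sketch below stores some bytes in a BLOB column, reads them back, and writes them to a file. The bytes are a placeholder rather than a real image, the `pictures` table and the `example.com` link are hypothetical, and an in-memory database is used so nothing persists.

```python
import os
import sqlite3
import tempfile

# Placeholder bytes standing in for downloaded image content (not a valid image).
fake_image = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures (id INTEGER PRIMARY KEY, Picture_Link TEXT, Picture_Blob BLOB)"
)
with conn:
    conn.execute(
        "INSERT INTO pictures (Picture_Link, Picture_Blob) VALUES (?, ?)",
        ("https://example.com/sample.png", fake_image),
    )

# BLOB columns come back from sqlite3 as bytes objects.
link, blob = conn.execute("SELECT Picture_Link, Picture_Blob FROM pictures").fetchone()
conn.close()

# Writing the bytes in binary mode turns the BLOB back into a file.
with tempfile.TemporaryDirectory() as folder:
    out_path = os.path.join(folder, "sample.png")
    with open(out_path, "wb") as image_file:
        image_file.write(blob)
    size_on_disk = os.path.getsize(out_path)

print(link)
print(len(blob), size_on_disk, blob == fake_image)
```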

## Web Scraping Using API

Web scraping using an API is a method of extracting data from websites by accessing their Application Programming Interfaces (APIs). Many websites offer APIs that allow developers to programmatically retrieve data in a structured format, such as JSON or XML, making it easier to extract specific information without parsing HTML directly.
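As a small illustration of why structured responses are easier to work with, the sketch below parses a hand-written JSON string shaped like a list of posts. Fields are reached by key, with no tag or class lookups; the response text is an invented sample, not real API output.

```python
import json

# Hand-written response body shaped like a short list of posts from a JSON API.
response_text = (
    '[{"id": 1, "title": "first post", "body": "hello"},'
    ' {"id": 2, "title": "second post", "body": "world"}]'
)

posts = json.loads(response_text)  # JSON array -> list of dictionaries
titles = [post["title"] for post in posts]
print(titles)
```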

Here's a step-by-step guide on how to perform web scraping using an API:

1. **Identify the API:**
   First, you need to identify the website's API you want to use for web scraping. Check the website's documentation or developer resources to find information about available APIs, their endpoints, and how to authenticate (if required).
<br>   

2. **Obtain an API Key (if needed):**
   Some APIs may require an API key or authentication token to access their data. If so, sign up for an account and obtain the necessary credentials to use the API.
<br>   


3. **Choose the Appropriate HTTP Library:**
   Python provides several libraries to make HTTP requests to APIs. The standard library includes http.client and urllib, while popular third-party choices include requests and httpx. Install third-party libraries using pip.
<br>   


4. **Make HTTP Requests to the API:**
   Use the chosen library to make HTTP requests to the API's endpoints. You may need to include additional headers or parameters, such as the API key or specific query parameters to retrieve the desired data.
<br>   


5. **Parse the API Response:**
   The API response is typically in a structured format like JSON or XML. Parse the response data to extract the information you need using Python's built-in libraries, such as json and xml.etree.ElementTree, or third-party libraries like xmltodict.
<br>   


6. **Process and Store Data:**
   Once you have extracted the desired data from the API response, you can process it further, analyze, and store it in a file, database, or any other storage solution.
<br>   

7. **Handle Pagination and Rate Limiting (if applicable):**
   Some APIs may return paginated data, meaning you'll need to handle multiple API calls to retrieve all the data. Additionally, some APIs have rate limits that restrict the number of requests you can make within a certain time period. Make sure to follow the API's guidelines to avoid getting blocked.
<br>   

8. **Error Handling and Data Validation:**
   Implement proper error handling to deal with potential issues, such as network errors or malformed API responses. Validate the retrieved data to ensure its correctness and avoid errors in downstream processing.
<br>   
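Steps 7 and 8 can be sketched as a loop that requests one page at a time, stops at the first empty page, and pauses between calls. The code below is a minimal sketch under stated assumptions: `fetch_all_pages` is a hypothetical helper, the page-fetching function is passed in as an argument, and a fake in-memory source stands in for a real API so the example runs offline.

```python
import time


def fetch_all_pages(fetch_page, max_pages=100, delay_seconds=1.0, sleep=time.sleep):
    """Collect items page by page until an empty page is returned."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # an empty page signals that there is no more data
        items.extend(batch)
        sleep(delay_seconds)  # pause between calls to stay under the rate limit
    return items


# Stand-in for a real API call; a real version would send an HTTP request per page.
FAKE_PAGES = {1: ["a", "b"], 2: ["c"]}


def fake_fetch_page(page):
    return FAKE_PAGES.get(page, [])


print(fetch_all_pages(fake_fetch_page, delay_seconds=0))
```

How an API signals the last page varies (an empty list, a `next` link, a total count), so the stop condition should follow that API's documentation.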


In [None]:
import requests
import json

def fetch_posts():
    api_url = "https://jsonplaceholder.typicode.com/posts"

    try:
        # Send a GET request to the API
        response = requests.get(api_url)
        response.raise_for_status()  # Raise an exception if the request was not successful

        # Parse the JSON response
        posts_data = response.json()

        # Return the extracted data (list of posts)
        return posts_data
    except requests.exceptions.RequestException as e:
        print("Error during API request:", e)
        return None

# Example usage
if __name__ == "__main__":
    posts = fetch_posts()
    if posts:
        for post in posts:
            print(f"Post {post['id']}: {post['title']}")
            print(f"Body: {post['body']}"+"\n")
    else:
        print("Failed to fetch posts.")

Post 1: sunt aut facere repellat provident occaecati excepturi optio reprehenderit
Body: quia et suscipit
suscipit recusandae consequuntur expedita et cum
reprehenderit molestiae ut ut quas totam
nostrum rerum est autem sunt rem eveniet architecto

Post 2: qui est esse
Body: est rerum tempore vitae
sequi sint nihil reprehenderit dolor beatae ea dolores neque
fugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis
qui aperiam non debitis possimus qui neque nisi nulla

Post 3: ea molestias quasi exercitationem repellat qui ipsa sit aut
Body: et iusto sed quo iure
voluptatem occaecati omnis eligendi aut ad
voluptatem doloribus vel accusantium quis pariatur
molestiae porro eius odio et labore et velit aut

Post 4: eum et est occaecati
Body: ullam et saepe reiciendis voluptatem adipisci
sit amet autem assumenda provident rerum culpa
quis hic commodi nesciunt rem tenetur doloremque ipsam iure
quis sunt voluptatem rerum illo velit

Post 5: nesciunt quas odio
Body: repudiandae veni

Next, let's scrape image **`urls`** using an API. For demonstration purposes we'll use the **`unsplash`** API, which provides access to high-quality, free-to-use images. <br>

First, sign up for an account on Unsplash and obtain an access key (API key) to use their API. You can register and get an access key from the Unsplash API documentation (https://unsplash.com/documentation).
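An access key is a credential, so it is safer to keep it out of the notebook itself. One common approach is to read it from an environment variable. In this sketch the variable name `UNSPLASH_ACCESS_KEY` and the helper `load_access_key` are assumptions, and the key set below is a dummy value used only so the example runs.

```python
import os


def load_access_key(env_name="UNSPLASH_ACCESS_KEY"):
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_name} environment variable first.")
    return key


# Dummy value for demonstration only; normally the variable is set outside the script.
os.environ["UNSPLASH_ACCESS_KEY"] = "demo-key-not-real"

params = {"client_id": load_access_key(), "count": 4}
print(sorted(params))
```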

In [None]:
import requests
import json

def fetch_random_images(count):
    api_url = "https://api.unsplash.com/photos/random"
    access_key = "YOUR_UNSPLASH_ACCESS_KEY"  # Replace with your own Unsplash API access key; never share or commit a real key
    params = {
        "client_id": access_key,
        "count": count
    }

    try:
        # Send a GET request to the Unsplash API
        response = requests.get(api_url, params=params)
        response.raise_for_status()  # Raise an exception if the request was not successful

        # Parse the JSON response
        images_data = response.json()
        print(images_data)
        print()

        # Extract image URLs from the response using the correct key inside the json data
        image_links_regular = [image["urls"]["regular"] for image in images_data]

        image_links_raw = [image["urls"]["raw"] for image in images_data]

        image_width = [image["width"] for image in images_data]

        image_height = [image["height"] for image in images_data]

        return image_links_regular, image_links_raw, image_width, image_height
    except requests.exceptions.RequestException as e:
        print("Error during API request:", e)
        return [], [], [], []  # keep the return shape so callers can always unpack four values

# Example usage
if __name__ == "__main__":
    image_links_regular, image_links_raw, w, h = fetch_random_images(4)
    if image_links_regular:
        print("Random Regular Image Links:")
        for idx, link in enumerate(image_links_regular):
            print(f"{idx}. {link}")
            print(f"height:{h[idx]}, width:{w[idx]}")
            print()
    else:
        print("Failed to fetch image links.")

    if image_links_raw:
        print("Random Raw Image Links:")
        for idx, link in enumerate(image_links_raw):
            print(f"{idx}. {link}")
            print(f"height:{h[idx]}, width:{w[idx]}")
            print()


[{'id': '8jWZLEHv4Mw', 'slug': '8jWZLEHv4Mw', 'created_at': '2023-05-31T06:28:33Z', 'updated_at': '2023-07-18T14:51:41Z', 'promoted_at': '2023-06-22T07:40:01Z', 'width': 4584, 'height': 6880, 'color': '#404026', 'blur_hash': 'LcIqfgoJayWV0Joff6WUS~WVWBoL', 'description': None, 'alt_description': 'a table topped with a plate of food next to a vase of flowers', 'breadcrumbs': [], 'urls': {'raw': 'https://images.unsplash.com/photo-1685514473556-c983a5971d13?ixid=M3w0NzcwNDB8MHwxfHJhbmRvbXx8fHx8fHx8fDE2ODk3NTg2NDN8&ixlib=rb-4.0.3', 'full': 'https://images.unsplash.com/photo-1685514473556-c983a5971d13?crop=entropy&cs=srgb&fm=jpg&ixid=M3w0NzcwNDB8MHwxfHJhbmRvbXx8fHx8fHx8fDE2ODk3NTg2NDN8&ixlib=rb-4.0.3&q=85', 'regular': 'https://images.unsplash.com/photo-1685514473556-c983a5971d13?crop=entropy&cs=tinysrgb&fit=max&fm=jpg&ixid=M3w0NzcwNDB8MHwxfHJhbmRvbXx8fHx8fHx8fDE2ODk3NTg2NDN8&ixlib=rb-4.0.3&q=80&w=1080', 'small': 'https://images.unsplash.com/photo-1685514473556-c983a5971d13?crop=entropy&cs=t

We can store this data using the same approach as before, following the familiar procedure. Additionally, we can utilize the existing function designed to convert image URLs to BLOB data to save the images in BLOB format.
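A minimal sketch of that procedure, assuming the three lists returned by the function above: the table name `unsplash_images`, the helper `save_image_records`, and the `example.com` links are hypothetical, and an in-memory database is used so the example runs without an API key.

```python
import sqlite3


def save_image_records(conn, links, widths, heights):
    """Store one row per image link together with its width and height."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS unsplash_images (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               Picture_Link TEXT NOT NULL,
               Width INTEGER,
               Height INTEGER
           );"""
    )
    with conn:  # commit once after all rows are inserted
        conn.executemany(
            "INSERT INTO unsplash_images (Picture_Link, Width, Height) VALUES (?, ?, ?);",
            zip(links, widths, heights),
        )


conn = sqlite3.connect(":memory:")
save_image_records(
    conn,
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],  # placeholder links
    [4584, 3000],
    [6880, 2000],
)
saved = conn.execute(
    "SELECT Picture_Link, Width, Height FROM unsplash_images ORDER BY id"
).fetchall()
conn.close()
print(saved)
```

Adding a `Picture_Blob BLOB` column and filling it with `image_link_to_blob(link)` would extend this to storing the image bytes as well.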