# Build a Python web scraper with Beautiful Soup


# What’s web scraping?
Web scraping is a technique for collecting data from websites automatically. A web scraper is programmed to visit a site, fetch the relevant pages, and extract the information needed. Automating this process allows large amounts of data to be collected quickly.


# Scraping data from the 2b website


In this tutorial, we will build a web scraper in Python using the Beautiful Soup and Requests libraries to extract the following data from the 2b website:

· Product Type

· Product Price

· Product Rating

Let's Goooooo

# Case study: 2b Samsung phones
URL: https://2b.com.eg/en/catalogsearch/result/?q=samsung

How do you scrape data from a website such as this one?

1- Find the URL that you want to scrape

2- Inspect the page

3- Find the data you want to extract

4- Write the code

5- Run the code and extract the data

6- Export the data in the required format

# Importing libraries

In [1]:
import pandas as pd
import requests

# Importing libraries and methods

In [2]:
#pip install bs4

In [3]:
from bs4 import BeautifulSoup as bs # for scraping (parsing the HTML)

In [4]:
from urllib.request import urlopen # for opening the connection to the URL


# Define the URL

In [5]:
url ="https://2b.com.eg/en/catalogsearch/result/?q=samsung"
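
A search URL like this one can also be assembled from its parts, which becomes handy once the page number has to change. The sketch below is only illustrative: `build_search_url` and `BASE_URL` are hypothetical names, and the `p` (page) and `q` (query) parameters are assumed from the URLs that appear in this tutorial.

```python
from urllib.parse import urlencode

# Search endpoint used in this tutorial (assumed to accept the p and q parameters)
BASE_URL = "https://2b.com.eg/en/catalogsearch/result/"

def build_search_url(query, page=None):
    """Return the search URL for `query`, optionally for one result page."""
    params = {}
    if page is not None:
        params["p"] = page  # page number of the search results
    params["q"] = query     # search term
    return BASE_URL + "?" + urlencode(params)

print(build_search_url("samsung"))
print(build_search_url("samsung", page=2))
```

`urlencode` also escapes spaces and special characters in the search term, which manual string concatenation does not.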


# Create a client-based request (open the connection)

We start by downloading the page with urlopen from Python's urllib.request module. It sends a request to the specified URL and returns a response object that we can read from.

In [6]:
client = urlopen(url)

The call returned without raising an error, so our request was not denied.
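
To check the outcome explicitly rather than rely on the absence of an error, the request can be wrapped in a small function. This is a minimal sketch, not part of the original notebook: `open_page` is a hypothetical helper, and its `opener` parameter exists only so the function can be tried without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def open_page(url, opener=urlopen):
    """Return (status_code, html_bytes); html_bytes is None if the request failed."""
    try:
        with opener(url) as client:   # the connection is closed automatically
            return client.status, client.read()
    except HTTPError as err:          # the server answered with an error code, e.g. 403
        return err.code, None
    except URLError:                  # the server could not be reached at all
        return None, None

# Usage (needs network access):
# status, page = open_page("https://2b.com.eg/en/catalogsearch/result/?q=samsung")
```

`HTTPError` is caught before `URLError` because it is a subclass of it; the other way round, the status code would be lost.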

# Getting the HTML code of the page

Once the request succeeds, we read the response body with .read(), which returns the raw HTML of the page as bytes. The output is a huge mess of HTML that is hard to make sense of. That’s where the Beautiful Soup library comes into play.

In [7]:
page = client.read() # read the raw HTML of the page

In [8]:
#page

In [9]:
client.close() # close the connection


# Creating an HTML parser using Beautiful Soup

Beautiful Soup is a Python library for parsing structured data such as HTML and XML.

The line of code below creates a Beautiful Soup object. It takes page, the HTML content read earlier, as its input. The second argument, “html.parser”, makes sure that the parser for HTML content is used.

In [10]:
soup = bs(page, "html.parser") # kind of parser = HTML
# soup is the container that holds the parsed page

In [11]:
#soup
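
To see what the soup object offers without printing a whole page, it helps to parse a tiny document first. The HTML below is made up for illustration and is not taken from the 2b website.

```python
from bs4 import BeautifulSoup

# A made-up page, passed as bytes just like the result of client.read()
html = b"<html><head><title>Search results</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(type(soup).__name__)  # name of the class of the parsed object
print(soup.title.text)      # text inside the <title> tag
print(soup.p.text)          # text inside the first <p> tag
```

Tags can be reached as attributes (`soup.title`, `soup.p`), and `.text` returns the text they contain.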


# Before we go any further, let’s look at the Beautiful Soup functions used in this tutorial

find_all — returns a list containing all results matching the search criteria defined.

find — returns the first result matching a search criterion that we applied on a Beautiful Soup object.
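
The difference between the two functions is easiest to see on a small example. The HTML and class names below are invented for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup with two matching items and one non-matching item
html = """
<ul>
  <li class="phone">Phone A</li>
  <li class="phone">Phone B</li>
  <li class="tablet">Tablet C</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

all_phones = soup.find_all("li", {"class": "phone"})  # every match, as a list
first_phone = soup.find("li", {"class": "phone"})     # only the first match
missing = soup.find("li", {"class": "laptop"})        # no match at all

print(len(all_phones))
print(first_phone.text)
print(missing)
```

When nothing matches, `find` returns `None` and `find_all` returns an empty list, so it is worth checking the result before reading `.text` from it.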


# Creating a container for the target data


In [12]:
container = soup.find_all('div', {'class': "product details product-item-details"})

In [13]:
container[0].text

'\n\n\r\n                                Samsung Galaxy A23 - 4GB RAM - 128GB                            \n\n\n\nRating:\n\n0%\n\n\n\n\n\nAs low as\nEGP6,199.00\n\n\n \n\n\n\nAdd to Wish List\n\n\n\n\nAdd to Cart\n\n\n\n\nAdd to Compare\n\n\n\n'

In [14]:
len (container)

48
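
One detail of this search is worth knowing: when the class value contains spaces, Beautiful Soup compares it with the full class string, in that exact order. The sketch below uses invented HTML to show this, together with a CSS selector as a possible alternative that matches on a single class.

```python
from bs4 import BeautifulSoup

# Made-up markup: A and B carry the same classes in a different order
html = """
<div class="product details product-item-details">A</div>
<div class="product-item-details product details">B</div>
<div class="product-item-info">C</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches only the exact class string, in this order
exact = soup.find_all("div", {"class": "product details product-item-details"})

# Matches every div that has the class product-item-details, in any order
by_selector = soup.select("div.product-item-details")

print([d.text for d in exact])
print([d.text for d in by_selector])
```

If a site ever changes the order of its class names, the selector version would keep working while the exact-string version would return nothing.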

# Extracting product type, rating and price details


Inspecting the product type in the browser reveals that it sits in an <a> tag with class="product-item-link". We can pass this tag and class to the find method to get the first match, or to find_all to get every match.

In [15]:
item_name = soup.find_all('a', {'class': 'product-item-link'})


In [16]:
item_name[0].text.strip()

'Samsung Galaxy A23 - 4GB RAM - 128GB'

In [17]:
item_rating = soup.find_all('div', {'class': 'rating-result'})

In [18]:
item_rating[0].text.strip()

'0%'

In [19]:
item_price = soup.find_all('span', {'class': 'price-wrapper'})

In [20]:
item_price[0].text

'EGP6,199.00'

# Combine it all together


In [21]:
file = open("Desktop/case_study/2b_samsung.csv", "w")
header = 'item_name,item_rating,item_price\n'
file.write(header)

33
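
Writing CSV lines by hand works as long as no field contains a comma. As a hedged alternative, Python's built-in csv module quotes such fields automatically. In the sketch below, the first row uses values shown earlier in this tutorial, the second row is made up, and an in-memory buffer stands in for the real file.

```python
import csv
import io

rows = [
    ("Samsung Galaxy A23 - 4GB RAM - 128GB", "0%", "EGP6,199.00"),
    ("Phone, with a comma in its name", "100%", "EGP4,899.00"),  # made-up row
]

# Stand-in for open("2b_samsung.csv", "w", newline="")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["item_name", "item_rating", "item_price"])  # header row
writer.writerows(rows)                                       # one line per product

print(buffer.getvalue())
```

Fields that contain a comma are wrapped in double quotes, so the price can keep its thousands separator and the file still has exactly three columns.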

In [22]:
# loop over multiple result pages
for j in range(1, 7):
    url = f'https://2b.com.eg/en/catalogsearch/result?p={j}&q=samsung+' # f-string: inserts the page number j
    res = requests.get(url)
    soup = bs(res.text, "html.parser")
    # each container holds the details of one product
    container = soup.find_all('div', {'class': "product details product-item-details"})
    for i in container:
        item_name = i.find_all('a', {'class': 'product-item-link'})
        item_name = item_name[0].text.strip()
        item_rating = i.find_all('div', {'class': 'rating-result'})
        item_rating = item_rating[0].text.strip()
        item_price = i.find_all('div', {'class': 'price-box'})
        # flatten the price text to one line and drop the thousands separators
        item_price = item_price[0].text.strip().replace("\n", " ").replace(",", "")
        file.write(item_name + "," + item_rating + "," + item_price + "\n")
        print(item_name + " , " + item_rating + " , " + item_price)
        print()


Samsung Galaxy A23 - 4GB RAM - 128GB , 0% , As low as EGP6199.00

Samsung Galaxy A04s - 4GB RAM - 64GB , 0% , As low as EGP4399.00

Samsung Galaxy S23 Ultra - 12GB RAM - 256GB , 0% , As low as EGP51999.00

Samsung Galaxy A04s - 4GB RAM - 128GB , 0% , As low as EGP4899.00

Samsung Galaxy A13 - 4GB RAM - 64GB , 100% , As low as EGP4899.00

Samsung Galaxy A13 - 4GB RAM - 128GB , 0% , As low as EGP5399.00

Samsung Galaxy A23 - 4GB RAM - 128GB - Light Blue , 0% , was EGP7199.00     Special Price EGP6199.00

Samsung Galaxy A23 - 4GB RAM - 128GB - Black , 0% , was EGP7199.00     Special Price EGP6199.00

Samsung Galaxy A04s - 4GB RAM - 128GB - Copper , 0% , was EGP6499.00     Special Price EGP4899.00

Samsung Galaxy A04s - 4GB RAM - 128GB - White , 0% , was EGP6499.00     Special Price EGP4899.00

Samsung Galaxy A04s - 4GB RAM - 64GB - Copper , 0% , was EGP5199.00     Special Price EGP4399.00

Samsung Galaxy A04s - 4GB RAM - 128GB - Black , 0% , was EGP6499.00     Special Price EGP4899.00

...

In [23]:
# Note: the price column is not a clean number yet; it still contains label text such as "As low as" or "Special Price"
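
One possible way to clean the price is to pull the number out with a regular expression. This is a sketch under assumptions: `parse_price` is a hypothetical helper, and it assumes that when two amounts appear (an old price followed by a special price), the last one is the current price, as in the output above.

```python
import re

def parse_price(text):
    """Return the last EGP amount in `text` as a float, or None if there is none."""
    # Capture the digits (with optional commas and decimals) after each "EGP"
    amounts = re.findall(r"EGP\s*([\d,]+(?:\.\d+)?)", text)
    if not amounts:
        return None
    # Assumption: the last amount is the current price
    return float(amounts[-1].replace(",", ""))

print(parse_price("As low as EGP6199.00"))
print(parse_price("was EGP7199.00     Special Price EGP6199.00"))
```

Applied inside the loop, a helper like this would store a plain number in the price column instead of the full label text.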

In [24]:
# The data is now written as a CSV file and can be imported once the file is closed

# Thank you 


In [25]:
# Finally, we must close the file

In [26]:
file.close()
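
With the file closed, the data can be loaded back with pandas, which was imported at the start. The sketch below reads a two-line sample in the same layout as the scraper output from an in-memory buffer; the `rating_pct` column is a hypothetical addition for illustration.

```python
import io
import pandas as pd

# Two sample lines in the layout written by the scraper; a real run would use
# pd.read_csv("Desktop/case_study/2b_samsung.csv") instead of this buffer
sample = io.StringIO(
    "item_name,item_rating,item_price \n"
    "Samsung Galaxy A23 - 4GB RAM - 128GB , 0% , As low as EGP6199.00\n"
    "Samsung Galaxy A13 - 4GB RAM - 64GB , 100% , As low as EGP4899.00\n"
)

df = pd.read_csv(sample)
df.columns = df.columns.str.strip()          # remove stray spaces in the header
df = df.apply(lambda col: col.str.strip())   # remove stray spaces in every value

# Turn "100%" into the integer 100 (hypothetical extra column)
df["rating_pct"] = df["item_rating"].str.rstrip("%").astype(int)

print(df)
```

Stripping the whitespace is harmless if the file has no spaces around the separators, so the same steps work either way.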