# WEB SCRAPING WITH BEAUTIFUL SOUP

Web scraping is the process of extracting data from websites, and Python provides powerful libraries 
like Beautiful Soup 4 that make it easier to scrape and parse HTML and XML content.

Imports

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

We are collecting data on computer peripherals from the "MD Computers" site using Beautiful Soup 4.

Enter the item you want to collect information about.

In [2]:
search_item = input("Which item do you want to search? ")

The site builds its search results from a dynamic URL, so we insert the search term into the URL's query string.

In [3]:
url = f"https://mdcomputers.in/index.php?submit_search=&route=product%2Fsearch&&search={search_item}"
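
The f-string above inserts the search term verbatim. That works for a single word such as `mouse`, but a term containing spaces or symbols may produce a malformed URL. A minimal sketch of one way to handle this, assuming the site accepts standard percent-encoded query strings; the helper name `build_search_url` is hypothetical and not part of the original notebook:

```python
from urllib.parse import urlencode

BASE_URL = "https://mdcomputers.in/index.php"

def build_search_url(search_item, page=None):
    """Build a search URL with the query string percent-encoded (hypothetical helper)."""
    params = {"route": "product/search", "search": search_item}
    if page is not None:
        params["page"] = page  # optional results-page number
    return f"{BASE_URL}?{urlencode(params)}"

print(build_search_url("gaming mouse"))
```

`urlencode` escapes each value, so the space in the search term is encoded rather than passed through raw.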

Fetching the page and parsing its HTML content.

In [4]:
page_ = requests.get(url).text
doc = BeautifulSoup(page_, "html.parser")

In [5]:
url

'https://mdcomputers.in/index.php?submit_search=&route=product%2Fsearch&&search=mouse'

Finding the total number of result pages by locating the pagination element through its class name.

In [6]:
#total_pages_x = doc.find(class_="col-sm-6 text-right")
total_pages_x = doc.find(class_="col-md-6 text-md-end mb-2")


In [8]:
total_pages_x.text.split("(")[1].split(" ")[0]

'8'

In [9]:
total_pages = int(total_pages_x.text.split("(")[1].split(" ")[0])
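
The chained `split` calls fail with an `IndexError` if the pagination text has no parenthesis, and `doc.find` returns `None` if the class name changes. A regular expression is one possible alternative. The sketch below assumes the pagination text looks like `Showing 1 to 20 of 141 (8 Pages)`; that exact wording is an assumption, as are the helper name `parse_total_pages` and the choice to fall back to a single page:

```python
import re

def parse_total_pages(pagination_text):
    """Return the page count from text such as '... (8 Pages)'; default to 1 if absent."""
    match = re.search(r"\((\d+)\s+Pages?\)", pagination_text, flags=re.IGNORECASE)
    return int(match.group(1)) if match else 1

# Assumed pagination wording, for illustration only.
print(parse_total_pages("Showing 1 to 20 of 141 (8 Pages)"))
```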

Total page results obtained for the search

In [11]:
total_pages 

8

We are going to gather each item's name, link, and price (old and new) below.

Creating empty lists for the data

In [12]:
link = []
item_name = []
new_price = []
old_price = []

Here is the main code, which scrapes the information from the HTML.

Using this URL, we iterate over the result pages, from page 1 up to the total page count obtained earlier, and collect the products on each page.

In [13]:
for page in range(1, total_pages + 1):
    url = f"https://mdcomputers.in/index.php?route=product/search&page={page}&search={search_item}"
    print(url)
    page_ = requests.get(url).text
    doc = BeautifulSoup(page_, "html.parser")

    # Each product sits in its own div with the class "product-wrapper".
    items = doc.find_all(class_="product-wrapper")
    print(items)
    for item in items:
        anchor = item.find("h3").find("a")
        link.append(anchor["href"])      # product link from the href attribute
        item_name.append(anchor.text)    # product name from the text of the h3 link

        # Prices sit in span.amount inside span.ins (new) and span.del (old).
        # [1:] drops the leading currency symbol.
        new_tag = item.find("span", class_="ins")
        old_tag = item.find("span", class_="del")
        new_price.append(new_tag.find("span", class_="amount").text[1:] if new_tag else None)
        # Some products have no old price; record None instead of raising an error.
        old_price.append(old_tag.find("span", class_="amount").text[1:] if old_tag else None)


https://mdcomputers.in/index.php?route=product/search&page=1&search=mouse
[<div class="product-wrapper"><div class="product-element-top product-quick-shop"><a class="product-image-link" href="https://mdcomputers.in/product/hp-m10-mouse-7ya10pa"><div class="product-labels labels-rectangular"><span class="onsale product-label">-55%</span></div><img alt="HP M10 Black Wired Mouse" class="attachment-large size-large" decoding="async" fetchpriority="low" loading="lazy" src="https://mdcomputers.in/cdn-cgi/image/width=500,height=500,quality=75/image/catalog/mouse/h-p/7ya10pa/7ya10pa-image-main.jpg"/></a><div class="wrapp-buttons"><div class="product-buttons"><div class="product-add-btn product-action-btn product-style-icon product-add-cart-icon"><button class="button product_type_simple add_to_cart_button ajax_add_to_cart add-to-cart-loop" onclick="cart.add('20559');"><span>Add to Cart</span><i class="bi bi-cart-plus"></i></button></div><div class="quick-view product-action-btn product-style-i
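
The extraction logic in the loop can be tried offline against a small HTML snippet, which makes it easier to see how a product without an old price is handled. The sketch below uses hypothetical markup reduced to the tags and class names the loop relies on; the sample products, the `example.com` links, and the helper names `amount_in` and `parse_items` are assumptions for illustration, not taken from the site:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reduced to the tags and classes the loop relies on.
SAMPLE_HTML = """
<div class="product-wrapper">
  <h3><a href="https://example.com/product/a">Mouse A</a></h3>
  <span class="del"><span class="amount">₹599</span></span>
  <span class="ins"><span class="amount">₹270</span></span>
</div>
<div class="product-wrapper">
  <h3><a href="https://example.com/product/b">Mouse B</a></h3>
  <span class="ins"><span class="amount">₹300</span></span>
</div>
"""

def amount_in(item, wrapper_class):
    """Return the price text inside span.<wrapper_class>, or None if it is missing."""
    wrapper = item.find("span", class_=wrapper_class)
    amount = wrapper.find("span", class_="amount") if wrapper else None
    return amount.text[1:] if amount else None  # [1:] drops the currency symbol

def parse_items(html):
    """Collect name, link, and prices for every product block in the HTML."""
    doc = BeautifulSoup(html, "html.parser")
    rows = []
    for item in doc.find_all(class_="product-wrapper"):
        anchor = item.find("h3").find("a")
        rows.append({
            "item_name": anchor.text,
            "link": anchor["href"],
            "new_price": amount_in(item, "ins"),
            "old_price": amount_in(item, "del"),
        })
    return rows

rows = parse_items(SAMPLE_HTML)
print(rows)
```

Collecting one dictionary per product, rather than four parallel lists, keeps the fields of each product together, and `pd.DataFrame` accepts such a list of dictionaries directly.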

In [14]:
len(item_name)

141

In [15]:
len(link)

141

In [16]:
sorted(item_name)

['ANT ESPORTS KM550 Keyboard and Mouse Combo',
 'ANT ESPORTS Thunder 10 RGB Keyboard and Mouse Combo with Brown Switches',
 'ANT ESPORTS Thunder 30 RGB Keyboard and Mouse Combo',
 'ASUS ROG Keris II ACE Wireless RGB Moonlight White Gaming Mouse',
 'AVerMedia Elena Chan Mouse Pad (Large)',
 'Acer Wireless White and Green Mouse',
 'Acer ZC.A01SI.2DP Wireless Keyboard and Mouse Combo',
 'Adata XPG Battleground L Exoskeleton Totem Edition Gaming Mouse Pad (Large)',
 'Adata XPG Battleground L Gaming Mouse Pad (Large)',
 'Adata XPG Infarex M20 RGB',
 'Adata XPG Primer RGB Gaming Mouse (Black)',
 'Adata XPG Slingshot RGB Gaming Mouse',
 'Ant Esports AEC410 Type-C To USB Hub',
 'Ant Esports GM100 Gaming Mouse, Black-White',
 'Ant Esports KM1600 Gaming Keyboard and Mouse Combo',
 'Ant Esports KM1650 Pro Gaming Keyboard and Mouse Combo (Black)',
 'Ant Esports KM500W Gaming Keyboard And Mouse Combo',
 'Ant Esports KM500W Pro Wireless Gaming Keyboard Mouse Combo',
 'Ant Esports KM540 Keyboard Mous

Converting the lists to a dictionary and then to a DataFrame.

In [17]:
my_dict = {'item_name': item_name , 'link': link , 'new_price': new_price , 'old_price': old_price}

In [18]:
len(old_price)

141

In [19]:
for key in my_dict:
    print(len(my_dict[key]))

141
141
141
141


In [20]:
df = pd.DataFrame(my_dict)

Here is our DataFrame with the required data:

In [21]:
df

| | item_name | link | new_price | old_price |
|---|---|---|---|---|
| 0 | HP M10 Black Wired Mouse | https://mdcomputers.in/product/hp-m10-mouse-7y... | ₹270 | 599 |
| 1 | Ant Esports MP265 Gaming Mouse Pad (Medium) | https://mdcomputers.in/product/ant-esports-mp2... | ₹280 | 799 |
| 2 | Dell D SELECT DS111 Mouse | https://mdcomputers.in/product/dell-mouse-d-se... | ₹280 | 499 |
| 3 | Fingers MegaHit Wired Mouse | https://mdcomputers.in/product/fingers-megahit... | ₹300 | 425 |
| 4 | Ant Esports MP 290 Gaming Mouse Pad (Large) | https://mdcomputers.in/product/ant-esports-mp-... | ₹315 | 599 |
| ... | ... | ... | ... | ... |
| 136 | Razer Cobra Pro Wireless Gaming Mouse (Black) | https://mdcomputers.in/product/razer-cobra-pro... | ₹11,430 | 21299 |
| 137 | Razer Cobra Pro Wireless Gaming Mouse (White) | https://mdcomputers.in/product/razer-cobra-pro... | ₹11,490 | 21999 |
| 138 | Razer Viper V3 Pro Wireless Gaming Mouse (Black) | https://mdcomputers.in/product/razer-viper-v3-... | ₹12,750 | 25999 |
| 139 | Razer Basilisk V3 Pro 35k White Gaming Mouse | https://mdcomputers.in/product/razer-basilisk-... | ₹13,110 | 21999 |
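
The scraped prices are strings, and values such as `₹11,430` in the table above carry a currency symbol and a thousands separator, so they cannot be sorted or compared numerically as they stand. A minimal sketch of a cleanup step, assuming whole-rupee prices with no decimal part; the helper name `price_to_int` is hypothetical:

```python
def price_to_int(price):
    """Convert a price string such as '₹11,430' to an int; pass None through unchanged."""
    if price is None:
        return None
    # Assumes whole-rupee amounts: strip the currency symbol and thousands separators.
    digits = price.replace("₹", "").replace(",", "").strip()
    return int(digits)

print(price_to_int("₹11,430"))
```

One way to apply it to a column would be `df["new_price"].map(price_to_int)`.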


Converting the DataFrame to an Excel document, with the searched item as part of the file name.

In [22]:
df.to_excel(f'{search_item}_list_mdcomputers.xlsx')
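
Because the file name is built from user input, a search term containing spaces, slashes, or other special characters could yield an awkward or invalid file name. A small sketch of one way to guard against that; the helper name `output_filename` and the fallback name `search` are assumptions, not part of the original notebook:

```python
import re

def output_filename(search_item):
    """Build the .xlsx file name, replacing characters that are unsafe in file names."""
    # Collapse every run of characters other than letters, digits, '_' and '-' into '_'.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", search_item.strip()).strip("_")
    return f"{safe or 'search'}_list_mdcomputers.xlsx"

print(output_filename("gaming mouse"))
```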

Finally, we have successfully scraped the data from the MD Computers website using Beautiful Soup 4.