### Web Scraping with Python Using Beautiful Soup

#### Learn how to scrape the web with Python!

The internet is an absolutely massive source of data — data that we can access using web scraping and Python!

In fact, web scraping is often the only way we can access data. There is a lot of information out there that isn’t available in convenient CSV exports or easy-to-connect APIs. And websites themselves are often valuable sources of data — consider, for example, the kinds of analysis you could do if you could download every post on a web forum.

To access those sorts of on-page datasets, we’ll have to use web scraping.

In this tutorial, we’ll work through an actual web scraping project, focusing on Flipkart data.



### The Fundamentals of Web Scraping:

#### What is Web Scraping in Python?

Some websites offer data sets that are downloadable in CSV format, or accessible via an Application Programming Interface (API). But many websites with useful data don’t offer these convenient options.


If we wanted to analyze data from such a site, or download it for use in some other app, we wouldn’t want to painstakingly copy-paste everything. Web scraping is a technique that lets us use programming to do the heavy lifting. We’ll write some code that looks at the Flipkart site, grabs just the data we want to work with, and outputs it in the format we need.

In this tutorial, we’ll show how to perform web scraping using Python 3 and the Beautiful Soup library. We’ll be scraping data from Flipkart, and then analyzing it using the pandas library.

But to be clear, lots of programming languages can be used to scrape the web!

#### How Does Web Scraping Work?

When we scrape the web, we write code that sends a request to the server that’s hosting the page we specified. The server will return the source code — HTML, mostly — for the page (or pages) we requested.

So far, we’re essentially doing the same thing a web browser does — sending a server request with a specific URL and asking the server to return the code for that page.

But unlike a web browser, our web scraping code won’t interpret the page’s source code and display the page visually. Instead, we’ll write some custom code that filters through the page’s source code looking for specific elements we’ve specified, and extracting whatever content we’ve instructed it to extract.

For example, if we wanted to get all of the data from inside a table that was displayed on a web page, our code would be written to go through these steps in sequence:

1. Request the content (source code) of a specific URL from the server.
2. Download the content that is returned.
3. Identify the elements of the page that are part of the table we want.
4. Extract and (if necessary) reformat those elements into a dataset we can analyze or use in whatever way we require.

If that all sounds very complicated, don’t worry! Python and Beautiful Soup have built-in features designed to make this relatively straightforward.
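
The four steps listed above can be sketched as follows. This is only an illustration under stated assumptions: the request and download steps are replaced by a hard-coded, made-up HTML string so the sketch runs offline, and it uses Python's built-in `html.parser` module rather than Beautiful Soup, which handles the same job with less code in the cells further down. The names `SAMPLE_HTML` and `TableParser` are hypothetical.

```python
from html.parser import HTMLParser

# Steps 1-2 (request + download) are replaced by a hard-coded string so the
# sketch runs offline; the table content is made up for illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Phones</h1>
  <table id="prices">
    <tr><th>Model</th><th>Price</th></tr>
    <tr><td>Phone A</td><td>1,169</td></tr>
    <tr><td>Phone B</td><td>1,899</td></tr>
  </table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being built, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Step 3: identify the table cells.
parser = TableParser()
parser.feed(SAMPLE_HTML)

# Step 4: reformat into a list of dicts keyed by the header row,
# converting the price strings to integers.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
for record in records:
    record["Price"] = int(record["Price"].replace(",", ""))

print(records)
```

In a real project the string would come from an HTTP response, and the resulting list of records could be handed to pandas for analysis.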

One thing that’s important to note: from a server’s perspective, requesting a page via web scraping is the same as loading it in a web browser. When we use code to submit these requests, we might be “loading” pages much faster than a regular user, and thus quickly eating up the website owner’s server resources.

#### Why Use Python for Web Scraping?

As previously mentioned, it’s possible to do web scraping with many programming languages.

However, one of the most popular approaches is to use Python and the Beautiful Soup library.

#### Is Web Scraping Legal?

Unfortunately, there’s not a cut-and-dried answer here. Some websites explicitly allow web scraping. Others explicitly forbid it. Many websites don’t offer any clear guidance one way or the other.

Before scraping any website, we should look for a terms and conditions page to see if there are explicit rules about scraping. If there are, we should follow them. If there are not, then it becomes more of a judgement call.

Remember, though, that web scraping consumes server resources for the host website. If we’re just scraping one page once, that isn’t going to cause a problem. But if our code is scraping 1,000 pages once every ten minutes, that could quickly get expensive for the website owner.

Thus, in addition to following any and all explicit rules about web scraping, it’s a good idea to follow some best practices.

#### Web Scraping Best Practices:

- Never scrape more frequently than you need to.
- Consider caching the content you scrape so that it’s only downloaded once.
- Build pauses into your code using functions like time.sleep() to keep from overwhelming servers with too many requests too quickly.
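
A minimal sketch of the caching and pausing practices is shown here. It assumes an in-memory cache is enough and that a fixed minimum interval between real requests is acceptable. `PoliteFetcher` and `fake_download` are hypothetical names introduced for illustration, and the fake downloader stands in for a real HTTP call so nothing touches the network.

```python
import time
from typing import Callable, Dict, Optional


class PoliteFetcher:
    """Wrap a fetch function with an in-memory cache and a minimum delay."""

    def __init__(self, fetch: Callable[[str], str], min_interval: float = 1.0):
        self._fetch = fetch                # function that actually downloads a URL
        self._min_interval = min_interval  # seconds between real requests
        self._cache: Dict[str, str] = {}
        self._last_request: Optional[float] = None

    def get(self, url: str) -> str:
        # Serve repeat requests from the cache so each page is downloaded once.
        if url in self._cache:
            return self._cache[url]
        # Pause if the previous real request was too recent.
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            if elapsed < self._min_interval:
                time.sleep(self._min_interval - elapsed)
        body = self._fetch(url)
        self._last_request = time.monotonic()
        self._cache[url] = body
        return body


# Demo with a fake downloader (hypothetical) instead of a real HTTP request.
calls = []


def fake_download(url: str) -> str:
    calls.append(url)
    return f"<html>content of {url}</html>"


fetcher = PoliteFetcher(fake_download, min_interval=0.2)
start = time.monotonic()
fetcher.get("https://example.com/a")  # downloaded
fetcher.get("https://example.com/a")  # served from the cache, no delay
fetcher.get("https://example.com/b")  # downloaded after a short pause
elapsed = time.monotonic() - start

print(len(calls), "real downloads")
```

In practice the `fetch` argument would be a function that performs the real request and returns the page text. A cache saved to disk would also survive notebook restarts, which this in-memory version does not.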


#### The Components of a Web Page

Before we start writing code, we need to understand a little bit about the structure of a web page. We’ll use the site’s structure to write code that gets us the data we want to scrape, so understanding that structure is an important first step for any web scraping project.

When we visit a web page, our web browser makes a request to a web server. This request is called a GET request, since we’re getting files from the server. The server then sends back files that tell our browser how to render the page for us. These files will typically include:

- HTML: the main content of the page.
- CSS: used to add styling to make the page look nicer.
- JS: JavaScript files add interactivity to web pages.
- Images: image formats, such as JPG and PNG, allow web pages to show pictures.

After our browser receives all the files, it renders the page and displays it to us.

There’s a lot that happens behind the scenes to render a page nicely, but we don’t need to worry about most of it when we’re web scraping. When we perform web scraping, we’re interested in the main content of the web page, so we look primarily at the HTML.
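
To make the structure of HTML concrete, the sketch below prints the tag outline of a small fragment. The fragment and its class names are made up for illustration, and the parsing uses Python's built-in `html.parser` rather than Beautiful Soup so that it runs without extra packages. `OutlineParser` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# A simplified, hand-written fragment; real pages are far larger.
SAMPLE_HTML = """
<div class="product">
  <div class="title">Phone A</div>
  <div class="price discounted">1,169</div>
  <ul class="specs">
    <li>2800 mAh Battery</li>
    <li>1 Year warranty</li>
  </ul>
</div>
"""


class OutlineParser(HTMLParser):
    """Record each opening tag with its nesting depth and class list."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # tuples of (depth, tag, [classes])

    def handle_starttag(self, tag, attrs):
        # The class attribute is a space-separated list of class names.
        classes = dict(attrs).get("class") or ""
        self.outline.append((self.depth, tag, classes.split()))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


parser = OutlineParser()
parser.feed(SAMPLE_HTML)
for depth, tag, classes in parser.outline:
    print("  " * depth + tag, classes)
```

Two things in the outline matter for scraping: elements nest inside one another, and one element can carry several class names. The scraping cells later in this notebook locate elements by tag name and class in exactly this kind of structure.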




In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Matplotlib is a cross-platform data visualization and plotting library for Python, built on NumPy
import seaborn as sns
# seaborn provides statistical plots built on top of Matplotlib
from bs4 import BeautifulSoup  # pip install beautifulsoup4
# Beautiful Soup is a Python library for pulling data out of HTML and XML files
from selenium import webdriver  # pip install selenium
# Selenium is an open-source tool that automates web browsers
%matplotlib inline


In [2]:
%pip install -q beautifulsoup4 selenium

Note: you may need to restart the kernel to use updated packages.


In [4]:
import requests
import re
import time

In [11]:
import random
from urllib.parse import urlencode

SESSION = requests.Session()
DEFAULT_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'Connection': 'keep-alive',
}

PROXY = None  # set to 'http://user:pass@host:port' if needed


def fetch_url(url: str, max_retries: int = 3, base_delay: float = 1.0, per_attempt_timeout: float = 10.0) -> requests.Response:
    """Fetch URL with headers and exponential backoff; raises with context instead of hanging."""
    for attempt in range(1, max_retries + 1):
        try:
            response = SESSION.get(
                url,
                headers=DEFAULT_HEADERS,
                timeout=per_attempt_timeout,
                proxies={'http': PROXY, 'https': PROXY} if PROXY else None,
            )
            if response.status_code == 200:
                return response
            if response.status_code in {403, 429, 529, 503}:
                if attempt == max_retries:
                    raise RuntimeError(f"Blocked or rate-limited (status {response.status_code}) after {attempt} attempts")
                delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
                print(f"Attempt {attempt}: got {response.status_code}, retrying in {delay:.1f}s...")
                time.sleep(delay)
                continue
            response.raise_for_status()
            return response  # any other non-error status (e.g. 204): return instead of retrying
        except requests.RequestException as e:
            if attempt == max_retries:
                raise RuntimeError(f"Request failed after {attempt} attempts: {e}")
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            print(f"Attempt {attempt}: exception {type(e).__name__}, retrying in {delay:.1f}s...")
            time.sleep(delay)
            continue
    raise RuntimeError(f"Failed to fetch after {max_retries} retries: {url}")
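
The retry timing inside `fetch_url` can be looked at in isolation. The sketch below reproduces only the delay formula from that function (base delay doubled on each attempt, plus a small random jitter). `backoff_delays` is a hypothetical helper, and the seed exists only to make the demo repeatable.

```python
import random
from typing import List


def backoff_delays(max_retries: int = 3, base_delay: float = 1.0,
                   jitter: float = 0.5, seed: int = 0) -> List[float]:
    """Return the wait (in seconds) after each failed attempt except the last."""
    rng = random.Random(seed)  # seeded only to make the demo repeatable
    delays = []
    # fetch_url raises on the final attempt instead of sleeping,
    # so there are max_retries - 1 waits in total.
    for attempt in range(1, max_retries):
        delays.append(base_delay * (2 ** (attempt - 1)) + rng.uniform(0, jitter))
    return delays


no_jitter = backoff_delays(max_retries=5, jitter=0.0)
print(no_jitter)  # → [1.0, 2.0, 4.0, 8.0]

with_jitter = backoff_delays(max_retries=3)
print([round(d, 1) for d in with_jitter])
```

Doubling the wait gives a struggling or rate-limiting server progressively more room to recover. The jitter keeps several clients from retrying at the same instant.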


In [12]:
URL='https://www.flipkart.com/search?q=mobile+less+than+50000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&as-pos=1&as-type=HISTORY'

In [13]:
page = fetch_url(URL)

Attempt 1: exception ReadTimeout, retrying in 1.4s...
Attempt 2: exception ReadTimeout, retrying in 2.3s...


RuntimeError: Request failed after 3 attempts: HTTPSConnectionPool(host='www.flipkart.com', port=443): Read timed out. (read timeout=10.0)

In [7]:
page.status_code

529

In [None]:
pagecontent = page.text

In [None]:
soup = BeautifulSoup(pagecontent, 'html.parser')

In [None]:
soup.find_all('ul',attrs={'class':"_1xgFaf"})

[<ul class="_1xgFaf"><li class="rgWa7D">32 MB RAM | 32 MB ROM | Expandable Upto 32 GB</li><li class="rgWa7D">4.5 cm (1.77 inch) Display</li><li class="rgWa7D">0.8MP + 0.8MP</li><li class="rgWa7D">2800 mAh Battery</li><li class="rgWa7D">1 Year manufacturer warranty for device and 6 months manufacturer warranty for in-box accessories including batteries from the date of purchase</li></ul>,
 <ul class="_1xgFaf"><li class="rgWa7D">48 MB RAM | 128 MB ROM | Expandable Upto 128 GB</li><li class="rgWa7D">6.1 cm (2.4 inch) Display</li><li class="rgWa7D">0.8MP Rear Camera</li><li class="rgWa7D">1800 mAh Battery</li><li class="rgWa7D">1 Year warranty for device and 6 Months for box accessories</li></ul>,
 <ul class="_1xgFaf"><li class="rgWa7D">4 MB RAM | 2 GB ROM | Expandable Upto 8 GB</li><li class="rgWa7D">4.57 cm (1.8 inch) NA Display</li><li class="rgWa7D">1800 mAh Battery</li><li class="rgWa7D">One Year domestic warranty against manufacturing defects</li></ul>,
 <ul class="_1xgFaf"><li class

In [None]:
# contents attribute of a BeautifulSoup object is a list with all its children elements

In [None]:
soup.find_all('div',attrs={'class':'_1_WHN1'})

[<div class="_30jeq3 _1_WHN1">₹1,169</div>,
 <div class="_30jeq3 _1_WHN1">₹1,899</div>,
 <div class="_30jeq3 _1_WHN1">₹1,990</div>,
 <div class="_30jeq3 _1_WHN1">₹1,750</div>,
 <div class="_30jeq3 _1_WHN1">₹1,750</div>,
 <div class="_30jeq3 _1_WHN1">₹1,499</div>,
 <div class="_30jeq3 _1_WHN1">₹2,490</div>,
 <div class="_30jeq3 _1_WHN1">₹1,750</div>,
 <div class="_30jeq3 _1_WHN1">₹1,750</div>,
 <div class="_30jeq3 _1_WHN1">₹1,199</div>,
 <div class="_30jeq3 _1_WHN1">₹1,750</div>,
 <div class="_30jeq3 _1_WHN1">₹2,490</div>,
 <div class="_30jeq3 _1_WHN1">₹1,399</div>,
 <div class="_30jeq3 _1_WHN1">₹2,490</div>,
 <div class="_30jeq3 _1_WHN1">₹2,490</div>,
 <div class="_30jeq3 _1_WHN1">₹1,990</div>,
 <div class="_30jeq3 _1_WHN1">₹1,990</div>,
 <div class="_30jeq3 _1_WHN1">₹999</div>,
 <div class="_30jeq3 _1_WHN1">₹1,320</div>,
 <div class="_30jeq3 _1_WHN1">₹1,169</div>,
 <div class="_30jeq3 _1_WHN1">₹1,212</div>,
 <div class="_30jeq3 _1_WHN1">₹999</div>,
 <div class="_30jeq3 _1_WHN1">₹1,499

In [None]:
soup.find('div',attrs={'class':'_1_WHN1'})

<div class="_30jeq3 _1_WHN1">₹1,169</div>

In [None]:
soup.find_all('div',attrs={'class':'_4rR01T'})

[<div class="_4rR01T">IAIR Basic Feature Dual Sim Mobile Phone with 2800mAh Big Battery, 1.77 inch Display Screen, 0.8 mp Sm...</div>,
 <div class="_4rR01T">I Kall K88 PRO 4G Keypad Mobile</div>,
 <div class="_4rR01T">SAREGAMA Carvaan Mobile M11(CM181) with 1500 pre-loaded songs</div>,
 <div class="_4rR01T">SAREGAMA Carvaan Keypad Mobile Tamil Don M12 with 1000 Pre-Loaded Songs</div>,
 <div class="_4rR01T">SAREGAMA Carvaan Mobile Hindi Don M12 with 1000 Pre-Loaded Songs</div>,
 <div class="_4rR01T">I Kall K3310 Combo Of Two Mobile</div>,
 <div class="_4rR01T">SAREGAMA Carvaan Mobile Malayalam M21 with 1500 pre-loaded songs</div>,
 <div class="_4rR01T">SAREGAMA Carvaan Mobile Hindi Don M12 with 1000 Pre-Loaded Songs</div>,
 <div class="_4rR01T">SAREGAMA Carvaan Keypad Mobile Tamil Don M12 with 1000 Pre-Loaded Songs</div>,
 <div class="_4rR01T">IAIR Basic Feature Dual Sim Mobile Phone with 2800mAh Battery, 2.4 inch Display Screen, 0.8 mp Camera ...</div>,
 <div class="_4rR01T">SAREGAMA C

In [None]:
for x in soup.find_all('div',attrs={'class':'_3LWZlK'}):
    print(x.text)


4
3.5
3.8
4.3
3.6
4.3
3.8
4.3
4
4.1
4.1
4
4
3.7
3.7
3.9
3.8
3.8
3.8
3.6
4.1
1
1
3.8
1
1
3.9
5
5
3.8
5
4
3.8
3
3


In [None]:
soup.find_all('li',attrs={'class':'rgWa7D'})

[<li class="rgWa7D">32 MB RAM | 32 MB ROM | Expandable Upto 32 GB</li>,
 <li class="rgWa7D">4.5 cm (1.77 inch) Display</li>,
 <li class="rgWa7D">0.8MP + 0.8MP</li>,
 <li class="rgWa7D">2800 mAh Battery</li>,
 <li class="rgWa7D">1 Year manufacturer warranty for device and 6 months manufacturer warranty for in-box accessories including batteries from the date of purchase</li>,
 <li class="rgWa7D">48 MB RAM | 128 MB ROM | Expandable Upto 128 GB</li>,
 <li class="rgWa7D">6.1 cm (2.4 inch) Display</li>,
 <li class="rgWa7D">0.8MP Rear Camera</li>,
 <li class="rgWa7D">1800 mAh Battery</li>,
 <li class="rgWa7D">1 Year warranty for device and 6 Months for box accessories</li>,
 <li class="rgWa7D">4 MB RAM | 2 GB ROM | Expandable Upto 8 GB</li>,
 <li class="rgWa7D">4.57 cm (1.8 inch) NA Display</li>,
 <li class="rgWa7D">1800 mAh Battery</li>,
 <li class="rgWa7D">One Year domestic warranty against manufacturing defects</li>,
 <li class="rgWa7D">32 MB RAM | 1.3 GB ROM</li>,
 <li class="rgWa7D">4.5

In [None]:
soup.find_all('div',attrs={'class':'_3I9_wc _27UcVY'})

[<div class="_3I9_wc _27UcVY">₹<!-- -->1,499</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->2,599</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->1,799</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->1,999</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->1,499</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->2,099</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->1,499</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->1,649</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->1,299</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->1,999</div>,
 <div class="_3I9_wc _27UcVY">₹<!-- -->1,290</div>]

In [None]:
productname = []
price = []
rating = []
pagenum = []
features = []
MRP_PRICE = []

# The page number goes in the `page` query parameter (assumed pagination scheme).
BASE_URL = 'https://www.flipkart.com/search?q=mobile+less+than+50000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&as-pos=1&as-type=HISTORY&page={}'


def text_or_nan(tag):
    """Return the tag's text, or NaN when the element is missing."""
    return np.nan if tag is None else tag.text


for i in range(1, 14):
    start_time = time.time()
    page = fetch_url(BASE_URL.format(i))  # reuse the headers, timeout, and retries defined above
    soup = BeautifulSoup(page.text, 'html.parser')

    # Each product card sits in a div with class "_3pLy-c row".
    for x in soup.find_all('div', attrs={'class': '_3pLy-c row'}):
        productname.append(text_or_nan(x.find('div', attrs={'class': '_4rR01T'})))
        price.append(text_or_nan(x.find('div', attrs={'class': '_1_WHN1'})))
        rating.append(text_or_nan(x.find('div', attrs={'class': '_3LWZlK'})))
        features.append(text_or_nan(x.find('ul', attrs={'class': '_1xgFaf'})))
        MRP_PRICE.append(text_or_nan(x.find('div', attrs={'class': '_3I9_wc _27UcVY'})))
        pagenum.append(i)  # one entry per product so every list stays the same length

    print('Page {} completed in {} seconds'.format(i, time.time() - start_time))
    time.sleep(1)  # pause between pages to avoid overloading the server


Page 1 completed in 0.5420999526977539 seconds
Page 2 completed in 0.811453104019165 seconds
Page 3 completed in 0.969210147857666 seconds
Page 4 completed in 0.8156700134277344 seconds
Page 5 completed in 0.5351982116699219 seconds
Page 6 completed in 0.7650818824768066 seconds
Page 7 completed in 0.7476880550384521 seconds
Page 8 completed in 1.1049418449401855 seconds
Page 9 completed in 0.40053677558898926 seconds
Page 10 completed in 0.36174726486206055 seconds
Page 11 completed in 0.6494948863983154 seconds
Page 12 completed in 0.7566580772399902 seconds
Page 13 completed in 0.8981707096099854 seconds
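
The loop above keeps one list per column, which only works if every list receives exactly one value per product. One alternative, sketched here with plain dictionaries standing in for parsed product cards, is to build one record per product so a missing field cannot shift the columns. The card data, `FIELDS`, and `card_to_row` are hypothetical and only illustrate the idea.

```python
from typing import Dict, List

FIELDS = ["ProductName", "Price", "Rating", "Features", "MRP_PRICE"]


def card_to_row(card: Dict[str, str], page: int) -> Dict[str, object]:
    """Build one row; fields missing from the card become None."""
    row: Dict[str, object] = {field: card.get(field) for field in FIELDS}
    row["Page"] = page
    return row


# Hypothetical parsed cards: the second one has no rating and no MRP.
cards: List[Dict[str, str]] = [
    {"ProductName": "Phone A", "Price": "₹1,169", "Rating": "4",
     "Features": "2800 mAh Battery", "MRP_PRICE": "₹1,499"},
    {"ProductName": "Phone B", "Price": "₹1,990",
     "Features": "1800 mAh Battery"},
]

rows = [card_to_row(card, page=1) for card in cards]
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)`, which turns the `None` entries into missing values.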


In [None]:
len(productname)

312

In [None]:
a={'ProductName':productname, 'Price' :price, 'Rating':rating,'Features':features,'MRP_PRICE':MRP_PRICE}
mobile_df=pd.DataFrame.from_dict(a, orient='index')
mobile=mobile_df.transpose()

In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499"
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599"
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,


In [None]:
# Ensure string types to avoid regex errors on NaN
for col in ['ProductName', 'Price', 'Rating', 'Features', 'MRP_PRICE']:
    if col in mobile.columns:
        mobile[col] = mobile[col].fillna('').astype(str)


In [None]:
# battery

regex = r'[0-9\s]+ mAh'
mobile['Battery'] = mobile['Features'].apply(lambda x: re.compile(regex).findall(x))


In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh]


In [None]:
mobile['Features'][0]

'32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4.5 cm (1.77 inch) Display0.8MP + 0.8MP2800 mAh Battery1 Year manufacturer warranty for device and 6 months manufacturer warranty for in-box accessories including batteries from the date of purchase'

In [None]:
#display

regex = r'[0-9.\s]+(?:cm|inch)'
mobile['Display'] = mobile['Features'].apply(lambda x: re.compile(regex).findall(x))


In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]"
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]"
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]"
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]"
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]"


In [None]:
#RAM

regex = r'(^\d GB)'
mobile['RAM(GB)'] = mobile['Features'].apply(lambda x: re.compile(regex).findall(x))

In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB)
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]",[]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]",[]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]",[]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[]


In [None]:
#RoM

regex = r'(\d{2,} GB )'
mobile['R0M(GB)'] = mobile['Features'].apply(lambda x: re.compile(regex).findall(x))


In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB),R0M(GB)
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]",[],[]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]",[],[]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]",[],[]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[]


In [None]:
#processor details

regex = r'( \D\d\d\s+Processor| \d\d\d\s+Processor)'

mobile['Processor'] = mobile['Features'].apply(lambda x: re.compile(regex).findall(x))

In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB),R0M(GB),Processor
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]",[],[],[]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]",[],[],[]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]",[],[],[]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[]


In [None]:
mobile['Features'][0]


'32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4.5 cm (1.77 inch) Display0.8MP + 0.8MP2800 mAh Battery1 Year manufacturer warranty for device and 6 months manufacturer warranty for in-box accessories including batteries from the date of purchase'

In [None]:
# warranty for mobile

regex = r'(\d\s(?:Year))'
mobile['Mobile Waranty'] = mobile['Features'].apply(lambda x: re.compile(regex).findall(x))

In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB),R0M(GB),Processor,Mobile Waranty
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]",[],[],[],[1 Year]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]",[],[],[],[1 Year]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[]


In [None]:
# warranty for accessories

regex = r'(\d\s(?:Months))'
mobile['Accessories'] = mobile['Features'].apply(lambda x: re.compile(regex).findall(x))

In [None]:
mobile['Features'][3]

'32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Display800 mAh Battery1 year warranty against manufacturing defects'

In [None]:
# Expandable

regex = r'Upto\s(\d\d\d\s)'
mobile['Expandable'] = mobile['Features'].apply(lambda x: re.compile(regex).findall(x))

In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB),R0M(GB),Processor,Mobile Waranty,Accessories,Expandable
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]",[],[],[],[1 Year],[],[]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]",[],[],[],[1 Year],[6 Months],[128 ]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[]


In [None]:
mobile['ProductName'][1]

'I Kall K88 PRO 4G Keypad Mobile'

In [None]:
#Name

regex = r'^\w+'
mobile['Name'] = mobile['ProductName'].apply(lambda x: re.compile(regex).findall(x))

In [None]:
mobile.head()

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB),R0M(GB),Processor,Mobile Waranty,Accessories,Expandable,Name
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]",[],[],[],[1 Year],[],[],[IAIR]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]",[],[],[],[1 Year],[6 Months],[128 ],[I]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]


In [None]:
mobile['ProductName'][0]

'IAIR Basic Feature Dual Sim Mobile Phone with 2800mAh Big Battery, 1.77 inch Display Screen, 0.8 mp Sm...'

In [None]:
mobile.head(2)

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB),R0M(GB),Processor,Mobile Waranty,Accessories,Expandable,Name
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]",[],[],[],[1 Year],[],[],[IAIR]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]",[],[],[],[1 Year],[6 Months],[128 ],[I]


In [None]:
df = mobile.head(312).copy()  # copy so later column assignments don't raise SettingWithCopyWarning
df

Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB),R0M(GB),Processor,Mobile Waranty,Accessories,Expandable,Name
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,"₹1,169",4,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,"₹1,499",[2800 mAh],"[4.5 cm, 1.77 inch]",[],[],[],[1 Year],[],[],[IAIR]
1,I Kall K88 PRO 4G Keypad Mobile,"₹1,899",3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,"₹2,599",[1800 mAh],"[6.1 cm, 2.4 inch]",[],[],[],[1 Year],[6 Months],[128 ],[I]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,"₹1,990",3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,"₹1,750",,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,"₹1,750",4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
307,"realme X50 Pro (Moss Green, 256 GB)","₹47,999",4.4,12 GB RAM | 256 GB ROM16.36 cm (6.44 inch) Ful...,,[4200 mAh],"[16.36 cm, 6.44 inch]",[],"[12 GB , 256 GB ]",[ 865 Processor],[1 Year],[6 Months],[],[realme]
308,"APPLE iPhone 11 (Purple, 128 GB)","₹45,999",4.6,128 GB ROM15.49 cm (6.1 inch) Liquid Retina HD...,"₹48,900",[],"[15.49 cm, 6.1 inch]",[],[128 GB ],[],[1 Year],[],[],[APPLE]
309,"Xiaomi 12 Pro 5G (Noir Black, 256 GB)","₹49,999",4.2,8 GB RAM | 256 GB ROM17.09 cm (6.73 inch) Full...,"₹79,999",[4600 mAh],"[17.09 cm, 6.73 inch]",[8 GB],[256 GB ],[],[],[2 Months],[],[Xiaomi]
310,"IQOO 9T 5G (LEGEND, 128 GB)","₹49,499",4.1,8 GB RAM | 128 GB ROM17.22 cm (6.78 inch) Disp...,"₹50,999",[4700 mAh],"[17.22 cm, 6.78 inch]",[8 GB],[128 GB ],[],[1 Year],[],[],[IQOO]


In [None]:
# Strip the rupee sign and thousands separators from both price columns
df['Price'] = df['Price'].str.lstrip('₹')
df['Price'] = df['Price'].replace({',': ''}, regex=True)
df['MRP_PRICE'] = df['MRP_PRICE'].str.lstrip('₹')
df['MRP_PRICE'] = df['MRP_PRICE'].replace({',': ''}, regex=True)
df.head()


Unnamed: 0,ProductName,Price,Rating,Features,MRP_PRICE,Battery,Display,RAM(GB),R0M(GB),Processor,Mobile Waranty,Accessories,Expandable,Name
0,IAIR Basic Feature Dual Sim Mobile Phone with ...,1169,4.0,32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4...,1499.0,[2800 mAh],"[4.5 cm, 1.77 inch]",[],[],[],[1 Year],[],[],[IAIR]
1,I Kall K88 PRO 4G Keypad Mobile,1899,3.5,48 MB RAM | 128 MB ROM | Expandable Upto 128 G...,2599.0,[1800 mAh],"[6.1 cm, 2.4 inch]",[],[],[],[1 Year],[6 Months],[128 ],[I]
2,SAREGAMA Carvaan Mobile M11(CM181) with 1500 p...,1990,3.8,4 MB RAM | 2 GB ROM | Expandable Upto 8 GB4.57...,,[1800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]
3,SAREGAMA Carvaan Keypad Mobile Tamil Don M12 w...,1750,,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]
4,SAREGAMA Carvaan Mobile Hindi Don M12 with 100...,1750,4.3,32 MB RAM | 1.3 GB ROM4.57 cm (1.8 inch) Displ...,,[800 mAh],"[4.57 cm, 1.8 inch]",[],[],[],[],[],[],[SAREGAMA]
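
After the symbols are stripped, the price values are still strings. A minimal sketch of converting one price string to a number is shown here. `parse_rupees` is a hypothetical helper, and treating blank or malformed values as `None` is an assumption about how missing prices should be handled.

```python
from typing import Optional


def parse_rupees(text: Optional[str]) -> Optional[int]:
    """Turn a price string such as '₹1,169' into an int; blanks become None."""
    if text is None:
        return None
    digits = text.strip().lstrip("₹").replace(",", "")
    return int(digits) if digits.isdigit() else None


print([parse_rupees(p) for p in ["₹1,169", "₹47,999", "1750", "", None]])
# → [1169, 47999, 1750, None, None]
```

A helper like this could be applied to a column with `df['Price'].apply(parse_rupees)`. Numeric prices are what make comparisons such as the discount between `MRP_PRICE` and `Price` possible.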


In [None]:
df.columns

Index(['ProductName', 'Price', 'Rating', 'Features', 'MRP_PRICE', 'Battery',
       'Display', 'RAM(GB)', 'R0M(GB)', 'Processor', 'Mobile Waranty',
       'Accessories', 'Expandable', 'Name'],
      dtype='object')

In [None]:
#RAM
r=re.compile(r"(\d+)\sGB\sRAM")
r.findall(df['Features'][0])

[]

In [None]:
df['RAM(GB)']=df['Features'].apply(lambda x:r.findall(x))

In [None]:
#ROM
r=re.compile(r"(\d+)\sGB\sROM")
r.findall(df['Features'][1])

[]

In [None]:
df['R0M(GB)']=df['Features'].apply(lambda x:r.findall(x))

In [None]:
#Expandable
r=re.compile(r" Upto (\d+)\sGB")
r.findall(df['Features'][0])

['32']

In [None]:
df['Expandable']=df['Features'].apply(lambda x:r.findall(x))

In [None]:
#display
r=re.compile(r"(\d+\.\d+)\scm")
r.findall(df['Features'][0])

['4.5']

In [None]:
df['Display']=df['Features'].apply(lambda x:r.findall(x))

In [None]:
#Battery
r=re.compile(r"(\d+)\smAh")
r.findall(df['Features'][0])

['2800']

In [None]:
df['Battery']=df['Features'].apply(lambda x:r.findall(x))

In [None]:
#Accessories
r=re.compile(r"(\d+) Months")
r.findall(df['Features'][0])

[]

In [None]:
df['Accessories']=df['Features'].apply(lambda x:r.findall(x))

In [None]:
df['Features'][0]

'32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4.5 cm (1.77 inch) Display0.8MP + 0.8MP2800 mAh Battery1 Year manufacturer warranty for device and 6 months manufacturer warranty for in-box accessories including batteries from the date of purchase'

In [None]:
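#Mobile Warranty
# Note: this pattern matches only when the "<n> Year" text follows the word
# "Processor", so listings without a processor entry return [].
r=re.compile(r"Processor(\D+|\d+) Year")
r.findall(df['Features'][1])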

[]

In [None]:
df['Mobile Waranty']=df['Features'].apply(lambda x:r.findall(x))
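
The capture-group patterns from the cells above can also be gathered into one helper that processes a feature string in a single pass. This is a sketch rather than the notebook's method: `PATTERNS` and `extract_specs` are hypothetical names, and returning only the first match (or `None`) is an assumption about what is wanted per column. The sample string is the start of the first product's feature text shown earlier.

```python
import re
from typing import Dict, Optional

# Patterns copied from the cells above; each has exactly one capture group.
PATTERNS = {
    "RAM(GB)": re.compile(r"(\d+)\sGB\sRAM"),
    "R0M(GB)": re.compile(r"(\d+)\sGB\sROM"),
    "Expandable": re.compile(r" Upto (\d+)\sGB"),
    "Display": re.compile(r"(\d+\.\d+)\scm"),
    "Battery": re.compile(r"(\d+)\smAh"),
}


def extract_specs(features: str) -> Dict[str, Optional[str]]:
    """Return the first match of each pattern, or None when nothing matches."""
    specs: Dict[str, Optional[str]] = {}
    for column, pattern in PATTERNS.items():
        match = pattern.search(features)
        specs[column] = match.group(1) if match else None
    return specs


sample = ("32 MB RAM | 32 MB ROM | Expandable Upto 32 GB4.5 cm (1.77 inch) "
          "Display0.8MP + 0.8MP2800 mAh Battery")
print(extract_specs(sample))
```

RAM and ROM come back as `None` for this phone because its memory is listed in MB, which matches the empty results in the cells above.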

In [None]:
# The regex columns hold lists of matches; join each list into a single string.
# (ProductName and Features are already plain strings, so they are left as-is.)
#df['Rating'] = df['Rating'].apply(lambda x : ''.join(x))
list_columns = ['Name', 'Battery', 'Display', 'RAM(GB)', 'R0M(GB)', 'Processor',
                'Mobile Waranty', 'Accessories', 'Expandable']
for col in list_columns:
    df[col] = df[col].apply(''.join)


In [None]:
df.to_csv("/Users/vignesh/Documents/vignesh.csv")

In [None]:
# Example: Scrape quotes.toscrape.com (site made for scraping)
example_url = "https://quotes.toscrape.com/"
resp = requests.get(example_url, timeout=10)
resp.raise_for_status()
example_soup = BeautifulSoup(resp.text, 'html.parser')

quotes = []
authors = []
tags_list = []

for q in example_soup.select('div.quote'):
    text_el = q.select_one('span.text')
    author_el = q.select_one('small.author')
    tag_els = q.select('div.tags a.tag')
    quotes.append(text_el.get_text(strip=True) if text_el else '')
    authors.append(author_el.get_text(strip=True) if author_el else '')
    tags_list.append([t.get_text(strip=True) for t in tag_els])

example_df = pd.DataFrame({
    'quote': quotes,
    'author': authors,
    'tags': tags_list,
})
example_df.head()
