<h1><center>Data Analysis Web Scraping</center></h1>

### Introduction to Web Scraping

Consider the following scenario: you need to pull a large volume of data from websites as quickly as possible. How would you do it without visiting each website and collecting the data manually? The answer is “web scraping”, which makes this process much easier and faster.

![1.svg](attachment:1.svg)

### What is Web Scraping?

Web scraping is a software technique for extracting information from websites. It mostly focuses on transforming unstructured data on the web (HTML) into structured data (a database or spreadsheet).

We can perform web scraping in various ways, from tools such as Google Docs to almost any programming language.


### Now the question is how do we get data from websites?

When web scraping code runs, it sends a request to the URL you specified. The server responds by sending the data, which lets the code read the HTML or XML page. The code then extracts the required data from that page.

Any web scraping code follows these basic steps:

1. Find the URL (address) of the web page you want to scrape
2. Inspect the page and find the data you want to extract
3. Write the logic for extracting the data
4. Store the extracted data in a structured form (e.g. a Pandas DataFrame)
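
The four steps above can be sketched on a tiny inline page, so the example runs without a network connection. This is only an illustration: the HTML snippet and the `product`, `name`, and `price` class names are made up and do not come from any real website.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Step 1: normally you would start from a URL. Here a small hard-coded page
# (hypothetical markup) stands in for the downloaded HTML.
html = """
<ul>
  <li class="product"><span class="name">Phone A</span><span class="price">100</span></li>
  <li class="product"><span class="name">Phone B</span><span class="price">250</span></li>
</ul>
"""

# Step 2: after inspecting the page, we know each product is an <li class="product">.
soup = BeautifulSoup(html, "html.parser")

# Step 3: extraction logic, one dictionary per product.
rows = []
for item in soup.find_all("li", class_="product"):
    rows.append({
        "name": item.find("span", class_="name").text,
        "price": int(item.find("span", class_="price").text),
    })

# Step 4: store the result in a structured form.
df = pd.DataFrame(rows)
print(df)
```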

Now we’ll perform web scraping with various libraries and frameworks.

### Step-by-Step process to Scrape Data From A Website:

Web scraping means obtaining data from web pages by parsing their HTML. Some websites make their data available in CSV or JSON format, but this is not always the case, and that is where web scraping comes in.

When you run the web scraping code, it sends a request to the URL you specified. The server provides the data in response to your request, allowing you to see the HTML or XML page. The code then parses the HTML or XML page, locating and extracting the data.

### Why is Python used for web scraping?

1) Python includes many libraries, such as NumPy, Matplotlib, and Pandas, that provide methods and functions for a variety of uses. As a result, it’s suitable both for web scraping and for further manipulation of the extracted data.

2) Python is an easy language to program in. No semicolons “;” or curly braces “{}” are required, which makes code easier to write and less noisy.

3) Dynamically typed: You don’t have to define data types for variables in Python; you can just use them wherever they’re needed. This saves you time and speeds up your work.

4) Small code, large task: Web scraping is a technique for saving time. But what good is it if you waste more time writing code? You don’t have to, though. We can write small programs in Python to accomplish large tasks. As a result, even while writing the code, you save time.

5) Python syntax is simple to learn because reading Python code is much like reading a statement in English. Python’s indentation helps the user distinguish between distinct scopes/blocks in the code, making it expressive and easy to understand.

### Libraries used for Web Scraping in Python

You’ll come across multiple libraries and frameworks in Python for web scraping. 

**Here are three popular libraries**

##### BeautifulSoup

- BeautifulSoup is a parsing library in Python that enables web scraping of HTML and XML documents.

- BeautifulSoup automatically detects encodings and gracefully handles HTML documents even with special characters. We can navigate a parsed document and find what we need which makes it quick and painless to extract the data from the webpages.
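
As a small illustration of navigating a parsed document, the sketch below uses `find`, `find_all`, and a CSS selector on a hand-written snippet. The markup, the `item` class, and the `catalog` id are assumptions made for this example only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this demonstration.
html = """
<div id="catalog">
  <a class="item" href="/p/1"><img src="a.jpg" alt="First"/></a>
  <a class="item" href="/p/2"><img src="b.jpg" alt="Second"/></a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
first_link = soup.find("a", class_="item")

# find_all() returns a list of every matching tag.
all_links = soup.find_all("a", class_="item")
hrefs = [a["href"] for a in all_links]

# select() accepts a CSS selector and also returns a list.
alts = [img["alt"] for img in soup.select("div#catalog a.item img")]

# Tags can be walked upwards as well, for example to the enclosing <div>.
parent_id = first_link.parent["id"]

print(hrefs, alts, parent_id)
```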

##### Scrapy

- Scrapy is a Python framework for large scale web scraping. It gives you all the tools you need to efficiently extract data from websites, process them as you want, and store them in your preferred structure and format.

##### Selenium

- Selenium is another popular tool for automating browsers. It’s primarily used for testing in the industry but is also very handy for web scraping. 

# Implementation using BeautifulSoup

**Install BeautifulSoup:** pip install beautifulsoup4

In [1]:
import requests  # Importing the requests library for making HTTP requests
import pandas as pd  # Importing the pandas library for data manipulation
from bs4 import BeautifulSoup  # Importing BeautifulSoup for web scraping

- Get the Flipkart search URL for your product

In [2]:
url = "https://www.flipkart.com/search?q=mi%20brand%20phone&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
url

'https://www.flipkart.com/search?q=mi%20brand%20phone&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'

In [3]:
# sends an HTTP GET request to the specified URL and assigns the response object to the variable req
req = requests.get(url)
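
A plain `requests.get(url)` waits indefinitely by default and does not raise on HTTP error codes. A slightly more defensive variant might look like the sketch below. The function name `fetch_html`, the `User-Agent` string, and the 10-second timeout are assumptions chosen for illustration, not values required by any site.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper)."""
    # Reuse a session if one is passed in, otherwise create a new one.
    session = session or requests.Session()

    # Illustrative header value; some sites respond differently without one.
    headers = {"User-Agent": "Mozilla/5.0 (tutorial example)"}

    response = session.get(url, headers=headers, timeout=timeout)

    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed to `BeautifulSoup(..., 'html.parser')` in the same way as `req.content`.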

In [4]:
# parses the HTML content of the response obtained from the req object using BeautifulSoup library and assigns it to the variable content
content = BeautifulSoup(req.content, 'html.parser')
content

<!DOCTYPE html>
<html lang="en"><head><link href="https://rukminim2.flixcart.com" rel="preconnect"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app_modules.chunk.09b0e9.css" rel="stylesheet"/><link href="//static-assets-web.flixcart.com/fk-p-linchpin-web/fk-cp-zion/css/app.chunk.e82689.css" rel="stylesheet"/><meta content="text/html; charset=utf-8" http-equiv="Content-type"/><meta content="IE=Edge" http-equiv="X-UA-Compatible"/><meta content="102988293558" property="fb:page_id"/><meta content="658873552,624500995,100000233612389" property="fb:admins"/><link href="https:///www/promos/new/20150528-140547-favicon-retina.ico" rel="shortcut icon"/><link href="/osdd.xml?v=2" rel="search" type="application/opensearchdescription+xml"/><meta content="website" property="og:type"/><meta content="Flipkart.com" name="og_site_name" property="og:site_name"/><link href="/apple-touch-icon-57x57.png" rel="apple-touch-icon" sizes="57x57"/><link href="/apple-touch-icon-72

In [5]:
# finds all <div> elements with the class _2kHMtA within the parsed HTML content and assigns them to the variable data
data = content.find_all('div', {'class': '_2kHMtA'})
data

[<div class="_2kHMtA"><a class="_1fQZEK" href="/mi-11-lite-vinyl-black-128-gb/p/itmac6203bae9394?pid=MOBG3VSKHGETEBVM&amp;lid=LSTMOBG3VSKHGETEBVM0TO8SZ&amp;marketplace=FLIPKART&amp;q=mi+brand+phone&amp;store=tyy%2F4io&amp;srno=s_1_1&amp;otracker=search&amp;otracker1=search&amp;fm=organic&amp;iid=bf3fb93f-d735-43d3-b741-cf21d6caa08a.MOBG3VSKHGETEBVM.SEARCH&amp;ppt=None&amp;ppn=None&amp;ssid=9za61uo1hs0000001711295490538&amp;qH=4aa7f10791b88761" rel="noopener noreferrer" target="_blank"><div class="MIXNux"><div class="_2QcLo-"><div><div class="CXW8mj" style="height:200px;width:200px"><img alt="Mi 11 Lite (Vinyl Black, 128 GB)" class="_396cs4" loading="eager" src="https://rukminim2.flixcart.com/image/312/312/kq6yefk0/mobile/b/2/f/11-lite-m2101k9ai-mi-original-imag496egxryygvz.jpeg?q=70"/></div></div></div><div class="_3wLduG"><div class="_3PzNI-"><span class="f3A4_V"><label class="_2iDkf8"><input class="_30VH1S" readonly="" type="checkbox"/><div class="_24_Dny"></div></label></span><label

In [6]:
# finds all <div> elements with the class _2kHMtA again and displays only the first one (the first product card)
data = content.find_all('div', {'class': '_2kHMtA'})
data[0]

<div class="_2kHMtA"><a class="_1fQZEK" href="/mi-11-lite-vinyl-black-128-gb/p/itmac6203bae9394?pid=MOBG3VSKHGETEBVM&amp;lid=LSTMOBG3VSKHGETEBVM0TO8SZ&amp;marketplace=FLIPKART&amp;q=mi+brand+phone&amp;store=tyy%2F4io&amp;srno=s_1_1&amp;otracker=search&amp;otracker1=search&amp;fm=organic&amp;iid=bf3fb93f-d735-43d3-b741-cf21d6caa08a.MOBG3VSKHGETEBVM.SEARCH&amp;ppt=None&amp;ppn=None&amp;ssid=9za61uo1hs0000001711295490538&amp;qH=4aa7f10791b88761" rel="noopener noreferrer" target="_blank"><div class="MIXNux"><div class="_2QcLo-"><div><div class="CXW8mj" style="height:200px;width:200px"><img alt="Mi 11 Lite (Vinyl Black, 128 GB)" class="_396cs4" loading="eager" src="https://rukminim2.flixcart.com/image/312/312/kq6yefk0/mobile/b/2/f/11-lite-m2101k9ai-mi-original-imag496egxryygvz.jpeg?q=70"/></div></div></div><div class="_3wLduG"><div class="_3PzNI-"><span class="f3A4_V"><label class="_2iDkf8"><input class="_30VH1S" readonly="" type="checkbox"/><div class="_24_Dny"></div></label></span><label 

In [7]:
# Initialize empty lists to store links and phone names
links = []
phone_name = []

# Base URL for constructing complete links
start_link = "https://www.flipkart.com"

# Iterate through each item in the 'data' list
for items in data:
    #  Extract the relative link for each item
    rest_link = items.find('a')['href']
    
    # Find the name of the phone
    name = items.find('div', attrs = {'class': '_4rR01T'})
    
    # Append the text of the 'name' element to the 'phone_name' list
    phone_name.append(name.text)
    
    # Construct the complete URL by combining the base URL and the relative link, then append it to the 'links' list
    links.append(start_link + rest_link)
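
The loop above assumes that every card contains a name element and that every `href` is relative. A minimal sketch of two small guards follows, using only the standard library. The helper names `absolute_link` and `text_or_none` are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.flipkart.com"


def absolute_link(href, base=BASE_URL):
    """Join a relative href onto the base URL; absolute hrefs pass through unchanged."""
    return urljoin(base, href) if href else None


def text_or_none(tag):
    """Return the stripped text of a tag, or None when find() found nothing."""
    return tag.text.strip() if tag is not None else None


print(absolute_link("/mi-11-lite-vinyl-black-128-gb/p/itmac6203bae9394"))
print(absolute_link(None))
```

With these helpers, a missing element produces `None` in the list instead of stopping the loop with an `AttributeError`.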

In [8]:
print(phone_name[0])  # Print the first phone name
print(links[0]) # Print the corresponding link

Mi 11 Lite (Vinyl Black, 128 GB)
https://www.flipkart.com/mi-11-lite-vinyl-black-128-gb/p/itmac6203bae9394?pid=MOBG3VSKHGETEBVM&lid=LSTMOBG3VSKHGETEBVM0TO8SZ&marketplace=FLIPKART&q=mi+brand+phone&store=tyy%2F4io&srno=s_1_1&otracker=search&otracker1=search&fm=organic&iid=bf3fb93f-d735-43d3-b741-cf21d6caa08a.MOBG3VSKHGETEBVM.SEARCH&ppt=None&ppn=None&ssid=9za61uo1hs0000001711295490538&qH=4aa7f10791b88761


In [9]:
# Create a dictionary containing the phone names and links
dataframe = {"Phone_names": phone_name, 
             "Links": links}

# Create a DataFrame from the dictionary
final_dataframe = pd.DataFrame(dataframe)

# Print the DataFrame
print(final_dataframe)

                               Phone_names  \
0         Mi 11 Lite (Vinyl Black, 128 GB)   
1           Mi 11 Lite (Jazz Blue, 128 GB)   
2         Mi 11 Lite (Vinyl Black, 128 GB)   
3      Xiaomi 11i 5G (Purple Mist, 128 GB)   
4        Mi 10T Pro (Cosmic Black, 128 GB)   
5    Xiaomi 11i 5G (Stealth Black, 128 GB)   
6             Mi 11X (Lunar White, 128 GB)   
7                  Redmi 6A (Black, 16 GB)   
8       Xiaomi 11i 5G (Camo Green, 128 GB)   
9               Redmi 9A (Sea Blue, 32 GB)   
10         Redmi Note 4 (Dark Grey, 32 GB)   
11                               Peace Mi2   
12                               Peace Mi4   
13             Redmi 6A (Rose Gold, 32 GB)   
14              Redmi Note 4 (Gold, 32 GB)   
15                  Redmi 6A (Gold, 16 GB)   
16           Mi 10T (Cosmic Black, 128 GB)   
17        Redmi 9A (Midnight Black, 32 GB)   
18          Redmi 9A (Nature Green, 32 GB)   
19          Xiaomi 14 (Jade Green, 512 GB)   
20          Redmi 9A (Nature Green

In [10]:
# Save the DataFrame 'final_dataframe' to "data.csv" inside the "dataset" folder (the folder must already exist)
final_dataframe.to_csv("dataset/data.csv")
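
`to_csv` does not create missing folders, so saving into `dataset` fails if that folder is absent. One way to handle this is sketched below with `pathlib`. The sample DataFrame and the temporary base directory are stand-ins so the example can run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in data; in the notebook this would be final_dataframe.
df = pd.DataFrame({
    "Phone_names": ["Phone A", "Phone B"],
    "Links": ["https://example.com/a", "https://example.com/b"],
})

# A temporary directory is used here only to keep the example self-contained.
out_dir = Path(tempfile.mkdtemp()) / "dataset"

# Create the folder (and any parents) if it does not exist yet.
out_dir.mkdir(parents=True, exist_ok=True)

# Path objects build the file path with the right separator on every OS.
out_file = out_dir / "data.csv"
df.to_csv(out_file)

print(out_file.exists())
```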

### Get Product Image URL

In [11]:
# Initialize empty lists to store links, phone names, and image URLs
links = []
phone_name = []
image_urls = []

# Base URL for constructing complete links
start_link = "https://www.flipkart.com"

# Iterate through each item in the 'data' list
for items in data:
    # Extract the relative link for each item
    rest_link = items.find('a')['href']
    
    # Find the name of the phone
    name = items.find('div', attrs = {'class': '_4rR01T'})
    
    # Append the extracted name to the 'phone_name' list
    phone_name.append(name.text)
    
    # Append the complete URL to the 'links' list
    links.append(start_link + rest_link)
    
    # Find the image tag for each item
    image_tag = items.find('img', class_='_396cs4')
    
    # Check if image tag exists
    if image_tag:
        
        # Extract the source URL of the image
        image_url = image_tag['src']
        
        # Append the image URL to the 'image_urls' list
        image_urls.append(image_url)
    else:
        # If image tag doesn't exist, append None to indicate no image
        image_urls.append(None)

In [12]:
print(phone_name[0]) # Print the first phone name
print(links[0]) # Print the corresponding link
print(image_urls[2]) # Print the image URL for the third item

Mi 11 Lite (Vinyl Black, 128 GB)
https://www.flipkart.com/mi-11-lite-vinyl-black-128-gb/p/itmac6203bae9394?pid=MOBG3VSKHGETEBVM&lid=LSTMOBG3VSKHGETEBVM0TO8SZ&marketplace=FLIPKART&q=mi+brand+phone&store=tyy%2F4io&srno=s_1_1&otracker=search&otracker1=search&fm=organic&iid=bf3fb93f-d735-43d3-b741-cf21d6caa08a.MOBG3VSKHGETEBVM.SEARCH&ppt=None&ppn=None&ssid=9za61uo1hs0000001711295490538&qH=4aa7f10791b88761
https://rukminim2.flixcart.com/image/312/312/kq6yefk0/mobile/b/2/f/11-lite-m2101k9ai-mi-original-imag496egxryygvz.jpeg?q=70


In [13]:
# Create a dictionary containing the phone names, links, and image URLs
dataframe = {"Phone_names": phone_name, 
             "Links": links, 
             "Images": image_urls}

# Create a DataFrame from the dictionary
final_dataframe = pd.DataFrame(dataframe)

# Print the DataFrame
print(final_dataframe)

                               Phone_names  \
0         Mi 11 Lite (Vinyl Black, 128 GB)   
1           Mi 11 Lite (Jazz Blue, 128 GB)   
2         Mi 11 Lite (Vinyl Black, 128 GB)   
3      Xiaomi 11i 5G (Purple Mist, 128 GB)   
4        Mi 10T Pro (Cosmic Black, 128 GB)   
5    Xiaomi 11i 5G (Stealth Black, 128 GB)   
6             Mi 11X (Lunar White, 128 GB)   
7                  Redmi 6A (Black, 16 GB)   
8       Xiaomi 11i 5G (Camo Green, 128 GB)   
9               Redmi 9A (Sea Blue, 32 GB)   
10         Redmi Note 4 (Dark Grey, 32 GB)   
11                               Peace Mi2   
12                               Peace Mi4   
13             Redmi 6A (Rose Gold, 32 GB)   
14              Redmi Note 4 (Gold, 32 GB)   
15                  Redmi 6A (Gold, 16 GB)   
16           Mi 10T (Cosmic Black, 128 GB)   
17        Redmi 9A (Midnight Black, 32 GB)   
18          Redmi 9A (Nature Green, 32 GB)   
19          Xiaomi 14 (Jade Green, 512 GB)   
20          Redmi 9A (Nature Green

In [14]:
# Save the DataFrame 'final_dataframe' to "data_image.csv" inside the "dataset" folder
final_dataframe.to_csv("dataset/data_image.csv")

### Get Product Discount Price

In [15]:
# Initialize empty lists to store links, phone names, image URLs, selling prices
links = []
phone_name = []
image_urls = []
prices = []

# Base URL for constructing complete links
start_link = "https://www.flipkart.com"

# Iterate through each item in the 'data' list
for items in data:
    # Extract the relative link for each item
    rest_link = items.find('a')['href']
    
    # Find the name of the phone
    name = items.find('div', attrs = {'class': '_4rR01T'})
    
    # Append the extracted name to the 'phone_name' list
    phone_name.append(name.text)
    
    # Combine the base URL and the relative link to form the complete URL
    links.append(start_link + rest_link)
    
    # Find the image tag for each item
    image_tag = items.find('img', class_='_396cs4')
    
    # Check if image tag exists
    if image_tag:
        # Extract the source URL of the image
        image_url = image_tag['src']
        
        # Append the image URL to the 'image_urls' list
        image_urls.append(image_url)
    else:
        # If image tag doesn't exist, append None to indicate no image
        image_urls.append(None)
        
    # Extracting price
    price_tag = items.find('div', class_='_30jeq3 _1_WHN1')
    if price_tag:
        # Extract the price text and strip any leading/trailing whitespaces
        price = price_tag.text.strip()
        
        # Append the price to the 'prices' list
        prices.append(price)
    else:
        # If price tag doesn't exist, append None or any placeholder value
        prices.append(None)

In [16]:
print(phone_name[0])     # Print the phone name of the first item
print(links[0])          # Print the corresponding link of the first item
print(image_urls[0])     # Print the image URL of the first item
print(prices[0])         # Print the price of the first item

Mi 11 Lite (Vinyl Black, 128 GB)
https://www.flipkart.com/mi-11-lite-vinyl-black-128-gb/p/itmac6203bae9394?pid=MOBG3VSKHGETEBVM&lid=LSTMOBG3VSKHGETEBVM0TO8SZ&marketplace=FLIPKART&q=mi+brand+phone&store=tyy%2F4io&srno=s_1_1&otracker=search&otracker1=search&fm=organic&iid=bf3fb93f-d735-43d3-b741-cf21d6caa08a.MOBG3VSKHGETEBVM.SEARCH&ppt=None&ppn=None&ssid=9za61uo1hs0000001711295490538&qH=4aa7f10791b88761
https://rukminim2.flixcart.com/image/312/312/kq6yefk0/mobile/b/2/f/11-lite-m2101k9ai-mi-original-imag496egxryygvz.jpeg?q=70
₹13,990


In [17]:
print(image_urls[2])  # Print the image URL of the third item

https://rukminim2.flixcart.com/image/312/312/kq6yefk0/mobile/b/2/f/11-lite-m2101k9ai-mi-original-imag496egxryygvz.jpeg?q=70


In [18]:
# Create a dictionary containing phone names, links, image URLs, and prices
dataframe = {"Phone_names": phone_name, 
             "Links": links, 
             "Images": image_urls, 
             "Prices": prices}

# Create a DataFrame from the dictionary
final_dataframe = pd.DataFrame(dataframe)

In [19]:
# Save the DataFrame 'final_dataframe' to "data_price.csv" inside the "dataset" folder
final_dataframe.to_csv("dataset/data_price.csv")
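
The scraped prices are strings such as `₹13,990`, which cannot be sorted or averaged numerically. A minimal sketch of a converter is shown below. The function name `parse_rupees` is hypothetical, and the approach simply drops every non-digit character, so it assumes whole-rupee prices with no decimal part.

```python
import re


def parse_rupees(price_text):
    """Convert text like '₹1,27,990' to the integer 127990; return None if no digits."""
    if price_text is None:
        return None
    # Remove the currency symbol, commas, and any other non-digit characters.
    digits = re.sub(r"[^\d]", "", price_text)
    return int(digits) if digits else None


for sample in ["₹13,990", "₹1,27,990", None]:
    print(sample, parse_rupees(sample))
```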

### Get Product Original Price

In [20]:
# Initialize empty lists to store links, phone names, image URLs, selling prices, and original prices
links = []
phone_name = []
image_urls = []
selling_prices = []
original_price = []

# Base URL for constructing complete links
start_link = "https://www.flipkart.com"

# Iterate through each item in the 'data' list
for items in data:
    # Extract the relative link for each item
    rest_link = items.find('a')['href']
    
    # Find the name of the phone
    name = items.find('div', attrs = {'class': '_4rR01T'})
    
    # Append the extracted name to the 'phone_name' list
    phone_name.append(name.text)
    
    # Combine the base URL and the relative link to form the complete URL
    links.append(start_link + rest_link)
    
    # Find the image tag for each item
    image_tag = items.find('img', class_='_396cs4')
    
    # Check if image tag exists
    if image_tag:
        
        # Extract the source URL of the image
        image_url = image_tag['src']
        
        # Append the image URL to the 'image_urls' list
        image_urls.append(image_url)
    else:
        # If image tag doesn't exist, append None to indicate no image
        image_urls.append(None)
        
    # Extracting selling price
    price_tag = items.find('div', class_='_30jeq3 _1_WHN1')
    if price_tag:
        price = price_tag.text.strip()
        selling_prices.append(price)
    else:
        selling_prices.append(None)
        
    # Extracting original price
    price_tag = items.find('div', class_='_3I9_wc _27UcVY')
    if price_tag:
        price = price_tag.text.strip()
        original_price.append(price)
    else:
        original_price.append(None)  # or any placeholder value if price is not found

In [21]:
print(phone_name[0]) # Print the first phone name
print(links[0]) # Print the corresponding link
print(image_urls[0]) # Print the image URL for the first item
print(selling_prices[0]) # Print the selling price for the first item
print(original_price[0]) # Print the original price for the first item

Mi 11 Lite (Vinyl Black, 128 GB)
https://www.flipkart.com/mi-11-lite-vinyl-black-128-gb/p/itmac6203bae9394?pid=MOBG3VSKHGETEBVM&lid=LSTMOBG3VSKHGETEBVM0TO8SZ&marketplace=FLIPKART&q=mi+brand+phone&store=tyy%2F4io&srno=s_1_1&otracker=search&otracker1=search&fm=organic&iid=bf3fb93f-d735-43d3-b741-cf21d6caa08a.MOBG3VSKHGETEBVM.SEARCH&ppt=None&ppn=None&ssid=9za61uo1hs0000001711295490538&qH=4aa7f10791b88761
https://rukminim2.flixcart.com/image/312/312/kq6yefk0/mobile/b/2/f/11-lite-m2101k9ai-mi-original-imag496egxryygvz.jpeg?q=70
₹13,990
₹24,999


In [22]:
# Create a dictionary containing phone names, links, images, selling prices, and original prices
dataframe = {"Phone_names": phone_name, 
             "Links": links, 
             "Images": image_urls, 
             "Selling Prices": selling_prices, 
             "Original Price": original_price}

# Create a DataFrame from the dictionary
final_dataframe = pd.DataFrame(dataframe)

In [23]:
# Save the DataFrame 'final_dataframe' to "data_sales_original_price.csv" inside the "dataset" folder
final_dataframe.to_csv("dataset/data_sales_original_price.csv")
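
Once both prices are available, a discount percentage can be derived from them. The sketch below does this with pandas on a small stand-in table. The rows are sample values in the same text format as the scraped prices, and the column names and the `to_number` helper are illustrative assumptions.

```python
import pandas as pd

# Stand-in rows in the same text format as the scraped prices.
df = pd.DataFrame({
    "Selling Prices": ["₹13,990", "₹22,999", None],
    "Original Price": ["₹24,999", "₹31,999", "₹9,999"],
})


def to_number(column):
    """Strip non-digit characters and convert to numbers; missing values become NaN."""
    cleaned = column.str.replace(r"[^\d]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


selling = to_number(df["Selling Prices"])
original = to_number(df["Original Price"])

# Discount as a percentage of the original price, rounded to one decimal place.
df["Discount %"] = ((original - selling) / original * 100).round(1)
print(df)
```

Rows with a missing price keep `NaN` in the new column instead of raising an error.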

### Get Product Review

In [24]:
# Initialize empty lists to store links, phone names, image URLs, selling prices, original prices, and reviews
links = []
phone_name = []
image_urls = []
selling_prices = []
original_price = []
reviews = []

# Base URL for constructing complete links
start_link = "https://www.flipkart.com"

# Iterate through each item in the 'data' list
for items in data:
    # Extract the relative link for each item
    rest_link = items.find('a')['href']
    
    # Find the name of the phone
    name = items.find('div', attrs = {'class': '_4rR01T'})
    
    # Append the extracted name to the 'phone_name' list
    phone_name.append(name.text)
    
    # Combine the base URL and the relative link to form the complete URL
    links.append(start_link + rest_link)
    
    # Find the image tag for each item
    image_tag = items.find('img', class_='_396cs4')
    
    # Check if image tag exists
    if image_tag:
        # Extract the source URL of the image
        image_url = image_tag['src']
        # Append the image URL to the 'image_urls' list
        image_urls.append(image_url)
    else:
        # If image tag doesn't exist, append None to indicate no image
        image_urls.append(None)
        
    # Extracting selling price
    price_tag = items.find('div', class_='_30jeq3 _1_WHN1')
    if price_tag:
        price = price_tag.text.strip()
        selling_prices.append(price)
    else:
        selling_prices.append(None)
        
    # Extracting original price
    price_tag = items.find('div', class_='_3I9_wc _27UcVY')
    if price_tag:
        price = price_tag.text.strip()
        original_price.append(price)
    else:
        original_price.append(None)
        
    # Extracting reviews
    review_tag = items.find('span', class_='_1lRcqv')
    if review_tag:
        review = review_tag.text.strip()
        reviews.append(review)
    else:
        reviews.append(None)
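
The loop above repeats the same find-then-check pattern for every field and keeps one list per column. An alternative layout, sketched below, builds one dictionary per product card. The helper names `text_of` and `parse_card` are hypothetical; the class names are the ones used in this notebook, and the sample HTML is hand-written so the example runs offline.

```python
import pandas as pd
from bs4 import BeautifulSoup


def text_of(card, tag, css_class):
    """Return the stripped text of the first matching element, or None."""
    element = card.find(tag, class_=css_class)
    return element.text.strip() if element is not None else None


def parse_card(card):
    """Turn one product card into a dictionary of fields."""
    link = card.find("a")
    image = card.find("img", class_="_396cs4")
    return {
        "Phone_names": text_of(card, "div", "_4rR01T"),
        "Links": link["href"] if link is not None else None,
        "Images": image["src"] if image is not None else None,
        "Selling Prices": text_of(card, "div", "_30jeq3 _1_WHN1"),
        "Reviews": text_of(card, "span", "_1lRcqv"),
    }


# Hand-written sample: the second card has no image, price, or review.
html = """
<div class="_2kHMtA"><a href="/phone-a/p/1"><img class="_396cs4" src="https://example.com/a.jpg"/>
<div class="_4rR01T">Phone A</div><span class="_1lRcqv">4.2</span>
<div class="_30jeq3 _1_WHN1">₹13,990</div></a></div>
<div class="_2kHMtA"><a href="/phone-b/p/2"><div class="_4rR01T">Phone B</div></a></div>
"""
soup = BeautifulSoup(html, "html.parser")
records = [parse_card(card) for card in soup.find_all("div", class_="_2kHMtA")]
df = pd.DataFrame(records)
print(df)
```

Because every record comes from a single card, the columns cannot drift out of alignment when a field is missing.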

In [25]:
print(phone_name[5])         # Print the phone name for the sixth item
print(links[5])              # Print the corresponding link for the sixth item
print(image_urls[5])         # Print the image URL for the sixth item
print(selling_prices[5])     # Print the selling price for the sixth item
print(original_price[5])     # Print the original price for the sixth item
print(reviews[5])            # Print the reviews for the sixth item

Xiaomi 11i 5G (Stealth Black, 128 GB)
https://www.flipkart.com/xiaomi-11i-5g-stealth-black-128-gb/p/itmf5300d828d19f?pid=MOBG9QXPQ2F8KGQD&lid=LSTMOBG9QXPQ2F8KGQDBEHA2F&marketplace=FLIPKART&q=mi+brand+phone&store=tyy%2F4io&srno=s_1_6&otracker=search&otracker1=search&fm=organic&iid=bf3fb93f-d735-43d3-b741-cf21d6caa08a.MOBG9QXPQ2F8KGQD.SEARCH&ppt=None&ppn=None&ssid=9za61uo1hs0000001711295490538&qH=4aa7f10791b88761
https://rukminim2.flixcart.com/image/312/312/ky7lci80/mobile/4/n/d/-original-imagag2gdzpdfsdf.jpeg?q=70
₹22,999
₹31,999
4.2


In [26]:
# Create a dictionary containing phone names, links, images, selling prices, original prices, and reviews
dataframe = {"Phone_names": phone_name, 
             "Links": links, 
             "Images": image_urls, 
             "Selling Prices": selling_prices, 
             "Original Price": original_price,
             "Reviews": reviews}

# Create a DataFrame from the dictionary
final_dataframe = pd.DataFrame(dataframe)

In [28]:
# Save the DataFrame 'final_dataframe' to a CSV file named "reviews.csv"
final_dataframe.to_csv("reviews.csv")

In [29]:
# Read the CSV file into a DataFrame
df = pd.read_csv("reviews.csv")

# Display the first few rows of the DataFrame
df.head()

Unnamed: 0.1,Unnamed: 0,Phone_names,Links,Images,Selling Prices,Original Price,Reviews
0,0,"Mi 11 Lite (Vinyl Black, 128 GB)",https://www.flipkart.com/mi-11-lite-vinyl-blac...,https://rukminim2.flixcart.com/image/312/312/k...,"₹13,990","₹24,999",4.2
1,1,"Mi 11 Lite (Jazz Blue, 128 GB)",https://www.flipkart.com/mi-11-lite-jazz-blue-...,https://rukminim2.flixcart.com/image/312/312/k...,"₹13,949","₹24,999",4.2
2,2,"Mi 11 Lite (Vinyl Black, 128 GB)",https://www.flipkart.com/mi-11-lite-vinyl-blac...,https://rukminim2.flixcart.com/image/312/312/k...,"₹23,999","₹25,999",4.2
3,3,"Xiaomi 11i 5G (Purple Mist, 128 GB)",https://www.flipkart.com/xiaomi-11i-5g-purple-...,https://rukminim2.flixcart.com/image/312/312/k...,"₹22,999","₹31,999",4.2
4,4,"Mi 10T Pro (Cosmic Black, 128 GB)",https://www.flipkart.com/mi-10t-pro-cosmic-bla...,https://rukminim2.flixcart.com/image/312/312/k...,"₹22,999","₹47,999",4.2
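
The `Unnamed: 0` columns in the output above are what pandas produces when a CSV written with its index column is read back without declaring that column as the index. That is one likely explanation here, not something the notebook confirms. The sketch below reproduces the effect on a small stand-in table and shows `index=False` as one way to avoid it.

```python
import io

import pandas as pd

df = pd.DataFrame({"Phone_names": ["Phone A", "Phone B"], "Reviews": [4.2, 4.0]})

# Default behaviour: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False leaves the index out, so the round trip keeps the original columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
clean = pd.read_csv(without_index)

print(cols_with_index)
print(clean.columns.tolist())
```

Reading with `pd.read_csv(path, index_col=0)` is an alternative when the file has already been written with its index.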


In [41]:
# Save the DataFrame 'final_dataframe' to a CSV file named "final_data.csv"
final_dataframe.to_csv("final_data.csv")

# Multiple Pages Webscraping

In [30]:
import requests  # Importing the requests library for making HTTP requests
import pandas as pd  # Importing the pandas library for data manipulation
from bs4 import BeautifulSoup  # Importing BeautifulSoup for web scraping

In [31]:
page_number = input("Enter number of pages: ") # Prompt the user to input the number of pages

# Iterate through the range from 1 to the input number of pages
for i in range(1, int(page_number) + 1):
    
    # Construct the URL based on the current page number
    url = "https://www.flipkart.com/search?q=apple+mobiles&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&as-pos=1&as-type=RECENT&suggestionId=apple+mobiles%7CMobiles&requestId=74513d15-9fcc-41ea-a7e0-16499e689ee3&as-backfill=on&otracker=nmenu_sub_Electronics_0_Apple&page="+str(i)
    
    # Send a GET request to the URL
    req = requests.get(url)
    
    # Parse the HTML content of the response
    content = BeautifulSoup(req.content, 'html.parser')
    #print(content)

Enter number of pages: 4


In [32]:
data = content.find_all('div', {'class': '_2kHMtA'}) # Extract all product cards; 'content' is overwritten on each loop iteration, so this is the last page fetched
data[0] # Access the first element in the 'data' list

<div class="_2kHMtA"><a class="_1fQZEK" href="/apple-iphone-15-plus-pink-512-gb/p/itme9cc36fe09419?pid=MOBGTAGP7JCP6YBB&amp;lid=LSTMOBGTAGP7JCP6YBB3TZGS0&amp;marketplace=FLIPKART&amp;q=apple+mobiles&amp;store=tyy%2F4io&amp;srno=s_4_73&amp;otracker=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&amp;otracker1=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&amp;fm=organic&amp;iid=06d587e5-ee48-4b2c-b39d-571570e8b5c8.MOBGTAGP7JCP6YBB.SEARCH&amp;ppt=None&amp;ppn=None&amp;ssid=42eqnylvq80000001711295584132&amp;qH=cb603b9543d774e1" rel="noopener noreferrer" target="_blank"><div class="MIXNux"><div class="_2QcLo-"><div><div class="CXW8mj" style="height:200px;width:200px"><img alt="Apple iPhone 15 Plus (Pink, 512 GB)" class="_396cs4" loading="eager" src="https://rukminim2.flixcart.com/image/312/312/xif0q/mobile/c/6/r/-original-imagtc6fn8fecysv.jpeg?q=70"/></div></div></div><div class="_3wLduG"><div class="_3PzNI-"><span class="f3A4_V"><label class="_2iDkf8"><input class="_30VH1S" readonly="" type=

In [33]:
page_number = input("Enter number of pages: ") # Prompt the user to input the number of pages

# Iterate through the range from 1 to the input number of pages
for i in range(1, int(page_number) + 1):
    
    # Construct the URL based on the current page number
    url = "https://www.flipkart.com/search?q=apple+mobiles&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&as-pos=1&as-type=RECENT&suggestionId=apple+mobiles%7CMobiles&requestId=74513d15-9fcc-41ea-a7e0-16499e689ee3&as-backfill=on&otracker=nmenu_sub_Electronics_0_Apple&page="+str(i)
    
    # Send a GET request to the URL
    req = requests.get(url)
    
    # Parse the HTML content of the response
    content = BeautifulSoup(req.content, 'html.parser')
    
    # Find all elements with class '_4rR01T' (phone names)
    name = content.find_all('div', {'class': '_4rR01T'})
    
    # Find all elements with class '_30jeq3 _1_WHN1' (selling prices)
    price = content.find_all('div', {'class': '_30jeq3 _1_WHN1'})
    
    # Print the number of phones in the current page
    print("Phone in page "+str(i))
    print(len(name))

Enter number of pages: 4
Phone in page 1
24
Phone in page 2
24
Phone in page 3
24
Phone in page 4
24


In [34]:
page_number = input("Enter number of pages: ")# Prompt the user to input the number of pages

# Iterate through the range from 1 to the input number of pages
for i in range(1, int(page_number) + 1):
    
    # Construct the URL based on the current page number
    url = "https://www.flipkart.com/search?q=apple+mobiles&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&as-pos=1&as-type=RECENT&suggestionId=apple+mobiles%7CMobiles&requestId=74513d15-9fcc-41ea-a7e0-16499e689ee3&as-backfill=on&otracker=nmenu_sub_Electronics_0_Apple&page="+str(i)
    
    # Send a GET request to the URL
    req = requests.get(url)
    
    # Parse the HTML content of the response
    content = BeautifulSoup(req.content, 'html.parser')
    
    # Find all elements with class '_4rR01T' (phone names)
    name = content.find_all('div', {'class': '_4rR01T'})
    
    # Find all elements with class '_30jeq3 _1_WHN1' (selling prices)
    price = content.find_all('div', {'class': '_30jeq3 _1_WHN1'})
    
    # Print the label indicating the current page number
    print("Phone in page "+str(i))
    
    # Print the number of phones in the current page
    print(len(name))

Enter number of pages: 4
Phone in page 1
24
Phone in page 2
24
Phone in page 3
24
Phone in page 4
24


In [35]:
phone_name = [] # Initialize an empty list to store phone names
phone_price = [] # Initialize an empty list to store phone prices

# Prompt the user to input the number of pages
page_number = input("Enter number of pages: ")

# Iterate through the range from 1 to the input number of pages
for i in range(1, int(page_number) + 1):
    
    # Construct the URL based on the current page number
    url = "https://www.flipkart.com/search?q=apple+mobiles&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&as-pos=1&as-type=RECENT&suggestionId=apple+mobiles%7CMobiles&requestId=74513d15-9fcc-41ea-a7e0-16499e689ee3&as-backfill=on&otracker=nmenu_sub_Electronics_0_Apple&page="+str(i)
    
    # Send a GET request to the URL
    req = requests.get(url)
    
    # Parse the HTML content of the response
    content = BeautifulSoup(req.content, 'html.parser')
    
    # Find all elements with class '_4rR01T' (phone names)
    name = content.find_all('div', {'class': '_4rR01T'})
    
    # Find all elements with class '_30jeq3 _1_WHN1' (selling prices)
    price = content.find_all('div', {'class': '_30jeq3 _1_WHN1'})
    
    # Print the label indicating the current page number
    print("Phone in page "+str(i))
    
    # Print the number of phones in the current page
    print(len(name))
    
    # Iterate through each phone name element and append its text to the 'phone_name' list
    # (a separate variable name avoids shadowing the page counter 'i')
    for name_tag in name:
        phone_name.append(name_tag.text)

    # Iterate through each phone price element and append its text to the 'phone_price' list
    for price_tag in price:
        phone_price.append(price_tag.text)
    

Enter number of pages: 4
Phone in page 1
24
Phone in page 2
24
Phone in page 3
24
Phone in page 4
24
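
The multi-page loop can also be wrapped in a function that takes the page downloader as a parameter, which makes it possible to try the parsing logic without hitting the website. This is a sketch under stated assumptions: `scrape_pages`, `fetch_page`, `delay_seconds`, and `fake_fetch` are hypothetical names, and the fake pages are hand-written.

```python
import time

from bs4 import BeautifulSoup


def scrape_pages(fetch_page, page_count, delay_seconds=0.0):
    """Collect phone names and prices from pages 1..page_count.

    fetch_page is any callable that takes a page number and returns HTML.
    """
    names, prices = [], []
    for page in range(1, page_count + 1):
        soup = BeautifulSoup(fetch_page(page), "html.parser")
        for name_tag in soup.find_all("div", class_="_4rR01T"):
            names.append(name_tag.text)
        for price_tag in soup.find_all("div", class_="_30jeq3 _1_WHN1"):
            prices.append(price_tag.text)
        # Optional pause between requests to avoid hammering the server.
        if delay_seconds:
            time.sleep(delay_seconds)
    return names, prices


def fake_fetch(page):
    """Stand-in downloader that returns one product per page."""
    return (
        f'<div class="_4rR01T">Phone {page}</div>'
        f'<div class="_30jeq3 _1_WHN1">₹{page}00</div>'
    )


names, prices = scrape_pages(fake_fetch, 3)
print(names)
print(prices)
```

For real use, `fetch_page` could be a small function that calls `requests.get` on the search URL with the page number appended and returns `req.content`.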


In [36]:
# Iterate through each item in the 'phone_price' list and print each item (representing the price of a phone)
for i in phone_price:
    print(i)

₹67,999
₹67,999
₹56,999
₹56,999
₹67,999
₹56,999
₹56,999
₹52,999
₹52,999
₹68,999
₹52,999
₹66,999
₹52,999
₹52,999
₹56,999
₹66,999
₹66,999
₹82,999
₹66,999
₹82,999
₹96,999
₹80,999
₹82,999
₹80,999
₹68,999
₹1,27,990
₹1,58,990
₹66,999
₹80,999
₹82,999
₹1,27,990
₹96,999
₹1,37,990
₹76,999
₹1,58,990
₹80,999
₹92,999
₹82,999
₹1,37,990
₹86,999
₹44,999
₹69,999
₹57,999
₹44,999
₹44,999
₹43,900
₹76,999
₹69,999
₹96,999
₹1,48,900
₹92,999
₹43,900
₹92,999
₹1,12,999
₹1,37,990
₹44,999
₹43,900
₹1,48,900
₹48,900
₹1,77,990
₹1,48,900
₹50,999
₹1,27,990
₹1,77,990
₹86,999
₹69,999
₹80,999
₹69,999
₹92,999
₹1,58,990
₹1,77,990
₹1,12,999
₹76,999
₹1,77,990
₹96,999
₹97,999
₹86,999
₹59,900
₹97,999
₹59,900
₹97,999
₹60,999
₹43,900
₹1,79,900
₹48,900
₹1,79,900
₹1,12,999
₹1,48,900
₹64,900
₹43,900
₹48,900
₹48,999
₹1,12,999
₹48,900
₹1,37,990
₹60,999


In [37]:
# Iterate through each item in the 'phone_name' list and print each item
for i in phone_name:
    print(i)

Apple iPhone 15 (Blue, 128 GB)
Apple iPhone 15 (Green, 128 GB)
Apple iPhone 14 (Blue, 128 GB)
Apple iPhone 14 (Starlight, 128 GB)
Apple iPhone 15 (Black, 128 GB)
Apple iPhone 14 (Purple, 128 GB)
Apple iPhone 14 (Midnight, 128 GB)
Apple iPhone 13 (Pink, 128 GB)
Apple iPhone 13 (Starlight, 128 GB)
Apple iPhone 15 (Pink, 128 GB)
Apple iPhone 13 (Green, 128 GB)
Apple iPhone 14 Plus (Midnight, 128 GB)
Apple iPhone 13 (Midnight, 128 GB)
Apple iPhone 13 (Blue, 128 GB)
Apple iPhone 14 ((PRODUCT)RED, 128 GB)
Apple iPhone 14 Plus (Starlight, 128 GB)
Apple iPhone 14 Plus (Blue, 128 GB)
Apple iPhone 15 Plus (Green, 128 GB)
Apple iPhone 14 Plus (Purple, 128 GB)
Apple iPhone 15 Plus (Yellow, 128 GB)
Apple iPhone 14 Plus (Yellow, 512 GB)
Apple iPhone 15 (Pink, 256 GB)
Apple iPhone 15 Plus (Black, 128 GB)
Apple iPhone 15 (Green, 256 GB)
Apple iPhone 15 (Yellow, 128 GB)
Apple iPhone 15 Pro (White Titanium, 128 GB)
Apple iPhone 15 Pro (Black Titanium, 512 GB)
Apple iPhone 14 Plus (Yellow, 128 GB)
Apple 

In [38]:
# Create a dictionary containing phone names and prices
data = {"Phone_name":phone_name,
       "Phone_Price": phone_price}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Print the DataFrame
print(df)

                                        Phone_name Phone_Price
0                   Apple iPhone 15 (Blue, 128 GB)     ₹67,999
1                  Apple iPhone 15 (Green, 128 GB)     ₹67,999
2                   Apple iPhone 14 (Blue, 128 GB)     ₹56,999
3              Apple iPhone 14 (Starlight, 128 GB)     ₹56,999
4                  Apple iPhone 15 (Black, 128 GB)     ₹67,999
..                                             ...         ...
91             Apple iPhone 6 Plus (Silver, 64 GB)     ₹48,999
92           Apple iPhone 15 Plus (Yellow, 512 GB)   ₹1,12,999
93                 Apple iPhone 11 (Green, 128 GB)     ₹48,900
94  Apple iPhone 15 Pro (Natural Titanium, 256 GB)   ₹1,37,990
95                Apple iPhone 12 (Purple, 256 GB)     ₹60,999

[96 rows x 2 columns]
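
Since the phone names follow a `Model (Colour, NNN GB)` pattern, the colour and storage size can be split into their own columns for later analysis. The sketch below uses `str.extract` with named groups on a few stand-in rows. The new column names and the regular expression are assumptions; names that do not match the pattern get missing values.

```python
import pandas as pd

# Stand-in rows in the same format as the scraped names.
df = pd.DataFrame({
    "Phone_name": [
        "Apple iPhone 15 (Blue, 128 GB)",
        "Apple iPhone 14 Plus (Yellow, 512 GB)",
        "Peace Mi2",
    ]
})

# Named groups become the column names of the extracted frame.
parts = df["Phone_name"].str.extract(r"\((?P<color>.+), (?P<storage_gb>\d+) GB\)")

df["Color"] = parts["color"]
# Convert the captured digits to numbers; rows without a match become NaN.
df["Storage_GB"] = pd.to_numeric(parts["storage_gb"])
print(df)
```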
