# Introduction to Mobile Data Web Scraping

This notebook uses Python to scrape mobile phone data from the e-commerce platform **PriceOye.pk**. The goal is to extract relevant product information, such as name, price, ratings, and specifications, and to build a structured dataset for analysis.

## Libraries used

- requests: fetches the HTML content from the website.
- BeautifulSoup (from bs4): parses the HTML and extracts specific elements such as product details.
- pandas: organizes the data into a DataFrame, making it easier to analyze or export for further use.
- re: extracts numeric values from text with regular expressions.

The process starts by constructing the URL of each listing page (pages 1 to 23) and collecting the link to every product shown there. The program then requests each product page and extracts the name, price, rating, release date, weight, memory, RAM, screen size, battery, and camera specifications. The results are stored in a pandas DataFrame for further analysis, research, business insights, or product comparison.
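
The URL-construction step can be sketched on its own, without any network access. This is a minimal illustration, assuming the listing pages are addressed by a `page` query parameter as in the notebook's base URL; the helper name `build_page_urls` is hypothetical and not part of the notebook.

```python
def build_page_urls(base_url, first_page, last_page):
    """Return the listing URLs for pages first_page..last_page (inclusive)."""
    return [f"{base_url}{page_num}" for page_num in range(first_page, last_page + 1)]


# Assumption: same base URL pattern as the notebook, pages 1 to 23
urls = build_page_urls("https://priceoye.pk/mobiles?page=", 1, 23)
print(urls[0])    # https://priceoye.pk/mobiles?page=1
print(len(urls))  # 23
```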

In [12]:
# libraries
import re
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [14]:
# Base URL without the page number
base_url = "https://priceoye.pk/mobiles?page="

link = []  # Product page URLs collected from every listing page

for page_num in range(1, 24):
    url = base_url + str(page_num)  # Construct the full URL for each page
    page = requests.get(url)

    # Parse the page content
    suop_1 = BeautifulSoup(page.text, "html.parser")

    # Find all divs with the class 'productBox b-productBox'
    p = suop_1.find_all("div", class_="productBox b-productBox")

    # Loop through each div and extract the href from its <a> tags
    for product in p:
        a_tags = product.find_all("a", href=True)  # <a> tags that carry an href

        # Keep one link per product: the last <a> found in the box.
        # Boxes without any link are skipped instead of reusing a stale href.
        if a_tags:
            link.append(a_tags[-1]["href"])
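
Listing pages can repeat a product or use relative links, which the loop above would pass through unchanged. The sketch below shows one way to normalise the collected links. It is an illustration only: `normalise_links` is a hypothetical helper, and whether PriceOye.pk actually returns relative or duplicate links is an assumption that should be checked against the scraped data.

```python
from urllib.parse import urljoin


def normalise_links(links, site_root="https://priceoye.pk/"):
    """Make links absolute and drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for href in links:
        if not href:  # Skip None or empty hrefs
            continue
        absolute = urljoin(site_root, href)  # Absolute URLs are returned unchanged
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


sample = [
    "https://priceoye.pk/mobiles/gfive/gfive-music-100",
    "/mobiles/gfive/gfive-music-100",  # Relative form of the same page
    None,
]
print(normalise_links(sample))
# ['https://priceoye.pk/mobiles/gfive/gfive-music-100']
```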

In [16]:
# Lists to store data
List_name = []
List_price = []
List_rating = []
List_release_date = []
List_weight = []
List_Memory = []
List_Ram = []
List_Screen_size = []
List_Battery = []
List_Front_Camera = []
List_Back_Camera = []

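# Iterate over every product page.
# The extractor functions called here (Name, Price, ...) are defined in cell In [15],
# which has to be run before this cell.
for i in link:
    page = requests.get(i)

    # Parse the page content
    suop_2 = BeautifulSoup(page.text, "html.parser")

    # Call each extractor function and append its output to the matching list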
    List_name.append(Name(suop_2))
    List_price.append(Price(suop_2))
    List_rating.append(Rating(suop_2))
    List_release_date.append(Extract_release_date(suop_2))
    List_weight.append(Weight_phone(suop_2))
    List_Memory.append(Memory(suop_2))
    List_Ram.append(Ram_Size(suop_2))
    List_Screen_size.append(Screen_Size(suop_2))
    List_Battery.append(Battery_Size(suop_2))
    List_Front_Camera.append(Front_Camera(suop_2))
    List_Back_Camera.append(Back_Camera(suop_2))

# Build the DataFrame
mobile_data = {
    "Name": List_name,
    "Price": List_price,
    "Rating": List_rating,
    "Url": link,
    "Release_Date": List_release_date,
    "Weight": List_weight,
    "Memory": List_Memory,
    "Ram": List_Ram,
    "Screen_size": List_Screen_size,
    "Battery_size": List_Battery,
    "Front_Camera": List_Front_Camera,
    "Back_Camera": List_Back_Camera,
}

mobile_data = pd.DataFrame(mobile_data)
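
Because each column is filled from a separate list, a single extractor that raises an exception stops the whole run. One possible alternative, sketched here under the assumption that a missing field should simply become an empty string, is to build one dictionary per product. `safe_extract` and `build_record` are hypothetical names, and the two extractors in the example are stand-ins rather than the notebook's functions.

```python
def safe_extract(extractor, page, default=""):
    """Call extractor(page); return default if the element is missing."""
    try:
        return extractor(page)
    except (AttributeError, IndexError):
        # Raised when a lookup returned None or a regex matched nothing
        return default


def build_record(page, url, extractors):
    """Build one row (a dict) for a product page."""
    record = {"Url": url}
    for column, extractor in extractors.items():
        record[column] = safe_extract(extractor, page)
    return record


# Stand-in extractors working on a plain dict instead of parsed HTML
extractors = {
    "Name": lambda page: page["name"].strip(),
    "Weight": lambda page: page["weight"].strip(),  # Fails when weight is None
}
page = {"name": " Gfive Music 100 ", "weight": None}
row = build_record(page, "https://priceoye.pk/mobiles/gfive/gfive-music-100", extractors)
print(row["Name"])    # Gfive Music 100
print(row["Weight"])  # prints an empty line
```

A list of such rows can be passed directly to `pd.DataFrame(rows)`, which keeps every field of a product in the same row by construction.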


In [15]:
def Name(suop_2):
    """
    Extract the name of the product from the HTML content.

    This function looks for the product title within a specified class and extracts the text
    inside an <h3> tag with the class 'h2'. The result is stripped of extra whitespace.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    name (str): The product's name as a string.
    """
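    product_class = suop_2.find("div", class_="product-title")
    h3_tag = product_class.find("h3", class_="h2") if product_class else None
    # Return an empty string when the title block is missing
    name = h3_tag.text.strip() if h3_tag else ""
    return name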

def Price(suop_2):
    """
    Extract the price of the product from the HTML content.

    The function looks for a span with the class 'summary-price' and extracts the price
    as text, then strips any extra spaces or formatting.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    price_phone (str): The price of the product as a string, or an empty string if no price is found.
    """
    price = suop_2.find("span", class_="summary-price text-black price-size-lg bold")
    if price:
        price_phone = price.text.strip()
        return price_phone
    else:
        # Handle cases where price is not found
        # You can return a default value, skip the item, or raise an exception
        # Here, we return an empty string
        return ""

def Rating(suop_2):
    """
    Extract the rating of the product from the HTML content.

    The function searches for a div with the class 'rating-count' that contains the rating value.
    It uses regular expressions (regex) to extract numeric values from the text.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

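    Returns:
    rating_numbers[0] (str): The first numeric value extracted. Because the div carries the
    'rating-count' class, this is most likely the number of ratings rather than a score.
    An empty string is returned if no number is found.
    """
    rating_div = suop_2.find("div", class_="semi-bold rating-count")
    if rating_div is None:
        return ""
    rating_text = rating_div.text.strip()
    rating_numbers = re.findall(r'\d+', rating_text)  # Extract runs of digits using regex
    return rating_numbers[0] if rating_numbers else ""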

def Extract_release_date(suop_2):
    """
    Extract the release date of the product from the HTML content.

    The function finds the table header (th) tag with the string "Release Date" and retrieves
    the corresponding date from the next table data (td) tag.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    release_date (str): The product's release date as a string.
    """
    th_tag = suop_2.find("th", string="Release Date")
    td_tag = th_tag.find_next_sibling("td")
    release_date = td_tag.text.strip()
    return release_date

def Weight_phone(suop_2):
    """
    Extract the weight of the phone from the HTML content.

    Similar to the release date, this function looks for the table header (th) with "Phone Weight"
    and retrieves the weight from the next table data (td) tag.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    weight (str): The phone's weight as a string.
    """
    th_tag = suop_2.find("th", string="Phone Weight")
    td_tag = th_tag.find_next_sibling("td")
    weight = td_tag.text.strip()
    return weight

def Memory(suop_2):
    """
    Extract the internal memory of the product from the HTML content.

    This function finds the memory details by locating the "Internal Memory" table row
    and retrieving the value from the corresponding table data.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    memory (str): The internal memory of the product as a string.
    """
    th_tag = suop_2.find("th", string="Internal Memory")
    td_tag = th_tag.find_next_sibling("td")
    memory = td_tag.text.strip()  # Lowercase name avoids shadowing the function
    return memory

def Ram_Size(suop_2):
    """
    Extract the RAM size of the product from the HTML content.

    Finds the "RAM" row and retrieves the size from the next table data tag.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    size (str): The RAM size as a string.
    """
    th_tag = suop_2.find("th", string="RAM")
    td_tag = th_tag.find_next_sibling("td")
    size = td_tag.text.strip()
    return size

def Screen_Size(suop_2):
    """
    Extract the screen size of the product from the HTML content.

    The function finds the "Screen Size" row and retrieves the corresponding value from the
    next table data (td) tag.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    size (str): The screen size as a string.
    """
    th_tag = suop_2.find("th", string="Screen Size")
    td_tag = th_tag.find_next_sibling("td")
    size = td_tag.text.strip()
    return size

def Battery_Size(suop_2):
    """
    Extract the battery size/type of the product from the HTML content.

    This function locates the "Type" row in the table and retrieves the value from the corresponding
    table data tag.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    size (str): The battery type/size as a string.
    """
    th_tag = suop_2.find("th", string="Type")
    td_tag = th_tag.find_next_sibling("td")
    size = td_tag.text.strip()
    return size

def Front_Camera(suop_2):
    """
    Extract the front camera specifications of the product from the HTML content.

    The function finds the "Front Camera" row and retrieves the value from the next table data (td) tag.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    size (str): The front camera specifications as a string.
    """
    th_tag = suop_2.find("th", string="Front Camera")
    td_tag = th_tag.find_next_sibling("td")
    size = td_tag.text.strip()
    return size

def Back_Camera(suop_2):
    """
    Extract the back camera specifications of the product from the HTML content.

    Similar to the front camera, this function finds the "Back Camera" row and retrieves the value
    from the corresponding table data tag.

    Parameters:
    suop_2 (BeautifulSoup object): Parsed HTML content of the product page.

    Returns:
    size (str): The back camera specifications as a string.
    """
    th_tag = suop_2.find("th", string="Back Camera")
    td_tag = th_tag.find_next_sibling("td")
    size = td_tag.text.strip()
    return size
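
The eight table-based extractors above differ only in the label they search for, and each of them raises an `AttributeError` when that label is absent from a page. A generic helper is one way to remove the repetition and handle the missing case. The sketch below is an illustration, not the notebook's method: `spec_value` is a hypothetical name and the HTML fragment is made up, so the real page layout may differ.

```python
from bs4 import BeautifulSoup


def spec_value(soup, label, default=""):
    """Return the text of the <td> that follows the <th> whose text is label."""
    th_tag = soup.find("th", string=label)
    if th_tag is None:  # Label not present on this page
        return default
    td_tag = th_tag.find_next_sibling("td")
    return td_tag.text.strip() if td_tag else default


# Made-up HTML fragment for illustration only
html = """
<table>
  <tr><th>RAM</th><td> 8 GB </td></tr>
  <tr><th>Screen Size</th><td>6.78 inches</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
print(spec_value(soup, "RAM"))           # 8 GB
print(spec_value(soup, "Phone Weight"))  # prints an empty line
```

With such a helper, a call like `Ram_Size(suop_2)` would reduce to `spec_value(suop_2, "RAM")`.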


In [17]:
mobile_data

| | Name | Price | Rating | Url | Release_Date | Weight | Memory | Ram | Screen_size | Battery_size | Front_Camera | Back_Camera |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Infinix Note 40 | Rs 46,999 | 156 | https://priceoye.pk/mobiles/infinix/infinix-no... | 18 Mar 2024 | 190 g | 256 GB | 8 GB | 6.78 inches | 5000 mAh | 32 MP | 108 MP OIS |
| 1 | Xiaomi Redmi Note 13 | Rs 44,499 | 362 | https://priceoye.pk/mobiles/xiaomi/xiaomi-redm... | 26 Jan 2024 | 188.5g | 128GB, 256GB | 6GB, 8GB (Expandable storage up to 1TB) | 6.67 Inches | 5000mAh | 16MP | 108MP+8MP+2MP |
| 2 | AllCall Shine 12 Pro | Rs 20,199 | 1 | https://priceoye.pk/mobiles/all-call/allcall-s... | 29 Feb 2024 | | 128GB | 4GB | 6.52 | 4500 mAh | 5MP | 13Mp |
| 3 | Infinix Zero 40 4G | Rs 69,999 | 1 | https://priceoye.pk/mobiles/infinix/infinix-ze... | 21 Sept 2024 | 180 g | 256GB | 8GB + 8GB Extended RAM | 6.78 Inches | 5000mAh | 50 MP | 108 MP + 50MP + 2 MP |
| 4 | VGO TEL Note 23 | Rs 27,199 | 286 | https://priceoye.pk/mobiles/vgo-tel/vgo-tel-no... | 14 Nov 2023 | | 256 GB | 8 GB + 8 GB Extended | 6.78 Inches | 5000 mAh | 16 MP | 50 MP Triple Ai |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 778 | XMobile X6 Lite | | 1 | https://priceoye.pk/mobiles/xmobile/xmobile-x6... | 25 Jan 2024 | | Standard | Standard | 2.4 | 3000mAh | No | No |
| 779 | E-tachi B13 Vip | | 1 | https://priceoye.pk/mobiles/e-tachi/e-tachi-b1... | 14 Jun 2024 | | Standard | Standard | 1.8 inch | 3200mah | No | Yes |
| 780 | Gfive Music 100 | | 1 | https://priceoye.pk/mobiles/gfive/gfive-music-100 | 26 Aug 2024 | | | | 1.8 inches | 3000 mAh | | 0.08MP |
| 781 | E-Tachi E7 pro | | 1 | https://priceoye.pk/mobiles/e-tachi/e-tachi-e7... | 07 Jun 2024 | 205g | 128GB , 256GB | 6GB RAM , 8GB RAM | 6.79 inches | 5030 mAh, 33W wired charging | 13 MP, f/2.5, (wide) | 108 MP, f/1.8, (wide), 1/1.67 |
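
The scraped columns are free-form strings such as `Rs 46,999` or `5000mAh`, so numeric analysis needs a parsing step first. The following is a rough sketch using only the `re` module. `parse_price` and `first_number` are hypothetical helpers that expect a string or `None`, and treating the first number in a cell as the value of interest is an assumption that will not suit every cell (for example `128GB, 256GB`).

```python
import re


def parse_price(text):
    """Convert a price such as 'Rs 46,999' to an int; return None if empty."""
    digits = re.sub(r"[^\d]", "", text or "")  # Keep digits only
    return int(digits) if digits else None


def first_number(text):
    """Return the first number in text as a float, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


print(parse_price("Rs 46,999"))     # 46999
print(parse_price(""))              # None
print(first_number("6.78 inches"))  # 6.78
print(first_number("5000mAh"))      # 5000.0
print(first_number("Standard"))     # None
```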
