# End-to-End Web Scraping & Analysis Project: Flipkart iPhone 16

## Project Objective
The goal of this project is to extract detailed product data of the iPhone 16 from Flipkart using web scraping techniques, clean and organize the data using Python, and analyze it through visualizations to derive valuable business insights.

---

## Workflow Overview
`Data Collection → Data Understanding → Data Cleaning → Data Visualization → Insights`


---

## 1. Data Collection (Web Scraping)

We used Python libraries like `requests` and `BeautifulSoup` to scrape product details of iPhone 16 models from multiple pages on Flipkart.
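As a rough sketch of the "multiple pages" part, the per-page search URLs could be generated up front. The helper name `build_search_urls` is hypothetical; the endpoint and the `q` and `page` query parameters are the ones used in the scraping cell of this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.flipkart.com/search"  # same endpoint the notebook requests


def build_search_urls(query, first_page=1, last_page=5):
    """Return one search URL per result page (hypothetical helper)."""
    urls = []
    for page in range(first_page, last_page + 1):
        # urlencode escapes the query string, e.g. spaces become '+'
        params = urlencode({"q": query, "page": page})
        urls.append(f"{BASE_URL}?{params}")
    return urls


urls = build_search_urls("iphone 16")
print(urls[0])
```

Building the query string with `urlencode` rather than by hand keeps the URL valid if the search term later contains characters that need escaping.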

### Key Tasks:
- Sent HTTP GET requests to Flipkart search result pages.
- Parsed the HTML response to extract:
  - Product Name
  - ROM / Storage
  - Display Type
  - Camera Specs
  - Processor Details
  - Warranty Info
  - Ratings
  - Number of Reviews
  - Price
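One way to keep the extracted fields tied to the right product is to build a record per product instead of one flat list per field. The sketch below assumes the feature lines have already been grouped per product, which the notebook cell does not do (it collects them per page); `classify_feature` and `build_row` are hypothetical names, and the keyword rules mirror the ones used in the scraping cell.

```python
FIELDS = ["ROM", "Display", "Camera", "Processor", "Warranty"]


def classify_feature(text):
    """Map one feature line to a column name, or None if unrecognized."""
    if "ROM" in text:
        return "ROM"
    if "inch" in text or "Display" in text:
        return "Display"
    if "Camera" in text:
        return "Camera"
    if "Processor" in text or "Chip" in text:
        return "Processor"
    if "warranty" in text.lower():
        return "Warranty"
    return None


def build_row(name, features):
    """Build one record per product; fields that are not found stay None."""
    row = {"Product Name": name, **{field: None for field in FIELDS}}
    for text in features:
        field = classify_feature(text)
        # Keep the first match for each field and ignore unrecognized lines
        if field is not None and row[field] is None:
            row[field] = text
    return row


row = build_row(
    "Apple iPhone 16 (Black, 128 GB)",
    ["128 GB ROM", "48MP + 12MP | 12MP Front Camera"],
)
print(row["ROM"], row["Display"])
```

A list of such rows can be passed directly to `pd.DataFrame`, so a missing field becomes an empty cell in that product's row rather than shifting every later value up.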

---

## 2. Data Understanding

Once the raw data was scraped:
- We reviewed the structure of the data.
- Checked for missing fields, inconsistent formats, and irregular symbols (e.g., `\xa0`, `&`).
- Verified that all lists (columns) were of equal length and properly aligned.
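The checks above could be sketched as a small helper that reports each column's length and counts values still containing a non-breaking space. `check_columns` is a hypothetical name, and the sample rating string is an assumption about the raw format, inferred from the regex in the scraping cell.

```python
def check_columns(columns):
    """Return (lengths, nbsp_counts, aligned) for a dict of column lists."""
    lengths = {name: len(values) for name, values in columns.items()}
    # Count string values that still contain a non-breaking space
    nbsp_counts = {
        name: sum(1 for v in values if isinstance(v, str) and "\xa0" in v)
        for name, values in columns.items()
    }
    # Columns are "aligned" here only in the sense of having equal length
    aligned = len(set(lengths.values())) <= 1
    return lengths, nbsp_counts, aligned


lengths, nbsp_counts, aligned = check_columns({
    "Product Name": [
        "Apple iPhone 16 (Black, 128 GB)",
        "Apple iPhone 16 (White, 128 GB)",
    ],
    "Reviews": ["19,106 Ratings\xa0&\xa0793 Reviews"],  # assumed raw format
})
print(lengths, aligned)
```

Equal lengths are necessary but not sufficient: two columns can have the same length and still pair a value with the wrong product.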

---

## 3. Data Cleaning (Using Pandas & Regex)

We used the `pandas` library to structure the data into a DataFrame and applied regular expressions for text cleaning.

### Cleaning Steps:
- Removed unwanted characters and spaces.
- Extracted rating, review count, and price using regex.
- Replaced missing or invalid entries with `NaN` for consistency.
- Converted data types (e.g., strings to floats/integers where required).
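A minimal sketch of the extraction and type-conversion steps, using only the standard `re` module. The function names are hypothetical, the rating string follows the format implied by the regex in the scraping cell, and the price string is an illustrative input format rather than a scraped value.

```python
import re

COUNTS_RE = re.compile(r"([\d,]+)\s*Ratings\s*&\s*([\d,]+)\s*Reviews")


def to_int(text):
    """Drop every non-digit character and convert; None if no digits remain."""
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else None


def parse_counts(raw):
    """Return (ratings, reviews) as ints, or (None, None) if there is no match."""
    cleaned = raw.replace("\xa0", " ").strip()
    match = COUNTS_RE.search(cleaned)
    if not match:
        return None, None
    return to_int(match.group(1)), to_int(match.group(2))


print(parse_counts("19,106 Ratings\xa0&\xa0793 Reviews"))
print(to_int("₹79,900"))  # illustrative price string, not scraped data
```

`to_int` is meant for whole-number fields such as counts and prices; a decimal value like a star rating would need `float` instead. On a DataFrame, the same functions could be applied column-wise with `Series.map`.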

---

## 4. Data Visualization (EDA)

We used `matplotlib` and `seaborn` for creating plots to better understand and analyze the data.

### Univariate Analysis:
- Distribution of product ratings
- Most common ROM variants
- Frequency of different price points
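The frequency count behind a "most common ROM variants" chart can be sketched with the standard library. The values below are the ROM entries from the five rows shown in the notebook's `df.head()` output, not the full dataset.

```python
from collections import Counter

# ROM values from the five rows shown in the notebook's df.head() output
rom_values = ["128 GB ROM", "128 GB ROM", "128 GB ROM", "256 GB ROM", "256 GB ROM"]

rom_counts = Counter(rom_values)
for variant, count in rom_counts.most_common():
    print(f"{variant}: {count}")
```

With the data in a DataFrame, `df["ROM"].value_counts()` gives the same tally, and `seaborn.countplot` can draw it as a bar chart.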

### Bivariate Analysis:
- Relationship between ratings and number of reviews
- Price vs rating comparison

### Multivariate Analysis:
- Combined analysis of rating, price, and storage
- Heatmaps to explore correlations
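A correlation heatmap is a color-coded view of a pairwise correlation matrix. The sketch below computes that matrix in plain Python to show what the heatmap displays; the function names are hypothetical and the columns `a`, `b`, and `c` are placeholder numbers, not project data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of numeric columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Placeholder columns chosen so the result is easy to verify by hand
matrix = correlation_matrix({"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]})
print(matrix["a"]["b"], matrix["a"]["c"])
```

In practice `DataFrame.corr()` computes this matrix and `seaborn.heatmap` renders it. Note that a column with a single repeated value, as Ratings is in the five rows shown in the notebook output, has no defined correlation with anything.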

---

## 5. Insights & Observations

From our analysis:
- Some variants of iPhone 16 consistently received higher ratings.
- Products with more reviews generally had higher visibility and better ratings.
- Price differences were observed across different storage configurations.

---

## Final Output

- A clean and structured dataset saved as:  
  `Flipkart_iPhone16_Specs.csv`
- Visual charts and graphs for insights
- Ready for use in dashboards, presentations, or machine learning models

---

## Tools & Libraries Used
- Python
- Requests (for HTTP requests)
- BeautifulSoup (for HTML parsing)
- Pandas (for data manipulation)
- Regex (for text extraction)
- Matplotlib & Seaborn (for data visualization)

---



In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
import time

In [2]:

# Request headers to mimic a real browser
request_header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0'
}

# Lists to store data
product_name = []
ratings = []
reviews = []
rom_list = []
display_list = []
camera_list = []
processor_list = []
warranty_list = []

# Loop through multiple pages
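for page in range(1, 6):  # You can increase the range for more pages
    url = f"https://www.flipkart.com/search?q=iphone+16&page={page}"
    # A timeout keeps the loop from hanging on an unresponsive connection
    response = requests.get(url, headers=request_header, timeout=10)
    print(f"Page {page} Status Code:", response.status_code)
    time.sleep(1)  # Brief pause between requests to avoid hammering the server

    # Skip pages that did not load successfully
    if response.status_code != 200:
        continue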

    soup = BeautifulSoup(response.text, "html.parser")

    # Extract product names
    for i in soup.find_all("div", class_="KzDlHZ"):
        product_name.append(i.text.strip())

    # Extract ratings and reviews
    for rating in soup.find_all('span', class_="Wphh3N"):
        raw_text = rating.text.replace("\xa0", "").strip()
        # Allow thousands separators in both counts (e.g. "1,234 Reviews")
        match = re.search(r"([\d,]+)\s*Ratings\s*&\s*([\d,]+)\s*Reviews", raw_text)
        if match:
            ratings.append(match.group(1))
            reviews.append(match.group(2))
        else:
            ratings.append(None)
            reviews.append(None)

    # Extract features
    tags = soup.find_all("li", class_="J+igdf")

    for item in tags:
        text = item.get_text(strip=True)

        if "ROM" in text:
            rom_list.append(text)
        elif "inch" in text or "Display" in text:
            display_list.append(text)
        elif "Camera" in text:
            camera_list.append(text)
        elif "Processor" in text or "Chip" in text:
            processor_list.append(text)
        elif "warranty" in text.lower():
            warranty_list.append(text)

# Normalize every list to the number of products: pad short lists with None
# and truncate long ones so the DataFrame constructor does not raise.
# Note: this only equalizes lengths; it does not guarantee that a value
# ends up in the same row as the product it was scraped from.
max_len = len(product_name)

def fit_length(values, n=max_len):
    return (values + [None] * (n - len(values)))[:n]

ratings = fit_length(ratings)
reviews = fit_length(reviews)
rom_list = fit_length(rom_list)
display_list = fit_length(display_list)
camera_list = fit_length(camera_list)
processor_list = fit_length(processor_list)
warranty_list = fit_length(warranty_list)

# Create DataFrame
df = pd.DataFrame({
    "Product Name": product_name,
    "Ratings": ratings,
    "Reviews": reviews,
    "ROM": rom_list,
    "Display": display_list,
    "Camera": camera_list,
    "Processor": processor_list,
    "Warranty": warranty_list
})

# Save to CSV
df.to_csv("flipkart_iphone_data.csv", index=False)

print(df.head())


Page 1 Status Code: 200
Page 2 Status Code: 200
Page 3 Status Code: 200
Page 4 Status Code: 200
Page 5 Status Code: 200
                            Product Name Ratings Reviews         ROM  \
0  Apple iPhone 16 (Ultramarine, 128 GB)  19,106     793  128 GB ROM   
1        Apple iPhone 16 (Black, 128 GB)  19,106     793  128 GB ROM   
2        Apple iPhone 16 (White, 128 GB)  19,106     793  128 GB ROM   
3        Apple iPhone 16 (White, 256 GB)  19,106     793  256 GB ROM   
4        Apple iPhone 16 (Black, 256 GB)  19,106     793  256 GB ROM   

                                        Display  \
0  15.49 cm (6.1 inch) Super Retina XDR Display   
1  15.49 cm (6.1 inch) Super Retina XDR Display   
2  15.49 cm (6.1 inch) Super Retina XDR Display   
3  15.49 cm (6.1 inch) Super Retina XDR Display   
4  15.49 cm (6.1 inch) Super Retina XDR Display   

                            Camera                             Processor  \
0  48MP + 12MP | 12MP Front Camera  A18 Chip, 6 Core Processor P

In [3]:
df.head()

|   | Product Name | Ratings | Reviews | ROM | Display | Camera | Processor | Warranty |
|---|---|---|---|---|---|---|---|---|
| 0 | Apple iPhone 16 (Ultramarine, 128 GB) | 19106 | 793 | 128 GB ROM | 15.49 cm (6.1 inch) Super Retina XDR Display | 48MP + 12MP \| 12MP Front Camera | A18 Chip, 6 Core Processor Processor | 1 year warranty for phone and 1 year warranty ... |
| 1 | Apple iPhone 16 (Black, 128 GB) | 19106 | 793 | 128 GB ROM | 15.49 cm (6.1 inch) Super Retina XDR Display | 48MP + 12MP \| 12MP Front Camera | A18 Chip, 6 Core Processor Processor | 1 year warranty for phone and 1 year warranty ... |
| 2 | Apple iPhone 16 (White, 128 GB) | 19106 | 793 | 128 GB ROM | 15.49 cm (6.1 inch) Super Retina XDR Display | 48MP + 12MP \| 12MP Front Camera | A18 Chip, 6 Core Processor Processor | 1 year warranty for phone and 1 year warranty ... |
| 3 | Apple iPhone 16 (White, 256 GB) | 19106 | 793 | 256 GB ROM | 15.49 cm (6.1 inch) Super Retina XDR Display | 48MP + 12MP \| 12MP Front Camera | A18 Chip, 6 Core Processor Processor | 1 year warranty for phone and 1 year warranty ... |
| 4 | Apple iPhone 16 (Black, 256 GB) | 19106 | 793 | 256 GB ROM | 15.49 cm (6.1 inch) Super Retina XDR Display | 48MP + 12MP \| 12MP Front Camera | A18 Chip, 6 Core Processor Processor | 1 year warranty for phone and 1 year warranty ... |
