# Exploratory Data Analysis (EDA)
## Mobile Phones Dataset

### About Dataset

This dataset contains detailed specifications and official launch prices of various mobile phone models from different companies. It provides insights into smartphone hardware, pricing trends, and brand competitiveness across multiple countries. The dataset includes key features such as RAM, camera specifications, battery capacity, processor details, and screen size.

Pricing is a central part of this dataset. Each recorded price is the official launch price at the time the phone was first introduced in that market. Prices vary by country and launch period: older models carry their original launch prices, and newer models carry their most recent ones. This makes the dataset useful for studying price trends over time and for comparing smartphone affordability across regions.

**Features:**
- **Company Name**: The brand or manufacturer of the mobile phone.
- **Model Name**: The specific model of the smartphone.
- **Mobile Weight**: The weight of the mobile phone (in grams).
- **RAM**: The amount of Random Access Memory (RAM) in the device (in GB).
- **Front Camera**: The resolution of the front (selfie) camera (in MP).
- **Back Camera**: The resolution of the primary rear camera (in MP).
- **Processor**: The chipset or processor used in the device.
- **Battery Capacity**: The battery size of the smartphone (in mAh).
- **Screen Size**: The display size of the smartphone (in inches).
- **Launched Price (Pakistan, India, China, USA, Dubai)**: The official launch price of the mobile in the respective country at the time of its release. Prices vary based on the year the mobile was launched.
- **Launched Year**: The year the mobile phone was officially launched.

In [1]:
# Import required libraries
import pandas as pd
import numpy as np

In [2]:
# Load the dataset
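# The raw file needs latin1 because it contains a byte (0xa5) that fails to decode as UTF-8:
# df = pd.read_csv('Mobiles Dataset (2025).csv', encoding='latin1')
# This notebook saves the transformed file as UTF-8, so read it back with the same encoding
df = pd.read_csv('normlized_transformed.csv', encoding='utf-8')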
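
The encoding choice can be checked on a small in-memory sample before touching the real files. The sketch below is illustrative only: `decode_with_fallback` is a hypothetical helper, the sample bytes are made up, and the fallback order (UTF-8 first, then latin1) is an assumption rather than part of the original notebook. latin1 maps every possible byte to a character, so it never raises and has to come last.

```python
import io
import pandas as pd

def decode_with_fallback(raw_bytes, encodings=("utf-8", "latin1")):
    """Hypothetical helper: return (text, encoding) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return raw_bytes.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")

# Made-up sample containing byte 0xa5 (the yen sign in latin1),
# which cannot be decoded as UTF-8 when it appears on its own
sample = b"Company Name,Note\nApple,\xa5100\n"

text, used_encoding = decode_with_fallback(sample)
sample_df = pd.read_csv(io.StringIO(text))

print(used_encoding)
print(sample_df)
```

Because latin1 accepts anything, a successful latin1 read does not prove the text is correct; inspecting a few non-ASCII values by eye is still worthwhile.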


## 0. Basic Dataset Information

In [None]:
# Dataset shape
print("Dataset Shape:")
print(df.shape)
print(f"\nRows: {df.shape[0]}, Columns: {df.shape[1]}")

Dataset Shape:
(930, 19)

Rows: 930, Columns: 19


In [None]:
# Dataset info
print("Dataset Info:")
df.info()

Dataset Info:
<class 'pandas.DataFrame'>
RangeIndex: 930 entries, 0 to 929
Data columns (total 19 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Company Name        930 non-null    str    
 1   Model Name          930 non-null    str    
 2   Processor           930 non-null    str    
 3   Price USD_Pakistan  929 non-null    float64
 4   Price USD_India     930 non-null    float64
 5   Price USD_China     930 non-null    float64
 6   Price USD_USA       930 non-null    float64
 7   Price USD_Dubai     930 non-null    float64
 8   device_age          930 non-null    int64  
 9   weight_g            930 non-null    float64
 10  ram_gb              930 non-null    float64
 11  screen_in           930 non-null    float64
 12  battery_mah         930 non-null    float64
 13  front_mp_max        930 non-null    float64
 14  front_mp_sum        930 non-null    float64
 15  front_cam_count     930 non-null    int64  
 16  back_

In [None]:
# First few rows
print("First 5 rows:")
df.head()

First 5 rows:


| | Company Name | Model Name | Processor | Price USD_Pakistan | Price USD_India | Price USD_China | Price USD_USA | Price USD_Dubai | device_age | weight_g | ram_gb | screen_in | battery_mah | front_mp_max | front_mp_sum | front_cam_count | back_mp_max | back_mp_sum | back_cam_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | iPhone 16 128GB | A17 Bionic | 803.567857 | 963.843373 | 805.416667 | 799.0 | 762.6703 | 2 | 174.0 | 6.0 | 6.1 | 3600.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
| 1 | Apple | iPhone 16 256GB | A17 Bionic | 839.282143 | 1024.084337 | 847.083333 | 849.0 | 817.166213 | 2 | 174.0 | 6.0 | 6.1 | 3600.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
| 2 | Apple | iPhone 16 512GB | A17 Bionic | 874.996429 | 1084.325301 | 902.638889 | 899.0 | 871.662125 | 2 | 174.0 | 6.0 | 6.1 | 3600.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
| 3 | Apple | iPhone 16 Plus 128GB | A17 Bionic | 892.853571 | 1084.325301 | 860.972222 | 899.0 | 871.662125 | 2 | 203.0 | 6.0 | 6.7 | 4200.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
| 4 | Apple | iPhone 16 Plus 256GB | A17 Bionic | 928.567857 | 1144.566265 | 902.638889 | 949.0 | 926.158038 | 2 | 203.0 | 6.0 | 6.7 | 4200.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |


## 1. Rows Count

In [None]:
# Total number of rows
print(f"Total number of rows: {len(df)}")
print(f"Total number of rows: {df.shape[0]}")

Total number of rows: 930
Total number of rows: 930


## 2. Null Values in Each Column

In [None]:
# Check for null values
null_counts = df.isnull().sum()
print("Null values in each column:")
print(null_counts)
print("\nColumns with null values:")
print(null_counts[null_counts > 0])

Null values in each column:
Company Name          0
Model Name            0
Processor             0
Price USD_Pakistan    1
Price USD_India       0
Price USD_China       0
Price USD_USA         0
Price USD_Dubai       0
device_age            0
weight_g              0
ram_gb                0
screen_in             0
battery_mah           0
front_mp_max          0
front_mp_sum          0
front_cam_count       0
back_mp_max           0
back_mp_sum           0
back_cam_count        0
dtype: int64

Columns with null values:
Price USD_Pakistan    1
dtype: int64
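
With a single null in `Price USD_Pakistan`, it is usually worth looking at the affected row before deciding what to do with it. The sketch below uses a made-up toy frame in place of `df`; the median fill is only one possible treatment and is an assumption, not something the notebook itself does.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real DataFrame
toy = pd.DataFrame({
    "Company Name": ["A", "A", "B", "B"],
    "Price USD_Pakistan": [200.0, np.nan, 300.0, 500.0],
    "Price USD_USA": [250.0, 450.0, 350.0, 600.0],
})

# Rows that contain at least one null value
rows_with_nulls = toy[toy.isnull().any(axis=1)]
print(rows_with_nulls)

# One possible treatment (assumption): fill the gap with the column median
median_price = toy["Price USD_Pakistan"].median()
filled = toy.assign(
    **{"Price USD_Pakistan": toy["Price USD_Pakistan"].fillna(median_price)}
)
print(filled)
```

Dropping the row is an equally reasonable option when only one of 930 rows is affected; which is better depends on whether the Pakistan price is needed downstream.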


## 3. Categorical and Numeric Columns

In [None]:
# Identify categorical and numeric columns
categorical_cols = df.select_dtypes(include=['object']).columns.tolist()
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()

print("Categorical Columns:")
print(categorical_cols)
print(f"\nNumber of categorical columns: {len(categorical_cols)}")

print("\n" + "="*50)
print("\nNumeric Columns:")
print(numeric_cols)
print(f"\nNumber of numeric columns: {len(numeric_cols)}")

print("\n" + "="*50)
print("\nData Types:")
print(df.dtypes)

Categorical Columns:
['Company Name', 'Model Name', 'Processor']

Number of categorical columns: 3


Numeric Columns:
['Price USD_Pakistan', 'Price USD_India', 'Price USD_China', 'Price USD_USA', 'Price USD_Dubai', 'device_age', 'weight_g', 'ram_gb', 'screen_in', 'battery_mah', 'front_mp_max', 'front_mp_sum', 'front_cam_count', 'back_mp_max', 'back_mp_sum', 'back_cam_count']

Number of numeric columns: 16


Data Types:
Company Name              str
Model Name                str
Processor                 str
Price USD_Pakistan    float64
Price USD_India       float64
Price USD_China       float64
Price USD_USA         float64
Price USD_Dubai       float64
device_age              int64
weight_g              float64
ram_gb                float64
screen_in             float64
battery_mah           float64
front_mp_max          float64
front_mp_sum          float64
front_cam_count         int64
back_mp_max           float64
back_mp_sum           float64
back_cam_count          int64
dtype: object

See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  categorical_cols = df.select_dtypes(include=['object']).columns.tolist()
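
The warning fragment above concerns `select_dtypes(include=['object'])` and how string columns are matched across pandas versions. One way to avoid naming the string dtype at all is to select the numeric columns and treat everything else as categorical. This is a sketch on a made-up toy frame, and it assumes that every non-numeric column is categorical, which holds for the three text columns here but not for frames that also contain datetimes or other types.

```python
import pandas as pd

# Made-up stand-in for the real DataFrame
toy = pd.DataFrame({
    "Company Name": ["Apple", "Samsung"],
    "Processor": ["Chip X", "Chip Y"],
    "ram_gb": [6.0, 8.0],
    "device_age": [2, 1],
})

# "number" matches every numeric dtype (ints and floats alike)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric; assumed to be categorical text here
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```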


## 4. Columns with Missing Values

In [None]:
# Columns with missing values
missing_cols = df.columns[df.isnull().any()].tolist()
if missing_cols:
    print("Columns with missing values:")
    for col in missing_cols:
        missing_count = df[col].isnull().sum()
        missing_pct = (missing_count / len(df)) * 100
        print(f"  - {col}: {missing_count} missing values ({missing_pct:.2f}%)")
else:
    print("No columns have missing values.")

Columns with missing values:
  - Price USD_Pakistan: 1 missing values (0.11%)
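
The percentage reported above is 1 / 930 × 100 ≈ 0.11. The same counts and percentages can also be computed without an explicit loop; the sketch below does this on a made-up toy frame, and the column names `missing_count` and `missing_pct` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real DataFrame
toy = pd.DataFrame({
    "Price USD_Pakistan": [200.0, np.nan, 300.0, 500.0],
    "Price USD_USA": [250.0, 450.0, 350.0, 600.0],
})

# isnull().mean() is the fraction of nulls per column, since True counts as 1
missing_summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": (toy.isnull().mean() * 100).round(2),
})

# Keep only the columns that actually have missing values
missing_summary = missing_summary[missing_summary["missing_count"] > 0]
print(missing_summary)
```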


## 5. Numerical Columns: Statistical Summary

In [None]:
# Statistical summary for numerical columns
print("Statistical Summary (mean, median, std, min, max, quartiles):")
print("="*70)
df.describe()

Statistical Summary (mean, median, std, min, max, quartiles):


| | Price USD_Pakistan | Price USD_India | Price USD_China | Price USD_USA | Price USD_Dubai | device_age | weight_g | ram_gb | screen_in | battery_mah | front_mp_max | front_mp_sum | front_cam_count | back_mp_max | back_mp_sum | back_cam_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 929.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 | 930.0 |
| mean | 447.985941 | 609.359256 | 543.716219 | 625.515763 | 595.077203 | 3.806452 | 228.267097 | 7.784946 | 7.083796 | 5026.163441 | 18.163011 | 18.261935 | 1.012903 | 46.854624 | 55.490108 | 1.633333 |
| std | 362.834009 | 493.496137 | 583.549535 | 1347.561211 | 426.602627 | 1.86208 | 105.432503 | 3.179673 | 1.53369 | 1355.548264 | 11.986228 | 12.15074 | 0.112918 | 31.0681 | 36.81858 | 0.806304 |
| min | 57.139286 | 72.277108 | 69.305556 | 79.0 | 81.47139 | 1.0 | 135.0 | 1.0 | 5.0 | 2000.0 | 2.0 | 2.0 | 1.0 | 5.0 | 5.0 | 1.0 |
| 25% | 196.425 | 240.951807 | 236.006944 | 250.0 | 272.479564 | 2.0 | 185.0 | 6.0 | 6.5 | 4402.5 | 8.0 | 8.0 | 1.0 | 16.0 | 25.0 | 1.0 |
| 50% | 303.571429 | 421.674699 | 388.888889 | 449.0 | 456.40327 | 3.0 | 194.0 | 8.0 | 6.67 | 5000.0 | 16.0 | 16.0 | 1.0 | 50.0 | 50.0 | 1.0 |
| 75% | 642.853571 | 902.409639 | 763.75 | 849.0 | 871.662125 | 5.0 | 208.0 | 8.0 | 6.78 | 5091.25 | 32.0 | 32.0 | 1.0 | 50.0 | 64.0 | 2.0 |
| max | 2160.710714 | 3313.240964 | 13999.0 | 39622.0 | 3024.250681 | 12.0 | 732.0 | 16.0 | 14.6 | 11200.0 | 60.0 | 68.0 | 2.0 | 200.0 | 212.0 | 4.0 |
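
In the summary above, the maxima for `Price USD_USA` (39622.0) and `Price USD_China` (13999.0) sit far above their 75% quartiles (849.0 and 763.75), which suggests outliers or data-entry problems worth checking. A minimal sketch of one common convention, the 1.5 × IQR upper fence, follows. The rule and the helper names `iqr_upper_fence` and `flag_high_outliers` are assumptions introduced here, and the toy series is made up apart from the quartile values copied from the table.

```python
import pandas as pd

def iqr_upper_fence(q1, q3, k=1.5):
    """Upper fence of the IQR rule: Q3 + k * (Q3 - Q1)."""
    return q3 + k * (q3 - q1)

# Quartiles for Price USD_USA copied from the summary table: Q1 = 250.0, Q3 = 849.0
usa_fence = iqr_upper_fence(250.0, 849.0)
print(usa_fence)  # 849 + 1.5 * 599 = 1747.5

def flag_high_outliers(series, k=1.5):
    """Boolean mask marking values above the upper IQR fence of the series."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    return series > iqr_upper_fence(q1, q3, k)

# Made-up toy prices with one extreme value
toy_prices = pd.Series([250.0, 449.0, 849.0, 799.0, 39622.0], name="Price USD_USA")
flagged = toy_prices[flag_high_outliers(toy_prices)]
print(flagged)
```

Against the real data, the reported maximum of 39622.0 lies well above the 1747.5 fence, so that row deserves a manual look before any modelling.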


In [4]:
# Save the transformed dataset
output_filename = 'normlized_transformed.csv'
df.to_csv(output_filename, index=False, encoding='utf-8')
print(f"✅ Transformed dataset saved to: {output_filename}")
print(f"📊 Total rows: {len(df)}, Total columns: {len(df.columns)}")
print(f"\nAll column names:")
print(list(df.columns))

✅ Transformed dataset saved to: normlized_transformed.csv
📊 Total rows: 930, Total columns: 19

All column names:
['Company Name', 'Model Name', 'Processor', 'Price USD_Pakistan', 'Price USD_India', 'Price USD_China', 'Price USD_USA', 'Price USD_Dubai', 'device_age', 'weight_g', 'ram_gb', 'screen_in', 'battery_mah', 'front_mp_max', 'front_mp_sum', 'front_cam_count', 'back_mp_max', 'back_mp_sum', 'back_cam_count']
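
A quick round trip can confirm that a saved file reads back with the same columns and text. The sketch below writes a made-up toy frame to a temporary directory rather than touching `normlized_transformed.csv`; the file name `toy_transformed.csv` is hypothetical. It also shows what `index=False` prevents: without it, the row index is written as an extra unnamed column.

```python
import os
import tempfile
import pandas as pd

# Made-up stand-in for the real DataFrame, including a non-ASCII character
toy = pd.DataFrame({
    "Model Name": ["Phone A", "Phone B ¥"],
    "Price USD_USA": [799.0, 849.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_transformed.csv")

    # Same options as the save cell: no index column, UTF-8 on disk
    toy.to_csv(path, index=False, encoding="utf-8")
    restored = pd.read_csv(path, encoding="utf-8")

    # For comparison: writing the index adds an extra column on read
    toy.to_csv(path, index=True, encoding="utf-8")
    with_index = pd.read_csv(path, encoding="utf-8")

print(list(restored.columns))
print(list(with_index.columns))
```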
