# Global_Mobile_Data

# Introduction 

This dataset contains specifications for 1,000 smartphones from nine brands: Apple, Google, Infinix, OnePlus, Oppo, Realme, Samsung, Vivo, and Xiaomi. Each row records the brand and model plus price, RAM, storage, battery capacity, camera resolution, display size, charging speed, processor, operating system, 5G support, user rating, release month, and release year (2025 for every row). The mix of numerical and categorical attributes makes the dataset suitable for EDA, brand comparison, trend analysis, price prediction, feature correlation, and other machine learning exercises.


##  Import the necessary libraries

In [1]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

## Import dataset 

In [2]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r"D:\DataScience\dataset\csv\global_mobile.csv")
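
In an ordinary Python string literal a backslash starts an escape sequence, so Windows paths are safer written as raw strings, with doubled backslashes, or as `pathlib` objects. The sketch below is only an illustration: it reuses this notebook's path as plain text (no file is opened), and `C:\temp` is a made-up example.

```python
from pathlib import PureWindowsPath

# Three equivalent ways to spell the same Windows path (text only, nothing is read)
plain = "D:\\DataScience\\dataset\\csv\\global_mobile.csv"  # doubled backslashes
raw = r"D:\DataScience\dataset\csv\global_mobile.csv"       # raw string literal
as_path = PureWindowsPath("D:/DataScience/dataset/csv/global_mobile.csv")

# Hypothetical path showing the hazard: "\t" silently becomes a tab character
tab_path = "C:\temp"
raw_tab_path = r"C:\temp"

print(plain == raw)
print(str(as_path) == raw)
print(as_path.name)
print(len(tab_path), len(raw_tab_path))
```

`pd.read_csv` accepts either a string or a path object, so any of the three spellings can be passed to it.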

## Display the first 5 rows of the dataset 

In [3]:
print(df.head())

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   
3    Vivo               V29e 744        837       6         512         48   
4   Apple  iPhone 16 Pro Max 927        335      12         128        200   

   battery_mah  display_size_inch  charging_watt 5g_support       os  \
0         6000                6.6             33        Yes  Android   
1         4500                6.9            100        Yes  Android   
2         4000                6.8             44        Yes  Android   
3         4500                6.0             65        Yes  Android   
4         5000                6.9            100        Yes      iOS   

        processor  rating release_month  year  
0       Helio G99     3.8      February  2025  
1 

## Display the last 5 rows of the dataset 

In [4]:
print(df.tail())

       brand            model  price_usd  ram_gb  storage_gb  camera_mp  \
995   Google       Pixel 7a 2        961       8         256         12   
996  OnePlus  OnePlus 13R 423        158      16          64         64   
997   Xiaomi  Poco X6 Pro 796       1164       6         128        200   
998   Realme     Narzo 70 809        895       8          64         48   
999   Xiaomi  Mi 13 Ultra 429        458      16         512         64   

     battery_mah  display_size_inch  charging_watt 5g_support       os  \
995         4000                5.9             44        Yes  Android   
996         5500                5.6             65        Yes  Android   
997         4500                5.7            120         No  Android   
998         5000                7.0             65         No  Android   
999         4500                5.8             18         No  Android   

          processor  rating release_month  year  
995       Helio G99     4.0      November  2025  
996 

## Data cleaning and Understanding 

## Shape of the dataset

In [5]:
print("Shape of the dataset (rows, columns):", df.shape)

Shape of the dataset (rows, columns): (1000, 15)


## Data types of each column

In [6]:
df.dtypes

brand                 object
model                 object
price_usd              int64
ram_gb                 int64
storage_gb             int64
camera_mp              int64
battery_mah            int64
display_size_inch    float64
charging_watt          int64
5g_support            object
os                    object
processor             object
rating               float64
release_month         object
year                   int64
dtype: object

## Duplicate records in the dataset 

In [7]:
duplicates=df.duplicated().sum()
print("Number of duplicate records:",duplicates)

duplicate_rows=df[df.duplicated()]
print(duplicate_rows)

Number of duplicate records: 0
Empty DataFrame
Columns: [brand, model, price_usd, ram_gb, storage_gb, camera_mp, battery_mah, display_size_inch, charging_watt, 5g_support, os, processor, rating, release_month, year]
Index: []
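
`df.duplicated()` only flags rows that match on every column. A phone listed twice with a different price would not be caught, so it can help to check a subset of key columns as well. A minimal sketch on made-up rows (the values and the choice of `brand` and `model` as keys are assumptions for illustration):

```python
import pandas as pd

# Made-up rows: the first two share brand and model but differ in price
toy = pd.DataFrame({
    "brand": ["Oppo", "Oppo", "Vivo"],
    "model": ["A98", "A98", "V29e"],
    "price_usd": [855, 799, 837],
})

# Full-row duplicates: none, because the prices differ
full_dupes = toy.duplicated().sum()

# Duplicates on the key columns only; keep=False marks every member of a group
key_dupes = toy[toy.duplicated(subset=["brand", "model"], keep=False)]

print("Full-row duplicates:", full_dupes)
print(key_dupes)
```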


In [8]:
df.count()

brand                1000
model                1000
price_usd            1000
ram_gb               1000
storage_gb           1000
camera_mp            1000
battery_mah          1000
display_size_inch    1000
charging_watt        1000
5g_support           1000
os                   1000
processor            1000
rating               1000
release_month        1000
year                 1000
dtype: int64

## Checking for missing values

In [11]:
print(df.isna().sum())

brand                0
model                0
price_usd            0
ram_gb               0
storage_gb           0
camera_mp            0
battery_mah          0
display_size_inch    0
charging_watt        0
5g_support           0
os                   0
processor            0
rating               0
release_month        0
year                 0
dtype: int64


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   brand              1000 non-null   object 
 1   model              1000 non-null   object 
 2   price_usd          1000 non-null   int64  
 3   ram_gb             1000 non-null   int64  
 4   storage_gb         1000 non-null   int64  
 5   camera_mp          1000 non-null   int64  
 6   battery_mah        1000 non-null   int64  
 7   display_size_inch  1000 non-null   float64
 8   charging_watt      1000 non-null   int64  
 9   5g_support         1000 non-null   object 
 10  os                 1000 non-null   object 
 11  processor          1000 non-null   object 
 12  rating             1000 non-null   float64
 13  release_month      1000 non-null   object 
 14  year               1000 non-null   int64  
dtypes: float64(2), int64(7), object(6)
memory usage: 117.3+ KB


In [13]:
df.columns = df.columns.str.replace(' ','_')
print(df)

       brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0       Oppo                A98 111        855      16         128        108   
1     Realme            11 Pro+ 843        618       6         128         64   
2     Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   
3       Vivo               V29e 744        837       6         512         48   
4      Apple  iPhone 16 Pro Max 927        335      12         128        200   
..       ...                    ...        ...     ...         ...        ...   
995   Google             Pixel 7a 2        961       8         256         12   
996  OnePlus        OnePlus 13R 423        158      16          64         64   
997   Xiaomi        Poco X6 Pro 796       1164       6         128        200   
998   Realme           Narzo 70 809        895       8          64         48   
999   Xiaomi        Mi 13 Ultra 429        458      16         512         64   

     battery_mah  display_s

In [14]:
df.isnull().values.any()

False

## Feature Engineering 

#### Label Encoding For Categorical Columns

In [15]:
from sklearn.preprocessing import LabelEncoder

label_brand   = LabelEncoder()
label_os      = LabelEncoder()
label_proc    = LabelEncoder()
label_model   = LabelEncoder()
label_5g      = LabelEncoder()

df["brand_encoded"]      = label_brand.fit_transform(df["brand"])
df["os_encoded"]         = label_os.fit_transform(df["os"])
df["processor_encoded"]  = label_proc.fit_transform(df["processor"])
df["model_encoded"]      = label_model.fit_transform(df["model"])
df["support_5g_encoded"] = label_5g.fit_transform(df["5g_support"])

brand_mapping     = dict(zip(label_brand.classes_,     range(len(label_brand.classes_))))
os_mapping        = dict(zip(label_os.classes_,        range(len(label_os.classes_))))
processor_mapping = dict(zip(label_proc.classes_,      range(len(label_proc.classes_))))
model_mapping     = dict(zip(label_model.classes_,     range(len(label_model.classes_))))
support5g_mapping = dict(zip(label_5g.classes_,        range(len(label_5g.classes_))))

brand_inverse     = {v:k for k,v in brand_mapping.items()}
os_inverse        = {v:k for k,v in os_mapping.items()}
processor_inverse = {v:k for k,v in processor_mapping.items()}
model_inverse     = {v:k for k,v in model_mapping.items()}
support5g_inverse = {v:k for k,v in support5g_mapping.items()}

print("Brand Mapping: ", brand_mapping)
print("OS Mapping: ", os_mapping)
print("Processor Mapping: ", processor_mapping)
print("5G Support Mapping: ", support5g_mapping)


Brand Mapping:  {'Apple': 0, 'Google': 1, 'Infinix': 2, 'OnePlus': 3, 'Oppo': 4, 'Realme': 5, 'Samsung': 6, 'Vivo': 7, 'Xiaomi': 8}
OS Mapping:  {'Android': 0, 'iOS': 1}
Processor Mapping:  {'A18 Pro': 0, 'Dimensity 9300': 1, 'Exynos 2400': 2, 'Helio G99': 3, 'Snapdragon 6 Gen 1': 4, 'Snapdragon 7+ Gen 2': 5, 'Snapdragon 8 Gen 3': 6, 'Tensor G4': 7}
5G Support Mapping:  {'No': 0, 'Yes': 1}


#### Creating Price Categories from Price 

In [16]:
df['price_category'] = pd.cut(
    df["price_usd"],
    bins=[0,200,400,700,1200,5000],
    labels=["Budget", "Lower Mid", "Mid Range", "Premium", "Ultra Premium"]
)

label_price = LabelEncoder()
df["price_category_encoded"] = label_price.fit_transform(df["price_category"].astype(str))
print(df.head(3))

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   

   battery_mah  display_size_inch  charging_watt 5g_support  ... rating  \
0         6000                6.6             33        Yes  ...    3.8   
1         4500                6.9            100        Yes  ...    4.4   
2         4000                6.8             44        Yes  ...    4.1   

  release_month  year brand_encoded  os_encoded  processor_encoded  \
0      February  2025             4           0                  3   
1        August  2025             5           0                  7   
2         March  2025             8           0                  0   

   model_encoded  support_5g_encoded  price_category  price_category_encoded  
0             19          

#### Creating Performance Score

In [17]:
df["performance_score"] = df["ram_gb"] * df["storage_gb"]
df["is_high_performance"] = (df["ram_gb"] >= 8).astype(int)
print(df.head(3))

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   

   battery_mah  display_size_inch  charging_watt 5g_support  ...  year  \
0         6000                6.6             33        Yes  ...  2025   
1         4500                6.9            100        Yes  ...  2025   
2         4000                6.8             44        Yes  ...  2025   

  brand_encoded  os_encoded processor_encoded  model_encoded  \
0             4           0                 3             19   
1             5           0                 7             12   
2             8           0                 0            663   

   support_5g_encoded  price_category  price_category_encoded  \
0                   1         Premium                       3   
1  

#### Creating Camera Quality Category and Display Features

In [18]:
df["camera_quality"] = pd.cut(
    df["camera_mp"],
    bins=[0,12,48,108,300],
    labels=["Low","Good","Very Good","Flagship"]
)

label_cam = LabelEncoder()
df["camera_quality_encoded"] = label_cam.fit_transform(df["camera_quality"].astype(str))
print(df.head(3))

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   

   battery_mah  display_size_inch  charging_watt 5g_support  ... os_encoded  \
0         6000                6.6             33        Yes  ...          0   
1         4500                6.9            100        Yes  ...          0   
2         4000                6.8             44        Yes  ...          0   

  processor_encoded  model_encoded support_5g_encoded  price_category  \
0                 3             19                  1         Premium   
1                 7             12                  1       Mid Range   
2                 0            663                  1       Lower Mid   

   price_category_encoded  performance_score  is_high_performance  \
0       
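
`LabelEncoder` numbers classes in sorted (alphabetical) order, so encoding the string form of an ordered category can scramble its ranking: with the camera labels, "Flagship" receives the lowest code and "Very Good" the highest. If the order matters downstream, the categorical codes produced by `pd.cut` keep it. A minimal sketch with made-up megapixel values, where the alphabetical mapping is rebuilt by hand to mimic the encoder:

```python
import pandas as pd

# Made-up camera resolutions, one or two per bin
camera_mp = pd.Series([12, 48, 64, 108, 200])

quality = pd.cut(
    camera_mp,
    bins=[0, 12, 48, 108, 300],
    labels=["Low", "Good", "Very Good", "Flagship"],
)

# Codes follow the label order given to pd.cut: Low=0 ... Flagship=3
ordinal_codes = quality.cat.codes

# Alphabetical numbering, which is what sorting the string labels produces
alphabetical = sorted(quality.astype(str).unique())
alphabetical_codes = quality.astype(str).map(
    {name: i for i, name in enumerate(alphabetical)}
)

print(pd.DataFrame({
    "camera_mp": camera_mp,
    "quality": quality,
    "ordinal": ordinal_codes,
    "alphabetical": alphabetical_codes,
}))
```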

In [19]:
df["display_score"] = df["display_size_inch"] * df["camera_mp"]
df["is_large_display"] = (df["display_size_inch"] > 6.5).astype(int)
print(df.head(3))

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   

   battery_mah  display_size_inch  charging_watt 5g_support  ...  \
0         6000                6.6             33        Yes  ...   
1         4500                6.9            100        Yes  ...   
2         4000                6.8             44        Yes  ...   

  model_encoded support_5g_encoded  price_category price_category_encoded  \
0            19                  1         Premium                      3   
1            12                  1       Mid Range                      2   
2           663                  1       Lower Mid                      1   

   performance_score  is_high_performance  camera_quality  \
0               2048                    1   

#### Creating Battery Efficiency Score and Fast-Charging Flag

In [20]:
df["battery_efficiency"] = df["battery_mah"] / df["display_size_inch"]
df["is_fast_charging"] = (df["charging_watt"] >= 30).astype(int)
print(df.head(3))

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   

   battery_mah  display_size_inch  charging_watt 5g_support  ...  \
0         6000                6.6             33        Yes  ...   
1         4500                6.9            100        Yes  ...   
2         4000                6.8             44        Yes  ...   

  price_category price_category_encoded  performance_score  \
0        Premium                      3               2048   
1      Mid Range                      2                768   
2      Lower Mid                      1               1024   

  is_high_performance  camera_quality  camera_quality_encoded  display_score  \
0                   1       Very Good                       3          712.8   
1    

#### Brand Popularity and Flagship Brand Feature Creation

In [21]:
df["brand_popularity"] = df["brand"].map(df["brand"].value_counts())

flagship_brands = ["Apple", "Samsung", "OnePlus", "Google"]
df["is_flagship_brand"] = df["brand"].isin(flagship_brands).astype(int)
print(df.head(3))

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   

   battery_mah  display_size_inch  charging_watt 5g_support  ...  \
0         6000                6.6             33        Yes  ...   
1         4500                6.9            100        Yes  ...   
2         4000                6.8             44        Yes  ...   

  performance_score is_high_performance  camera_quality  \
0              2048                   1       Very Good   
1               768                   0       Very Good   
2              1024                   1       Very Good   

  camera_quality_encoded  display_score  is_large_display  battery_efficiency  \
0                      3          712.8                 1          909.090909   
1              

#### Extracting Time-Based Features 

In [22]:
df["release_season"] = df["release_month"].map({
    'January':"Winter", 'February':"Winter", 'December':"Winter",
    'March':"Spring", 'April':"Spring", 'May':"Spring",
    'June':"Summer", 'July':"Summer", 'August':"Summer",
    'September':"Autumn", 'October':"Autumn", 'November':"Autumn"
})

label_season = LabelEncoder()
df["release_season_encoded"] = label_season.fit_transform(df["release_season"].astype(str))
print(df.head(3))

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   

   battery_mah  display_size_inch  charging_watt 5g_support  ...  \
0         6000                6.6             33        Yes  ...   
1         4500                6.9            100        Yes  ...   
2         4000                6.8             44        Yes  ...   

  camera_quality camera_quality_encoded  display_score is_large_display  \
0      Very Good                      3          712.8                1   
1      Very Good                      3          441.6                1   
2      Very Good                      3          435.2                1   

   battery_efficiency  is_fast_charging  brand_popularity  is_flagship_brand  \
0          909.090909            

#### Creating Rating Category 

In [23]:
df["rating_category"] = pd.cut(
    df["rating"],
    bins=[0,2,3.5,5],
    labels=["Poor","Average","Excellent"]
)

label_rating = LabelEncoder()
df["rating_category_encoded"] = label_rating.fit_transform(df["rating_category"].astype(str))
print(df.head(3))

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   

   battery_mah  display_size_inch  charging_watt 5g_support  ...  \
0         6000                6.6             33        Yes  ...   
1         4500                6.9            100        Yes  ...   
2         4000                6.8             44        Yes  ...   

  display_score is_large_display  battery_efficiency is_fast_charging  \
0         712.8                1          909.090909                1   
1         441.6                1          652.173913                1   
2         435.2                1          588.235294                1   

   brand_popularity  is_flagship_brand  release_season  \
0               110                  0          Winter   
1    

#### Removing Duplicate Rows

In [24]:
df=df.drop_duplicates()

In [25]:
print(df.head())

    brand                  model  price_usd  ram_gb  storage_gb  camera_mp  \
0    Oppo                A98 111        855      16         128        108   
1  Realme            11 Pro+ 843        618       6         128         64   
2  Xiaomi  Redmi Note 14 Pro 461        258      16          64         64   
3    Vivo               V29e 744        837       6         512         48   
4   Apple  iPhone 16 Pro Max 927        335      12         128        200   

   battery_mah  display_size_inch  charging_watt 5g_support  ...  \
0         6000                6.6             33        Yes  ...   
1         4500                6.9            100        Yes  ...   
2         4000                6.8             44        Yes  ...   
3         4500                6.0             65        Yes  ...   
4         5000                6.9            100        Yes  ...   

  display_score is_large_display  battery_efficiency is_fast_charging  \
0         712.8                1          909.090

## Relation and Trend Analysis 

#### How does RAM affect mobile price?

In [26]:
df.groupby("ram_gb")["price_usd"].mean().sort_values()

ram_gb
12    788.785000
8     791.484043
16    816.020000
6     832.295238
4     836.316832
Name: price_usd, dtype: float64
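
Group means alone do not show whether the gaps between them are meaningful. Adding the group size and standard deviation gives a rough sense of the spread behind each mean. A minimal sketch on made-up numbers (the toy frame is an assumption, not the notebook's data):

```python
import pandas as pd

# Made-up prices for three RAM sizes
toy = pd.DataFrame({
    "ram_gb": [4, 4, 8, 8, 8, 16],
    "price_usd": [300, 900, 500, 700, 600, 800],
})

# Mean, group size, and spread side by side
summary = toy.groupby("ram_gb")["price_usd"].agg(["mean", "count", "std"])
print(summary)
```

When the standard deviation within each group is much larger than the gap between group means, as here for the 4 GB and 8 GB rows, the ordering of the means says little on its own.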

#### Does higher storage increase price?

In [27]:
df.groupby("storage_gb")["price_usd"].mean().sort_values()

storage_gb
64      783.857143
512     800.597345
1024    815.745000
256     833.198830
128     838.970000
Name: price_usd, dtype: float64

#### Which processor series has the highest average price?

In [35]:
# Compute the per-processor averages once and reuse them
avg_price_processor = df.groupby("processor")["price_usd"].mean()
high_processor = avg_price_processor.idxmax()
high_avg_price = avg_price_processor.max()

print("processor series with highest avg price:", high_processor, high_avg_price)

processor series with highest avg price: Snapdragon 7+ Gen 2 877.1130434782609


#### What is the average price of mobiles by brand? 

In [36]:
avg_price_brand = df.groupby("brand")["price_usd"].mean().sort_values(ascending=False)

print("Average price by brand:")
print(avg_price_brand)
print("\nBrand with highest average price:", avg_price_brand.idxmax())
print("Highest average price:", avg_price_brand.max())

Average price by brand:
brand
Infinix    839.171429
Apple      835.691589
Xiaomi     827.736842
Oppo       826.327273
OnePlus    812.250000
Google     808.060345
Vivo       807.901639
Samsung    791.723810
Realme     771.780952
Name: price_usd, dtype: float64

Brand with highest average price: Infinix
Highest average price: 839.1714285714286


#### Do 5G phones have higher average price? 

In [37]:
avg_5g_price = df.groupby("5g_support")["price_usd"].mean()

print("Average price by 5G support:")
print(avg_5g_price)

# Index by label: the groups are the strings "Yes" and "No", not 1 and 0
if avg_5g_price["Yes"] > avg_5g_price["No"]:
    print("\nFinal Answer: 5G phones are more expensive.")
else:
    print("\nFinal Answer: Non-5G phones are more expensive.")

Average price by 5G support:
5g_support
No     821.832998
Yes    805.222664
Name: price_usd, dtype: float64

Final Answer: Non-5G phones are more expensive.


#### Which OS has the highest average rating? 

In [38]:
avg_rating_os = df.groupby("os")["rating"].mean().sort_values(ascending=False)

print("Average rating by OS:")
print(avg_rating_os)
print("\nOS with highest avg rating:", avg_rating_os.idxmax())
print("Highest rating:", avg_rating_os.max())

Average rating by OS:
os
Android    4.230683
iOS        4.223364
Name: rating, dtype: float64

OS with highest avg rating: Android
Highest rating: 4.2306830907054875


#### Which brand has the highest average battery capacity? 

In [39]:
avg_battery_brand = df.groupby("brand")["battery_mah"].mean().sort_values(ascending=False)

print("Average battery capacity by brand:")
print(avg_battery_brand)
print("\nBrand with highest avg battery:", avg_battery_brand.idxmax())
print("Highest battery capacity:", avg_battery_brand.max())

Average battery capacity by brand:
brand
Samsung    5142.857143
Oppo       5059.090909
Xiaomi     5057.017544
Google     5034.482759
Realme     5023.809524
Vivo       5000.000000
Infinix    4942.857143
OnePlus    4926.724138
Apple      4925.233645
Name: battery_mah, dtype: float64

Brand with highest avg battery: Samsung
Highest battery capacity: 5142.857142857143


#### Does higher storage increase price?

In [40]:
avg_storage_price = df.groupby("storage_gb")["price_usd"].mean().sort_values()

print("Average price by storage:")
print(avg_storage_price)

print("\nLowest storage price:", avg_storage_price.iloc[0])
print("Highest storage price:", avg_storage_price.iloc[-1])

Average price by storage:
storage_gb
64      783.857143
512     800.597345
1024    815.745000
256     833.198830
128     838.970000
Name: price_usd, dtype: float64

Lowest storage price: 783.8571428571429
Highest storage price: 838.97


#### Are newer phones (year > 2022) more expensive? 

In [41]:
new_price = df[df["year"] > 2022]["price_usd"].mean()
old_price = df[df["year"] <= 2022]["price_usd"].mean()

print("Avg price (year > 2022):", new_price)
print("Avg price (year <= 2022):", old_price)

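# An empty group has a NaN mean, and any comparison with NaN is False
if pd.isna(new_price) or pd.isna(old_price):
    print("\nFinal Answer: Cannot compare; one of the groups is empty.")
elif new_price > old_price:
    print("\nFinal Answer: Newer phones are more expensive.")
else:
    print("\nFinal Answer: Older phones are more expensive.")

Avg price (year > 2022): 813.478
Avg price (year <= 2022): nan

Final Answer: Cannot compare; one of the groups is empty.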
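
Every row in this dataset has year 2025, so the `year <= 2022` selection is empty and its mean is NaN. Ordering comparisons against NaN always evaluate to False, which means a plain `if new_price > old_price ... else` lands in the `else` branch no matter what the data says. A minimal sketch with made-up prices:

```python
import math

import pandas as pd

# Made-up rows, all from 2025, mirroring the single-year situation
toy = pd.DataFrame({"year": [2025, 2025, 2025], "price_usd": [500, 800, 1100]})

new_price = toy.loc[toy["year"] > 2022, "price_usd"].mean()
old_price = toy.loc[toy["year"] <= 2022, "price_usd"].mean()  # empty selection

# Both orderings are False when one side is NaN
nan_is_greater = new_price > old_price
nan_is_smaller = new_price < old_price

print("old_price is NaN:", math.isnan(old_price))
print("new > old:", nan_is_greater, "| new < old:", nan_is_smaller)
```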


#### Does higher RAM increase price? 

In [43]:
avg_ram_price = df.groupby("ram_gb")["price_usd"].mean().sort_values()

print("Average price by RAM size:")
print(avg_ram_price)

print("\nLowest RAM price:", avg_ram_price.iloc[0])
print("Highest RAM price:", avg_ram_price.iloc[-1])
print("Higher RAM does not increase the average price in this dataset")

Average price by RAM size:
ram_gb
12    788.785000
8     791.484043
16    816.020000
6     832.295238
4     836.316832
Name: price_usd, dtype: float64

Lowest RAM price: 788.785
Highest RAM price: 836.3168316831683
Higher RAM does not increase the average price in this dataset


#### Does battery capacity correlate with price?

In [44]:
corr_val = df["battery_mah"].corr(df["price_usd"])

print("Correlation between battery capacity and price:", corr_val)

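# Rule-of-thumb cutoff (an assumption): |r| below 0.1 is treated as no meaningful linear relationship
if abs(corr_val) < 0.1:
    print("The correlation is negligible.")
elif corr_val > 0:
    print("There is a positive correlation.")
else:
    print("There is a negative correlation.")

Correlation between battery capacity and price: -0.006909489285069385
The correlation is negligible.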
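
The sign of a correlation coefficient matters far less than its size. For r ≈ -0.0069, r² is about 0.00005, so a linear fit on battery capacity would account for essentially none of the variation in price. The sketch below computes Pearson's r by hand on made-up numbers and checks it against `Series.corr`, to make explicit what the coefficient measures:

```python
import numpy as np
import pandas as pd

# Made-up battery capacities and prices
battery = pd.Series([4000, 4500, 5000, 5500, 6000])
price = pd.Series([300, 900, 450, 800, 500])

# Pearson r: covariance of the centred values divided by the product of their spreads
dx = battery - battery.mean()
dy = price - price.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = battery.corr(price)  # Pearson is the default method
r_squared = r_manual ** 2       # share of variance a linear fit would explain

print("manual r:", r_manual)
print("pandas r:", r_pandas)
print("r squared:", r_squared)
```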


#### Which display size range is most expensive? 

In [None]:
bins = [5, 6, 7]
labels = ["5–6 inch", "6–7 inch"]

df["display_range"] = pd.cut(df["display_size_inch"], bins=bins, labels=labels)

avg_display_price = df.groupby("display_range")["price_usd"].mean()

print("Average price by display size range:")
print(avg_display_price)

print("\nMost expensive size range:", avg_display_price.idxmax())
print("Highest avg price:", avg_display_price.max()) 
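
One detail of `pd.cut` is worth checking before running this cell: bins are right-inclusive by default, so with edges `[5, 6, 7]` a display of exactly 5.0 inches falls outside the first bin and becomes NaN. Whether the dataset contains such a value is not shown here. The sketch below uses made-up sizes to show the default behaviour and the effect of `include_lowest=True`:

```python
import pandas as pd

# Made-up display sizes sitting on and between the bin edges
sizes = pd.Series([5.0, 5.5, 6.0, 6.5, 7.0])
labels = ["5-6 inch", "6-7 inch"]

# Default: intervals are (5, 6] and (6, 7], so 5.0 is left unbinned
default_bins = pd.cut(sizes, bins=[5, 6, 7], labels=labels)

# include_lowest=True closes the first interval on the left as well
inclusive_bins = pd.cut(sizes, bins=[5, 6, 7], labels=labels, include_lowest=True)

print(pd.DataFrame({
    "size": sizes,
    "default": default_bins,
    "include_lowest": inclusive_bins,
}))
```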


#### Which release month has the highest average price? 

In [45]:
avg_month_price = df.groupby("release_month")["price_usd"].mean().sort_values(ascending=False)

print("Average price by release month:")
print(avg_month_price)

print("\nMonth with highest avg price:", avg_month_price.idxmax())
print("Highest avg price:", avg_month_price.max())

Average price by release month:
release_month
September    879.575342
February     868.273810
July         828.387097
March        823.662338
April        818.594059
November     811.465753
January      809.463158
May          804.563380
December     801.225000
August       789.000000
October      769.744186
June         763.085366
Name: price_usd, dtype: float64

Month with highest avg price: September
Highest avg price: 879.5753424657535
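
Sorting by price is useful for ranking, but a seasonal pattern is easier to read in calendar order. Month names sort alphabetically as plain strings, so one option is an ordered categorical. A minimal sketch on made-up rows (the toy frame and `month_order` list are assumptions for illustration):

```python
import pandas as pd

month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Made-up launches
toy = pd.DataFrame({
    "release_month": ["September", "February", "June", "February"],
    "price_usd": [900, 850, 700, 950],
})

# An ordered categorical makes groupby return months in calendar order
toy["release_month"] = pd.Categorical(
    toy["release_month"], categories=month_order, ordered=True
)

# observed=True keeps only the months that actually occur in the data
by_month = toy.groupby("release_month", observed=True)["price_usd"].mean()
print(by_month)
```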


#### Which season has the most expensive launches? 

In [46]:
avg_season_price = df.groupby("release_season")["price_usd"].mean().sort_values(ascending=False)

print("Average price by season:")
print(avg_season_price)

print("\nSeason with highest avg price:", avg_season_price.idxmax())
print("Highest avg price:", avg_season_price.max())

Average price by season:
release_season
Winter    825.992278
Autumn    817.431034
Spring    816.160643
Summer    794.915385
Name: price_usd, dtype: float64

Season with highest avg price: Winter
Highest avg price: 825.992277992278


#### How does price trend across years?

In [47]:
avg_year_price = df.groupby("year")["price_usd"].mean()

print("Average price by year:")
print(avg_year_price)

print("\nMost expensive release year:", avg_year_price.idxmax())
print("Highest avg price:", avg_year_price.max())

Average price by year:
year
2025    813.478
Name: price_usd, dtype: float64

Most expensive release year: 2025
Highest avg price: 813.478


#### How many phones have RAM ≥ 8 GB and Storage ≥ 256 GB? 

In [48]:
df[(df["ram_gb"] >= 8) & (df["storage_gb"] >= 256)].shape[0]

353

#### Best value phones (Rating ≥ 4 & Price < $300) 

In [49]:
df[(df["rating"] >= 4) & (df["price_usd"] < 300)].shape[0]

102
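
The count says how many phones meet the criteria but not which ones. To list them, the same filter can be kept as a DataFrame and sorted. A minimal sketch on made-up rows, using the thresholds from the heading (the model names and values are placeholders):

```python
import pandas as pd

# Made-up phones
toy = pd.DataFrame({
    "model": ["A", "B", "C", "D"],
    "rating": [4.5, 3.9, 4.0, 4.8],
    "price_usd": [250, 199, 299, 450],
})

# Same condition as the count, then best rating first and cheapest first within ties
best_value = (
    toy.query("rating >= 4 and price_usd < 300")
       .sort_values(["rating", "price_usd"], ascending=[False, True])
)

print(best_value)
print("Count:", len(best_value))
```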

#### Is rating correlated with price?

In [50]:
df[["rating", "price_usd"]].corr()

             rating  price_usd
rating      1.00000   -0.00101
price_usd  -0.00101    1.00000


#### Do bigger batteries have faster charging?

In [51]:
df[["battery_mah", "charging_watt"]].corr()

               battery_mah  charging_watt
battery_mah       1.000000      -0.003426
charging_watt    -0.003426       1.000000
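
Rather than checking one pair at a time, the correlation of every numeric column with price can be read from a single matrix. A minimal sketch on made-up values (the toy frame is an assumption; in the notebook the same calls would run on `df`):

```python
import pandas as pd

# Made-up rows with three numeric columns and one text column
toy = pd.DataFrame({
    "price_usd": [300, 500, 700, 900, 1100],
    "battery_mah": [4000, 6000, 4500, 5500, 5000],
    "charging_watt": [18, 33, 65, 100, 120],
    "brand": ["A", "B", "C", "D", "E"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations
corr = toy.select_dtypes("number").corr()

# Correlation of each feature with price, strongest (by absolute value) first
price_corr = (
    corr["price_usd"]
    .drop("price_usd")
    .sort_values(key=abs, ascending=False)
)

print(corr)
print(price_corr)
```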
