# Mobile Price Prediction
Predict the **USA launch price** of a mobile phone from its specs.

**Dataset:** `normlized_transformed.csv` | **Model:** Random Forest Regressor | **Target:** `Price USD_USA`

## 1. Import Libraries

```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder
```

## 2. Load Data
Load the pre-processed (transformed) CSV. The price and spec columns are already numeric, but `Company Name`, `Model Name`, and `Processor` are still text.

```python
df = pd.read_csv('normlized_transformed.csv', encoding='latin1')
print('Shape:', df.shape)
df.head()
```

```
Shape: (930, 19)
```

| | Company Name | Model Name | Processor | Price USD_Pakistan | Price USD_India | Price USD_China | Price USD_USA | Price USD_Dubai | device_age | weight_g | ram_gb | screen_in | battery_mah | front_mp_max | front_mp_sum | front_cam_count | back_mp_max | back_mp_sum | back_cam_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | iPhone 16 128GB | A17 Bionic | 803.567857 | 963.843373 | 805.416667 | 799.0 | 762.6703 | 2 | 174.0 | 6.0 | 6.1 | 3600.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
| 1 | Apple | iPhone 16 256GB | A17 Bionic | 839.282143 | 1024.084337 | 847.083333 | 849.0 | 817.166213 | 2 | 174.0 | 6.0 | 6.1 | 3600.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
| 2 | Apple | iPhone 16 512GB | A17 Bionic | 874.996429 | 1084.325301 | 902.638889 | 899.0 | 871.662125 | 2 | 174.0 | 6.0 | 6.1 | 3600.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
| 3 | Apple | iPhone 16 Plus 128GB | A17 Bionic | 892.853571 | 1084.325301 | 860.972222 | 899.0 | 871.662125 | 2 | 203.0 | 6.0 | 6.7 | 4200.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
| 4 | Apple | iPhone 16 Plus 256GB | A17 Bionic | 928.567857 | 1144.566265 | 902.638889 | 949.0 | 926.158038 | 2 | 203.0 | 6.0 | 6.7 | 4200.0 | 12.0 | 12.0 | 1 | 48.0 | 48.0 | 1 |
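A quick dtype check shows which columns still need encoding before modelling. The sketch below rebuilds two rows from the `df.head()` output as a stand-in for the full file (an assumption made so the snippet runs without the CSV) and lists the non-numeric columns.

```python
import pandas as pd

# Two rows copied from the df.head() output, standing in for the full CSV
sample = pd.DataFrame({
    'Company Name': ['Apple', 'Apple'],
    'Model Name': ['iPhone 16 128GB', 'iPhone 16 256GB'],
    'Processor': ['A17 Bionic', 'A17 Bionic'],
    'Price USD_USA': [799.0, 849.0],
    'device_age': [2, 2],
    'ram_gb': [6.0, 6.0],
})

# Non-numeric columns must be encoded (or left out) before they can be model inputs
text_cols = sample.select_dtypes(exclude='number').columns.tolist()
print(text_cols)
```

On the real data, `df.select_dtypes(exclude='number').columns` performs the same check.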


## 3. Data Cleaning
Drop extreme high-price outliers (the filter removes 3 of the 930 rows) and encode the brand as an integer label.

```python
# Remove extreme high-price outliers with a 3*IQR upper fence.
# Rows with a missing USA price also fail the comparison and are dropped.
Q1, Q3 = df['Price USD_USA'].quantile(0.25), df['Price USD_USA'].quantile(0.75)
df = df[df['Price USD_USA'] <= Q3 + 3 * (Q3 - Q1)].copy()
print(f'Rows after cleaning: {len(df)}')
print(f'Price range: ${df["Price USD_USA"].min():.0f} - ${df["Price USD_USA"].max():.0f}')

# Encode brand name as an integer label
le = LabelEncoder()
df['brand_enc'] = le.fit_transform(df['Company Name'])

df.select_dtypes(include='number').describe().round(2)
```

```
Rows after cleaning: 927
Price range: $79 - $2599
```

| | Price USD_Pakistan | Price USD_India | Price USD_China | Price USD_USA | Price USD_Dubai | device_age | weight_g | ram_gb | screen_in | battery_mah | front_mp_max | front_mp_sum | front_cam_count | back_mp_max | back_mp_sum | back_cam_count | brand_enc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 926.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 | 927.0 |
| mean | 446.38 | 604.27 | 540.42 | 578.87 | 590.55 | 3.81 | 227.92 | 7.78 | 7.08 | 5023.34 | 18.18 | 18.28 | 1.01 | 46.89 | 55.49 | 1.63 | 8.14 |
| std | 361.02 | 479.59 | 578.8 | 409.43 | 413.7 | 1.86 | 105.29 | 3.18 | 1.53 | 1353.62 | 12.0 | 12.16 | 0.11 | 31.09 | 36.8 | 0.81 | 5.36 |
| min | 57.14 | 72.28 | 69.31 | 79.0 | 81.47 | 1.0 | 135.0 | 1.0 | 5.0 | 2000.0 | 2.0 | 2.0 | 1.0 | 5.0 | 5.0 | 1.0 | 0.0 |
| 25% | 196.42 | 240.95 | 236.04 | 250.0 | 272.48 | 2.0 | 185.0 | 6.0 | 6.5 | 4400.0 | 8.0 | 8.0 | 1.0 | 16.0 | 26.0 | 1.0 | 3.0 |
| 50% | 303.57 | 421.67 | 388.89 | 449.0 | 456.4 | 3.0 | 194.0 | 8.0 | 6.67 | 5000.0 | 16.0 | 16.0 | 1.0 | 50.0 | 50.0 | 1.0 | 9.0 |
| 75% | 642.85 | 902.41 | 763.75 | 824.0 | 871.66 | 5.0 | 208.0 | 8.0 | 6.78 | 5082.5 | 32.0 | 32.0 | 1.0 | 50.0 | 64.0 | 2.0 | 13.0 |
| max | 2160.71 | 3132.52 | 13999.0 | 2599.0 | 2860.76 | 12.0 | 732.0 | 16.0 | 14.6 | 11200.0 | 60.0 | 68.0 | 2.0 | 200.0 | 212.0 | 4.0 | 18.0 |
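The 3*IQR rule sets only an upper fence, and because the filter keeps rows where the price is `<=` that fence, rows with a missing USA price are dropped as well. A minimal sketch on a hypothetical price series (values invented for illustration, not taken from the dataset) shows both effects:

```python
import numpy as np
import pandas as pd

# Hypothetical USA prices, including one extreme value and one missing value
prices = pd.Series([150.0, 250.0, 300.0, 450.0, 800.0, 900.0, 9999.0, np.nan])

# quantile() skips NaN; the fence uses the same 3*IQR rule as the cleaning cell
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
upper_fence = q3 + 3 * (q3 - q1)

# NaN <= fence evaluates to False, so the missing price is removed too
kept = prices[prices <= upper_fence]
print(f'Q1={q1}, Q3={q3}, fence={upper_fence}')
print(kept.tolist())
```

Here Q1 = 275, Q3 = 850 and the fence is 850 + 3 * 575 = 2575, so only the 9999 price and the missing value are removed.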


## 4. Prepare Features & Target
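Use the brand label plus nine spec columns as inputs. `front_mp_sum` and `back_mp_sum` are left out, and the other markets' prices are not used, since the goal is to predict price from specs.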

```python
features = [
    'brand_enc',
    'device_age',
    'weight_g',
    'ram_gb',
    'screen_in',
    'battery_mah',
    'front_mp_max',
    'front_cam_count',
    'back_mp_max',
    'back_cam_count',
]

X = df[features]
y = df['Price USD_USA']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f'Train: {len(X_train)} samples | Test: {len(X_test)} samples')
```

```
Train: 741 samples | Test: 186 samples
```
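The 741/186 split follows from how `train_test_split` converts a fraction into a row count: scikit-learn rounds the test size up (0.2 * 927 = 185.4 becomes 186) and gives the remainder to the training set. The sketch reproduces the counts using a dummy feature matrix in place of `X`.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 927
n_test = math.ceil(0.2 * n_samples)  # test fraction is rounded up
n_train = n_samples - n_test         # training set gets the remainder
print(n_train, n_test)

# Dummy single-column matrix standing in for X
X_dummy = np.arange(n_samples).reshape(-1, 1)
X_tr, X_te = train_test_split(X_dummy, test_size=0.2, random_state=42)
print(len(X_tr), len(X_te))
```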


## 5. Train Model

```python
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print('Training complete.')
```

```
Training complete.
```
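A random forest can also score itself on its out-of-bag (OOB) rows, the bootstrap samples each tree never saw. This is an optional cross-check on the single 80/20 split and needs no extra hold-out data. The sketch uses synthetic data from `make_regression` as an assumed stand-in for the phone features. In this notebook you would instead pass `oob_score=True` to the existing `RandomForestRegressor` and read `model.oob_score_` after fitting.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for X_train / y_train: 741 rows, 10 features like the feature list
X_demo, y_demo = make_regression(n_samples=741, n_features=10, noise=10.0, random_state=42)

# oob_score=True evaluates each row only with trees that did not train on it (R² by default)
rf = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=42)
rf.fit(X_demo, y_demo)
print(f'OOB R²: {rf.oob_score_:.4f}')
```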


## 6. Evaluate Model

```python
y_pred = model.predict(X_test)

mae  = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2   = r2_score(y_test, y_pred)

print(f'MAE  : ${mae:.2f}  (avg prediction error)')
print(f'RMSE : ${rmse:.2f}')
print(f'R²   : {r2:.4f}  (1.0 = perfect)')
```

```
MAE  : $106.19  (avg prediction error)
RMSE : $156.09
R²   : 0.8574  (1.0 = perfect)
```
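For reference, each of the three metrics reduces to one line of NumPy. MAE is the mean absolute error, RMSE is the square root of the mean squared error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The arrays below are hypothetical prices, used only to check the hand-written formulas against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted USA prices (illustration only, not model output)
y_true = np.array([799.0, 449.0, 250.0, 1199.0])
y_hat = np.array([760.0, 500.0, 230.0, 1100.0])

err = y_true - y_hat
mae_manual = np.mean(np.abs(err))         # mean absolute error
rmse_manual = np.sqrt(np.mean(err ** 2))  # root mean squared error
r2_manual = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f'MAE {mae_manual:.2f} | RMSE {rmse_manual:.2f} | R² {r2_manual:.4f}')
```

Because RMSE squares each error before averaging, it is never smaller than MAE. The gap between the notebook's $156.09 RMSE and $106.19 MAE suggests that a minority of test phones have much larger errors than the average.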


## 7. Feature Importance
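Which inputs matter most for predicting price? The values below are the forest's impurity-based importances, which sum to 1 across the ten features.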

```python
importance = pd.Series(model.feature_importances_, index=features)
importance.sort_values(ascending=False).round(4)
```

```
ram_gb             0.2453
weight_g           0.2083
brand_enc          0.1904
front_mp_max       0.1322
battery_mah        0.0754
screen_in          0.0644
back_cam_count     0.0446
back_mp_max        0.0290
device_age         0.0097
front_cam_count    0.0006
dtype: float64
```
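Impurity-based importances come from the training data and can favour features with many distinct values, such as `weight_g` or `battery_mah`. Permutation importance on the held-out set is a common cross-check: it measures how much the test score drops when one column is shuffled. The sketch below runs scikit-learn's `permutation_importance` on synthetic data (an assumption, so it runs standalone). In this notebook the equivalent call would be `permutation_importance(model, X_test, y_test, ...)`, with the result indexed by `features`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 features, only the first 3 carry signal (shuffle=False keeps them first)
X, y = make_regression(n_samples=500, n_features=10, n_informative=3,
                       noise=5.0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Mean drop in test-set R² when each column is shuffled, over 10 repeats
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=42)
ranking = np.argsort(result.importances_mean)[::-1]
print('Columns ranked by permutation importance:', ranking.tolist())
```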

## 8. Predict a New Phone
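Edit the values below and re-run to estimate the USA price of a phone. The brand must be one of the labels the encoder saw during training (printed after the prediction); `le.transform` raises a `ValueError` for any other name.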

```python
new_phone = pd.DataFrame([{
    'brand_enc'      : le.transform(['Samsung'])[0],  # change brand here
    'device_age'     : 2,       # years since launch
    'weight_g'       : 195,     # grams
    'ram_gb'         : 8,       # GB
    'screen_in'      : 6.7,     # inches
    'battery_mah'    : 5000,    # mAh
    'front_mp_max'   : 12,      # MP
    'front_cam_count': 1,       # number of front cameras
    'back_mp_max'    : 50,      # MP
    'back_cam_count' : 3,       # number of rear cameras
}])

predicted_price = model.predict(new_phone)[0]
print(f'Predicted USA Price: ${predicted_price:.2f}')
print('Available brands:', sorted(le.classes_.tolist()))
```

```
Predicted USA Price: $714.86
Available brands: ['Apple', 'Google', 'Honor', 'Huawei', 'Infinix', 'Lenovo', 'Motorola', 'Nokia', 'OnePlus', 'Oppo', 'POCO', 'Poco', 'Realme', 'Samsung', 'Sony', 'Tecno', 'Vivo', 'Xiaomi', 'iQOO']
```
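The brand list above contains both `'POCO'` and `'Poco'`, so `LabelEncoder` treats one brand as two. Below is a minimal sketch of collapsing case variants before encoding, run on a hypothetical sample of brand names. Applying the same mapping to `df['Company Name']` before `fit_transform` and re-running the notebook would likely shift the metrics and importances reported above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of brand names containing a case variant
brands = pd.Series(['Samsung', 'POCO', 'Poco', 'Xiaomi', 'iQOO'])

# Map every case variant to the first spelling seen for it
canonical = {}
for name in brands:
    canonical.setdefault(name.casefold(), name)
clean = brands.map(lambda n: canonical[n.casefold()])

le = LabelEncoder()
codes = le.fit_transform(clean)
print(le.classes_.tolist())
print(codes.tolist())
```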
