<b>Data Source:</b><br> 
<b>Indian smartphone price prediction dataset, Roshan Yadav.</b><br> 
<b>Retrieved from</b> https://www.kaggle.com/datasets/rohsanyadav/smartphones-dataset

In this analysis, I used a multiple linear regression model to predict smartphone prices based on various features such as brand, processor type, display type, operating system, RAM, storage, and camera quality. I cleaned the dataset, converted categorical variables into numerical values using label encoding, and built the model using scikit-learn.

## Importing Libraries

In [111]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import f_regression
from sklearn.preprocessing import StandardScaler

In [113]:
data = pd.read_csv('smartphones_data.csv')
data.head()

Unnamed: 0,brand_name,Name,Price,RAM,OS,storage,Battery_cap,has_fast_charging,has_fingerprints,has_nfc,has_5g,processor_brand,num_core,primery_rear_camera,Num_Rear_Cameras,primery_front_camera,num_front_camera,display_size(inch),refresh_rate(hz),display_types
0,vivo,vivo v50,34999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,50.0,1,6.77,120.0,amoled display
1,realme,realme p3 pro,21999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,16.0,1,6.83,120.0,amoled display
2,realme,realme 14 pro plus,27999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,3,32.0,1,6.83,120.0,oled display
3,samsung,samsung galaxy s25 ultra,129999,12.0,android,256.0,5000,Yes,Yes,Yes,Yes,snapdragon,8.0,200.0,4,12.0,1,6.9,120.0,amoled display
4,vivo,vivo t3 pro,22999,8.0,android,128.0,5500,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,16.0,1,6.77,120.0,amoled display


## Data Selection
As part of the data selection process, the ‘Name’ column was removed, as it does not provide relevant information for the prediction task.

In [116]:
data.drop(columns=['Name'], inplace=True)
data

Unnamed: 0,brand_name,Price,RAM,OS,storage,Battery_cap,has_fast_charging,has_fingerprints,has_nfc,has_5g,processor_brand,num_core,primery_rear_camera,Num_Rear_Cameras,primery_front_camera,num_front_camera,display_size(inch),refresh_rate(hz),display_types
0,vivo,34999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,50.0,1,6.770,120.0,amoled display
1,realme,21999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,16.0,1,6.830,120.0,amoled display
2,realme,27999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,3,32.0,1,6.830,120.0,oled display
3,samsung,129999,12.0,android,256.0,5000,Yes,Yes,Yes,Yes,snapdragon,8.0,200.0,4,12.0,1,6.900,120.0,amoled display
4,vivo,22999,8.0,android,128.0,5500,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,16.0,1,6.770,120.0,amoled display
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3255,ikall,4799,4.0,android,64.0,4000,No,Yes,No,No,tru-mediatek,8.0,13.0,1,8.0,1,6.500,,lcd display
3256,Other,5299,2.0,android,16.0,3300,No,Yes,No,No,quad,4.0,5.0,1,5.0,1,5.340,,lcd display
3257,Other,5599,2.0,android,16.0,3500,No,,,,quad,4.0,5.0,1,5.0,1,5.700,,lcd display
3258,Other,6499,2.0,android,32.0,5000,No,,,,tru-mediatek,8.0,13.0,1,5.0,1,6.088,,lcd display


## Checking for Missing Values
I checked the dataset for missing (null) values and removed variables with a large number of missing entries, as they could negatively affect the model's performance. This step helped ensure the remaining data was cleaner, more reliable, and better suited for building an accurate predictive model.

In [119]:
Error = data.isna().sum()
print(Error)

brand_name                 0
Price                      0
RAM                        0
OS                         0
storage                    0
Battery_cap                0
has_fast_charging          0
has_fingerprints         726
has_nfc                  726
has_5g                   726
processor_brand            0
num_core                 175
primery_rear_camera        0
Num_Rear_Cameras           0
primery_front_camera       0
num_front_camera           0
display_size(inch)         0
refresh_rate(hz)        1731
display_types              0
dtype: int64
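
The counts above are easier to judge as shares of the dataset (for example, 1,731 missing values out of 3,260 rows is roughly 53%). A minimal sketch of that check follows, using a small hypothetical frame named `toy` in place of the real data; the 0.5 cutoff is an assumption for illustration, not a rule taken from this analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the smartphone data
toy = pd.DataFrame({
    "Price": [34999, 21999, 4799, 5299],
    "has_nfc": ["No", "No", np.nan, "No"],
    "refresh_rate(hz)": [120.0, np.nan, np.nan, np.nan],
})

# Fraction of missing rows per column (mean of the boolean mask)
missing_share = toy.isna().mean()

# Assumed cutoff for illustration: flag columns with more than half missing
threshold = 0.5
to_drop = missing_share[missing_share > threshold].index.tolist()

print(missing_share.round(2))
print(to_drop)
```

Columns below the cutoff remain candidates for imputation, while those above it are candidates for removal.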


## Imputation
Missing values were filled using the mode (for the categorical columns) and the median (for the numerical column). While this method may reduce data variability and introduce slight bias, it helps retain most of the dataset and ensures the model can be trained effectively without losing too much information. Note that refresh_rate(hz) is missing in 1,731 of the 3,260 rows (about 53% of the dataset), so that column is dropped rather than imputed.

In [122]:
# Drop refresh_rate(hz): too many missing values to impute reliably
data = data.drop(columns=['refresh_rate(hz)'])

# Mode imputation for the categorical Yes/No columns
categorical_cols = ['has_fingerprints', 'has_nfc', 'has_5g']
imputer = SimpleImputer(strategy='most_frequent')
data[categorical_cols] = imputer.fit_transform(data[categorical_cols])

# Median imputation for the numerical column
numerical_cols = ['num_core']
imputer = SimpleImputer(strategy='median')
data[numerical_cols] = imputer.fit_transform(data[numerical_cols])

print(data.isnull().sum())  # Check for remaining missing values
print(data.shape)  # Check dataset size after cleaning


brand_name              0
Price                   0
RAM                     0
OS                      0
storage                 0
Battery_cap             0
has_fast_charging       0
has_fingerprints        0
has_nfc                 0
has_5g                  0
processor_brand         0
num_core                0
primery_rear_camera     0
Num_Rear_Cameras        0
primery_front_camera    0
num_front_camera        0
display_size(inch)      0
display_types           0
dtype: int64
(3260, 18)


In [124]:
data

Unnamed: 0,brand_name,Price,RAM,OS,storage,Battery_cap,has_fast_charging,has_fingerprints,has_nfc,has_5g,processor_brand,num_core,primery_rear_camera,Num_Rear_Cameras,primery_front_camera,num_front_camera,display_size(inch),display_types
0,vivo,34999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,50.0,1,6.770,amoled display
1,realme,21999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,16.0,1,6.830,amoled display
2,realme,27999,8.0,android,128.0,6000,Yes,Yes,No,Yes,snapdragon,8.0,50.0,3,32.0,1,6.830,oled display
3,samsung,129999,12.0,android,256.0,5000,Yes,Yes,Yes,Yes,snapdragon,8.0,200.0,4,12.0,1,6.900,amoled display
4,vivo,22999,8.0,android,128.0,5500,Yes,Yes,No,Yes,snapdragon,8.0,50.0,2,16.0,1,6.770,amoled display
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3255,ikall,4799,4.0,android,64.0,4000,No,Yes,No,No,tru-mediatek,8.0,13.0,1,8.0,1,6.500,lcd display
3256,Other,5299,2.0,android,16.0,3300,No,Yes,No,No,quad,4.0,5.0,1,5.0,1,5.340,lcd display
3257,Other,5599,2.0,android,16.0,3500,No,Yes,No,No,quad,4.0,5.0,1,5.0,1,5.700,lcd display
3258,Other,6499,2.0,android,32.0,5000,No,Yes,No,No,tru-mediatek,8.0,13.0,1,5.0,1,6.088,lcd display


## Turn Categorical Data into Numerical
To prepare the dataset for modeling, categorical features were converted into numerical form. The ‘brand_name’ column was encoded using LabelEncoder, and the temporary encoded column was subsequently dropped after its values replaced the original. Binary columns such as ‘has_fast_charging’ and ‘has_nfc’ were mapped to 1 for "Yes" and 0 for "No". Additional categorical features (‘OS’, ‘processor_brand’, and ‘display_types’) were also label encoded, with mappings stored for reference. This process ensured the dataset was fully numeric and compatible with regression algorithms. Note: the encoder mappings are printed below.

In [127]:
# Initialize the encoder
encoder = LabelEncoder()

# 1. Encode 'brand_name'
data['brand_name_encoded'] = encoder.fit_transform(data['brand_name'])
brand_name_mapping = dict(zip(encoder.classes_, encoder.transform(encoder.classes_)))
data['brand_name'] = data['brand_name_encoded']
data = data.drop(columns=['brand_name_encoded'])

# 2. Binary columns (Yes/No → 1/0)
binary_columns = ['has_fast_charging', 'has_fingerprints', 'has_nfc', 'has_5g']
binary_mapping = {'Yes': 1, 'No': 0}
for col in binary_columns:
    data[col] = data[col].map(binary_mapping)

# 3. Categorical columns to LabelEncode
categorical_columns = ['OS', 'processor_brand', 'display_types']
category_mappings = {}

for col in categorical_columns:
    data[col + '_encoded'] = encoder.fit_transform(data[col])
    category_mappings[col] = dict(zip(encoder.classes_, encoder.transform(encoder.classes_)))
    data[col] = data[col + '_encoded']
    data = data.drop(columns=[col + '_encoded'])

# 4. Print updated DataFrame
print(" Updated Data (first 5 rows):")
print(data.head())

# 5. Show all mappings (legend)
print("\n===  Encoded Mappings (Legend) ===")
print(" Brand Name Mapping:")
print(brand_name_mapping)

print("\n Binary Columns Mapping (Yes/No):")
print(binary_mapping)

for col in categorical_columns:
    print(f"\n {col.capitalize()} Mapping:")
    print(category_mappings[col])


 Updated Data (first 5 rows):
   brand_name   Price   RAM  OS  storage  Battery_cap  has_fast_charging  \
0          30   34999   8.0   0    128.0         6000                  1   
1          26   21999   8.0   0    128.0         6000                  1   
2          26   27999   8.0   0    128.0         6000                  1   
3          27  129999  12.0   0    256.0         5000                  1   
4          30   22999   8.0   0    128.0         5500                  1   

   has_fingerprints  has_nfc  has_5g  processor_brand  num_core  \
0                 1        0       1               10       8.0   
1                 1        0       1               10       8.0   
2                 1        0       1               10       8.0   
3                 1        1       1               10       8.0   
4                 1        0       1               10       8.0   

   primery_rear_camera  Num_Rear_Cameras  primery_front_camera  \
0                 50.0                 2    
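
One caveat about label encoding in a linear model: the integer codes follow the sorted order of the labels, so the regression treats, say, code 30 as "more" than code 26 even though brands have no natural order. A common alternative, which is not what this notebook uses, is one-hot encoding. The sketch below shows it with `pd.get_dummies` on a hypothetical frame named `toy`.

```python
import pandas as pd

# Hypothetical toy frame; the real notebook uses the full smartphone dataset
toy = pd.DataFrame({
    "brand_name": ["vivo", "realme", "samsung", "realme"],
    "RAM": [8.0, 8.0, 12.0, 8.0],
})

# One 0/1 column per brand; drop_first removes one level to avoid
# perfect collinearity with the intercept in a linear model
one_hot = pd.get_dummies(toy, columns=["brand_name"], drop_first=True, dtype=int)

print(one_hot)
```

With this encoding each brand gets its own coefficient relative to the dropped baseline brand, at the cost of many more columns.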

In [129]:
data.describe()

Unnamed: 0,brand_name,Price,RAM,OS,storage,Battery_cap,has_fast_charging,has_fingerprints,has_nfc,has_5g,processor_brand,num_core,primery_rear_camera,Num_Rear_Cameras,primery_front_camera,num_front_camera,display_size(inch),display_types
count,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0,3260.0
mean,18.343558,20181.384356,5.065874,0.042945,112.040893,4163.485583,0.473313,0.956135,0.256135,0.307669,7.887117,7.138037,32.655828,2.076994,12.555767,1.026994,6.09711,0.972086
std,10.600289,24145.388368,3.256896,0.217371,126.893532,1312.404904,0.499364,0.204826,0.436564,0.461599,2.901707,1.649559,29.397695,0.990856,10.564795,0.16209,0.741478,0.948903
min,0.0,2500.0,0.25,0.0,0.31,1100.0,0.0,0.0,0.0,0.0,0.0,1.0,0.3,1.0,0.3,1.0,2.4,0.0
25%,9.0,7490.0,3.0,0.0,32.0,3007.5,0.0,1.0,0.0,0.0,6.0,8.0,12.0,1.0,5.0,1.0,5.5,0.0
50%,22.0,11999.0,4.0,0.0,64.0,4500.0,0.0,1.0,0.0,0.0,8.0,8.0,16.0,2.0,8.0,1.0,6.455,1.0
75%,27.0,21999.0,8.0,0.0,128.0,5000.0,1.0,1.0,1.0,1.0,10.0,8.0,50.0,3.0,16.0,1.0,6.67,1.0
max,32.0,200999.0,24.0,2.0,1024.0,22000.0,1.0,1.0,1.0,1.0,14.0,10.0,200.0,5.0,60.0,2.0,8.03,4.0


## Data Explanation

* Smartphone prices range widely from ₹2,500 to ₹200,999, with an average of approximately ₹20,181. This indicates a broad spectrum of devices from budget-friendly to premium models.
* RAM varies from as low as 0.25 GB to 24 GB, with a mean of 5.07 GB. This reflects the presence of both entry-level and high-performance smartphones in the dataset.
* Storage capacity ranges from just 0.31 GB to 1,024 GB, with an average of about 112 GB. This shows substantial differences in internal storage options across devices.
* Battery capacity spans from 1,100 mAh to 22,000 mAh, with a mean of approximately 4,163 mAh, indicating a wide variety of battery performance levels.
* Display size ranges from 2.4 inches to 8.03 inches, averaging around 6.1 inches. Most smartphones appear to fall within the standard size for modern devices.
* Primary rear camera resolution ranges from 0.3 MP to 200 MP, with an average of 32.7 MP, suggesting that many phones prioritize high-quality rear photography.
* Primary front camera resolution varies between 0.3 MP and 60 MP, with an average of 12.6 MP, showing diversity in selfie camera capabilities.
* Number of rear cameras ranges from 1 to 5, and front cameras from 1 to 2, indicating a variety of hardware configurations.
* Number of processor cores ranges from 1 to 10, with a mean of about 7.1 cores and a median of 8, reflecting the range in processing power across models.
* Operating systems are encoded, with most devices running on a dominant system (likely Android, encoded as 0), as reflected by a low average of 0.04.
* Fast charging is supported by 47.3% of devices, while fingerprint sensors are present in over 95% of them, indicating standard security features in most smartphones.
* NFC is available in 25.6% of devices, and 5G connectivity is present in 30.8%, showing that not all smartphones in the dataset support newer wireless technologies.

## Regression

In [133]:
y = data['Price']
x_ns = data[['brand_name', 'RAM', 'OS', 'storage', 'Battery_cap', 'has_fast_charging',
          'has_fingerprints', 'has_nfc', 'has_5g', 'processor_brand', 'num_core',
          'primery_rear_camera', 'Num_Rear_Cameras', 'primery_front_camera',
          'num_front_camera', 'display_size(inch)', 'display_types']]

## Feature Scaling (Standardization)
Feature scaling (standardization) puts every feature on a common scale with mean 0 and standard deviation 1, so the size of each coefficient shows how much weight or influence that feature carries in the model.

In [136]:
scaler = StandardScaler()
scaler.fit(x_ns)
x = scaler.transform(x_ns)
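
To make the transformation concrete, the sketch below checks on a small hypothetical array that `StandardScaler` computes the z-score, (value - column mean) / column standard deviation, using the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features: RAM (GB) and storage (GB)
X = np.array([[2.0, 16.0], [4.0, 64.0], [8.0, 128.0], [12.0, 256.0]])

# Manual z-score; np.std defaults to ddof=0, matching StandardScaler
manual = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```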

## Regression

In [139]:
reg = LinearRegression()
reg.fit(x,y)

In [141]:
reg.coef_

array([ -879.23603023,  4024.2863234 ,  4429.49860057, 12199.64802632,
       -3086.07417619, -1382.95574025, -2674.96669287,  5504.85458272,
         303.12700206, -1206.76770224,  -182.00633409, -1010.48645164,
        1533.47393629, -2418.72747366,  1921.38939237,  2256.17550183,
         -75.83974399])

In [143]:
reg.intercept_

20181.384355828224

In [145]:
reg.score(x,y)

0.7343334680457039

In [147]:
x.shape

(3260, 17)

### Adjusted R-squared

In [150]:
r2 = reg.score(x,y)
n = x.shape[0] #3260 number of observation
p = x.shape[1] #17 number of predictors
adjusted_r2 = 1-(1-r2)*(n-1)/(n-p-1)
adjusted_r2

0.7329403986307677
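
Written out with the values reported in this notebook (n = 3260 observations, p = 17 predictors, R² rounded to five decimals), the calculation is:

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
          = 1 - \left(1 - 0.73433\right)\cdot\frac{3259}{3242}
          \approx 0.73294
```

Because n is large relative to p, the correction factor 3259/3242 is close to 1 and the adjustment is small.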

## P-value

In [153]:
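# f_regression runs a univariate F-test for each feature against y;
# it returns (F-statistics, p-values), so index 1 holds the p-values
p_value = f_regression(x,y)[1]
p_value.round(3)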

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
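
Note that `f_regression` tests each feature against the target one at a time, so these p-values do not account for the other features in the model. For comparison, here is a minimal sketch of coefficient-level p-values from the standard OLS t-test, run on synthetic data. The helper `ols_p_values` and all toy variables are hypothetical and not part of this notebook.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression

def ols_p_values(X, y):
    """Return OLS coefficients and two-sided t-test p-values (intercept first)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
    resid = y - A @ beta
    dof = n - p - 1                                # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)          # coefficient covariance
    t_stat = beta / np.sqrt(np.diag(cov))
    return beta, 2 * stats.t.sf(np.abs(t_stat), dof)

# Synthetic data: x2 is nearly a copy of x0 and has no effect of its own on y
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + 0.1 * rng.normal(size=200)
X = np.column_stack([x0, x1, x2])
y = 3 * x0 + 2 * x1 + rng.normal(size=200)

beta, multi_p = ols_p_values(X, y)
uni_p = f_regression(X, y)[1]

print("univariate p-values:  ", uni_p)
print("multivariate p-values:", multi_p[1:])       # skip the intercept
```

In this toy setup `x2` gets a very small univariate p-value simply because it is correlated with `x0`, which is why the two kinds of p-value should not be read interchangeably.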

## Summary Table

In [156]:
reg_summary = pd.DataFrame({
    'Features': x_ns.columns,            
    'Coefficient/Weights': reg.coef_,
    'P-values': p_value.round(3)
})
reg_summary

Unnamed: 0,Features,Coefficient/Weights,P-values
0,brand_name,-879.23603,0.0
1,RAM,4024.286323,0.0
2,OS,4429.498601,0.0
3,storage,12199.648026,0.0
4,Battery_cap,-3086.074176,0.0
5,has_fast_charging,-1382.95574,0.0
6,has_fingerprints,-2674.966693,0.0
7,has_nfc,5504.854583,0.0
8,has_5g,303.127002,0.0
9,processor_brand,-1206.767702,0.0


## Analysis
<b>R-squared (0.7343):</b> This value indicates that approximately 73.43% of the variability in the target variable (Price) is explained by the selected features. This is a reasonably strong fit, although the score is computed on the same data the model was trained on, so it describes in-sample fit rather than accuracy on unseen phones.

<b>Adjusted R-squared (0.7329):</b> This value is slightly lower than the R-squared because it penalizes the number of predictors in the model. The close proximity of the two values suggests that the 17 predictors are not inflating the fit relative to the 3,260 observations. Confirming that the model is not overfitting would still require evaluation on held-out data.
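
A held-out evaluation is one way to probe overfitting more directly. The following is a minimal sketch on synthetic data, not a result from this notebook; the split ratio and random seeds are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and prices
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=1.0, size=500)

# Hold out 20% of the rows; fit the scaler and model on the training part only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
reg = LinearRegression().fit(scaler.transform(X_train), y_train)

r2_train = reg.score(scaler.transform(X_train), y_train)
r2_test = reg.score(scaler.transform(X_test), y_test)
print(round(r2_train, 3), round(r2_test, 3))
```

A test R-squared well below the training R-squared would point to overfitting; similar values would support the reading above.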

-----

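<p><b>Brand Name (-879.24):</b> The negative coefficient for the brand indicates that as the encoded brand value increases, the predicted price decreases. Because the label encoder assigns codes in sorted (essentially alphabetical) order, this direction reflects the ordering of brand names rather than a ranking of brands, so it should be interpreted with caution. The p-value of 0.0 nonetheless indicates that the encoded brand is statistically associated with price.</p>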

<p><b>RAM (4024.29):</b> RAM has a positive coefficient, meaning higher RAM contributes to a higher smartphone price. The high statistical significance (p-value = 0.0) suggests that RAM is a key factor in determining smartphone price.</p>

<p><b>Operating System (4429.50):</b> The positive coefficient for the operating system (OS) implies that devices running certain operating systems are priced higher. This is a common trend where iOS devices tend to be priced higher compared to Android devices.</p>

<p><b>Storage (12199.65):</b> Storage is a strong predictor of price, with higher storage leading to higher prices. The very low p-value indicates that storage is a highly significant feature in the model.</p>

<p><b>Battery Capacity (-3086.07):</b> The negative coefficient for battery capacity suggests that smartphones with higher battery capacity may have a lower price, which could reflect a trade-off between battery size and other more expensive features.</p>

<p><b>Has Fast Charging (-1382.96):</b> Fast charging capability shows a negative association with price in this model. While this may seem counterintuitive, it could reflect a market trend where fast charging is more commonly featured in mid-range or budget smartphones.</p>

<p><b>Has Fingerprints (-2674.97):</b> The presence of fingerprint sensors seems to negatively impact the price, which might indicate that this feature is more common in mid-range phones rather than premium ones.</p>

<p><b>Has NFC (5504.85):</b> NFC support contributes positively to the price, suggesting that smartphones with NFC are priced higher. This aligns with trends where NFC-capable devices tend to be more feature-rich.</p>

<p><b>Has 5G (303.13):</b> 5G support has a relatively small positive effect on price, implying that while 5G-enabled smartphones might be priced slightly higher, it does not have as significant an impact as other features like RAM or storage.</p>

<p><b>Processor Brand (-1206.77):</b> The negative coefficient for processor brand suggests that certain processors may be associated with lower-priced smartphones, which may reflect the use of mid-range processors in budget models.</p>

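<p><b>Number of Cores (-182.01):</b> The negative coefficient here implies that, with the other features held fixed, more CPU cores are associated with a slight decrease in price. This might be surprising but could be due to factors like diminishing returns on performance improvements as more cores are added. Note also that core count varies little in this dataset: the 25th, 50th, and 75th percentiles are all 8 cores.</p>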

<p><b>Primary Rear Camera (-1010.49):</b> Higher rear camera resolution shows a slight negative association with price. This may be due to some budget or mid-range smartphones offering high megapixel counts, though the actual camera quality may not match that of premium models with lower megapixel counts.</p>

<p><b>Number of Rear Cameras (1533.47):</b> The positive coefficient indicates that a higher number of rear cameras results in a higher price, as multi-camera setups tend to be associated with premium devices.</p>

<p><b>Primary Front Camera (-2418.73):</b> Higher front camera resolution also shows a negative impact on price. Similar to rear cameras, some lower-cost smartphones may offer high megapixel front cameras, but the overall quality may still fall short of premium standards.</p>

<p><b>Number of Front Cameras (1921.39):</b> The positive coefficient indicates that devices with more front cameras are priced higher, reflecting an emphasis on enhanced selfie capabilities in premium smartphones.</p>

<p><b>Display Size (2256.18):</b> Display size has a strong positive impact on price, with larger screens often being found in higher-end models, justifying their higher price points.</p>

<p><b>Display Type (-75.84):</b> The negative coefficient suggests that certain display types are associated with slightly lower prices. This may indicate that more affordable display technologies are commonly used across various price segments.</p>

The model's results indicate that features such as RAM, storage, OS, rear cameras, and display size play a significant role in determining smartphone price. Several features, including battery capacity and processor brand, show negative coefficients, suggesting that they do not always contribute to higher prices once the other features are accounted for. The p-values of 0.0 come from f_regression, which tests each feature against price one at a time. They show that every feature is individually related to price, but they are not significance tests of the coefficients in the multiple regression.

## Making Predictions
Let's make predictions using dummy data from three fictional smartphone specifications. We’ve selected samples representing low-end, mid-range, and high-end devices, each with their probable features, to compare and evaluate how well the prediction model processes the input.

In [163]:
# Define your feature list
feature_list = ['brand_name', 'RAM', 'OS', 'storage', 'Battery_cap',
                'has_fast_charging', 'has_fingerprints', 'has_nfc', 'has_5g',
                'processor_brand', 'num_core', 'primery_rear_camera', 'Num_Rear_Cameras',
                'primery_front_camera', 'num_front_camera', 'display_size(inch)', 'display_types']

# Dummy inputs: low-end, mid-range, high-end
low_sample =  [32, 0.25, 0, 0.31, 1100, 0, 0, 0, 0, 14, 1, 0.3, 1, 0.3, 1, 2.4, 4]
mid_sample =  [22, 4, 0, 64, 4500, 0, 1, 0, 0, 8, 8, 16, 2, 8, 1, 6.455, 1]
high_sample = [1, 24, 1, 1024, 22000, 1, 1, 1, 1, 1, 10, 200, 5, 60, 2, 8.03, 0]

# Wrap into DataFrame
dummy_data = pd.DataFrame([low_sample, mid_sample, high_sample], columns=feature_list)

# Standardize using the scaler from training
dummy_scaled = scaler.transform(dummy_data)

# Predict prices
predicted_prices = reg.predict(dummy_scaled)

# Add predictions to the original dummy_data for display
dummy_data['Predicted_Price'] = predicted_prices.round(2)

# View results
print(dummy_data[['brand_name', 'RAM', 'OS', 'storage', 'Battery_cap',
                'has_fast_charging', 'has_fingerprints', 'has_nfc', 'has_5g',
                'processor_brand', 'num_core', 'primery_rear_camera', 'Num_Rear_Cameras',
                'primery_front_camera', 'num_front_camera', 'display_size(inch)', 'display_types']])
# Mappings saved earlier (encoded value -> original label)
brand_map = {0: 'Other', 1: 'apple', 2: 'asus', 3: 'coolpad', 4: 'gionee', 5: 'google',
             6: 'honor', 7: 'htc', 8: 'ikall', 9: 'infinix', 10: 'intex', 11: 'iqoo',
             12: 'itel', 13: 'karbonn', 14: 'lava', 15: 'lenovo', 16: 'lg', 17: 'lyf',
             18: 'micromax', 19: 'moto', 20: 'motorola', 21: 'nokia', 22: 'oneplus',
             23: 'oppo', 24: 'panasonic', 25: 'poco', 26: 'realme', 27: 'samsung',
             28: 'sony', 29: 'tecno', 30: 'vivo', 31: 'xiaomi', 32: 'xolo'}

binary_map = {0: 'No', 1: 'Yes'}

os_map = {0: 'android', 1: 'ios', 2: 'other'}

proc_map = {0: 'apple', 1: 'broadcom', 2: 'google', 3: 'hisilicon', 4: 'huawei', 5: 'intel',
            6: 'mediatek', 7: 'nvidia', 8: 'quad', 9: 'samsung', 10: 'snapdragon',
            11: 'spreadtrum', 12: 'st-ericsson', 13: 'tru-mediatek', 14: 'unisoc'}

display_map = {0: 'amoled display', 1: 'lcd display', 2: 'oled display',
               3: 'other display', 4: 'tft display'}



   brand_name    RAM  OS  storage  Battery_cap  has_fast_charging  \
0          32   0.25   0     0.31         1100                  0   
1          22   4.00   0    64.00         4500                  0   
2           1  24.00   1  1024.00        22000                  1   

   has_fingerprints  has_nfc  has_5g  processor_brand  num_core  \
0                 0        0       0               14         1   
1                 1        0       0                8         8   
2                 1        1       1                1        10   

   primery_rear_camera  Num_Rear_Cameras  primery_front_camera  \
0                  0.3                 1                   0.3   
1                 16.0                 2                   8.0   
2                200.0                 5                  60.0   

   num_front_camera  display_size(inch)  display_types  
0                 1               2.400              4  
1                 1               6.455              1  
2                 

In [165]:
# Dummy input as before
dummy_data = pd.DataFrame([low_sample, mid_sample, high_sample], columns=feature_list)

# Standardize and predict
dummy_scaled = scaler.transform(dummy_data)
dummy_data['Predicted_Price'] = reg.predict(dummy_scaled).round(2)

# Decode categorical features
dummy_data['brand_name'] = dummy_data['brand_name'].map(brand_map)
dummy_data['OS'] = dummy_data['OS'].map(os_map)
dummy_data['has_fast_charging'] = dummy_data['has_fast_charging'].map(binary_map)
dummy_data['has_fingerprints'] = dummy_data['has_fingerprints'].map(binary_map)
dummy_data['has_nfc'] = dummy_data['has_nfc'].map(binary_map)
dummy_data['has_5g'] = dummy_data['has_5g'].map(binary_map)
dummy_data['processor_brand'] = dummy_data['processor_brand'].map(proc_map)
dummy_data['display_types'] = dummy_data['display_types'].map(display_map)
# Conversion rate from INR to USD (example rate, you can update it)
inr_to_usd_rate = 82
dummy_data['Predicted_Price_in_USD'] = (dummy_data['Predicted_Price'] / inr_to_usd_rate).round(2)
print(dummy_data)


  brand_name    RAM       OS  storage  Battery_cap has_fast_charging  \
0       xolo   0.25  android     0.31         1100                No   
1    oneplus   4.00  android    64.00         4500                No   
2      apple  24.00      ios  1024.00        22000               Yes   

  has_fingerprints has_nfc has_5g processor_brand  num_core  \
0               No      No     No          unisoc         1   
1              Yes      No     No            quad         8   
2              Yes     Yes    Yes        broadcom        10   

   primery_rear_camera  Num_Rear_Cameras  primery_front_camera  \
0                  0.3                 1                   0.3   
1                 16.0                 2                   8.0   
2                200.0                 5                  60.0   

   num_front_camera  display_size(inch)   display_types  Predicted_Price  \
0                 1               2.400     tft display          7623.35   
1                 1               6.455  

## Conclusion

The predictions based on three sample smartphones demonstrate how different features influence their prices:

* <b>Xolo (Low-end):</b> The low-end model, with minimal RAM, storage, and a basic display technology, has a predicted price of ₹7623.35 (approx. $93). This aligns with expectations for budget smartphones, where lower specifications lead to a more affordable price.

* <b>OnePlus (Mid-range):</b> The mid-range device, with more RAM, more storage, and an LCD display, has a predicted price of ₹11701.94 (approx. $143). This reflects the higher value placed on better performance and features compared to budget models.

* <b>Apple (High-end):</b> The high-end model, equipped with premium features such as high RAM, large storage, AMOLED display, and advanced camera specifications, shows a significantly higher predicted price of ₹126022.57 (approx. $1537). This is consistent with premium devices, which tend to command much higher prices due to advanced features and brand value.

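These results suggest that the model differentiates pricing across feature sets, where higher specifications such as RAM, storage, display type, and brand contribute to a higher predicted price. Because the three samples are fictional and were built from the minimum, median, and maximum values in the training data, they illustrate the model's behavior rather than validate its accuracy.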
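
The dollar figures quoted above follow from the example rate of 82 INR per USD used in the prediction cell. A short sketch of that arithmetic, using the three predicted prices reported in this section (the dictionary names are hypothetical):

```python
# Example conversion rate from the notebook; update it for current values
inr_to_usd_rate = 82

# Predicted prices in INR as reported for the three sample phones
predicted_inr = {"low-end": 7623.35, "mid-range": 11701.94, "high-end": 126022.57}

# Convert each prediction and round to cents
predicted_usd = {k: round(v / inr_to_usd_rate, 2) for k, v in predicted_inr.items()}
print(predicted_usd)
```

At this rate the three predictions come to roughly $93, $143, and $1,537.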