## Table of contents
- [1 - Problem Statement](#1)
- [2 - What spells success for this project](#2)
- [3 - Feature Engineering/Data Preprocessing](#3)
- [4 - ML Modeling](#4)
- [5 - Hyperparameter optimization](#5)
- [6 - Summary](#6)
- [7 - Conclusion](#7)
- [8 - Recommendations](#8)
- [9 - Limitations](#9)
- [10 - References](#10)

<a name='1'></a>
## Problem Statement
**This project aims to predict mobile phone prices from their specifications.**
> This comprehensive dataset comprises information about 8277 mobile phones, meticulously gathered from the authoritative source, PhoneDB. The dataset encompasses a vast array of mobile phone specifications, ranging from essential details like brand and model to intricate technical attributes such as CPU clock, camera specifications, battery capacity, and more. This dataset is a treasure trove for mobile technology enthusiasts, researchers, and data analysts seeking to explore and analyze the evolution of mobile devices over time.

**Description of variables**

- Brand: The brand name of the mobile phone.
- Model: The model name or number of the mobile phone.
- Released: The date when the phone was released to the market.
- Announced: The date when the phone was officially announced.
- Hardware Designer: The company responsible for designing the phone's hardware.
- Manufacturer: The manufacturer of the mobile phone.
- General Extras: Miscellaneous additional features of the phone.
- Width, Height, Depth: The physical dimensions of the phone.
- Dimensions: Comprehensive dimensions of the phone.
- Mass: The weight of the phone.
- Platform: The platform on which the phone operates.
- Operating System: The operating system running on the phone.
- CPU Clock: The clock speed of the phone's CPU.
- CPU: The Central Processing Unit of the phone.
- RAM Type: The type of Random Access Memory used.
- RAM Capacity: RAM capacity converted for uniformity.
- Non-volatile Memory Interface: Interface for non-volatile memory.
- Display Diagonal: The diagonal size of the phone's display.
- Resolution: The display resolution in pixels.
- Pixel Density: Pixels per inch on the display.
- Display Area Utilization: Proportion of the device front used by display.
- Display Type: The type of display technology used.
- Display Refresh Rate: The refresh rate of the display.
- Scratch Resistant Screen: Presence of scratch-resistant screen.
- Graphical Controller: Graphics processing unit details.
- GPU Clock: Clock speed of the GPU.
- Expansion Interfaces: Interfaces for expansion.
- USB Services: USB services provided.
- USB Connector: Type of USB connector.
- Max. Charging Power: Maximum charging power supported.
- Bluetooth: Bluetooth version supported.
- Camera Placement: Placement of the phone's camera.
- Camera Image Sensor: Image sensor used in the camera.
- Image Sensor Pixel Size: Size of individual image sensor pixels.
- Aperture (W): Aperture width for improved photography.
- Zoom: Zoom capabilities of the camera.
- Video Recording: Video recording specifications.
- Camera Extra Functions: Additional functions of the camera.
- Secondary Video Recording: Specifications for secondary video recording.
- Nominal Battery Capacity: The advertised battery capacity.
- Estimated Battery Life: Approximate battery life estimation.
- Market Countries: Countries where the phone is available.
- Market Regions: Regions where the phone is available.
- Price: Price of the mobile phone.
- Memory Capacity: Capacity of the device's memory.
- Cam1_mp: Primary camera resolution in megapixels.
- Cam2_mp: Secondary camera resolution in megapixels.

<a name='2'></a>
## What spells success for this project
At the end of the project, I will have addressed the following:

- Analyzing trends in mobile phone specifications over time.
- Identifying correlations between specifications and market success.
- Comparing different brands and models based on technical attributes.
- Predicting mobile phone prices based on specifications.

<a name='3'></a>
## Feature engineering/data preprocessing

In [4]:
from warnings import filterwarnings

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRFRegressor

%matplotlib inline

In [6]:
# Load the data
data = pd.read_csv('mobile.csv')
data.head(3)

Unnamed: 0.1,Unnamed: 0,Brand,Model,Released,Announced,Hardware Designer,Manufacturer,General Extras,Width,Height,Depth,Dimensions,Mass,Platform,Operating System,CPU Clock,CPU,RAM Type,RAM Capacity (converted),Non-volatile Memory Interface,Display Diagonal,Resolution,Pixel Density,Display Area Utilization,Display Type,Display Refresh Rate,Scratch Resistant Screen,Graphical Controller,GPU Clock:,Expansion Interfaces,USB Services,USB Connector,Max. Charging Power,Bluetooth,Camera Placement,Camera Image Sensor,Image Sensor Pixel Size,Aperture (W),Zoom,Video Recording,Camera Extra Functions,Secondary Video Recording,Nominal Battery Capacity,Estimated Battery Life,Market Countries,Market Regions,Price,Memory Capacity,Cam1_mp,Cam2_mp
0,0,Sony,Xperia L2 LTE-A AM H3321,2018-01-26,2018-01-08,Sony,Sony,Haptic touch feedback,78.0,150.0,9.8,3.07x5.91x0.39 inches,178.0,Android,Google Android 7.1.1 (Nougat),1450.0,"MediaTek MT6737T, 2016, 64 bit, quad-core, 28...",LPDDR3 SDRAM,3 GiB RAM,eMMC 5.0,139.7,720x1280,267.0,71.3%,Color TN-TFT LCD display,,Yes,ARM Mali-T720MP2,,"TransFlash , microSD , microSDHC , microSDXC",USB charging,USB C reversible,,Bluetooth 4.2,Rear,CMOS,,f/2.00,1.0 x optical zoom,1920x1080 pixel,"HDR photo , Macro mode , Panorama Photo , Fac...",1920x1080 pixel,3300.0,,"Brazil , USA","North America , South America",,32.0,12.8,7.7
1,1,Sony,Xperia L2 Dual SIM TD-LTE EMEA H4311,2018-01-26,2018-01-08,Sony,Sony,Haptic touch feedback,78.0,150.0,9.8,3.07x5.91x0.39 inches,178.0,Android,Google Android 7.1.1 (Nougat),1450.0,"MediaTek MT6737T, 2016, 64 bit, quad-core, 28...",LPDDR3 SDRAM,3 GiB RAM,eMMC 5.0,139.7,720x1280,267.0,71.3%,Color TN-TFT LCD display,,Yes,ARM Mali-T720MP2,,"TransFlash , microSD , microSDHC , microSDXC",USB charging,USB C reversible,,Bluetooth 4.2,Rear,CMOS,,f/2.00,1.0 x optical zoom,1920x1080 pixel,"HDR photo , Macro mode , Panorama Photo , Fac...",1920x1080 pixel,3300.0,,"Czech , Germany , Hungary , Poland , Russia ,...","Eastern Europe , Europe , Middle East , West...",,32.0,12.8,7.7
2,2,LG,LMX210NMW K Series K9 2018 Dual SIM LTE EMEA,2018-03-24,2018-02-22,LG Electronics,LG Electronics,Haptic touch feedback,73.2,146.3,8.2,2.88x5.76x0.32 inches,152.0,Android,Google Android 7.1.2 (Nougat),1300.0,"Qualcomm Snapdragon 212 MSM8909v2, 2015, 32 b...",LPDDR3 SDRAM,2 GiB RAM,eMMC 4.5,127.0,720x1280,294.0,64.4%,Color IPS TFT LCD display,,Arc Glass,Qualcomm Adreno 304,409 MHz,"TransFlash , microSD , microSDHC",USB charging,USB Micro-B (Micro-USB),,Bluetooth 4.2,Rear,CMOS,1.12 micrometer,f/2.00,1.0 x optical zoom,,"HDR photo , Red-eye reduction , Burst mode , ...",1280x720 pixel,2500.0,,"Russia , Ukraine","Eastern Europe , Europe",,16.0,8.0,4.9


In [7]:
# Drop the unnamed index column
data = data.drop(columns=['Unnamed: 0'])
data.columns

Index(['Brand', 'Model', 'Released', 'Announced', 'Hardware Designer',
       'Manufacturer', 'General Extras', 'Width', 'Height', 'Depth',
       'Dimensions', 'Mass', 'Platform', 'Operating System', 'CPU Clock',
       'CPU', 'RAM Type', 'RAM Capacity (converted)',
       'Non-volatile Memory Interface', 'Display Diagonal', 'Resolution',
       'Pixel Density', 'Display Area Utilization', 'Display Type',
       'Display Refresh Rate', 'Scratch Resistant Screen',
       'Graphical Controller', 'GPU Clock:', 'Expansion Interfaces',
       'USB Services', 'USB Connector', 'Max. Charging Power', 'Bluetooth',
       'Camera Placement', 'Camera Image Sensor', 'Image Sensor Pixel Size',
       'Aperture (W)', 'Zoom', 'Video Recording', 'Camera Extra Functions',
       'Secondary Video Recording', 'Nominal Battery Capacity',
       'Estimated Battery Life', 'Market Countries', 'Market Regions', 'Price',
       'Memory Capacity', 'Cam1_mp', 'Cam2_mp'],
      dtype='object')

In [21]:
data.shape

(8277, 49)

In [8]:
# Display the number of missing values in each column
data.isnull().sum()

Brand                               0
Model                               0
Released                            7
Announced                         355
Hardware Designer                 333
Manufacturer                      902
General Extras                     22
Width                              17
Height                             17
Depth                              19
Dimensions                         19
Mass                              121
Platform                            0
Operating System                    1
CPU Clock                          14
CPU                                 8
RAM Type                            0
RAM Capacity (converted)            2
Non-volatile Memory Interface       0
Display Diagonal                    0
Resolution                          1
Pixel Density                       1
Display Area Utilization           18
Display Type                        0
Display Refresh Rate             4448
Scratch Resistant Screen            0
Graphical Co
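
The counts above are easier to act on as shares of the 8277 rows: `Display Refresh Rate`, for example, is missing in 4448 rows, or roughly 54%. A minimal sketch of such a report follows, using a small made-up frame in place of `data`. The helper name `missing_report` and the toy values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages per column, worst first."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only columns that actually have gaps
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for `data` (values are illustrative only)
toy = pd.DataFrame({
    "Brand": ["Sony", "LG", "Sony", "HTC"],
    "Mass": [178.0, np.nan, 152.0, 160.0],
    "Display Refresh Rate": [np.nan, np.nan, 60.0, np.nan],
})

report = missing_report(toy)
print(report)
```

Calling `missing_report(data)` on the real frame would likely make it easier to decide which columns are worth imputing and which are too sparse to keep.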

In [9]:
#check for duplicates
data.duplicated().sum()

0

In [10]:
#check the data types
data.dtypes

Brand                             object
Model                             object
Released                          object
Announced                         object
Hardware Designer                 object
Manufacturer                      object
General Extras                    object
Width                            float64
Height                           float64
Depth                            float64
Dimensions                        object
Mass                             float64
Platform                          object
Operating System                  object
CPU Clock                        float64
CPU                               object
RAM Type                          object
RAM Capacity (converted)          object
Non-volatile Memory Interface     object
Display Diagonal                 float64
Resolution                        object
Pixel Density                    float64
Display Area Utilization          object
Display Type                      object
Display Refresh 
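
`Released` and `Announced` are stored as `object`, so they would need converting before any analysis over time. The sketch below shows one possible conversion on toy values in the same `YYYY-MM-DD` format as the preview rows. The derived column names `release_year` and `days_to_release` are hypothetical, and the real columns may need whitespace stripped first.

```python
import pandas as pd

# Toy rows in the same date format as the preview above
toy = pd.DataFrame({
    "Released": ["2018-01-26", "2018-03-24", None],
    "Announced": ["2018-01-08", "2018-02-22", "2018-05-01"],
})

# Unparseable or missing values become NaT instead of raising
for col in ["Released", "Announced"]:
    toy[col] = pd.to_datetime(toy[col], errors="coerce")

# Hypothetical derived features
toy["release_year"] = toy["Released"].dt.year
toy["days_to_release"] = (toy["Released"] - toy["Announced"]).dt.days

print(toy)
```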

In [14]:
# Work on explicit copies so later in-place edits do not touch `data`
obj_data = data.select_dtypes(exclude=[np.number]).copy()
numerical_data = data.select_dtypes(include=[np.number]).copy()

In [17]:
obj_data.head(2)

Unnamed: 0,Brand,Model,Released,Announced,Hardware Designer,Manufacturer,General Extras,Dimensions,Platform,Operating System,CPU,RAM Type,RAM Capacity (converted),Non-volatile Memory Interface,Resolution,Display Area Utilization,Display Type,Scratch Resistant Screen,Graphical Controller,GPU Clock:,Expansion Interfaces,USB Services,USB Connector,Max. Charging Power,Bluetooth,Camera Placement,Camera Image Sensor,Image Sensor Pixel Size,Aperture (W),Zoom,Video Recording,Camera Extra Functions,Secondary Video Recording,Estimated Battery Life,Market Countries,Market Regions
0,Sony,Xperia L2 LTE-A AM H3321,2018-01-26,2018-01-08,Sony,Sony,Haptic touch feedback,3.07x5.91x0.39 inches,Android,Google Android 7.1.1 (Nougat),"MediaTek MT6737T, 2016, 64 bit, quad-core, 28...",LPDDR3 SDRAM,3 GiB RAM,eMMC 5.0,720x1280,71.3%,Color TN-TFT LCD display,Yes,ARM Mali-T720MP2,,"TransFlash , microSD , microSDHC , microSDXC",USB charging,USB C reversible,,Bluetooth 4.2,Rear,CMOS,,f/2.00,1.0 x optical zoom,1920x1080 pixel,"HDR photo , Macro mode , Panorama Photo , Fac...",1920x1080 pixel,,"Brazil , USA","North America , South America"
1,Sony,Xperia L2 Dual SIM TD-LTE EMEA H4311,2018-01-26,2018-01-08,Sony,Sony,Haptic touch feedback,3.07x5.91x0.39 inches,Android,Google Android 7.1.1 (Nougat),"MediaTek MT6737T, 2016, 64 bit, quad-core, 28...",LPDDR3 SDRAM,3 GiB RAM,eMMC 5.0,720x1280,71.3%,Color TN-TFT LCD display,Yes,ARM Mali-T720MP2,,"TransFlash , microSD , microSDHC , microSDXC",USB charging,USB C reversible,,Bluetooth 4.2,Rear,CMOS,,f/2.00,1.0 x optical zoom,1920x1080 pixel,"HDR photo , Macro mode , Panorama Photo , Fac...",1920x1080 pixel,,"Czech , Germany , Hungary , Poland , Russia ,...","Eastern Europe , Europe , Middle East , West..."
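
Several of these object columns hold numbers wrapped in text, such as `3 GiB RAM`, `71.3%` and `720x1280`. A rough sketch of turning them into numeric features is shown below on toy values copied from the preview. The new column names (`ram_gib`, `display_area_pct`, `res_width`, `res_height`) are hypothetical, and the RAM pattern assumes values are expressed in GiB; rows in any other unit would come out as `NaN` and need separate handling.

```python
import pandas as pd

# Toy values modelled on the preview rows
toy = pd.DataFrame({
    "RAM Capacity (converted)": [" 3 GiB RAM", " 2 GiB RAM"],
    "Display Area Utilization": [" 71.3%", " 64.4%"],
    "Resolution": [" 720x1280", " 720x1280"],
})

# Leading number in strings such as ' 3 GiB RAM' (assumes GiB units)
ram = toy["RAM Capacity (converted)"].str.extract(r"([\d.]+)\s*GiB", expand=False)
toy["ram_gib"] = pd.to_numeric(ram, errors="coerce")

# '71.3%' -> 71.3
pct = toy["Display Area Utilization"].str.strip().str.rstrip("%")
toy["display_area_pct"] = pd.to_numeric(pct, errors="coerce")

# '720x1280' -> two numeric columns
res = toy["Resolution"].str.extract(r"(\d+)\s*x\s*(\d+)").astype(float)
toy["res_width"] = res[0]
toy["res_height"] = res[1]

print(toy[["ram_gib", "display_area_pct", "res_width", "res_height"]])
```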


In [38]:
obj_data['Camera Image Sensor'].unique()

array([' CMOS', ' BSI CMOS', 'No'], dtype=object)
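
Most of the values above carry a leading space (`' CMOS'` versus `'No'`), which can split one category into two during encoding. One way to normalise this, sketched on a toy frame rather than the real `obj_data`:

```python
import pandas as pd

# Toy frame with the same leading-space pattern seen in the output above
toy = pd.DataFrame({
    "Camera Image Sensor": [" CMOS", " BSI CMOS", "No", None],
    "Camera Placement": [" Rear", " Rear", " Slide-out", " Rotatable"],
    "Mass": [178.0, 152.0, 160.0, 140.0],
})

# Strip surrounding whitespace from every non-numeric column;
# missing values are left untouched
text_cols = toy.select_dtypes(exclude="number").columns
for col in text_cols:
    toy[col] = toy[col].str.strip()

print(toy["Camera Image Sensor"].unique())
```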

In [36]:
obj_data['General Extras'].unique()

array([' Haptic touch feedback',
       ' Haptic touch feedback , Tactile touch feedback',
       ' Haptic touch feedback , Active stylus',
       ' Haptic touch feedback , Passive stylus',
       ' Haptic touch feedback , Foldable screen',
       ' Haptic touch feedback , Tactile touch feedback , Display speaker',
       ' Haptic touch feedback , Display speaker',
       ' Haptic touch feedback , Built-in projector',
       ' Haptic touch feedback , Tactile touch feedback , Active stylus',
       ' Rotatable screen , Haptic touch feedback',
       ' Haptic touch feedback , Tactile touch feedback , Foldable screen',
       ' Haptic touch feedback , Active stylus , Foldable screen'],
      dtype=object)
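
Each `General Extras` value is a comma-separated combination of a few underlying features, so treating every combination as its own category would hide the overlap. A possible alternative is one indicator column per feature, sketched here with `Series.str.get_dummies` on three of the values listed above:

```python
import pandas as pd

# Three of the combinations shown in the output above
extras = pd.Series([
    " Haptic touch feedback",
    " Haptic touch feedback , Active stylus",
    " Rotatable screen , Haptic touch feedback",
], name="General Extras")

# Normalise ' , ' separators to ',' and trim the ends, then
# expand into one 0/1 column per individual feature
cleaned = extras.str.replace(r"\s*,\s*", ",", regex=True).str.strip()
flags = cleaned.str.get_dummies(sep=",")

print(flags)
```

The same approach would probably suit other multi-valued columns such as `Expansion Interfaces` and `Camera Extra Functions`.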

In [35]:
obj_data['Camera Placement'].unique()

array([' Rear', ' Slide-out', ' Rotatable'], dtype=object)

In [33]:
obj_data['Scratch Resistant Screen'].unique()

array([' Yes', ' Arc Glass', ' Gorilla Glass 4', ' Gorilla Glass',
       ' Gorilla Glass 5', 'No', ' Gorilla Glass 3', ' DragonTrail',
       ' Sapphire Glass', ' Gorilla Glass 6', ' DragonTrail X',
       ' Gorilla Glass 2', ' Gorilla Glass Victus', ' Ceramic shield',
       ' Gorilla Glass Victus+', ' Gorilla Glass Victus 2'], dtype=object)

In [32]:
obj_data['Operating System'].unique()

array([' Google Android 7.1.1 (Nougat)', ' Google Android 7.1.2 (Nougat)',
       ' Google Android 5.1 (Lollipop)',
       ' Google Android 5.1.1 (Lollipop)', ' Google Android 8.0 (Oreo)',
       ' Google Android 7.0 (Nougat)',
       ' Google Android 6.0 (Marshmallow)',
       ' Google Android 5.0.1 (Lollipop)',
       ' Google Android 8.1 Oreo Go edition (Oreo)',
       ' Google Android 6.0.1 (Marshmallow)',
       ' Google Android 8.1 (Oreo)', ' Google Android 4.4.2 (KitKat)',
       ' Google Android 7.1 (Nougat)', ' Mozilla Firefox OS 2.0',
       ' Google Android 9.0 (Pie)', ' Alibaba YunOS 5.1.1',
       ' Microsoft Windows Embedded Handheld 6.5 Professional',
       ' Apple iOS 12', ' Google Android 5.0.2 (Lollipop)',
       ' Google Android 10 (Q)', ' Google Android 4.1 (Jelly Bean)',
       ' Google Android 2.1 (Eclair)', ' Google Android 4.4.4 (KitKat)',
       ' Google Android 4.2 (Jelly Bean)',
       ' Google Android 4.2.2 (Jelly Bean)',
       ' Google Android 9 Pie Go ed

In [31]:
obj_data['Platform'].unique()

array([' Android', ' Linux', ' Windows (mobile-class)', ' iOS / iPadOS'],
      dtype=object)

In [30]:
obj_data['Manufacturer'].unique()

array(['Sony', 'LG Electronics', 'BBK Electronics', 'TCL',
       'Samsung Electronics', 'FIH Precision Electronics', 'HTC', 'ZTE',
       'ASUSTeK Computer', 'Hisense', 'Meizu', 'Compal Electronics',
       'Blu Products', 'Shenzhen TINNO Mobile Technology', 'Huawei',
       'TP-Link Technologies', 'Lenovo', 'Foxconn', 'Sonim Technologies',
       'Matsunichi Digital Development', 'Yulong Computer',
       'Kyocera Communications', 'Sharp', 'OTEDA Industrial',
       'GiONEE Communications Equipment', 'Meitu Mobile', 'Xiaomi',
       'Shenzen Gotron Electronic Co. Ltd.',
       'Shenzhen Konka Telecommunications Techno', 'Fujitsu',
       'Multilaser', 'Shenzen Koobee Communication Co.,Ltd.',
       'Infinix Mobility', 'Abacus Electric',
       'Zebra Technologies Corporation',
       'Shenzhen uCloudlink Network Technology',
       'Shenzen Huafurui Technology Co. Ltd.', 'Intex Mobile',
       'InFocus Corporation', 'Optiemus Electronics Ltd.',
       'Shenzhen KVD Communication Equi

In [20]:
obj_data['Camera Placement'].value_counts()

 Rear         8238
 Rotatable      27
 Slide-out       7
Name: Camera Placement, dtype: int64

In [16]:
numerical_data.head(3)

Unnamed: 0,Width,Height,Depth,Mass,CPU Clock,Display Diagonal,Pixel Density,Display Refresh Rate,Nominal Battery Capacity,Price,Memory Capacity,Cam1_mp,Cam2_mp
0,78.0,150.0,9.8,178.0,1450.0,139.7,267.0,,3300.0,,32.0,12.8,7.7
1,78.0,150.0,9.8,178.0,1450.0,139.7,267.0,,3300.0,,32.0,12.8,7.7
2,73.2,146.3,8.2,152.0,1300.0,127.0,294.0,,2500.0,,16.0,8.0,4.9


In [23]:
# Fill missing values in each object column with that column's mode
for col in obj_data.columns:
    if obj_data[col].isnull().any():
        mode = obj_data[col].mode()[0]  # Most frequent value in the column
        obj_data[col] = obj_data[col].fillna(mode)


In [24]:
# Fill missing values in each numerical column with that column's median
for col in numerical_data.columns:
    if numerical_data[col].isnull().any():
        median = numerical_data[col].median()  # Median of the column
        numerical_data[col] = numerical_data[col].fillna(median)

In [25]:
obj_data.isnull().sum()

Brand                            0
Model                            0
Released                         0
Announced                        0
Hardware Designer                0
Manufacturer                     0
General Extras                   0
Dimensions                       0
Platform                         0
Operating System                 0
CPU                              0
RAM Type                         0
RAM Capacity (converted)         0
Non-volatile Memory Interface    0
Resolution                       0
Display Area Utilization         0
Display Type                     0
Scratch Resistant Screen         0
Graphical Controller             0
GPU Clock:                       0
Expansion Interfaces             0
USB Services                     0
USB Connector                    0
Max. Charging Power              0
Bluetooth                        0
Camera Placement                 0
Camera Image Sensor              0
Image Sensor Pixel Size          0
Aperture (W)        

In [26]:
numerical_data.isnull().sum()

Width                       0
Height                      0
Depth                       0
Mass                        0
CPU Clock                   0
Display Diagonal            0
Pixel Density               0
Display Refresh Rate        0
Nominal Battery Capacity    0
Price                       0
Memory Capacity             0
Cam1_mp                     0
Cam2_mp                     0
dtype: int64
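
Note that the loop above also fills `Price` with its median, and `Price` is the value this project sets out to predict (it is `NaN` in all three preview rows). One alternative worth considering, sketched on toy data and not taken from the original notebook, is to keep only rows with an observed price and impute just the feature columns:

```python
import numpy as np
import pandas as pd

# Toy numeric frame; values are illustrative only
toy = pd.DataFrame({
    "Mass": [178.0, np.nan, 152.0, 160.0],
    "Memory Capacity": [32.0, 16.0, np.nan, 64.0],
    "Price": [199.0, np.nan, 149.0, np.nan],
})

# Keep only rows where the target is actually observed
labelled = toy.dropna(subset=["Price"]).copy()

# Impute the remaining gaps in the features with each column's median
feature_cols = [c for c in labelled.columns if c != "Price"]
labelled[feature_cols] = labelled[feature_cols].fillna(
    labelled[feature_cols].median()
)

print(labelled)
```

Whether this is preferable depends on how many rows have a recorded price, which the truncated missing-value output above does not show.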