# Devices Price Classification System using Python and Spring Boot

Dataset columns are as follows:
*  id - ID
*  battery_power - Total energy the battery can store at one time, measured in mAh
*  blue - Has Bluetooth or not
*  clock_speed - The speed at which the microprocessor executes instructions
*  dual_sim - Has dual SIM support or not
*  fc - Front Camera megapixels
*  four_g - Has 4G or not
*  int_memory - Internal Memory in Gigabytes
*  m_dep - Mobile Depth in cm
*  mobile_wt - Weight of mobile phone
*  n_cores - Number of cores of the processor
*  pc - Primary Camera megapixels
*  px_height - Pixel Resolution Height
*  px_width - Pixel Resolution Width
*  ram - Random Access Memory in Megabytes
*  sc_h - Screen Height of mobile in cm
*  sc_w - Screen Width of mobile in cm
*  talk_time - Longest time that a single battery charge will last when you are talking
*  three_g - Has 3G or not
*  touch_screen - Has touch screen or not
*  wifi - Has Wi-Fi or not
*  price_range - Target variable, with the following values:
    *  0 (low cost)
    *  1 (medium cost)
    *  2 (high cost)
    *  3 (very high cost)
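
The target encoding above can be written down as a small lookup table, which is handy when turning model output back into readable labels. This is a minimal sketch: `PRICE_LABELS` and `price_label` are hypothetical names and not part of the original notebook.

```python
# Hypothetical lookup for the price_range codes listed above.
PRICE_LABELS = {
    0: "low cost",
    1: "medium cost",
    2: "high cost",
    3: "very high cost",
}


def price_label(code):
    """Return the readable label for a price_range code."""
    try:
        return PRICE_LABELS[int(code)]
    except KeyError:
        # Reject anything outside the four documented classes.
        raise ValueError(f"unknown price_range code: {code!r}") from None
```

With a loaded DataFrame, `df_train["price_range"].map(PRICE_LABELS)` should give the same labels column-wise.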


In [1]:
# Import libraries
import pandas as pd
import numpy as np 
import seaborn as sb 
import matplotlib.pyplot as plt 


In [2]:
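# Load the raw training data
df_train = pd.read_csv("../data/raw/train - train.csv")
# Destination for the preprocessed DataFrame (written in the last cell)
EXPORT_PATH = "../data/processed/1_preprocessed_df.pkl"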


In [3]:
df_train.head(10)

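|  | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | ... | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi | price_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1.0 | 0.0 | 7.0 | 0.6 | 188.0 | 2.0 | ... | 20.0 | 756.0 | 2549.0 | 9.0 | 7.0 | 19 | 0 | 0 | 1 | 1 |
| 1 | 1021 | 1 | 0.5 | 1 | 0.0 | 1.0 | 53.0 | 0.7 | 136.0 | 3.0 | ... | 905.0 | 1988.0 | 2631.0 | 17.0 | 3.0 | 7 | 1 | 1 | 0 | 2 |
| 2 | 563 | 1 | 0.5 | 1 | 2.0 | 1.0 | 41.0 | 0.9 | 145.0 | 5.0 | ... | 1263.0 | 1716.0 | 2603.0 | 11.0 | 2.0 | 9 | 1 | 1 | 0 | 2 |
| 3 | 615 | 1 | 2.5 | 0 | 0.0 | 0.0 | 10.0 | 0.8 | 131.0 | 6.0 | ... | 1216.0 | 1786.0 | 2769.0 | 16.0 | 8.0 | 11 | 1 | 0 | 0 | 2 |
| 4 | 1821 | 1 | 1.2 | 0 | 13.0 | 1.0 | 44.0 | 0.6 | 141.0 | 2.0 | ... | 1208.0 | 1212.0 | 1411.0 | 8.0 | 2.0 | 15 | 1 | 1 | 0 | 1 |
| 5 | 1859 | 0 | 0.5 | 1 | 3.0 | 0.0 | 22.0 | 0.7 | 164.0 | 1.0 | ... | 1004.0 | 1654.0 | 1067.0 | 17.0 | 1.0 | 10 | 1 | 0 | 0 | 1 |
| 6 | 1821 | 0 | 1.7 | 0 | 4.0 | 1.0 | 10.0 | 0.8 | 139.0 | 8.0 | ... | 381.0 | 1018.0 | 3220.0 | 13.0 | 8.0 | 18 | 1 | 0 | 1 | 3 |
| 7 | 1954 | 0 | 0.5 | 1 | 0.0 | 0.0 | 24.0 | 0.8 | 187.0 | 4.0 | ... | 512.0 | 1149.0 | 700.0 | 16.0 | 3.0 | 5 | 1 | 1 | 1 | 0 |
| 8 | 1445 | 1 | 0.5 | 0 | 0.0 | 0.0 | 53.0 | 0.7 | 174.0 | 7.0 | ... | 386.0 | 836.0 | 1099.0 | 17.0 | 1.0 | 20 | 1 | 0 | 0 | 0 |
| 9 | 509 | 1 | 0.6 | 1 | 2.0 | 1.0 | 9.0 | 0.1 | 93.0 | 5.0 | ... | 1137.0 | 1224.0 | 513.0 | 19.0 | 10.0 | 12 | 1 | 0 | 0 | 0 |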


In [4]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   battery_power  2000 non-null   int64  
 1   blue           2000 non-null   int64  
 2   clock_speed    2000 non-null   float64
 3   dual_sim       2000 non-null   int64  
 4   fc             1995 non-null   float64
 5   four_g         1995 non-null   float64
 6   int_memory     1995 non-null   float64
 7   m_dep          1995 non-null   float64
 8   mobile_wt      1996 non-null   float64
 9   n_cores        1996 non-null   float64
 10  pc             1995 non-null   float64
 11  px_height      1996 non-null   float64
 12  px_width       1998 non-null   float64
 13  ram            1998 non-null   float64
 14  sc_h           1999 non-null   float64
 15  sc_w           1999 non-null   float64
 16  talk_time      2000 non-null   int64  
 17  three_g        2000 non-null   int64  
 18  touch_sc

In [5]:
df_train.isnull().sum()

battery_power    0
blue             0
clock_speed      0
dual_sim         0
fc               5
four_g           5
int_memory       5
m_dep            5
mobile_wt        4
n_cores          4
pc               5
px_height        4
px_width         2
ram              2
sc_h             1
sc_w             1
talk_time        0
three_g          0
touch_screen     0
wifi             0
price_range      0
dtype: int64
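
The raw counts are easier to judge as a share of the rows: the largest count above, 5 out of 2000, is 0.25%. The sketch below builds such a summary. `missing_report` is a hypothetical helper, and the toy frame stands in for `df_train` with made-up values.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Return count and percentage of missing values for columns that have any."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with gaps, worst first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for df_train (values are made up for illustration).
demo = pd.DataFrame({
    "fc": [1.0, np.nan, 2.0, np.nan],
    "ram": [2549.0, 2631.0, np.nan, 2769.0],
    "wifi": [1, 0, 0, 0],
})
report = missing_report(demo)
print(report)
```

On the real data the call would be `missing_report(df_train)`.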

In [6]:
# Fill every missing value with the mean of its column
df_train.fillna(df_train.mean(), inplace=True)
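
Filling with the column mean also applies to `four_g`, a 0/1 flag, so its previously missing rows end up with a fractional value. One possible alternative, shown here only as a sketch, is to fill flag columns with their most frequent value and keep the mean for the rest. `impute_by_type` and `BINARY_COLS` are hypothetical names, the choice of the mode is an assumption rather than the notebook's method, and the toy frame uses made-up values.

```python
import numpy as np
import pandas as pd

# Assumption: among the columns with missing values, only four_g is a 0/1 flag.
BINARY_COLS = ["four_g"]


def impute_by_type(df, binary_cols=BINARY_COLS):
    """Fill flags with the mode and other numeric columns with the mean."""
    out = df.copy()
    for col in out.columns[out.isnull().any()]:
        if col in binary_cols:
            # Most frequent value keeps the column strictly 0/1.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
        else:
            out[col] = out[col].fillna(out[col].mean())
    return out


# Toy frame standing in for df_train (values are made up for illustration).
demo = pd.DataFrame({
    "four_g": [1.0, 0.0, 1.0, np.nan],
    "ram": [1000.0, 2000.0, np.nan, 3000.0],
})
filled = impute_by_type(demo)
print(filled)
```

Unlike the in-place call above, this version returns a new DataFrame and leaves its input unchanged.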

In [7]:
df_train.isnull().sum()

battery_power    0
blue             0
clock_speed      0
dual_sim         0
fc               0
four_g           0
int_memory       0
m_dep            0
mobile_wt        0
n_cores          0
pc               0
px_height        0
px_width         0
ram              0
sc_h             0
sc_w             0
talk_time        0
three_g          0
touch_screen     0
wifi             0
price_range      0
dtype: int64

In [8]:
# Export Data
df_train.to_pickle(EXPORT_PATH)
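
A quick way to check the export is to read the pickle back and compare it with the frame that was written. The sketch below does this with a toy frame and a temporary directory; the file name `demo.pkl` is hypothetical and stands in for `EXPORT_PATH`.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the preprocessed df_train.
demo = pd.DataFrame({"battery_power": [842, 1021], "price_range": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.pkl")  # hypothetical path
    demo.to_pickle(path)
    restored = pd.read_pickle(path)

# equals() compares values, dtypes, and labels.
same = restored.equals(demo)
print(same)
```

As far as I know, `to_pickle` does not create missing parent directories, so `../data/processed/` should exist before the export cell runs.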