<a href="https://colab.research.google.com/github/ady909/Mobile_Price_Range_Prediction/blob/main/Mobile_Price_Range_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Importing Libraries**

In [1]:
# Import Libraries

import numpy as np
import pandas as pd

# Import Visualization libraries

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Import Warnings

import warnings
warnings.filterwarnings('ignore')

# Import model selection libraries

from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler


# Importing Models

from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Importing Metric Evaluation Libraries

from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    ConfusionMatrixDisplay, roc_curve, roc_auc_score, auc,
)

# **Dataset Loading**

In [2]:
from google.colab import files

# Upload the .csv file
uploaded = files.upload()

Saving data_mobile_price_range.csv to data_mobile_price_range.csv


In [3]:
mobile_df = pd.read_csv("data_mobile_price_range.csv")

## **Dataset First View**

In [4]:
# Dataset First Look

mobile_df.head()

,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,n_cores,...,px_height,px_width,ram,sc_h,sc_w,talk_time,three_g,touch_screen,wifi,price_range
0,842,0,2.2,0,1,0,7,0.6,188,2,...,20,756,2549,9,7,19,0,0,1,1
1,1021,1,0.5,1,0,1,53,0.7,136,3,...,905,1988,2631,17,3,7,1,1,0,2
2,563,1,0.5,1,2,1,41,0.9,145,5,...,1263,1716,2603,11,2,9,1,1,0,2
3,615,1,2.5,0,0,0,10,0.8,131,6,...,1216,1786,2769,16,8,11,1,0,0,2
4,1821,1,1.2,0,13,1,44,0.6,141,2,...,1208,1212,1411,8,2,15,1,1,0,1


## **Dataset Rows & Columns count**

In [5]:
# Dataset Rows & Columns count

mobile_df.shape

(2000, 21)

In [6]:
print(f"Row Count :{mobile_df.shape[0]}\nColumn Count:{mobile_df.shape[1]}")

Row Count :2000
Column Count:21


There are 2,000 data points in total, with 20 independent variables and 1 target variable.
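
The split between independent variables and target can be made explicit in code. The following is a minimal sketch, assuming `price_range` (the last column in the preview above) is the target. It uses a three-row, three-column stand-in built from the preview rather than the full `mobile_df`, and the names `toy_df`, `X`, and `y` are illustrative, not part of the notebook.

```python
import pandas as pd

# Stand-in for mobile_df: first three preview rows, three of the 21 columns.
toy_df = pd.DataFrame({
    "battery_power": [842, 1021, 563],
    "ram": [2549, 2631, 2603],
    "price_range": [1, 2, 2],
})

# Independent variables: every column except the assumed target.
X = toy_df.drop(columns="price_range")

# Target variable.
y = toy_df["price_range"]

print(X.shape, y.shape)
```

On the full dataset the same two lines would yield a feature frame with 20 columns and a target series of length 2,000.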

## **Dataset Information**

In [7]:
# Dataset Info

mobile_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   battery_power  2000 non-null   int64  
 1   blue           2000 non-null   int64  
 2   clock_speed    2000 non-null   float64
 3   dual_sim       2000 non-null   int64  
 4   fc             2000 non-null   int64  
 5   four_g         2000 non-null   int64  
 6   int_memory     2000 non-null   int64  
 7   m_dep          2000 non-null   float64
 8   mobile_wt      2000 non-null   int64  
 9   n_cores        2000 non-null   int64  
 10  pc             2000 non-null   int64  
 11  px_height      2000 non-null   int64  
 12  px_width       2000 non-null   int64  
 13  ram            2000 non-null   int64  
 14  sc_h           2000 non-null   int64  
 15  sc_w           2000 non-null   int64  
 16  talk_time      2000 non-null   int64  
 17  three_g        2000 non-null   int64  
 18  touch_sc

## **Duplicate Value Check**

In [8]:
# Dataset Duplicate Value Count
duplicated_count = mobile_df.duplicated().sum()
duplicated_count

0

**The dataset doesn't contain any duplicate rows.**
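
For reference, `duplicated()` compares entire rows and flags only the second and later occurrences. The sketch below shows what the check and a possible clean-up would look like if duplicates were present. The frame is a hypothetical stand-in with one repeated row, and `n_dupes` and `deduped` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in: the third row repeats the first.
toy_df = pd.DataFrame({
    "battery_power": [842, 1021, 842],
    "ram": [2549, 2631, 2549],
})

# duplicated() marks a row True when an identical row appeared earlier.
n_dupes = int(toy_df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each distinct row.
deduped = toy_df.drop_duplicates().reset_index(drop=True)

print(n_dupes, len(deduped))
```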


## **Missing Value / Null Value Check**

In [9]:
# Missing Values/Null Values Count

missing_values_count = mobile_df.isnull().sum()
missing_values_count

battery_power    0
blue             0
clock_speed      0
dual_sim         0
fc               0
four_g           0
int_memory       0
m_dep            0
mobile_wt        0
n_cores          0
pc               0
px_height        0
px_width         0
ram              0
sc_h             0
sc_w             0
talk_time        0
three_g          0
touch_screen     0
wifi             0
price_range      0
dtype: int64

**There are no null values in the dataset.**

**So the dataset consists of 2,000 data points across 21 columns, with no null values or duplicated records.**
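
The shape, null, and duplicate checks can also be gathered into one summary. This is a sketch only: `quality_report` is a hypothetical helper, not something the notebook defines, and the frame below is a stand-in with one missing cell and one repeated row so that each count is non-trivial.

```python
import numpy as np
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarise shape, missing cells, and duplicated rows of a frame."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_cells": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical stand-in: row 2 repeats row 0, and row 1 has a missing value.
toy_df = pd.DataFrame({
    "battery_power": [842, 1021, 842],
    "ram": [2549.0, np.nan, 2549.0],
})

report = quality_report(toy_df)
print(report)
```

Applied to `mobile_df`, such a helper would be expected to report zero missing cells and zero duplicate rows, matching the checks above.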

## **Understanding Your Variables**

In [11]:
# Dataset Describe
mobile_df.describe().T

,count,mean,std,min,25%,50%,75%,max
battery_power,2000.0,1238.5185,439.418206,501.0,851.75,1226.0,1615.25,1998.0
blue,2000.0,0.495,0.5001,0.0,0.0,0.0,1.0,1.0
clock_speed,2000.0,1.52225,0.816004,0.5,0.7,1.5,2.2,3.0
dual_sim,2000.0,0.5095,0.500035,0.0,0.0,1.0,1.0,1.0
fc,2000.0,4.3095,4.341444,0.0,1.0,3.0,7.0,19.0
four_g,2000.0,0.5215,0.499662,0.0,0.0,1.0,1.0,1.0
int_memory,2000.0,32.0465,18.145715,2.0,16.0,32.0,48.0,64.0
m_dep,2000.0,0.50175,0.288416,0.1,0.2,0.5,0.8,1.0
mobile_wt,2000.0,140.249,35.399655,80.0,109.0,141.0,170.0,200.0
n_cores,2000.0,4.5205,2.287837,1.0,3.0,4.0,7.0,8.0


Minimum values of 0 in attributes such as `fc` (front camera), `pc` (primary camera), `sc_w` (screen width), and `px_height` (pixel height) are inconsistent with typical mobile device specifications and suggest a data discrepancy. These anomalies need to be handled so that the data reflects realistic device characteristics.
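
One way to inspect and treat such zeros is sketched below. The treatment shown (marking zeros as missing, then filling with the column median) is an assumption chosen for illustration, not necessarily what this notebook does later. The frame is a stand-in with zeros placed by hand, and `suspect_cols`, `zero_counts`, and `cleaned` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with one zero injected into each column.
toy_df = pd.DataFrame({
    "sc_w": [7, 3, 0, 8, 2],
    "px_height": [20, 905, 1263, 0, 1208],
})
suspect_cols = ["sc_w", "px_height"]

# Count how many zeros each suspect column contains.
zero_counts = (toy_df[suspect_cols] == 0).sum()

# Possible treatment (an assumption): treat zeros as missing values,
# then fill each gap with the median of the remaining values in that column.
cleaned = toy_df.copy()
cleaned[suspect_cols] = cleaned[suspect_cols].replace(0, np.nan)
cleaned[suspect_cols] = cleaned[suspect_cols].fillna(cleaned[suspect_cols].median())

print(zero_counts)
print(cleaned)
```

Computing the median after the zeros have been set to `NaN` keeps the invalid values from pulling the fill value down.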

In [10]:
# Dataset Columns
mobile_df.columns

Index(['battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc', 'four_g',
       'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc', 'px_height',
       'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g',
       'touch_screen', 'wifi', 'price_range'],
      dtype='object')

In [12]:
# Numerical Columns

Numerical_columns = mobile_df.describe().columns
Numerical_columns

Index(['battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc', 'four_g',
       'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc', 'px_height',
       'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g',
       'touch_screen', 'wifi', 'price_range'],
      dtype='object')

In [13]:
# Categorical Columns

Categorical_columns = mobile_df.select_dtypes(include=['object','category']).columns
Categorical_columns

Index([], dtype='object')
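
An empty result here only means that no column has an `object` or `category` dtype; integer-coded flags such as `blue` still count as numeric. The sketch below shows the same dtype-based split with `select_dtypes` on a stand-in frame. The `brand` column is hypothetical and does not exist in this dataset; it is included only so the categorical side is non-empty.

```python
import pandas as pd

# Stand-in frame: two real column names plus a hypothetical categorical one.
toy_df = pd.DataFrame({
    "clock_speed": [2.2, 0.5, 0.5],
    "blue": [0, 1, 1],
    "brand": pd.Categorical(["a", "b", "a"]),  # hypothetical, not in the dataset
})

# Integer and float columns are both selected by include="number".
numeric_cols = toy_df.select_dtypes(include="number").columns.tolist()

# Text-like or explicitly categorical columns.
categorical_cols = toy_df.select_dtypes(include=["object", "category"]).columns.tolist()

print(numeric_cols, categorical_cols)
```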

**Unique Value Check**

In [14]:
# Check Unique Values for each variable.
mobile_df.nunique()

battery_power    1094
blue                2
clock_speed        26
dual_sim            2
fc                 20
four_g              2
int_memory         63
m_dep              10
mobile_wt         121
n_cores             8
pc                 21
px_height        1137
px_width         1109
ram              1562
sc_h               15
sc_w               19
talk_time          19
three_g             2
touch_screen        2
wifi                2
price_range         4
dtype: int64

`price_range`, our target variable, has 4 unique values, which makes this a multiclass classification problem.
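
For a multiclass target, a natural follow-up is to check how the rows are distributed across the classes. The sketch below uses a hypothetical eight-row stand-in (the `ram` values and class labels are made up for illustration), so the counts say nothing about the real dataset. `class_counts` and `class_share` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in: two rows per price class.
toy_df = pd.DataFrame({
    "ram": [300, 500, 1200, 1400, 2300, 2500, 3600, 3900],
    "price_range": [0, 0, 1, 1, 2, 2, 3, 3],
})

# Absolute number of rows per class, ordered by class label.
class_counts = toy_df["price_range"].value_counts().sort_index()

# Share of rows per class, useful for spotting imbalance.
class_share = toy_df["price_range"].value_counts(normalize=True).sort_index()

print(class_counts)
print(class_share)
```

If the shares turn out to be uneven on the real data, passing `stratify=y` to `train_test_split` is one common way to preserve the class proportions in the train and test sets.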