# Classification Take Home Challenge

Bob has started his own mobile company. He wants to compete with big companies like Apple and Samsung.

He does not know how to estimate the price of the mobiles his company creates. In this competitive mobile phone market you cannot simply assume things. To solve this problem, he collects sales data for mobile phones from various companies.

Bob wants to find a relation between the features of a mobile phone (e.g. RAM, internal memory) and its selling price. But he is not very good at machine learning, so he needs your help to solve this problem.

In this problem you do not have to predict the actual price, only a price range indicating how high the price is:

* 0 (low cost)
* 1 (medium cost)
* 2 (high cost)
* 3 (very high cost)

Your final notebook should contain the following:
1. Introduction
2. Exploratory Data Analysis (EDA)
3. Train-Test split
4. Training a Decision-Tree classifier
5. Making Predictions
6. Using the following metrics for analysis:

    * Confusion matrix
    * Accuracy_score
    * Precision_score
    * Recall_score
    * F1_score
7. Conclusion

### NOTE:

Explain your code and all the metrics you used in your notebook.

# Introduction

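Machine learning means teaching a machine to make predictions from data. The task here is supervised multi-class classification: given the features of a phone, predict which of the four price ranges (0 to 3) it falls in.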


# Importing libraries

In [28]:
# common libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics



# Importing data

In [3]:
data = pd.read_csv('data_class.csv')
data.head()

In [None]:
#display the column names
data.columns

In [None]:
#display the unique values of price range
data.price_range.unique()

In [None]:
# See which columns have null values
data.isnull().sum()
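Two further checks are often useful at this stage: how many phones fall in each price range, and how strongly each numeric feature moves with the label. The sketch below is only an illustration. `demo` is a hypothetical stand-in built from a few values visible in the `head()` output of this notebook, not the full `data_class.csv`, and linear correlation with an ordinal label is a rough screening tool rather than a definitive ranking.

```python
import pandas as pd

# Hypothetical stand-in for `data`: five rows taken from the head() output
demo = pd.DataFrame({
    "ram": [2549, 2631, 2603, 2769, 1411],
    "battery_power": [842, 1021, 563, 615, 1821],
    "price_range": [1, 2, 2, 2, 1],
})

# Number of phones per price range, ordered by class label
class_counts = demo["price_range"].value_counts().sort_index()
print(class_counts)

# Pearson correlation of every feature with the label
corr_with_target = demo.corr()["price_range"].drop("price_range")
print(corr_with_target.sort_values(ascending=False))
```

On the real dataset the same two calls would be applied to `data` directly.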

# Machine Learning Model

Copy the dataset

In [5]:
# Create a new dataset from the old one (copy() keeps the original data untouched)
mldata = data.copy()
mldata.head()

Unnamed: 0,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,n_cores,...,px_height,px_width,ram,sc_h,sc_w,talk_time,three_g,touch_screen,wifi,price_range
0,842,0,2.2,0,1,0,7,0.6,188,2,...,20,756,2549,9,7,19,0,0,1,1
1,1021,1,0.5,1,0,1,53,0.7,136,3,...,905,1988,2631,17,3,7,1,1,0,2
2,563,1,0.5,1,2,1,41,0.9,145,5,...,1263,1716,2603,11,2,9,1,1,0,2
3,615,1,2.5,0,0,0,10,0.8,131,6,...,1216,1786,2769,16,8,11,1,0,0,2
4,1821,1,1.2,0,13,1,44,0.6,141,2,...,1208,1212,1411,8,2,15,1,1,0,1


Train-test split

In [7]:
# A decision tree classifier is used because the target, price_range, is categorical (0, 1, 2, 3).
# First, split the dataset into independent and dependent features.

# X are the input (or independent) variables
X = mldata.drop('price_range', axis = 1)
# Y is output (or dependent) variable
y = mldata['price_range']

In [14]:
# Split the data into training (80%) and test (20%) sets; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [15]:
print('Training data', len(X_train), len(y_train))
print('Test data', len(X_test), len(y_test))

Training data 1600 1600
Test data 400 400
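A plain random split does not guarantee that every price range is equally represented in the test set (the confusion matrix later in this notebook has row totals of 95, 92, 99 and 114). One possible refinement is to pass `stratify=y` to `train_test_split`, which keeps the class proportions the same in both parts. The sketch below uses hypothetical stand-in arrays, `X_demo` and `y_demo`, rather than the notebook's data.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 4 price ranges, 25 rows each
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.repeat([0, 1, 2, 3], 25)

# stratify=y_demo preserves the class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

train_counts = Counter(y_tr.tolist())
test_counts = Counter(y_te.tolist())
print("train:", sorted(train_counts.items()))
print("test: ", sorted(test_counts.items()))
```

With four equally sized classes, each class contributes the same number of rows to the test set.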


Fit and train the model

In [16]:
# Instantiate the model
decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2)

#Train the model
decision_tree.fit(X_train, y_train)

DecisionTreeClassifier(max_depth=2, random_state=0)
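`DecisionTreeClassifier` uses the Gini impurity as its default splitting criterion: at each node it looks for the feature and threshold that leave the two child nodes as pure as possible. The following sketch works through that calculation by hand. The `ram` and `price` lists are hypothetical toy values, and the helper functions are written for illustration only; they are not part of scikit-learn.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the two groups made by value <= threshold."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy data: RAM in MB and the matching price range
ram = [500, 900, 1400, 1900, 2600, 3000, 3500, 3900]
price = [0, 0, 1, 1, 2, 2, 3, 3]

impurity_before = gini(price)
impurity_after = split_impurity(ram, price, 2000)
print(impurity_before)  # 0.75 for four equally frequent classes
print(impurity_after)   # 0.5 after splitting at RAM <= 2000
```

A tree with `max_depth=2` can apply at most two such splits along any path, so it ends up with at most four leaves.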

In [19]:
# Using the test set to predict
# predict Price range

predictions = decision_tree.predict(X_test)

In [25]:
# Copy X_test so that adding result columns does not modify the test features
Results = X_test.copy()
Results['Actual_price'] = y_test
Results['Predicted_price'] = predictions

In [26]:
Results.head(10)

Unnamed: 0,battery_power,blue,clock_speed,dual_sim,fc,four_g,int_memory,m_dep,mobile_wt,n_cores,...,sc_h,sc_w,talk_time,three_g,touch_screen,wifi,Actual_class,Predicted_class,Actual_price,Predicted_price
405,1454,1,0.5,1,1,0,34,0.7,83,4,...,7,5,5,1,1,0,3,3,3,3
1190,1092,1,0.5,1,10,0,11,0.5,167,3,...,14,4,11,0,1,0,0,0,0,0
1132,1524,1,1.8,1,0,0,10,0.6,174,4,...,16,5,13,1,0,1,2,2,2,2
731,1807,1,2.1,0,2,0,49,0.8,125,1,...,17,13,13,0,1,1,2,1,2,1
1754,1086,1,1.7,1,0,1,43,0.2,111,6,...,11,5,17,1,1,0,2,3,2,3
1178,909,1,0.5,1,9,0,30,0.4,97,3,...,12,0,4,1,1,1,0,0,0,0
1533,642,1,0.5,0,0,1,38,0.8,86,5,...,9,2,2,1,1,0,0,0,0,0
1303,888,0,2.6,1,2,1,33,0.4,198,2,...,12,1,20,1,0,0,3,3,3,3
1857,914,1,0.7,0,1,1,60,0.9,198,5,...,14,8,5,1,0,0,3,3,3,3
18,1131,1,0.5,1,11,0,49,0.6,101,5,...,19,13,16,1,1,0,1,1,1,1


Confusion matrix analysis

In [29]:
conf_matrix = metrics.confusion_matrix(Results['Actual_price'], Results['Predicted_price'])
print("Confusion matrix \n\n {}".format(conf_matrix))

Confusion matrix 

 [[77 18  0  0]
 [ 6 71 15  0]
 [ 0 25 56 18]
 [ 0  0 24 90]]
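In scikit-learn's `confusion_matrix`, row i holds the phones whose actual class is i and column j holds the phones predicted as class j, so the diagonal counts correct predictions. The sketch below re-reads the matrix printed above in plain Python. The variable names are chosen for illustration.

```python
# Confusion matrix copied from the output above (rows: actual, columns: predicted)
conf_matrix = [
    [77, 18, 0, 0],
    [6, 71, 15, 0],
    [0, 25, 56, 18],
    [0, 0, 24, 90],
]
n = len(conf_matrix)

# Row totals: how many test phones actually belong to each class
support = [sum(row) for row in conf_matrix]
# Column totals: how many phones the model assigned to each class
predicted_totals = [sum(row[j] for row in conf_matrix) for j in range(n)]

correct = sum(conf_matrix[i][i] for i in range(n))
# Errors where the predicted range is exactly one step from the actual range
off_by_one = sum(
    conf_matrix[i][j] for i in range(n) for j in range(n) if abs(i - j) == 1
)
# Errors where the prediction is two or more steps away
far_off = sum(
    conf_matrix[i][j] for i in range(n) for j in range(n) if abs(i - j) > 1
)

print("support:", support)
print("predicted totals:", predicted_totals)
print("correct:", correct, "off by one:", off_by_one, "far off:", far_off)
```

Every misclassification lands in a neighbouring price range; the model never confuses, say, a low-cost phone with a high-cost one.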


Accuracy_score analysis

In [30]:
accu_score = metrics.accuracy_score(Results['Actual_price'], Results['Predicted_price']) 
print('Accuracy Score = {}'.format(accu_score))

Accuracy Score = 0.735
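Accuracy is the share of all test phones that were classified correctly, i.e. the sum of the confusion-matrix diagonal divided by the total count. As a hand check, using the matrix printed earlier:

```python
# Confusion matrix copied from the earlier output (rows: actual, columns: predicted)
conf_matrix = [
    [77, 18, 0, 0],
    [6, 71, 15, 0],
    [0, 25, 56, 18],
    [0, 0, 24, 90],
]

correct = sum(conf_matrix[i][i] for i in range(len(conf_matrix)))
total = sum(sum(row) for row in conf_matrix)

accuracy = correct / total
print(correct, total, accuracy)  # 294 400 0.735
```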


Precision_score analysis

In [38]:
precision = metrics.precision_score(Results['Actual_price'], Results['Predicted_price'], average = 'weighted')
print("Precision Score = {}".format(precision))

Precision Score = 0.7469716761783978
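Precision for a class answers the question "of the phones predicted as this class, how many really belong to it?", which is the diagonal entry divided by its column total. With `average='weighted'`, the per-class values are averaged using each class's number of actual samples as the weight. A hand calculation from the confusion matrix, with illustrative variable names:

```python
# Confusion matrix copied from the earlier output (rows: actual, columns: predicted)
conf_matrix = [
    [77, 18, 0, 0],
    [6, 71, 15, 0],
    [0, 25, 56, 18],
    [0, 0, 24, 90],
]
n = len(conf_matrix)

support = [sum(row) for row in conf_matrix]  # actual samples per class
predicted_totals = [sum(row[j] for row in conf_matrix) for j in range(n)]

# Per-class precision: true positives / everything predicted as that class
precision_per_class = [conf_matrix[k][k] / predicted_totals[k] for k in range(n)]

# Weighted average, using the number of actual samples as weights
weighted_precision = sum(
    support[k] * precision_per_class[k] for k in range(n)
) / sum(support)

print([round(p, 3) for p in precision_per_class])
print(weighted_precision)
```

Classes 1 and 2 have the lowest precision, which matches the heavier off-diagonal counts in their columns.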


Recall_score analysis

In [33]:
recall = metrics.recall_score(Results['Actual_price'], Results['Predicted_price'], average = 'weighted')
print('Recall Score = {}'.format(recall))

Recall Score = 0.735
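Recall for a class answers "of the phones that really belong to this class, how many did the model find?", which is the diagonal entry divided by its row total. When the per-class values are weighted by the number of actual samples, the row totals cancel, so weighted recall always equals accuracy. That is why both scores are 0.735 here. A short check from the confusion matrix:

```python
# Confusion matrix copied from the earlier output (rows: actual, columns: predicted)
conf_matrix = [
    [77, 18, 0, 0],
    [6, 71, 15, 0],
    [0, 25, 56, 18],
    [0, 0, 24, 90],
]
n = len(conf_matrix)

support = [sum(row) for row in conf_matrix]  # actual samples per class

# Per-class recall: true positives / all actual members of the class
recall_per_class = [conf_matrix[k][k] / support[k] for k in range(n)]

weighted_recall = sum(support[k] * recall_per_class[k] for k in range(n)) / sum(support)
accuracy = sum(conf_matrix[k][k] for k in range(n)) / sum(support)

print([round(r, 3) for r in recall_per_class])
print(weighted_recall, accuracy)
```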


F1_score analysis

In [34]:
f1 = metrics.f1_score(Results['Actual_price'], Results['Predicted_price'], average = 'weighted')
print("F1_Score = {}".format(f1))

F1_Score = 0.7379888964295014
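The F1 score is the harmonic mean of precision and recall, so it is high only when both are high. For a single class it simplifies to 2·TP / (row total + column total). The sketch below recomputes the weighted F1 from the confusion matrix, again with illustrative variable names:

```python
# Confusion matrix copied from the earlier output (rows: actual, columns: predicted)
conf_matrix = [
    [77, 18, 0, 0],
    [6, 71, 15, 0],
    [0, 25, 56, 18],
    [0, 0, 24, 90],
]
n = len(conf_matrix)

support = [sum(row) for row in conf_matrix]
predicted_totals = [sum(row[j] for row in conf_matrix) for j in range(n)]

f1_per_class = []
for k in range(n):
    precision = conf_matrix[k][k] / predicted_totals[k]
    recall = conf_matrix[k][k] / support[k]
    # Harmonic mean of precision and recall
    f1_per_class.append(2 * precision * recall / (precision + recall))

# Weighted average, using the number of actual samples as weights
weighted_f1 = sum(support[k] * f1_per_class[k] for k in range(n)) / sum(support)

print([round(f, 3) for f in f1_per_class])
print(weighted_f1)
```

The per-class values show that the two middle price ranges are the hardest for this shallow tree to separate.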


In [None]:
Sao Paulo Real Estate - Sale / Rent - April 2019

link: https://www.kaggle.com/argonalyst/sao-paulo-real-estate-sale-rent-april-2019
        
columns:
    - price - Final price advertised (R$ Brazilian Real)
    - condo - Condominium expenses (unknown values are marked as zero)
    - size - The property size in Square Meters m² (private areas only)
    - rooms - Number of bedrooms
    - toilets - Number of toilets (all toilets)
    - suites - Number of bedrooms with a private bathroom (en suite)
    - parking - Number of parking spots
    - elevator - Binary value: 1 if there is elevator in the building, 0 otherwise
    - furnished - Binary value: 1 if the property is furnished, 0 otherwise
    - swimming pool - Binary value: 1 if the property has swimming pool, 0 otherwise