# Introduction to Deep Learning

## Objectives
In this lab, you will build ANN and DNN models that predict how much a potential customer will spend on a car, based on characteristics such as age, annual salary, credit card debt, and net worth. As a vehicle salesperson, your goal is a model that estimates each customer's overall spending potential.

Your task is to build and train the ANN and DNN models using TensorFlow in a Jupyter notebook.

Feel free to explore the dataset, analyze its contents, and derive meaningful insights. You are also encouraged to create visualizations that improve the understanding of the data.

# Step 1: Import Libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn import preprocessing 
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score
import time
import warnings
warnings.filterwarnings('ignore')

# Step 2: Load and Explore the Data

In [3]:
df = pd.read_csv('car_purchasing.csv', encoding='ISO-8859-1')
df

Unnamed: 0,customer name,customer e-mail,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,Martina Avila,cubilia.Curae.Phasellus@quisaccumsanconvallis.edu,Bulgaria,0,41.851720,62812.09301,11609.380910,238961.2505,35321.45877
1,Harlan Barnes,eu.dolor@diam.co.uk,Belize,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,Naomi Rodriquez,vulputate.mauris.sagittis@ametconsectetueradip...,Algeria,1,43.152897,53798.55112,11160.355060,638467.1773,42925.70921
3,Jade Cunningham,malesuada@dignissim.com,Cook Islands,1,58.271369,79370.03798,14426.164850,548599.0524,67422.36313
4,Cedric Leach,felis.ullamcorper.viverra@egetmollislectus.net,Brazil,1,57.313749,59729.15130,5358.712177,560304.0671,55915.46248
...,...,...,...,...,...,...,...,...,...
495,Walter,ligula@Cumsociis.ca,Nepal,0,41.462515,71942.40291,6995.902524,541670.1016,48901.44342
496,Vanna,Cum.sociis.natoque@Sedmolestie.edu,Zimbabwe,1,37.642000,56039.49793,12301.456790,360419.0988,31491.41457
497,Pearl,penatibus.et@massanonante.com,Philippines,1,53.943497,68888.77805,10611.606860,764531.3203,64147.28888
498,Nell,Quisque.varius@arcuVivamussit.net,Botswana,1,59.160509,49811.99062,14013.034510,337826.6382,45442.15353


In [4]:
df.describe()

Unnamed: 0,gender,age,annual Salary,credit card debt,net worth,car purchase amount
count,500.0,500.0,500.0,500.0,500.0,500.0
mean,0.506,46.241674,62127.239608,9607.645049,431475.713625,44209.799218
std,0.500465,7.978862,11703.378228,3489.187973,173536.75634,10773.178744
min,0.0,20.0,20000.0,100.0,20000.0,9000.0
25%,0.0,40.949969,54391.977195,7397.515792,299824.1959,37629.89604
50%,1.0,46.049901,62915.497035,9655.035568,426750.12065,43997.78339
75%,1.0,51.612263,70117.862005,11798.867487,557324.478725,51254.709517
max,1.0,70.0,100000.0,20000.0,1000000.0,80000.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   customer name        500 non-null    object 
 1   customer e-mail      500 non-null    object 
 2   country              500 non-null    object 
 3   gender               500 non-null    int64  
 4   age                  500 non-null    float64
 5   annual Salary        500 non-null    float64
 6   credit card debt     500 non-null    float64
 7   net worth            500 non-null    float64
 8   car purchase amount  500 non-null    float64
dtypes: float64(5), int64(1), object(3)
memory usage: 35.3+ KB


In [6]:
df.isnull().sum()

customer name          0
customer e-mail        0
country                0
gender                 0
age                    0
annual Salary          0
credit card debt       0
net worth              0
car purchase amount    0
dtype: int64

# Step 3: Data Cleaning and Preprocessing


**Hint: You could use a `StandardScaler()` or `MinMaxScaler()`**

In [7]:
df = df.drop(['customer name','customer e-mail'],axis=1)

In [8]:
df.columns

Index(['country', 'gender', 'age', 'annual Salary', 'credit card debt',
       'net worth', 'car purchase amount'],
      dtype='object')

In [9]:
df['age']=df['age'].astype('int')

In [10]:
df.dtypes

country                 object
gender                   int64
age                      int32
annual Salary          float64
credit card debt       float64
net worth              float64
car purchase amount    float64
dtype: object

In [11]:
# Encode country names as integer codes (assigned alphabetically; the order carries no meaning)
label_encoder = preprocessing.LabelEncoder()
df['country'] = label_encoder.fit_transform(df['country'])

In [12]:
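scaler = MinMaxScaler()
# Note: the scaled array is only displayed here, not assigned back to df,
# so the following steps work with the unscaled values.
scaler.fit_transform(df)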

array([[0.12857143, 0.        , 0.42      , ..., 0.57836085, 0.22342985,
        0.37072477],
       [0.08095238, 0.        , 0.4       , ..., 0.476028  , 0.52140195,
        0.50866938],
       [0.0047619 , 1.        , 0.46      , ..., 0.55579674, 0.63108896,
        0.47782689],
       ...,
       [0.68571429, 1.        , 0.66      , ..., 0.52822145, 0.75972584,
        0.77672238],
       [0.11428571, 1.        , 0.78      , ..., 0.69914746, 0.3243129 ,
        0.51326977],
       [0.9952381 , 1.        , 0.52      , ..., 0.46690159, 0.45198622,
        0.50855247]])
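
The hint above suggests `MinMaxScaler()`. One common way to apply it is to split first, fit the scaler on the training features only, and reuse the fitted scaler on the test features. The sketch below shows this on synthetic data; `X_demo`, `y_demo`, and `feature_scaler` are hypothetical names, and the random columns only stand in for the real features. It is an illustration under those assumptions, not the preprocessing used for the results in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for three numeric features (hypothetical, not the real CSV).
X_demo = np.column_stack([
    rng.uniform(20, 70, 500),            # age-like column
    rng.uniform(20_000, 100_000, 500),   # salary-like column
    rng.uniform(20_000, 1_000_000, 500), # net-worth-like column
])
y_demo = rng.uniform(9_000, 80_000, 500)  # target stays in its original units

# Split before scaling so the test rows do not influence the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

feature_scaler = MinMaxScaler()
X_tr_scaled = feature_scaler.fit_transform(X_tr)  # learn min/max on training rows
X_te_scaled = feature_scaler.transform(X_te)      # reuse the same min/max

print(X_tr_scaled.min(axis=0), X_tr_scaled.max(axis=0))
```

Test values can fall slightly outside the 0 to 1 range, because the minimum and maximum come from the training rows only.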

# Step 4: Train Test Split

In [13]:
X = df.drop('car purchase amount', axis=1)
Y = df['car purchase amount']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Step 5: Build the Artificial Neural Network Model

In [14]:
model = Sequential()
model.add(Dense(units=32,activation='relu',input_shape=(6,)))
model.add(Dense(units=1,activation='linear'))

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['mae']  # accuracy is a classification metric; MAE suits regression
             )






### Explain and Justify Your Artificial Neural Network (ANN) Model, Optimizer, and Loss Function Choices

I chose a linear activation for the output layer because this is a regression problem, not a classification problem: the model must output a continuous value. The hidden layer uses ReLU to introduce non-linearity. For the optimizer, I chose Adam because it is computationally efficient and has low memory requirements. I used MSE as the loss function because it is a standard loss for regression and penalizes large errors more heavily than small ones.
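
To illustrate why distance-based measures fit a continuous target better than a classification metric, the sketch below compares an exact-match accuracy with MAE and MSE on three made-up purchase amounts. The values and the names `y_true` and `y_hat` are hypothetical, and the exact-match accuracy is a simplified stand-in for a classification metric, not a reproduction of any Keras metric.

```python
import numpy as np

# Made-up targets and predictions, in the same units as the purchase amount.
y_true = np.array([35000.0, 45000.0, 55000.0])
y_hat = np.array([35100.0, 44000.0, 58000.0])

# Fraction of predictions that hit the target exactly.
exact_match_accuracy = float(np.mean(y_true == y_hat))

# Distance-based measures: average absolute error and average squared error.
mae = float(np.mean(np.abs(y_true - y_hat)))
mse = float(np.mean((y_true - y_hat) ** 2))

print('Exact-match accuracy: {:.2f}'.format(exact_match_accuracy))
print('MAE: {:.2f}'.format(mae))
print('MSE: {:.2f}'.format(mse))
```

The first prediction misses by only 100, yet it counts as wrong for exact matching just like the one that misses by 3000. MAE and MSE keep that difference visible, and MSE weights the largest miss most heavily.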

# Step 6: Train the Model


In [15]:
start_time = time.time()

model.fit(X_train,Y_train,epochs=10,batch_size=32, validation_split=0.2)
ANN_training_time = time.time() - start_time

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


# Step 7: Evaluate the Model

In [16]:
start_time = time.time()
y_pred = model.predict(X_test)
ANN_prediction_time = time.time() - start_time
mse = mean_squared_error(Y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(Y_test, y_pred)

print('MSE: {:.2f}'.format(mse))
print('RMSE: {:.2f}'.format(rmse))
print('R-squared: {:.2f}'.format(r2))
print('Training Time: {:.2f} seconds'.format(ANN_training_time))
print('Prediction Time: {:.2f} seconds'.format(ANN_prediction_time))
ANN_evaluation = [mse,rmse,r2,ANN_training_time,ANN_prediction_time]

MSE: 53023296.56
RMSE: 7281.71
R-squared: 0.51
Training Time: 1.64 seconds
Prediction Time: 0.20 seconds
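
For reference, the three scores printed above can be reproduced by hand. The sketch below computes MSE, RMSE, and R-squared with NumPy on a small made-up example; `regression_report`, `toy_true`, and `toy_pred` are hypothetical names, and the numbers are not taken from the dataset. RMSE is the square root of MSE, so it is expressed in the same units as the purchase amount, and R-squared is the share of the target's variance that the predictions account for.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return (MSE, RMSE, R-squared) for two equally long sequences."""
    # ravel() flattens column-shaped predictions such as those from model.predict.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(residuals ** 2))                    # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))     # total variation
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

# Made-up values for illustration only.
toy_true = [30000.0, 40000.0, 50000.0, 60000.0]
toy_pred = [32000.0, 38000.0, 53000.0, 59000.0]

toy_mse, toy_rmse, toy_r2 = regression_report(toy_true, toy_pred)
print('MSE: {:.2f}'.format(toy_mse))
print('RMSE: {:.2f}'.format(toy_rmse))
print('R-squared: {:.3f}'.format(toy_r2))
```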


In [17]:
y_pred

array([[52846.062],
       [44296.062],
       [53663.117],
       [25331.215],
       [62621.836],
       [51877.242],
       [55123.086],
       [49219.04 ],
       [46389.07 ],
       [45035.67 ],
       [40167.016],
       [34308.18 ],
       [29880.154],
       [43928.867],
       [46690.   ],
       [57891.414],
       [52108.797],
       [28397.322],
       [61303.68 ],
       [43101.207],
       [38284.59 ],
       [43848.75 ],
       [48983.4  ],
       [36258.89 ],
       [32673.65 ],
       [44258.773],
       [59390.6  ],
       [42731.625],
       [28832.123],
       [48581.47 ],
       [50305.46 ],
       [37424.324],
       [45765.83 ],
       [45204.195],
       [47481.805],
       [39748.023],
       [44478.867],
       [28794.064],
       [33815.586],
       [30201.854],
       [59719.9  ],
       [56998.15 ],
       [39064.555],
       [40984.14 ],
       [47124.094],
       [34020.82 ],
       [34438.992],
       [33941.98 ],
       [42419.676],
       [47201.312],


# Step 8: Build the Deep Neural Network Model

In [18]:
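# DNN: one ReLU hidden layer followed by three Dense layers with linear activation.
# Note: consecutive linear layers compose into a single linear map, so the
# extra 1-unit layers add depth but no additional expressive power.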
model_DNN = Sequential()
model_DNN.add(Dense(units=32,activation='relu',input_shape=(6,)))
model_DNN.add(Dense(units=1,activation='linear'))
model_DNN.add(Dense(units=1,activation='linear'))
model_DNN.add(Dense(units=1,activation='linear'))


model_DNN.compile(optimizer='adam',
                  loss='mean_squared_error',
                  metrics=['mae']  # accuracy is a classification metric; MAE suits regression
                 )


### Explain and Justify Your Deep Neural Network (DNN) Model, Optimizer, and Loss Function Choices

As with the ANN, I chose a linear activation for the output layer because this is a regression problem, not a classification problem. For the optimizer, I chose Adam because it is computationally efficient and has low memory requirements. I used MSE as the loss function because it is a standard loss for regression.
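
One property of linear activations is worth checking when stacking layers: several linear layers in a row behave like a single linear layer. The NumPy sketch below demonstrates this for the shapes used in the DNN above (32 to 1, then 1 to 1 twice). The weights are random stand-ins, not the trained weights, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the outputs of the 32-unit hidden layer on 5 samples.
hidden = rng.normal(size=(5, 32))

# Random weights and biases for three linear layers: 32->1, 1->1, 1->1.
W1, b1 = rng.normal(size=(32, 1)), rng.normal(size=(1,))
W2, b2 = rng.normal(size=(1, 1)), rng.normal(size=(1,))
W3, b3 = rng.normal(size=(1, 1)), rng.normal(size=(1,))

# Apply the three layers one after another.
stacked = ((hidden @ W1 + b1) @ W2 + b2) @ W3 + b3

# Fold them into one equivalent layer with a single weight matrix and bias.
W_eq = W1 @ W2 @ W3
b_eq = (b1 @ W2 + b2) @ W3 + b3
single = hidden @ W_eq + b_eq

print(np.allclose(stacked, single))
```

Under this view, adding capacity to a network generally requires extra layers that use a non-linear activation such as ReLU, ideally with more than one unit each.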

# Step 9: Train the Model

In [19]:
start_time = time.time()
model_DNN.fit(X_train,Y_train,epochs=10,batch_size=32, validation_split=0.2)
DNN_training_time = time.time() - start_time

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


# Step 10: Evaluate the Model

In [20]:
start_time = time.time()
y_pred = model_DNN.predict(X_test)
DNN_prediction_time = time.time() - start_time
mse = mean_squared_error(Y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(Y_test, y_pred)

print('MSE: {:.2f}'.format(mse))
print('RMSE: {:.2f}'.format(rmse))
print('R-squared: {:.2f}'.format(r2))
print('Training Time: {:.2f} seconds'.format(DNN_training_time))
print('Prediction Time: {:.2f} seconds'.format(DNN_prediction_time))
DNN_evaluation = [mse,rmse,r2,DNN_training_time,DNN_prediction_time]

MSE: 45418355.77
RMSE: 6739.31
R-squared: 0.58
Training Time: 1.62 seconds
Prediction Time: 0.16 seconds


# Step 11: Evaluate and Compare Scores, Training Time, and Prediction Time of ANN/DNN Models

In [21]:
print('{:<20} {:<20} {:<20}'.format('Metric', 'ANN Evaluation', 'DNN Evaluation'))
print('{:<20} {:<20} {:<20}'.format('MSE', ANN_evaluation[0], DNN_evaluation[0]))
print('{:<20} {:<20} {:<20}'.format('RMSE', ANN_evaluation[1], DNN_evaluation[1]))
print('{:<20} {:<20} {:<20}'.format('R-squared', ANN_evaluation[2], DNN_evaluation[2]))
print('{:<20} {:<20} {:<20}'.format('Training Time', ANN_evaluation[3], DNN_evaluation[3]))
print('{:<20} {:<20} {:<20}'.format('Prediction Time', ANN_evaluation[4], DNN_evaluation[4]))


Metric               ANN Evaluation       DNN Evaluation      
MSE                  53023296.55774615    45418355.76816429   
RMSE                 7281.709727649554    6739.314191233726   
R-squared            0.508921506084207    0.5793551288824307  
Training Time        1.6430575847625732   1.6239252090454102  
Prediction Time      0.20046234130859375  0.16370511054992676 
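
The table above prints every value at full floating-point precision, which makes the columns harder to compare. The sketch below is one possible way to round the same figures and add a relative MSE reduction. The numbers are copied from the rounded outputs of Steps 7 and 10; `comparison_rows` and the dictionary names are hypothetical. These figures come from a single training run, so part of the gap may reflect random initialization rather than the architecture.

```python
# Values copied from the rounded evaluation outputs reported in Steps 7 and 10.
ann_scores = {"MSE": 53023296.56, "RMSE": 7281.71, "R-squared": 0.51,
              "Training Time (s)": 1.64, "Prediction Time (s)": 0.20}
dnn_scores = {"MSE": 45418355.77, "RMSE": 6739.31, "R-squared": 0.58,
              "Training Time (s)": 1.62, "Prediction Time (s)": 0.16}

def comparison_rows(first, second):
    """Return a header plus one formatted row per metric, rounded to 2 decimals."""
    rows = ['{:<20} {:>16} {:>16}'.format('Metric', 'ANN', 'DNN')]
    for name in first:
        rows.append('{:<20} {:>16,.2f} {:>16,.2f}'.format(name, first[name], second[name]))
    return rows

table = comparison_rows(ann_scores, dnn_scores)
print('\n'.join(table))

# Share of the ANN's MSE that the DNN removed on the test set.
relative_mse_drop = (ann_scores["MSE"] - dnn_scores["MSE"]) / ann_scores["MSE"]
print('Relative MSE reduction (DNN vs ANN): {:.1%}'.format(relative_mse_drop))
```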
