# Introduction to Deep Learning

## Objectives
In this lab, you will build an ANN and a DNN model to predict the total expenditure of potential customers based on their characteristics. As a vehicle salesperson, your goal is to develop a model that estimates each customer's overall spending potential.

Your task is to build and train an ANN/DNN model using TensorFlow in a Jupyter notebook.

Feel free to explore the dataset, analyze its contents, and derive meaningful insights. You are also encouraged to create visualizations that make the data easier to understand.

# Step 1: Import Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Regression metrics (the target is a continuous amount, not a class label)
from sklearn.metrics import mean_squared_error, mean_absolute_error


In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense




# Step 2: Load and Explore the Data

In [3]:
df = pd.read_csv('./car_purchasing.csv', encoding='latin1')
df

Unnamed: 0,customer name,customer e-mail,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,Martina Avila,cubilia.Curae.Phasellus@quisaccumsanconvallis.edu,Bulgaria,0,41.851720,62812.09301,11609.380910,238961.2505,35321.45877
1,Harlan Barnes,eu.dolor@diam.co.uk,Belize,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,Naomi Rodriquez,vulputate.mauris.sagittis@ametconsectetueradip...,Algeria,1,43.152897,53798.55112,11160.355060,638467.1773,42925.70921
3,Jade Cunningham,malesuada@dignissim.com,Cook Islands,1,58.271369,79370.03798,14426.164850,548599.0524,67422.36313
4,Cedric Leach,felis.ullamcorper.viverra@egetmollislectus.net,Brazil,1,57.313749,59729.15130,5358.712177,560304.0671,55915.46248
...,...,...,...,...,...,...,...,...,...
495,Walter,ligula@Cumsociis.ca,Nepal,0,41.462515,71942.40291,6995.902524,541670.1016,48901.44342
496,Vanna,Cum.sociis.natoque@Sedmolestie.edu,Zimbabwe,1,37.642000,56039.49793,12301.456790,360419.0988,31491.41457
497,Pearl,penatibus.et@massanonante.com,Philippines,1,53.943497,68888.77805,10611.606860,764531.3203,64147.28888
498,Nell,Quisque.varius@arcuVivamussit.net,Botswana,1,59.160509,49811.99062,14013.034510,337826.6382,45442.15353


# Step 3: Data Cleaning and Preprocessing


**Hint: You could use a `StandardScaler()` or `MinMaxScaler()`**
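
Either scaler learns its statistics from the data it is fitted on. A common precaution is to fit on the training rows only and reuse those statistics for the test rows, so that test-set information does not leak into preprocessing. Below is a minimal NumPy sketch of the min-max case with made-up values (the arrays and the `min_max` helper are illustrative and not part of the lab):

```python
import numpy as np

# Hypothetical salary-like values, invented for illustration only.
train = np.array([[50000.0], [60000.0], [80000.0]])
test = np.array([[65000.0], [90000.0]])

# Learn the range from the training rows only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def min_max(x):
    """Scale with the training min/max, as a scaler fitted on train would."""
    return (x - col_min) / (col_max - col_min)

train_scaled = min_max(train)
test_scaled = min_max(test)

print(train_scaled.ravel())  # training values land in [0, 1]
print(test_scaled.ravel())   # test values may fall outside [0, 1]
```

With scikit-learn, the equivalent pattern is `scaler.fit(X_train)` followed by `scaler.transform(X_test)`.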

In [4]:
df.isna().sum()

customer name          0
customer e-mail        0
country                0
gender                 0
age                    0
annual Salary          0
credit card debt       0
net worth              0
car purchase amount    0
dtype: int64

In [5]:
df['country'].value_counts()

country
Israel                 6
Mauritania             6
Bolivia                6
Greenland              5
Saint Barthélemy       5
                      ..
El Salvador            1
Denmark                1
Oman                   1
Trinidad and Tobago    1
marlal                 1
Name: count, Length: 211, dtype: int64

In [6]:
df.drop(['customer name','customer e-mail'],axis=1,inplace=True)

In [7]:
# Initialize the LabelEncoder
from sklearn.preprocessing import LabelEncoder


label_encoder = LabelEncoder()

# Fit and transform the data
df['country'] = label_encoder.fit_transform(df['country'])
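
`LabelEncoder` assigns each distinct country an integer based on the sorted order of the names, so the resulting numbers are alphabetical ranks rather than meaningful magnitudes. A pure-Python sketch of that mapping, using a handful of country names from the preview above:

```python
# A few country names taken from the data preview (one repeated on purpose).
countries = ["Bulgaria", "Belize", "Algeria", "Brazil", "Belize"]

# Sort the distinct names and number them in that order.
classes = sorted(set(countries))
code_of = {name: index for index, name in enumerate(classes)}

encoded = [code_of[name] for name in countries]

print(classes)  # ['Algeria', 'Belize', 'Brazil', 'Bulgaria']
print(encoded)  # [3, 1, 0, 2, 1]
```

A network may treat these codes as ordered quantities, which is one reason to consider one-hot encoding or dropping a high-cardinality column like this one.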

In [8]:
df

Unnamed: 0,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,27,0,41.851720,62812.09301,11609.380910,238961.2505,35321.45877
1,17,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,1,1,43.152897,53798.55112,11160.355060,638467.1773,42925.70921
3,41,1,58.271369,79370.03798,14426.164850,548599.0524,67422.36313
4,26,1,57.313749,59729.15130,5358.712177,560304.0671,55915.46248
...,...,...,...,...,...,...,...
495,128,0,41.462515,71942.40291,6995.902524,541670.1016,48901.44342
496,208,1,37.642000,56039.49793,12301.456790,360419.0988,31491.41457
497,144,1,53.943497,68888.77805,10611.606860,764531.3203,64147.28888
498,24,1,59.160509,49811.99062,14013.034510,337826.6382,45442.15353


In [9]:
# Write your code ^_^
from sklearn.preprocessing import MinMaxScaler


scaler = MinMaxScaler()

# Scale the monetary columns to the [0, 1] range
num_cols = ['annual Salary', 'credit card debt', 'net worth']
df[num_cols] = scaler.fit_transform(df[num_cols])


In [10]:
df

Unnamed: 0,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,27,0,41.851720,0.535151,0.578361,0.223430,35321.45877
1,17,0,40.870623,0.583086,0.476028,0.521402,45115.52566
2,1,1,43.152897,0.422482,0.555797,0.631089,42925.70921
3,41,1,58.271369,0.742125,0.719908,0.539387,67422.36313
4,26,1,57.313749,0.496614,0.264257,0.551331,55915.46248
...,...,...,...,...,...,...,...
495,128,0,41.462515,0.649280,0.346528,0.532316,48901.44342
496,208,1,37.642000,0.450494,0.613139,0.347366,31491.41457
497,144,1,53.943497,0.611110,0.528221,0.759726,64147.28888
498,24,1,59.160509,0.372650,0.699147,0.324313,45442.15353


# Step 4: Train Test Split

In [11]:
# Write your code ^_^

X = df.drop(columns=['car purchase amount'])
y = df['car purchase amount']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Build the Artificial Neural Network Model

In [12]:
X_train.shape[1]

6

In [13]:
from tensorflow.keras import optimizers


# Write your code ^_^
model = Sequential()

model.add(Dense(units=32, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dense(units=1, activation='linear'))

model.compile(
    optimizer='adam',
    loss='mean_squared_error',
    metrics=['mse']
)








### Clarify and Justify Your Artificial Neural Network (ANN) Model, Optimizer, and Loss Function Choices


The model predicts a continuous value, so it uses a single hidden layer of 32 ReLU units followed by one linear output unit for regression. It is trained with the Adam optimizer, which adapts the learning rate for each parameter, and with Mean Squared Error as the loss, which penalizes large prediction errors more heavily than small ones.
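
To make the architecture concrete, here is a small NumPy sketch of what a ReLU hidden layer followed by a linear output unit computes for one input row. The weights, biases, and layer sizes below are invented for illustration and are much smaller than the 6-input, 32-unit model above:

```python
import numpy as np

# One input row with two hypothetical features.
x = np.array([[1.0, 2.0]])

# Hand-picked hidden-layer parameters (2 inputs -> 2 hidden units).
W1 = np.array([[1.0, -1.0],
               [0.5, 1.0]])
b1 = np.array([0.0, -3.0])

# Hand-picked output-layer parameters (2 hidden units -> 1 output).
W2 = np.array([[3.0],
               [5.0]])
b2 = np.array([1.0])

hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU zeroes out negative activations
y = hidden @ W2 + b2                   # linear output: no activation applied

print(hidden)  # [[2. 0.]]
print(y)       # [[7.]]
```

The linear output leaves the prediction unbounded, which is what a regression target such as a purchase amount needs.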

# Step 6: Train the Model


In [14]:
# Write your code ^_^

import time


start_time_ann_training = time.time()
# In Keras, validation_data overrides validation_split, so only one is passed.
history_ann = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=8,
    validation_data=(X_test, y_test),
)
end_time_ann_training = time.time()
ann_training_time = end_time_ann_training - start_time_ann_training


Epoch 1/10


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


# Step 7: Evaluate the Model

In [15]:
# Write your code ^_^
test_loss, test_mse = model.evaluate(X_test, y_test)
print(f"Test Mean squared Error: {test_mse}")

Test Mean squared Error: 2028390144.0


# Step 8: Build the Deep Neural Network Model

In [16]:
# Write your code ^_^
model2 = Sequential()

# Input layer and first hidden layer
model2.add(Dense(units=128, activation='relu', input_shape=(X_train.shape[1],)))

# Additional hidden layers
model2.add(Dense(units=64, activation='relu'))
model2.add(Dense(units=32, activation='relu'))
model2.add(Dense(units=16, activation='relu'))


# Output layer
model2.add(Dense(units=1, activation='linear'))

model2.compile(
    optimizer='adam',
    loss='mean_squared_error',
    metrics=['mse']
)

### Clarify and Justify Your Deep Neural Network (DNN) Model, Optimizer, and Loss Function Choices

This model stacks four ReLU-activated hidden layers (128, 64, 32, and 16 units) to capture non-linear relationships, followed by a single linear output unit for continuous predictions. It uses the Adam optimizer for its adaptive learning rates and Mean Squared Error (MSE) as the loss function, which suits regression because it measures the average squared difference between predicted and actual values.
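
One way to quantify how much larger the DNN is than the ANN is to count trainable parameters. The sketch below does that arithmetic for fully connected layers; `dense_param_count` is a helper written for this illustration, and its results should agree with what `model.summary()` reports in Keras:

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units
    return total

n_features = 6  # X_train.shape[1] in this notebook

ann_params = dense_param_count(n_features, [32, 1])
dnn_params = dense_param_count(n_features, [128, 64, 32, 16, 1])

print("ANN parameters:", ann_params)  # 257
print("DNN parameters:", dnn_params)  # 11777
```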

# Step 9: Train the Model

In [17]:
# Write your code ^_^

# Train DNN model
start_time_dnn_training = time.time()
history_dnn = model2.fit(X_train,y_train,epochs=10,batch_size=16,validation_data=(X_test, y_test))
end_time_dnn_training = time.time()
dnn_training_time = end_time_dnn_training - start_time_dnn_training



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
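
The two models were trained with different batch sizes (8 for the ANN, 16 for the DNN), which changes how many weight updates happen in each epoch. A small sketch of that arithmetic, assuming a hypothetical training set of 400 rows (in the notebook the real count is `len(X_train)`):

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches (weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

n_train = 400  # hypothetical; use len(X_train) in the notebook

ann_steps = steps_per_epoch(n_train, 8)
dnn_steps = steps_per_epoch(n_train, 16)

print(ann_steps, dnn_steps)  # 50 25
```

Fewer updates per epoch is one plausible reason a larger model can train in a similar wall-clock time.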


# Step 10: Evaluate the Model

In [18]:
# Write your code ^_^
test_loss, test_mse2 = model2.evaluate(X_test, y_test)
print(f"Test Mean squared Error: {test_mse2}")

Test Mean squared Error: 202139360.0
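
Because MSE is expressed in squared currency units, it is hard to read directly. One common way to interpret it is to take the square root (RMSE), which is in the same units as `car purchase amount`. A minimal sketch using the two test MSE values printed above:

```python
import math

# Test MSE values reported by model.evaluate in the cells above.
ann_mse = 2028390144.0
dnn_mse = 202139360.0

ann_rmse = math.sqrt(ann_mse)
dnn_rmse = math.sqrt(dnn_mse)

print(f"ANN RMSE: {ann_rmse:,.0f}")
print(f"DNN RMSE: {dnn_rmse:,.0f}")
```

The purchase amounts in the data preview are in the tens of thousands, so an RMSE of roughly 45,000 suggests the ANN is still far from a useful fit after 10 epochs, while the DNN's roughly 14,200 is better but still large.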


# Step 11: Evaluate and Compare Scores, Training Time, and Prediction Time of ANN/DNN Models

- The ANN's test MSE (about 2.03e9) is roughly ten times the DNN's (about 2.02e8), so the deeper model makes much smaller squared errors on the test data after the same 10 epochs.
- Training times are comparable (about 1.33 s for the ANN versus 1.22 s for the DNN). The DNN has more parameters but was trained with a batch size of 16 instead of 8, so it performs about half as many weight updates per epoch.

In [19]:
# Write your code ^_^


print("ANN MSE:", test_mse)
print("DNN MSE:", test_mse2)

print("ANN Training Time:", ann_training_time)
print("DNN Training Time:", dnn_training_time)

ANN MSE: 2028390144.0
DNN MSE: 202139360.0
ANN Training Time: 1.3281545639038086
DNN Training Time: 1.220855951309204
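
Prediction time can be measured in the same spirit as training time. The sketch below times a callable with `time.perf_counter` and keeps the best of several runs; `time_call` and `fake_predict` are hypothetical names introduced for this illustration. In the notebook you would pass `model.predict` or `model2.predict` together with `X_test` instead of the stand-in function:

```python
import time

def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical stand-in for a model's predict method.
def fake_predict(rows):
    return [sum(row) for row in rows]

rows = [[0.1 * i, 0.2 * i] for i in range(1000)]
elapsed = time_call(fake_predict, rows)

print(f"Prediction time: {elapsed:.6f} s")
```

Taking the best of several runs reduces the effect of one-off overhead, such as the first call to a model being slower than later ones.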
