# Introduction to Deep Learning

## Objectives
In this lab, you will embark on the journey of creating a ANN, DNN model tailored for predicting the total expenditure of potential consumers based on various characteristics. As a vehicle salesperson, your goal is to develop a model that can effectively estimate the overall spending potential.

Your task is to build and train an ANN/DNN model using tensorflow in a Jupyter notebook.

Feel Free to Explore the dataset, analyze its contents, and derive meaningful insights. Additionally, feel empowered to create insightful visualizations that enhance the understanding of the data. 

# Step 1: Import Libraries

In [19]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.calibration import LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score
import time


# Step 2: Load and Explore the Data

In [2]:
df = pd.read_csv('car_purchasing.csv', encoding='latin-1')

In [3]:
df.head()

Unnamed: 0,customer name,customer e-mail,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,Martina Avila,cubilia.Curae.Phasellus@quisaccumsanconvallis.edu,Bulgaria,0,41.85172,62812.09301,11609.38091,238961.2505,35321.45877
1,Harlan Barnes,eu.dolor@diam.co.uk,Belize,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,Naomi Rodriquez,vulputate.mauris.sagittis@ametconsectetueradip...,Algeria,1,43.152897,53798.55112,11160.35506,638467.1773,42925.70921
3,Jade Cunningham,malesuada@dignissim.com,Cook Islands,1,58.271369,79370.03798,14426.16485,548599.0524,67422.36313
4,Cedric Leach,felis.ullamcorper.viverra@egetmollislectus.net,Brazil,1,57.313749,59729.1513,5358.712177,560304.0671,55915.46248


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   customer name        500 non-null    object 
 1   customer e-mail      500 non-null    object 
 2   country              500 non-null    object 
 3   gender               500 non-null    int64  
 4   age                  500 non-null    float64
 5   annual Salary        500 non-null    float64
 6   credit card debt     500 non-null    float64
 7   net worth            500 non-null    float64
 8   car purchase amount  500 non-null    float64
dtypes: float64(5), int64(1), object(3)
memory usage: 35.3+ KB


In [5]:
df.describe()

Unnamed: 0,gender,age,annual Salary,credit card debt,net worth,car purchase amount
count,500.0,500.0,500.0,500.0,500.0,500.0
mean,0.506,46.241674,62127.239608,9607.645049,431475.713625,44209.799218
std,0.500465,7.978862,11703.378228,3489.187973,173536.75634,10773.178744
min,0.0,20.0,20000.0,100.0,20000.0,9000.0
25%,0.0,40.949969,54391.977195,7397.515792,299824.1959,37629.89604
50%,1.0,46.049901,62915.497035,9655.035568,426750.12065,43997.78339
75%,1.0,51.612263,70117.862005,11798.867487,557324.478725,51254.709517
max,1.0,70.0,100000.0,20000.0,1000000.0,80000.0


# Step 3: Data Cleaning and Preprocessing


**Hint: You could use a `StandardScaler()` or `MinMaxScaler()`**

In [6]:
df = df.drop(['customer name', 'customer e-mail'], axis=1)

In [7]:
label_encoder = LabelEncoder()
df['country'] = label_encoder.fit_transform(df['country'])

In [8]:

scaler = MinMaxScaler()
scaled=scaler.fit_transform(df[['annual Salary','credit card debt','net worth']])
df[['annual Salary','credit card debt','net worth']]=scaled

# Step 4: Train Test Split

In [10]:
X = df.loc[:,df.columns != 'car purchase amount']
y = df.loc[:,df.columns  == 'car purchase amount']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Step 5: Build the Artifical Neural Network Model

In [20]:
ann = Sequential()
ann.add(Dense(units=32, activation='relu', input_shape=(X_train.shape[1],)))
ann.add(Dense(units=1, activation='linear'))
ann.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])

### Clarify Your Artificial Neural Network (ANN) Model, Optimization, and Loss Function Choices and justify

Loss function: MSE is suitable for regresstion tasks
optimazer: adam is designed to update wights , which will improve accuracy

# Step 6: Train the Model


In [35]:
start_time = time.time()
annT = ann.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
training_time = time.time() - start_time

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


# Step 7: Evaluate the Model

In [36]:
loss, mse = ann.evaluate(X_test, y_test)

print(f'Test Loss: {loss}')
print(f'Test Mean Squared Error: {mse}')

Test Loss: 2123491712.0
Test Mean Squared Error: 2123491712.0


# Step 8: Build the Deep Neural Network Model

In [29]:
dnn = Sequential()
dnn.add(Dense(units=32, activation='relu', input_shape=(X_train.shape[1],)))
dnn.add(Dense(units=64, activation='relu'))
dnn.add(Dense(units=128, activation='relu'))
dnn.add(Dense(units=1, activation='linear'))
dnn.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])


### Clarify Your Deep Neural Network (DNN) Model, Optimization, and Loss Function Choices and justify 

Loss function: MSE is suitable for regresstion tasks
optimazer: adam is designed to update wights , which will improve accuracy

# Step 9: Train the Model

In [38]:
start_timeD = time.time()
dnnT = dnn.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
training_timeD = time.time() - start_time

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


# Step 10: Evaluate the Model

In [32]:
loss, mse = dnn.evaluate(X_test, y_test)

print(f'Test Loss: {loss}')
print(f'Test Mean Squared Error: {mse}')

Test Loss: 1611247232.0
Test Mean Squared Error: 1611247232.0


# Step 11: Evaluate and Compare Scores, Training Time, and Prediction Time of ANN/DNN Models

In [37]:
start_time = time.time()
y_pred = ann.predict(X_test)
prediction_time = time.time() - start_time

y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test, axis=1)
accuracy = accuracy_score(y_test_labels, y_pred_labels)

print(f"Accuracy: {accuracy}")
print(f"Training Time: {training_time} seconds")
print(f"Prediction Time: {prediction_time} seconds")


Accuracy: 1.0
Training Time: 1.0792570114135742 seconds
Prediction Time: 0.14530324935913086 seconds


In [39]:
start_timeD = time.time()
y_predD = dnn.predict(X_test)
prediction_time = time.time() - start_timeD

y_pred_labels = np.argmax(y_pred, axis=1)
y_test_labels = np.argmax(y_test, axis=1)
accuracy = accuracy_score(y_test_labels, y_pred_labels)

print(f"Accuracy: {accuracy}")
print(f"Training Time: {training_timeD} seconds")
print(f"Prediction Time: {prediction_time} seconds")


Accuracy: 1.0
Training Time: 99.65199136734009 seconds
Prediction Time: 0.19704294204711914 seconds
