# Introduction to Deep Learning

## Objectives
In this lab, you will build an artificial neural network (ANN) and a deep neural network (DNN) that predict the total expenditure of potential customers from their characteristics. As a vehicle salesperson, your goal is a model that can effectively estimate each customer's overall spending potential.

Your task is to build and train an ANN/DNN model using TensorFlow in a Jupyter notebook.

Feel free to explore the dataset, analyze its contents, and derive meaningful insights. You are also encouraged to create visualizations that make the data easier to understand.

# Step 1: Import Libraries

In [260]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Step 2: Load and Explore the Data

In [261]:
df= pd.read_csv("car_purchasing.csv", encoding='latin')
df.head()

Unnamed: 0,customer name,customer e-mail,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,Martina Avila,cubilia.Curae.Phasellus@quisaccumsanconvallis.edu,Bulgaria,0,41.85172,62812.09301,11609.38091,238961.2505,35321.45877
1,Harlan Barnes,eu.dolor@diam.co.uk,Belize,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,Naomi Rodriquez,vulputate.mauris.sagittis@ametconsectetueradip...,Algeria,1,43.152897,53798.55112,11160.35506,638467.1773,42925.70921
3,Jade Cunningham,malesuada@dignissim.com,Cook Islands,1,58.271369,79370.03798,14426.16485,548599.0524,67422.36313
4,Cedric Leach,felis.ullamcorper.viverra@egetmollislectus.net,Brazil,1,57.313749,59729.1513,5358.712177,560304.0671,55915.46248


In [262]:
df.tail()

Unnamed: 0,customer name,customer e-mail,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
495,Walter,ligula@Cumsociis.ca,Nepal,0,41.462515,71942.40291,6995.902524,541670.1016,48901.44342
496,Vanna,Cum.sociis.natoque@Sedmolestie.edu,Zimbabwe,1,37.642,56039.49793,12301.45679,360419.0988,31491.41457
497,Pearl,penatibus.et@massanonante.com,Philippines,1,53.943497,68888.77805,10611.60686,764531.3203,64147.28888
498,Nell,Quisque.varius@arcuVivamussit.net,Botswana,1,59.160509,49811.99062,14013.03451,337826.6382,45442.15353
499,Marla,Camaron.marla@hotmail.com,marlal,1,46.731152,61370.67766,9391.341628,462946.4924,45107.22566


In [263]:
df.shape

(500, 9)

In [264]:
df.describe()

           gender         age  annual Salary  credit card debt  \
count  500.000000  500.000000     500.000000        500.000000   
mean     0.506000   46.241674   62127.239608       9607.645049   
std      0.500465    7.978862   11703.378228       3489.187973   
min      0.000000   20.000000   20000.000000        100.000000   
25%      0.000000   40.949969   54391.977195       7397.515792   
50%      1.000000   46.049901   62915.497035       9655.035568   
75%      1.000000   51.612263   70117.862005      11798.867487   
max      1.000000   70.000000  100000.000000      20000.000000   

            net worth  car purchase amount  
count      500.000000           500.000000  
mean    431475.713625         44209.799218  
std     173536.756340         10773.178744  
min      20000.000000          9000.000000  
25%     299824.195900         37629.896040  
50%     426750.120650         43997.783390  
75%     557324.478725         51254.709517  
max    10000

In [265]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   customer name        500 non-null    object 
 1   customer e-mail      500 non-null    object 
 2   country              500 non-null    object 
 3   gender               500 non-null    int64  
 4   age                  500 non-null    float64
 5   annual Salary        500 non-null    float64
 6   credit card debt     500 non-null    float64
 7   net worth            500 non-null    float64
 8   car purchase amount  500 non-null    float64
dtypes: float64(5), int64(1), object(3)
memory usage: 35.3+ KB


In [266]:
df.isna().sum()

customer name          0
customer e-mail        0
country                0
gender                 0
age                    0
annual Salary          0
credit card debt       0
net worth              0
car purchase amount    0
dtype: int64

In [267]:
z=df["country"].value_counts().index.tolist()
for i in z:
  print(i)

Israel
Mauritania
Bolivia
Greenland
Saint Barthélemy
Guinea
Iraq
Samoa
Liechtenstein
Bhutan
Kyrgyzstan
Equatorial Guinea
Algeria
Laos
Grenada
Armenia
Saint Vincent and The Grenadines
Senegal
Saint Pierre and Miquelon
Marshall Islands
Venezuela
Sierra Leone
Namibia
Guam
Egypt
Andorra
Madagascar
French Polynesia
Saint Kitts and Nevis
Sao Tome and Principe
Puerto Rico
China
Jersey
Mauritius
Gambia
United States Minor Outlying Islands
Kiribati
Ecuador
Poland
Slovakia
Congo (Brazzaville)
Mayotte
Macao
Croatia
Uganda
Kuwait
Bouvet Island
Wallis and Futuna
South Africa
Guadeloupe
Martinique
Latvia
Maldives
Belize
Christmas Island
Falkland Islands
Solomon Islands
Yemen
Nepal
Cocos (Keeling) Islands
Northern Mariana Islands
Tuvalu
Iceland
Viet Nam
Portugal
Turkey
Suriname
Dominican Republic
Isle of Man
Colombia
Macedonia
Tokelau
Mozambique
Micronesia
United Arab Emirates
Palestine, State of
Chile
Uruguay
Brazil
Turkmenistan
Costa Rica
Jamaica
Cape Verde
Timor-Leste
Djibouti
Turks and Caicos Isl
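
Printing every distinct country produces a very long listing. A more compact check is to count the distinct values and look only at the most frequent ones. The sketch below uses a small hypothetical frame (`toy` is illustrative, not the lab dataset):

```python
import pandas as pd

# Hypothetical stand-in for the "country" column of the lab dataset.
toy = pd.DataFrame(
    {"country": ["Israel", "Nepal", "Israel", "Brazil", "Nepal", "Israel"]}
)

n_countries = toy["country"].nunique()              # number of distinct countries
top_counts = toy["country"].value_counts().head(2)  # most frequent first

print(n_countries)
print(top_counts)
```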

# Step 3: Data Cleaning and Preprocessing


**Hint: You could use a `StandardScaler()` or `MinMaxScaler()`**

In [268]:
# Keep the columns from "country" onward; this drops the customer name and e-mail.
df = df.loc[:, "country":]
df.head()

Unnamed: 0,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,Bulgaria,0,41.85172,62812.09301,11609.38091,238961.2505,35321.45877
1,Belize,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,Algeria,1,43.152897,53798.55112,11160.35506,638467.1773,42925.70921
3,Cook Islands,1,58.271369,79370.03798,14426.16485,548599.0524,67422.36313
4,Brazil,1,57.313749,59729.1513,5358.712177,560304.0671,55915.46248


In [269]:
# Relabel in place with .loc so the assignment acts on df itself, not on a copy.
df.loc[df["country"] == "Israel", "country"] = "Palestine"
df[df["country"] == "Palestine"]


Unnamed: 0,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
118,Palestine,0,42.915795,45721.66835,14250.52398,790526.5507,42592.88647
200,Palestine,1,44.314363,54918.38749,8920.385015,347017.8331,36086.93161
233,Palestine,1,53.141192,70842.83518,9536.899689,545946.9996,58235.41454
244,Palestine,1,46.270844,77146.27598,7903.33495,418764.5061,52313.98392
321,Palestine,1,49.10444,64665.39122,7404.080751,521815.7353,50666.88173
483,Palestine,0,43.386891,76523.33258,10373.00856,620355.2658,55377.87697


In [270]:
z=df["country"].value_counts().index.tolist()
for i in z:
  print(i)

Palestine
Mauritania
Bolivia
Greenland
Saint Barthélemy
Guinea
Iraq
Samoa
Liechtenstein
Bhutan
Kyrgyzstan
Equatorial Guinea
Algeria
Laos
Grenada
Armenia
Saint Vincent and The Grenadines
Senegal
Saint Pierre and Miquelon
Marshall Islands
Venezuela
Sierra Leone
Namibia
Guam
Egypt
Andorra
Madagascar
French Polynesia
Saint Kitts and Nevis
Sao Tome and Principe
Puerto Rico
China
Jersey
Mauritius
Gambia
United States Minor Outlying Islands
Kiribati
Ecuador
Poland
Slovakia
Congo (Brazzaville)
Mayotte
Macao
Croatia
Uganda
Kuwait
Bouvet Island
Wallis and Futuna
South Africa
Guadeloupe
Martinique
Latvia
Maldives
Belize
Christmas Island
Falkland Islands
Solomon Islands
Yemen
Nepal
Cocos (Keeling) Islands
Northern Mariana Islands
Tuvalu
Iceland
Viet Nam
Portugal
Turkey
Suriname
Dominican Republic
Isle of Man
Colombia
Macedonia
Tokelau
Mozambique
Micronesia
United Arab Emirates
Palestine, State of
Chile
Uruguay
Brazil
Turkmenistan
Costa Rica
Jamaica
Cape Verde
Timor-Leste
Djibouti
Turks and Caicos 

In [271]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()

# Fit the encoder and replace each country name with its integer code.
df["country"] = le.fit_transform(df["country"])
df

Unnamed: 0,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,27,0,41.851720,62812.09301,11609.380910,238961.2505,35321.45877
1,17,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,1,1,43.152897,53798.55112,11160.355060,638467.1773,42925.70921
3,41,1,58.271369,79370.03798,14426.164850,548599.0524,67422.36313
4,26,1,57.313749,59729.15130,5358.712177,560304.0671,55915.46248
...,...,...,...,...,...,...,...
495,127,0,41.462515,71942.40291,6995.902524,541670.1016,48901.44342
496,208,1,37.642000,56039.49793,12301.456790,360419.0988,31491.41457
497,144,1,53.943497,68888.77805,10611.606860,764531.3203,64147.28888
498,24,1,59.160509,49811.99062,14013.034510,337826.6382,45442.15353
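
`LabelEncoder` assigns integer codes in the sorted order of the class names, so the numbers identify a country but carry no numeric meaning. Feeding them to a network as a numeric feature would imply an ordering between countries that does not exist. A small sketch with hypothetical country values shows the mapping:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of country names.
countries = ["Bulgaria", "Belize", "Algeria", "Brazil", "Belize"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(countries)  # fit and transform in one call

print(list(le_demo.classes_))  # class names in sorted order
print(list(codes))             # one integer code per row
# Codes can be mapped back to the original strings.
print(list(le_demo.inverse_transform(codes)))
```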


In [272]:
from sklearn.preprocessing import MinMaxScaler
# Scale only the numeric features; country, age, and the target are left out.
features = df.drop(columns=["country", "age", "car purchase amount"])
scaler = MinMaxScaler().fit(features)
scaled_data = scaler.transform(features)
df_ = pd.DataFrame(scaled_data, columns=features.columns)
df_

Unnamed: 0,gender,annual Salary,credit card debt,net worth
0,0.0,0.535151,0.578361,0.223430
1,0.0,0.583086,0.476028,0.521402
2,1.0,0.422482,0.555797,0.631089
3,1.0,0.742125,0.719908,0.539387
4,1.0,0.496614,0.264257,0.551331
...,...,...,...,...
495,0.0,0.649280,0.346528,0.532316
496,1.0,0.450494,0.613139,0.347366
497,1.0,0.611110,0.528221,0.759726
498,1.0,0.372650,0.699147,0.324313
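
`MinMaxScaler` maps each column to the range [0, 1] via (x - min) / (max - min). As a sanity check, the scaled `annual Salary` and `credit card debt` of row 0 can be reproduced by hand from the column minima and maxima reported by `df.describe()` earlier. This is a NumPy-only sketch, and the helper name `min_max` is illustrative:

```python
import numpy as np

def min_max(x, lo, hi):
    """Scale x from the range [lo, hi] to [0, 1]."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

# Row 0 values and column min/max are taken from the outputs shown earlier.
salary_scaled = min_max(62812.09301, lo=20000.0, hi=100000.0)
debt_scaled = min_max(11609.38091, lo=100.0, hi=20000.0)

print(round(float(salary_scaled), 6))  # 0.535151
print(round(float(debt_scaled), 6))    # 0.578361
```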


In [273]:
bins = [0, 18, 30, 40, 50, 100]
labels = ['0-18', '19-30', '31-40', '41-50', '51+']
age_bins = pd.cut(df["age"], bins=bins, labels=labels, right=False)
df_["age"]=age_bins
df_

Unnamed: 0,gender,annual Salary,credit card debt,net worth,age
0,0.0,0.535151,0.578361,0.223430,41-50
1,0.0,0.583086,0.476028,0.521402,41-50
2,1.0,0.422482,0.555797,0.631089,41-50
3,1.0,0.742125,0.719908,0.539387,51+
4,1.0,0.496614,0.264257,0.551331,51+
...,...,...,...,...,...
495,0.0,0.649280,0.346528,0.532316,41-50
496,1.0,0.450494,0.613139,0.347366,31-40
497,1.0,0.611110,0.528221,0.759726,51+
498,1.0,0.372650,0.699147,0.324313,51+
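
With `right=False`, each bin includes its left edge and excludes its right edge: [0, 18), [18, 30), [30, 40), [40, 50), [50, 100). The labels therefore do not line up exactly with the edges; an age of exactly 30 receives the label `31-40`, for example. A short sketch with a few illustrative ages:

```python
import pandas as pd

bins = [0, 18, 30, 40, 50, 100]
labels = ['0-18', '19-30', '31-40', '41-50', '51+']

# Illustrative ages, including values that sit exactly on a bin edge.
ages = pd.Series([17.9, 18.0, 30.0, 41.85172, 58.271369])
binned = pd.cut(ages, bins=bins, labels=labels, right=False)

print(binned.tolist())
```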


In [274]:
from sklearn.preprocessing import OneHotEncoder

oh_enc = OneHotEncoder(handle_unknown='ignore')

oh_enc.fit(df_['age'].values.reshape(-1, 1))

oh_enc.categories_

[array(['19-30', '31-40', '41-50', '51+'], dtype=object)]

In [275]:
oh_data = oh_enc.transform(df_["age"].values.reshape(-1, 1)).toarray()
oh_df2 = pd.DataFrame(oh_data, columns = oh_enc.categories_)
oh_df2.head()

Unnamed: 0,19-30,31-40,41-50,51+
0,0.0,0.0,1.0,0.0
1,0.0,0.0,1.0,0.0
2,0.0,0.0,1.0,0.0
3,0.0,0.0,0.0,1.0
4,0.0,0.0,0.0,1.0


In [276]:
df_ = df_.drop(["age"], axis=1)

In [277]:
df__=pd.concat([oh_df2, df_],axis=1)
df__

Unnamed: 0,"(19-30,)","(31-40,)","(41-50,)","(51+,)",gender,annual Salary,credit card debt,net worth
0,0.0,0.0,1.0,0.0,0.0,0.535151,0.578361,0.223430
1,0.0,0.0,1.0,0.0,0.0,0.583086,0.476028,0.521402
2,0.0,0.0,1.0,0.0,1.0,0.422482,0.555797,0.631089
3,0.0,0.0,0.0,1.0,1.0,0.742125,0.719908,0.539387
4,0.0,0.0,0.0,1.0,1.0,0.496614,0.264257,0.551331
...,...,...,...,...,...,...,...,...
495,0.0,0.0,1.0,0.0,0.0,0.649280,0.346528,0.532316
496,0.0,1.0,0.0,0.0,1.0,0.450494,0.613139,0.347366
497,0.0,0.0,0.0,1.0,1.0,0.611110,0.528221,0.759726
498,0.0,0.0,0.0,1.0,1.0,0.372650,0.699147,0.324313
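
The tuple-style column names such as `(19-30,)` appear because `categories_` is a list holding one array per encoded column, and passing the whole list as `columns` creates a multi-level header. A minimal sketch of one way to get flat names, on a small hypothetical frame, is to pass the first element of that list instead:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical age groups standing in for the binned "age" column.
age_groups = pd.DataFrame({"age": ["41-50", "51+", "31-40", "41-50"]})

enc = OneHotEncoder(handle_unknown="ignore")
oh_data = enc.fit_transform(age_groups[["age"]]).toarray()

# categories_ has one array per input column, so take element 0.
oh_df = pd.DataFrame(oh_data, columns=enc.categories_[0])

print(oh_df.columns.tolist())
print(oh_df)
```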


# Step 4: Train Test Split

In [286]:
from sklearn.model_selection import train_test_split
X= df__.loc[:,:"net worth"]
y=df["car purchase amount"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state = 43
)

X_train

Unnamed: 0,"(19-30,)","(31-40,)","(41-50,)","(51+,)",gender,annual Salary,credit card debt,net worth
12,0.0,0.0,1.0,0.0,0.0,0.216704,0.508458,0.419293
199,0.0,1.0,0.0,0.0,1.0,0.540135,0.178787,0.657583
488,0.0,0.0,1.0,0.0,0.0,0.519699,0.024865,0.486936
231,0.0,0.0,1.0,0.0,0.0,0.717943,0.342822,0.029204
3,0.0,0.0,0.0,1.0,1.0,0.742125,0.719908,0.539387
...,...,...,...,...,...,...,...,...
277,0.0,0.0,0.0,1.0,0.0,0.739663,0.342365,0.385408
305,0.0,0.0,1.0,0.0,1.0,0.275038,0.299138,0.570633
255,0.0,0.0,1.0,0.0,1.0,0.434374,0.378477,0.135595
320,0.0,0.0,1.0,0.0,0.0,0.747274,0.555181,0.415600
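
In this notebook the scaler was fitted on the full dataset before splitting, so the test rows influenced the scaling statistics. A common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that order on randomly generated data (`X_raw` and `y_raw` are hypothetical, not the lab features):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(43)
X_raw = rng.uniform(20000, 100000, size=(50, 2))  # hypothetical raw features
y_raw = rng.uniform(9000, 80000, size=50)         # hypothetical target

X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.20, random_state=43
)

split_scaler = MinMaxScaler().fit(X_tr)     # statistics from training rows only
X_tr_scaled = split_scaler.transform(X_tr)
X_te_scaled = split_scaler.transform(X_te)  # may fall slightly outside [0, 1]

print(X_tr_scaled.min(axis=0), X_tr_scaled.max(axis=0))
```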


# Step 5: Build the Artificial Neural Network Model

In [287]:
model = Sequential()
model.add(Dense(units=64, activation= "relu", input_shape=(8,)))
model.add(Dense(units=1, activation= "relu"))
model.compile(
  optimizer='adam',
  loss='mse',
  metrics = ["accuracy","mse"]
)


### Explain and Justify Your Artificial Neural Network (ANN) Model, Optimizer, and Loss Function Choices

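ReLU in the hidden layer introduces non-linearity, which lets the model learn relationships that a purely linear model cannot. The output layer also uses ReLU, which restricts predictions to non-negative values; that is compatible with purchase amounts, although a linear output is the more common choice for regression.
Adam is a reasonable default optimizer because its per-parameter adaptive learning rates usually converge quickly with little tuning.
MSE is a standard regression loss: it measures the average squared difference between predicted and true values and penalizes large errors heavily. The `accuracy` metric, by contrast, is a classification metric and is not informative for a continuous target.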
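
For comparison, here is a hedged sketch of a variant that is often tried for regression: the same hidden layer, but a linear output and mean absolute error as the tracked metric. It is not the model used in this notebook; `model_sketch` and the random demo data are illustrative only.

```python
import numpy as np
from tensorflow import keras

# Same single hidden layer as above; the output layer has no activation
# (linear), and the tracked metric is mean absolute error, not accuracy.
model_sketch = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model_sketch.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Tiny random batch, only to show that the model trains and predicts.
rng = np.random.default_rng(0)
X_demo = rng.random((32, 8)).astype("float32")
y_demo = rng.random((32, 1)).astype("float32")
model_sketch.fit(X_demo, y_demo, epochs=1, batch_size=8, verbose=0)

print(model_sketch.predict(X_demo, verbose=0).shape)
```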

# Step 6: Train the Model


In [289]:
model.fit(X_train,y_train,epochs=10,batch_size=8,validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x1fe64428430>
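
Step 11 reports training times. One way to measure them is to wrap the training call with `time.perf_counter()`. The sketch below uses a hypothetical `timed` helper and a stand-in workload; in the notebook the callable would be something like `lambda: model.fit(...)`.

```python
import time

def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# Stand-in workload; in the notebook this would wrap the model.fit call.
result, seconds = timed(lambda: sum(i * i for i in range(100_000)))

print(f"Train the Model: {seconds:.2f}s")
```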

# Step 7: Evaluate the Model

In [290]:
# evaluate() returns the loss followed by the compiled metrics, in order.
test_loss, test_acc, test_mse = model.evaluate(X_test, y_test)
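
Because the target is continuous, regression metrics such as MAE, RMSE, and R² are easier to interpret than the raw loss. A minimal sketch using scikit-learn on hypothetical values follows; in the notebook, `y_pred` would come from `model.predict(X_test)`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted purchase amounts.
y_true = np.array([35000.0, 45000.0, 43000.0, 67000.0])
y_pred = np.array([36000.0, 44000.0, 45000.0, 63000.0])

mae = mean_absolute_error(y_true, y_pred)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # error in target units
r2 = r2_score(y_true, y_pred)                       # 1.0 is a perfect fit

print(mae, rmse, r2)
```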



# Step 8: Build the Deep Neural Network Model

In [293]:
model_d = Sequential()
model_d.add(Dense(units=16, activation= "relu", input_shape=(8,)))
model_d.add(Dense(units=16, activation= "tanh"))
model_d.add(Dense(units=1, activation= "relu"))
model_d.compile(
  optimizer='adam',
  loss='mse',
  metrics = ["accuracy","mse"]
)


### Explain and Justify Your Deep Neural Network (DNN) Model, Optimizer, and Loss Function Choices

The DNN uses two hidden layers, with ReLU in the first and tanh in the second; both introduce non-linearity, allowing the model to learn complex patterns. With 16 units per layer, it is deeper but narrower than the ANN. Adam is again a reasonable default optimizer because its adaptive learning rates usually converge quickly with little tuning. MSE suits regression because it gives a continuous measure of the squared difference between predicted and true values.

# Step 9: Train the Model

In [294]:
model_d.fit(X_train,y_train,epochs=10,batch_size=8,validation_split=0.20)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x1fe6444a580>

# Step 10: Evaluate the Model

In [295]:
# evaluate() returns the loss followed by the compiled metrics, in order.
test_loss, test_acc, test_mse = model_d.evaluate(X_test, y_test)




# Step 11: Evaluate and Compare Scores, Training Time, and Prediction Time of ANN/DNN Models

In [296]:
print("-"*30)
print("ANN:")
print("Evaluate for test dataset:\n\
   loss: 2002217984.0000 \n\
   accuracy:0.0000e+00 \n\
   mse: 2002217984.0000 ")
print("Train the Model :4.7s"
)
print("DNN:")
print("Evaluate for test dataset:\n\
   loss: 2031594368.0000  \n\
   accuracy:0.0000e+00 \n\
   mse: 2031594368.0000 ")
print("Train the Model :6s"
)
print("-"*30)
   

------------------------------
ANN:
Evaluate for test dataset:
   loss: 2002217984.0000 
   accuracy:0.0000e+00 
   mse: 2002217984.0000 
Train the Model :4.7s
DNN:
Evaluate for test dataset:
   loss: 2031594368.0000  
   accuracy:0.0000e+00 
   mse: 2031594368.0000 
Train the Model :6s
------------------------------
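
These numbers can be put in context with a little arithmetic. `accuracy` counts exact matches between prediction and label, so it stays at 0 for a continuous target. The ANN's test MSE of about 2.0e9 corresponds to an RMSE of roughly 44.7k, close to the mean purchase amount of about 44.2k from `df.describe()`. That is consistent with a network whose predictions are still near zero after 10 epochs, although the notebook does not confirm this directly. The sketch below checks the arithmetic; the two constant-prediction baselines are assumptions introduced for comparison, not values measured in the notebook.

```python
import numpy as np

ann_mse = 2002217984.0                      # test MSE reported for the ANN
y_mean, y_std = 44209.799218, 10773.178744  # target stats from df.describe()

rmse = np.sqrt(ann_mse)
# A model that always predicts 0 has MSE = E[y^2] = mean^2 + variance.
zero_baseline_mse = y_mean**2 + y_std**2
# A model that always predicts the mean has MSE = variance.
mean_baseline_mse = y_std**2

print(f"RMSE: {rmse:.0f}")
print(f"predict-0 baseline MSE:    {zero_baseline_mse:.3e}")
print(f"predict-mean baseline MSE: {mean_baseline_mse:.3e}")
```

Both reported MSE values are far above the predict-the-mean baseline, so neither model has learned a useful mapping yet. Scaling the target, training for more epochs, and using a linear output layer are common first things to try.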
