# Introduction to Deep Learning

## Objectives
In this lab, you will build an artificial neural network (ANN) and a deep neural network (DNN) to predict how much a potential customer will spend on a car, based on characteristics such as age, salary, credit card debt, and net worth. As a vehicle salesperson, your goal is a model that estimates each customer's spending potential.

Your task is to build and train the ANN and DNN models using TensorFlow in a Jupyter notebook.

Feel free to explore the dataset, analyze its contents, and derive insights. You are also encouraged to create visualizations that make the data easier to understand.

# Step 1: Import Libraries

In [281]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, r2_score  # regression metrics for a continuous target
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Step 2: Load and Explore the Data

In [282]:
# Write your code ^_^
df = pd.read_csv('car_purchasing.csv',encoding='latin')
df

Unnamed: 0,customer name,customer e-mail,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,Martina Avila,cubilia.Curae.Phasellus@quisaccumsanconvallis.edu,Bulgaria,0,41.851720,62812.09301,11609.380910,238961.2505,35321.45877
1,Harlan Barnes,eu.dolor@diam.co.uk,Belize,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,Naomi Rodriquez,vulputate.mauris.sagittis@ametconsectetueradip...,Algeria,1,43.152897,53798.55112,11160.355060,638467.1773,42925.70921
3,Jade Cunningham,malesuada@dignissim.com,Cook Islands,1,58.271369,79370.03798,14426.164850,548599.0524,67422.36313
4,Cedric Leach,felis.ullamcorper.viverra@egetmollislectus.net,Brazil,1,57.313749,59729.15130,5358.712177,560304.0671,55915.46248
...,...,...,...,...,...,...,...,...,...
495,Walter,ligula@Cumsociis.ca,Nepal,0,41.462515,71942.40291,6995.902524,541670.1016,48901.44342
496,Vanna,Cum.sociis.natoque@Sedmolestie.edu,Zimbabwe,1,37.642000,56039.49793,12301.456790,360419.0988,31491.41457
497,Pearl,penatibus.et@massanonante.com,Philippines,1,53.943497,68888.77805,10611.606860,764531.3203,64147.28888
498,Nell,Quisque.varius@arcuVivamussit.net,Botswana,1,59.160509,49811.99062,14013.034510,337826.6382,45442.15353


In [283]:
# Mark rows whose country is 'Israel' as missing so that dropna() removes them below
df['country'] = df['country'].replace('Israel', np.nan)

In [284]:
con=df['country'].value_counts().index.tolist()
for i in range(len(con)):
    print(con[i])

Bolivia
Mauritania
Iraq
Kyrgyzstan
Guinea
Samoa
Saint Barthélemy
Greenland
Laos
Armenia
Bhutan
Liechtenstein
Equatorial Guinea
Algeria
Grenada
Saint Vincent and The Grenadines
Senegal
Egypt
Sierra Leone
Marshall Islands
Sao Tome and Principe
Guam
Namibia
Venezuela
Andorra
French Polynesia
Saint Pierre and Miquelon
Saint Kitts and Nevis
Madagascar
Isle of Man
Cocos (Keeling) Islands
Northern Mariana Islands
Croatia
Uruguay
Macao
Jersey
Chile
Poland
Ecuador
Palestine, State of
China
United Arab Emirates
Dominican Republic
Puerto Rico
United States Minor Outlying Islands
Kiribati
Slovakia
Tuvalu
Nepal
Gambia
Yemen
Uganda
Kuwait
Bouvet Island
Wallis and Futuna
South Africa
Guadeloupe
Martinique
Latvia
Maldives
Belize
Mayotte
Brazil
Christmas Island
Falkland Islands
Solomon Islands
Congo (Brazzaville)
Micronesia
Djibouti
Turks and Caicos Islands
Turkmenistan
Suriname
Macedonia
Timor-Leste
Åland Islands
Mauritius
Colombia
Mozambique
Benin
Cape Verde
Jamaica
Costa Rica
Iceland
Viet Nam
Portug

In [285]:
df.isna().sum()

customer name          0
customer e-mail        0
country                6
gender                 0
age                    0
annual Salary          0
credit card debt       0
net worth              0
car purchase amount    0
dtype: int64

In [286]:
df.dropna(inplace=True)

In [287]:
df.isna().sum()

customer name          0
customer e-mail        0
country                0
gender                 0
age                    0
annual Salary          0
credit card debt       0
net worth              0
car purchase amount    0
dtype: int64

# Step 3: Data Cleaning and Preprocessing


**Hint: You could use a `StandardScaler()` or `MinMaxScaler()`**

In [288]:
df.drop(columns=['customer name', 'customer e-mail'],inplace=True)

In [289]:
from sklearn import preprocessing

le = preprocessing.LabelEncoder()

le.fit(
    df['country']
)

list(le.classes_)

['Afghanistan',
 'Algeria',
 'American Samoa',
 'Andorra',
 'Angola',
 'Anguilla',
 'Antarctica',
 'Argentina',
 'Armenia',
 'Aruba',
 'Australia',
 'Austria',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bermuda',
 'Bhutan',
 'Bolivia',
 'Bonaire, Sint Eustatius and Saba',
 'Bosnia and Herzegovina',
 'Botswana',
 'Bouvet Island',
 'Brazil',
 'Bulgaria',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Cape Verde',
 'Cayman Islands',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Christmas Island',
 'Cocos (Keeling) Islands',
 'Colombia',
 'Congo (Brazzaville)',
 'Cook Islands',
 'Costa Rica',
 'Croatia',
 'Curaçao',
 'Czech Republic',
 'Denmark',
 'Djibouti',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Ethiopia',
 'Falkland Islands',
 'Faroe Islands',
 'France',
 'French Guiana',
 'French Polynesia',
 'French Southern Territories',
 'Gabon',
 'Gambia',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Gr

In [290]:
con = le.transform(
    df['country']
)
con

array([ 27,  17,   1,  41,  26,  99, 178,  45,   8, 170, 166,  66, 130,
       139, 194,  60, 184, 145,  34,  83, 175,   4, 132, 105, 104, 136,
        39, 125,  46,  48,  42,  24, 155,  56, 161,  28, 176, 188,  53,
        57, 188,  20,  62, 123, 126, 182,  47,   7, 179, 209, 190, 157,
       199, 120, 174,  73,  50, 168,  18,  31, 141,  96,  80,  80, 201,
       111,  63,  95, 114,  61, 139, 148,  89, 167,  92, 197, 146,  72,
        69,  36, 164, 166,  49,   5,  90,  88,  49, 103,  82, 176, 138,
        64,  29, 124, 161,  33,  21,  96,  50,  69, 125, 123,  20, 191,
       147, 200, 183, 127, 160, 158, 153, 199, 194, 169,   3,  54, 127,
        87,  37, 104, 100, 115,  49,  18, 100,  99, 152,  96,  83, 168,
       108, 147,  72,   9,  98, 119,  54, 112,  68, 158,  92,  52, 171,
       204, 187, 137,  13, 104,  47,  21, 162, 182, 142,  11, 139, 101,
        44,  47, 207, 112, 114, 192,  91, 103,  88,  94,  72,  65,  25,
       181,  12,  97,  78,  58, 164, 209, 206,  76, 202, 165,   

In [291]:
df['country']= con
df

Unnamed: 0,country,gender,age,annual Salary,credit card debt,net worth,car purchase amount
0,27,0,41.851720,62812.09301,11609.380910,238961.2505,35321.45877
1,17,0,40.870623,66646.89292,9572.957136,530973.9078,45115.52566
2,1,1,43.152897,53798.55112,11160.355060,638467.1773,42925.70921
3,41,1,58.271369,79370.03798,14426.164850,548599.0524,67422.36313
4,26,1,57.313749,59729.15130,5358.712177,560304.0671,55915.46248
...,...,...,...,...,...,...,...
495,127,0,41.462515,71942.40291,6995.902524,541670.1016,48901.44342
496,207,1,37.642000,56039.49793,12301.456790,360419.0988,31491.41457
497,143,1,53.943497,68888.77805,10611.606860,764531.3203,64147.28888
498,24,1,59.160509,49811.99062,14013.034510,337826.6382,45442.15353
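
The integer codes above follow the alphabetical order of the country names, so a model would read them as an ordered quantity (Algeria = 1, Belize = 17, Bulgaria = 27) even though no such order exists. Below is a minimal sketch of that behaviour on a made-up three-country sample. It uses pandas `Categorical` codes as a stand-in for `LabelEncoder`; both sort the categories before numbering them.

```python
import pandas as pd

# Hypothetical sample; the real column holds roughly 200 distinct countries.
countries = pd.Series(["Bulgaria", "Belize", "Algeria", "Belize"])

# Categories are sorted alphabetically, then numbered from 0.
cat = pd.Categorical(countries)
categories = cat.categories.tolist()
codes = cat.codes.tolist()

print(categories)
print(codes)
```

In this notebook the `country` column is left out of the scaled feature set in the later cells, so these codes never reach the model.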


In [292]:
from sklearn.preprocessing import OneHotEncoder

In [293]:
df['age'] = df['age'].round().astype(int)

In [294]:
df['age']

0      42
1      41
2      43
3      58
4      57
       ..
495    41
496    38
497    54
498    59
499    47
Name: age, Length: 494, dtype: int32

In [295]:
# Write your code ^_^
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler().fit(df.drop(columns=['car purchase amount','country','age']))


In [296]:
scaled_data = scaler.transform(df.drop(columns=['car purchase amount','country','age']))

df1 = pd.DataFrame(scaled_data)
df1.columns = ['gender','annual Salary','credit card debt','net worth']
df1

Unnamed: 0,gender,annual Salary,credit card debt,net worth
0,0.0,0.535151,0.578361,0.223430
1,0.0,0.583086,0.476028,0.521402
2,1.0,0.422482,0.555797,0.631089
3,1.0,0.742125,0.719908,0.539387
4,1.0,0.496614,0.264257,0.551331
...,...,...,...,...
489,0.0,0.649280,0.346528,0.532316
490,1.0,0.450494,0.613139,0.347366
491,1.0,0.611110,0.528221,0.759726
492,1.0,0.372650,0.699147,0.324313
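
For reference, `MinMaxScaler` maps each column to the range [0, 1] with x' = (x - min) / (max - min). The sketch below reproduces that arithmetic in NumPy on made-up salaries (not values from the dataset):

```python
import numpy as np

# Hypothetical salaries, not taken from the dataset.
salary = np.array([20000.0, 50000.0, 80000.0])

# Min-max scaling: subtract the column minimum, divide by the column range.
scaled = (salary - salary.min()) / (salary.max() - salary.min())
print(scaled)
```

A column that is already 0/1, such as `gender`, is left unchanged by this transform, which matches the table above.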


In [297]:
bins = [0, 18, 30, 40, 50, 100]
labels = ['0-18', '19-30', '31-40', '41-50', '51+']
agebins = pd.cut(df["age"], bins=bins, labels=labels, right=False)
df1["age"]=agebins
df1


Unnamed: 0,gender,annual Salary,credit card debt,net worth,age
0,0.0,0.535151,0.578361,0.223430,41-50
1,0.0,0.583086,0.476028,0.521402,41-50
2,1.0,0.422482,0.555797,0.631089,41-50
3,1.0,0.742125,0.719908,0.539387,51+
4,1.0,0.496614,0.264257,0.551331,51+
...,...,...,...,...,...
489,0.0,0.649280,0.346528,0.532316,51+
490,1.0,0.450494,0.613139,0.347366,41-50
491,1.0,0.611110,0.528221,0.759726,51+
492,1.0,0.372650,0.699147,0.324313,51+
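
One thing to watch in the cell above: assigning a Series to a DataFrame column aligns on index labels, not on row position. `df1` was built from a NumPy array and therefore has a fresh 0..n-1 index, whereas `df["age"]` keeps its original labels, with gaps where rows were dropped. The toy example below (hypothetical values) shows the effect and one possible way around it, which is to pass the original index when the scaled frame is created.

```python
import numpy as np
import pandas as pd

# Source frame with a gap in its index, as after dropna().
src = pd.DataFrame({"age": [25, 45, 60]}, index=[0, 2, 3])
scaled = np.array([[0.1], [0.5], [0.9]])

# Fresh RangeIndex 0..2: the labels no longer match src.
misaligned = pd.DataFrame(scaled, columns=["net worth"])
misaligned["age"] = src["age"]

# Reusing src.index keeps every row paired with its own age.
aligned = pd.DataFrame(scaled, columns=["net worth"], index=src.index)
aligned["age"] = src["age"]

print(misaligned)
print(aligned)
```

In the misaligned frame, one row receives NaN and another receives the age of a different customer, so a later `dropna()` removes only part of the problem.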


In [298]:
df1=df1.dropna()

In [299]:
from sklearn.preprocessing import OneHotEncoder

oh_enc = OneHotEncoder(handle_unknown='ignore')

oh_enc.fit(df1['age'].values.reshape(-1, 1))

oh_enc.categories_

[array(['19-30', '31-40', '41-50', '51+'], dtype=object)]

In [300]:
oh_data = oh_enc.transform(df1["age"].values.reshape(-1, 1)).toarray()
oh_df2 = pd.DataFrame(oh_data, columns = oh_enc.categories_)
oh_df2.head()
df1 = df1.drop(["age"], axis=1)

In [301]:
df2=pd.concat([oh_df2, df1],axis=1)
df2

Unnamed: 0,"(19-30,)","(31-40,)","(41-50,)","(51+,)",gender,annual Salary,credit card debt,net worth
0,0.0,0.0,1.0,0.0,0.0,0.535151,0.578361,0.223430
1,0.0,0.0,1.0,0.0,0.0,0.583086,0.476028,0.521402
2,0.0,0.0,1.0,0.0,1.0,0.422482,0.555797,0.631089
3,0.0,0.0,0.0,1.0,1.0,0.742125,0.719908,0.539387
4,0.0,0.0,0.0,1.0,1.0,0.496614,0.264257,0.551331
...,...,...,...,...,...,...,...,...
489,,,,,0.0,0.649280,0.346528,0.532316
490,,,,,1.0,0.450494,0.613139,0.347366
491,,,,,1.0,0.611110,0.528221,0.759726
492,,,,,1.0,0.372650,0.699147,0.324313


In [304]:
df2 = df2.dropna()
df2

Unnamed: 0,"(19-30,)","(31-40,)","(41-50,)","(51+,)",gender,annual Salary,credit card debt,net worth
0,0.0,0.0,1.0,0.0,0.0,0.535151,0.578361,0.223430
1,0.0,0.0,1.0,0.0,0.0,0.583086,0.476028,0.521402
2,0.0,0.0,1.0,0.0,1.0,0.422482,0.555797,0.631089
3,0.0,0.0,0.0,1.0,1.0,0.742125,0.719908,0.539387
4,0.0,0.0,0.0,1.0,1.0,0.496614,0.264257,0.551331
...,...,...,...,...,...,...,...,...
482,0.0,0.0,1.0,0.0,0.0,0.519699,0.024865,0.486936
484,0.0,0.0,1.0,0.0,1.0,0.546525,0.533238,0.302172
485,0.0,0.0,0.0,1.0,1.0,0.731478,0.501130,0.280108
486,0.0,0.0,0.0,1.0,0.0,0.655310,0.489004,0.513960
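
The tuple-like column names such as `(19-30,)` appear to come from passing `oh_enc.categories_`, a list holding one array, as `columns`. The NaN rows in the concatenated frame come from joining two frames whose indexes differ. As an alternative sketch on hypothetical data, `pd.get_dummies` encodes the binned column in place, keeps the original index, and produces flat column names:

```python
import pandas as pd

# Hypothetical frame; the index has a gap to mimic dropped rows.
frame = pd.DataFrame(
    {"age": [25, 45, 60], "net worth": [0.1, 0.5, 0.9]},
    index=[0, 2, 3],
)

# Same bins and labels as the notebook cell.
bins = [0, 18, 30, 40, 50, 100]
labels = ["0-18", "19-30", "31-40", "41-50", "51+"]
frame["age"] = pd.cut(frame["age"], bins=bins, labels=labels, right=False)

# One column per age bin; index and remaining columns are untouched.
encoded = pd.get_dummies(frame, columns=["age"], dtype=float)
print(encoded)
```

Note that with `right=False` the intervals are closed on the left, so the bin labelled `19-30` actually covers ages 18 to 29.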


# Step 4: Train Test Split

In [305]:
X=df2
y=df['car purchase amount']

0      35321.45877
1      45115.52566
2      42925.70921
3      67422.36313
4      55915.46248
          ...     
482    45015.67953
484    56510.13294
485    47443.74443
486    41489.64123
487    32553.53423
Name: car purchase amount, Length: 482, dtype: float64

In [310]:
# Write your code ^_^
X_train, X_test, y_train, y_test = train_test_split(X, y[:482], test_size=0.20, random_state = 23)
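
`y[:482]` takes the first 482 targets by position, which matches `X` only if every row removed during preprocessing sat at the end of the frame. A safer sketch selects the targets by label. It assumes the index labels of `X` are the same labels used in `df`, which holds only if the index was preserved throughout preprocessing. The data here is hypothetical.

```python
import pandas as pd

# Hypothetical data: X lost the row labelled 1 during preprocessing.
y = pd.Series([35000.0, 45000.0, 43000.0, 67000.0], index=[0, 1, 2, 3])
X = pd.DataFrame({"net worth": [0.2, 0.6, 0.5]}, index=[0, 2, 3])

y_positional = y.iloc[:len(X)]  # first three targets, wrongly includes label 1
y_aligned = y.loc[X.index]      # targets for exactly the rows kept in X

print(y_positional.index.tolist())
print(y_aligned.index.tolist())
```

The aligned Series can then be passed to `train_test_split` together with `X`.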

# Step 5: Build the Artificial Neural Network Model

In [312]:
# Write your code ^_^
model = Sequential()
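# One hidden layer of 32 ReLU units, then a single linear output for regression
model.add(Dense(units=32, activation="relu", input_shape=(8,)))
model.add(Dense(units=1, activation="linear"))
# Note: 'accuracy' is a classification metric and stays near 0 for a continuous target
model.compile(
    optimizer='adam', loss='mse',
    metrics=['accuracy', 'mse']
)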


### Explain and Justify Your Artificial Neural Network (ANN) Model, Optimizer, and Loss Function Choices

Write your answer here

# Step 6: Train the Model


In [313]:
# Write your code ^_^
model.fit(X_train, y_train, epochs=10, batch_size=8, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x2001474ef80>
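
The features lie in [0, 1] while the target is in the tens of thousands, so a small network with default Adam settings may need far more than ten epochs to reach that scale. One common remedy is to standardise the target before training and invert the transform on the predictions. This is a sketch with made-up numbers, and `pred_scaled` is a placeholder for real model output.

```python
import numpy as np

# Hypothetical purchase amounts standing in for y_train.
y_train = np.array([35000.0, 45000.0, 43000.0, 67000.0, 56000.0])

# Statistics are computed from the training targets only.
y_mean, y_std = y_train.mean(), y_train.std()
y_train_scaled = (y_train - y_mean) / y_std

# After predicting in scaled units, map back to currency units.
pred_scaled = np.array([0.0, 1.0])  # placeholder predictions
pred = pred_scaled * y_std + y_mean
print(pred)
```

A scaled prediction of 0 maps back to the mean training target, and a prediction of 1 maps to one standard deviation above it.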

# Step 7: Evaluate the Model

In [314]:
model.evaluate(X_test, y_test)



[1983014016.0, 0.0, 1983014016.0]
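
`model.evaluate` returns the loss followed by the metrics in the order given to `compile`, so the list above reads `[loss, accuracy, mse]`. Accuracy counts exact matches between prediction and label, which essentially never happens for a continuous target, hence the 0.0. The MSE is easier to interpret after taking its square root, as in this short sketch:

```python
import math

# Values copied from the evaluate() output above: [loss, accuracy, mse].
loss, accuracy, mse = 1983014016.0, 0.0, 1983014016.0

# RMSE is expressed in the same units as the target (currency).
rmse = math.sqrt(mse)
print(round(rmse))  # 44531
```

An RMSE of about 44,500 is on the same order as the purchase amounts themselves (roughly 31,000 to 67,000 in the rows shown earlier), which suggests the network is still predicting values far below the targets after ten epochs.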


# Step 8: Build the Deep Neural Network Model

In [316]:
# Write your code ^_^
modelD = Sequential()
modelD.add(Dense(units=32, activation="tanh", input_shape=(8,)))
modelD.add(Dense(units=16, activation="tanh"))
modelD.add(Dense(units=8, activation="tanh"))
modelD.add(Dense(units=1, activation="linear"))
modelD.compile(
    optimizer='adam', loss='mse',
    metrics=['accuracy', 'mse']
)
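
To compare the size of the two architectures, recall that a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper below (`dense_params` is a hypothetical name, not a Keras function) applies that formula to the layer widths used in this notebook:

```python
def dense_params(layer_sizes):
    """Total weights and biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

ann = dense_params([8, 32, 1])         # one hidden layer
dnn = dense_params([8, 32, 16, 8, 1])  # three hidden layers
print(ann, dnn)  # 321 961
```

The same totals should appear at the bottom of `model.summary()` and `modelD.summary()`.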


### Explain and Justify Your Deep Neural Network (DNN) Model, Optimizer, and Loss Function Choices

Write your answer here

# Step 9: Train the Model

In [317]:
# Write your code ^_^
modelD.fit(X_train, y_train, epochs=10, batch_size=16, validation_split=0.20)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x20015255900>

# Step 10: Evaluate the Model

In [318]:
# Write your code ^_^
modelD.evaluate(X_test, y_test)



[1984694784.0, 0.0, 1984694784.0]

# Step 11: Evaluate and Compare Scores, Training Time, and Prediction Time of ANN/DNN Models

In [319]:
# Write your code ^_^
print('*'*6,'Score','*'*6)
print('ANN: 0')
print('DNN: 0')
print('*'*6,'Training Time','*'*6)
print('ANN: 1.1s')
print('DNN: 1.4s')
print('*'*6,'Prediction Time','*'*6)
print('ANN: 2ms')
print('DNN: 2ms')

****** Score ******
ANN: 0
DNN: 0
****** Training Time ******
ANN: 1.1s
DNN: 1.4s
****** Prediction Time ******
ANN: 2ms
DNN: 2ms
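
The times above are typed in by hand. A small wrapper around `time.perf_counter` can measure them directly instead. This is a sketch: `timed` is a hypothetical helper, and the workload is a stand-in so the block runs without a trained model.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; in the notebook this could instead be
# timed(model.fit, X_train, y_train, epochs=10, batch_size=8, verbose=0)
result, seconds = timed(sum, range(1_000_000))
print(f"elapsed: {seconds:.4f}s")
```

Wrapping `model.predict` in the same way would give the prediction time for each model under identical conditions.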
