# Introduction to Deep Learning

## Objectives
In this lab, you will build artificial neural network (ANN) and deep neural network (DNN) models that predict how much a potential customer will spend on a car (`car purchase amount`) from their characteristics. Taking the role of a vehicle salesperson, your goal is a model that reliably estimates each customer's overall spending potential.

Your task is to build and train the ANN and DNN models with TensorFlow in a Jupyter notebook.

Feel free to explore the dataset, analyze its contents, and derive meaningful insights. You are also encouraged to create visualizations that deepen the understanding of the data.

# Step 1: Import Libraries

In [119]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Regression metrics: the target (car purchase amount) is continuous
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
import tensorflow as tf

# Step 2: Load and Explore the Data

In [120]:
df = pd.read_csv('car_purchasing.csv', encoding='ISO-8859-1')
df

|     | customer name | customer e-mail | country | gender | age | annual Salary | credit card debt | net worth | car purchase amount |
|-----|---|---|---|---|---|---|---|---|---|
| 0 | Martina Avila | cubilia.Curae.Phasellus@quisaccumsanconvallis.edu | Bulgaria | 0 | 41.851720 | 62812.09301 | 11609.380910 | 238961.2505 | 35321.45877 |
| 1 | Harlan Barnes | eu.dolor@diam.co.uk | Belize | 0 | 40.870623 | 66646.89292 | 9572.957136 | 530973.9078 | 45115.52566 |
| 2 | Naomi Rodriquez | vulputate.mauris.sagittis@ametconsectetueradip... | Algeria | 1 | 43.152897 | 53798.55112 | 11160.355060 | 638467.1773 | 42925.70921 |
| 3 | Jade Cunningham | malesuada@dignissim.com | Cook Islands | 1 | 58.271369 | 79370.03798 | 14426.164850 | 548599.0524 | 67422.36313 |
| 4 | Cedric Leach | felis.ullamcorper.viverra@egetmollislectus.net | Brazil | 1 | 57.313749 | 59729.15130 | 5358.712177 | 560304.0671 | 55915.46248 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | Walter | ligula@Cumsociis.ca | Nepal | 0 | 41.462515 | 71942.40291 | 6995.902524 | 541670.1016 | 48901.44342 |
| 496 | Vanna | Cum.sociis.natoque@Sedmolestie.edu | Zimbabwe | 1 | 37.642000 | 56039.49793 | 12301.456790 | 360419.0988 | 31491.41457 |
| 497 | Pearl | penatibus.et@massanonante.com | Philippines | 1 | 53.943497 | 68888.77805 | 10611.606860 | 764531.3203 | 64147.28888 |
| 498 | Nell | Quisque.varius@arcuVivamussit.net | Botswana | 1 | 59.160509 | 49811.99062 | 14013.034510 | 337826.6382 | 45442.15353 |


In [121]:
df.columns

Index(['customer name', 'customer e-mail', 'country', 'gender', 'age',
       'annual Salary', 'credit card debt', 'net worth',
       'car purchase amount'],
      dtype='object')

In [122]:
df.shape

(500, 9)

In [123]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   customer name        500 non-null    object 
 1   customer e-mail      500 non-null    object 
 2   country              500 non-null    object 
 3   gender               500 non-null    int64  
 4   age                  500 non-null    float64
 5   annual Salary        500 non-null    float64
 6   credit card debt     500 non-null    float64
 7   net worth            500 non-null    float64
 8   car purchase amount  500 non-null    float64
dtypes: float64(5), int64(1), object(3)
memory usage: 35.3+ KB


In [124]:
df.describe()

|       | gender | age | annual Salary | credit card debt | net worth | car purchase amount |
|-------|---|---|---|---|---|---|
| count | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 |
| mean | 0.506 | 46.241674 | 62127.239608 | 9607.645049 | 431475.713625 | 44209.799218 |
| std | 0.500465 | 7.978862 | 11703.378228 | 3489.187973 | 173536.75634 | 10773.178744 |
| min | 0.0 | 20.0 | 20000.0 | 100.0 | 20000.0 | 9000.0 |
| 25% | 0.0 | 40.949969 | 54391.977195 | 7397.515792 | 299824.1959 | 37629.89604 |
| 50% | 1.0 | 46.049901 | 62915.497035 | 9655.035568 | 426750.12065 | 43997.78339 |
| 75% | 1.0 | 51.612263 | 70117.862005 | 11798.867487 | 557324.478725 | 51254.709517 |
| max | 1.0 | 70.0 | 100000.0 | 20000.0 | 1000000.0 | 80000.0 |


In [125]:
df.dtypes

customer name           object
customer e-mail         object
country                 object
gender                   int64
age                    float64
annual Salary          float64
credit card debt       float64
net worth              float64
car purchase amount    float64
dtype: object
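One quick way to derive insights, as the objectives suggest, is to check how strongly each numeric column correlates with the target. The sketch below uses a tiny hypothetical frame that reuses the dataset's column names (the values are made up); on the real data the same call would be `df.corr(numeric_only=True)['car purchase amount']`.

```python
import pandas as pd

# Hypothetical rows reusing the dataset's column names (not real data)
sample = pd.DataFrame({
    'customer name': ['A', 'B', 'C', 'D'],
    'age': [30.0, 40.0, 50.0, 60.0],
    'credit card debt': [5000.0, 4000.0, 6000.0, 3000.0],
    'car purchase amount': [20000.0, 30000.0, 40000.0, 50000.0],
})

# numeric_only=True skips text columns such as 'customer name'
corr = sample.corr(numeric_only=True)['car purchase amount'].sort_values(ascending=False)
print(corr)
```

Here `age` rises in lockstep with the target (correlation 1.0), while `credit card debt` has a weak negative correlation (-0.4). A Pearson correlation only captures linear relationships, so a scatter plot per feature is a useful complement.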

In [126]:
print(df['gender'].value_counts())

# Step 3: Data Cleaning and Preprocessing


*Hint: You could use a `StandardScaler()` or `MinMaxScaler()`*
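`MinMaxScaler` maps each column to [0, 1] with x' = (x - min) / (max - min), while `StandardScaler` computes z = (x - mean) / std. A quick NumPy check of the min-max formula, using the bounds reported by `df.describe()` (age spans 20 to 70, annual Salary spans 20,000 to 100,000) and the first customer's values; `min_max` is an illustrative helper, not a scikit-learn function.

```python
import numpy as np

def min_max(x, lo, hi):
    """Rescale x to [0, 1] given the column minimum lo and maximum hi."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

# Row 0: age 41.85 is truncated to 41 by the int cast applied in Step 3
age_scaled = min_max(41, 20, 70)
salary_scaled = min_max(62812.09301, 20000, 100000)

print(round(float(age_scaled), 2), round(float(salary_scaled), 6))  # 0.42 0.535151
```

Both values match row 0 of the scaled table produced later in this step.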

In [127]:
df.isna().sum()

customer name          0
customer e-mail        0
country                0
gender                 0
age                    0
annual Salary          0
credit card debt       0
net worth              0
car purchase amount    0
dtype: int64

In [128]:
df.duplicated().sum()

0

In [129]:
df['age'] = df['age'].astype(int)
print(df.dtypes)

customer name           object
customer e-mail         object
country                 object
gender                   int64
age                      int64
annual Salary          float64
credit card debt       float64
net worth              float64
car purchase amount    float64
dtype: object


In [130]:
categorical_columns = ['customer name', 'customer e-mail', 'country']
dummy_columns = pd.get_dummies(df[categorical_columns])

df = pd.concat([df, dummy_columns], axis=1)

df = df.drop(categorical_columns, axis=1)
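`customer name` and `customer e-mail` are unique per customer, so one-hot encoding them adds roughly one column per row for each field; they account for most of the 1215 columns listed below and give the model nothing it can generalize to new customers. A minimal alternative, assuming the identifiers are simply dropped and only `country` is encoded (the rows are hypothetical):

```python
import pandas as pd

# Hypothetical rows mirroring the dataset's columns (not real data)
raw = pd.DataFrame({
    'customer name': ['Ann', 'Bo', 'Cy'],
    'customer e-mail': ['a@x.com', 'b@x.com', 'c@x.com'],
    'country': ['Belize', 'Brazil', 'Belize'],
    'gender': [0, 1, 1],
    'age': [41, 40, 43],
})

# Drop per-customer identifiers; they cannot generalize to unseen customers
features = raw.drop(columns=['customer name', 'customer e-mail'])

# One-hot encode only 'country'; dtype=int gives 0/1 columns instead of booleans
features = pd.get_dummies(features, columns=['country'], dtype=int)
print(features.columns.tolist())  # ['gender', 'age', 'country_Belize', 'country_Brazil']
```

With roughly one country per handful of customers, `country` may still be sparse; grouping rare countries into an "other" bucket is a common follow-up.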

In [131]:
print(df.dtypes)

gender                      int64
age                         int64
annual Salary             float64
credit card debt          float64
net worth                 float64
                           ...   
country_Western Sahara       bool
country_Yemen                bool
country_Zimbabwe             bool
country_marlal               bool
country_Åland Islands        bool
Length: 1215, dtype: object


In [132]:
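# Scale the continuous columns (including the target) to [0, 1]; gender is already 0/1
numeric = ['age', 'annual Salary', 'credit card debt', 'net worth', 'car purchase amount']

s = MinMaxScaler()
df[numeric] = s.fit_transform(df[numeric])
df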

|     | gender | age | annual Salary | credit card debt | net worth | car purchase amount | customer name_Abel Stanton | customer name_Abigail X. Lindsey | customer name_Abra D. Golden | customer name_Adria Mathis | ... | country_Venezuela | country_Viet Nam | country_Virgin Islands, British | country_Virgin Islands, United States | country_Wallis and Futuna | country_Western Sahara | country_Yemen | country_Zimbabwe | country_marlal | country_Åland Islands |
|-----|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.42 | 0.535151 | 0.578361 | 0.223430 | 0.370725 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | 0 | 0.40 | 0.583086 | 0.476028 | 0.521402 | 0.508669 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | 1 | 0.46 | 0.422482 | 0.555797 | 0.631089 | 0.477827 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3 | 1 | 0.76 | 0.742125 | 0.719908 | 0.539387 | 0.822850 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | 1 | 0.74 | 0.496614 | 0.264257 | 0.551331 | 0.660781 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 0 | 0.42 | 0.649280 | 0.346528 | 0.532316 | 0.561992 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 496 | 1 | 0.34 | 0.450494 | 0.613139 | 0.347366 | 0.316780 | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| 497 | 1 | 0.66 | 0.611110 | 0.528221 | 0.759726 | 0.776722 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 498 | 1 | 0.78 | 0.372650 | 0.699147 | 0.324313 | 0.513270 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


In [133]:
# Alternative: standardize the same columns to zero mean and unit variance instead
# numeric = ['age', 'annual Salary', 'credit card debt', 'net worth', 'car purchase amount']
#
# s = StandardScaler()
# df[numeric] = s.fit_transform(df[numeric])
# df

# Step 4: Train Test Split

In [134]:
x = df.drop('car purchase amount', axis=1)  
y = df['car purchase amount']  

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
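In Step 3 the scaler is fit on the full dataset before this split, so the test rows' minimum and maximum leak into the training features. A sketch of the usual order, assuming synthetic data: split first, derive the scaling bounds from the training rows only, and apply those same bounds to both splits. It is written in NumPy so the arithmetic is visible; calling `MinMaxScaler().fit(X_train)` and then `transform` on each split is the equivalent scikit-learn pattern.

```python
import numpy as np

rng = np.random.default_rng(42)
feature = rng.uniform(20, 70, size=(100, 1))   # synthetic stand-in for one feature column

# Split first (80/20, mirroring test_size=0.2)
idx = rng.permutation(len(feature))
feat_train, feat_test = feature[idx[:80]], feature[idx[80:]]

# Learn the bounds from the training rows only
lo, hi = feat_train.min(axis=0), feat_train.max(axis=0)

# Apply the same training-set bounds to both splits
train_scaled = (feat_train - lo) / (hi - lo)
test_scaled = (feat_test - lo) / (hi - lo)   # may fall slightly outside [0, 1]; that is expected

print(train_scaled.min(), train_scaled.max())
```

The names `feat_train` and `feat_test` are deliberately different from `X_train` and `X_test` so running the sketch in the notebook does not overwrite the real splits.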

# Step 5: Build the Artificial Neural Network Model

In [142]:
model = Sequential()

# One hidden layer of 32 ReLU units; the input width equals the number of feature columns
model.add(Dense(32, activation='relu', input_shape=(X_train.shape[1],)))
# A single linear output unit, since the target is continuous (regression, not classification)
model.add(Dense(1, activation='linear'))

# Mean squared error as the loss; mean absolute error as an easy-to-read metric
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
model.fit(X_train.astype(np.float32), y_train.astype(np.float32), epochs=100, batch_size=32, validation_split=0.2)


# Another way (not executed):
# model = Sequential([
#     Dense(16, activation='relu', input_shape=(X_train.shape[1],)),
#     Dense(8, activation='relu'),
#     Dense(1)
# ])
#
# model.compile(optimizer='adam', loss='mean_squared_error')
# model.fit(X_train.astype(np.float32), y_train.astype(np.float32), epochs=100, batch_size=32)
#
# loss = model.evaluate(X_test.astype(np.float32), y_test.astype(np.float32))
#
# predictions = model.predict(X_test.astype(np.float32))

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78


### Describe and Justify Your Artificial Neural Network (ANN) Model, Optimizer, and Loss Function Choices

Write your answer here
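When justifying the loss, it may help to compare what each option measures. For a continuous target such as `car purchase amount`, mean squared error is the usual training loss and mean absolute error an easy-to-read metric; accuracy-style metrics count matches between labels and (thresholded) predictions, which only makes sense for discrete classes. A small NumPy illustration with made-up values:

```python
import numpy as np

# Made-up scaled targets and predictions (not model output)
y_true = np.array([0.37, 0.51, 0.48, 0.82])
y_pred = np.array([0.40, 0.50, 0.45, 0.80])

mse = np.mean((y_true - y_pred) ** 2)    # what loss='mean_squared_error' minimizes
mae = np.mean(np.abs(y_true - y_pred))   # what metrics=['mae'] reports
exact = np.mean(y_true == y_pred)        # accuracy-style score: share of exact matches

print(f"MSE={mse:.6f}  MAE={mae:.4f}  exact-match rate={exact}")
```

Every prediction is within 0.03 of its target, yet the exact-match rate is 0, which is why MSE and MAE are the more informative choices for this task.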

# Step 6: Train the Model


In [144]:
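X_train = X_train.astype(np.float32)
# Keep the scaled target as float; casting the [0, 1] values to int64 would truncate every value below 1.0 to 0
y_train = y_train.astype(np.float32)

model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)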

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.src.callbacks.History at 0x137a446d0>

# Step 7: Evaluate the Model

In [146]:
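X_test = X_test.astype(np.float32)
# Keep the scaled target continuous; casting it to int64 would truncate every value below 1.0 to 0
y_test = y_test.astype(np.float32)

# return_dict=True pairs each value with its metric name, whatever metrics the model was compiled with
results = model.evaluate(X_test, y_test, return_dict=True)

for name, value in results.items():
    print(f'Test {name}:', value)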
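Why the target must stay floating point: it was MinMax-scaled to [0, 1] in Step 3, and casting such values to an integer type truncates everything below 1.0 to 0. A model that always predicts 0 would then score perfect accuracy without learning anything. A NumPy check using the scaled targets of rows 0 to 4 from the Step 3 table:

```python
import numpy as np

# Scaled 'car purchase amount' values from rows 0-4 of the table in Step 3
y_scaled = np.array([0.370725, 0.508669, 0.477827, 0.822850, 0.660781])

y_as_int = y_scaled.astype(np.int64)      # truncation toward zero
always_zero = np.zeros_like(y_as_int)     # a "model" that ignores its inputs

print(y_as_int)                           # [0 0 0 0 0]
print(np.mean(always_zero == y_as_int))   # 1.0
```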


# Step 8: Build the Deep Neural Network Model

In [151]:
# Dense and Dropout are both needed in this cell
from tensorflow.keras.layers import Dense, Dropout

# Regression on one continuous target, so the output is a single linear unit;
# the input width comes from the data (the number of feature columns)
input_shape = (X_train.shape[1],)

model = Sequential()

model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dropout(0.2))

model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))

model.add(Dense(1, activation='linear'))

model.summary()


### Describe and Justify Your Deep Neural Network (DNN) Model, Optimizer, and Loss Function Choices

Write your answer here

# Step 9: Train the Model

In [152]:
# Mean squared error loss and mean absolute error metric for a continuous target
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
model.fit(X_train.astype(np.float32), y_train.astype(np.float32), epochs=100, batch_size=32, validation_split=0.2)

# Step 10: Evaluate the Model

In [154]:
# Evaluate the model on the held-out test split
results = model.evaluate(X_test.astype(np.float32), y_test.astype(np.float32), return_dict=True)

# Print the evaluation results
for name, value in results.items():
    print(f"{name}: {value:.4f}")
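The `Dropout(0.2)` and `Dropout(0.3)` layers in the Step 8 DNN randomly zero that fraction of activations during training only. Keras uses inverted dropout: surviving activations are scaled by 1 / (1 - rate) so their expected value is unchanged, and at inference the layer passes its input through untouched. A NumPy sketch of that behavior; `inverted_dropout` is an illustrative function, not a Keras API.

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    """Illustrative re-implementation of dropout; not the Keras source."""
    if not training or rate == 0.0:
        return x                                # inference: identity
    keep = rng.random(x.shape) >= rate          # keep each unit with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))               # stand-in for a batch of 64-unit activations

train_out = inverted_dropout(activations, 0.2, training=True, rng=rng)
infer_out = inverted_dropout(activations, 0.2, training=False, rng=rng)

print(round(float((train_out == 0).mean()), 2))   # fraction dropped, close to 0.2
print(round(float(train_out.mean()), 2))          # mean stays close to 1.0
print(np.array_equal(infer_out, activations))     # True
```

Because dropout only acts during `fit`, the `evaluate` and `predict` calls see the full network.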


# Step 11: Evaluate and Compare Scores, Training Time, and Prediction Time of ANN/DNN Models

In [None]:
# Write your code ^_^
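A possible starting point for this step, assuming both models are compiled as regressors and the splits are float arrays: time `fit` and `predict` with `time.perf_counter`, then score the predictions with the same regression metrics. The helper names `timed` and `regression_scores` are hypothetical, and so are `ann_model` and `dnn_model`, since the notebook reuses `model` for both networks. The demo at the bottom uses a stand-in function and made-up numbers so the block runs without TensorFlow.

```python
import time
import numpy as np

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def regression_scores(y_true, y_pred):
    """MAE, RMSE and R^2 for targets and predictions (flattened to 1-D)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        'mae': float(np.mean(np.abs(err))),
        'rmse': float(np.sqrt(np.mean(err ** 2))),
        'r2': float(1.0 - ss_res / ss_tot),
    }

# With the Keras models this might look like (not executed here):
# _, ann_train_s = timed(ann_model.fit, X_train, y_train, epochs=100, batch_size=32, verbose=0)
# ann_pred, ann_pred_s = timed(ann_model.predict, X_test, verbose=0)
# print(regression_scores(y_test, ann_pred), ann_train_s, ann_pred_s)
# ...and the same three lines for dnn_model.

# Self-contained demo with a stand-in "predict" and made-up targets
y_true = np.array([0.2, 0.4, 0.6, 0.8])
preds, seconds = timed(lambda x: x + 0.1, y_true)
scores = regression_scores(y_true, preds)
print(scores, f"{seconds:.6f}s")
```

`ravel()` lets the helper accept Keras `predict` output, which has shape `(n, 1)`. Wall-clock timings vary between runs, so repeating each measurement a few times gives a fairer comparison.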