# Keras - Deep Learning Neural Networks
## **Activation, Loss, Optimizer**

### Activation Functions

Activation functions are core building blocks of deep neural networks. An activation function determines a neuron's output from its weighted input, and its nonlinearity is what allows a neural network to learn complex relationships in data.

There are several types of activation functions, each with its own characteristics:

* Sigmoid
* Tanh (Hyperbolic Tangent)
* ReLU (Rectified Linear Unit)
* Leaky ReLU
* Parametric ReLU (PReLU)
* ELU (Exponential Linear Unit)
* Swish
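To make the list above concrete, here is a minimal NumPy sketch of each activation function applied element-wise to a few sample inputs. The slope `alpha=0.2` for Leaky ReLU, `alpha=0.25` for PReLU and `beta=1.0` for Swish are illustrative assumptions, not necessarily the defaults of any particular Keras version. In a real PReLU layer, `alpha` is learned during training rather than fixed.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes input into (-1, 1); zero-centred, unlike sigmoid
    return np.tanh(x)

def relu(x):
    # Passes positive values through, sets negative values to 0
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.2):
    # Like ReLU, but negative inputs keep a small slope alpha
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # Same formula as Leaky ReLU; in a PReLU layer alpha is a trainable weight
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth exponential curve for negative inputs, saturating at -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x, beta=1.0):
    # x * sigmoid(beta * x); with beta = 1 this is also known as SiLU
    return x * sigmoid(beta * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                 ("leaky_relu", leaky_relu), ("elu", elu), ("swish", swish)]:
    print(f"{name:>10}: {np.round(fn(x), 3)}")
print(f"{'prelu':>10}: {np.round(prelu(x, alpha=0.25), 3)}")
```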

### Loss Functions

Loss functions measure the difference between the predicted outputs of a model and the actual target values. The goal during the training of a deep neural network is to minimize this loss, as it represents the error or discrepancy between the predicted and true values. The choice of a loss function depends on the type of task a neural network is designed for:

* Mean Squared Error (MSE) / L2 Loss
* Mean Absolute Error (MAE) / L1 Loss
* Binary Cross-Entropy Loss / Log Loss
* Categorical Cross-Entropy Loss
* Hinge Loss
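The sketch below writes each listed loss as a small NumPy function, so the formulas are visible. The sample labels, probabilities and scores are made up for illustration. The hinge helper converts 0/1 labels to -1/+1 before computing the loss; Keras' `hinge` loss documents the same conversion, which matters for Neural Network # 2 later in this notebook.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error (L2): squares each error, so large errors dominate
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean Absolute Error (L1): less sensitive to outliers than MSE
    return np.mean(np.abs(y_true - y_pred))

def binary_crossentropy(y_true, p, eps=1e-7):
    # y_true in {0, 1}; p is the predicted probability of class 1
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    # One-hot targets vs. per-class probabilities (each row sums to 1)
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

def hinge(y_true, scores):
    # Hinge loss expects -1/+1 targets; 0/1 labels are mapped to -1/+1 first
    y = np.asarray(y_true, dtype=float)
    if np.all((y == 0) | (y == 1)):
        y = 2.0 * y - 1.0
    return np.mean(np.maximum(0.0, 1.0 - y * scores))

y_true = np.array([0.0, 1.0, 1.0, 0.0])
p = np.array([0.1, 0.8, 0.6, 0.3])        # e.g. sigmoid outputs
scores = np.array([-0.8, 0.9, 0.2, 0.4])  # e.g. tanh outputs

print("MSE  :", mse(y_true, p))
print("MAE  :", mae(y_true, p))
print("BCE  :", binary_crossentropy(y_true, p))
print("Hinge:", hinge(y_true, scores))

y_onehot = np.array([[1, 0, 0], [0, 1, 0]])
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print("CCE  :", categorical_crossentropy(y_onehot, probs))
```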

### Optimizers

Optimizers in deep neural networks are algorithms that adjust the model's parameters during the training process to minimize the loss function. Here are some commonly used optimizers:

* SGD (Stochastic Gradient Descent)
* Adam (Adaptive Moment Estimation)
* RMSProp (Root Mean Square Propagation)
* AdaGrad (Adaptive Gradient Algorithm)
* Adadelta
* Nadam (Nesterov-accelerated Adaptive Moment Estimation)
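To show what "adjusting the parameters" means, here is a sketch of two of the listed update rules, plain SGD and Adam, minimizing a toy one-parameter loss f(w) = (w - 3)^2. The learning rates and step counts are illustrative choices for this toy problem, not Keras defaults, and the Keras optimizers offer extra options (for example momentum for SGD) that are not shown here.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w = w - lr * grad(w)  # step directly against the gradient
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=500):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g ** 2  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w

print("SGD :", sgd(0.0))
print("Adam:", adam(0.0))
```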

## Import libraries

In [2]:
# import the libraries
import numpy as np
from numpy import genfromtxt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix, classification_report

import warnings
warnings.filterwarnings('ignore')

## Dataset

We will use the Banknote Authentication Data Set. This notebook focuses on the basics of building a neural network with Keras and on comparing different activation functions, loss functions, and optimizers to improve the network's performance.

The data were extracted from images of genuine and forged banknote-like specimens. The specimens were digitized with an industrial camera of the kind usually used for print inspection. The final images are 400 x 400 pixels; because of the objective lens and the distance to the object, the gray-scale pictures have a resolution of about 660 dpi. A Wavelet Transform tool was then used to extract features from the images.

**Attribute Information**:

1. variance of Wavelet Transformed image (continuous)
2. skewness of Wavelet Transformed image (continuous)
3. kurtosis of Wavelet Transformed image (continuous)
4. entropy of image (continuous)
5. class (integer)

### Reading in the dataset

In [3]:
data = genfromtxt('../bank_note_data.txt', delimiter=',')
data

array([[  3.6216 ,   8.6661 ,  -2.8073 ,  -0.44699,   0.     ],
       [  4.5459 ,   8.1674 ,  -2.4586 ,  -1.4621 ,   0.     ],
       [  3.866  ,  -2.6383 ,   1.9242 ,   0.10645,   0.     ],
       ...,
       [ -3.7503 , -13.4586 ,  17.5932 ,  -2.7771 ,   1.     ],
       [ -3.5637 ,  -8.3827 ,  12.393  ,  -1.2823 ,   1.     ],
       [ -2.5419 ,  -0.65804,   2.6842 ,   1.1952 ,   1.     ]])

The class label marks each note as real (0) or fake (1):

**0** -> Real Note

**1** -> Fake Note

### Labels

In [4]:
labels = data[:,4]
labels

array([0., 0., 0., ..., 1., 1., 1.])

### Features

In [5]:
features = data[:,0:4]
features

array([[  3.6216 ,   8.6661 ,  -2.8073 ,  -0.44699],
       [  4.5459 ,   8.1674 ,  -2.4586 ,  -1.4621 ],
       [  3.866  ,  -2.6383 ,   1.9242 ,   0.10645],
       ...,
       [ -3.7503 , -13.4586 ,  17.5932 ,  -2.7771 ],
       [ -3.5637 ,  -8.3827 ,  12.393  ,  -1.2823 ],
       [ -2.5419 ,  -0.65804,   2.6842 ,   1.1952 ]])

### Assign the data into X (features) and y (labels)

In [6]:
X = features
y = labels
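The column slicing above can be checked on a handful of rows. This sketch copies the six rows printed earlier, splits them the same way (`[:, 0:4]` for features, `[:, 4]` for the label) and counts the classes with `np.unique(..., return_counts=True)`. Running the same call on the full `labels` array is a quick way to check class balance; the test-set report further down shows 257 real and 196 fake notes.

```python
import numpy as np

# Six rows copied from the printed `data` array above (first three and last three)
sample = np.array([
    [ 3.6216,   8.6661,  -2.8073,  -0.44699, 0.0],
    [ 4.5459,   8.1674,  -2.4586,  -1.4621,  0.0],
    [ 3.866,   -2.6383,   1.9242,   0.10645, 0.0],
    [-3.7503, -13.4586,  17.5932,  -2.7771,  1.0],
    [-3.5637,  -8.3827,  12.393,   -1.2823,  1.0],
    [-2.5419,  -0.65804,  2.6842,   1.1952,  1.0],
])

X_sample = sample[:, 0:4]  # columns 0-3: variance, skewness, kurtosis, entropy
y_sample = sample[:, 4]    # column 4: class label (0 = real, 1 = fake)

print(X_sample.shape)
classes, counts = np.unique(y_sample, return_counts=True)
print(classes, counts)
```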

## Split the Data into Training and Test

Let's split the data into training and test sets. In practice you would often also hold out a validation set for tuning choices such as the number of epochs, but we'll keep things simple for now.

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
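If you do want a validation set, one common approach is to call `train_test_split` twice: first carve off the test set, then split the remainder into training and validation parts. Passing `stratify=` keeps the class proportions the same in every part. This sketch uses synthetic data and 60/20/20 proportions as illustrative assumptions, not the banknote data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))     # synthetic stand-in for the 4 features
y_demo = rng.integers(0, 2, size=100)  # synthetic 0/1 labels

# Step 1: hold out 20% of all rows as the test set
X_rest, X_te, y_rest, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

# Step 2: split the remaining 80 rows into training (60) and validation (20)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest)

print(len(X_tr), len(X_val), len(X_te))  # 60 20 20
```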

In [8]:
# Display the training features
X_train

array([[-0.8734  , -0.033118, -0.20165 ,  0.55774 ],
       [ 2.0177  ,  1.7982  , -2.9581  ,  0.2099  ],
       [-0.36038 ,  4.1158  ,  3.1143  , -0.37199 ],
       ...,
       [-7.0364  ,  9.2931  ,  0.16594 , -4.5396  ],
       [-3.4605  ,  2.6901  ,  0.16165 , -1.0224  ],
       [-3.3582  , -7.2404  , 11.4419  , -0.57113 ]])

In [9]:
X_test

array([[ 1.5691  ,  6.3465  , -0.1828  , -2.4099  ],
       [-0.27802 ,  8.1881  , -3.1338  , -2.5276  ],
       [ 0.051979,  7.0521  , -2.0541  , -3.1508  ],
       ...,
       [ 3.5127  ,  2.9073  ,  1.0579  ,  0.40774 ],
       [ 5.504   , 10.3671  , -4.413   , -4.0211  ],
       [-0.2062  ,  9.2207  , -3.7044  , -6.8103  ]])

In [10]:
y_train

array([1., 1., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0.,
       1., 1., 1., 0., 1., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0.,
       1., 0., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1., 0., 0.,
       0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0., 1.,
       0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0.,
       0., 1., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1.,
       1., 0., 1., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 1., 0.,
       0., 0., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0., 1., 1.,
       0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0.,
       0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0.,
       1., 0., 0., 1., 0.

In [11]:
y_test

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1.,
       1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.,
       1., 0., 1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 1., 0.,
       0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0.,
       0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 0., 0.,
       0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 0., 1., 0.,
       0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.,
       0., 1., 0., 1., 0., 1., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 0.,
       1., 0., 0., 0., 0., 0., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1.,
       0., 1., 0., 1., 0., 0., 0., 1., 1., 1., 1., 1., 0., 1., 0., 0., 0.,
       0., 0., 0., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1.,
       1., 0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 0.,
       1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.,
       1., 1., 0., 1., 0.

## Standardize the Data

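Neural networks usually train better when the input features are on a similar scale. Here we use min-max scaling (`MinMaxScaler`), which rescales each feature to the 0-1 range. Strictly speaking, this is normalization; *standardization* usually means rescaling each feature to zero mean and unit variance, as scikit-learn's `StandardScaler` does.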

In [12]:
scaler_object = MinMaxScaler()
scaler_object.fit(X_train) # learn each feature's min and max from the training data only

In [13]:
scaled_X_train = scaler_object.transform(X_train) # rescale the training data to the 0-1 range

In [14]:
scaled_X_test = scaler_object.transform(X_test) # rescale the test data with the training min/max

Ok, now we have the data scaled!

In [15]:
X_train.max()

17.9274

In [16]:
scaled_X_train.max()

1.0000000000000002

In [17]:
X_train

array([[-0.8734  , -0.033118, -0.20165 ,  0.55774 ],
       [ 2.0177  ,  1.7982  , -2.9581  ,  0.2099  ],
       [-0.36038 ,  4.1158  ,  3.1143  , -0.37199 ],
       ...,
       [-7.0364  ,  9.2931  ,  0.16594 , -4.5396  ],
       [-3.4605  ,  2.6901  ,  0.16165 , -1.0224  ],
       [-3.3582  , -7.2404  , 11.4419  , -0.57113 ]])

In [18]:
scaled_X_train

array([[4.44850688e-01, 5.14130449e-01, 2.18194638e-01, 8.50172258e-01],
       [6.53339968e-01, 5.82655745e-01, 9.93242398e-02, 8.17696322e-01],
       [4.81846700e-01, 6.69377018e-01, 3.61193167e-01, 7.63368407e-01],
       ...,
       [4.11050776e-04, 8.63104170e-01, 2.34046756e-01, 3.74261253e-01],
       [2.58284115e-01, 6.16029366e-01, 2.33861752e-01, 7.02643151e-01],
       [2.65661395e-01, 2.44444278e-01, 7.20316361e-01, 7.44775785e-01]])

In [19]:
scaled_X_test

array([[0.62098955, 0.75284662, 0.21900753, 0.5730998 ],
       [0.48778602, 0.82175665, 0.09174727, 0.56211079],
       [0.51158363, 0.77924916, 0.13830875, 0.50392598],
       ...,
       [0.76115065, 0.62415668, 0.27251204, 0.83616757],
       [0.9047516 , 0.90329171, 0.03658247, 0.42267079],
       [0.49296526, 0.86039507, 0.06714046, 0.1622583 ]])
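The scaling above can be reproduced by hand with x' = (x - min) / (max - min). This also shows two details: the min and max come from the training data only, so nothing leaks from the test set, and test values outside the training range map outside [0, 1]. The `1.0000000000000002` maximum shown earlier is ordinary floating-point round-off. The toy arrays below are illustrative, not the banknote data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_fit = np.array([[1.0, -5.0],
                  [3.0,  0.0],
                  [5.0,  5.0]])   # toy "training" data
X_new = np.array([[6.0, -10.0]])  # toy "test" row outside the training range

# Min-max scaling by hand, using the training min and max only
col_min = X_fit.min(axis=0)
col_max = X_fit.max(axis=0)
manual_fit = (X_fit - col_min) / (col_max - col_min)
manual_new = (X_new - col_min) / (col_max - col_min)

scaler = MinMaxScaler().fit(X_fit)  # fit on the training data only
print(np.allclose(manual_fit, scaler.transform(X_fit)))  # True
print(np.allclose(manual_new, scaler.transform(X_new)))  # True
print(manual_new)  # values outside [0, 1]: 1.25 and -0.5
```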

## **Building and Comparing Neural Networks with Keras**

We'll use different combinations of activation functions, loss functions, and optimizers to build neural networks and compare their performances.

### **Neural Network # 1**:
* **Activation Functions**: ReLU, sigmoid
* **Loss Function**: binary_crossentropy
* **Optimizer**: adam
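Both networks in this notebook use the same layer sizes: 4 input features, then Dense layers of 4, 8 and 1 units. Each Dense layer has an `n_in x n_out` weight matrix plus `n_out` biases, so the number of trainable parameters can be worked out by hand. The total should match what `model_01.summary()` reports.

```python
# Layer sizes used by the models in this notebook: 4 inputs -> 4 -> 8 -> 1
layer_sizes = [4, 4, 8, 1]

def dense_param_count(n_in, n_out):
    # Weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [20, 40, 9] 69
```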

### Create, Compile and Train the Model

In [21]:
# Create the model and add layers
# -----------------------------------------------------------------------------------------------
# Create the model
model_01 = Sequential()

# Add layers
# First hidden layer - receives the 4 input features
model_01.add(Dense(4, input_dim=4, activation='relu')) # first hidden layer

# Add another Densely Connected layer (each neuron receives input from every neuron in the previous layer)
model_01.add(Dense(8, activation='relu')) # hidden layer

# Last layer - sigmoid outputs a probability between 0 and 1 that the note is fake (label 1)
model_01.add(Dense(1, activation='sigmoid')) # output layer

# Compile the Model
# -------------------------------------------------------------------------------------------------
model_01.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit (Train) the Model
# -------------------------------------------------------------------------------------------------
# Play around with number of epochs as well!
model_01.fit(scaled_X_train,y_train, epochs=50, verbose=2)

Epoch 1/50
29/29 - 1s - loss: 0.7001 - accuracy: 0.4037 - 1s/epoch - 41ms/step
Epoch 2/50
29/29 - 0s - loss: 0.6863 - accuracy: 0.5354 - 68ms/epoch - 2ms/step
Epoch 3/50
29/29 - 0s - loss: 0.6771 - accuracy: 0.5495 - 75ms/epoch - 3ms/step
Epoch 4/50
29/29 - 0s - loss: 0.6693 - accuracy: 0.5495 - 84ms/epoch - 3ms/step
Epoch 5/50
29/29 - 0s - loss: 0.6624 - accuracy: 0.5495 - 78ms/epoch - 3ms/step
Epoch 6/50
29/29 - 0s - loss: 0.6553 - accuracy: 0.5495 - 76ms/epoch - 3ms/step
Epoch 7/50
29/29 - 0s - loss: 0.6479 - accuracy: 0.5495 - 106ms/epoch - 4ms/step
Epoch 8/50
29/29 - 0s - loss: 0.6396 - accuracy: 0.5539 - 73ms/epoch - 3ms/step
Epoch 9/50
29/29 - 0s - loss: 0.6313 - accuracy: 0.5724 - 90ms/epoch - 3ms/step
Epoch 10/50
29/29 - 0s - loss: 0.6224 - accuracy: 0.6246 - 90ms/epoch - 3ms/step
Epoch 11/50
29/29 - 0s - loss: 0.6135 - accuracy: 0.6464 - 69ms/epoch - 2ms/step
Epoch 12/50
29/29 - 0s - loss: 0.6041 - accuracy: 0.6823 - 66ms/epoch - 2ms/step
Epoch 13/50
29/29 - 0s - loss: 0.5946

<keras.src.callbacks.History at 0x1e7414cdc10>
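The `29/29` at the start of each log line is the number of mini-batches per epoch. `fit` was called without `batch_size`, so Keras uses its default of 32. Assuming the data set's documented size of 1,372 rows (consistent with the 453 test samples in the classification report below), the arithmetic works out as follows.

```python
import math

n_samples = 1372  # rows in the UCI banknote authentication data set
test_size = 0.33
batch_size = 32   # Keras default when batch_size is not passed to fit()

n_test = math.ceil(test_size * n_samples)  # train_test_split rounds the test size up
n_train = n_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be partial
print(n_test, n_train, steps_per_epoch)  # 453 919 29
```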

### Predict New Unseen Data using the Model

Let's see how we did by predicting on **new data**. Remember, our model has **never** seen the test data that we scaled previously! This is exactly the process you would use on brand-new data, for example a banknote that you have just analyzed.
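`model.predict` returns the sigmoid outputs (one probability per note, shape `(n_samples, 1)`) rather than class labels. Comparing them with 0.5 and casting to integers turns them into 0/1 labels. A tiny illustration with made-up probabilities:

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped like model.predict() output: (n_samples, 1)
probs = np.array([[0.02], [0.49], [0.51], [0.97]])

labels = (probs > 0.5).astype("int32")  # 1 = fake note, 0 = real note
print(labels.ravel())  # [0 0 1 1]
```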

In [22]:
predictions_01 = (model_01.predict(scaled_X_test) > 0.5).astype('int32')
predictions_01



array([[0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
    

### Evaluate Model Performance

In [23]:
print("Metric Names:")
print(model_01.metrics_names)
print()
results_01 = model_01.evaluate(x=scaled_X_test,y=y_test)
results_01

Metric Names:
['loss', 'accuracy']



[0.1522723287343979, 0.9624723792076111]

In [24]:
print("Confusion Matrix:")
print(confusion_matrix(y_test,predictions_01))
print()
print("Classification Report:")
print(classification_report(y_test, predictions_01))

Confusion Matrix:
[[252   5]
 [ 12 184]]

Classification Report:
              precision    recall  f1-score   support

         0.0       0.95      0.98      0.97       257
         1.0       0.97      0.94      0.96       196

    accuracy                           0.96       453
   macro avg       0.96      0.96      0.96       453
weighted avg       0.96      0.96      0.96       453
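The numbers in the classification report follow directly from the confusion matrix. This sketch recomputes them from the Neural Network # 1 matrix printed above; the rounded values should match the report.

```python
import numpy as np

# Confusion matrix for Neural Network # 1, copied from the output above
# rows = true class (0, 1), columns = predicted class (0, 1)
cm = np.array([[252, 5],
               [12, 184]])

accuracy = np.trace(cm) / cm.sum()         # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)   # correct / everything predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)      # correct / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)

print("accuracy :", round(accuracy, 4))
print("precision:", np.round(precision, 2))
print("recall   :", np.round(recall, 2))
print("f1       :", np.round(f1, 2))
```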



### **Neural Network # 2**:
* **Activation Functions**: leaky_relu, tanh
* **Loss Function**: hinge
* **Optimizer**: nadam

### Create, Compile and Train the Model

In [25]:
# Create the model and add layers
# -----------------------------------------------------------------------------------------------
# Create the model
model_02 = Sequential()

# Add layers
# First hidden layer - receives the 4 input features
model_02.add(Dense(4, input_dim=4, activation='leaky_relu')) # first hidden layer

# Add another Densely Connected layer (each neuron receives input from every neuron in the previous layer)
model_02.add(Dense(8, activation='leaky_relu')) # hidden layer

# Last layer - tanh outputs a score between -1 and 1, matching the -1/+1 targets of hinge loss
model_02.add(Dense(1, activation='tanh')) # output layer

# Compile the Model
# -------------------------------------------------------------------------------------------------
model_02.compile(loss='hinge', optimizer='nadam', metrics=['accuracy'])

# Fit (Train) the Model
# -------------------------------------------------------------------------------------------------
# Play around with number of epochs as well!
model_02.fit(scaled_X_train,y_train, epochs=50, verbose=2)

Epoch 1/50
29/29 - 2s - loss: 1.1139 - accuracy: 0.4396 - 2s/epoch - 69ms/step
Epoch 2/50
29/29 - 0s - loss: 1.0695 - accuracy: 0.5495 - 78ms/epoch - 3ms/step
Epoch 3/50
29/29 - 0s - loss: 1.0173 - accuracy: 0.5495 - 81ms/epoch - 3ms/step
Epoch 4/50
29/29 - 0s - loss: 0.9631 - accuracy: 0.5495 - 83ms/epoch - 3ms/step
Epoch 5/50
29/29 - 0s - loss: 0.9232 - accuracy: 0.5495 - 73ms/epoch - 3ms/step
Epoch 6/50
29/29 - 0s - loss: 0.8985 - accuracy: 0.5495 - 72ms/epoch - 2ms/step
Epoch 7/50
29/29 - 0s - loss: 0.8863 - accuracy: 0.5495 - 83ms/epoch - 3ms/step
Epoch 8/50
29/29 - 0s - loss: 0.8767 - accuracy: 0.5495 - 79ms/epoch - 3ms/step
Epoch 9/50
29/29 - 0s - loss: 0.8692 - accuracy: 0.5495 - 66ms/epoch - 2ms/step
Epoch 10/50
29/29 - 0s - loss: 0.8614 - accuracy: 0.5495 - 65ms/epoch - 2ms/step
Epoch 11/50
29/29 - 0s - loss: 0.8522 - accuracy: 0.5495 - 83ms/epoch - 3ms/step
Epoch 12/50
29/29 - 0s - loss: 0.8399 - accuracy: 0.5495 - 93ms/epoch - 3ms/step
Epoch 13/50
29/29 - 0s - loss: 0.8243 

<keras.src.callbacks.History at 0x1e743917150>

### Predict New Unseen Data using the Model

As with the first model, we predict on the scaled test data, which this model has **never** seen during training.

In [26]:
predictions_02 = (model_02.predict(scaled_X_test) > 0.5).astype('int32')
predictions_02



array([[0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
    

### Evaluate Model Performance

In [27]:
print("Metric Names:")
print(model_02.metrics_names)
print()
results_02 = model_02.evaluate(x=scaled_X_test,y=y_test)
results_02

Metric Names:
['loss', 'accuracy']



[0.1930973380804062, 0.8984547257423401]

In [28]:
print("Confusion Matrix:")
print(confusion_matrix(y_test,predictions_02))
print()
print("Classification Report:")
print(classification_report(y_test, predictions_02))

Confusion Matrix:
[[253   4]
 [ 42 154]]

Classification Report:
              precision    recall  f1-score   support

         0.0       0.86      0.98      0.92       257
         1.0       0.97      0.79      0.87       196

    accuracy                           0.90       453
   macro avg       0.92      0.89      0.89       453
weighted avg       0.91      0.90      0.90       453
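One caveat about Neural Network # 2: its prediction cell uses the same 0.5 threshold as the sigmoid model. The accuracy reported by `evaluate` (0.898) agrees with that threshold, since (253 + 154) / 453 is about 0.898. However, Keras' hinge loss converts 0/1 labels to -1/+1, so this tanh-output model is trained to push real notes toward -1 and fake notes toward +1, which puts its natural decision boundary at 0. Notes scored between 0 and 0.5 are therefore counted as real. Re-thresholding at 0 could change model 2's numbers; that has not been rerun here. A sketch with made-up scores:

```python
import numpy as np

# Hypothetical tanh outputs from a hinge-trained model (range -1 to 1)
scores = np.array([[-0.9], [-0.1], [0.2], [0.4], [0.8]])

pred_at_half = (scores > 0.5).astype("int32").ravel()  # threshold used in this notebook
pred_at_zero = (scores > 0.0).astype("int32").ravel()  # sign of the score, matching -1/+1 targets

print(pred_at_half)  # [0 0 0 0 1]
print(pred_at_zero)  # [0 0 1 1 1]
```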



## Comparing the two models

In [29]:
print("Model", "\t", "Loss", "\t","Accuracy")
print("01", "\t", round(results_01[0], 3), "\t", round(results_01[1], 3))
print("02", "\t", round(results_02[0], 3), "\t", round(results_02[1], 3))

Model 	 Loss 	 Accuracy
01 	 0.152 	 0.962
02 	 0.193 	 0.898


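Neural Network # 1 achieves higher test accuracy (0.962 vs. 0.898), and its classification report shows much better recall on fake notes (0.94 vs. 0.79), so it performs better than Neural Network # 2. The two loss values should not be compared directly, because model 1 reports binary cross-entropy while model 2 reports hinge loss.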

## Saving and Loading the Best Model

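Model # 1 is our best model, so we'll save it to disk and load it back. Recent Keras versions treat the HDF5 (`.h5`) format used here as legacy and recommend the native `.keras` format instead (for example `model_01.save('bestmodel.keras')`).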

In [30]:
model_01.save('bestmodel.h5')

In [31]:
from keras.models import load_model
best_model = load_model('bestmodel.h5')

In [32]:
# Predict unseen data using the best model
best_predictions = (best_model.predict(scaled_X_test) > 0.5).astype('int32')
print(best_predictions)
print()

[[0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [1]
 [1]
 [0]
 [1]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [0]
 [0]
 [1]
 [0]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [0]
 [0]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [1]
 [0]
 [0]
 [0]
 [0]
 [1]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [1]
 [0]
 [1]
 [0]
 [1]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [1]
 [0]
 [0]
 [0]
 [0]
 [1]
 [1]
 [0]
 [1]
 [1]
 [0]
 [0]
 [0]
 [1]
 [0]
 [0]
 [0]
 [1]
 [0]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]
 [0]
 [1]
 [0]
 [1]
 [0]
 [0]
 [0]
 [1]
 [1]
 [0]
 [1]
 [1]
 [0]
 [0]
 [0]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [0]
 [1]
 [0]
 [0]
 [1]
 [1]
 [0]
 [1]
 [1]
 [1]
 [0]
 [1]
 [0]
 [1]
 [0]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [1]
 [0]
 [0]
 [0]
 [0]
 [1]
 [0]
 [1]
 [0]
 [1]
 [1]
 [0]
 [0]
 [1]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]


# Conclusion

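* The choice of activation functions, loss functions and optimizers can significantly affect the performance and training dynamics of deep learning models. In this notebook, Neural Network # 1 (ReLU/sigmoid, binary cross-entropy, Adam) reached 0.962 test accuracy, versus 0.898 for Neural Network # 2 (Leaky ReLU/tanh, hinge, Nadam).
* We can try different combinations of activation and loss functions along with suitable optimizers to find the best-performing hyperparameters for our specific neural network.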
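One way to organize such a search is to enumerate configurations with `itertools.product`, then build, compile and fit one model per configuration, as was done by hand for `model_01` and `model_02`. The sketch below only builds the list of configurations; the option lists are illustrative assumptions. The output activation is paired with a compatible loss (sigmoid with binary cross-entropy, tanh with hinge) rather than varied independently, because the output range has to suit the loss.

```python
from itertools import product

hidden_activations = ["relu", "leaky_relu", "elu"]
# Output activation and loss are chosen together so their value ranges agree
output_and_loss = [("sigmoid", "binary_crossentropy"), ("tanh", "hinge")]
optimizers = ["adam", "nadam", "rmsprop", "sgd"]

configs = [
    {"hidden": h, "output": out, "loss": loss, "optimizer": opt}
    for h, (out, loss), opt in product(hidden_activations, output_and_loss, optimizers)
]

print(len(configs))  # 24
print(configs[0])    # the ReLU / sigmoid / binary_crossentropy / adam setup of model_01
```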