#### About

> Training models

Training a deep learning model means optimizing its parameters (weights and biases) to minimize a loss (or cost) function. This is usually done with an optimization algorithm such as gradient descent, which iteratively adjusts the parameters in the direction opposite to the gradient of the loss function with respect to those parameters.
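To make the update rule concrete, here is a minimal sketch of gradient descent on a one-parameter toy loss. The function name `gradient_descent`, the toy loss, and the hyperparameter values are illustrative assumptions, not part of the notebook.

```python
def gradient_descent(grad, theta0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from theta0."""
    theta = theta0
    for _ in range(steps):
        # Core update: theta <- theta - learning_rate * gradient
        theta = theta - learning_rate * grad(theta)
    return theta


# Toy loss L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_min = gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0)
print(theta_min)  # approaches 3.0, the minimizer of the toy loss
```

In a real network, `theta` is the full set of weights and biases and the gradient comes from backpropagation, but the update has the same form.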

The general steps for training a deep learning model can be summarized as follows:

1. Data preparation: Prepare training data by performing pre-processing such as resizing images, normalizing pixel values, and splitting the data into training and validation sets.

2. Model architecture: Define the architecture of the deep learning model, including the number of layers, the types of layers (such as convolutional, pooling, and fully connected layers), and their configurations (such as the number of filters, filter size, and activation functions).

3. Initialization: Initialize the model parameters (weights and biases) with small random values or use special initialization methods (such as Xavier or He initialization) to ensure that the model starts with a good set of initial values.

4. Forward propagation: Implement the forward propagation step, which passes the input data through the layers of the model to obtain the predicted outputs. This includes applying convolutions, pooling, activation functions, and other operations defined by the model architecture.

5. Compute Loss: Compute a loss or cost function that measures the difference between the predicted output and the ground truth labels. This is usually done using one of the loss functions discussed above.

6. Backpropagation: Implement the backpropagation step, which computes the gradient of the loss function with respect to the model parameters. This is done by applying the chain rule of calculus to propagate gradients backward from the output layer to the input layer.

7. Update parameters: Use an optimization algorithm such as gradient descent or one of its variants to update the model parameters by subtracting the product of the learning rate and the gradient from the current parameter values. This moves the parameters in a direction that reduces the loss function.

8. Iterate: Repeat forward propagation, loss calculation, backpropagation, and parameter update for multiple epochs (one epoch is a full pass over the training dataset) or until a stopping criterion is met, such as convergence of the loss function.

9. Validation: Regularly evaluate the model on the validation set to monitor how well it generalizes and to detect overfitting. This may include calculating performance measures such as accuracy, precision, recall, and F1 score.

10. Testing: Finally, evaluate the trained model on a separate test set to measure its performance on unseen data and estimate its effectiveness in the real world.
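The steps above can be sketched end to end in plain NumPy. This is a minimal illustration under stated assumptions, not the Keras workflow used later: the data is synthetic, the "model" is a single softmax layer, and training uses full-batch gradient descent rather than mini-batches. All names and hyperparameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Data preparation: synthetic 2-D points in 3 classes (hypothetical data),
#    standardized and split into training and validation sets.
n_per_class, n_classes = 100, 3
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
x = np.concatenate([c + rng.normal(size=(n_per_class, 2)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)
x = (x - x.mean(axis=0)) / x.std(axis=0)
order = rng.permutation(len(x))
train_idx, val_idx = order[:240], order[240:]
y_onehot = np.eye(n_classes)[y]

# 2-3. Architecture and initialization: one linear layer with softmax output,
#      small random weights and zero biases.
w = rng.normal(scale=0.01, size=(2, n_classes))
b = np.zeros(n_classes)


def forward(inputs):
    """Forward propagation: logits -> softmax probabilities."""
    logits = inputs @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


learning_rate, epochs = 0.5, 200
losses = []
for epoch in range(epochs):
    # 4. Forward propagation on the training set.
    probs = forward(x[train_idx])
    # 5. Loss: mean categorical cross-entropy.
    loss = -np.mean(np.sum(y_onehot[train_idx] * np.log(probs + 1e-12), axis=1))
    losses.append(loss)
    # 6. Backpropagation: gradient w.r.t. the logits, then w.r.t. w and b.
    grad_logits = (probs - y_onehot[train_idx]) / len(train_idx)
    grad_w = x[train_idx].T @ grad_logits
    grad_b = grad_logits.sum(axis=0)
    # 7. Parameter update: theta <- theta - learning_rate * gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# 9. Validation: accuracy on held-out data.
val_acc = np.mean(forward(x[val_idx]).argmax(axis=1) == y[val_idx])
print(f"final training loss: {losses[-1]:.3f}, validation accuracy: {val_acc:.2f}")
```

A framework such as Keras automates steps 4 through 8 inside `model.fit`, which is why the notebook cells that follow contain no explicit gradient code.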

In [1]:
# Sample code for training a CNN on CIFAR-10

In [2]:
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.losses import categorical_crossentropy
from keras.optimizers import SGD
from keras.datasets import cifar10
from keras.utils import to_categorical



In [3]:
# Load and preprocess CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [4]:
# Define CNN architecture
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))



In [5]:
# Compile the model (`lr` is deprecated in Keras; use `learning_rate`)
model.compile(optimizer=SGD(learning_rate=0.01), loss=categorical_crossentropy, metrics=['accuracy'])



In [7]:
# Train the model for one epoch, holding out 10% of the training data for validation
model.fit(x_train, y_train, batch_size=32, epochs=1, validation_split=0.1)





<keras.callbacks.History at 0x7f42b022cdf0>

In [8]:
# Evaluate the model on the test set

score = model.evaluate(x_test, y_test, batch_size=32)
print("Test loss:", score[0])
print("Test accuracy:", score[1])



Test loss: 1.724299430847168
Test accuracy: 0.3837999999523163
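The evaluation above reports only loss and accuracy. The precision, recall, and F1 score mentioned in the validation step can be computed from integer class predictions, as in this minimal NumPy sketch. The helper `precision_recall_f1` and the example labels are hypothetical and are not CIFAR-10 results.

```python
import numpy as np


def precision_recall_f1(y_true, y_pred, num_classes):
    """Per-class precision, recall, and F1 from integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precision = np.zeros(num_classes)
    recall = np.zeros(num_classes)
    f1 = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        # A metric is set to 0 when its denominator is 0 (a convention chosen here).
        precision[c] = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall[c] = tp / (tp + fn) if tp + fn > 0 else 0.0
        denom = precision[c] + recall[c]
        f1[c] = 2 * precision[c] * recall[c] / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, num_classes=3)
print("precision:", precision)
print("recall:   ", recall)
print("F1:       ", f1)
```

With the model trained above, the inputs could plausibly be obtained as `y_pred = model.predict(x_test).argmax(axis=1)` and `y_true = y_test.argmax(axis=1)`, since `y_test` is one-hot encoded.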
