## ANN in Python

Next, we will see how to implement artificial neural networks (ANNs) in Python. For this, we will use the `keras` library on top of `tensorflow`, which is the most common combination.

### Classification of tabular data sets

We are going to use the Pima Indian diabetes onset dataset. This is a standard Machine Learning dataset from the UCI Machine Learning repository. It describes the medical record data of Pima Indian patients and whether they had an onset of diabetes within five years.

#### Step 1: Reading the processed data set

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split

total_data = pd.read_csv("https://raw.githubusercontent.com/4GeeksAcademy/machine-learning-content/master/assets/clean-pima-indians-diabetes.csv")

X = total_data.drop("8", axis = 1)
y = total_data["8"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

X_train.head()

|     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|-----|---|---|---|---|---|---|---|---|
| 60  | -0.547919 | -1.154694 | -3.572597 | -1.288212 | -0.692891 | -4.060474 | -0.507006 | -1.041549 |
| 618 | 1.530847 | -0.278373 | 0.666618 | 0.217261 | -0.692891 | -0.481351 | 2.44667 | 1.425995 |
| 346 | -0.844885 | 0.566649 | -1.194501 | -0.096379 | 0.02779 | -0.417892 | 0.550035 | -0.956462 |
| 294 | -1.141852 | 1.255187 | -0.98771 | -1.288212 | -0.692891 | -1.280942 | -0.658012 | 2.702312 |
| 231 | 0.639947 | 0.410164 | 0.563223 | 1.032726 | 2.519781 | 1.803195 | -0.706334 | 1.085644 |


The *train* set is used to fit the model, while the *test* set is used to evaluate how well the model performs on data it has not seen. In addition, it is generally good practice to scale the data before training an artificial neural network (ANN). Two common target ranges are 0 to 1 and -1 to 1. The file loaded above has already been processed, so we do not rescale it here.
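
The two ranges mentioned above correspond to min-max scaling. As a minimal sketch (the helper `min_max_scale` and the sample values are hypothetical, not part of the original notebook), a single feature can be rescaled linearly like this:

```python
def min_max_scale(column, low=0.0, high=1.0):
    """Linearly rescale a list of numbers to the interval [low, high]."""
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    if span == 0:
        # A constant column cannot be stretched; map every value to the lower bound
        return [low for _ in column]
    return [low + (value - col_min) * (high - low) / span for value in column]


ages = [21, 30, 45, 81]  # hypothetical raw values for one feature
scaled_01 = min_max_scale(ages)               # range 0 to 1
scaled_pm1 = min_max_scale(ages, -1.0, 1.0)   # range -1 to 1
print(scaled_01)
print(scaled_pm1)
```

In practice, the minimum and maximum would normally be computed on the training set only and then reused to transform the test set, so that no information from the test data leaks into training.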

#### Step 2: Model initialization and training

Models in Keras are defined as a sequence of layers. We create a sequential model and add layers one by one until we are satisfied with our network architecture.

The input layer always has as many neurons as there are predictor variables. In this case, we have a total of 8 (columns 0 to 7); in Keras, this is declared with the `input_shape` argument of the first `Dense` layer. Next, we add two hidden layers, one of 12 neurons and one of 8. Finally, the fourth layer, the output layer, has a single neuron since the problem is binary. A problem with `n` classes would need `n` output neurons.
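
To make the size of this architecture concrete, the sketch below counts its trainable parameters. It assumes fully connected layers with one bias per neuron, which is how a `Dense` layer behaves by default; the helper name `dense_parameter_count` is hypothetical:

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    total = 0
    for n_inputs, n_units in zip(layer_sizes, layer_sizes[1:]):
        total += n_inputs * n_units  # one weight per input-to-unit connection
        total += n_units             # one bias per unit
    return total


# 8 inputs, hidden layers of 12 and 8 neurons, 1 output neuron
print(dense_parameter_count([8, 12, 8, 1]))  # 108 + 104 + 9 = 221
```

Even this small network has a few hundred weights to learn, which is one reason why the number and size of the hidden layers are usually tuned rather than fixed in advance.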

> Note: We have created a default network with random hidden layers and neurons in each hidden layer. Normally you would start this way and then do a hyperparameter optimization.

In [2]:
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import set_random_seed

set_random_seed(42)

model = Sequential()
model.add(Dense(12, input_shape = (8,), activation = "relu"))
model.add(Dense(8, activation = "relu"))
model.add(Dense(1, activation = "sigmoid"))



Then, once the model is defined, we can compile it. The backend automatically chooses the best way to represent the network for training and prediction on your hardware, such as a CPU, a GPU, or even a distributed setup.

When compiling, we must specify some additional properties required when training the network. Recall that training a network means finding the best set of weights to map inputs to outputs in our dataset.

In [3]:
model.compile(loss = "binary_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model

<keras.src.engine.sequential.Sequential at 0x7fa9cda46cd0>

We use the optimizer known as `adam`. This is a popular variant of gradient descent because it adapts its learning rate automatically and gives good results on a wide range of problems. Because this is a binary classification problem, the loss function is `binary_crossentropy`. We also collect and report the classification accuracy, which is requested through the `metrics` argument.
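
To give some intuition for the `binary_crossentropy` loss passed to `compile`, here is a minimal pure-Python sketch of the formula. The function name, the clipping constant `eps`, and the sample values are assumptions chosen for illustration; Keras computes the loss internally on tensors:

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    losses = []
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid taking log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


labels = [1, 0, 1]            # hypothetical true classes
confident = [0.9, 0.1, 0.8]   # probabilities close to the true labels
hesitant = [0.5, 0.5, 0.5]    # the model cannot decide

print(binary_crossentropy(labels, confident))
print(binary_crossentropy(labels, hesitant))
```

The loss is low when the predicted probabilities are close to the true labels, and it equals `ln(2)` (about 0.693) when the model always outputs 0.5.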

Training occurs in **epochs**, and each epoch is divided into **batches**.

- **Epoch**: One pass through all rows of the training data set.
- **Batch**: One or more samples considered by the model within an epoch before the weights are updated.

The training process will run for a fixed number of iterations, which are the epochs. We must also set the number of rows in the data set that are considered before the model weights are updated within each epoch, which is called the batch size and is set by the `batch_size` argument.
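
The relationship between rows, batch size, and weight updates can be sketched with a little arithmetic. The example assumes 614 training rows (80% of the 768 rows of the original Pima dataset; check `len(X_train)` for the actual figure), together with the 150 epochs and batch size of 10 used in this step. The helper `updates_per_epoch` is hypothetical:

```python
import math


def updates_per_epoch(n_rows, batch_size):
    """Number of batches, and therefore weight updates, in one epoch."""
    # The last batch may hold fewer than batch_size rows, hence the ceiling
    return math.ceil(n_rows / batch_size)


n_train_rows = 614  # assumption: 80% of a 768-row dataset
per_epoch = updates_per_epoch(n_train_rows, batch_size=10)
total_updates = per_epoch * 150  # 150 epochs

print(per_epoch, total_updates)  # 62 9300
```

A smaller batch size means more updates per epoch, each computed from fewer rows; a larger one means fewer, smoother updates.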

For this problem, we will run a small number of epochs (150) and use a relatively small batch size of 10:

In [4]:
# Fit the keras model on the data set
model.fit(X_train, y_train, epochs = 150, batch_size = 10)

Epoch 1/150
Epoch 2/150
Epoch 3/150
...

<keras.src.callbacks.History at 0x7fa9cd1d0b50>

In [5]:
_, accuracy = model.evaluate(X_train, y_train)

print(f"Accuracy: {accuracy}")

Accuracy: 0.8420195579528809


The training time of a model depends primarily on the size of the dataset (instances and features), and also on the type of model and its configuration.

The accuracy on the training set is `84.20%`.

#### Step 3: Model prediction

In [6]:
y_pred = model.predict(X_test)
y_pred[:15]



array([[2.6933843e-01],
       [5.7993677e-02],
       [7.6992743e-02],
       [4.8524177e-01],
       [3.1675667e-01],
       [6.4265609e-01],
       [7.3388085e-04],
       [2.8476545e-01],
       [8.7694836e-01],
       [4.1469648e-01],
       [1.6080230e-01],
       [8.2213795e-01],
       [2.1518065e-01],
       [5.3527528e-01],
       [1.2730679e-01]], dtype=float32)

As we can see, the model does not return the classes `0` and `1` directly. The sigmoid output is a value between 0 and 1 that can be read as the probability of class `1`, so we round it to obtain the class:

In [7]:
y_pred_round = [round(x[0]) for x in y_pred]
y_pred_round[:15]

[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
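
Rounding amounts to applying a decision threshold of 0.5 (the two differ only for a probability of exactly 0.5). The sketch below makes that threshold explicit, using the first six probabilities from the output above, rounded to three decimals; the helper `to_class_labels` is hypothetical:

```python
def to_class_labels(probabilities, threshold=0.5):
    """Map predicted probabilities of class 1 to hard 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


# First six predicted probabilities, rounded to three decimals
probs = [0.269, 0.058, 0.077, 0.485, 0.317, 0.643]

print(to_class_labels(probs))                 # [0, 0, 0, 0, 0, 1]
print(to_class_labels(probs, threshold=0.3))  # [0, 0, 0, 1, 1, 1]
```

Lowering the threshold makes the model predict class `1` more often, which may be a reasonable trade-off when missing a positive case is costly.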

Looking at the raw predictions alone, it is very difficult to know whether the model is getting it right or not. To find out, we must compare them with the true labels. There are many metrics for measuring how well a model predicts, including **accuracy**, which is the fraction of predictions that the model gets right.

In [8]:
from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred_round)

0.7272727272727273
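
The value returned by `accuracy_score` can be reproduced by hand: it is the number of matching labels divided by the total number of labels. A minimal sketch with made-up labels (the function `accuracy` and both lists are hypothetical):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


y_true_demo = [0, 1, 0, 0, 1, 1, 0, 1]  # hypothetical true classes
y_pred_demo = [0, 1, 0, 1, 1, 0, 0, 1]  # hypothetical predictions

print(accuracy(y_true_demo, y_pred_demo))  # 6 correct out of 8 -> 0.75
```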

#### Step 4: Saving the model

Once we have the model we were looking for (presumably after hyperparameter optimization), we need to save it to disk so that it can be reused in the future.

In [9]:
model.save("keras_8-12-8-1_42.keras")

Giving the model file a descriptive name is very useful: if we lose the code that generated it, the name still tells us its architecture and the seed needed to replicate its random components. In this case, `8-12-8-1` means 8 neurons in the input layer, 12 and 8 in the two hidden layers, and one neuron in the output layer, and the trailing `42` is the seed.
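
A naming convention like this is easy to automate. The helper below is a hypothetical sketch (it is not part of Keras) that builds the file name from the layer sizes and the seed:

```python
def model_filename(layer_sizes, seed, prefix="keras", extension="keras"):
    """Build a file name such as 'keras_8-12-8-1_42.keras'."""
    architecture = "-".join(str(size) for size in layer_sizes)
    return f"{prefix}_{architecture}_{seed}.{extension}"


print(model_filename([8, 12, 8, 1], seed=42))  # keras_8-12-8-1_42.keras
```

The result could then be passed to `model.save`, for example `model.save(model_filename([8, 12, 8, 1], seed = 42))`.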

### Image set classification

The following is a simple example of how to train a neural network to classify images from the MNIST dataset. MNIST is a dataset of images of handwritten digits, from 0 to 9.

#### Step 1: Reading the data set

In [10]:
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize the data (transform pixel values from 0-255 to 0-1)
X_train, X_test = X_train / 255.0, X_test / 255.0

The pixel values of the images are normalized to be in the range 0 to 1 instead of 0 to 255.

#### Step 2: Model initialization and training

Next, we define the architecture of the neural network. In this case, we use a simple sequential model with a flattening layer that transforms the 2D images into 1D vectors, a dense layer with 128 neurons, and an output layer with 10 neurons.

This time we pass the list of layers directly to the `Sequential` constructor instead of calling `add` for each layer. Both approaches are valid:

In [11]:
from tensorflow.keras.layers import Flatten

set_random_seed(42)

model = Sequential([
  # Layer that flattens the 28x28 pixel input image to a vector of 784 elements
  Flatten(input_shape = (28, 28)),
  # Dense hidden layer with 128 neurons and ReLU activation function
  Dense(128, activation = "relu"),
  # Output layer with 10 neurons (one for each digit from 0 to 9)
  Dense(10)
])
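
To illustrate what the `Flatten` layer does, the sketch below flattens a 2D image in pure Python, row by row, and then counts the parameters of the two `Dense` layers that follow. The helper `flatten` and the blank image are hypothetical; Keras performs the equivalent reshaping on tensors:

```python
def flatten(image):
    """Concatenate the rows of a 2D list into one 1D list (row by row)."""
    return [pixel for row in image for pixel in row]


image = [[0.0] * 28 for _ in range(28)]  # hypothetical blank 28x28 image
vector = flatten(image)
print(len(vector))  # 784

# Each dense layer has one weight per input-unit pair plus one bias per unit
hidden_params = len(vector) * 128 + 128  # 100480
output_params = 128 * 10 + 10            # 1290
print(hidden_params + output_params)     # 101770
```

Almost all of the parameters sit in the first dense layer, because every one of its 128 neurons is connected to all 784 pixels.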

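We also compile the network to define the optimizer and the loss function, as we did before. Because the output layer has no activation function, it returns raw scores (logits) rather than probabilities, which is why the loss is created with `from_logits = True`: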

In [12]:
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model.compile(optimizer = "adam", loss = SparseCategoricalCrossentropy(from_logits = True), metrics = ["accuracy"])
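
With `from_logits = True`, the loss conceptually applies a softmax to the raw outputs before computing the cross-entropy against the integer label. The following is a minimal pure-Python sketch of that computation for a single sample; the function names and the example logits are assumptions for illustration, not the Keras implementation:

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_class, logits):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(softmax(logits)[true_class])


logits = [0.0] * 10
logits[3] = 4.0  # hypothetical raw scores that favor the digit 3

print(sparse_categorical_crossentropy(3, logits))  # small loss: correct digit favored
print(sparse_categorical_crossentropy(7, logits))  # larger loss: wrong digit favored
```

The label is "sparse" because it is given as a single integer (the digit) instead of a one-hot vector of length 10.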

The model is trained on the training set for a certain number of epochs. Here we do not pass the `batch_size` argument, so Keras uses its default of 32 samples per batch:

In [13]:
model.fit(X_train, y_train, epochs = 5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7fa9704ed010>

In [14]:
_, accuracy = model.evaluate(X_train, y_train)

print(f"Accuracy: {accuracy}")

Accuracy: 0.9858166575431824



#### Step 3: Model prediction

In [15]:
test_loss, test_acc = model.evaluate(X_test,  y_test, verbose=2)

print('\nTest accuracy:', test_acc)

313/313 - 0s - loss: 0.0841 - accuracy: 0.9751 - 271ms/epoch - 867us/step

Test accuracy: 0.9750999808311462
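
Because the output layer returns one logit per digit, the predicted class for an image is the index of the largest value. Below is a minimal sketch with made-up logits (with the trained model, the scores would come from `model.predict(X_test)`); the helper `predict_digit` is hypothetical:

```python
def predict_digit(logits):
    """Return the index of the largest logit, that is, the predicted digit."""
    return max(range(len(logits)), key=lambda index: logits[index])


# Hypothetical output of the network for one image (one score per digit 0-9)
sample_logits = [-2.1, 0.3, 1.7, 6.4, -0.5, 0.9, -3.2, 2.2, 0.1, -1.0]

print(predict_digit(sample_logits))  # 3
```

Applying a softmax first would not change the result, since it preserves the order of the scores.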


#### Step 4: Saving the model

Once we have the model we were looking for (presumably after hyperparameter optimization), we need to save it to disk so that it can be reused in the future.

In [16]:
model.save("keras_28x28-128-10_42.keras")

Giving the model file a descriptive name is very useful: if we lose the code that generated it, the name still tells us its architecture and the seed needed to replicate its random components. In this case, `28x28-128-10` means an input layer of 28 x 28 pixels, 128 neurons in the only hidden layer, and 10 neurons in the output layer, and the trailing `42` is the seed.