## Load Data

In this tutorial, we'll use the Pima Indians onset of diabetes dataset from the
UCI machine learning repository. It describes patient medical record data for
Pima Indians and whether they had an onset of diabetes within five years.

It is a binary classification problem (onset of diabetes as 1 or not as 0).
All of the input variables that describe each patient are numeric. This
makes the data easy to use directly with neural networks, which expect numerical
input and output values, and an ideal choice for our first neural network
in Keras.

The data is available here:

- [Dataset CSV file (pima-indians-diabetes.data.csv)](https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv)
- [Dataset details](https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.names)

In [1]:
# Imports
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from infoml import utils
import numpy as np

# Download the dataset
dfile = utils.downloadurl('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv')

There are eight input variables and one output variable (the last column). You will be learning a model to map rows of input variables (X) to an output variable (y), which is often summarized as y = f(X).

The variables can be summarized as follows:

Input Variables (X):

- Number of times pregnant
- Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- Diastolic blood pressure (mm Hg)
- Triceps skin fold thickness (mm)
- 2-hour serum insulin (mu U/ml)
- Body mass index (weight in kg/(height in m)^2)
- Diabetes pedigree function
- Age (years)

Output Variables (y):

- Class variable (0 or 1)

Once the CSV file is loaded into memory, you can split the columns of data into input and output variables.

In [6]:
# load the dataset
dataset = np.loadtxt(dfile, delimiter=',')

# split into input and output variables
X = dataset[:, :8]
y = dataset[:, 8]
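To check that the column split behaves as expected, it can help to try it on a tiny in-memory sample first. The sketch below is only an illustration: it assumes the nine-column layout described above, and the names `sample_csv`, `X_sample`, and `y_sample` are hypothetical, not part of the tutorial code.

```python
import io
import numpy as np

# Three rows in the same nine-column layout as the dataset
# (eight inputs followed by the class label).
sample_csv = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
)
sample = np.loadtxt(sample_csv, delimiter=",")

# Same slicing as the tutorial cell: first eight columns in, last column out.
X_sample = sample[:, :8]
y_sample = sample[:, 8]

print(X_sample.shape, y_sample.shape)
```

If the shapes come out as a two-dimensional input array and a one-dimensional label array, the same slicing should carry over to the full file.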

## Define Keras Model

Models in Keras are defined as a sequence of layers.

We create a [Sequential model](https://keras.io/models/sequential/) and add
layers one at a time until we are happy with our network architecture.

The first thing to get right is to ensure the input layer has the correct
number of input features. This can be specified when creating the first layer
with the **input_dim** argument, setting it to 8 for the eight input
variables.

**How do we know the number of layers and their types?**

This is a tricky question. There are heuristics that you can use, and often
the best network structure is found through a process of trial-and-error
experimentation. Generally, you need a network large enough to capture the
structure of the problem.

In this example, let's use a fully-connected network structure with three
layers. This means that each node in one layer is connected to each node in
the next layer. 

We can piece this together by first creating a hidden layer with 12 neurons
that receives the 8 input variables. We can then create a second hidden layer
with 8 neurons and the output layer with a single neuron to predict the class
value (0 or 1).

Fully connected layers are defined using the `Dense` class. You can specify the
number of neurons in the layer as the first argument and the activation function
using the **activation** argument. It is common to use the rectifier (relu)
activation function on the first two layers and the sigmoid activation function
in the output layer.

It used to be the case that sigmoid and tanh activation functions were preferred
for all layers. Nowadays better performance is often achieved using the ReLU
activation function in the hidden layers. Using a sigmoid on the output layer
ensures the network output is between 0 and 1 and is easy to map to either a
probability of class 1 or snap to a hard classification of either class with a
default threshold of 0.5.
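The two activation functions and the 0.5 threshold can be sketched in a few lines of plain Python. This is only an illustration of the math, not how Keras implements it; the helper names `relu` and `sigmoid` are hypothetical.

```python
import math

def relu(z):
    """Rectifier: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 3.0):
    p = sigmoid(z)
    label = int(p > 0.5)  # snap the probability to a hard class
    print(f"z={z:+.1f}  relu={relu(z):.1f}  sigmoid={p:.3f}  class={label}")
```

Note that an output of exactly 0.5 falls into class 0 under a strict `> 0.5` comparison.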

You can piece it all together by adding each layer.

- The model expects rows of data with 8 variables (the input_dim=8 argument)
- The first hidden layer has 12 nodes and uses the relu activation function.
- The second hidden layer has 8 nodes and uses the relu activation function.
- The output layer has one node and uses the sigmoid activation function.

[Setting the number of layers and nodes](https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/)
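As a rough sanity check on the architecture listed above, you can count its trainable parameters by hand. The sketch below assumes each fully connected layer has one weight per input-output pair plus one bias per neuron; `dense_params` is a hypothetical helper, not a Keras function.

```python
# Layer widths: 8 inputs -> 12 -> 8 -> 1, as in the list above.
layer_sizes = [8, 12, 8, 1]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [108, 104, 9] 221
```

Once the model is defined, calling `model.summary()` should report a matching total if the layers use biases.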

In [3]:
# Define the keras model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

**Note:** The most confusing thing here is that the shape of the input to the
model is defined as an argument on the first hidden layer. This means that the
line of code that adds the first Dense layer is doing two things: defining the
input or visible layer and the first hidden layer. This is a common Keras
design pattern.

## Compile Keras Model

Now that the model is defined, you can compile it.

Compiling the model uses the efficient numerical libraries under the covers
(the so-called backend) such as Theano or TensorFlow. The backend automatically
chooses the best way to represent the network for training and making 
predictions to run on your hardware, such as CPU or GPU or even distributed.

When compiling, we must specify some additional properties required when
training the network. Remember training a network means finding the best set
of weights to map inputs to outputs in our training data.

You must specify the loss function to use to evaluate a set of weights, the
optimizer used to search through different weights for the network, and any
optional metrics you want to collect and report during training.

In this case, use *cross entropy* as the loss argument. This loss is for a
binary classification problem and is defined in Keras as **`binary_crossentropy`**.

[Choosing a loss function](https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/)
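To get a feel for what this loss measures, here is a minimal plain-Python sketch of binary cross entropy. It is an illustration under assumptions, not the Keras implementation: the function name `binary_cross_entropy`, the clipping value `eps`, and the sample numbers are all made up for the example.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # assumed clipping to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # made-up predictions close to the labels
unsure = [0.6, 0.4, 0.5, 0.5]      # made-up predictions near 0.5

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, unsure))
```

Predictions that are confident and correct give a lower loss than hesitant ones, which is the signal the optimizer follows during training.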

We will define the optimizer as the efficient gradient descent algorithm
**`adam`**. Adam is an extension to stochastic gradient descent that has
proven very effective for deep learning models.

[Introduction to Adam](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/)
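For intuition only, the Adam update rule can be sketched on a single scalar parameter. This is a simplified illustration, not the Keras optimizer: `adam_minimize` is a hypothetical helper, the toy objective is made up, and the learning rate of 0.1 is chosen for the toy problem rather than taken from any library default.

```python
import math

def adam_minimize(grad, x0, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a single scalar parameter and return it."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = x**2, whose gradient is 2*x.
print(adam_minimize(lambda x: 2 * x, x0=5.0, steps=1))
print(adam_minimize(lambda x: 2 * x, x0=5.0, steps=30))
```

The first step moves the parameter by roughly the learning rate regardless of the gradient's size, because the update is scaled by the running estimate of the gradient magnitude.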

Finally, because it is a classification problem, we will collect and report
the classification accuracy defined via the **`metrics`** argument.

In [4]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

## Fit Keras Model

You have defined your model and compiled it to get ready for efficient
computation. Now it is time to execute the model on some data.

You can train or fit your model on your loaded data by calling the **`fit()`**
function on the model.

Training occurs over epochs, and each epoch is split into batches.

- **Epoch:** One pass through all of the rows in the training dataset.
- **Batch:** One or more samples considered by the model within an epoch before
  the weights are updated.

One epoch comprises one or more batches, based on the chosen batch size, and
the model is fit for many epochs. For more information on the batch size and
number of epochs, see [Batch vs Epoch in Neural Networks](https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/).

The training process will run for a fixed number of passes through the dataset,
which you must specify using the **`epochs`** argument. You must also set the
number of dataset rows that are considered before the model weights are updated
within each epoch, called the batch size and set using the **`batch_size`**
argument.
This problem will run for a small number of epochs (150) and use a relatively
small batch size of 10.
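The relationship between rows, batch size, and epochs is simple arithmetic. The sketch below assumes the dataset has 768 rows; that figure is an assumption about the CSV, so substitute `len(X)` for your own data.

```python
import math

n_rows = 768      # assumed row count; replace with len(X) for your data
batch_size = 10
epochs = 150

batches_per_epoch = math.ceil(n_rows / batch_size)  # the last batch may be smaller
last_batch_rows = n_rows - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_rows, total_updates)  # 77 8 11550
```

Under these assumptions, each epoch performs 77 weight updates, the final batch of each epoch holds only 8 rows, and the whole run performs 11,550 updates.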

These configurations can be chosen experimentally by trial and error. You 
want to train the model enough so that it learns a good (or good enough)
mapping of rows of input data to the output classification. The model will
always have some error, but the amount of error will level out after some point
for a given model configuration. This is called **model convergence**.

In [7]:
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)

Epoch 1/150
...

<keras.callbacks.History at 0x7f68bc31d420>

This is where the work happens on your CPU or GPU.

No GPU is required for this example, but if you're interested in how to run
large models on GPU hardware cheaply in the cloud, see this
post: [How to Setup Amazon AWS EC2 GPUs to Train Keras Deep Learning Models](https://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/)

## Evaluate Keras Model

You have trained your neural network on the entire dataset, and you can evaluate
the performance of the network on the same dataset.

This will only give you an idea of how well you have modeled the dataset (e.g.
training accuracy), but no idea of how well the algorithm might perform on
new data. This was done for simplicity, but ideally you would separate your
data into train and test datasets for training and evaluation of the model.
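One way to hold out a test set is to shuffle the row indices once and slice them. The sketch below is a minimal NumPy illustration on stand-in data; `split_rows`, the 20% test fraction, and the seed are assumptions for the example, not part of this tutorial's code.

```python
import numpy as np

def split_rows(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices once, then slice them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Stand-in data with the same column layout (8 inputs, 1 label).
X_demo = np.arange(80, dtype=float).reshape(10, 8)
y_demo = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype=float)

X_train, X_test, y_train, y_test = split_rows(X_demo, y_demo)
print(X_train.shape, X_test.shape)  # (8, 8) (2, 8)
```

With a split like this, you would fit on `X_train, y_train` and evaluate on `X_test, y_test`. scikit-learn's `train_test_split` offers the same idea with more options.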

You can evaluate your model on your training dataset by calling the **`evaluate()`**
function and passing it the same input and output used to train the model.

This will generate a prediction for each input and output pair and collect 
scores, including the average loss and any metrics you have configured, such
as accuracy.

The evaluate function will return a list with two values. The first will be the
loss of the model on the dataset, and the second will be the accuracy of the
model on the dataset. You are only interested in reporting the accuracy so
ignore the loss value.

In [9]:
# evaluate the keras model
loss, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

Accuracy: 77.21
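The accuracy metric itself is easy to compute by hand: threshold each predicted probability and count how often the result matches the label. The sketch below assumes a 0.5 threshold and uses made-up numbers; the `accuracy` helper is hypothetical and is not the Keras metric.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == int(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

labels = [1, 0, 1, 0, 1]
probs = [0.81, 0.30, 0.45, 0.10, 0.66]   # made-up sigmoid outputs

print('Accuracy: %.2f' % (accuracy(labels, probs) * 100))  # Accuracy: 80.00
```

Here four of the five made-up predictions land on the correct side of the threshold, so the score is 80%.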


## Make Predictions

You can adapt the above example and use it to generate predictions on the
training dataset, pretending it is a new dataset you have not seen before.

Making predictions is as easy as calling the **`predict()`** function on the
model. You are using a sigmoid activation function on the output layer, so the
predictions will be a probability between 0 and 1. You can easily convert them
into a crisp binary prediction for this classification task by rounding them.

In [12]:
# make probability predictions with the model
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]



Alternatively, you can convert the probability into 0 or 1 to predict crisp
classes directly, for example:

In [15]:
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)
# summarize the first 5 cases
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i, 0], y[i]))

[6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0] => 1 (expected 1)
[1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0] => 0 (expected 0)
[8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0] => 1 (expected 1)
[1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0] => 0 (expected 0)
[0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0] => 1 (expected 1)
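The two conversions used in this section, rounding and thresholding at 0.5, should give the same classes for probabilities between 0 and 1. The sketch below checks this on made-up values shaped like the output of `predict()` (one column per row); the numbers are illustrative only.

```python
import numpy as np

# Made-up sigmoid outputs with shape (n, 1), like model.predict(X) returns here.
probs = np.array([[0.02], [0.49], [0.50], [0.51], [0.97]])

rounded = np.round(probs).astype(int).ravel()
thresholded = (probs > 0.5).astype(int).ravel()

print(rounded.tolist(), thresholded.tolist())  # [0, 0, 0, 1, 1] [0, 0, 0, 1, 1]
```

Both approaches send a probability of exactly 0.5 to class 0: NumPy rounds halves to the nearest even number, and the strict `> 0.5` comparison excludes it.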
