# **Building a fully connected deep neural net with Keras**
<font color='grey' size='1.5'> Created by Parisa Hosseinzadeh for *Machine learning for proteins*, Spring 2022. This notebook is adapted from [Jason Brownlee](https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/).

Today, we will develop a fully connected deep neural net using [Keras](https://keras.io/) to perform classification on the *Pima Indians diabetes* dataset from [Kaggle](https://www.kaggle.com/datasets/kumargh/pimaindiansdiabetescsv?resource=download), the same one we used to build our random forest classifier.

Here is the description of the dataset from the website:

This dataset describes the medical records for Pima Indians
and whether or not each patient will have an onset of diabetes within five years.

Field descriptions follow:

- **preg** = Number of times pregnant
- **plas** = Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- **pres** = Diastolic blood pressure (mm Hg)
- **skin** = Triceps skin fold thickness (mm)
- **test** = 2-Hour serum insulin (mu U/ml)
- **mass** = Body mass index (weight in kg/(height in m)^2)
- **pedi** = Diabetes pedigree function
- **age** = Age (years)
- **class** = Class variable (1:tested positive for diabetes, 0: tested negative for diabetes)



## Preparing to run

### Loading and installing necessary modules

In [None]:
!pip3 install keras-visualizer

In [None]:
# first neural network with keras tutorial
from numpy import loadtxt
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
from keras.utils import plot_model
from keras_visualizer import visualizer 
from IPython.display import Image

%matplotlib inline


### Loading and preparing the dataset

We will use the `loadtxt` function from NumPy to load our dataset. But before that, let's view the data as a pandas dataframe to see what it looks like.

In [None]:
# the dataset does not have column names
# defining features/column names
features = ['preg','plas','pres','skin',
            'test','mass','pedi','age',
            'class']
# loading the dataset
data = pd.read_csv('pima-indians-diabetes.csv', 
                   header=None,
                   names=features)
# viewing the top 5 rows
data.head()

You can see that the first 8 columns are the features and the last one is the label.

Let's now use loadtxt to load our dataset into a numpy array.

In [None]:
# load the dataset
# if you open csv, you can see that ',' is used to separate columns
# that's why we use ',' as delimiter.
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')

In [None]:
# let's take a look at our dataset
dataset

As you can see, the dataset is a matrix (array of arrays). Each row is one input. Columns define features and the last column is our label.

Define X and y. X is all the rows and the first 8 columns. y is all the rows and the last column. Remember that in Python, indexing always starts at 0.
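
If the slicing syntax is unfamiliar, it can be tried on a small array first. This is a minimal sketch: the `toy` array and its values are made up for illustration (4 feature columns plus 1 label column) and are not rows from the dataset.

```python
import numpy as np

# hypothetical 3-row array: 4 feature columns followed by 1 label column
toy = np.array([
    [1.0, 85.0, 66.0, 29.0, 0.0],
    [8.0, 183.0, 64.0, 0.0, 1.0],
    [0.0, 137.0, 40.0, 35.0, 1.0],
])

# rows come first, columns second; the stop index of a slice is excluded
toy_features = toy[:, 0:4]   # all rows, columns 0, 1, 2 and 3
toy_labels = toy[:, 4]       # all rows, column 4 only

print(toy_features.shape)  # (3, 4)
print(toy_labels.shape)    # (3,)
```

The same pattern applies to the real dataset, with the column indices adjusted to its 8 features and 1 label.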

In [None]:
# split into input (X) and output (y) variables
X = 
y = 

In [None]:
#@markdown Sample answer

X = dataset[:,0:8]
y = dataset[:,8]

In [None]:
# let's take a look at the distribution of labels
plt.hist(y)

The two classes are of comparable size, so the dataset is not badly imbalanced. Let's use [`train_test_split` from scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to generate our training and test sets, keeping 10% of the data in the test set.
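
To see what a 10% hold-out does conceptually, here is a hand-rolled sketch using only NumPy. It is not scikit-learn's implementation: the function `holdout_split` and the `demo_X`/`demo_y` arrays are hypothetical names and made-up data used only for illustration.

```python
import numpy as np

def holdout_split(features, labels, test_fraction=0.1, seed=42):
    """Shuffle row indices and hold out a fraction of the rows as a test set."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(features))
    n_test = int(np.ceil(test_fraction * len(features)))
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
    return features[train_idx], features[test_idx], labels[train_idx], labels[test_idx]

# made-up data: 20 rows, 8 features, alternating 0/1 labels
demo_X = np.arange(160, dtype=float).reshape(20, 8)
demo_y = np.arange(20) % 2

Xtr, Xte, ytr, yte = holdout_split(demo_X, demo_y)
print(Xtr.shape, Xte.shape)  # (18, 8) (2, 8)
```

The key points are that rows are shuffled before splitting and that features and labels are indexed with the same row indices, so each label stays attached to its row.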

In [None]:
# import the function

# define the split
X_train, X_test, y_train, y_test =

In [None]:
#@markdown Sample code

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
     X, y, test_size=0.1, random_state=42)

## Training

The model we are building today is a fully connected (dense) network with two hidden layers. The first hidden layer has 12 neurons and the second has 8 neurons. The output layer has a size of 1 because we're performing a binary classification.

If you remember from class, we use `relu` for all activation functions except for the output layer that uses `sigmoid`. Let's build our sequential model. Sequential means the output of each layer is the input of the next.
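
To make the architecture concrete, the sketch below runs a forward pass through an 8-12-8-1 network in plain NumPy and counts its trainable parameters. It is an illustration only: the weights are random and untrained, the batch is made up, and the helper names (`relu`, `sigmoid`, `forward`) are not Keras functions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [8, 12, 8, 1]  # input features, hidden 1, hidden 2, output

# one weight matrix and one bias vector per layer (random, untrained values)
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(batch):
    """Apply relu on the hidden layers and sigmoid on the output layer."""
    activation = batch
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = relu(activation @ w + b)
    return sigmoid(activation @ weights[-1] + biases[-1])

fake_batch = rng.normal(size=(5, 8))   # 5 made-up samples, 8 features each
probs = forward(fake_batch)
n_params = sum(w.size + b.size for w, b in zip(weights, biases))

print(probs.shape)  # (5, 1)
print(n_params)     # 221
```

Each layer consumes the previous layer's output, which is what "sequential" means here. The parameter count is (8·12 + 12) + (12·8 + 8) + (8·1 + 1) = 221.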


### Building a model

In [None]:
# define the keras model
model = Sequential()
# Build 3 layers of size 12, 8 and 1.
# The input dimension is the number of your features.
model.add(Dense( , #add number of neurons
                input_dim= , #add input dimension
                activation='') #add activation function
          )
# add the second dense layer.
# Note that no input dimension is needed for the remaining layers
model.add(Dense( , # add number of neurons
                activation='') #add activation function
          )
# add your output layer
model.add(Dense( , # number of nodes
                activation=''))#add activation function

In [None]:
#@markdown Sample answer

# define the keras model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Let's take a look at our model. We can plot it as a sequence of layers.

In [None]:
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

The cell below draws your model as a network diagram.

In [None]:
visualizer(model, format='png', view=True)
Image('graph.png')

#### Q1. Time to exercise

Upload your network diagram (it is saved as graph.png) to the in-class activity.

### Compiling the model

Now that we have set the network architecture, we need to define the details of the model, things like the loss and the optimizer.

If you remember, we use cross-entropy as the loss function for binary classification, and `adam` is almost always the best first choice for your optimizer.
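
As a reminder of what the loss measures, here is a minimal NumPy sketch of mean binary cross-entropy. The function name, the clipping constant `eps`, and the example arrays are assumptions made for this illustration, not necessarily what Keras uses internally.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps keeps log() away from 0."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(per_sample))

labels = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])  # probabilities close to the labels
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])  # probabilities far from the labels

loss_right = binary_cross_entropy(labels, confident_right)
loss_wrong = binary_cross_entropy(labels, confident_wrong)
print(loss_right)  # small loss
print(loss_wrong)  # much larger loss
```

The loss grows sharply when the model assigns a high probability to the wrong class, which is why it works well as a training signal for a sigmoid output.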

In [None]:
# compile the keras model
model.compile(loss='', # choose your loss from https://keras.io/api/losses/
              optimizer='', # define optimizer
              metrics=['accuracy']) # we want to see accuracy

In [None]:
#@markdown Sample code
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

### Fitting

The next step is to fit your model to your training set.

We use 150 epochs and a batch size of 10 for now.
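
To get a feel for what these two numbers mean, the sketch below counts how many weight updates a run performs. The helper `updates_per_run` is a hypothetical name, and the training-set size of 691 is an assumption (90% of a 768-row file); substitute `len(X_train)` for your own data.

```python
import math

def updates_per_run(n_samples, batch_size, epochs):
    """Return (batches per epoch, total weight updates) for mini-batch training."""
    batches_per_epoch = math.ceil(n_samples / batch_size)  # last batch may be smaller
    return batches_per_epoch, batches_per_epoch * epochs

# n_samples=691 is an assumed training-set size, not read from the dataset
per_epoch, total = updates_per_run(n_samples=691, batch_size=10, epochs=150)
print(per_epoch, total)  # 70 10500
```

One epoch is a full pass over the training set, and the weights are updated once per batch, so a smaller batch size means more (but noisier) updates per epoch.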

In [None]:
model.fit( , # input data
           , # labels
          epochs=150, 
          batch_size=10)

In [None]:
#@markdown Sample code
# fit the keras model on the dataset
model.fit(X_train, y_train, epochs=150, batch_size=10)

### Evaluation

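Let's first check the accuracy of our model. Note that the cell below evaluates on the full dataset (`X`, `y`), which includes the rows used for training.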

In [None]:
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

#### Q2. Time to exercise

1. What is the accuracy you obtained?
2. How does it compare to your random forest and gradient boosting models?

Now let's predict the test set values.

In [None]:
# make probability predictions with the model
predictions = model.predict() # add test inputs

In [None]:
#@markdown Sample code
# make probability predictions with the model
predictions = model.predict(X_test)


In [None]:
# taking a look at the first 10 predictions
predictions[:10]

As you can see, predictions are probabilities. We need to change them to 0/1 using a threshold (often 0.5).

Try to generate a predictions list that is just 0 or 1 from the list above.
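
As a hint, thresholding can be tried on a few made-up numbers first. This is only a sketch: `toy_probs` is invented and merely shaped like a Keras prediction array (one column, one row per sample).

```python
import numpy as np

# made-up probabilities: one column, one row per sample
toy_probs = np.array([[0.12], [0.50], [0.73], [0.49], [0.98]])

# the comparison gives booleans; astype(int) turns them into 0/1
toy_classes = (toy_probs > 0.5).astype(int)

print(toy_classes.ravel())  # [0 0 1 0 1]
```

Note that a probability of exactly 0.5 falls on the 0 side with a strict `>` comparison.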

In [None]:
predictions = 

In [None]:
#@markdown Sample code

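# round the predicted probabilities to 0 or 1
rounded = [round(x[0]) for x in predictions]
# alternatively, threshold at 0.5 to get class predictions;
# note that this predicts on the full dataset X, so the rows line up with X and y below
predictions = (model.predict(X) > 0.5).astype(int)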

Let's take a look to see if it's working.

In [None]:
predictions[:10]

Now let's try to see how well your model is working.

In [None]:
# looking at the first 10 predictions
for i in range(10):
    # np.ravel flattens the (n, 1) prediction array so each entry is a scalar
    print('%s => %d (expected %d)' % (X[i].tolist(), np.ravel(predictions)[i], y[i]))

### Practice

#### Q3. Calculate accuracy on the test set.

In [None]:
# your code

#### Q4. Plot a confusion matrix on test set

[Here's](https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62) a refresher on confusion matrices. You can use [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) to generate the confusion matrix.
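
To recall how the table is tallied, here is a small NumPy sketch on made-up labels. The helper `confusion_counts` is a hypothetical name written for this illustration; it follows the rows-are-true, columns-are-predicted layout that scikit-learn also uses.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """2x2 table for binary labels: rows are true classes, columns are predicted classes."""
    table = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        table[int(t), int(p)] += 1
    return table

toy_true = np.array([0, 0, 1, 1, 1, 0])
toy_pred = np.array([0, 1, 1, 0, 1, 0])

table = confusion_counts(toy_true, toy_pred)
print(table)
# [[2 1]
#  [1 2]]

# correct predictions sit on the diagonal
toy_accuracy = np.trace(table) / table.sum()
print(toy_accuracy)
```

For the exercise itself, apply the same idea to `y_test` and your thresholded test-set predictions.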

In [None]:
# your code

### Post-class assignment

Try changing the network architecture/parameters to improve your model. 

What is the best accuracy that you get? What is the final best model?