<a href="https://colab.research.google.com/github/cagBRT/Machine-Learning/blob/master/diabetesML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# Clone the entire repo.
!git clone -l -s https://github.com/cagBRT/Machine-Learning.git cloned-repo
%cd cloned-repo
!ls

# **Load the necessary libraries**

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

# Select TensorFlow 2.x
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np

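# Import everything from tensorflow.keras; mixing in the standalone keras
# package can produce incompatible objects.
from tensorflow import keras
from tensorflow.keras import layers, optimizers
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.utils import to_categorical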

print(tf.__version__)

**Can we predict which patients will develop diabetes within five years?**

This tutorial uses the Pima Indians onset of diabetes dataset. <br>
It is a standard machine learning dataset from the UCI Machine Learning repository. <br>
It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.

**Dataset Details**
   1. Number of times pregnant
   2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
   3. Diastolic blood pressure (mm Hg)
   4. Triceps skin fold thickness (mm)
   5. 2-Hour serum insulin (mu U/ml)
   6. Body mass index (weight in kg/(height in m)^2)
   7. Diabetes pedigree function
   8. Age (years)
   9. Class variable (0 or 1), where 1 means onset of diabetes within five years

In [0]:
# load the dataset
dataset = pd.read_csv('pima_indians_diabetes.csv', delimiter=',')

# **Examine the dataset**

In [0]:
dataset.head()

**Is data missing?**

In [0]:
dataset.isna().sum()
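`isna()` only counts values that pandas already treats as missing. In this dataset, missing measurements are commonly recorded as 0 (a glucose level or BMI of 0 is not physiologically plausible), so counting zeros is a useful second check. The sketch below uses a small made-up frame with hypothetical column names (`glucose`, `bmi`); substitute the names shown by `dataset.head()`.

```python
import pandas as pd

# Made-up rows; the column names are assumptions, not the real header.
toy = pd.DataFrame({
    "glucose": [148, 0, 183, 89],
    "bmi": [33.6, 26.6, 0.0, 28.1],
    "class": [1, 0, 1, 0],
})

# Columns where a value of 0 cannot be a real measurement (assumed list).
suspect_columns = ["glucose", "bmi"]

nan_counts = toy.isna().sum()                     # no NaN values in this frame
zero_counts = (toy[suspect_columns] == 0).sum()   # zeros that stand in for gaps

print(nan_counts)
print(zero_counts)
```

If the zero counts are large, one option is to replace those zeros with `NaN` and impute them before training.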

**Are the Diabetes/No diabetes classes balanced?**

In [0]:
dataset["class"].value_counts()
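Raw counts are easier to judge as proportions, because the share of the majority class is the accuracy a model would reach by always predicting that class. A minimal sketch on made-up labels (not the real data):

```python
import pandas as pd

# Made-up labels: six patients without diabetes, two with.
labels = pd.Series([0, 0, 0, 1, 0, 1, 0, 0], name="class")

# normalize=True returns fractions instead of counts.
proportions = labels.value_counts(normalize=True)
majority_baseline = proportions.max()

print(proportions)
print(f"Always predicting the majority class scores {majority_baseline:.2f}")
```

Any trained model should beat this baseline on the test set to be considered useful.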

**Are there any features that are strongly correlated?** 

In [0]:
corr = dataset.corr()
sns.heatmap(corr, 
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values)
plt.show()
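A heatmap gives a quick visual impression, but a sorted list of feature pairs makes the strongest correlations explicit. The sketch below shows one way to do this on a made-up frame; the column names `a`, `b`, and `c` are placeholders.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.1, 9.8],   # roughly 2 * a
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly related to a
})

corr = toy.corr()

# Keep the upper triangle only, so each pair appears once and the
# diagonal (every feature with itself) is dropped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Sort by absolute value so strong negative correlations rank high too.
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs)
```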

# **Prepare the data**

Split the dataset into train and test sets

In [0]:
# Choose the fraction of rows used for training: a number between 0.2 and 0.95
df_train = dataset.sample(frac=##,random_state=0)
df_test = dataset.drop(df_train.index)
print("done")
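To see what the split does, here is a minimal sketch on a ten-row made-up frame. The value `frac=0.8` is only an example, not a recommended answer for the exercise.

```python
import pandas as pd

toy = pd.DataFrame({"x": range(10), "class": [0, 1] * 5})

# sample() draws a random 80% of the rows; random_state makes it repeatable.
toy_train = toy.sample(frac=0.8, random_state=0)

# drop() removes the sampled rows by index, leaving the other 20% for testing.
toy_test = toy.drop(toy_train.index)

print(len(toy_train), len(toy_test))
```

Because the test rows are whatever the training sample left behind, the two sets never share a row.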

# **Remove the labels**

In [0]:
train_labels = df_train.pop('class')
test_labels = df_test.pop('class')

# **Normalize the data**

In [0]:
train_stats = df_train.describe()
train_stats = train_stats.transpose()
train_stats

In [0]:
test_stats = df_test.describe()
test_stats = test_stats.transpose()
test_stats

In [0]:
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norm(df_train)
normed_test_data = norm(df_test)
print("done")
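The `norm` function computes a z-score, `(x - mean) / std`, per column. Both the training and the test data are scaled with the training statistics, so no information from the test set leaks into preprocessing. A small sketch with made-up values and hypothetical column names (`age`, `bmi`):

```python
import pandas as pd

train = pd.DataFrame({"age": [20.0, 30.0, 40.0], "bmi": [22.0, 26.0, 30.0]})
test = pd.DataFrame({"age": [50.0], "bmi": [24.0]})

# One row per feature, with 'mean' and 'std' columns.
stats = train.describe().transpose()

def norm(x):
    # pandas aligns the stats index with the column names of x.
    return (x - stats["mean"]) / stats["std"]

normed_train = norm(train)
normed_test = norm(test)

print(normed_train)
print(normed_test)
```

Here the training mean and standard deviation of `age` are 30 and 10, so a test age of 50 becomes 2.0. Test values are not guaranteed to have mean 0 and standard deviation 1 after scaling, and that is expected.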

# **Create the model**

In [0]:
model = keras.Sequential()
#Choose the size of the layers and the activation functions
model.add(Dense(##, input_shape=([len(df_train.keys())]), activation='???'))
#More layers???
model.add(Dense(##, activation='???'))
model.add(Dense(1, activation='???'))
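As one possible way to fill in the blanks, the sketch below builds a small network for eight input features. The layer sizes (12 and 8) and the `relu` activations are arbitrary example choices, not the intended answer. The `sigmoid` on the last layer is the common choice for a 0/1 label, because it keeps the output between 0 and 1.

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

n_features = 8  # the eight input columns listed in the dataset details

example_model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    Dense(12, activation="relu"),     # example size, not a recommendation
    Dense(8, activation="relu"),      # example size, not a recommendation
    Dense(1, activation="sigmoid"),   # one probability per patient
])

example_model.summary()
```

Try changing the sizes or adding layers and compare the validation results.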

Available loss functions: https://keras.io/losses/ <br>
Available optimizers: https://keras.io/optimizers/

# **Set the hyperparameters**

In [0]:
# Choose a loss: 'mean_squared_error', 'binary_crossentropy',
# 'mean_absolute_error' or 'mean_absolute_percentage_error'.
# 'binary_crossentropy' is the usual choice for a 0/1 label.

# Three optimizers to choose from; only the one passed to compile() is used.
adam = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
sgd = keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)
rms = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)
model.compile(loss='mean_squared_error', optimizer=adam, metrics=['accuracy','mean_squared_error'])
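To get a feel for how the choice of loss matters, the NumPy sketch below compares mean squared error with binary cross-entropy on made-up predictions. The helper functions are written out by hand for illustration and are not the Keras implementations.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0) for predictions of exactly 0 or 1.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([1.0, 1.0])            # both patients have diabetes
mildly_wrong = np.array([0.4, 0.4])      # unsure predictions
confident_wrong = np.array([0.01, 0.01]) # confident and wrong

for name, pred in [("mildly wrong", mildly_wrong),
                   ("confident wrong", confident_wrong)]:
    print(name,
          "MSE:", round(mean_squared_error(y_true, pred), 3),
          "BCE:", round(binary_crossentropy(y_true, pred), 3))
```

For probabilities between 0 and 1, mean squared error can never exceed 1, while cross-entropy keeps growing as a wrong prediction becomes more confident.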

In [0]:
# Choose the number of training epochs
EPOCHS = ##

# **Train the model**

In [0]:
history = model.fit(
  normed_train_data, train_labels,
  #Choose the batch size
  epochs=EPOCHS, validation_split = 0.2, verbose=1,batch_size = #### )
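With `validation_split = 0.2`, Keras sets aside the last 20% of the rows passed to `fit` for validation and trains on the rest. The batch size then determines how many weight updates happen per epoch. The arithmetic can be sketched as follows; the row count and batch size are hypothetical example values.

```python
import math

n_rows = 600            # hypothetical number of training rows
validation_split = 0.2  # value used in the fit call
batch_size = 32         # example value only

# Rows used for fitting and rows held back for validation.
n_fit = math.floor(n_rows * (1.0 - validation_split))
n_val = n_rows - n_fit

# One weight update per batch; the last batch may be smaller.
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)
```

A smaller batch size gives more, noisier updates per epoch; a larger one gives fewer, smoother updates.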

In [0]:
hist = pd.DataFrame(history.history)
hist['epochs'] = history.epoch
hist.tail(20)

In [0]:
axes = plt.gca()
axes.set_ylim([0,0.5])
plt.plot(hist['loss'], label='training loss')
# val_loss comes from validation_split, not from the held-out test set
plt.plot(hist['val_loss'], label='validation loss')
plt.title('Loss')
plt.xlabel('epochs')
plt.ylabel('loss')
plt.legend()

In [0]:
axes = plt.gca()
axes.set_ylim([0,1])
plt.plot(hist['accuracy'], label='training accuracy')
# val_accuracy comes from validation_split, not from the held-out test set
plt.plot(hist['val_accuracy'], label='validation accuracy')
plt.title('Accuracy')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend()

In [0]:
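# predict_classes() was removed in TF 2.6: threshold the sigmoid output at 0.5 instead.
test_l = (model.predict(normed_test_data) > 0.5).astype("int32").ravel()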

In [0]:
from sklearn.metrics import confusion_matrix
# Rows are true labels, columns are predicted labels:
# [0,0]=true neg, [0,1]=false pos, [1,0]=false neg, [1,1]=true pos
confusion_matrix(test_labels, test_l)
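The four cells of the confusion matrix can be turned into precision and recall, which are often more informative than accuracy when the classes are unbalanced. The sketch below uses made-up counts laid out the way scikit-learn does (rows are true labels, columns are predictions).

```python
import numpy as np

# Made-up counts, not results from this notebook.
cm = np.array([[85, 15],
               [20, 34]])

# ravel() flattens row by row: true neg, false pos, false neg, true pos.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the patients flagged, how many have diabetes
recall = tp / (tp + fn)     # of the patients with diabetes, how many were flagged

print(f"precision={precision:.3f} recall={recall:.3f}")
```

In a screening setting, low recall means that patients who will develop diabetes are being missed.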

In [0]:
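# Print each test row with its true and predicted label, and count the correct predictions
count = 0
for i in range(len(normed_test_data)):
  print(normed_test_data.iloc[i],test_labels.iloc[i]," = ", test_l[i])
  if test_labels.iloc[i] == test_l[i]:
    count = count+1
print(count, "of", len(normed_test_data), "test rows predicted correctly")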
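The same count can be computed without a loop by comparing the two label arrays element-wise. A minimal sketch on made-up labels:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])  # made-up true labels
y_pred = np.array([0, 1, 0, 0, 1])  # made-up predictions

# The comparison yields True/False per row; summing counts the matches.
correct = int((y_true == y_pred).sum())
accuracy = correct / len(y_true)

print(correct, accuracy)
```

With the notebook's variables, the equivalent comparison would use `test_labels.to_numpy()` and a one-dimensional array of predicted labels.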