# 1. Look up the Adam optimization functions in PyTorch https://pytorch.org/docs/stable/optim.html . How does it work? Try at least one other optimization function with the diabetes dataset shown in class. How does the model perform with the new optimizer? Did it perform better or worse than Adam? Why do you think that is?

The Adam(Adaptive Movement Estimation) optimizer adapts a learning rate for each input variable for the objective function and further smooths the search process by using an exponentially decreasing moving average of the gradient to make updates to variables.

https://machinelearningmastery.com/adam-optimization-from-scratch/
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/

In [1]:
import pandas as pd
import torch

diabetes_df = pd.read_csv("../week_13/diabetes.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [2]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42, stratify=y)

# #Standardize
sc= StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.fit_transform(X_test)

In [3]:
#training data
#prints a tensor; multidimensional array
print(X_train)
print(y_train)

[[ 0.93138344  2.0179454   0.78066953 ...  0.43148259 -0.37477883
   0.63212912]
 [ 0.63260632 -1.14861888  0.46538785 ... -0.1198324  -0.29416766
   0.71699246]
 [-0.56250219 -0.47692343 -0.2702694  ... -0.20958135  2.74517192
   0.03808578]
 ...
 [-0.86127931 -0.76479291  0.04501228 ...  0.76483585 -0.78380586
  -0.30136756]
 [ 0.63260632  2.20985838  1.2010451  ...  0.43148259 -0.60466993
   2.75371249]
 [ 0.03505207  0.73852549 -0.58555107 ... -0.33779414 -0.57779954
   0.29267578]]
[1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1
 1 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0
 0 0 1 0 0 0 1 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 1 1
 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1
 0 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0
 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0 0 1
 0 1 0 1 0 1 0 1 0 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 1 1 1

In [4]:
#torch neural network model
import torch.nn as nn
import torch.nn.functional as F #where the activation functions are

#create tensors (from our data) which are matrices 

#redefine X_train and X_test based on the datatypes we need to use top optimize using torch
X_train = torch.FloatTensor(X_train) 
X_test = torch.FloatTensor(X_test)

#y is an integer. May run into issues when working with integers, so use LongTensor
y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

In [5]:
#print values to see what's happening to data.
#it's been converted to a slightly different format so it's optimized for usage
print(X_train)

tensor([[ 0.9314,  2.0179,  0.7807,  ...,  0.4315, -0.3748,  0.6321],
        [ 0.6326, -1.1486,  0.4654,  ..., -0.1198, -0.2942,  0.7170],
        [-0.5625, -0.4769, -0.2703,  ..., -0.2096,  2.7452,  0.0381],
        ...,
        [-0.8613, -0.7648,  0.0450,  ...,  0.7648, -0.7838, -0.3014],
        [ 0.6326,  2.2099,  1.2010,  ...,  0.4315, -0.6047,  2.7537],
        [ 0.0351,  0.7385, -0.5856,  ..., -0.3378, -0.5778,  0.2927]])


In [6]:
#create an artificial neural network using classes
#it will accept a neural network Model
class ANN_Model(nn.Module):
    #initialize the class and the attributes created from the class
    #various data will be passed in from the nn.Model
    def __init__(self, input_features=8, hidden1=20, hidden2=20, out_features=2):
        super().__init__()
        #layer 1 connection is everything between the input value and first hidden layer
        #it's all of the different connections it's making
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
 
    #now build in forward propogation into the class 

    def forward(self, x):
        #apply activation functions
        #x is being redefined each time
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [7]:
torch.manual_seed(42)

#create instance of model that was created above
ann = ANN_Model()

In [8]:
#loss function
loss_function = nn.CrossEntropyLoss()

#optimizer: ways to improve the model on top of loss function
#lr is learning rate: how fast is it going to learn
optimizer = torch.optim.Adamax(ann.parameters(),lr=0.01)

In [9]:
#run model
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = ann.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss}')
        
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() 
    optimizer.step() #perform one optimization step each epoch

Epoch number: 1 with loss: 0.647470235824585
Epoch number: 11 with loss: 0.5535582304000854
Epoch number: 21 with loss: 0.4916539788246155
Epoch number: 31 with loss: 0.45121264457702637
Epoch number: 41 with loss: 0.4320707619190216
Epoch number: 51 with loss: 0.4135775864124298
Epoch number: 61 with loss: 0.3985571563243866
Epoch number: 71 with loss: 0.3826017379760742
Epoch number: 81 with loss: 0.36624693870544434
Epoch number: 91 with loss: 0.35049572587013245
Epoch number: 101 with loss: 0.3354971706867218
Epoch number: 111 with loss: 0.3211274743080139
Epoch number: 121 with loss: 0.3066723644733429
Epoch number: 131 with loss: 0.29085466265678406
Epoch number: 141 with loss: 0.2746669054031372
Epoch number: 151 with loss: 0.25882044434547424
Epoch number: 161 with loss: 0.24144339561462402
Epoch number: 171 with loss: 0.2244419902563095
Epoch number: 181 with loss: 0.20857742428779602
Epoch number: 191 with loss: 0.1923830360174179
Epoch number: 201 with loss: 0.17724101245403

In [10]:
#predictions
y_pred = []

with torch.no_grad(): #decreases memory consumption
    for i, data in enumerate(X_test):
        prediction = ann(data)
        y_pred.append(prediction.argmax())

In [11]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.78      0.75      0.76       150
           1       0.57      0.62      0.59        81

    accuracy                           0.70       231
   macro avg       0.68      0.68      0.68       231
weighted avg       0.71      0.70      0.70       231



For the model above, I used Adamax to replace Adam as the optimizer and the performance of the model improved in predicting people who do have diabetes. Yet, the overall performance of predicting those who do have diabetes is still low. 
Adamax accelerates the optimization process; it adapts a seperate learning rate for each parameter. Adamax uses the infinity norm and is less susceptible to gradient noise.


Links for me to review:
https://machinelearningknowledge.ai/pytorch-optimizers-complete-guide-for-beginner/
https://machinelearningmastery.com/gradient-descent-optimization-with-adamax-from-scratch/
https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c#:~:text=The%20learning%20rate%20of%20AdaGrad,a%20saddle%20point%20much%20better.

# 2. Write a function that lists and counts the number of divisors for an input value.
Example 1:

Input: 5

Output: “There are 2 divisors: 1 and 5”

In [13]:
def print_divisors(x):
    #print("The divisors of",x,"are:")
    count = 0
    div = []
    for i in range(1, x + 1):
        #print(i)
        if x % i == 0:
            count = count + 1
            div.append(i)
            
#make sure return is outside of for loop
    print('There are', count, 'divisors:', div) 
print_divisors(8)

There are 4 divisors: [1, 2, 4, 8]


# 2a. Example 2:
# Input: 40
# Output: “There are 8 divisors: 1, 2, 4, 5, 8, 10, 20, and 40”

In [14]:
def print_divisors(x):
    #print("The divisors of",x,"are:")
    count = 0
    div = []
    for i in range(1, x + 1):
        #print(i)
        if x % i == 0:
            count = count + 1
            div.append(i)
            
#make sure return is outside of for loop
    print('There are', count, 'divisors:', div)
print_divisors(40)

There are 8 divisors: [1, 2, 4, 5, 8, 10, 20, 40]
