# In-Class Activity, Week 18, Heather Leighton-Dick

## 1. Look up the Adam optimization functions in PyTorch https://pytorch.org/docs/stable/optim.html . How does it work? Try at least one other optimization function with the diabetes dataset shown in class. How does the model perform with the new optimizer? Did it perform better or worse than Adam? Why do you think that is?

In [14]:
import pandas as pd
import numpy as np
import torch.nn as nn
import torch

In [15]:
diabetes_df = pd.read_csv("../Homework14/diabetes.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [16]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42, stratify=y)

# #Standardize
sc= StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.fit_transform(X_test)
# print(X_train)
# print(y_train)

In [19]:
import torch.nn as nn
import torch.nn.functional as F #where the activation functions are

#create tensors = matrices 
X_train = torch.FloatTensor(X_train) 
X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

print(X_train)

tensor([[ 0.9314,  2.0179,  0.7807,  ...,  0.4315, -0.3748,  0.6321],
        [ 0.6326, -1.1486,  0.4654,  ..., -0.1198, -0.2942,  0.7170],
        [-0.5625, -0.4769, -0.2703,  ..., -0.2096,  2.7452,  0.0381],
        ...,
        [-0.8613, -0.7648,  0.0450,  ...,  0.7648, -0.7838, -0.3014],
        [ 0.6326,  2.2099,  1.2010,  ...,  0.4315, -0.6047,  2.7537],
        [ 0.0351,  0.7385, -0.5856,  ..., -0.3378, -0.5778,  0.2927]])


In [56]:
#aritficial neural network
#creating a class using the neural network module
class ANN_Model(nn.Module):
    
    #uses the parameters for nn.Module, check documentation
    def __init__(self, input_features=8, hidden1=20, hidden2=20, out_features=2):
        
        # keyword super is a computed indirect reference, 
        # iolates changes and makes sure children in the layers of multiple inheritance
        # are calling the correct parents
        super().__init__() 
        
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
        
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [147]:
torch.manual_seed(42)

#create instance of model
ann = ANN_Model()

### Adam Optimizer

In [176]:
#loss function
loss_function = nn.CrossEntropyLoss()

# Adam optimizer
optimizer = torch.optim.Adam(ann.parameters(),lr=0.1)

In [181]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = ann.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1:
        #print(f'Epoch number:{epoch} with loss: {loss}')
    
    # impliment optimizer
    # gradient descent - zero the gradient before running backwards propagation
        optimizer.zero_grad()
        loss.backward()
    #perform optimization step for each epoch
        optimizer.step()

In [183]:
#predictions
y_pred = []

with torch.no_grad(): #decreases memory consumption
    for i, data in enumerate(X_test):
        prediction = ann(data)
        y_pred.append(prediction.argmax()) #returns index with max element in each prediction set

In [184]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.83      0.57      0.68       150
           1       0.50      0.79      0.61        81

    accuracy                           0.65       231
   macro avg       0.67      0.68      0.65       231
weighted avg       0.72      0.65      0.66       231



### Rprop Optimizer

In [185]:
#loss function
loss_function = nn.CrossEntropyLoss()

# Adam optimizer
optimizer = torch.optim.Rprop(ann.parameters(),lr=0.1)

In [186]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = ann.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1:
        #print(f'Epoch number:{epoch} with loss: {loss}')
    
    # impliment optimizer
    # gradient descent - zero the gradient before running backwards propagation
        optimizer.zero_grad()
        loss.backward()
    #perform optimization step for each epoch
        optimizer.step()

In [188]:
#predictions
y_pred = []

with torch.no_grad(): #decreases memory consumption
    for i, data in enumerate(X_test):
        prediction = ann(data)
        y_pred.append(prediction.argmax()) 

In [189]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.80      0.64      0.71       150
           1       0.51      0.70      0.59        81

    accuracy                           0.66       231
   macro avg       0.66      0.67      0.65       231
weighted avg       0.70      0.66      0.67       231



### In terms of recall, the Adam optimizer outperformed Rprop in the Recall (1) class by 9 percentage points (significant in terms of diagnosing people correctly). However, in the Recall (0) class, Rprop performed better. In terms of precision, they performed at about the same level.

### Rprop stands for Resilient Propagation and works by adapting the step size (learning rate) for each network weight at each step for each parameter. The Adam (Adaptive Moment Estimation) optimizer takes that part of Rprop and adds it to the way it looks at 1st and 2nd moments (which I think is a way of taking more advantage of the networks/hidden layers (?). Therefore, it makes sense that Rprop and Adam performed fairly similarly, but Adam was a little more able to make definite positive diagnoses.

## 2. Write a function that lists and counts the number of divisors for an input value.
Example 1:
Input: 5
Output: “There are 2 divisors: 1 and 5”
Example 2:
Input: 40
Output: “There are 8 divisors: 1, 2, 4, 5, 8, 10, 20, and 40”

In [141]:
def factors(x):
    factor_list = [x]
    
    for i in range(1, (x//2 + 1)):
        if x % i == 0:
            factor_list.append(i)
        else:
            pass

    print("There are " + str(len(factor_list)) + " divisors in " + str(x) + ": " + str(factor_list[1:i]).lstrip('[').rstrip(']') + ", and " + str(factor_list[0]) + ".")
    

In [142]:
factors(44)

There are 6 divisors in 44: 1, 2, 4, 11, 22, and 44.
