Look up the Adam optimization functions in PyTorch
https://pytorch.org/docs/stable/optim.html . How does it work? Try at least one other
optimization function with the diabetes dataset shown in class. How does the model
perform with the new optimizer? Did it perform better or worse than Adam? Why do you
think that is?


How does Adam work? 
It adjusts the learning rate as it does gradient descent, to ensure reasonable values throughout the weight optimization process. 

Out of the ones tested, Rprop had the best performance. Possibly it worked better in this situation because the parameters were more specific than for Adam. According to datacamp Adam is supposed to be the 'go to' for optimization so I'm a little surprised we were able to do better. I guess it isn't always the best case or else the others wouldn't exist. 

In [63]:
import pandas as pd
import torch

diabetes_df = pd.read_csv("diabetes.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [64]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42, stratify=y)

# #Standardize
sc= StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.fit_transform(X_test)

In [65]:
import torch.nn as nn
import torch.nn.functional as F #this has activation functions

# Creating tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

print(X_train)

tensor([[-0.8514, -0.9801, -0.4048,  ..., -0.6077,  0.3108, -0.7922],
        [ 0.3566,  0.1614,  0.4654,  ..., -0.3021, -0.1164,  0.5610],
        [-0.5494, -0.5045, -0.6223,  ...,  0.3726, -0.7649, -0.7076],
        ...,
        [-0.8514, -0.7582,  0.0303,  ...,  0.7800, -0.7861, -0.2847],
        [ 1.8665, -0.3142,  0.0303,  ..., -0.5695, -1.0194,  0.5610],
        [ 0.0546,  0.7322, -0.6223,  ..., -0.3149, -0.5770,  0.3073]])


In [66]:
class ANN_Model(nn.Module):
    def __init__(self, input_features=8, hidden1=20, hidden2=20, out_features =2):
        super().__init__()
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
    
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [67]:
torch.manual_seed(42)

#instantiate the model
model = ANN_Model()

In [68]:
# loss function
loss_function = nn.CrossEntropyLoss()

#optimizer
#ptimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
#optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)
#optimizer = torch.optim.LBFGS(model.parameters(), lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, history_size=100, line_search_fn=None) #doesn't work for this
#optimizer = torch.optim.SparseAdam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08) #sparseAdam doesn't work
#optimizer = torch.optim.SGD(model.parameters(), lr= 1, momentum=0, dampening=0, weight_decay=0, nesterov=False)
optimizer = torch.optim.Rprop(model.parameters(), lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))

In [69]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = model.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss.item()}')
    
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() #for backward propagation 
    optimizer.step() #performs one optimization step each epoch

Epoch number: 1 with loss: 0.647413969039917
Epoch number: 11 with loss: 0.4519883096218109
Epoch number: 21 with loss: 0.38703450560569763
Epoch number: 31 with loss: 0.3395463526248932
Epoch number: 41 with loss: 0.30658775568008423
Epoch number: 51 with loss: 0.2752005457878113
Epoch number: 61 with loss: 0.25215277075767517
Epoch number: 71 with loss: 0.22651635110378265
Epoch number: 81 with loss: 0.20444045960903168
Epoch number: 91 with loss: 0.1857936829328537
Epoch number: 101 with loss: 0.17048323154449463
Epoch number: 111 with loss: 0.1541174352169037
Epoch number: 121 with loss: 0.13827121257781982
Epoch number: 131 with loss: 0.1239677220582962
Epoch number: 141 with loss: 0.11416149139404297
Epoch number: 151 with loss: 0.10690716654062271
Epoch number: 161 with loss: 0.10015998780727386
Epoch number: 171 with loss: 0.09251970052719116
Epoch number: 181 with loss: 0.08521825820207596
Epoch number: 191 with loss: 0.07992039620876312
Epoch number: 201 with loss: 0.07561813

In [70]:

#predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = model(data)
        y_pred.append(prediction.argmax().item())




In [71]:
from sklearn.metrics import accuracy_score
a_score = accuracy_score(y_test, y_pred)
print(a_score)
#Adam = 0.6948051948051948
#adadelta with default parameters .7077922077922078
#SGD .6818181818
#final wiht Rprop = .7142857

0.7142857142857143


In [72]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

#Adam macro av =  .67 .68 .67
#adam weighted av = .7 .69 .7


#adadelta with default parameters macro av .68  .69  .68
#adadelta with default parameters weighted av .71 .71  .71

#SGD macro av .67  .68  .67
# SGD weighted avg .70 .68  .69

#final result with Rprop below

              precision    recall  f1-score   support

           0       0.79      0.77      0.78       100
           1       0.59      0.61      0.60        54

    accuracy                           0.71       154
   macro avg       0.69      0.69      0.69       154
weighted avg       0.72      0.71      0.72       154



Write a function that lists and counts the number of divisors for an input value.

Example 1:

Input: 5

Output: “There are 2 divisors: 1 and 5”

Example 2:

Input: 40

Output: “There are 8 divisors: 1, 2, 4, 5, 8, 10, 20, and 40

'def prime(x):\n    if x >= 2:\n        for y in range(2,x):\n            if not ( x % y ):\n                return False\n    else:\n        return False\n    return True\nprime(0)'

In [171]:
def divise_me(n): 
    divisor_list= []
    numbers = [x for x in range(n+1)]
    if n == 1: 
        return print('The only divisor for 1 is itself. ')
        
    for number in numbers:
        if number == 0:
            continue
        #elif number == 1: 
         #  print("The only divisor is itself")
        if n%number == 0:
            divisor_list.append(number)
    #end = ', '.join([str(x) for x in divisor_list]) + ' and ' + str(divisor_list[-1]) - have to slice the divisor list the first time 
    # or it gives you the entire list and you don't get the and in there. 
    end = ', '.join([str(n) for n in divisor_list[0:-1]]) + ' and ' + str(divisor_list[-1])    
    return print("There are " + str(len(divisor_list)) + " divisors: " + end)
    print(num_list)


divise_me(44)

There are 6 divisors: 1, 2, 4, 11, 22 and 44


In [172]:
divise_me(1)

The only divisor for 1 is itself. 


In [174]:
divise_me(88)

There are 8 divisors: 1, 2, 4, 8, 11, 22, 44 and 88


In [175]:
divise_me(8)

There are 4 divisors: 1, 2, 4 and 8


In [177]:
divise_me(5)

There are 2 divisors: 1 and 5


In [178]:
divise_me(40)

There are 8 divisors: 1, 2, 4, 5, 8, 10, 20 and 40
