1. Look up the Adam optimization functions in PyTorch https://pytorch.org/docs/stable/optim.html . How does it work? 

- PyTorch optimizers are useful for minimizing the error rate while training the neural networks. 


#### When neural network training is taking place... 
    - The weights or strength of the connection between units of the network, are first randomly initialized. 
    - Then these weights are updated in each iteration (each epoch) so that they increase accuracy and decrease the error(loss) untill eventually we get good weights.  
    
    
#### In each epoch... 
    - The output of the training data is compared to actual data by.. 
    - Calculating the error using the loss function and then the weight is updated accordingly to that error. 


#### Adam Optimizer
Adam Optimizer updates network weights iteratively in training data by using both momentum and adaptive learning rate as opposed to using just a single learning rate for all the weights. So the learning rate changes during the training process. 

#### Review on Stochatic Gradient Descent
- A iterative algorithm that...
        
        - randomly picks 1 sample per step to calculate the slopes to predict values.
        -OR-
        - the algorithm starts from a random point on the function and travels down its slope in steps untill eventually it gets to the lowest point of that function. 

In [20]:
import pandas as pd
import torch

diabetes_df = pd.read_csv("diabetes copy2.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [41]:
len(diabetes_df)

768

In [31]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42, stratify=y)

# #Standardize
sc= StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.fit_transform(X_test)

In [32]:
print(X_train)
print(y_train)

[[ 0.93138344  2.0179454   0.78066953 ...  0.43148259 -0.37477883
   0.63212912]
 [ 0.63260632 -1.14861888  0.46538785 ... -0.1198324  -0.29416766
   0.71699246]
 [-0.56250219 -0.47692343 -0.2702694  ... -0.20958135  2.74517192
   0.03808578]
 ...
 [-0.86127931 -0.76479291  0.04501228 ...  0.76483585 -0.78380586
  -0.30136756]
 [ 0.63260632  2.20985838  1.2010451  ...  0.43148259 -0.60466993
   2.75371249]
 [ 0.03505207  0.73852549 -0.58555107 ... -0.33779414 -0.57779954
   0.29267578]]
[1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1
 1 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0
 0 0 1 0 0 0 1 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 1 1
 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1
 0 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0
 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0 0 1
 0 1 0 1 0 1 0 1 0 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 1 1 1

In [33]:
import torch.nn as nn
import torch.nn.functional as F #where the activation functions are

#create tensors = matrices 
X_train = torch.FloatTensor(X_train) 
X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

print(X_train)

tensor([[ 0.9314,  2.0179,  0.7807,  ...,  0.4315, -0.3748,  0.6321],
        [ 0.6326, -1.1486,  0.4654,  ..., -0.1198, -0.2942,  0.7170],
        [-0.5625, -0.4769, -0.2703,  ..., -0.2096,  2.7452,  0.0381],
        ...,
        [-0.8613, -0.7648,  0.0450,  ...,  0.7648, -0.7838, -0.3014],
        [ 0.6326,  2.2099,  1.2010,  ...,  0.4315, -0.6047,  2.7537],
        [ 0.0351,  0.7385, -0.5856,  ..., -0.3378, -0.5778,  0.2927]])


In [34]:
#number of nodes (hidden1 and hidden2)(preceptons?)



#artificial neural network
class ANN_Model(nn.Module):
    def __init__(self, input_features=8,hidden1=20,hidden2=20,out_features=2):
        super().__init__() #super is a computed indirect reference. So, it isolates changes
        # and makes sure that children in the layers of multiple inheritence are calling
        #the right parents
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
        
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [35]:
torch.manual_seed(42)

#create instance of model
ann = ANN_Model()

In [36]:
#loss function
loss_function = nn.CrossEntropyLoss()

Try at least one other optimization function with the diabetes dataset shown in class. 

Adaptive Gradient Algorithm

- AdaGrad’s learning rate (for each parameter) helps increases the learning rate for sparser parameters. Thus, AdaGrad works well for thinly distributed gradients since the algorithm performs smaller updates.  

In [37]:
#optimizer
optimizer = torch.optim.Adagrad(ann.parameters(),lr=0.01)

In [38]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = ann.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss}')
        
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() 
    optimizer.step() #perform one optimization step each epoch

Epoch number: 1 with loss: 0.647470235824585
Epoch number: 11 with loss: 0.5630926489830017
Epoch number: 21 with loss: 0.5061333775520325
Epoch number: 31 with loss: 0.47481316328048706
Epoch number: 41 with loss: 0.45581328868865967
Epoch number: 51 with loss: 0.4437910318374634
Epoch number: 61 with loss: 0.4347322881221771
Epoch number: 71 with loss: 0.4276331961154938
Epoch number: 81 with loss: 0.4215301275253296
Epoch number: 91 with loss: 0.41631075739860535
Epoch number: 101 with loss: 0.4113055169582367
Epoch number: 111 with loss: 0.40628811717033386
Epoch number: 121 with loss: 0.401469349861145
Epoch number: 131 with loss: 0.3970152735710144
Epoch number: 141 with loss: 0.392855167388916
Epoch number: 151 with loss: 0.3887157440185547
Epoch number: 161 with loss: 0.3847091495990753
Epoch number: 171 with loss: 0.3805328607559204
Epoch number: 181 with loss: 0.37636613845825195
Epoch number: 191 with loss: 0.37252601981163025
Epoch number: 201 with loss: 0.36923617124557495

How does the model perform with the new optimizer? Did it perform better or worse than Adam? Why do you think that is?

In [39]:
#predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = ann(data)
        y_pred.append(prediction.argmax()) 

In [40]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.76      0.81      0.78       150
           1       0.60      0.52      0.56        81

    accuracy                           0.71       231
   macro avg       0.68      0.67      0.67       231
weighted avg       0.70      0.71      0.70       231



Overall, this Adagrad optimizer performed a little better than the Adam optimizer. I think this is because the dataset is not too big so it works well with an optimizer that performs smaller updates. 

2. Write a function that lists and counts the number of divisors for an input value.
    
        Example 1:
        Input: 5
        Output: “There are 2 divisors: 1 and 5”
        
        Example 2:
        Input: 40
        Output: “There are 8 divisors: 1, 2, 4, 5, 8, 10, 20, and 40”

In [46]:
def divisor_count (x):
    empty_list = []
    if x ==1:
        print("There is 1 divisor only: 1")
    for y in range(1, x+1):
        if x % y == 0:
            empty_list.append(y)
    #print(f'There are {len(empty_list)} divisors: {empty_list}')
    print(f'There are {len(empty_list)} divisors: {str((empty_list)[ :-1])[1:-1]} and {empty_list[-1]}')

In [47]:
divisor_count (40)

There are 8 divisors: 1, 2, 4, 5, 8, 10, 20 and 40


In [49]:
divisor_count (5)

There are 2 divisors: 1 and 5
