1. Look up the Adam optimization functions in PyTorch
https://pytorch.org/docs/stable/optim.html . How does it work? Try at least one other
optimization function with the diabetes dataset shown in class. How does the model
perform with the new optimizer? Did it perform better or worse than Adam? Why do you
think that is?

Adam optimization is an extension of stochastic gradient descent and can be used in place of classical stochastic gradient descent to update network weights more efficiently. The Adam optimizer combines the advantages of two other extensions of stochastic gradient descent:

- Adaptive Gradient Algorithm (AdaGrad) that maintains a per-parameter learning rate that improves performance on problems with sparse gradients
    
- Root Mean Square Propagation (RMSProp) that also maintains per-parameter learning rates, adapted based on the average of recent magnitudes of the gradients for the weight (i.e. how quickly it is changing). This means the algorithm does well on online and non-stationary problems (e.g. noisy data).

To put it simply, Adam uses Momentum and Adaptive Learning Rates to converge faster.
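
The update described above can be sketched for a single scalar weight. This is a minimal illustration, not PyTorch's implementation: the function name `adam_step` and the toy objective f(w) = (w - 3)^2 are assumptions made for the example, and the hyperparameter defaults follow the commonly cited Adam settings.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    # Momentum part: exponential moving average of the gradients
    m = beta1 * m + (1 - beta1) * grad
    # Adaptive part: exponential moving average of the squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, because m and v start at zero (t counts steps from 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Step size is scaled per parameter by the recent gradient magnitude
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem (assumed for illustration): minimize f(w) = (w - 3)^2
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3)  # derivative of (w - 3)^2
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)  # should end up close to 3
```

Note that on the very first step the update is roughly `lr` in size regardless of how large the gradient is, which is one way to see the "adaptive learning rate" at work.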


I chose AdamW as the new optimizer. AdamW performed similarly to Adam, with an accuracy of 0.69. The precision scores are similar for both classes. AdamW produced a recall of 0.56 for patients with diabetes instead of 0.49, while the recall for patients without diabetes remained the same. Although both optimizers reach the same accuracy, AdamW gives us a better recall for patients with diabetes, which is the main metric we are examining.
AdamW differs from Adam in that it applies weight decay directly to the weights instead of adding it to the gradient. It is reported to yield better training loss and models that generalize much better than those trained with Adam, allowing the new version to compete with stochastic gradient descent with momentum.
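
The difference between Adam with an L2 penalty and AdamW can be sketched on a single scalar weight. This is only an illustration under stated assumptions: the function `step` and its `decoupled` flag are hypothetical names, not part of PyTorch. In PyTorch itself, `torch.optim.AdamW` defaults to `weight_decay=0.01` while `torch.optim.Adam` defaults to `weight_decay=0`, so the run in this notebook applies some weight decay even though none is passed explicitly.

```python
import math

def step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
         weight_decay=0.0, decoupled=False):
    """One update for a scalar parameter; returns (param, m, v).

    decoupled=False: Adam with an L2 penalty (decay is added to the gradient).
    decoupled=True:  AdamW-style (decay shrinks the weight directly).
    """
    if decoupled:
        # Decay bypasses the adaptive scaling entirely
        param = param * (1 - lr * weight_decay)
    else:
        # Decay becomes part of the gradient and gets rescaled by sqrt(v_hat)
        grad = grad + weight_decay * param
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# One step from w = 1.0 with a zero loss gradient, so only the decay acts
p_adam, _, _ = step(1.0, 0.0, 0.0, 0.0, 1, weight_decay=0.1, decoupled=False)
p_adamw, _, _ = step(1.0, 0.0, 0.0, 0.0, 1, weight_decay=0.1, decoupled=True)
print(p_adam, p_adamw)
```

In the coupled version the decay term is normalized like any other gradient, so the weight moves by about `lr` no matter how small `weight_decay` is; in the decoupled version the shrinkage is exactly `lr * weight_decay * param`.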



In [43]:
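import pandas as pd
import torch
import torch.nn as nn            # used below for nn.Module, nn.Linear, nn.CrossEntropyLoss
import torch.nn.functional as F  # used below for F.relu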

diabetes_df = pd.read_csv("diabetes.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [44]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42, stratify=y)

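# Standardize: fit the scaler on the training set only, then reuse it for the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Convert to tensors: the model expects float features, CrossEntropyLoss expects integer labels
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)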

In [46]:
#artificial neural network
class ANN_Model(nn.Module):
    def __init__(self, input_features=8,hidden1=20,hidden2=20,out_features=2):
        # Initialize nn.Module first so the layers below are registered as parameters
        super().__init__()
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
        
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [47]:
torch.manual_seed(42)

#create instance of model
ann = ANN_Model()

In [56]:
#loss function
loss_function = nn.CrossEntropyLoss()

#optimizer
optimizer = torch.optim.AdamW(ann.parameters(),lr=0.01)

In [57]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
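    y_pred = ann(X_train)  # calling the module runs forward()
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss.item())  # store a plain float rather than the graph-attached tensor
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss.item()}')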
        
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() 
    optimizer.step() #perform one optimization step each epoch

Epoch number: 1 with loss: 0.512274444103241
Epoch number: 11 with loss: 0.43990558385849
Epoch number: 21 with loss: 0.4073721170425415
Epoch number: 31 with loss: 0.3721182644367218
Epoch number: 41 with loss: 0.3371720612049103
Epoch number: 51 with loss: 0.3019469380378723
Epoch number: 61 with loss: 0.27063336968421936
Epoch number: 71 with loss: 0.24318458139896393
Epoch number: 81 with loss: 0.2156248688697815
Epoch number: 91 with loss: 0.18953929841518402
Epoch number: 101 with loss: 0.16915617883205414
Epoch number: 111 with loss: 0.1481829434633255
Epoch number: 121 with loss: 0.1299981027841568
Epoch number: 131 with loss: 0.11233701556921005
Epoch number: 141 with loss: 0.09761093556880951
Epoch number: 151 with loss: 0.08329033106565475
Epoch number: 161 with loss: 0.0718306377530098
Epoch number: 171 with loss: 0.06095889210700989
Epoch number: 181 with loss: 0.051846738904714584
Epoch number: 191 with loss: 0.04463285580277443
Epoch number: 201 with loss: 0.037971783429

In [58]:
#predictions
y_pred = []

with torch.no_grad():
    for data in X_test:
        prediction = ann(data)
        y_pred.append(prediction.argmax().item())  # predicted class as a plain int

In [59]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.76      0.76      0.76       150
           1       0.56      0.56      0.56        81

    accuracy                           0.69       231
   macro avg       0.66      0.66      0.66       231
weighted avg       0.69      0.69      0.69       231
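
Since recall for the diabetic class is the metric being compared, it may help to see how precision and recall come out of the raw counts. The sketch below uses a hypothetical helper `precision_recall` and made-up labels chosen only for illustration; they are not the notebook's predictions.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels for illustration only
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true_demo, y_pred_demo)
print(precision, recall)
```

Recall answers "of the patients who actually have diabetes, how many did the model catch?", which is why it matters more than accuracy when missed cases are costly.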



2. Write a function that lists and counts the number of divisors for an input value.

In [42]:
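def div_count(num):
    """Print how many divisors num has, followed by the divisors themselves."""
    # Collect every i in 1..num that divides num evenly
    div = [i for i in range(1, num + 1) if num % i == 0]

    if len(div) == 1:
        # Only num = 1 has a single divisor, so there is nothing to join with "and"
        listing = str(div[0])
    else:
        listing = ', '.join(map(str, div[:-1])) + " and " + str(div[-1])

    print("There are " + str(len(div)) + ": " + listing)

div_count(40)
div_count(5)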

There are 8: 1, 2, 4, 5, 8, 10, 20 and 40
There are 2: 1 and 5
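
As a possible variant, the divisors can be returned rather than printed, and found by testing candidates only up to the square root of the input, since every divisor i below the root pairs with num // i above it. The name `divisors` is an assumption for this sketch, and it expects a positive integer.

```python
import math

def divisors(num):
    """Return the sorted list of divisors of a positive integer num."""
    small, large = [], []
    for i in range(1, math.isqrt(num) + 1):
        if num % i == 0:
            small.append(i)
            if i != num // i:          # avoid listing a square root twice
                large.append(num // i)
    return small + large[::-1]         # large was collected in descending order

divs = divisors(40)
print(len(divs), divs)  # 8 [1, 2, 4, 5, 8, 10, 20, 40]
```

Returning the list keeps the counting and the formatting separate, so the caller can use `len(divs)` for the count and format the output however it likes.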
