### Lab 2.2: Perceptron Algorithm in PyTorch

In this lab you will again implement the perceptron algorithm, but this time using PyTorch.

In [1]:
!pip install torch



In [2]:
import numpy as np
import torch

PyTorch is very similar to NumPy in its basic functionality.  In PyTorch arrays are called tensors.

In [3]:
a = torch.tensor(5)
a

tensor(5)

In [4]:
b = torch.tensor(6)
a+b

tensor(11)

In [5]:
c = torch.zeros(3,5).float()
c

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
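
To make the NumPy comparison concrete, here is a minimal sketch that builds the same array in both libraries and converts between them. The variable names (`a_np`, `a_t`, and so on) are made up for illustration and are not used elsewhere in the lab.

```python
import numpy as np
import torch

# build the same 2x3 array in both libraries
a_np = np.arange(6, dtype=np.float32).reshape(2, 3)
a_t = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# shape and dtype are inspected the same way
print(a_np.shape, a_np.dtype)
print(a_t.shape, a_t.dtype)

# convert in both directions
from_np = torch.from_numpy(a_np)   # shares memory with a_np
back_to_np = a_t.numpy()           # only works for tensors on the CPU

# reductions take `dim` in PyTorch where NumPy takes `axis`
col_sums_np = a_np.sum(axis=0)
col_sums_t = a_t.sum(dim=0)
print(col_sums_np, col_sums_t)
```

The main differences to watch for are naming ones, such as `dim` instead of `axis`, and the fact that a tensor can live on a GPU while a NumPy array cannot.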

*A note on broadcasting:* You may have noticed in the previous lab that NumPy is particular about the sizes of the arrays in operations; PyTorch is the same way.

For example, if `A` has shape `(10,5)` and `b` has shape `(10,)`, then we can't compute `A*b`.  Broadcasting aligns shapes starting from the *last* dimension, not the first, so the trailing sizes 5 and 10 conflict.  One fix is to compute `A.T*b`.

In [6]:
A = np.random.normal(size=(10,5))
b = np.ones(10)

In [7]:
try:
    A*b
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (10,5) (10,) 


In [8]:
A.T*b

array([[-1.18855324,  1.28726858,  0.51256884,  0.51814703, -1.00162589,
         0.68321507, -1.56497073,  0.87358068,  0.98523351,  0.10082051],
       [ 0.28205935, -0.55162952,  1.18200814, -0.38159949,  0.77415902,
         0.9491022 , -0.64211142, -0.80010773,  0.18593863,  0.69395376],
       [ 1.11494288,  2.74076169, -0.3282237 , -0.17072238, -0.98394359,
         0.77188869, -0.26801037,  0.04752706,  0.5715008 , -1.94777957],
       [-0.48944808,  2.92772875,  1.61479923,  0.6851468 , -0.56832597,
        -1.27275893, -0.37358047, -0.98451901,  1.39416489,  1.43298344],
       [ 0.7054441 ,  0.14236854, -0.13088255, -0.88995339, -0.97969072,
         0.34576788, -0.96749   , -1.58297491,  0.86171779, -0.47094699]])

An alternative is to add an extra dimension of size one to `b`, giving it shape `(10,1)`.  Note that the result is the transpose of the previous one.

In [9]:
A*b[:,None]

array([[-1.18855324,  0.28205935,  1.11494288, -0.48944808,  0.7054441 ],
       [ 1.28726858, -0.55162952,  2.74076169,  2.92772875,  0.14236854],
       [ 0.51256884,  1.18200814, -0.3282237 ,  1.61479923, -0.13088255],
       [ 0.51814703, -0.38159949, -0.17072238,  0.6851468 , -0.88995339],
       [-1.00162589,  0.77415902, -0.98394359, -0.56832597, -0.97969072],
       [ 0.68321507,  0.9491022 ,  0.77188869, -1.27275893,  0.34576788],
       [-1.56497073, -0.64211142, -0.26801037, -0.37358047, -0.96749   ],
       [ 0.87358068, -0.80010773,  0.04752706, -0.98451901, -1.58297491],
       [ 0.98523351,  0.18593863,  0.5715008 ,  1.39416489,  0.86171779],
       [ 0.10082051,  0.69395376, -1.94777957,  1.43298344, -0.47094699]])

In [10]:
A*np.expand_dims(b,-1)

array([[-1.18855324,  0.28205935,  1.11494288, -0.48944808,  0.7054441 ],
       [ 1.28726858, -0.55162952,  2.74076169,  2.92772875,  0.14236854],
       [ 0.51256884,  1.18200814, -0.3282237 ,  1.61479923, -0.13088255],
       [ 0.51814703, -0.38159949, -0.17072238,  0.6851468 , -0.88995339],
       [-1.00162589,  0.77415902, -0.98394359, -0.56832597, -0.97969072],
       [ 0.68321507,  0.9491022 ,  0.77188869, -1.27275893,  0.34576788],
       [-1.56497073, -0.64211142, -0.26801037, -0.37358047, -0.96749   ],
       [ 0.87358068, -0.80010773,  0.04752706, -0.98451901, -1.58297491],
       [ 0.98523351,  0.18593863,  0.5715008 ,  1.39416489,  0.86171779],
       [ 0.10082051,  0.69395376, -1.94777957,  1.43298344, -0.47094699]])

In general, carefully check the sizes of all arrays in your code!
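
The same rule can be checked in PyTorch. The sketch below repeats the NumPy experiment with tensors, using `A_t` and `b_t` as illustrative names. One difference is that PyTorch raises a `RuntimeError` rather than a `ValueError` on a shape mismatch.

```python
import torch

A_t = torch.randn(10, 5)
b_t = torch.ones(10)

# shapes (10,5) and (10,) do not broadcast: trailing sizes 5 and 10 differ
try:
    A_t * b_t
    failed = False
except RuntimeError as e:
    failed = True
    print(e)

# add a trailing axis of size one so b_t has shape (10,1)
out1 = A_t * b_t[:, None]
out2 = A_t * b_t.unsqueeze(-1)   # plays the role of np.expand_dims(b, -1)
print(out1.shape, out2.shape)
```

Both fixes give a result of shape `(10,5)`, matching the `A*b[:,None]` result above.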

In [11]:
from palmerpenguins import load_penguins
from mlxtend.plotting import plot_decision_regions
from matplotlib import pyplot as plt

Here we load and format the Palmer penguins dataset for binary classification.

In [12]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# tricky code to randomly shuffle the rows
df = df.sample(frac=1).reset_index(drop=True)

# select only two species
df = df[(df['species']=='Adelie')|(df['species']=='Chinstrap')]

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to -1 and 1
y = df['species'].map({'Adelie':-1,'Chinstrap':1}).values

To make the learning algorithm work more smoothly, we will subtract the mean of each feature.

Here `np.mean` calculates a mean, and `axis=0` tells NumPy to calculate the mean over the rows (calculate the mean of each column).

In [13]:
X -= np.mean(X, axis=0)
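
To see what `axis=0` does, here is a small sketch on a toy array (`X_toy` is a made-up example, not the penguin data). It contrasts column means with row means and checks that the centered columns average to zero.

```python
import numpy as np

X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

col_means = np.mean(X_toy, axis=0)   # one mean per column, shape (2,)
row_means = np.mean(X_toy, axis=1)   # one mean per row, shape (3,)

# broadcasting subtracts the (2,) vector of means from every row
X_centered = X_toy - col_means

print(col_means)
print(row_means)
print(X_centered.mean(axis=0))
```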

Now we will convert our `X` and `y` arrays to torch Tensors.

In [14]:
X = torch.tensor(X).float()
y = torch.tensor(y).float()

# move X and y to the GPU if possible
if torch.cuda.is_available():
    print('tensors moved to GPU')
    X = X.to('cuda')
    y = y.to('cuda')

tensors moved to GPU
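
A common alternative to the `if` block is to pick a device once and pass it around. This is a sketch of that idiom, not something the lab requires. The names `device`, `x_demo`, and `w_demo` are illustrative.

```python
import torch

# pick the GPU when one is present, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x_demo = torch.zeros(4, 2).to(device)   # move an existing tensor
w_demo = torch.ones(2, device=device)   # or create it on the device directly

# both operands of an operation must live on the same device
z_demo = x_demo @ w_demo
print(z_demo.device)
```

The same code then runs unchanged on machines with and without a GPU.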


In [15]:
X

tensor([[ 6.0794e+00,  3.1953e+00],
        [ 2.0794e+00,  9.6953e+00],
        [ 1.0794e+00, -4.2047e+00],
        [-2.9206e+00, -6.1047e+00],
        [-1.9206e+00, -2.4047e+00],
        [-1.0921e+01, -5.5047e+00],
        [ 3.0794e+00, -6.5047e+00],
        [-5.9206e+00, -2.4047e+00],
        [-5.9206e+00, -6.8047e+00],
        [-1.9921e+01, -4.1047e+00],
        [-3.9206e+00, -2.5047e+00],
        [-6.9206e+00, -8.0047e+00],
        [ 4.0794e+00,  2.0953e+00],
        [-1.3921e+01, -8.9047e+00],
        [ 3.0794e+00, -1.7047e+00],
        [-7.9206e+00, -5.6047e+00],
        [ 6.0794e+00,  8.1953e+00],
        [ 4.0794e+00,  8.8953e+00],
        [-3.9206e+00, -1.9047e+00],
        [-5.9206e+00, -2.5047e+00],
        [ 9.0794e+00,  9.3953e+00],
        [ 1.6079e+01, -1.2047e+00],
        [-2.9206e+00, -5.1047e+00],
        [-1.9206e+00, -3.5047e+00],
        [ 7.9439e-02, -4.7047e+00],
        [-4.9206e+00, -5.3047e+00],
        [ 7.0794e+00, -3.4047e+00],
        [-3.9206e+00, -3.404

### Exercises

Your task is to again complete this class for the perceptron, with two changes from last time:
- the implementation should use PyTorch tensors, not NumPy arrays;
- `train_step` now accepts the entire dataset as input and should calculate the average gradient over all examples, rather than updating the weights one data point at a time.
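
The sketch below illustrates the second change on random toy data. It assumes the update direction has the form `(y - z) * x` used in the class further down, and checks that a single vectorized `torch.mean` over the batch matches an explicit loop over examples. All names ending in `_demo` are made up for the illustration.

```python
import torch

torch.manual_seed(0)
X_demo = torch.randn(6, 2)                             # shape (N,2)
y_demo = torch.tensor([1., -1., 1., 1., -1., -1.])     # shape (N,)
w_demo = torch.zeros(2)
b_demo = torch.zeros(1)

z = X_demo @ w_demo + b_demo     # shape (N,)
err = y_demo - z                 # shape (N,)

# vectorized: scale each row of X_demo by its error, then average the rows
avg_vec = torch.mean(err.unsqueeze(-1) * X_demo, dim=0)   # shape (2,)

# loop: accumulate one example at a time, then divide by N
avg_loop = torch.zeros(2)
for x_i, e_i in zip(X_demo, err):
    avg_loop += e_i * x_i
avg_loop /= len(X_demo)

print(avg_vec, avg_loop)
```

The `unsqueeze(-1)` is what lets the `(N,)` error vector broadcast against the `(N,2)` data matrix.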

In [16]:
pip install jupyterlab_flake8

Note: you may need to restart the kernel to use updated packages.


In [17]:
class Perceptron:
    def __init__(self, lr=1e-3):
        # store the learning rate
        self.lr = lr

        # initialize the weights to small, normally-distributed values
        self.w = torch.normal(mean=0, std=0.01, size=(2,))  # shape (2,)

        # initialize the bias to zero
        self.b = torch.zeros(1)
        
        if torch.cuda.is_available():
            self.w = self.w.to('cuda')
            self.b = self.b.to('cuda')
                

    def train_step(self, X:torch.Tensor, y:torch.Tensor) -> None:
        """ Apply the first update rule shown in lecture.
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
        """
        z = X@self.w + self.b  # shape (N,)
        
        self.w += self.lr * torch.mean((y - z).unsqueeze(-1) * X, dim=0)
        self.b += self.lr * torch.mean(y - z)
    
    def predict(self,X:torch.Tensor) -> torch.Tensor:
        """ Calculate model prediction for all data points.
            Arguments:
             X: data matrix of shape (N,2)
            Returns:
             Predicted labels (-1 or 1) of shape (N,)
        """
        # WRITE CODE HERE
        z = X@self.w + self.b
        return torch.where(z > 0, 1, -1) 
        
    def score(self,X:torch.Tensor,y:torch.Tensor) -> torch.Tensor:
        """ Calculate model accuracy
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
            Returns:
             Accuracy score
        """
        # WRITE CODE HERE
        return torch.mean((self.predict(X) == y).float())
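
As a small check of the thresholding and accuracy logic, the sketch below applies the same two steps to hand-picked values. It passes tensors rather than Python scalars to `torch.where`, which is a conservative choice that should also work on older PyTorch versions. The names `z_demo` and `y_true` are illustrative.

```python
import torch

z_demo = torch.tensor([0.7, -1.2, 0.0, 2.5])
y_true = torch.tensor([1., -1., 1., 1.])

# threshold at zero; note that z == 0 is mapped to -1
ones = torch.ones_like(z_demo)
y_pred = torch.where(z_demo > 0, ones, -ones)

# accuracy is the fraction of predictions that match the labels
accuracy = torch.mean((y_pred == y_true).float())
print(y_pred, accuracy.item())
```

Three of the four predictions match here, so the accuracy is 0.75.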

Run the following code to train the model and print out the accuracy at each step.

In [18]:
lr = 1e-3
epochs = 100
model = Perceptron(lr)
for i in range(epochs):
    model.train_step(X,y)
    print(f'step {i}: {model.score(X,y)}')

step 0: 0.84112149477005
step 1: 0.84112149477005
step 2: 0.84112149477005
step 3: 0.84112149477005
step 4: 0.84112149477005
step 5: 0.8457943797111511
step 6: 0.8457943797111511
step 7: 0.8457943797111511
step 8: 0.8504672646522522
step 9: 0.8504672646522522
step 10: 0.8504672646522522
step 11: 0.8504672646522522
step 12: 0.8551401495933533
step 13: 0.8551401495933533
step 14: 0.8551401495933533
step 15: 0.8551401495933533
step 16: 0.8551401495933533
step 17: 0.8551401495933533
step 18: 0.8551401495933533
step 19: 0.8551401495933533
step 20: 0.8551401495933533
step 21: 0.8551401495933533
step 22: 0.8551401495933533
step 23: 0.8598130345344543
step 24: 0.8598130345344543
step 25: 0.8598130345344543
step 26: 0.8644859790802002
step 27: 0.8644859790802002
step 28: 0.8644859790802002
step 29: 0.8644859790802002
step 30: 0.8691588640213013
step 31: 0.8691588640213013
step 32: 0.8691588640213013
step 33: 0.8785046339035034
step 34: 0.8831775188446045
step 35: 0.8831775188446045
step 36: 0.8

Run the training multiple times.  Is the training the same each time, or does it vary?  Why?

The training is different each time because the weights are initialized randomly.
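
If repeatable runs are wanted, one option is to fix the random seed before the weights are drawn. The sketch below uses a hypothetical helper, `init_weights`, that mirrors the weight initialization in `Perceptron.__init__`. It is not part of the lab code.

```python
import torch

def init_weights(seed):
    # hypothetical helper: seed the generator, then draw the weights
    torch.manual_seed(seed)
    return torch.normal(mean=0.0, std=0.01, size=(2,))

w_a = init_weights(0)
w_b = init_weights(0)   # same seed, same weights
w_c = init_weights(1)   # different seed, different weights

print(w_a, w_b, w_c)
```

Calling `torch.manual_seed` with a fixed value just before `model = Perceptron(lr)` should make the training curve identical from run to run on the same machine.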

Play with the learning rate and number of epochs to find the best setting.

In [19]:
import itertools

lr_list = [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
epochs_list = [25, 50, 100, 200, 500, 1000]
results = []

for lr, epochs in itertools.product(lr_list, epochs_list):
    model = Perceptron(lr)
    for i in range(epochs):
        model.train_step(X,y)

    # convert the 0-d tensor to a Python float so the results sort cleanly
    accuracy = model.score(X,y).item()
    results.append((accuracy, f'LR: {lr}, epochs: {epochs}, accuracy: {accuracy}'))

# print the settings from highest to lowest accuracy
for _, result in sorted(results, reverse=True):
    print(result)

LR: 0.01, epochs: 500, accuracy: 0.9579439163208008
LR: 0.01, epochs: 200, accuracy: 0.9579439163208008
LR: 0.01, epochs: 1000, accuracy: 0.9579439163208008
LR: 0.01, epochs: 50, accuracy: 0.9532709717750549
LR: 0.01, epochs: 100, accuracy: 0.9532709717750549
LR: 0.001, epochs: 500, accuracy: 0.9532709717750549
LR: 0.001, epochs: 1000, accuracy: 0.9532709717750549
LR: 0.01, epochs: 25, accuracy: 0.9439252018928528
LR: 0.001, epochs: 200, accuracy: 0.9392523169517517
LR: 0.0001, epochs: 1000, accuracy: 0.9345794320106506
LR: 0.001, epochs: 100, accuracy: 0.9252336025238037
LR: 0.001, epochs: 50, accuracy: 0.8831775188446045
LR: 0.0001, epochs: 500, accuracy: 0.8831775188446045
LR: 1e-05, epochs: 50, accuracy: 0.8644859790802002
LR: 1e-05, epochs: 1000, accuracy: 0.8598130345344543
LR: 1e-05, epochs: 100, accuracy: 0.8551401495933533
LR: 0.001, epochs: 25, accuracy: 0.8551401495933533
LR: 0.0001, epochs: 200, accuracy: 0.8551401495933533
LR: 1e-05, epochs: 200, accuracy: 0.83644855022430

In this run, the best accuracy (about 0.958) came from a learning rate of 0.01 with 200, 500, or 1000 epochs.
