### Lab 2.2: Perceptron Algorithm in PyTorch

In this lab you will again implement the perceptron algorithm, but this time using PyTorch.

In [1]:
!pip install torch

In [2]:
import numpy as np
import torch

PyTorch's basic functionality closely resembles NumPy's.  In PyTorch, arrays are called tensors.

In [3]:
a = torch.tensor(5)
a

tensor(5)

In [4]:
b = torch.tensor(6)
a+b

tensor(11)

In [5]:
c = torch.zeros(3,5).float()
c

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

*A note on broadcasting:* You may have noticed in the previous lab that NumPy is particular about the sizes of the arrays in operations; PyTorch is the same way.

For example, if `A` has shape `(10,5)` and `b` has shape `(10,)`, then we can't compute `A*b`: broadcasting aligns the *last* dimensions, not the first ones.  One fix is to transpose `A` and compute `A.T*b`, which has shape `(5,10)`.

In [6]:
A = np.random.normal(size=(10,5))
b = np.ones(10)

In [7]:
try:
    A*b
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (10,5) (10,) 


In [8]:
A.T*b

array([[ 0.38847881, -0.5626746 ,  1.25907934,  0.64232121, -0.1277998 ,
        -0.13425051, -0.07241136,  0.28927276, -1.45624085, -2.0434823 ],
       [-1.07647421, -0.94679011, -0.64089027, -0.97784175, -1.10750976,
        -0.56736301, -0.38424279, -0.41216369,  0.46051863, -0.63099655],
       [ 0.56850151,  0.73189845,  1.397011  , -0.85433539,  0.88051217,
         0.02515673, -1.31779655,  1.12472672, -0.46994546,  0.35280928],
       [-0.3034394 ,  0.2922913 ,  1.59458565,  0.6403556 , -1.27600199,
         1.58219226, -1.879448  ,  1.25675627, -0.86391803, -0.3213449 ],
       [-1.17705543,  0.73517856, -0.03265319,  2.17832456, -1.67149913,
         1.24812458,  0.64249916, -0.86899086, -0.8901553 ,  0.96526974]])

An alternative is to give `b` an extra trailing dimension of size one, so that it has shape `(10,1)`.  Note that the result then has shape `(10,5)`, the transpose of the previous result.

In [9]:
A*b[:,None]

array([[ 0.38847881, -1.07647421,  0.56850151, -0.3034394 , -1.17705543],
       [-0.5626746 , -0.94679011,  0.73189845,  0.2922913 ,  0.73517856],
       [ 1.25907934, -0.64089027,  1.397011  ,  1.59458565, -0.03265319],
       [ 0.64232121, -0.97784175, -0.85433539,  0.6403556 ,  2.17832456],
       [-0.1277998 , -1.10750976,  0.88051217, -1.27600199, -1.67149913],
       [-0.13425051, -0.56736301,  0.02515673,  1.58219226,  1.24812458],
       [-0.07241136, -0.38424279, -1.31779655, -1.879448  ,  0.64249916],
       [ 0.28927276, -0.41216369,  1.12472672,  1.25675627, -0.86899086],
       [-1.45624085,  0.46051863, -0.46994546, -0.86391803, -0.8901553 ],
       [-2.0434823 , -0.63099655,  0.35280928, -0.3213449 ,  0.96526974]])

In [10]:
A*np.expand_dims(b,-1)

array([[ 0.38847881, -1.07647421,  0.56850151, -0.3034394 , -1.17705543],
       [-0.5626746 , -0.94679011,  0.73189845,  0.2922913 ,  0.73517856],
       [ 1.25907934, -0.64089027,  1.397011  ,  1.59458565, -0.03265319],
       [ 0.64232121, -0.97784175, -0.85433539,  0.6403556 ,  2.17832456],
       [-0.1277998 , -1.10750976,  0.88051217, -1.27600199, -1.67149913],
       [-0.13425051, -0.56736301,  0.02515673,  1.58219226,  1.24812458],
       [-0.07241136, -0.38424279, -1.31779655, -1.879448  ,  0.64249916],
       [ 0.28927276, -0.41216369,  1.12472672,  1.25675627, -0.86899086],
       [-1.45624085,  0.46051863, -0.46994546, -0.86391803, -0.8901553 ],
       [-2.0434823 , -0.63099655,  0.35280928, -0.3213449 ,  0.96526974]])

In general, carefully check the sizes of all arrays in your code!
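The same shape rules can be checked in PyTorch. The sketch below mirrors the NumPy example with tensors; the variable names are illustrative only. One difference worth knowing is that PyTorch reports the mismatch as a `RuntimeError` rather than a `ValueError`.

```python
import torch

A = torch.randn(10, 5)   # same shapes as the NumPy example above
b = torch.ones(10)

# Shapes (10, 5) and (10,) do not broadcast: the trailing dimensions 5 and 10 differ.
try:
    A * b
    failed = False
except RuntimeError as e:   # PyTorch raises RuntimeError here, not ValueError
    failed = True
    print(e)

# Both forms add a trailing axis of size one, so b is treated as shape (10, 1).
scaled_index = A * b[:, None]
scaled_unsqueeze = A * b.unsqueeze(1)

print(scaled_index.shape)
print(torch.equal(scaled_index, scaled_unsqueeze))
```

`b.unsqueeze(1)` plays the role of `np.expand_dims(b, -1)` for a one-dimensional tensor.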

In [11]:
from palmerpenguins import load_penguins
from mlxtend.plotting import plot_decision_regions
from matplotlib import pyplot as plt

Here we load and format the Palmer penguins dataset for binary classification.

In [12]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# tricky code to randomly shuffle the rows
df = df.sample(frac=1).reset_index(drop=True)

# select only two species
df = df[(df['species']=='Adelie')|(df['species']=='Chinstrap')]

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to -1 and 1
y = df['species'].map({'Adelie':-1,'Chinstrap':1}).values

To make the learning algorithm work more smoothly, we will subtract the mean of each feature.

Here `np.mean` calculates a mean, and `axis=0` tells NumPy to calculate the mean over the rows (calculate the mean of each column).
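To make the effect of `axis` concrete, here is a small sketch on a made-up 3×2 array (`X_demo` is illustrative and not part of the lab data). It compares `axis=0` with `axis=1` and confirms that subtracting the column means leaves each feature with zero mean.

```python
import numpy as np

# Toy data: 3 examples (rows), 2 features (columns)
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

col_means = np.mean(X_demo, axis=0)   # collapses the rows: one mean per column, shape (2,)
row_means = np.mean(X_demo, axis=1)   # collapses the columns: one mean per row, shape (3,)

# (3, 2) - (2,) broadcasts because the last dimensions match
centered = X_demo - col_means

print(col_means)               # column means are 2 and 20
print(row_means)               # row means are 5.5, 11 and 16.5
print(centered.mean(axis=0))   # each centered feature now has mean 0
```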

In [13]:
X -= np.mean(X,axis=0)

Now we will convert our `X` and `y` arrays to PyTorch tensors.

In [14]:
X = torch.tensor(X).float()
y = torch.tensor(y).float()
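The `.float()` calls matter because `torch.tensor` keeps the dtype of the NumPy array it is given, while tensors created by PyTorch default to `float32`. The sketch below uses small made-up arrays (`X_np`, `y_np` are illustrative) to show the dtypes before and after the conversion.

```python
import numpy as np
import torch

X_np = np.array([[0.5, -1.0], [2.0, 3.0]])   # NumPy floats default to float64
y_np = np.array([-1, 1])                      # integer labels

X_t = torch.tensor(X_np)
y_t = torch.tensor(y_np)
print(X_t.dtype, y_t.dtype)   # dtypes are inherited from the NumPy arrays

# .float() converts to float32, PyTorch's default floating-point type
X_f = X_t.float()
y_f = y_t.float()
print(X_f.dtype, y_f.dtype)

# Weights created by PyTorch are float32, so the matrix product dtypes now agree
w = torch.zeros(2)
z = X_f @ w
print(z.dtype)
```

Without the conversion, `X_t @ w` would mix `float64` and `float32` operands, which PyTorch does not allow in a matrix product.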

In [None]:
X

### Exercises

Your task is to again complete this class for the perceptron, with two changes from last time:
- the implementation should use PyTorch tensors, not NumPy arrays;
- `train_step` now accepts the entire dataset as input and should calculate the average gradient over all examples, rather than updating the weights one data point at a time.
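The difference between the two update styles can be sketched on a toy dataset. This is only an illustration under an assumption: the lecture's rule is not reproduced here, so the sketch takes the per-example update for a misclassified point to be `w <- w + lr * y_i * x_i`. The helper names `per_example_step` and `batch_step` and the toy data are hypothetical.

```python
import torch

def per_example_step(w, b, X, y, lr):
    """Visit the points one at a time and update after each misclassified one."""
    w, b = w.clone(), b.clone()
    for x_i, y_i in zip(X, y):
        if y_i * (x_i @ w + b) <= 0:
            w += lr * y_i * x_i
            b += lr * y_i
    return w, b

def batch_step(w, b, X, y, lr):
    """One update from the average over all N examples (correct ones contribute zero)."""
    mask = (y * (X @ w + b) <= 0).float()                     # 1.0 where misclassified, shape (N,)
    grad_w = torch.mean((mask * y).unsqueeze(1) * X, dim=0)   # shape (2,)
    grad_b = torch.mean(mask * y)                             # scalar
    return w + lr * grad_w, b + lr * grad_b

# Toy data: 4 points with 2 features each, labels in {-1, 1}
X_toy = torch.tensor([[1.0, 2.0], [-1.0, -1.0], [2.0, 0.5], [-2.0, 1.0]])
y_toy = torch.tensor([1.0, -1.0, 1.0, -1.0])
w0, b0 = torch.zeros(2), torch.zeros(1)

w_seq, b_seq = per_example_step(w0, b0, X_toy, y_toy, lr=0.1)
w_bat, b_bat = batch_step(w0, b0, X_toy, y_toy, lr=0.1)
print(w_seq, b_seq)
print(w_bat, b_bat)
```

The per-example version changes the weights partway through the pass, so later points are judged against updated weights. The batch version evaluates every point against the same weights and applies a single combined update.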

In [51]:
class Perceptron:
    def __init__(self,lr=1e-3):
        # store the learning rate
        self.lr = lr

        # initialize the weights to small, normally-distributed values
        self.w = torch.normal(mean=0, std=0.01, size=(2,))

        # initialize the bias to zero
        self.b = torch.zeros(1)

    def train_step(self,X:torch.Tensor,y:torch.Tensor) -> None:
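        """ Apply the first update rule shown in lecture.
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
        """
        # WRITE CODE HERE
        # update rule: w' = w + syx
        # raw scores for every example, shape (N,)
        z = X @ self.w + self.b

        # boolean mask of misclassified examples (wrong side of, or on, the boundary)
        misclassified = y * z <= 0

        # sum the updates over misclassified examples only; dividing by N would
        # give the average instead, which is the same update with lr scaled by 1/N
        self.w += self.lr * torch.sum(misclassified.unsqueeze(1) * y.unsqueeze(1) * X, dim=0)
        self.b += self.lr * torch.sum(misclassified * y)
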
    def predict(self,X:torch.Tensor) -> torch.Tensor:
        """ Calculate model prediction for all data points.
            Arguments:
             X: data matrix of shape (N,2)
            Returns:
             Predicted labels (-1 or 1) of shape (N,)
        """
        # WRITE CODE HERE
        z = X @ self.w + self.b  
        return torch.where(z>0,1,-1)
    
    def score(self,X:torch.Tensor,y:torch.Tensor) -> torch.Tensor:
        """ Calculate model accuracy
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
            Returns:
             Accuracy score
        """
        # WRITE CODE HERE
        pred = self.predict(X)
        return torch.mean((pred == y).float())

Run the following code to train the model and print out the accuracy at each step.

In [52]:
lr = 1e-3
epochs = 100
model = Perceptron(lr)
for i in range(epochs):
    model.train_step(X,y)
    print(f'step {i}: {model.score(X,y)}')

step 0: 0.6915887594223022
step 1: 0.6168224215507507
step 2: 0.7850467562675476
step 3: 0.855140209197998
step 4: 0.9345794320106506
step 5: 0.9392523169517517
step 6: 0.9252336621284485
step 7: 0.9345794320106506
step 8: 0.9345794320106506
step 9: 0.9392523169517517
step 10: 0.9439252614974976
step 11: 0.9439252614974976
step 12: 0.9485981464385986
step 13: 0.9439252614974976
step 14: 0.9485981464385986
step 15: 0.9485981464385986
step 16: 0.9485981464385986
step 17: 0.9532710313796997
step 18: 0.9626168012619019
step 19: 0.9626168012619019
step 20: 0.9626168012619019
step 21: 0.9626168012619019
step 22: 0.9532710313796997
step 23: 0.9532710313796997
step 24: 0.9532710313796997
step 25: 0.9579439163208008
step 26: 0.9579439163208008
step 27: 0.9672897458076477
step 28: 0.9579439163208008
step 29: 0.9626168012619019
step 30: 0.9626168012619019
step 31: 0.9626168012619019
step 32: 0.9626168012619019
step 33: 0.9626168012619019
step 34: 0.9579439163208008
step 35: 0.9579439163208008

Run the training multiple times.  Is the training the same each time, or does it vary?  Why?

The training results vary from run to run because the weights are initialized randomly, but the accuracy is around 

Play with the learning rate and number of epochs to find the best setting.
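One way to organize such an experiment is a small grid sweep. The sketch below is self-contained and makes several assumptions: it uses synthetic two-blob data rather than the penguins dataset, a hypothetical helper `train_perceptron` that applies a summed batch update, and a fixed seed so that repeated runs start from the same weights. The numbers it prints will not match the lab's results.

```python
import torch

def train_perceptron(X, y, lr, epochs, seed=0):
    """Hypothetical helper: batch perceptron training that returns final accuracy."""
    gen = torch.Generator().manual_seed(seed)   # fixed seed gives a repeatable initialization
    w = torch.normal(mean=0.0, std=0.01, size=(2,), generator=gen)
    b = torch.zeros(1)
    for _ in range(epochs):
        mask = (y * (X @ w + b) <= 0).float()   # 1.0 where misclassified
        w = w + lr * torch.sum((mask * y).unsqueeze(1) * X, dim=0)
        b = b + lr * torch.sum(mask * y)
    pred = (X @ w + b > 0).float() * 2 - 1      # map True/False to 1/-1
    return torch.mean((pred == y).float()).item()

# Synthetic stand-in data: two Gaussian blobs with 50 points each
data_gen = torch.Generator().manual_seed(0)
X_pos = torch.randn(50, 2, generator=data_gen) + torch.tensor([2.0, 2.0])
X_neg = torch.randn(50, 2, generator=data_gen) - torch.tensor([2.0, 2.0])
X_syn = torch.cat([X_pos, X_neg])
y_syn = torch.cat([torch.ones(50), -torch.ones(50)])

# Try every combination of learning rate and epoch count
results = {}
for lr in [1e-4, 1e-3, 1e-2]:
    for epochs in [10, 100]:
        results[(lr, epochs)] = train_perceptron(X_syn, y_syn, lr, epochs)
        print(f'lr={lr}, epochs={epochs}: accuracy={results[(lr, epochs)]:.3f}')

best = max(results, key=results.get)
print('best setting:', best)
```

Changing the `seed` argument shows how much of the variation comes from the initialization alone, since the data and hyperparameters stay fixed.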