<a href="https://colab.research.google.com/github/dmtrung14/pyTorch_fundamentals/blob/main/PyTorch_Fundamentals_Classification_Problems.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch in Machine Learning Classification

Types of classification problems:
* Binary classification: spam filtering (spam or not spam)
* Multiclass classification: food identifier (one class out of many)
* Multilabel classification: which tags should this Wikipedia article have?

Don't confuse *multiclass* with *multilabel* classification: the first decides which single class an object belongs to, and the second decides which labels (possibly several) apply to it.

As we'll see, we will use `softmax` for the first and other approaches for the second.
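
To make the distinction concrete, here is a minimal sketch with made-up logits for three hypothetical classes (the values and variable names are illustrative, not taken from this notebook). Softmax yields probabilities that sum to 1, so exactly one class wins. An element-wise sigmoid, assumed here as one common choice for multilabel problems, scores each label independently.

```python
import torch

# Hypothetical raw model outputs (logits) for one sample and three classes/labels
logits = torch.tensor([2.0, 1.0, -1.0])

# Multiclass: softmax turns the logits into a single probability distribution
multiclass_probs = torch.softmax(logits, dim=0)
predicted_class = torch.argmax(multiclass_probs).item()  # index of the one winning class

# Multilabel: sigmoid scores every label independently, so several can pass the threshold
multilabel_probs = torch.sigmoid(logits)
predicted_labels = (multilabel_probs >= 0.5).int().tolist()

print(multiclass_probs.sum().item(), predicted_class, predicted_labels)
```

In the multiclass case only index 0 is selected, while the multilabel case switches on both of the first two labels.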


## Make classification data and get it ready

In [3]:
import sklearn

In [4]:
from sklearn.datasets import make_circles

#Make 1000 samples
n_samples = 1000

x,y = make_circles(n_samples, noise = 0.03, random_state= 42)

len(x), len(y), x[:5], y[:5]

(1000,
 1000,
 array([[ 0.75424625,  0.23148074],
        [-0.75615888,  0.15325888],
        [-0.81539193,  0.17328203],
        [-0.39373073,  0.69288277],
        [ 0.44220765, -0.89672343]]),
 array([1, 1, 1, 1, 0]))

In [5]:
# Make a DataFrame of the circle data
import pandas as pd
circles = pd.DataFrame({"X1": x[:,0], 
                        "X2": x[:,1],
                        "label": y})
circles.head(10)

| | X1 | X2 | label |
|---|---|---|---|
| 0 | 0.754246 | 0.231481 | 1 |
| 1 | -0.756159 | 0.153259 | 1 |
| 2 | -0.815392 | 0.173282 | 1 |
| 3 | -0.393731 | 0.692883 | 1 |
| 4 | 0.442208 | -0.896723 | 0 |
| 5 | -0.479646 | 0.676435 | 1 |
| 6 | -0.013648 | 0.803349 | 1 |
| 7 | 0.771513 | 0.14776 | 1 |
| 8 | -0.169322 | -0.793456 | 1 |
| 9 | -0.121486 | 1.021509 | 0 |


### Check input and output shapes


In [6]:
x.shape,y.shape

((1000, 2), (1000,))

### Turn data into tensors and create train and test splits

In [7]:
import torch

In [8]:
x = torch.from_numpy(x).type(torch.float)
y = torch.from_numpy(y).type(torch.float)
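
The `.type(torch.float)` call matters because NumPy arrays default to `float64`, while `nn.Linear` parameters default to `float32`; mixing the two typically raises a dtype mismatch error. A small sketch on a stand-in array (illustrative values, not the circle data):

```python
import numpy as np
import torch

# Stand-in for the NumPy array returned by make_circles
array = np.array([[0.75, 0.23], [-0.76, 0.15]])

as_is = torch.from_numpy(array)                        # keeps NumPy's float64 dtype
converted = torch.from_numpy(array).type(torch.float)  # torch.float is an alias for float32

print(as_is.dtype, converted.dtype)
```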

In [9]:
#split data into training and test split
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= 0.2, random_state = 42)
len(x_train), len(x_test), len(y_train), len(y_test)

(800, 200, 800, 200)

## Building a model

Let's build a model to classify our points into 2 circles. 
To do so, we want to:
1. Set up device-agnostic code so our code runs on an accelerator (GPU) if one is available.
2. Construct a model (by subclassing `nn.Module`)
3. Define a loss function and an optimizer
4. Create a training and test loop


In [10]:
import torch
from torch import nn

# Make device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

Now that we've set up device-agnostic code, let's create a model. We will:

1. Subclass ``nn.Module`` (almost all models in PyTorch subclass this class)
2. Create 2 `nn.Linear` layers that are capable of handling the shapes of our data
3. Define a `forward()` method that outlines the forward pass of the model
4. Instantiate our model class and send it to the target device


In [11]:
# Construct a model that subclasses nn.Module

class CircleModelV0(nn.Module):
  def __init__(self):
    super().__init__()
    self.layer_1 = nn.Linear(in_features = 2, out_features = 5) # Takes in 2 features and outputs 5 features
    self.layer_2 = nn.Linear(in_features = 5, out_features = 1) # Takes in 5 features and outputs 1 feature
  
  # Define a forward() method that outlines the forward pass
  def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.layer_2(self.layer_1(x)) # x -> layer 1 -> layer 2 -> output

# Instantiate an instance of our model class and send it to the target device

model_0 = CircleModelV0().to(device)
list(model_0.parameters())


[Parameter containing:
 tensor([[ 0.2558, -0.2993],
         [-0.6544, -0.4216],
         [ 0.5142,  0.4256],
         [ 0.6886,  0.6829],
         [-0.1417,  0.0173]], device='cuda:0', requires_grad=True),
 Parameter containing:
 tensor([-0.5412,  0.0892, -0.0575,  0.1808, -0.5940], device='cuda:0',
        requires_grad=True),
 Parameter containing:
 tensor([[-0.0765,  0.1801,  0.0993,  0.0444,  0.0437]], device='cuda:0',
        requires_grad=True),
 Parameter containing:
 tensor([-0.1903], device='cuda:0', requires_grad=True)]

In [12]:
device

'cuda'

## ``nn.Sequential`` to replicate the model in an easier way



In [13]:
model_0 = nn.Sequential(
    nn.Linear(in_features = 2, out_features = 5),
    nn.Linear(in_features = 5, out_features = 1)
).to(device)

model_0, model_0.state_dict()

(Sequential(
   (0): Linear(in_features=2, out_features=5, bias=True)
   (1): Linear(in_features=5, out_features=1, bias=True)
 ),
 OrderedDict([('0.weight',
               tensor([[-0.1123, -0.4245],
                       [-0.3988,  0.1052],
                       [ 0.5334,  0.3251],
                       [ 0.3103, -0.4958],
                       [ 0.0657,  0.2498]], device='cuda:0')),
              ('0.bias',
               tensor([ 0.0723,  0.6632, -0.5456,  0.0171,  0.6458], device='cuda:0')),
              ('1.weight',
               tensor([[ 0.0251, -0.2089,  0.1239, -0.0135, -0.3588]], device='cuda:0')),
              ('1.bias', tensor([0.3209], device='cuda:0'))]))

In [14]:
# Let's make some predictions with our new (untrained) model:
untrained_preds = model_0(x_test.to(device))
x_test[:10], untrained_preds[:10], torch.round(untrained_preds[:10])

(tensor([[-0.3752,  0.6827],
         [ 0.0154,  0.9600],
         [-0.7028, -0.3147],
         [-0.2853,  0.9664],
         [ 0.4024, -0.7438],
         [ 0.6323, -0.5711],
         [ 0.8561,  0.5499],
         [ 1.0034,  0.1903],
         [-0.7489, -0.2951],
         [ 0.0538,  0.9739]]),
 tensor([[-0.2114],
         [-0.1859],
         [-0.1753],
         [-0.2221],
         [-0.0116],
         [ 0.0027],
         [-0.0551],
         [-0.0105],
         [-0.1822],
         [-0.1824]], device='cuda:0', grad_fn=<SliceBackward0>),
 tensor([[-0.],
         [-0.],
         [-0.],
         [-0.],
         [-0.],
         [0.],
         [-0.],
         [-0.],
         [-0.],
         [-0.]], device='cuda:0', grad_fn=<RoundBackward0>))

### Set up loss function and optimizer
Which loss function or optimizer should you use? It depends on the problem.

For regression you might want MAE or MSE.

For classification you might want binary cross entropy or categorical cross entropy.

* For the loss function we are going to use ``torch.nn.BCEWithLogitsLoss()``, which combines a sigmoid layer with binary cross entropy and therefore takes raw logits as input.
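
`nn.BCEWithLogitsLoss` expects raw logits because it applies the sigmoid internally, which the PyTorch documentation describes as more numerically stable than a separate sigmoid followed by `nn.BCELoss`. The sketch below, on made-up logits and targets, checks that the two routes agree:

```python
import torch
from torch import nn

# Made-up logits and binary targets for illustration
logits = torch.tensor([-0.21, 0.35, 1.20, -1.50])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

# Route 1: pass raw logits, the sigmoid is applied inside the loss
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Route 2: apply the sigmoid by hand, then use plain binary cross entropy
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_from_logits.item(), loss_from_probs.item())
```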

In [15]:
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(params = model_0.parameters(), lr = 0.1)

In [16]:
# Calculate accuracy: out of 100 examples, what percentage does our model get right?
def accuracy_fn(y_true, y_pred):
  correct = torch.eq(y_true, y_pred).sum().item()
  acc = (correct/len(y_pred))*100
  return acc
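
As a quick sanity check, the function can be called on small hand-made tensors (the values below are illustrative). Both arguments should hold labels of the same shape, not logits or probabilities:

```python
import torch

def accuracy_fn(y_true, y_pred):
    # Same definition as above: percentage of positions where the labels match
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100

y_true = torch.tensor([1., 0., 1., 1.])
y_pred = torch.tensor([1., 0., 0., 1.])

acc = accuracy_fn(y_true, y_pred)  # 3 of the 4 labels match
print(acc)
```

Note that if `y_pred` has shape `[N, 1]` while `y_true` has shape `[N]`, `torch.eq` broadcasts the comparison to `[N, N]` and the count is wrong, so squeeze the predictions first.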

## 3. Train our Model

To train our model, we need to build a training loop, which includes:

1. Forward pass
2. Calculate the loss
3. Optimizer zero grad
4. Loss backward (backpropagation)
5. Optimizer step (gradient descent)
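
These five steps map onto a handful of calls. The following is a minimal sketch of a single training step on random stand-in data; the tiny model, the data, and the learning rate are assumptions for illustration, not this notebook's final loop:

```python
import torch
from torch import nn

torch.manual_seed(42)

# Stand-in data: 8 samples with 2 features each, plus binary labels
x = torch.randn(8, 2)
y = torch.randint(0, 2, (8,)).float()

model = nn.Sequential(nn.Linear(2, 5), nn.Linear(5, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model.train()
weights_before = model[0].weight.detach().clone()

logits = model(x).squeeze(dim=1)  # 1. forward pass (squeezed to shape [8] to match y)
loss = loss_fn(logits, y)         # 2. calculate the loss
optimizer.zero_grad()             # 3. reset gradients left over from any previous step
loss.backward()                   # 4. backpropagation
optimizer.step()                  # 5. gradient descent update

weights_changed = not torch.equal(weights_before, model[0].weight)
print(loss.item(), weights_changed)
```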

### 3.1 Going from raw logits -> prediction probabilities -> prediction labels

Our model outputs are going to be raw **logits**.

We can convert these logits into **prediction probabilities** by passing them to an activation function (e.g. sigmoid for binary classification and softmax for multiclass classification).

Then we can convert our model's prediction probabilities to **prediction labels** by either rounding them (binary) or taking the `argmax()` (multiclass).

In [17]:
#view the first 5 outputs of the forward pass on the test data
model_0.eval()
with torch.inference_mode():
  y_logits = model_0(x_test.to(device))[:5]
y_logits

tensor([[-0.2114],
        [-0.1859],
        [-0.1753],
        [-0.2221],
        [-0.0116]], device='cuda:0')

In [19]:
# Use the sigmoid activation function on our model logits to turn them into prediction probabilities
y_pred_probs = torch.sigmoid(y_logits)
y_pred_probs, torch.round(y_pred_probs)

(tensor([[0.4473],
         [0.4537],
         [0.4563],
         [0.4447],
         [0.4971]], device='cuda:0'),
 tensor([[0.],
         [0.],
         [0.],
         [0.],
         [0.]], device='cuda:0'))

To turn prediction probabilities into class labels, we round them around a 0.5 decision threshold:
* `y_pred_probs` >= 0.5, `y = 1` (class 1)
* `y_pred_probs` < 0.5, `y = 0` (class 0)
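
One edge case worth knowing: `torch.round` rounds halves to the nearest even integer, so a probability of exactly 0.5 rounds to 0 rather than 1. If the `>= 0.5` rule must hold exactly, an explicit comparison is one option. A small sketch with made-up probabilities:

```python
import torch

# Made-up prediction probabilities, including the exact boundary value
probs = torch.tensor([0.2, 0.5, 0.7])

rounded = torch.round(probs)          # 0.5 rounds to 0 (round half to even)
thresholded = (probs >= 0.5).float()  # 0.5 maps to class 1

print(rounded.tolist(), thresholded.tolist())
```

In practice a sigmoid output rarely lands on exactly 0.5, so the two approaches almost always agree.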


In [21]:
#Find the predicted labels
y_preds = torch.round(y_pred_probs)

# In full (logits -> prediction probabilities -> prediction labels)
y_pred_labels = torch.round(torch.sigmoid(model_0(x_test.to(device))[:5]))

#Check for equality
print(torch.eq(y_preds.squeeze(), y_pred_labels.squeeze()))

y_preds.squeeze()

tensor([True, True, True, True, True], device='cuda:0')


tensor([0., 0., 0., 0., 0.], device='cuda:0')

In [22]:
y_test[:5]

tensor([1., 0., 1., 0., 1.])

### 3.2 Building a training and testing loop of our own
