# Shapes Used in Common Tasks
I am always confused about what shape to output and what loss to use for common tasks like binary classification, multiclass classification, and regression. This notebook provides recipes for these common tasks.

In [1]:
import torch as t

## Binary Classification
Most datasets will have a single scalar, either $0$ or $1$, as the target variable. This means that after being batched by the data loader, the targets will have shape `torch.Size([batch_size])`, i.e., a 1-D tensor (a single "row" vector). For this reason I usually make sure that the model also outputs a 1-D tensor of the same shape.

For the loss function use `BCEWithLogitsLoss`, which takes the logits ($h^{(i)}$) and the targets ($y^{(i)}$) as inputs. It first converts the logits to probabilities by passing them through the sigmoid function and then calculates the negative log-likelihood (the binary cross-entropy). By default the per-instance losses are averaged over the batch.
$$
p^{(i)} = \frac{1}{1 + e^{-h^{(i)}}} \\
\mathcal L^{(i)} = - \left[ y^{(i)} \log(p^{(i)}) + (1 - y^{(i)}) \log(1 - p^{(i)}) \right]
$$

This means that my model does **not** have to output probabilities, so there is no need for a sigmoid activation on the final single unit. One quirk of the way PyTorch has implemented this loss function is that it needs both the logits and the targets as float values, even though the targets are naturally integers. For this reason I have to convert the targets to floats before calling this loss function.
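To convince myself that the formula above is what the loss actually computes, here is a minimal sketch that applies the sigmoid and the negative log-likelihood by hand and compares the result with `BCEWithLogitsLoss`. The tensors and the names `logits`, `manual`, and `builtin` are made up for illustration, and I am assuming the default `reduction="mean"`.

```python
import torch as t

t.manual_seed(0)

# Made-up logits and integer targets for a batch of 5 instances.
logits = t.randn(5)
targets = t.randint(0, 2, (5,))

# The loss needs float targets, so cast before using them.
y = targets.to(t.float32)

# Manual version: sigmoid, then the per-instance negative log-likelihood,
# then the mean over the batch (the default reduction).
probs = t.sigmoid(logits)
manual = -(y * t.log(probs) + (1 - y) * t.log(1 - probs)).mean()

# Built-in version working directly on the logits.
builtin = t.nn.BCEWithLogitsLoss()(logits, y)

print(manual, builtin)
```

The two values should agree up to floating point error. The built-in version is still preferable because it works on the logits directly, which is numerically more stable than taking the log of a sigmoid.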

### In Summary
  * Ensure that the model outputs logits, **not** probabilities.
  * Squeeze the output tensor along dimension 1 (`dim=1`) before returning, so that its shape goes from `(batch_size, 1)` to `(batch_size,)`.
  * Remember to cast the target values to float before calling the BCE loss function.
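The reason for the squeeze can be sketched as follows. As far as I know, `BCEWithLogitsLoss` refuses inputs and targets whose sizes differ by raising a `ValueError`, so a `(batch_size, 1)` output cannot be paired with `(batch_size,)` targets. The names `raw_outputs` and `mismatch_rejected` are hypothetical, and `raw_outputs` is just a random stand-in for what a final `Linear(8, 1)` layer would return.

```python
import torch as t

t.manual_seed(0)
bce_loss = t.nn.BCEWithLogitsLoss()

# Stand-in for the unsqueezed model output: shape (batch_size, 1).
raw_outputs = t.randn(5, 1)
# Targets as the data loader would batch them, already cast to float.
targets = t.randint(0, 2, (5,)).to(t.float32)

# Shapes (5, 1) and (5,) do not match, so the loss should reject them.
try:
    bce_loss(raw_outputs, targets)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# After squeezing dimension 1 the shapes agree and the loss is a scalar.
squeezed = raw_outputs.squeeze(dim=1)
loss = bce_loss(squeezed, targets)
print(mismatch_rejected, squeezed.shape, loss.shape)
```

An alternative would be to unsqueeze the targets to `(batch_size, 1)` instead, but squeezing inside the model keeps the training loop free of shape fixes.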

In [18]:
class BinaryClassifier(t.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = t.nn.Sequential(
            t.nn.Linear(7, 8),
            t.nn.ReLU(),
            t.nn.Linear(8, 1)
        )

    def forward(self, inputs):
        outputs = self.model(inputs)
        return outputs.squeeze(dim=1)  # (batch_size, 1) -> (batch_size,)

In [21]:

batch_size = 5
inputs = t.randn(batch_size, 7)
targets = t.randint(0, 2, (batch_size,))
print(inputs.shape, targets.shape)
print(targets)


torch.Size([5, 7]) torch.Size([5])
tensor([0, 1, 0, 1, 1])


In [22]:
model = BinaryClassifier()
outputs = model(inputs)
print(outputs.shape)
outputs

torch.Size([5])


tensor([0.5676, 0.2446, 0.0893, 0.2848, 0.8677], grad_fn=<SqueezeBackward1>)

In [23]:
bce_loss = t.nn.BCEWithLogitsLoss()
bce_loss(outputs, targets.to(t.float32))

tensor(0.6491, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)

## Multiclass Classification
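Most datasets will have a single integer as the target, one of $0, 1, \cdots, c-1$ where $c$ is the number of classes. After batching, the targets have shape `torch.Size([batch_size])`. For each instance the model should output a row vector of $c$ values (logits), one score per class, so the batched output has shape `torch.Size([batch_size, c])`.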

For the loss function use `CrossEntropyLoss`. This is similar to `BCEWithLogitsLoss` in that it accepts the logits ($\mathbf h^{(i)}$) as its input. It converts each row of logits into probabilities using the softmax function, calculates the negative log-likelihood for each instance, and by default averages the result over the batch.

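$$
p_k = \frac{e^{h_k}}{\sum_{j=1}^c e^{h_j}} \\
\mathcal L = - \sum_{k=1}^c y_k \log(p_k)
$$

Here $y_k$ is $1$ for the target class and $0$ for every other class, so only the log-probability of the target class contributes to the loss.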
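As a sanity check on the softmax formula, here is a minimal sketch that computes the loss by hand and compares it with `CrossEntropyLoss`. The tensors and the names `log_probs`, `picked`, `manual`, and `builtin` are made up for illustration, and I am assuming the default `reduction="mean"` with integer class indices as targets.

```python
import torch as t

t.manual_seed(0)

# Made-up logits for 5 instances and 3 classes, plus integer class targets.
logits = t.randn(5, 3)
targets = t.randint(0, 3, (5,))

# Softmax followed by log, done in one numerically stable call.
log_probs = t.log_softmax(logits, dim=1)

# Pick the log-probability of the target class in each row.
picked = log_probs[t.arange(5), targets]

# Negative log-likelihood, averaged over the batch.
manual = -picked.mean()

# Built-in version: raw logits in, integer targets in, no casting needed.
builtin = t.nn.CrossEntropyLoss()(logits, targets)

print(manual, builtin)
```

Unlike the binary case, the targets stay as integers here, because the loss uses them as indices into each row of logits.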

### In Summary
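  * Ensure that the model outputs logits, **not** a probability distribution across the classes, so no softmax on the final layer.
  * Keep the output shape as `(batch_size, c)` and pass the targets as integer class indices (dtype `torch.int64`) of shape `(batch_size,)`.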

In [24]:
class MulticlassClassifier(t.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = t.nn.Sequential(
            t.nn.Linear(7, 8),
            t.nn.ReLU(),
            t.nn.Linear(8, 3)
        )

    def forward(self, inputs):
        return self.model(inputs)  # logits of shape (batch_size, 3)

In [25]:
batch_size = 5
inputs = t.randn(batch_size, 7)
targets = t.randint(0, 3, (batch_size,))
print(inputs.shape, targets.shape)
print(targets)

torch.Size([5, 7]) torch.Size([5])
tensor([2, 2, 1, 2, 0])


In [27]:
model = MulticlassClassifier()
outputs = model(inputs)
print(outputs.shape)
outputs

torch.Size([5, 3])


tensor([[-0.1355, -0.1946,  0.1838],
        [-0.3887, -0.2140,  0.0636],
        [-0.1810, -0.3151,  0.1867],
        [-0.2574, -0.1620,  0.2319],
        [-0.1269, -0.0753,  0.3005]], grad_fn=<AddmmBackward0>)

In [28]:
ce_loss = t.nn.CrossEntropyLoss()
ce_loss(outputs, targets)

tensor(1.0383, grad_fn=<NllLossBackward0>)

## Regression
Most datasets will have a single float scalar as the target. This means that after being batched by a data loader, the targets will have shape `torch.Size([batch_size])`, a single row vector, just like in the binary classification case. I use the same method of squeezing the model output so that it is also a single row vector.

For the loss function use `MSELoss`, which accepts two tensors of the same shape, both floats. Unlike the binary classification case, no cast is needed because the targets are already floats.
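The squeeze matters even more here than in the binary case. In the PyTorch versions I am aware of, `MSELoss` does not fail on a `(batch_size, 1)` output paired with `(batch_size,)` targets. It emits a warning and broadcasts both to `(batch_size, batch_size)`, which silently compares every prediction with every target. The sketch below illustrates this under that assumption. The names `preds_2d`, `broadcast_loss`, `all_pairs`, and `correct_loss` are hypothetical.

```python
import warnings

import torch as t

t.manual_seed(0)
mse_loss = t.nn.MSELoss()

# Stand-in for an unsqueezed model output and the batched float targets.
preds_2d = t.randn(5, 1)
targets = t.rand(5)

# Mismatched shapes: PyTorch warns and broadcasts instead of raising.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    broadcast_loss = mse_loss(preds_2d, targets)

# The same number computed explicitly over all 5 x 5 pairs.
all_pairs = ((preds_2d - targets) ** 2).mean()

# What I actually want: one prediction compared with one target.
correct_loss = mse_loss(preds_2d.squeeze(dim=1), targets)

print(broadcast_loss, all_pairs, correct_loss)
```

The first two values should match each other, and in general neither matches the third, which is why I squeeze the output inside the model.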

In [37]:
class Regressor(t.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = t.nn.Sequential(
            t.nn.Linear(7, 8),
            t.nn.ReLU(),
            t.nn.Linear(8, 1)
        )

    def forward(self, inputs):
        outputs = self.model(inputs)
        return outputs.squeeze(dim=1)  # (batch_size, 1) -> (batch_size,)

In [38]:
batch_size = 5
inputs = t.randn(batch_size, 7)
targets = t.rand((batch_size,))
print(inputs.shape, targets.shape)
print(targets)

torch.Size([5, 7]) torch.Size([5])
tensor([0.5479, 0.9801, 0.7519, 0.5885, 0.8585])


In [39]:
model = Regressor()
outputs = model(inputs)
print(outputs.shape)
outputs

torch.Size([5])


tensor([-0.0138, -0.3122, -0.2345, -0.1548, -0.1343],
       grad_fn=<SqueezeBackward1>)

In [40]:
mse_loss = t.nn.MSELoss()
mse_loss(outputs, targets)

tensor(0.8992, grad_fn=<MseLossBackward0>)