### Deep Learning with PyTorch
Tensors are similar to NumPy arrays but add features such as GPU support and automatic differentiation. Convert a Python list into a Torch tensor with `torch.tensor`.

In [2]:
import torch

temperatures = [[10, 20], [30, 40], [50, 60]]

temperature_tensor = torch.tensor(temperatures)
print(temperature_tensor)

tensor([[10, 20],
        [30, 40],
        [50, 60]])


Tensors have a `shape` and `dtype`, and can be added elementwise.

In [3]:
addend_tensor = torch.tensor([[10, 20], [10, 20], [10, 20]])
sums_tensor = temperature_tensor + addend_tensor
print("Sums:", sums_tensor)
print("Sums shape:", sums_tensor.shape)
print("Sums dtype:", sums_tensor.dtype)

Sums: tensor([[20, 40],
        [40, 60],
        [60, 80]])
Sums shape: torch.Size([3, 2])
Sums dtype: torch.int64
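
The `dtype` depends on the values a tensor is built from. The following is a small sketch, assuming PyTorch's default dtype settings have not been changed; the variable names are illustrative only.

```python
import torch

# Integer literals produce an int64 tensor; float literals produce float32
# (assuming the default dtype has not been changed).
int_tensor = torch.tensor([1, 2, 3])
float_tensor = torch.tensor([1.0, 2.0, 3.0])

# Adding an integer tensor to a float tensor promotes the result to float.
mixed_sum = int_tensor + float_tensor

# An explicit conversion is also available.
converted = int_tensor.float()

print(int_tensor.dtype)    # torch.int64
print(float_tensor.dtype)  # torch.float32
print(mixed_sum.dtype)     # torch.float32
print(converted.dtype)     # torch.float32
```

This matters later on, since layers such as `Linear` expect floating-point inputs.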


#### Linear layer
`Linear` from `torch.nn` takes a tensor whose last dimension has size `in_features` and outputs a tensor whose last dimension has size `out_features`. The weights and biases used in the calculation are initialized randomly.

In [4]:
import torch.nn as nn

input_tensor = torch.tensor([0.1, -0.1, 0.8])

linear_layer = nn.Linear(
    in_features=3,
    out_features=2
)
output = linear_layer(input_tensor)
print(output)
print(linear_layer.weight)
print(linear_layer.bias)

tensor([-0.7611, -0.2199], grad_fn=<ViewBackward0>)
Parameter containing:
tensor([[ 0.3318,  0.1210, -0.4353],
        [ 0.5348,  0.5282, -0.5684]], requires_grad=True)
Parameter containing:
tensor([-0.4340,  0.2342], requires_grad=True)
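
The calculation performed by `Linear` can be reproduced by hand as $xW^T + b$. The sketch below uses a fixed seed and its own layer (the names `layer`, `x`, and `manual_output` are illustrative), so its weights differ from the ones printed above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random initialization is repeatable

layer = nn.Linear(in_features=3, out_features=2)
x = torch.tensor([0.1, -0.1, 0.8])

# weight has shape (out_features, in_features); bias has shape (out_features,)
manual_output = x @ layer.weight.T + layer.bias
layer_output = layer(x)

print(torch.allclose(manual_output, layer_output))  # True
print(layer.weight.shape, layer.bias.shape)
```

Applying the same formula to the weights and bias printed above gives $0.1 \cdot 0.3318 - 0.1 \cdot 0.1210 + 0.8 \cdot (-0.4353) - 0.4340 \approx -0.7611$, matching the first output element.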


#### Sequential layer
`Sequential` from `torch.nn` stacks layers such as `Linear` so that data passes through them in sequence. Layers between the input and output layers are called hidden layers.

A neuron in a linear layer has $n+1$ parameters, with $n$ counting the weight for each input from the previous layer and $1$ accounting for the neuron's bias.

More hidden layers = more parameters = higher model capacity.

In [5]:
sequential_model = nn.Sequential(
    nn.Linear(3, 2),
    nn.Linear(2, 8),
    nn.Linear(8, 3)
)

Get the model's parameters with `parameters()`, which returns an iterator over each layer's weight tensor and bias tensor.

`numel()` outputs the number of elements in a tensor.

In [6]:
count = 0
for parameter in sequential_model.parameters():
    print(parameter)
    count += parameter.numel()
print(count)

Parameter containing:
tensor([[ 0.2044, -0.3502,  0.4553],
        [ 0.5028, -0.4500, -0.4404]], requires_grad=True)
Parameter containing:
tensor([-0.2631,  0.5386], requires_grad=True)
Parameter containing:
tensor([[-0.2358,  0.3472],
        [ 0.5581, -0.5005],
        [ 0.6865,  0.6474],
        [-0.0606, -0.5850],
        [ 0.4166, -0.4181],
        [-0.6183,  0.1080],
        [-0.1166, -0.1088],
        [ 0.1808,  0.6320]], requires_grad=True)
Parameter containing:
tensor([-0.6204, -0.4627,  0.0551,  0.1314, -0.2035,  0.5163, -0.3902,  0.7050],
       requires_grad=True)
Parameter containing:
tensor([[ 0.1125,  0.0053,  0.2274,  0.0758, -0.2285, -0.0329,  0.0310, -0.3375],
        [-0.0017,  0.0559,  0.3066,  0.2623,  0.0890,  0.0215,  0.1443,  0.1948],
        [ 0.3497,  0.0680,  0.0157,  0.3237,  0.1773,  0.3146,  0.0838, -0.3032]],
       requires_grad=True)
Parameter containing:
tensor([-0.1649,  0.3178, -0.2053], requires_grad=True)
59
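
The total of 59 follows from the $n+1$ rule: $(3+1) \cdot 2 + (2+1) \cdot 8 + (8+1) \cdot 3 = 8 + 24 + 27 = 59$. As a quick check, this sketch rebuilds the same architecture (the names `model`, `expected`, and `actual` are illustrative) and compares the formula against `numel()`.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3, 2),
    nn.Linear(2, 8),
    nn.Linear(8, 3),
)

# Each Linear layer has out_features neurons,
# and each neuron has in_features weights plus 1 bias.
expected = sum(
    (layer.in_features + 1) * layer.out_features
    for layer in model
    if isinstance(layer, nn.Linear)
)

# Count the elements of every parameter tensor directly.
actual = sum(parameter.numel() for parameter in model.parameters())

print(expected, actual)  # 59 59
```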


#### Sigmoid function
A function that maps a real-valued input (here, a float) to a single value between 0 and 1. It is used for binary classification: placed as the final activation of a network of linear layers, it turns the output of a forward pass into a probability, which is then compared against a threshold (for instance 0.5) to decide the class.

This is equivalent to traditional logistic regression in that the output is a probability for the category of interest.

In [12]:
input = torch.tensor([10.0, 12.0, 13.0])
sigmoid_model = nn.Sequential(
    nn.Linear(3, 2),
    nn.Linear(2, 1),
    nn.Sigmoid()
)
sigmoid_model(input)

tensor([0.0251], grad_fn=<SigmoidBackward0>)
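
The sigmoid itself is $\sigma(x) = \frac{1}{1 + e^{-x}}$. A minimal sketch of the function and the thresholding step, using made-up input values and illustrative names:

```python
import torch

logits = torch.tensor([-2.0, 0.5, 3.0])  # made-up pre-activation values

# torch.sigmoid applies 1 / (1 + exp(-x)) elementwise.
probabilities = torch.sigmoid(logits)
manual = 1 / (1 + torch.exp(-logits))

# Classify as positive when the probability reaches the 0.5 threshold.
predictions = (probabilities >= 0.5).int()

print(probabilities)
print(predictions)  # tensor([0, 1, 1], dtype=torch.int32)
```

With a threshold of 0.5, the decision reduces to checking whether the pre-activation value is non-negative.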

#### Softmax function
A function that takes a one-dimensional input of floats and outputs a one-dimensional distribution of probabilities that sum to 1. It is used for multi-class classification: placed as the final activation of a network of linear layers, it turns the output of a forward pass into per-class probabilities, and the class with the highest probability is chosen.

In [None]:
input = torch.tensor([10.0, 12.0, 13.0])
softmax_model = nn.Sequential(
    nn.Linear(3, 2),
    nn.Linear(2, 5),
    nn.Softmax(dim=-1)
)
softmax_model(input)

tensor([9.6944e-04, 4.9512e-07, 1.6559e-07, 3.1283e-02, 9.6775e-01],
       grad_fn=<SoftmaxBackward0>)
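
Choosing the class from the softmax output can be sketched as follows, using made-up scores and illustrative names rather than the model above:

```python
import torch

scores = torch.tensor([1.0, 3.0, 0.5])  # made-up scores for three classes

# Softmax over the last dimension: exp(score) / sum(exp(scores)).
probabilities = torch.softmax(scores, dim=-1)

# The predicted class is the index of the highest probability.
predicted_class = torch.argmax(probabilities).item()

print(probabilities.sum())
print(predicted_class)  # 1
```

Because softmax preserves the ordering of the scores, the predicted class is the same as the index of the largest raw score.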

#### Loss function

A function that quantifies how far a machine learning model's predictions are from the actual target values, whether during training or in practice. The loss function takes a model prediction $\hat{y}$ (a single regression or sigmoid output, or a softmax tensor of probabilities) and the ground truth $y$ (the actual value or class) as inputs and outputs a single float, the loss.

The goal of training is to minimize the loss, which should be low or zero for an accurate prediction and high for an incorrect one.

For cross-entropy loss, the ground truth may be given as the class index itself (an integer). To convert it into a tensor that can be compared against the model prediction (a softmax probability distribution), use `nn.functional.one_hot()`, which takes a tensor of class indices and `num_classes` and outputs a tensor containing the corresponding one-hot vector(s).

In [21]:
import torch.nn.functional as F

print(F.one_hot(torch.tensor(0), 3))
print(F.one_hot(torch.tensor(1), 3))
print(F.one_hot(torch.tensor([0,2]), 3))

tensor([1, 0, 0])
tensor([0, 1, 0])
tensor([[1, 0, 0],
        [0, 0, 1]])


#### Cross-entropy loss

Cross-entropy loss is a common loss function for classification. Its inputs are a scores tensor (the model's outputs before the final softmax function) and a one-hot encoded ground-truth label; both must be floating-point tensors. The loss function applies an internal softmax to the scores (producing a probability distribution of the same size), then outputs the negative natural log of the probability assigned to the ground-truth class. Because the softmax is applied internally, the model should pass raw scores to this loss rather than softmax outputs.

In [29]:
scores = torch.tensor([-5.2, 4.6, 0.8])
one_hot_target = F.one_hot(torch.tensor(0), 3)

softmax = nn.Softmax(dim=-1)
print(softmax(scores))
criterion = nn.CrossEntropyLoss()
print(criterion(scores.double(), one_hot_target.double()))

tensor([5.4235e-05, 9.7807e-01, 2.1880e-02])
tensor(9.8222, dtype=torch.float64)
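
The value 9.8222 can be reproduced by hand as $-\ln(\text{softmax}(\text{scores})_0)$. The sketch below adds a batch dimension of size 1 before calling the loss and also passes the target as a class index instead of a one-hot vector, which `nn.CrossEntropyLoss` accepts as well; the variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.tensor([-5.2, 4.6, 0.8])
target_index = torch.tensor(0)

# Manual version: negative log of the softmax probability of the true class.
manual_loss = -torch.log(torch.softmax(scores, dim=-1)[target_index])

criterion = nn.CrossEntropyLoss()
batched_scores = scores.unsqueeze(0)  # shape (1, 3): one sample, three classes

# Target given as a class index (integer tensor of shape (1,)).
index_loss = criterion(batched_scores, target_index.unsqueeze(0))

# Target given as one-hot class probabilities (float tensor of shape (1, 3)).
one_hot_target = F.one_hot(target_index, 3).float().unsqueeze(0)
one_hot_loss = criterion(batched_scores, one_hot_target)

print(manual_loss, index_loss, one_hot_loss)  # all approximately 9.8222
```

The loss is high here because the scores strongly favor class 1 while the true class is 0; a prediction that favored class 0 would give a loss close to zero.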


