### PyTorch Sum with `dim` Parameter

In PyTorch, `torch.sum` returns the sum of a tensor's elements. Called without `dim`, it adds up every element and returns a single value; with the `dim` parameter, you specify the dimension along which the sum is taken.

1. **`dim=0`: reduce over the rows (one sum per column):**
   - If you set `dim=0`, PyTorch sums vertically: dimension 0 (the rows) is collapsed, leaving one total for each column.
   - **Example with a 3x3 matrix:**
     ```python
     import torch

     matrix = torch.tensor([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]])

     sum_along_columns = torch.sum(matrix, dim=0)
     print(sum_along_columns)
     ```
     The result will be `tensor([12, 15, 18])`, where each element of the resulting tensor is the sum of the corresponding column.

2. **`dim=1`: reduce over the columns (one sum per row):**
   - If you set `dim=1`, PyTorch sums horizontally: dimension 1 (the columns) is collapsed, leaving one total for each row.
   - **Example:**
     ```python
     import torch

     matrix = torch.tensor([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]])

     sum_along_rows = torch.sum(matrix, dim=1)
     print(sum_along_rows)
     ```
     The result will be `tensor([ 6, 15, 24])`, where each element of the resulting tensor is the sum of the corresponding row.

In general, `dim` names the dimension that is summed away: the result has the input's shape with that dimension removed, so the `(3, 3)` matrix above produces a tensor of shape `(3,)` in both cases.
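The same `dim` argument covers more than the two cases above. The following is a short sketch (the variable names are illustrative) of negative dimensions, `keepdim=True`, a tuple of dimensions, and a 3-D tensor:

```python
import torch

matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])

# Negative indices count from the last dimension, so dim=-1 means dim=1 for a 2-D tensor
row_sums = torch.sum(matrix, dim=-1)              # tensor([ 6, 15, 24])

# keepdim=True keeps the reduced dimension with size 1 (handy for broadcasting)
row_sums_kept = torch.sum(matrix, dim=1, keepdim=True)
print(row_sums_kept.shape)                        # torch.Size([3, 1])

# Broadcasting the (3, 1) sums divides each row by its own total,
# so every row of the result sums to 1
row_fractions = matrix / row_sums_kept

# A tuple of dims reduces over several dimensions at once
total = torch.sum(matrix, dim=(0, 1))             # tensor(45)

# For higher-rank tensors, the summed dimension disappears from the shape
batch = torch.ones(2, 3, 4)
print(torch.sum(batch, dim=1).shape)              # torch.Size([2, 4])
```

Without `keepdim=True`, `row_sums` has shape `(3,)`, and dividing the `(3, 3)` matrix by it would broadcast across columns instead of rows, which is why the kept dimension matters for row-wise normalization.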


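`nn.Sequential` builds a network by applying the modules you pass to it in order; by default they are indexed with incremental integers (0, 1, 2, ...). You can also pass in an `OrderedDict` to name the individual layers and operations instead. Note that dictionary keys must be unique, so _each operation must have a different name_.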

```python
from torch import nn

# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)
```

```
Sequential(
  (0): Linear(in_features=784, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=64, bias=True)
  (3): ReLU()
  (4): Linear(in_features=64, out_features=10, bias=True)
  (5): Softmax(dim=1)
)
```
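Because the last layer is `nn.Softmax(dim=1)`, the same `dim=1` convention used with `torch.sum` applies: the softmax normalizes across each row. The sketch below runs a forward pass on a hypothetical batch of 64 random inputs (the batch size and random data are assumptions for shape-checking only) and confirms that summing the output along `dim=1` gives approximately 1 per sample:

```python
import torch
from torch import nn

input_size = 784
hidden_sizes = [128, 64]
output_size = 10

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))

# Hypothetical batch: 64 samples, each a vector of 784 random values
x = torch.randn(64, input_size)

# No gradients are needed just to inspect the outputs
with torch.no_grad():
    probs = model(x)

print(probs.shape)        # torch.Size([64, 10])

# Softmax(dim=1) normalizes the 10 scores in each row,
# so the row sums are 1 up to floating-point error
print(probs.sum(dim=1))
```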


```python
from collections import OrderedDict

# Same network, but each layer gets a unique name instead of an integer index
model = nn.Sequential(OrderedDict([
                      ('fc1', nn.Linear(input_size, hidden_sizes[0])),
                      ('relu1', nn.ReLU()),
                      ('fc2', nn.Linear(hidden_sizes[0], hidden_sizes[1])),
                      ('relu2', nn.ReLU()),
                      ('output', nn.Linear(hidden_sizes[1], output_size)),
                      ('softmax', nn.Softmax(dim=1))]))
print(model)
```

```
Sequential(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (relu2): ReLU()
  (output): Linear(in_features=64, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)
```
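Naming the layers makes them reachable as attributes, while integer indexing keeps working. The uniqueness rule matters because `OrderedDict` does not reject a repeated key: the later entry silently replaces the earlier one. A minimal sketch of both points (the `broken` model is a hypothetical example of the mistake):

```python
from collections import OrderedDict

from torch import nn

model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(784, 128)),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(128, 64)),
    ('relu2', nn.ReLU()),
    ('output', nn.Linear(64, 10)),
    ('softmax', nn.Softmax(dim=1))]))

# Named layers are available as attributes; integer indexing still works
print(model.fc1)                  # Linear(in_features=784, out_features=128, bias=True)
print(model.fc1.weight.shape)     # torch.Size([128, 784]), i.e. (out_features, in_features)
print(model[0] is model.fc1)      # True

# Reusing the key 'relu' does not raise an error: the second entry overwrites
# the first, so the network silently ends up with only three layers
layers = OrderedDict([
    ('fc1', nn.Linear(784, 128)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(128, 64)),
    ('relu', nn.ReLU())])
broken = nn.Sequential(layers)
names = [name for name, _ in broken.named_children()]
print(len(broken), names)         # 3 ['fc1', 'relu', 'fc2']
```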

### Activation Functions
| Activation Function | Purpose                                              | Output Range                                 | Common Usage                                                                                          |
|---------------------|------------------------------------------------------|----------------------------------------------|-------------------------------------------------------------------------------------------------------|
| Softmax             | Converts a vector of scores into class probabilities | (0, 1); the outputs along `dim` sum to 1     | Output layer in multi-class classification                                                            |
| tanh                | Non-linearity with zero-centered output              | (-1, 1)                                      | Hidden layers (e.g., RNNs); zero-centering helps compared with sigmoid, but it saturates for inputs of large magnitude, so gradients can still vanish |
| ReLU                | Non-linearity, `max(0, x)`; unbounded above          | [0, ∞)                                       | Default choice for hidden layers: cheap to compute and does not saturate for positive inputs         |
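To make the table concrete, the following sketch applies each function to a few sample values (chosen only for illustration) and uses autograd to compare how tanh and ReLU gradients behave for a large input:

```python
import torch

# Sample pre-activation values
z = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

relu_out = torch.relu(z)               # negatives clipped to 0
tanh_out = torch.tanh(z)               # squashed into (-1, 1); tanh(0) = 0
softmax_out = torch.softmax(z, dim=0)  # positive values that sum to 1

print(relu_out)                        # tensor([0., 0., 0., 1., 3.])
print(tanh_out)
print(softmax_out, softmax_out.sum())

# tanh saturates: its derivative 1 - tanh(x)^2 is 1 at x = 0
# but only about 1.8e-4 at x = 5
x_tanh = torch.tensor([0.0, 5.0], requires_grad=True)
torch.tanh(x_tanh).sum().backward()
print(x_tanh.grad)

# ReLU's derivative is 1 for every positive input, however large
x_relu = torch.tensor([0.5, 5.0], requires_grad=True)
torch.relu(x_relu).sum().backward()
print(x_relu.grad)                     # tensor([1., 1.])
```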
