**Option 1: nn.Sequential**

In [32]:
import torch
import torch.nn as nn

# Layers are applied in the order they are listed
model = nn.Sequential(nn.Linear(2, 50), nn.ReLU(), nn.Linear(50, 1), nn.Sigmoid())

This builds a multilayer perceptron (MLP) with input dimension 2, one hidden layer of dimension 50 with a ReLU activation, and output dimension 1, followed by a final Sigmoid layer.

In [33]:
input = torch.empty(1, 2).normal_()
output = model(input)
print(output)

tensor([[0.5278]], grad_fn=<SigmoidBackward0>)
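
As a quick sanity check on this architecture, the sketch below counts the trainable parameters and passes a small batch through the model. The batch size of 4 and the variable names `n_params`, `batch`, and `probs` are illustrative choices, not part of the original notebook.

```python
import torch
import torch.nn as nn

# Same architecture as above: 2 -> 50 -> 1 with ReLU and a final Sigmoid.
model = nn.Sequential(nn.Linear(2, 50), nn.ReLU(), nn.Linear(50, 1), nn.Sigmoid())

# Each nn.Linear holds a weight matrix and a bias vector:
# (2*50 + 50) + (50*1 + 1) = 201 trainable parameters.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)

# A batch of 4 samples (illustrative size) gives one Sigmoid output per sample.
batch = torch.randn(4, 2)
probs = model(batch)
print(probs.shape)
```

The first dimension of the input is treated as the batch dimension, so the same model handles a single sample or many samples without changes.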


**Option 2: nn.Module**

In [34]:
import torch
import torch.nn as nn
import torch.nn.functional as F
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features=2, out_features=50)
        self.fc2 = nn.Linear(in_features=50, out_features=1)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return F.sigmoid(x)

In [35]:
model = MLP()
input = torch.empty(1, 2).normal_()
output = model(input)
print(output)

tensor([[0.5274]], grad_fn=<SigmoidBackward0>)
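
The two options describe the same network; the printed values differ only because each model is initialized with its own random weights and sees a different random input. A minimal sketch of this equivalence, assuming we copy the weights from one model into the other (the names `seq_model`, `mlp_model`, and `same` are introduced here for illustration):

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 50)
        self.fc2 = nn.Linear(50, 1)

    def forward(self, x):
        return torch.sigmoid(self.fc2(torch.relu(self.fc1(x))))


seq_model = nn.Sequential(nn.Linear(2, 50), nn.ReLU(), nn.Linear(50, 1), nn.Sigmoid())
mlp_model = MLP()

# Copy the Sequential weights into the MLP. Layers in nn.Sequential are
# indexed by position, so index 0 corresponds to fc1 and index 2 to fc2.
mlp_model.fc1.load_state_dict(seq_model[0].state_dict())
mlp_model.fc2.load_state_dict(seq_model[2].state_dict())

# With identical weights and the same input, both models agree.
x = torch.randn(5, 2)
with torch.no_grad():
    same = torch.allclose(seq_model(x), mlp_model(x))
print(same)
```

`nn.Sequential` is the more compact choice for a plain chain of layers, while subclassing `nn.Module` leaves room for a custom `forward` with branches or skip connections.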


**Loss Function**

In [36]:
loss_fn = nn.CrossEntropyLoss()  # expects raw logits of shape (N, C) and class-index targets

### **Example: Cross-Entropy Loss**
The outputs of the neural network $z$ are transformed into probabilities $q$ using the **Softmax** function:

$$q_i = \text{Softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$$
where $C$ is the output dimension (number of classes).

The **Cross-Entropy** measures the "distance" between the predicted distribution $q$ and the true distribution $p$. For a single sample, the loss is:

$$\mathcal{L}(p, q) = -\sum_{i=1}^{C} p_i \log(q_i)$$

Since our labels are **one-hot encoded** ($p_i = 1$ for the correct class $c$, and $0$ otherwise), the formula collapses to:

$$\mathcal{L} = -\log(q_c)$$
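
The formulas above can be checked numerically. The sketch below uses made-up logits for a single sample with $C = 3$ classes and compares the hand-computed $-\log(q_c)$ with PyTorch's built-in cross-entropy. The values of `z` and `c` are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits z for one sample with C = 3 classes; true class c = 0.
z = torch.tensor([[2.0, 1.0, 0.1]])
c = torch.tensor([0])

# Softmax by hand: q_i = exp(z_i) / sum_j exp(z_j)
q = torch.exp(z) / torch.exp(z).sum(dim=1, keepdim=True)

# With a one-hot label the cross-entropy reduces to -log(q_c).
manual_loss = -torch.log(q[0, c[0]])

# F.cross_entropy takes the raw logits z (not q) and the class index.
builtin_loss = F.cross_entropy(z, c)

print(manual_loss.item(), builtin_loss.item())
```

Note that `nn.CrossEntropyLoss` applies the (log-)softmax internally, so it should receive raw logits rather than probabilities. For a model that ends in a single Sigmoid output, as in the two options above, `nn.BCELoss` is the loss that matches that output format.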


**Optimizer**

In [37]:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
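
To make the role of `momentum` concrete, here is a minimal sketch that reproduces the update by hand on a toy parameter, assuming the default settings of `torch.optim.SGD` (no dampening, no Nesterov, no weight decay). The toy loss and the names `w`, `w_manual`, and `v` are illustrative and not part of the original notebook.

```python
import torch

lr, mu = 1e-3, 0.9  # same values as above

# A single toy parameter with a quadratic loss (illustrative only).
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr, momentum=mu)

# Manual copy of the parameter and its velocity buffer.
w_manual = w.detach().clone()
v = torch.zeros_like(w_manual)

for step in range(3):
    # Update performed by the optimizer
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    optimizer.step()

    # Manual update: v <- mu * v + g, then w <- w - lr * v
    g = 2 * w_manual  # gradient of sum(w^2)
    v = mu * v + g
    w_manual = w_manual - lr * v

print(torch.allclose(w.detach(), w_manual))
```

The velocity `v` accumulates past gradients, so consecutive steps in a consistent direction become larger than with plain gradient descent.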

**One Step of Gradient Descent**

In [None]:
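# Training set (left unspecified here)
Data = ...
label = ...

output = model(Data)           # forward pass
loss = loss_fn(output, label)  # compare predictions with labels
optimizer.zero_grad()          # reset gradients accumulated by earlier steps
loss.backward()                # backpropagate to compute gradients
optimizer.step()               # update the parameters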
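
Repeating this step many times gives a training loop. The sketch below is one possible end-to-end version under stated assumptions: the training set is replaced by synthetic two-class data (a hypothetical stand-in), and the model outputs 2 raw logits with no final Sigmoid, since `nn.CrossEntropyLoss` expects logits of shape `(N, C)` together with class indices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic two-class data (hypothetical stand-in for the training set):
# the label is 1 when the two coordinates sum to a positive number.
data = torch.randn(200, 2)
label = (data.sum(dim=1) > 0).long()  # class indices, shape (200,)

# Two raw logits as output and no final Sigmoid, because
# nn.CrossEntropyLoss applies log-softmax itself.
model = nn.Sequential(nn.Linear(2, 50), nn.ReLU(), nn.Linear(50, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

losses = []
for step in range(200):
    output = model(data)           # forward pass, shape (200, 2)
    loss = loss_fn(output, label)  # scalar loss
    optimizer.zero_grad()          # clear gradients from the previous step
    loss.backward()                # backpropagate
    optimizer.step()               # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Calling `optimizer.zero_grad()` in every iteration matters because PyTorch accumulates gradients across `backward()` calls by default.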