Tensors

Tensors are the fundamental data structure in PyTorch. They are similar to NumPy arrays but add capabilities such as GPU acceleration and automatic differentiation. A model's inputs, outputs, and learnable parameters are all stored as tensors, and every PyTorch operation takes tensors in and returns tensors.

In [None]:
import torch

# Create a tensor (2x3 matrix)
tensor = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(tensor)
print(tensor.shape)  # Output: torch.Size([2, 3])
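
Besides its shape, a tensor carries a data type and a device, and it can be reshaped as long as the number of elements stays the same. The following is a minimal sketch of those attributes; the variable names t, r, and flat are illustrative, and the printed dtype and device assume PyTorch's defaults have not been changed.

```python
import torch

t = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(t.dtype)   # torch.float32, the default for Python floats
print(t.ndim)    # 2
print(t.device)  # cpu by default

# Reshape the 2x3 matrix into 3x2; the element count must stay the same
r = t.reshape(3, 2)
print(r.shape)   # torch.Size([3, 2])

# Flatten to one dimension
flat = t.flatten()
print(flat.shape)  # torch.Size([6])
```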

PyTorch supports element-wise arithmetic such as addition and multiplication, as well as matrix operations. The API is similar to NumPy's, with the added benefit of GPU support.

In [None]:
# Basic tensor operations
a = torch.tensor([1.0, 2.0])
b = torch.tensor([3.0, 4.0])

# Element-wise addition
c = a + b

# Matrix multiplication: (1x2) @ (2x1) gives a 1x1 result
d = torch.matmul(a.unsqueeze(0), b.unsqueeze(1))

print(c)  # Output: tensor([4., 6.])
print(d)  # Output: tensor([[11.]])
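
It may help to see the element-wise product next to the dot product, since the two are easy to confuse. This is a small sketch using the same values as a and b above; the names elementwise, dot, and matmul_1d are illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([3.0, 4.0])

# Element-wise product: multiplies matching positions, keeps the shape
elementwise = a * b

# Dot product: sums the element-wise products into a 0-dimensional tensor
dot = torch.dot(a, b)

# For two 1-D tensors, the @ operator (torch.matmul) also gives the dot product
matmul_1d = a @ b

print(elementwise)  # tensor([3., 8.])
print(dot)          # tensor(11.)
print(matmul_1d)    # tensor(11.)
```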

Autograd (Automatic Differentiation)

One of PyTorch's most powerful features is autograd, which allows for automatic computation of gradients for tensors. This is crucial for training neural networks because gradients are used to update model weights.

In [None]:
# Define a tensor with requires_grad=True to track operations for automatic differentiation
x = torch.tensor(2.0, requires_grad=True)

# Perform some operations
y = x ** 2 + 3 * x

# Compute gradients by calling backward()
y.backward()

# The gradient of y with respect to x (dy/dx)
print(x.grad)  # Output: tensor(7.)

Here, requires_grad=True tells PyTorch to keep track of all operations on x so that when backward() is called, it can compute the derivative (gradient).
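
One way to sanity-check a gradient like the one above is to compare it with a numerical estimate. The sketch below uses plain Python and a central difference; the helper central_difference and the step size h are assumptions made for illustration, not part of PyTorch.

```python
def f(x):
    # Same function as in the autograd example: y = x^2 + 3x
    return x ** 2 + 3 * x

def central_difference(func, x, h=1e-5):
    # Numerical estimate of the derivative: (f(x + h) - f(x - h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)

analytic = 2 * 2.0 + 3               # dy/dx = 2x + 3 evaluated at x = 2
numeric = central_difference(f, 2.0)

print(analytic)                        # 7.0
print(abs(numeric - analytic) < 1e-6)  # True
```

Both values agree with the tensor(7.) that autograd reports.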

Dynamic Computation Graph

PyTorch uses a dynamic computation graph: the graph is built on the fly as operations execute and is rebuilt on every forward pass. This makes debugging and experimenting easier, because ordinary Python tools such as print statements and debuggers can inspect intermediate values.

In [None]:
def forward(x):
    return x ** 2 + 3 * x + 5

x = torch.tensor(3.0, requires_grad=True)
y = forward(x)
y.backward()

print(x.grad)  # Output: tensor(9.), since dy/dx = 2x + 3 at x = 3

Unlike frameworks like TensorFlow (1.x), where the computation graph is static and defined before running, PyTorch builds the graph dynamically as the operations are executed.
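
A practical consequence is that ordinary Python control flow can change which operations are recorded from one call to the next. The following sketch illustrates this with a hypothetical function whose branch depends on the input value.

```python
import torch

def forward(x):
    # Ordinary Python control flow decides which operations are recorded
    if x.item() > 0:
        return x ** 2      # derivative: 2x
    return -3 * x          # derivative: -3

grads = []
for value in (2.0, -1.0):
    x = torch.tensor(value, requires_grad=True)
    y = forward(x)
    y.backward()
    grads.append(x.grad.item())

print(grads)  # [4.0, -3.0]
```

Each call builds a fresh graph containing only the branch that actually ran, so the gradient matches that branch.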

Neural Networks (nn.Module)

In PyTorch, neural networks are built using the torch.nn module, which contains many pre-built layers like fully connected layers, convolutional layers, and more. A neural network in PyTorch is typically defined as a class that inherits from torch.nn.Module.

In [None]:
import torch.nn as nn

# Define a simple feedforward neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define layers
        self.fc1 = nn.Linear(4, 10)  # Hidden layer: 4 input features -> 10 neurons
        self.fc2 = nn.Linear(10, 3)  # Output layer: 10 neurons -> 3 outputs

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # Apply ReLU activation
        x = self.fc2(x)  # Output layer
        return x

# Instantiate the network
net = SimpleNet()

# Example input
input_data = torch.rand(1, 4)  # 1 sample with 4 features
output = net(input_data)
print(output)

This defines a neural network with two fully connected (linear) layers. The forward() function specifies how data flows through the network, and torch.relu() is an activation function applied to the output of the first layer.
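
To see what such a network actually learns, it can be useful to list its parameters. The sketch below uses nn.Sequential as a stand-in with the same layer sizes as SimpleNet (4 -> 10 -> 3); the name model is illustrative.

```python
import torch
import torch.nn as nn

# Stand-in with the same layer sizes as SimpleNet (4 -> 10 -> 3)
model = nn.Sequential(
    nn.Linear(4, 10),
    nn.ReLU(),
    nn.Linear(10, 3),
)

# Each Linear layer owns a weight matrix and a bias vector
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in model.parameters())
print(total)  # 83 = (4*10 + 10) + (10*3 + 3)
```

Note that nn.Linear stores its weight with shape (out_features, in_features), so the first layer's weight is 10x4.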

Loss Functions

Loss functions in PyTorch measure how well or poorly a model's predictions match the true labels. Common loss functions include:

Mean Squared Error (MSE): Used for regression tasks.
Cross-Entropy Loss: Used for classification tasks.

In [None]:
loss_fn = nn.CrossEntropyLoss()

# Dummy target and prediction
pred = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([2])

# Calculate loss
loss = loss_fn(pred, target)
print(loss)  # Output: tensor(0.4076)

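nn.CrossEntropyLoss combines LogSoftmax and NLLLoss in one class, so it expects raw, unnormalized scores (logits) rather than probabilities, together with integer class indices as targets. It is the usual choice for multi-class classification problems.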
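
The same loss value can be reproduced by hand, which makes the LogSoftmax plus NLLLoss combination concrete. This is a plain-Python sketch for a single sample, using the scores and target from the example above; the variable names are illustrative.

```python
import math

logits = [1.0, 2.0, 3.0]   # same scores as pred in the example
target = 2                 # index of the true class

# Log of the softmax denominator (log-sum-exp of the logits)
log_sum_exp = math.log(sum(math.exp(z) for z in logits))

# Negative log-probability of the true class
loss = log_sum_exp - logits[target]

print(round(loss, 4))  # 0.4076
```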

Optimization

Optimization algorithms are used to adjust the model parameters (weights) to minimize the loss function. PyTorch provides several optimization algorithms in torch.optim, such as SGD, Adam, and RMSProp.

In [None]:
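import torch.optim as optim

# Create optimizer
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Training step
optimizer.zero_grad()  # Clear gradients left over from earlier backward passes

# The loss must depend on the network's parameters for backward() to work
output = net(input_data)  # Forward pass, shape (1, 3)
loss = loss_fn(output, target)  # Reuses loss_fn and target from the previous cell

loss.backward()  # Compute gradients
optimizer.step()  # Update model weights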

Here, the optimizer adjusts the parameters of the neural network based on the computed gradients to minimize the loss function. Because PyTorch accumulates gradients across backward() calls by default, optimizer.zero_grad() is called to clear them before each new backward pass.
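
The idea behind an optimizer step can be sketched without PyTorch. The example below runs plain gradient descent, which is the update optim.SGD applies when no momentum is configured; Adam uses a more elaborate rule. The loss (w - 3) ** 2, the starting value, and the learning rate are all hypothetical choices for illustration.

```python
def grad(w):
    # Derivative of the hypothetical loss (w - 3) ** 2
    return 2 * (w - 3)

w = 0.0    # initial parameter value
lr = 0.1   # learning rate

for step in range(50):
    # Move against the gradient, scaled by the learning rate
    w -= lr * grad(w)

print(w)  # very close to 3, the minimizer of the loss
```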

Training a Model

Putting everything together, training a model in PyTorch involves passing the data forward through the network, computing the loss, backpropagating the gradients, and updating the weights.

In [None]:
# Define a simple dataset (dummy data)
X = torch.rand(100, 4)  # 100 samples with 4 features
y = torch.randint(0, 3, (100,))  # 100 labels, 3 classes

# Define network, loss function, and optimizer
net = SimpleNet()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Training loop
for epoch in range(100):  # 100 epochs
    optimizer.zero_grad()  # Zero out gradients
    output = net(X)  # Forward pass
    loss = loss_fn(output, y)  # Compute loss
    loss.backward()  # Backpropagation
    optimizer.step()  # Update weights

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item()}")


This is a simple example of a training loop. The data is passed through the network, the loss is computed, and the weights are updated iteratively. Every 10 epochs, the loss is printed to track training progress.
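
Once a loop like this has run, the loss alone says little about how often the model predicts the right class. A rough sketch of an accuracy check follows; it uses an untrained nn.Sequential stand-in with the same sizes as SimpleNet, so on random labels the accuracy is expected to sit near chance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

X = torch.rand(100, 4)
y = torch.randint(0, 3, (100,))

# Stand-in for a trained SimpleNet with the same input and output sizes
model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 3))

model.eval()                 # put layers such as dropout into inference mode
with torch.no_grad():        # skip graph recording during evaluation
    logits = model(X)
    predictions = logits.argmax(dim=1)   # class with the highest score
    accuracy = (predictions == y).float().mean().item()

print(f"Accuracy: {accuracy:.2f}")
```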

Datasets and DataLoaders

PyTorch provides the torch.utils.data module, whose Dataset and DataLoader classes handle loading and batching data. They are especially useful when a dataset is too large to process in a single batch.

Dataset: Defines how to access the data.
DataLoader: Loads data in batches for training.

In [None]:
from torch.utils.data import Dataset, DataLoader

# Define a custom dataset
class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index], self.targets[index]

# Create dataset and dataloader
dataset = CustomDataset(X, y)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Iterate through data in batches
for batch_data, batch_labels in dataloader:
    print(batch_data, batch_labels)


The DataLoader class handles batching, shuffling (when shuffle=True), and parallel loading in worker processes (when num_workers is greater than 0), making it easy to work with large datasets during training.
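
When the data already lives in tensors, the built-in TensorDataset can replace a hand-written Dataset class. The sketch below also shows how 100 samples split into batches of 32; the names dataset, loader, and batch_sizes are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.rand(100, 4)
y = torch.randint(0, 3, (100,))

# TensorDataset pairs up tensors along their first dimension
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_sizes = [batch_data.shape[0] for batch_data, _ in loader]
print(batch_sizes)   # [32, 32, 32, 4]
print(len(loader))   # 4
```

The last batch holds the 4 leftover samples. Passing drop_last=True to DataLoader discards such a partial batch instead.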

GPU Acceleration (CUDA)

One of PyTorch’s key strengths is its support for GPU acceleration using CUDA. Once tensors and models are on the GPU, large tensor operations typically run much faster than on the CPU.

In [None]:
# Check if CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

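# Create an input on the CPU, then move it to the selected device
tensor = torch.tensor([[1.0, 2.0, 3.0, 4.0]])  # 1 sample with 4 features, as SimpleNet expects
tensor = tensor.to(device)

# Move model to the same device
net = SimpleNet().to(device)

# The model and its input must be on the same device
output = net(tensor)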
print(output)

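tensor.to(device) returns a tensor on the target device and leaves the original unchanged, while model.to(device) moves the module's parameters in place. The model and its inputs must be on the same device, otherwise PyTorch raises an error.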
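
The device of any tensor can be inspected through its device attribute, which is a quick way to track down mismatches. The following sketch runs on machines with or without a GPU; the names cpu_tensor, moved, and result are illustrative, and it assumes the default device has not been changed.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

cpu_tensor = torch.ones(2, 3)
moved = cpu_tensor.to(device)

print(cpu_tensor.device)  # cpu: .to() does not modify the original tensor
print(moved.device)       # cuda:0 if a GPU is available, otherwise cpu

# Bring a result back to the CPU, e.g. before converting to a Python list
result = (moved * 2).cpu().tolist()
print(result)  # [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
```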