# The Hello World of Neural Networks with PyTorch

This notebook walks through the entire process of creating a simple linear regression model, training it, making predictions, and inspecting the model's parameters. The dataset consists of paired values x and y, where x is `[-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]` and y is `[-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]`. The model's objective is to approximate the linear relationship `y = 2x - 1` from this training data.



In [None]:
%pip install pynvml
%pip install nvidia-ml-py
%pip install torchinfo  # Replaces torchsummary, compatible with MPS

In [19]:
print("x is ", [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]) #input
print("y is ", [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]) #output

x is  [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
y is  [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]




Note that with a rule-based approach, we would simply write the rule by hand as a function like this:


In [20]:
def function_with_rules(x):
    y = (2 * x) - 1
    return y

x = 4
print("For x =", x, ", y = ", function_with_rules(x=x))

For x = 4 , y =  7
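As a quick sanity check, the hand-written rule can be applied to every training input to confirm that it reproduces the targets. This is a minimal sketch; the names `xs_check`, `ys_check`, and `rule_outputs` are illustrative and not part of the original notebook.

```python
# Training pairs from the introduction
xs_check = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys_check = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

def function_with_rules(x):
    # Hand-written rule: y = 2x - 1
    return (2 * x) - 1

# Apply the rule to every input and compare with the known targets
rule_outputs = [function_with_rules(x) for x in xs_check]
print("Rule reproduces the dataset:", rule_outputs == ys_check)
```

The neural network in the rest of the notebook has to discover this same rule from the data alone.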


## Importing Libraries

We import the necessary libraries, including PyTorch for building and training the neural network model and NumPy for handling numerical data.


In [21]:
import torch
import torch.nn as nn  # neural network layers and loss functions
import torch.optim as optim  # optimization algorithms (e.g., SGD)
import numpy as np
import os

# Check for GPU availability

The next cell selects a PyTorch device based on the hardware available. It sets `device` to `"mps"` if Apple's Metal Performance Shaders backend is available (Apple Silicon GPUs), to `"cuda"` if CUDA, the GPU acceleration platform for NVIDIA GPUs, is available, and to `"cpu"` otherwise. This allows the same code to switch between CPU and GPU computation depending on availability, so the notebook runs on the best available hardware without modification.

In [22]:
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print("You are using", device)

You are using mps
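Once a device has been chosen, tensors still have to be moved to it explicitly. The following is a minimal sketch of that step; the names `t_cpu` and `t_dev` are illustrative, and the device that is printed depends on the machine running it.

```python
import torch

# Same selection order as above: MPS, then CUDA, then CPU
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Tensors are created on the CPU by default and must be moved explicitly
t_cpu = torch.tensor([1.0, 2.0, 3.0])
t_dev = t_cpu.to(device)

print(t_cpu.device, "->", t_dev.device)
```

Operations between tensors generally require both operands to live on the same device, which is why the model and the data are both moved later in the notebook.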


## Inspecting GPU Status in Google Colab
When you run `!nvidia-smi`, it displays information about the NVIDIA GPU allocated to your Colab session, including the GPU model, GPU memory usage, and the processes currently running on the GPU. This command is useful for verifying that you have access to a GPU and for monitoring GPU usage during training or inference in Colab.

In [None]:
!nvidia-smi  # only works when an NVIDIA GPU is present (device = "cuda")

The command `nvidia-smi -L` lists the available GPUs on the system.

In [None]:
!nvidia-smi  -L

The command `nvidia-smi -L | wc -l` is used to count the number of GPUs available on a machine.


In [None]:
!nvidia-smi  -L | wc -l

## Controlling GPU Allocation with CUDA_VISIBLE_DEVICES Configuration
This code snippet sets the environment variable CUDA_VISIBLE_DEVICES, which is used by CUDA (NVIDIA's parallel computing platform) to specify which GPUs should be made visible to CUDA-enabled applications.

Setting CUDA_VISIBLE_DEVICES is useful when you have multiple GPUs available but only want to use a subset of them for a specific task. By setting this environment variable, you can control which GPUs are utilized by CUDA-enabled applications, such as deep learning frameworks like TensorFlow or PyTorch.



In [23]:
# For MPS (Apple Silicon) there is no need to configure visible devices,
# since there is only one integrated GPU

if device.type == "mps":
    print("Using MPS (Metal Performance Shaders) - Apple Silicon GPU")
    print("MPS automatically uses the GPU integrated in the Apple chip")
elif device.type == "cuda":
    num_gpus = 1  # number of GPUs you want to use in this notebook
    # Note: this only takes effect if it is set before CUDA is initialized in this process
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(x) for x in range(num_gpus))
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
else:
    print("Using CPU")

Using MPS (Metal Performance Shaders) - Apple Silicon GPU
MPS automatically uses the GPU integrated in the Apple chip


## Function to Select the Best Available Device

The function `get_best_device` automatically selects the most suitable device for PyTorch computations. It first checks for MPS (Apple Silicon). Otherwise, if CUDA is available, it queries each GPU through `pynvml` and picks the one with the most free memory, which helps avoid out-of-memory errors. If neither is available, it falls back to the CPU.

In [24]:
# Version adapted for MPS, CUDA, and CPU
import torch

def get_best_device():
    """
    Select the best available device:
    - MPS (Metal Performance Shaders) for Apple Silicon
    - CUDA for NVIDIA GPUs
    - CPU as a fallback
    """
    # Check for MPS (Apple Silicon)
    if torch.backends.mps.is_available():
        print("MPS (Metal Performance Shaders) is available.")
        print("Using Apple Silicon GPU")
        return torch.device("mps")

    # Check for CUDA (NVIDIA GPUs)
    if torch.cuda.is_available():
        try:
            import pynvml
            pynvml.nvmlInit()
            best_gpu = 0
            max_free_mem = 0

            for i in range(torch.cuda.device_count()):
                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
                mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
                free_mem = mem_info.free
                print(f"GPU {i} - Free memory: {free_mem / 1024**2:.2f} MiB")

                if free_mem > max_free_mem:
                    best_gpu = i
                    max_free_mem = free_mem

            pynvml.nvmlShutdown()
            print(f"Selecting GPU {best_gpu}, which has the most free memory.")
            return torch.device(f"cuda:{best_gpu}")
        except Exception:
            print("CUDA is available, but querying memory information failed.")
            return torch.device("cuda")

    # Fall back to CPU
    print("No GPU available. Using CPU.")
    return torch.device("cpu")

get_best_device()

MPS (Metal Performance Shaders) is available.
Using Apple Silicon GPU


device(type='mps')

# Model Definition
We define a simple linear regression model using PyTorch's `nn.Sequential` container. Inside the container, we have one `nn.Linear` layer, which represents a linear transformation.

In [25]:
# Build a simple Sequential model
model = nn.Sequential(
    nn.Linear(1, 1) # theta'*x, logits
    #no activation
)

model.to(device)  # Move the model to the selected device (GPU if available)

Sequential(
  (0): Linear(in_features=1, out_features=1, bias=True)
)

This code defines a simple neural network model using PyTorch's `nn.Sequential` container, which is a convenient way to create a sequence of neural network layers. In this case, we have only one layer:

* `nn.Linear(1, 1)`: This line defines a linear (fully connected) layer within the model. Let's break down the arguments:

    * `nn.Linear`: This is the linear layer class provided by PyTorch, which implements a linear (affine) transformation: a matrix multiplication followed by the addition of a bias.

    * `1, 1`: The first 1 represents the number of input features, and the second 1 represents the number of output features. In other words, this layer has one input feature and produces one output feature.

So, what does this layer do? It's a linear transformation that can be expressed as $y=\theta_0+\theta_1x$ , where:

* $y$ is the output (a single number in this case).
* $\theta_1$ is the weight (a learnable parameter), and since it's 1x1, it's a scalar.
* $x$ is the input feature (a single number in this case).
* $\theta_0$ is the bias (another learnable parameter), also a scalar.

This linear layer essentially tries to learn the best values for $\theta_0$  and $\theta_1$  that allow the model to make predictions based on the input feature $x$.

In the context of this linear regression problem (predicting y based on x), this linear layer models a linear relationship between x and y. It's the core of this model, and its goal during training is to adjust the parameters $\theta_0$ and $\theta_1$ to minimize the mean squared error (MSE) between the predicted values and the actual target values.
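To make the formula $y=\theta_0+\theta_1x$ concrete, the output of an `nn.Linear(1, 1)` layer can be compared against the same computation written out by hand. This is a minimal sketch on the CPU; the names `layer`, `x_demo`, and `manual_out`, as well as the seed, are illustrative choices and not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatable initial values
layer = nn.Linear(1, 1)

x_demo = torch.tensor([[2.0], [3.0]])  # two samples, one feature each

theta_1 = layer.weight  # weight, shape (1, 1)
theta_0 = layer.bias    # bias, shape (1,)

with torch.no_grad():
    layer_out = layer(x_demo)
    # Same computation written out by hand (broadcast over the batch)
    manual_out = theta_0 + x_demo * theta_1

print("Layer output: ", layer_out.flatten().tolist())
print("Manual output:", manual_out.flatten().tolist())
print("Match:", torch.allclose(layer_out, manual_out))
```

Before training, $\theta_0$ and $\theta_1$ hold random initial values, so both outputs are equally arbitrary; what matters here is only that they agree.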

The `.to(device)` method in PyTorch is used to move tensors or models to a specific device, such as a GPU or CPU. This method is commonly used to ensure that the tensors and models are compatible with the device on which computations are performed.

When you call `.to(device)`, you pass the target device as an argument. For example, to move a tensor or model to an NVIDIA GPU (if available), you use `torch.device("cuda")`; on Apple Silicon, `torch.device("mps")`; and to use the CPU, `torch.device("cpu")`.

## Visualizing the model


In [26]:
from torchinfo import summary
features = 1  # our input is a single feature
summary(model, input_size=(1, features))  # (batch_size, features)

ModuleNotFoundError: No module named 'torchinfo'

The code above uses the `summary` function from the `torchinfo` library to generate a summary of the model, including information about the layers and their output shapes. In the run recorded above, the call failed with `ModuleNotFoundError` because `torchinfo` was not installed in the active environment; running the `%pip install torchinfo` cell at the top of the notebook first resolves this.

Here's a breakdown of each part of the code:

`from torchinfo import summary`: This line imports the `summary` function from the `torchinfo` library. This function provides a summary of a PyTorch model, including details such as the number of parameters and the output shape of each layer.

`features = 1`: This line defines the number of features in the input data. In this case, the input to the model consists of a single feature. This value is used to specify the input size when generating the model summary.

`summary(model, input_size=(1, features))`: This line calls the `summary` function, passing in the model and the input size. The `input_size` parameter specifies the shape of the input, including the batch dimension. Here it is the tuple `(1, features)`: a batch of one sample with a single feature. The function then runs the model on an input of that shape and reports the layer types, output shapes, and the number of parameters in each layer.

The output shape reported for the layer has a batch dimension and a single feature dimension. Because the batch size is passed explicitly in `input_size`, `torchinfo` reports that concrete batch size rather than a placeholder such as -1. The model itself accepts any batch size during training or inference.
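If `torchinfo` is not available, the parameter count that the summary would report can also be obtained with plain PyTorch. This is a minimal sketch; `model_demo`, `param_counts`, and `total_params` are illustrative names, and the model is rebuilt here only so that the snippet runs on its own.

```python
import torch.nn as nn

# Same architecture as the model defined above
model_demo = nn.Sequential(nn.Linear(1, 1))

# Number of elements in each learnable tensor
param_counts = {name: p.numel() for name, p in model_demo.named_parameters()}
total_params = sum(param_counts.values())

print(param_counts)
print("Total trainable parameters:", total_params)
```

For this model the total is 2: one weight and one bias.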


# Loss Function and Optimizer

We define the loss function as Mean Squared Error (MSE) using `nn.MSELoss`. This is a common loss function for regression problems.
We then set up the optimizer as Stochastic Gradient Descent (SGD) using `optim.SGD`. It's used to update the model's parameters during training.

In [27]:
# Define the loss function and optimizer
criterion = nn.MSELoss() #evaluation, loss function J
optimizer = optim.SGD(model.parameters(), lr=0.01)


Let's break down the code where the loss function and optimizer are defined:

* `criterion = nn.MSELoss()`: Here, we define the loss function as Mean Squared Error (MSE) using `nn.MSELoss()`. The `MSELoss` measures the mean squared difference between predicted values and actual target values. In the context of linear regression, it quantifies how well the model's predictions match the true target values. The goal during training is to minimize this loss, meaning the model aims to make its predictions as close as possible to the actual targets.

* `optimizer = optim.SGD(model.parameters(), lr=0.01)`: We define the optimizer as Stochastic Gradient Descent (SGD) using `optim.SGD`. The parameters of this optimizer are as follows:
    * `model.parameters()`: This method retrieves all the learnable parameters of the model. For this linear regression model, these are the bias $\theta_0$ and the weight $\theta_1$ of the linear layer defined earlier. The optimizer will adjust these parameters during training to minimize the loss.
    * `lr=0.01`: This sets the learning rate for the optimizer. The learning rate is a hyperparameter that controls the step size during the optimization process. It influences how quickly or slowly the model's parameters are updated. A smaller learning rate makes the training more stable but may require more epochs to converge, while a larger learning rate can speed up convergence but may lead to overshooting the optimal parameters.
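The two pieces above can be traced by hand on a tiny example: `nn.MSELoss` averages the squared errors, and one SGD step moves each parameter by `lr` times its gradient in the opposite direction. This is a minimal sketch with a single hypothetical parameter `w` and made-up data (`x_demo`, `y_demo`), chosen so that the arithmetic is easy to follow.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# A single learnable scalar, initialized to 0 for easy arithmetic
w = torch.tensor([0.0], requires_grad=True)
x_demo = torch.tensor([1.0, 2.0])
y_demo = torch.tensor([2.0, 4.0])

criterion_demo = nn.MSELoss()
optimizer_demo = optim.SGD([w], lr=0.01)

pred = w * x_demo
loss_demo = criterion_demo(pred, y_demo)
manual_mse = ((pred - y_demo) ** 2).mean()   # (2^2 + 4^2) / 2 = 10

optimizer_demo.zero_grad()
loss_demo.backward()
grad_before_step = w.grad.clone()            # mean(2 * (pred - y) * x) = -10
optimizer_demo.step()                        # w <- 0 - 0.01 * (-10) = 0.1

print("MSELoss:", loss_demo.item(), "| manual MSE:", manual_mse.item())
print("Gradient:", grad_before_step.item())
print("Updated w:", w.item())
```

With a larger `lr` the same gradient would produce a proportionally larger jump in `w`, which is the overshooting risk mentioned above.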


# Data Preparation

We define our input data `xs` and output data `ys` as NumPy arrays. `xs` contains the input values, and `ys` contains the corresponding target values.

We convert these NumPy arrays into PyTorch tensors and reshape them using view to ensure they have the correct shape for PyTorch.

`.view(-1, 1)` reshapes the tensors. The `-1` tells PyTorch to infer the size of that dimension from the number of elements in the tensor, and the `1` fixes the size of the second dimension. Here, it reshapes the tensors to `(6, 1)`, where 6 is the number of data points and 1 is the number of features per data point.

In [28]:
# Declare model inputs and outputs for training y = 2x - 1
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=np.float32)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=np.float32)

print(xs.shape)

# Convert NumPy arrays to PyTorch tensors and reshape xs and move to the GPU if available
xs = torch.tensor(xs).view(-1, 1).to(device)
ys = torch.tensor(ys).view(-1, 1).to(device)

print(xs.shape)

(6,)
torch.Size([6, 1])


The next cell prepares data for training by creating a dataset and a data loader. The dataset contains input-output pairs (`xs` and `ys`), and the data loader provides batches of this data for training, with each batch containing `batch_size` samples and the order reshuffled at every epoch (`shuffle=True`).

Let's break down what each part of the code does:

1. `from torch.utils.data import TensorDataset, DataLoader`: This line imports the `TensorDataset` class and the `DataLoader` class from the `torch.utils.data` module. These classes are commonly used in PyTorch for handling datasets and data loading during the training process.

2. `dataset = TensorDataset(xs, ys)`: This line creates a dataset object using the `TensorDataset` class. `xs` and `ys` are tensors containing input data and corresponding labels, respectively. The `TensorDataset` class allows you to combine these input-output pairs into a single dataset object.

3. `dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)`: This line creates a data loader object using the `DataLoader` class. It takes the dataset object created earlier (`dataset`) and additional parameters like `batch_size` and `shuffle`. Here, `batch_size` specifies the number of samples per batch, and `shuffle=True` indicates that the data will be shuffled randomly before being divided into batches. The data loader will then iterate over these batches during the training process.

In [29]:
from torch.utils.data import TensorDataset, DataLoader
# Define batch size
batch_size = 2

# Create dataset
dataset = TensorDataset(xs, ys)
print("xs: ", dataset.tensors[0], "\nys: ", dataset.tensors[1])

#reproducibility
torch.manual_seed(42)

# Create dataloader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

xs:  tensor([[-1.],
        [ 0.],
        [ 1.],
        [ 2.],
        [ 3.],
        [ 4.]], device='mps:0') 
ys:  tensor([[-3.],
        [-1.],
        [ 1.],
        [ 3.],
        [ 5.],
        [ 7.]], device='mps:0')
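To see what the data loader actually yields, one epoch can be iterated and each batch printed. This is a minimal sketch on the CPU; `xs_demo`, `ys_demo`, `loader_demo`, and `batch_shapes` are illustrative names, and the order of the samples depends on the seed and the PyTorch version.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Same six training pairs as above, kept on the CPU
xs_demo = torch.tensor([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]).view(-1, 1)
ys_demo = 2 * xs_demo - 1

torch.manual_seed(42)
loader_demo = DataLoader(TensorDataset(xs_demo, ys_demo), batch_size=2, shuffle=True)

# One pass over the loader is one epoch
batch_shapes = []
for xs_batch, ys_batch in loader_demo:
    batch_shapes.append(tuple(xs_batch.shape))
    print(xs_batch.flatten().tolist(), "->", ys_batch.flatten().tolist())

print(len(loader_demo), "batches per epoch")
```

With 6 samples and `batch_size=2`, each epoch consists of 3 batches of shape `(2, 1)`, so the optimizer performs 3 parameter updates per epoch.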


# Example of Automatic Differentiation in PyTorch

This code demonstrates a fundamental concept in PyTorch: automatic differentiation using the `.backward()` method. It shows how to define a simple mathematical function, mark a tensor for which you want to compute gradients, and then automatically calculate the derivative of the function with respect to that tensor. This is the core mechanism that enables training neural networks by calculating the gradients of the loss function with respect to the model's parameters.

In [30]:
import torch

# Define a tensor x with requires_grad=True to track gradients
x = torch.tensor([7.0], requires_grad=True)

# Define the function y = x^2
y = x**2

print(x.grad) #None before backward

# Compute the gradients of y with respect to x
y.backward() # the gradient is 2*x

# Print the gradient of x
print(x.grad)

None
tensor([14.])
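The value computed by autograd can be cross-checked against a numerical estimate of the derivative. This is a minimal sketch using a central finite difference; the helper `f`, the step size `h`, and the use of `float64` are illustrative choices and not part of the original notebook.

```python
import torch

def f(t):
    # Same function as above: y = x^2
    return t ** 2

# float64 keeps the finite-difference estimate numerically accurate
x0 = torch.tensor([7.0], dtype=torch.float64, requires_grad=True)
f(x0).backward()
autograd_grad = x0.grad.item()

# Central difference: (f(x + h) - f(x - h)) / (2h)
h = 1e-6
with torch.no_grad():
    numeric_grad = ((f(x0 + h) - f(x0 - h)) / (2 * h)).item()

print("autograd:", autograd_grad)
print("numeric: ", numeric_grad)
```

Both values should agree with the analytical derivative $2x = 14$ up to a small numerical error in the finite-difference estimate.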


# Training

This code represents a typical training loop for a neural network model using PyTorch. Let's break it down step by step:

1. **Training Loop Initialization**: It sets the number of epochs (`num_epochs`) to 20, that is, the number of times the entire dataset is passed forward and backward through the neural network.

2. **Loop Over Epochs**: The outer loop iterates over each epoch from 0 to `num_epochs - 1`. During each epoch, the entire dataset is passed through the network once.

3. **Inner Loop Over Batches**: The inner loop iterates over batches of data (`xs_batch` and `y_batch`) obtained from the `dataloader`. `xs_batch` contains input data samples, and `y_batch` contains corresponding labels.

4. **Forward Pass**: Inside the inner loop, the input batch (`xs_batch`) is fed into the neural network model (`model`) to obtain predictions (`outputs`).

5. **Compute Loss**: The predicted outputs (`outputs`) are compared against the actual labels (`y_batch`) using a loss function (`criterion`) to calculate the loss value (`loss`). The loss quantifies how well the model's predictions match the true labels.

6. **Backpropagation and Parameter Update**: We first clear the gradients with `optimizer.zero_grad()` to prepare for a new backward pass. This is essential because PyTorch accumulates gradients across calls to `.backward()`; without the reset, gradients from one batch would carry over into the next and lead to incorrect and unstable training. `loss.backward()` then computes the gradients of the loss with respect to the model parameters (backpropagation), and `optimizer.step()` updates the parameters using these gradients (the gradient descent step).
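The accumulation behavior described in step 6 can be observed directly on a single tensor. This is a minimal sketch; `w_acc` and the three gradient variables are illustrative names, and the manual `w_acc.grad.zero_()` stands in for what `optimizer.zero_grad()` does for every parameter the optimizer manages.

```python
import torch

w_acc = torch.tensor([7.0], requires_grad=True)

# First backward pass: d(w^2)/dw = 2 * 7 = 14
(w_acc ** 2).backward()
first_grad = w_acc.grad.clone()

# Second backward pass without a reset: the new gradient is added to the old one
(w_acc ** 2).backward()
accumulated_grad = w_acc.grad.clone()

# Reset the stored gradient, then run backward again
w_acc.grad.zero_()
(w_acc ** 2).backward()
reset_grad = w_acc.grad.clone()

print("first:", first_grad.item())
print("without reset:", accumulated_grad.item())
print("after reset:", reset_grad.item())
```

Without the reset the stored gradient doubles to 28 even though the true derivative is still 14, which is exactly the error the training loop avoids by clearing gradients on every batch.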



In [31]:
# Training loop
num_epochs = 20
for epoch in range(num_epochs):

    for batch_idx, (xs_batch, y_batch) in enumerate(dataloader):

        #print("Batch", batch_idx, "xs_batch:", xs_batch, "\nys_batch:", y_batch)
        #print("***************")
        # Forward pass
        outputs = model(xs_batch)

        # Compute the loss
        loss = criterion(outputs, y_batch) #MSE

        # Zero the gradients, perform a backward pass, and update the weights
        optimizer.zero_grad()
        loss.backward() #backpropagation
        optimizer.step() # Gradient descent

    print("======================, End of Epoch", epoch)
    if epoch % 5 == 0:
        print("At Epoch ", epoch, "Loss is", loss.item())  # loss of the last batch in this epoch, as a Python float



At Epoch  0 Loss is 0.4724831283092499
At Epoch  5 Loss is 1.500733494758606
At Epoch  10 Loss is 2.4987146854400635
At Epoch  15 Loss is 1.4497300386428833
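The losses printed above are still far from zero after 20 epochs. To illustrate how the same update rule behaves when it runs longer, here is a minimal sketch of gradient descent on the same data in plain Python. It is not the notebook's training loop: it assumes full-batch updates instead of mini-batches, zero-initialized parameters, and 2000 epochs, and all names (`xs_gd`, `theta_0`, `loss_at`, and so on) are illustrative.

```python
xs_gd = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys_gd = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

theta_0, theta_1 = 0.0, 0.0   # bias and weight, zero-initialized (assumption)
lr = 0.01                     # same learning rate as the notebook
n = len(xs_gd)

def mse(t0, t1):
    # Mean squared error of the line y = t0 + t1 * x on the dataset
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs_gd, ys_gd)) / n

loss_at = {}
for epoch in range(2000):
    errors = [theta_0 + theta_1 * x - y for x, y in zip(xs_gd, ys_gd)]
    # Gradients of the MSE with respect to bias and weight
    grad_0 = 2 * sum(errors) / n
    grad_1 = 2 * sum(e * x for e, x in zip(errors, xs_gd)) / n
    # Gradient descent step
    theta_0 -= lr * grad_0
    theta_1 -= lr * grad_1
    if epoch + 1 in (20, 2000):
        loss_at[epoch + 1] = mse(theta_0, theta_1)

print("Loss after 20 epochs:  ", loss_at[20])
print("Loss after 2000 epochs:", loss_at[2000])
print("theta_1 (weight):", theta_1, "theta_0 (bias):", theta_0)
```

Under these assumptions the parameters approach the target values of 2 and -1, which suggests that the 20 epochs used above are simply too few for this learning rate.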


# Making a Prediction
After training, we make a prediction for the input value 10.0 by passing it through the trained model and converting the result to a Python scalar using `.item()`. The statement `with torch.no_grad()` disables gradient calculation for the operations inside the `with` block. Gradients are needed during training to update the model's parameters, but not when making predictions, so skipping them saves memory and computation.

Forgetting to set the model to evaluation mode (`model.eval()`) before inference can lead to unexpected behavior with layers such as Batch Normalization or Dropout, which behave differently during training and inference. This model contains neither, so `model.eval()` does not change its output here, but it is best practice to call it before inference so that predictions stay consistent if such layers are added later.

In [32]:
# Make a prediction
model.eval()
with torch.no_grad():
    predicted_value = model(torch.tensor([10.0]).to(device))

print(predicted_value)
print(predicted_value.item())  # Convert the result to a Python scalar

# y = 2*x - 1, so for x = 10 the expected value is y = 19

tensor([16.5392], device='mps:0')
16.53920555114746


# Printing Model Parameters
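Finally, we print the model's parameter names and their values. The single linear layer has two parameters: `0.weight` ($\theta_1$) and `0.bias` ($\theta_0$). After only 20 epochs, the learned values are still some distance from the target values of 2 and -1.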

In [33]:
# Print model layer information
for name, param in model.named_parameters():
    print(name)
    print(param.data)  # print(param.data.item())
    #print(param.grad)
print(model)

# y = 2*x -1 = theta_0 + theta_1*x

0.weight
tensor([[1.6427]], device='mps:0')
0.bias
tensor([0.1123], device='mps:0')
Sequential(
  (0): Linear(in_features=1, out_features=1, bias=True)
)
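The printed parameters are enough to reproduce the earlier prediction by hand. This is a minimal arithmetic sketch; the values are copied from the output above (rounded to four decimals), and the variable names are illustrative.

```python
# Values copied from the printed parameters above (rounded to 4 decimals)
theta_1 = 1.6427   # 0.weight
theta_0 = 0.1123   # 0.bias

x_new = 10.0
manual_prediction = theta_0 + theta_1 * x_new   # model output for x = 10
target = 2 * x_new - 1                          # value given by the true rule

print(f"Manual prediction: {manual_prediction:.4f}")
print(f"Value from y = 2x - 1: {target:.1f}")
```

The manual result matches the model's prediction of about 16.54 up to rounding, and the gap to 19 comes entirely from the weight and bias not yet having reached 2 and -1.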


# Activity



For the given data, build a neural network with two hidden layers, each consisting of `N` units, and use the sigmoid activation function. Choose the value of `N` so that the prediction for an input of 2.5 is as close as possible to the expected value (considering the quadratic relationship between the input and output). The output layer should have a single unit with a linear activation function. Set up your model with the same optimizer and loss function as in the previous example. Train your model for 10000 epochs and calculate the mean squared error (MSE) on the training dataset.

In [34]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Declare model inputs and outputs for training y = x^2
xs = np.array([-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=np.float32)
ys = np.array([16, 9, 4, 1, 0, 1, 4, 9, 16], dtype=np.float32)

# Convert NumPy arrays to PyTorch tensors
xs = torch.tensor(xs).view(-1, 1)
ys = torch.tensor(ys).view(-1, 1)


In [35]:
# Move tensors to the appropriate device
xs = xs.to(device)
ys = ys.to(device)

# Define N (number of neurons per hidden layer)
N = 8  # This value can be tuned for better performance

# Build a neural network with two hidden layers with sigmoid activation
model_quadratic = nn.Sequential(
    nn.Linear(1, N),      # Input layer to first hidden layer
    nn.Sigmoid(),         # Sigmoid activation
    nn.Linear(N, N),      # First hidden layer to second hidden layer
    nn.Sigmoid(),         # Sigmoid activation
    nn.Linear(N, 1)       # Second hidden layer to output (linear activation)
)

# Move model to device
model_quadratic.to(device)

print("Model architecture:")
print(model_quadratic)

Model architecture:
Sequential(
  (0): Linear(in_features=1, out_features=8, bias=True)
  (1): Sigmoid()
  (2): Linear(in_features=8, out_features=8, bias=True)
  (3): Sigmoid()
  (4): Linear(in_features=8, out_features=1, bias=True)
)


In [36]:
# Print model parameters (this cell runs before the training loop, so these are the initial values)
print("\nModel Parameters (before training):")
print("=" * 50)
for name, param in model_quadratic.named_parameters():
    print(f"\n{name}:")
    print(param.data)
print("=" * 50)


Model Parameters (before training):

0.weight:
tensor([[-0.5164],
        [-0.6817],
        [ 0.5306],
        [-0.4042],
        [ 0.6069],
        [-0.2373],
        [ 0.5720],
        [-0.7770]], device='mps:0')

0.bias:
tensor([-0.5046,  0.3049,  0.2114, -0.2550,  0.5961,  0.6798, -0.7252, -0.5339],
       device='mps:0')

2.weight:
tensor([[ 0.3237, -0.1193, -0.1253, -0.3421, -0.2025,  0.0883, -0.0467, -0.2566],
        [ 0.0083, -0.2415, -0.3000, -0.1947, -0.3094, -0.2251,  0.3534,  0.0668],
        [ 0.1090, -0.3298, -0.2322, -0.1177,  0.0553, -0.3111, -0.1523, -0.2117],
        [ 0.0010, -0.1316, -0.0245, -0.2396, -0.2427, -0.2063, -0.1210, -0.2791],
        [ 0.2964, -0.0702,  0.3042,  0.1102, -0.2994,  0.2447, -0.0973, -0.1355],
        [-0.2935, -0.3515,  0.1012, -0.0772,  0.1376, -0.2901,  0.2625, -0.2595],
        [-0.0610,  0.0738,  0.1825,  0.2854,  0.3221, -0.2803,  0.0890, -0.1521],
        [-0.0387, -0.2646,  0.3220, -0.2595,  0.1890,  0.1243,  0.1149, -0.1911]],
   

In [38]:
# Define the loss function and optimizer (same as previous example)
criterion_quadratic = nn.MSELoss()
optimizer_quadratic = optim.SGD(model_quadratic.parameters(), lr=0.01)

print("Loss function: MSE")
print("Optimizer: SGD with learning rate = 0.01")

Loss function: MSE
Optimizer: SGD with learning rate = 0.01


In [39]:
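# Calculate the Mean Squared Error (MSE) on the training dataset
# Note: in this run the cell was executed before the training loop below,
# so the values shown reflect the untrained model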
model_quadratic.eval()
with torch.no_grad():
    predictions = model_quadratic(xs)
    mse = criterion_quadratic(predictions, ys)

print(f"\n{'='*50}")
print(f"Mean Squared Error (MSE) on training dataset:")
print(f"{'='*50}")
print(f"MSE: {mse.item():.6f}")
print(f"{'='*50}")

# Show predictions vs actual values
print(f"\nPredictions vs Actual Values:")
print(f"{'-'*50}")
print(f"{'X':>8} {'Predicted Y':>12} {'Actual Y':>12} {'Error':>10}")
print(f"{'-'*50}")
for i in range(len(xs)):
    x_val = xs[i].item()
    pred_val = predictions[i].item()
    actual_val = ys[i].item()
    error = abs(pred_val - actual_val)
    print(f"{x_val:>8.1f} {pred_val:>12.4f} {actual_val:>12.1f} {error:>10.4f}")
print(f"{'-'*50}")


Mean Squared Error (MSE) on training dataset:
MSE: 85.081932

Predictions vs Actual Values:
--------------------------------------------------
       X  Predicted Y     Actual Y      Error
--------------------------------------------------
    -4.0      -0.4049         16.0    16.4049
    -3.0      -0.4141          9.0     9.4141
    -2.0      -0.4271          4.0     4.4271
    -1.0      -0.4442          1.0     1.4442
     0.0      -0.4641          0.0     0.4641
     1.0      -0.4840          1.0     1.4840
     2.0      -0.5014          4.0     4.5014
     3.0      -0.5151          9.0     9.5151
     4.0      -0.5254         16.0    16.5254
--------------------------------------------------


In [40]:
# Make a prediction for x = 2.5 (executed before the training loop in this run)
model_quadratic.eval()
with torch.no_grad():
    test_input = torch.tensor([2.5]).view(-1, 1).to(device)
    predicted_value = model_quadratic(test_input)

expected_value = 2.5 ** 2  # 6.25

print(f"\n{'='*50}")
print(f"Prediction for x = 2.5:")
print(f"{'='*50}")
print(f"Predicted value: {predicted_value.item():.4f}")
print(f"Expected value:  {expected_value:.4f}")
print(f"Difference:      {abs(predicted_value.item() - expected_value):.4f}")
print(f"{'='*50}")


Prediction for x = 2.5:
Predicted value: -0.5087
Expected value:  6.2500
Difference:      6.7587


In [41]:
# Training loop for 10000 epochs
num_epochs = 10000

# Set model to training mode
model_quadratic.train()

# Track loss history
loss_history = []

for epoch in range(num_epochs):
    # Forward pass
    outputs = model_quadratic(xs)
    
    # Compute the loss
    loss = criterion_quadratic(outputs, ys)
    
    # Zero the gradients, perform a backward pass, and update the weights
    optimizer_quadratic.zero_grad()
    loss.backward()
    optimizer_quadratic.step()
    
    # Store loss
    loss_history.append(loss.item())
    
    # Print progress every 1000 epochs
    if (epoch + 1) % 1000 == 0:
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.6f}")

print("\nTraining completed!")

Epoch [1000/10000], Loss: 0.296147
Epoch [2000/10000], Loss: 0.104834
Epoch [3000/10000], Loss: 0.039529
Epoch [4000/10000], Loss: 0.016563
Epoch [5000/10000], Loss: 0.007931
Epoch [6000/10000], Loss: 0.004488
Epoch [7000/10000], Loss: 0.002975
Epoch [8000/10000], Loss: 0.002211
Epoch [9000/10000], Loss: 0.001760
Epoch [10000/10000], Loss: 0.001454

Training completed!
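In the run recorded above, the MSE and the prediction for x = 2.5 were computed before this training loop, so they describe the untrained model. The following is a minimal, self-contained sketch of measuring both before and after training. It runs on the CPU, uses an arbitrary seed, and shortens training to 2000 epochs to stay quick; the names `net`, `evaluate`, `xs_q`, and `ys_q` are illustrative, and the numbers it prints will differ from those in the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # arbitrary seed (assumption), for repeatable initialization

# Same data as the activity: x in {-4, ..., 4}, y = x^2
xs_q = torch.arange(-4.0, 5.0).view(-1, 1)
ys_q = xs_q ** 2

N_demo = 8
net = nn.Sequential(
    nn.Linear(1, N_demo), nn.Sigmoid(),
    nn.Linear(N_demo, N_demo), nn.Sigmoid(),
    nn.Linear(N_demo, 1),
)
criterion_q = nn.MSELoss()
optimizer_q = optim.SGD(net.parameters(), lr=0.01)

def evaluate(model):
    # Returns (training MSE, prediction for x = 2.5) without tracking gradients
    model.eval()
    with torch.no_grad():
        mse_value = criterion_q(model(xs_q), ys_q).item()
        pred_2_5 = model(torch.tensor([[2.5]])).item()
    return mse_value, pred_2_5

mse_before, pred_before = evaluate(net)

net.train()
for _ in range(2000):  # shortened from 10000 to keep the sketch quick
    optimizer_q.zero_grad()
    loss_q = criterion_q(net(xs_q), ys_q)
    loss_q.backward()
    optimizer_q.step()

mse_after, pred_after = evaluate(net)

print(f"Before training: MSE = {mse_before:.4f}, prediction for 2.5 = {pred_before:.4f}")
print(f"After training:  MSE = {mse_after:.4f}, prediction for 2.5 = {pred_after:.4f}")
print("Expected value for 2.5:", 2.5 ** 2)
```

In the notebook itself, re-running the MSE and prediction cells after the training cell gives the corresponding post-training values for `model_quadratic`.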


# References

This code is inspired by the notebook "The Hello World of Neural Networks" from the TensorFlow Specialization by DeepLearning.AI. Concepts and implementations have been adapted for PyTorch.