# Learning a Polynomial Function with SGD & ReLU

This notebook uses two core machine learning ideas, **Stochastic Gradient Descent (SGD)** and the **ReLU** activation function, to approximate a polynomial function. Instead of being handed the equation, a small neural network sees only a set of (*x*, *y*) data points and learns the relationship between *x* and *y* from them.
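
As a quick illustration of the two ideas named above, here is a minimal pure-Python sketch. The helper names `relu` and `sgd_step` and the toy loss $(w - 4)^2$ are hypothetical and chosen only for this example. The toy uses the exact gradient, so it shows the update rule only; in SGD proper, the gradient is estimated from a random subset of the data.

```python
def relu(z):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)


def sgd_step(w, grad, lr):
    """One gradient descent update: move the parameter against its gradient."""
    return w - lr * grad


# Hypothetical toy problem: minimize loss(w) = (w - 4)^2, whose gradient is 2 * (w - 4)
w = 0.0
for _ in range(50):
    grad = 2.0 * (w - 4.0)
    w = sgd_step(w, grad, lr=0.1)

print(relu(-2.5), relu(1.5))  # negative input is clamped, positive input is unchanged
print(round(w, 3))            # w has moved close to the minimizer at 4
```

The training code later in this notebook applies the same update rule, except that PyTorch computes the gradients and updates every weight of the network at once.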

Our "true" function is $y = 2x^3 - x^2 - 5x + 3$. The goal is for our model to learn this shape without ever being told the true coefficients (2, -1, -5, 3).
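
To get a feel for the target, the short sketch below evaluates the polynomial at a few points in $[-3, 3]$, the interval the training data is drawn from. The sample points are picked only for illustration.

```python
def true_func(x):
    # y = 2x^3 - x^2 - 5x + 3
    return 2 * x**3 - x**2 - 5 * x + 3


for x in (-3, -1, 0, 1, 3):
    print(f"x = {x:>2}  ->  y = {true_func(x)}")
```

The endpoints give $y = -45$ and $y = 33$, so the target spans a much wider range than the input. The noise added to the training data has a standard deviation of 3, which is small relative to that range.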

## The Python/PyTorch Code

The code below uses PyTorch to run the whole experiment: it defines the true function, generates noisy data, builds a small neural network, and trains it to map `X` to `y`. Finally, it plots the model's predictions against the data and the true curve to show how well the network learned the underlying function.
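
Before the full training code, it may help to see what a one-hidden-layer ReLU network can represent. With a scalar input, each hidden unit contributes one "kink", so the network's output is piecewise linear. The sketch below is a hand-built example with two hidden units whose weights are chosen by hand (not learned) so that the network computes $|x|$ exactly. The name `tiny_net` is hypothetical.

```python
def relu(z):
    return max(0.0, z)


def tiny_net(x):
    # Hidden layer: two units with hand-picked weights +1 and -1 and zero bias
    h1 = relu(1.0 * x)
    h2 = relu(-1.0 * x)
    # Output layer: sum the two hidden activations (output weights 1 and 1)
    return 1.0 * h1 + 1.0 * h2


for x in (-2.0, -0.5, 0.0, 1.5):
    print(x, tiny_net(x))  # each output equals abs(x)
```

The model in this notebook has 64 hidden units instead of two, so it can place up to 64 kinks and approximate the smooth cubic with many short straight segments.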

In [None]:
# 1. Import Libraries
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# 2. Define the True Function and Generate Data
# This is the function we want our model to learn.
def true_func(x):
    # True parameters: a=2, b=-1, c=-5, d=3
    return 2 * (x**3) - 1 * (x**2) - 5 * x + 3

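# Generate (x, y) data points from the true function
N_SAMPLES = 100
# unsqueeze(1) gives shape (N_SAMPLES, 1): one feature per row, as nn.Linear expects
X = torch.linspace(-3, 3, N_SAMPLES).unsqueeze(1)
# Add Gaussian noise with standard deviation 3
y = true_func(X) + torch.randn(N_SAMPLES, 1) * 3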

# 3. Define the Neural Network Model
class PolynomialApproximator(nn.Module):
    def __init__(self):
        super().__init__()
        # A single input feature (x), and a hidden layer with 64 neurons
        self.layer1 = nn.Linear(1, 64)
        # ReLU activation function introduces non-linearity
        self.activation = nn.ReLU()
        # A single output feature (y)
        self.layer2 = nn.Linear(64, 1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        return x

# 4. Set up the Training Process
model = PolynomialApproximator()
loss_function = nn.MSELoss() # Mean Squared Error is a good choice for regression
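# torch.optim.SGD applies the plain gradient descent update. The loop below feeds
# all samples at once, so each step is full-batch rather than truly stochastic.
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)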

# 5. The Training Loop
epochs = 5000
for epoch in range(epochs):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(X)
    
    # Compute loss
    loss = loss_function(y_pred, y)
    
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print the loss periodically to track progress
    if (epoch + 1) % 500 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

print("\nTraining finished.")

# 6. Visualize the Results
print("Plotting results...")
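# Switch model to evaluation mode (no effect for this model, which has no
# dropout or batch norm layers, but a good habit before inference)
model.eval()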

# Get the model's final predictions on the input data
with torch.no_grad(): # We don't need to track gradients for visualization
    learned_y = model(X)

# Create the plot
plt.figure(figsize=(12, 7))

# Plot the original noisy data points
plt.scatter(X.numpy(), y.numpy(), color='orange', label='True Data (with noise)', s=20, alpha=0.7)

# Plot the true underlying function (without noise)
plt.plot(X.numpy(), true_func(X).numpy(), 'g--', label='True Function', linewidth=2)

# Plot the function our model learned
plt.plot(X.numpy(), learned_y.numpy(), color='blue', label='Learned Function (Model Prediction)', linewidth=3)

# Add titles and labels for clarity
plt.title('Model Performance: Fitting a Cubic Function', fontsize=16)
plt.xlabel('X Value', fontsize=12)
plt.ylabel('Y Value', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()
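
The training loop above passes all 100 samples to the model on every step, so each gradient is exact for the dataset. A stochastic variant estimates the gradient from small shuffled batches instead. The sketch below shows one way to do that with `TensorDataset` and `DataLoader`. The batch size of 16, the 200 epochs, the fixed seed, and the use of `nn.Sequential` in place of the `PolynomialApproximator` class are assumptions made for illustration, not settings from the run above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # assumption: fixed seed so the sketch is repeatable


def true_func(x):
    return 2 * (x**3) - 1 * (x**2) - 5 * x + 3


N_SAMPLES = 100
X = torch.linspace(-3, 3, N_SAMPLES).unsqueeze(1)
y = true_func(X) + torch.randn(N_SAMPLES, 1) * 3

# Same architecture as PolynomialApproximator, written with nn.Sequential
model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)

# Assumed batch size; shuffle=True draws a new random order every epoch
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

# Full-data loss before training, for comparison
with torch.no_grad():
    initial_mse = loss_function(model(X), y).item()

for epoch in range(200):
    for x_batch, y_batch in loader:
        # Each step sees only one mini-batch, so the gradient is a noisy estimate
        loss = loss_function(model(x_batch), y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Full-data loss after training
with torch.no_grad():
    final_mse = loss_function(model(X), y).item()

print(f"Full-data MSE before: {initial_mse:.2f}, after: {final_mse:.2f}")
```

With mini-batches, the model takes several weight updates per pass over the data, at the cost of a noisier gradient on each step. The exact loss values depend on the seed and the settings chosen here.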