# Gradient Descent Using Different Python Libraries

A polynomial expansion of a function $f$ is defined as:

\begin{equation}

f(x) = \sum^{N}_{k=0}{w_k x^k} = w_0 + w_1x + w_2x^2 + w_3x^3 + \dots + w_Nx^N

\end{equation}
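As a concrete instance of the expansion above, the sketch below evaluates a degree-7 polynomial whose weights are the Taylor coefficients of $\sin(x)$ about 0. This is only an illustration: the array name `taylor_w` is made up here, and gradient descent minimizes the squared error over the whole interval, so the weights it finds need not equal these Taylor coefficients.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Taylor coefficients of sin(x) about 0, ordered w_0 ... w_7 (illustrative choice)
taylor_w = np.array([0.0, 1.0, 0.0, -1 / 6, 0.0, 1 / 120, 0.0, -1 / 5040])

x = np.linspace(-np.pi, np.pi, 4000)

# polyval expects coefficients in increasing order and evaluates sum_k w_k x^k
approx = P.polyval(x, taylor_w)

max_abs_error = np.max(np.abs(approx - np.sin(x)))
print(f"max |f(x) - sin(x)| on [-pi, pi]: {max_abs_error:.4f}")
```

The error is largest near the ends of the interval, where the truncated terms matter most.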

In this demonstration, we estimate the coefficients $w_k$ of a degree-7 polynomial approximation of $\sin(x)$ using gradient descent. Each weight $w_k$ has reached its optimal value when

\begin{equation}
\frac{\partial L}{\partial w_k} \approx 0
\end{equation}

Here the loss function $L$ is the $L_2$ (squared-error) loss of our approximation, defined as

\begin{equation}
L = (\hat{y} - y)^2 \quad \text{$\hat{y}$: polynomial approximation $f(x)$, $y$: true value of $\sin(x)$}
\end{equation}

Taking the partial derivative of $L$ with respect to each weight gives the gradient (the code below averages it over all samples):
\begin{equation}
\frac{\partial L}{\partial w_k} = \frac{\partial (\hat{y}-y)^2}{\partial w_k} = 2(\hat{y}-y)x^k
\end{equation}
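One way to sanity-check the gradient formula above is to compare it with a central finite-difference estimate of the mean squared error. The sketch below does this for a small cubic polynomial; the helper names (`predict`, `loss`, `analytic_grad`, `numeric_grad`) and the step size `eps` are choices made for this illustration, not part of the original notebook.

```python
import numpy as np

def predict(w, x):
    # Columns of `powers` are x^0, x^1, ..., x^(len(w)-1)
    powers = x[:, None] ** np.arange(len(w))
    return powers @ w

def loss(w, x, y):
    # Mean of the squared error (y_hat - y)^2 over all samples
    return np.mean((predict(w, x) - y) ** 2)

def analytic_grad(w, x, y):
    # dL/dw_k = 2 (y_hat - y) x^k, averaged over samples
    error = predict(w, x) - y
    powers = x[:, None] ** np.arange(len(w))
    return 2 * (error[:, None] * powers).mean(axis=0)

def numeric_grad(w, x, y, eps=1e-5):
    # Central difference: (L(w + eps e_k) - L(w - eps e_k)) / (2 eps)
    grad = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = eps
        grad[k] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return grad

x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)
w = np.random.default_rng(0).random(4)

g_analytic = analytic_grad(w, x, y)
g_numeric = numeric_grad(w, x, y)
print(np.allclose(g_analytic, g_numeric, rtol=1e-5, atol=1e-6))
```

Because the loss is quadratic in the weights, the central difference should agree with the formula up to floating-point rounding.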

<center><image src="img/TaylorSeriesNeuralNetwork.png"></center>

In [11]:
# Code adapted from https://pytorch.org/tutorials/beginner/pytorch_with_examples.html# 

import numpy as np

N_SAMPLES = 4000
LEARNING_RATE = 1e-6

# Create evenly spaced inputs and their target outputs
x = np.linspace(-np.pi, np.pi, N_SAMPLES)
y = np.sin(x)

# Randomly initialize weights
np.random.seed(0)
w = np.random.random(8)

for t in range(4000):

    y_pred = w[0] + w[1]*x + w[2]*(x**2) + w[3]*(x**3) + w[4]*(x**4) + w[5]*(x**5) + w[6]*(x**6) + w[7]*(x**7)

    error = y_pred - y
    loss = (error ** 2).mean()
    gradients = np.array((
        2 * error,
        2 * error * x,
        2 * error * (x**2),
        2 * error * (x**3),
        2 * error * (x**4),
        2 * error * (x**5),
        2 * error * (x**6),
        2 * error * (x**7)
    )).mean(-1)
    w -= gradients * LEARNING_RATE
    
    if t % 200 == 199:
        print(f"Loss at {t+1:> 5} iteration: {loss:> 5.3f}")

Loss at   200 iteration:  98.422
Loss at   400 iteration:  84.372
Loss at   600 iteration:  72.518
Loss at   800 iteration:  62.510
Loss at  1000 iteration:  54.054
Loss at  1200 iteration:  46.905
Loss at  1400 iteration:  40.854
Loss at  1600 iteration:  35.728
Loss at  1800 iteration:  31.380
Loss at  2000 iteration:  27.688
Loss at  2200 iteration:  24.547
Loss at  2400 iteration:  21.871
Loss at  2600 iteration:  19.586
Loss at  2800 iteration:  17.632
Loss at  3000 iteration:  15.956
Loss at  3200 iteration:  14.516
Loss at  3400 iteration:  13.273
Loss at  3600 iteration:  12.199
Loss at  3800 iteration:  11.266
Loss at  4000 iteration:  10.453
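The eight hand-written gradient terms in the NumPy cell can also be expressed with a single matrix of powers. The following is a minimal sketch of that idea, assuming `np.vander` with `increasing=True` to build the columns $x^0, \dots, x^7$; the function name `fit_polynomial` and its parameters are hypothetical. It draws its initial weights from `np.random.default_rng`, so its loss values will not match the numbers printed above.

```python
import numpy as np

def fit_polynomial(x, y, degree=7, learning_rate=1e-6, steps=4000, seed=0):
    """Fit polynomial weights by gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    w = rng.random(degree + 1)

    # X[i, k] = x_i ** k, so X @ w evaluates the polynomial at every sample
    X = np.vander(x, degree + 1, increasing=True)

    losses = []
    for _ in range(steps):
        error = X @ w - y
        losses.append(np.mean(error ** 2))
        # Mean over samples of 2 * error * x^k, for all k at once
        gradients = 2 * (X.T @ error) / len(x)
        w -= learning_rate * gradients
    return w, losses

x = np.linspace(-np.pi, np.pi, 4000)
y = np.sin(x)
w, losses = fit_polynomial(x, y)
print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Changing `degree` then only changes the width of `X`, without any new hand-written terms.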


In [12]:
import torch

dtype = torch.float
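# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")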

N_SAMPLES = 4000
LEARNING_RATE = 1e-6

x = torch.linspace(-torch.pi, torch.pi, N_SAMPLES, device=device, dtype=dtype)
y = torch.sin(x)
np.random.seed(0)  # note: seeds NumPy only; torch.randn below needs torch.manual_seed(0) to be reproducible
w = torch.randn(8, device = device, dtype = dtype)

for t in range(4000):
    y_pred = w[0] + w[1]*x + w[2]*x**2 + w[3]*x**3 + w[4]*x**4 + w[5]*x**5 + w[6]*x**6 + w[7]*x**7

    error = y_pred - y
    loss = error.pow(2).mean().item()

    gradients = torch.empty_like(w, device = device)
    gradients[0] = (2 * error).mean(-1)
    gradients[1] = (2 * error * x).mean(-1)
    gradients[2] = (2 * error * (x**2)).mean(-1)
    gradients[3] = (2 * error * (x**3)).mean(-1)
    gradients[4] = (2 * error * (x**4)).mean(-1)
    gradients[5] = (2 * error * (x**5)).mean(-1)
    gradients[6] = (2 * error * (x**6)).mean(-1)
    gradients[7] = (2 * error * (x**7)).mean(-1)

    # Note: building the gradients in one call with
    #     torch.tensor((2 * error, 2 * error * x, ...), device = device).mean(-1)
    # raises a ValueError (only one-element tensors can be converted to Python
    # scalars), because torch.tensor() cannot build a tensor from a tuple of
    # multi-element tensors. torch.stack((2 * error, 2 * error * x, ...)).mean(-1)
    # is one alternative that keeps the computation on the device.

    w -= gradients * LEARNING_RATE
    
    if t % 200 == 199:
        print(f"Loss at {t+1:> 5} iteration: {loss:> 5.3f}")


Loss at   200 iteration:  175.401
Loss at   400 iteration:  159.912
Loss at   600 iteration:  146.534
Loss at   800 iteration:  134.941
Loss at  1000 iteration:  124.859
Loss at  1200 iteration:  116.056
Loss at  1400 iteration:  108.339
Loss at  1600 iteration:  101.542
Loss at  1800 iteration:  95.529
Loss at  2000 iteration:  90.184
Loss at  2200 iteration:  85.409
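The per-element gradient assignments in the PyTorch cell can be collapsed by stacking the powers of `x` once with `torch.stack`, which accepts a sequence of equally shaped tensors. The sketch below is one way to do that under a few assumptions: it falls back to the CPU when CUDA is unavailable, seeds PyTorch with `torch.manual_seed`, and runs fewer iterations, so its loss values are not expected to match the output above. The names `powers` and `losses` are introduced here for illustration.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

N_SAMPLES = 4000
LEARNING_RATE = 1e-6

x = torch.linspace(-torch.pi, torch.pi, N_SAMPLES, device=device)
y = torch.sin(x)
w = torch.randn(8, device=device)

# powers[k] holds x**k; resulting shape is (8, N_SAMPLES)
powers = torch.stack([x ** k for k in range(8)])

losses = []
for t in range(1000):
    # (8,) @ (8, N_SAMPLES) -> (N_SAMPLES,): the polynomial at every sample
    error = w @ powers - y
    losses.append(error.pow(2).mean().item())

    # Broadcast error across the 8 rows, then average over samples
    gradients = (2 * error * powers).mean(-1)
    w -= gradients * LEARNING_RATE

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Since `powers` does not change between iterations, computing it once outside the loop also avoids repeating the exponentiation at every step.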


In [None]:
import torch

dtype = torch.float
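# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"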
torch.set_default_device(device)

N_SAMPLES = 4000
LEARNING_RATE = 1e-6

x = torch.linspace(-torch.pi, torch.pi, N_SAMPLES, dtype=dtype)
y = torch.sin(x)
torch.manual_seed(0)
w = torch.randn(8, device = device, dtype = dtype, requires_grad = True)

for t in range(4000):
    y_pred = w[0] + w[1]*x + w[2]*x**2 + w[3]*x**3 + w[4]*x**4 + w[5]*x**5 + w[6]*x**6 + w[7]*x**7

    error = y_pred - y
    loss = error.pow(2).mean()

    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        w -= w.grad * LEARNING_RATE

        # Manually zero the gradients after updating weights
        w.grad.zero_()
        
    if t % 200 == 199:
        print(f"Loss at {t+1:> 5} iteration: {loss:> 5.3f}")


Loss at   200 iteration:  86.842
Loss at   400 iteration:  77.675
Loss at   600 iteration:  69.847
Loss at   800 iteration:  63.148
Loss at  1000 iteration:  57.402
Loss at  1200 iteration:  52.459
Loss at  1400 iteration:  48.195
Loss at  1600 iteration:  44.505
Loss at  1800 iteration:  41.300
Loss at  2000 iteration:  38.506
Loss at  2200 iteration:  36.060
Loss at  2400 iteration:  33.910
Loss at  2600 iteration:  32.012
Loss at  2800 iteration:  30.327
Loss at  3000 iteration:  28.824
Loss at  3200 iteration:  27.478
Loss at  3400 iteration:  26.265
Loss at  3600 iteration:  25.167
Loss at  3800 iteration:  24.167
Loss at  4000 iteration:  23.253
Loss at  4200 iteration:  22.413
Loss at  4400 iteration:  21.636
Loss at  4600 iteration:  20.916
Loss at  4800 iteration:  20.245
Loss at  5000 iteration:  19.618
Loss at  5200 iteration:  19.028
Loss at  5400 iteration:  18.473
Loss at  5600 iteration:  17.948
Loss at  5800 iteration:  17.450
Loss at  6000 iteration:  16.977
Loss at  6