# Gradient Descent Using Different Python Libraries

The Taylor series expansion of a function $f$ about a point $a$ is defined as:

\begin{equation}

f(x) = \sum^{\infty}_{k=0}{\frac {f^{(k)}(a)}{k!}(x-a)^k}

\end{equation}

The Taylor series expansion of the sine function at $a=0$ is:

\begin{equation}

\sin(x) = \sum^{\infty}_{k=0}{\frac{(-1)^{k}}{(2k+1)!} x^{2k+1}} = \frac{x^1}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots

\end{equation}
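
As a quick numerical check of the series above, the following minimal sketch evaluates a partial sum and compares it with `math.sin`. The helper name `taylor_sin` and the default of four terms are illustrative assumptions, not part of the original notebook.

```python
import math

def taylor_sin(x, n_terms=4):
    """Partial sum of the Taylor series of sin(x) at a = 0 (hypothetical helper)."""
    return sum(
        (-1) ** k / math.factorial(2 * k + 1) * x ** (2 * k + 1)
        for k in range(n_terms)
    )

# Compare the truncated series with the library sine at a few points
for x in (0.5, 1.0, math.pi / 2):
    approx = taylor_sin(x, n_terms=4)
    print(f"x={x:.4f}  series={approx:.6f}  math.sin={math.sin(x):.6f}")
```

The approximation degrades as $|x|$ grows, so more terms are needed near $\pm\pi$ than near $0$.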

Suppose we cannot compute the Taylor coefficients $\frac {(-1)^k}{(2k+1)!}$ directly. If we replace them in the equation with unknown weights $w_{k}$, we can still approximate them using the gradient descent method.

\begin{equation}
\hat{y} = \sum^{N}_{k=0}{w_{k}x^{2k+1}} = w_0 x^1 + w_1 x^3 + w_2 x^5 + w_3 x^7 + \cdots + w_N x^{2N+1}
\end{equation}
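
The model above is linear in the weights, so it can be written as a matrix of odd-power features multiplied by a weight vector. The sketch below illustrates this with NumPy. The helper `odd_power_features` and the small input grid are assumptions made for illustration. Plugging in the known Taylor coefficients as weights should reproduce $\sin(x)$ closely on $[-1, 1]$.

```python
import math
import numpy as np

def odd_power_features(x, n_terms=4):
    """Stack x^1, x^3, x^5, ... as columns (hypothetical helper)."""
    exponents = 2 * np.arange(n_terms) + 1
    return x[:, None] ** exponents  # shape: (len(x), n_terms)

x = np.linspace(-1.0, 1.0, 5)
features = odd_power_features(x, n_terms=4)

# Use the known Taylor coefficients as the weights w_k
w_taylor = np.array([(-1) ** k / math.factorial(2 * k + 1) for k in range(4)])
y_hat = features @ w_taylor

# Largest deviation from the true sine on this grid
print(np.max(np.abs(y_hat - np.sin(x))))
```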

<center><img src="img/TaylorSeriesNeuralNetwork.png"></center>

Use the squared error ($L_2$ loss) as the loss function, defined as:

\begin{align}
\text{Loss function } L = (\hat{y} - y)^2
\end{align}

By the chain rule, the gradient of the loss with respect to a weight $w_i$ is:

\begin{align}
\frac{\partial L}{\partial w_i} &= 2(\hat{y} - y) \frac{\partial \hat{y}}{\partial w_i} = 2(\hat{y} - y) x^{2i+1} \\
\frac{\partial L}{\partial w_0} &= 2(\hat{y} - y) x \\
\frac{\partial L}{\partial w_1} &= 2(\hat{y} - y) x^3 \\
\frac{\partial L}{\partial w_2} &= 2(\hat{y} - y) x^5 \\
\frac{\partial L}{\partial w_3} &= 2(\hat{y} - y) x^7
\end{align}

The code cells below average the loss and the gradients over all samples. They also fit a full degree-7 polynomial (all powers $x^0$ to $x^7$), so the gradient for each weight $w_k$ there is $2(\hat{y} - y) x^k$ averaged over the samples.
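
As a sanity check on the gradient of the mean squared error used in the code cells below, the following sketch compares the analytic gradient with a central finite difference. The small grid, the random seed, the odd-power model and the step size `eps` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(x)
exponents = np.array([1, 3, 5, 7])
features = x[:, None] ** exponents  # columns: x, x^3, x^5, x^7
w = rng.random(4)

def mse(weights):
    """Mean squared error of the odd-power model for the given weights."""
    return np.mean((features @ weights - y) ** 2)

# Analytic gradient: mean over samples of 2 * (y_hat - y) * x^(2i+1)
error = features @ w - y
analytic = (2 * error[:, None] * features).mean(axis=0)

# Central finite differences, one weight at a time
eps = 1e-6
basis = np.eye(4)
numeric = np.array([
    (mse(w + eps * basis[i]) - mse(w - eps * basis[i])) / (2 * eps)
    for i in range(4)
])

print(np.max(np.abs(analytic - numeric)))
```

If the two gradients agree to within rounding error, the hand-derived formula is consistent with the loss that the code actually computes.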

In [None]:
# Code adapted from https://pytorch.org/tutorials/beginner/pytorch_with_examples.html# 

import numpy as np

N_SAMPLES = 4000
LEARNING_RATE = 1e-6

# Create evenly spaced inputs on [-pi, pi] and the matching sine targets
x = np.linspace(-np.pi, np.pi, N_SAMPLES)
y = np.sin(x)

# Randomly initialize weights
w = np.random.random(8)

for t in range(8000):

    y_pred = w[0] + w[1]*x + w[2]*(x**2) + w[3]*(x**3) + w[4]*(x**4) + w[5]*(x**5) + w[6]*(x**6) + w[7]*(x**7)

    error = y_pred - y
    loss = (error ** 2).mean()
    gradients = np.array((
        2 * error,
        2 * error * x,
        2 * error * (x**2),
        2 * error * (x**3),
        2 * error * (x**4),
        2 * error * (x**5),
        2 * error * (x**6),
        2 * error * (x**7)
    )).mean(-1)
    w -= gradients * LEARNING_RATE
    
    if t % 200 == 199:
        print(f"Loss at {t+1:> 5} iteration: {loss:> 5.3f}")

In [None]:
import torch

dtype = torch.float
# Fall back to the CPU when no CUDA device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

N_SAMPLES = 4000
LEARNING_RATE = 1e-6

x = torch.linspace(-torch.pi, torch.pi, N_SAMPLES, device=device, dtype=dtype)
y = torch.sin(x)
w = torch.randn(8, device = device, dtype = dtype)

for t in range(8000):
    y_pred = w[0] + w[1]*x + w[2]*x**2 + w[3]*x**3 + w[4]*x**4 + w[5]*x**5 + w[6]*x**6 + w[7]*x**7

    error = y_pred - y
    loss = error.pow(2).mean().item()

    # Stack the per-sample gradient terms into an (8, N_SAMPLES) tensor and
    # average over the samples. torch.tensor() cannot build a tensor from a
    # tuple of multi-element tensors, so torch.stack() is used instead.
    gradients = torch.stack((
        2 * error,
        2 * error * x,
        2 * error * (x**2),
        2 * error * (x**3),
        2 * error * (x**4),
        2 * error * (x**5),
        2 * error * (x**6),
        2 * error * (x**7))).mean(-1)

    w -= gradients * LEARNING_RATE
    
    if t % 200 == 199:
        print(f"Loss at {t+1:> 5} iteration: {loss:> 5.3f}")


In [None]:
# -*- coding: utf-8 -*-
import torch

dtype = torch.float
# Fall back to the CPU when no CUDA device is available
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.set_default_device(device)

N_SAMPLES = 4000
LEARNING_RATE = 1e-6

x = torch.linspace(-torch.pi, torch.pi, N_SAMPLES, dtype=dtype)
y = torch.sin(x)
w = torch.randn(8, device = device, dtype = dtype, requires_grad = True)

for t in range(8000):
    y_pred = w[0] + w[1]*x + w[2]*x**2 + w[3]*x**3 + w[4]*x**4 + w[5]*x**5 + w[6]*x**6 + w[7]*x**7

    error = y_pred - y
    loss = error.pow(2).mean()

    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        w -= w.grad * LEARNING_RATE

        # Manually zero the gradients after updating weights, because
        # backward() accumulates into w.grad
        w.grad.zero_()
        
    if t % 200 == 199:
        print(f"Loss at {t+1:> 5} iteration: {loss.item():> 5.3f}")
