# Gradient Descent Using Different Python Libraries

The Taylor series expansion of a function $f$ around a point $a$, truncated after the term of order $N$, is:

\begin{equation}
f(x) \approx \sum^N_{k=0}{\frac {f^{(k)}(a)}{k!}(x-a)^k}
\end{equation}

The Taylor series expansion of the sine function at $a=0$ is:

\begin{equation}
\sin(x) \approx \sum^{N}_{k=0}{\frac{(-1)^{k}}{(2k+1)!} x^{2k+1}} = \frac{x^1}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots
\end{equation}
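To make the expansion concrete, here is a minimal sketch that evaluates the partial sum numerically and compares it with `np.sin`. The function name `taylor_sin` and the choice of test points are illustrative assumptions, not part of the original notebook.

```python
import math
import numpy as np

def taylor_sin(x, n_terms):
    """Partial sum of the Taylor series of sin(x) around a = 0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for k in range(n_terms):
        # k-th term: (-1)^k / (2k+1)! * x^(2k+1)
        total += (-1) ** k / math.factorial(2 * k + 1) * x ** (2 * k + 1)
    return total

# Compare the partial sums with np.sin on a few points in [-pi, pi]
x = np.linspace(-np.pi, np.pi, 9)
for n_terms in (2, 4, 8):
    max_err = np.max(np.abs(taylor_sin(x, n_terms) - np.sin(x)))
    print(n_terms, max_err)
```

The largest error should shrink as more terms are included, with the worst case at the ends of the interval.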

Suppose we cannot compute the Taylor coefficients $\frac {(-1)^k}{(2k+1)!}$ analytically. If we replace them in the equation with unknown weights $w_{k}$, we can still estimate the weights using gradient descent.

\begin{equation}
\hat{y} = \sum^{N}_{k=0}{w_{k}x^{2k+1}} = w_0 x^1 + w_1 x^3 + w_2 x^5 + w_3 x^7 + \cdots
\end{equation}
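The model above is linear in the weights, so it can be written as a matrix-vector product. The sketch below is one possible way to express it. The names `odd_power_features`, `predict`, and `taylor_w` are assumptions introduced for illustration only.

```python
import numpy as np

def odd_power_features(x, n_weights):
    """Return a matrix whose k-th column is x^(2k+1) (hypothetical helper)."""
    x = np.asarray(x, dtype=float).reshape(-1)
    exponents = 2 * np.arange(n_weights) + 1  # 1, 3, 5, ...
    return x[:, None] ** exponents[None, :]

def predict(w, x):
    """Evaluate y_hat = sum_k w_k * x^(2k+1) for every sample in x."""
    return odd_power_features(x, len(w)) @ w

# With the weights set to the first four Taylor coefficients,
# the model reproduces the truncated sine series.
taylor_w = np.array([1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0])
x = np.linspace(-1.0, 1.0, 5)
print(np.max(np.abs(predict(taylor_w, x) - np.sin(x))))
```

Writing the model this way makes it clear that gradient descent only has to find the vector `w`; the features themselves are fixed.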

<center><image src="img/TaylorSeriesNeuralNetwork.png"></center>

We use the $L_2$ loss, defined for a single sample as:

\begin{align}
\text{Loss function } L = \sqrt{(\hat{y} - y)^2} = |\hat{y} - y|
\end{align}

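The gradient of the loss with respect to a weight $w_i$ follows from the chain rule, using $\frac{\partial \hat{y}}{\partial w_i} = x^{2i+1}$:

\begin{align}
\frac{\partial L}{\partial w_i} &= \frac {1}{2} [(\hat{y} - y)^2]^{-\frac {1}{2}} \, 2(\hat{y}-y) \frac{\partial \hat{y}}{\partial w_i} = \frac{\hat{y}-y}{|\hat{y}-y|} \frac{\partial \hat{y}}{\partial w_i} = \text{sign}(\hat{y}-y) \, x^{2i+1} \\
\frac{\partial L}{\partial w_0} &= \text{sign}(\hat{y}-y) \, x \\
\frac{\partial L}{\partial w_1} &= \text{sign}(\hat{y}-y) \, x^3 \\
\frac{\partial L}{\partial w_2} &= \text{sign}(\hat{y}-y) \, x^5 \\
\frac{\partial L}{\partial w_3} &= \text{sign}(\hat{y}-y) \, x^7
\end{align}

The code cells below minimise the squared error $(\hat{y}-y)^2$ instead, averaged or summed over the samples. For that loss, the gradient with respect to the weight of $x^k$ is $2(\hat{y}-y)x^k$.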
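As a sanity check on the chain-rule step, the sketch below compares a finite-difference gradient of $L = \sqrt{(\hat{y}-y)^2}$ with the closed form $\text{sign}(\hat{y}-y)\,x^{2i+1}$ that the chain rule yields. The helper names, the weights, and the sample point are made up for illustration.

```python
import numpy as np

def loss(w, x, y):
    """L = sqrt((y_hat - y)^2) for a single sample, with y_hat = sum_k w_k x^(2k+1)."""
    exponents = 2 * np.arange(len(w)) + 1
    y_hat = np.sum(w * x ** exponents)
    return np.sqrt((y_hat - y) ** 2)

def analytic_grad(w, x, y):
    """Chain-rule gradient: sign(y_hat - y) * x^(2i+1)."""
    exponents = 2 * np.arange(len(w)) + 1
    y_hat = np.sum(w * x ** exponents)
    return np.sign(y_hat - y) * x ** exponents

def numeric_grad(w, x, y, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return grad

# Arbitrary weights and one sample point, chosen so that y_hat != y
w = np.array([0.5, 0.1, -0.2, 0.05])
x = 1.3
y = np.sin(x)
print(analytic_grad(w, x, y))
print(numeric_grad(w, x, y))
```

The two printed vectors should agree closely as long as $\hat{y} \neq y$; at $\hat{y} = y$ the loss has a kink and the gradient is undefined.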

In [10]:
# Code adapted from https://pytorch.org/tutorials/beginner/pytorch_with_examples.html# 

import numpy as np

EPOCHES = 2000  # number of gradient descent steps
N_SAMPLES = 4000
LEARNING_RATE = 1e-6

# Create evenly spaced inputs on [-pi, pi] and the target outputs
x = np.linspace(-np.pi, np.pi, N_SAMPLES)
y = np.sin(x)

# Randomly initialize the eight weights of a degree-7 polynomial
w = np.random.random(8)

for t in range(EPOCHES):

    # Forward pass: evaluate the polynomial at every sample
    y_pred = w[0] + w[1]*x + w[2]*(x**2) + w[3]*(x**3) + w[4]*(x**4) + w[5]*(x**5) + w[6]*(x**6) + w[7]*(x**7)

    # Mean squared error over all samples
    error = y_pred - y
    loss = (error ** 2).mean()

    # Gradient of the loss with respect to each weight, averaged over the samples
    gradients = np.array((
        2 * error,
        2 * error * x,
        2 * error * (x**2),
        2 * error * (x**3),
        2 * error * (x**4),
        2 * error * (x**5),
        2 * error * (x**6),
        2 * error * (x**7)
    )).mean(-1)

    # Gradient descent step
    w -= gradients * LEARNING_RATE
    if t % 100 == 99:
        print("loss: ", loss)

loss:  9.34317819718455e+213
loss:  inf


  ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)
  loss = (error ** 2).mean()
  ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)


loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan
loss:  nan


In [None]:
import torch

dtype = torch.float
# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

N_SAMPLES = 4000
LEARNING_RATE = 1e-6
EPOCHES = 2000  # number of gradient descent steps

# Create evenly spaced inputs on [-pi, pi] and the target outputs
x = torch.linspace(-torch.pi, torch.pi, N_SAMPLES, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize the four weights of the cubic
# y = w[0] + w[1] x + w[2] x^2 + w[3] x^3
w = torch.randn(4, device=device, dtype=dtype)

for t in range(EPOCHES):
    # Forward pass: compute predicted y
    y_pred = w[0] + w[1]*x + w[2]*x**2 + w[3]*x**3

    # Compute and print loss (sum of squared errors)
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of the loss with respect to w[0], ..., w[3]
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update each weight with its own gradient
    w[0] -= LEARNING_RATE * grad_a
    w[1] -= LEARNING_RATE * grad_b
    w[2] -= LEARNING_RATE * grad_c
    w[3] -= LEARNING_RATE * grad_d


print(f'Result: y = {w[0].item()} + {w[1].item()} x + {w[2].item()} x^2 + {w[3].item()} x^3')

In [60]:
# -*- coding: utf-8 -*-
import numpy as np
import math

# Create random input and output data
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# Randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    # y = a + b x + c x^2 + d x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')

99 456.5595922183849
199 315.4104678017535
299 219.0011175503335
399 153.07271311223187
499 107.93648186418463
599 76.99960154421294
699 55.7708589044827
799 41.187234405293395
899 31.157349632267845
999 24.251606046242276
1099 19.49165216381009
1199 16.20717963733107
1299 13.938406765232727
1399 12.369598078338889
1499 11.283690531878376
1599 10.531289710703202
1699 10.009460658061258
1799 9.647202354268897
1899 9.395488211489502
1999 9.220429484460588
Result: y = -0.0188733346205499 + 0.8477607747912781 x + 0.0032559642756885367 x^2 + -0.09205305252653492 x^3
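
For reference, the cubic that minimises the same sum of squared errors can also be obtained in closed form and compared with the gradient descent result above. The sketch below uses `numpy.polynomial.polynomial.polyfit` for this. It is an illustrative check, not part of the adapted tutorial, and the `_ls` variable names are assumptions.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Same data as in the cell above
x = np.linspace(-np.pi, np.pi, 2000)
y = np.sin(x)

# Least-squares fit of a cubic; coefficients are returned from lowest to highest degree
a_ls, b_ls, c_ls, d_ls = P.polyfit(x, y, 3)

print(f'Least squares: y = {a_ls} + {b_ls} x + {c_ls} x^2 + {d_ls} x^3')
```

Because the sine function is odd and the sample points are symmetric around zero, the constant and quadratic coefficients of the least-squares fit should be close to zero. Any remaining gap between these coefficients and the gradient descent result indicates how far the iteration is from convergence.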
