# Gradient Descent Using Different Python Libraries

A Taylor series expansion of a function $f$ is defined as:

\begin{equation}

f(x) = \sum^N_{k=0}{\frac {f^{(k)}(x-a)}{n!}(x-a)^k}

\end{equation}

And the Taylor series expansion of sine function at $a=0$ is:

\begin{equation}

sin(x) = \sum^{N}_{k=0}{\frac{(-1)^{k}}{(2k+1)!} x^{2k+1}} = \frac{x^1}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!}.....

\end{equation}

Suppose we somehow are not able to calculate the Taylor coefficients, which are the $\frac {(-1)^k}{(2k+1)!}$ terms, replacing the coefficients in the equation with unknown weights $w_{k}$, we can still calcualte them using the gradient descent method.

\begin{equation}
\hat{y} = \sum^{N}_{k=0}{w_{k}x^{2k+1}} = w_0 x^1 + w_1 x^3 + w_2 x^5 + w_3 x^7...
\end{equation}

<center><image src="img/TaylorSeriesNeuralNetwork.png"></center>

Use $L_2$ loss as the loss function, defined as:


\begin{align}
\text{Loss function } L = \sqrt{(\hat{y} - y)^2}
\end{align}

The gradient of a weight $w_i$ can be determined by the following formula:

\begin{align}
\frac{\partial L}{\partial w_i} &= \frac {1}{2} [(\hat{y} - y)^2]^{\frac {-1}{2}}  2(\hat{y}-y) \frac{\partial \hat{y}}{\partial w_i} = \frac{\partial \hat{y}}{\partial w_i} \\
\frac{\partial L}{\partial w_0} &= x \\
\frac{\partial L}{\partial w_1} &= x^3 \\
\frac{\partial L}{\partial w_2} &= x^5 \\
\frac{\partial L}{\partial w_3} &= x^7 \\
\end{align}

In [57]:
import numpy as np
from collections import  deque

SAMPLE_COUNT = 6000
x = np.linspace(0, np.pi, SAMPLE_COUNT)
y = np.sin(x)

w = np.random.rand(4)

EPOCHES = 8
LEARNING_RATE = 1e-6
BATCH_SIZE = 1
ITERATIONS = int(SAMPLE_COUNT / BATCH_SIZE)

for epoch in range(EPOCHES):
    for iteration in range(ITERATIONS):
        batch = x[iteration*BATCH_SIZE : (iteration+1)*BATCH_SIZE]

        y_predicted = deque(map(
            lambda sample : (w[0]*sample + w[1]*(sample**3) + w[2]*(sample**5) + w[3]*(sample**7)).sum(),
            batch
        ))

        predict_error = y_predicted - y[iteration*BATCH_SIZE : (iteration+1)*BATCH_SIZE]
        loss = np.square(predict_error).mean()

        gradients = np.array((
            2*predict_error * batch ** 1,
            2*predict_error * batch ** 3,
            2*predict_error * batch ** 5,
            2*predict_error * batch ** 7)).mean()

        w += gradients*LEARNING_RATE

        if iteration % 100 == 0:
            print(f"Iteration {iteration} loss: {loss}")



Iteration 0 loss: 0.0
Iteration 100 loss: 2.6759906724882334e-05
Iteration 200 loss: 0.0001252901113645463
Iteration 300 loss: 0.0003584273820910932
Iteration 400 loss: 0.000859314584315562
Iteration 500 loss: 0.0018797890075463966
Iteration 600 loss: 0.003872087887272713
Iteration 700 loss: 0.007623983495333534
Iteration 800 loss: 0.014483409758998972
Iteration 900 loss: 0.026735221047669927
Iteration 1000 loss: 0.04823659670783413
Iteration 1100 loss: 0.08548941566164665
Iteration 1200 loss: 0.1494442939032218
Iteration 1300 loss: 0.2585171448006268
Iteration 1400 loss: 0.4435923509672864
Iteration 1500 loss: 0.7562404950379013
Iteration 1600 loss: 1.282068265134208
Iteration 1700 loss: 2.1621469362028654
Iteration 1800 loss: 3.626973131183707
Iteration 1900 loss: 6.049586646011822
Iteration 2000 loss: 10.02754817843819
Iteration 2100 loss: 16.50778113638307
Iteration 2200 loss: 26.974215153486252
Iteration 2300 loss: 43.72626134391463
Iteration 2400 loss: 70.28707970403683
Iteration