# 3 Curve Fitting (TensorFlow)
This tutorial fits a curve in two ways, using [TensorFlow](https://www.tensorflow.org/):
- Direct solution using the least-squares method, the same method used in the previous NumPy tutorial
- Iterative optimisation using stochastic gradient descent

## 3.1 Data
First, we sample $n$ noisy observations from the underlying polynomial defined by the weights $w$:

In [9]:
import random
import numpy as np

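# get ground-truth data from the "true" model
n = 100             # number of observations
w = [4, 3, 2, 1]    # true weights, highest power first
deg = len(w)-1
x = np.linspace(-1,1,n)[:,np.newaxis]  # inputs, shape (n,1)
# noise-free targets: the columns of the power matrix are x**deg, ..., x**0
t = np.matmul(np.power(x, np.linspace(deg,0,deg+1)), w)
std_noise = 0.2
# add independent Gaussian noise to each target, shape (n,1)
t_observed = np.reshape(
    [t[idx]+random.gauss(0,std_noise) for idx in range(n)],
    [-1,1])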
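
Written out, the code above draws each target from a cubic polynomial with additive Gaussian noise, where the weights are ordered from the highest power down to the constant term:

$$
t_i = \sum_{j=0}^{d} w_j\, x_i^{\,d-j} + \epsilon_i = 4x_i^3 + 3x_i^2 + 2x_i + 1 + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0,\, 0.2^2)
$$

Here $d$ stands for `deg` and $\epsilon_i$ for the noise sample; both symbols are introduced for illustration and do not appear in the code.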

## 3.2 Least-Squares Solution
This is mathematically the same method used in the previous NumPy tutorial. The advantage of using TensorFlow here is not particularly obvious.

In [20]:
import tensorflow as tf

X = tf.pow(x, tf.linspace(deg,0,deg+1))

w_lstsq = tf.linalg.lstsq(X, t_observed)
print(w_lstsq)

tf.Tensor(
[[4.05248568]
 [2.95396288]
 [1.956563  ]
 [1.00741483]], shape=(4, 1), dtype=float64)
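
As a cross-check that does not depend on TensorFlow, the least-squares weights can also be obtained from the normal equations $X^\top X\, w = X^\top t$. The following is a minimal NumPy sketch under stated assumptions: the data are regenerated with a fixed seed, so the numbers will not match the output above, and the names `w_true`, `w_normal` and `w_ref` are illustrative rather than taken from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed: an assumption, for repeatability
n, deg = 100, 3
w_true = np.array([4.0, 3.0, 2.0, 1.0])   # same "true" weights as in 3.1

# design matrix with columns x**3, x**2, x**1, x**0
x = np.linspace(-1, 1, n)[:, np.newaxis]
X = np.power(x, np.arange(deg, -1, -1))
t_observed = X @ w_true + rng.normal(0, 0.2, size=n)

# solve the normal equations (X^T X) w = X^T t directly
w_normal = np.linalg.solve(X.T @ X, X.T @ t_observed)

# reference solution from NumPy's least-squares routine
w_ref, *_ = np.linalg.lstsq(X, t_observed, rcond=None)

print(w_normal)
print(w_ref)
```

The two printed vectors should agree to within floating-point error. Solving the normal equations is fine for a small, well-conditioned problem like this one; for ill-conditioned design matrices a routine such as `np.linalg.lstsq` is generally the safer choice.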


## 3.3 Stochastic Gradient Descent Method
Instead of solving the least-squares problem directly, the weights can be optimised iteratively by minimising a loss function between the predicted and observed target values, using [SGD](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). This is not an efficient method for this curve-fitting problem; it is included only to demonstrate how an iterative method can be implemented in TensorFlow.

In [79]:
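# weights to optimise, initialised to zeros
w_sgd = tf.Variable(initial_value=tf.zeros([deg+1,1],tf.float64))
polynomial = lambda x_input : tf.matmul(x_input, w_sgd)

optimizer = tf.optimizers.SGD(5e-3)  # learning rate 5e-3
total_iter = int(2e4)
for step in range(total_iter):
    index = step % n  # cycle through the samples, one per step
    
    with tf.GradientTape() as g:
        # squared error of the single selected sample
        loss = tf.reduce_mean((polynomial(X[None,index,:])-t_observed[index])**2)
        
    gradients = g.gradient(loss, [w_sgd])
    optimizer.apply_gradients(zip(gradients, [w_sgd]))
    
    if (step%1000)==0:
        # single-sample loss, not the loss over all data; because 1000 is a
        # multiple of n, the printed value is always the loss of sample 0
        print('Step %d: Loss=%f' % (step, loss))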

print(w_sgd)

Step 0: Loss=4.267816
Step 1000: Loss=0.037259
Step 2000: Loss=0.042375
Step 3000: Loss=0.133096
Step 4000: Loss=0.154306
Step 5000: Loss=0.136392
Step 6000: Loss=0.108063
Step 7000: Loss=0.081463
Step 8000: Loss=0.060006
Step 9000: Loss=0.043758
Step 10000: Loss=0.031807
Step 11000: Loss=0.023136
Step 12000: Loss=0.016880
Step 13000: Loss=0.012372
Step 14000: Loss=0.009123
Step 15000: Loss=0.006774
Step 16000: Loss=0.005071
Step 17000: Loss=0.003832
Step 18000: Loss=0.002925
Step 19000: Loss=0.002258
<tf.Variable 'Variable:0' shape=(4, 1) dtype=float64, numpy=
array([[3.99621521],
       [2.96221558],
       [1.99494511],
       [1.00508842]])>
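
To make explicit what the gradient tape and optimiser do at each step, the same loop can be sketched in NumPy with the gradient written by hand. For a single sample the loss is $(\mathbf{x}_i^\top w - t_i)^2$, its gradient is $2(\mathbf{x}_i^\top w - t_i)\,\mathbf{x}_i$, and plain SGD without momentum updates $w \leftarrow w - \eta \nabla$. This is a sketch under stated assumptions, not the tutorial's implementation: the data are regenerated with a fixed seed, and the names `w_manual`, `learning_rate` and `w_ref` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed: an assumption, for repeatability
n, deg = 100, 3
w_true = np.array([4.0, 3.0, 2.0, 1.0])

# design matrix with columns x**3, x**2, x**1, x**0
x = np.linspace(-1, 1, n)[:, np.newaxis]
X = np.power(x, np.arange(deg, -1, -1))
t_observed = X @ w_true + rng.normal(0, 0.2, size=n)

w_manual = np.zeros(deg + 1)
learning_rate = 5e-3             # same step size as the TensorFlow cell
for step in range(int(2e4)):
    i = step % n                                 # cycle through the samples
    residual = X[i] @ w_manual - t_observed[i]   # prediction error of sample i
    grad = 2.0 * residual * X[i]                 # gradient of the squared error
    w_manual -= learning_rate * grad             # plain SGD update

# compare against the direct least-squares solution on the same data
w_ref, *_ = np.linalg.lstsq(X, t_observed, rcond=None)
mse_manual = np.mean((X @ w_manual - t_observed) ** 2)
mse_ref = np.mean((X @ w_ref - t_observed) ** 2)
print(w_manual, mse_manual)
print(w_ref, mse_ref)
```

The mean squared error over all data is a more stable progress measure than the single-sample loss, and the least-squares value `mse_ref` is the lowest it can reach for this model.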


## Questions
- Try other optimisation hyperparameters, such as a different optimiser, learning rate, or number of iterations.
- Try adding regularisers and different loss functions.
- Would batch gradient descent or minibatch gradient descent improve the optimisation?
- Would higher-degree models be more prone to overfitting?
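
As one possible starting point for the minibatch question, the sketch below averages the loss over a random batch of samples at each step and applies the update by hand with `assign_sub`. It is an illustration under stated assumptions rather than a recommended configuration: the batch size, learning rate and iteration count are arbitrary choices, the data are regenerated with a fixed seed, and the names `w_mb`, `batch_size` and `learning_rate` are hypothetical.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)   # fixed seed: an assumption, for repeatability
n, deg = 100, 3
w_true = np.array([[4.0], [3.0], [2.0], [1.0]])

# design matrix with columns x**3, x**2, x**1, x**0 and noisy targets, float64
x = np.linspace(-1, 1, n)[:, np.newaxis]
X = tf.constant(np.power(x, np.arange(deg, -1, -1)))
t_observed = tf.constant(X.numpy() @ w_true + rng.normal(0, 0.2, size=(n, 1)))

w_mb = tf.Variable(tf.zeros([deg + 1, 1], tf.float64))
learning_rate, batch_size, total_iter = 5e-2, 20, 5000   # arbitrary choices

for step in range(total_iter):
    # draw a random minibatch of distinct sample indices
    idx = rng.choice(n, size=batch_size, replace=False)
    X_batch = tf.gather(X, idx)
    t_batch = tf.gather(t_observed, idx)

    with tf.GradientTape() as g:
        # mean squared error over the minibatch
        loss = tf.reduce_mean((tf.matmul(X_batch, w_mb) - t_batch) ** 2)

    grad = g.gradient(loss, w_mb)
    w_mb.assign_sub(learning_rate * grad)   # plain gradient step, no momentum

print(w_mb.numpy().ravel())
```

Setting `batch_size = n` turns this into batch gradient descent, and `batch_size = 1` recovers per-sample SGD with random rather than cyclic sampling, so the same loop can be used to compare all three.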
