# 3 Linear Regression - Curve Fitting (TensorFlow)
Two methods are used to fit a curve in this tutorial, using [TensorFlow](https://www.tensorflow.org/):
- Direct solution using the least-squares method, the same method used in the previous NumPy tutorial
- Iterative optimisation using stochastic gradient descent

## 3.1 Data
First, we sample $n$ noisy observations from an underlying polynomial defined by the weights $w$ (highest power first), adding Gaussian noise to each target value:

In [None]:
import random
import numpy as np

# get ground-truth data from the "true" model 
n = 100 
w = [4, 3, 2, 1]
x = np.linspace(-1, 1, n)[:, np.newaxis]  # inputs, shape [n, 1]
# evaluate the polynomial: the power matrix has columns x^(len(w)-1), ..., x^0
t = np.matmul(np.power(x, np.linspace(len(w)-1, 0, len(w))), w)
std_noise = 0.2
t_observed = np.reshape(
    [t[idx]+random.gauss(0,std_noise) for idx in range(n)],
    [-1,1])
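
As a cross-check on the construction above, the same noise-free targets can be obtained with `np.vander` and `np.polyval`, which both use the highest-power-first ordering. This is a minimal sketch in plain NumPy; the variable names mirror the cell above.

```python
import numpy as np

n = 100
w = [4, 3, 2, 1]
x = np.linspace(-1, 1, n)

# design matrix with columns x^3, x^2, x^1, x^0 (decreasing powers)
X = np.vander(x, len(w))
t = X @ np.asarray(w, dtype=float)

# np.polyval expects coefficients in the same highest-power-first order
t_polyval = np.polyval(w, x)
print(X.shape, t.shape, np.allclose(t, t_polyval))
```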

## 3.2 Computation Graph and Session
[Graphs and sessions](https://www.tensorflow.org/guide/graphs) are central features of TensorFlow. In the simplest terms, a graph is built first to specify what the computations are; sessions are then constructed to specify which computations to run, for example, with what data and in what order. To facilitate feeding data, [*placeholders*](https://www.tensorflow.org/api_docs/python/tf/placeholder) are used. The two model-fitting methods that follow give two examples of how these are used in practice. Note that `tf.placeholder` and `tf.Session` belong to the TensorFlow 1.x API; in TensorFlow 2.x they are available under `tf.compat.v1`.

First, we build a computation graph using "tf functions":

In [None]:
import tensorflow as tf


# placeholders are for feeding data at runtime
ph_x = tf.placeholder(tf.float32, [n, 1])
ph_t = tf.placeholder(tf.float32, [n, 1])

deg = 3
node_X = tf.pow(ph_x, tf.linspace(tf.to_float(deg),0,deg+1))

The above is a very simple computation graph that computes the powers of $x$ making up the polynomial, using TensorFlow functions. It can be built without any real data, and no computation has taken place yet.

Then we construct a session and call its `run` method to evaluate the node *node_X*, which actually runs the computation and returns the result.

In [None]:
# build a session
sess = tf.Session()  

# set an example data feed
dataFeed = {ph_x:x} 

# run the session to evaluate node_X (the matrix of powers of x)
X = sess.run(node_X, feed_dict=dataFeed)
print(X[:n:10,])

sess.close()
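
The build-then-run pattern can be mimicked in a few lines of plain Python, which may help to see why no data is needed while the graph is being defined. This is only a toy sketch: the `Placeholder` and `Node` classes and the `run` function are hypothetical illustrations and do not reflect how TensorFlow is implemented.

```python
import numpy as np

class Placeholder:
    """Hypothetical graph input that has no value until run time."""
    def __init__(self, name):
        self.name = name

class Node:
    """Hypothetical deferred operation: stores a function and its inputs."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

def run(node, feed_dict):
    """Evaluate a node recursively, reading placeholder values from feed_dict."""
    if isinstance(node, Placeholder):
        return feed_dict[node]
    if isinstance(node, Node):
        args = [run(i, feed_dict) for i in node.inputs]
        return node.fn(*args)
    return node  # anything else is treated as a constant

# build phase: only the structure is recorded, nothing is computed
ph_x = Placeholder("x")
deg = 3
node_X = Node(np.power, ph_x, np.linspace(deg, 0, deg + 1))

# run phase: data is fed in and the node is evaluated
x = np.linspace(-1, 1, 5)[:, np.newaxis]
X = run(node_X, {ph_x: x})
print(X.shape)
```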

## 3.3 Least-Squares Solution
This is mathematically the same method used in the previous NumPy tutorial. The advantage of using TensorFlow here is not particularly obvious.

In [None]:
# complete the computation graph with the least-squares solution
node_w = tf.matrix_solve_ls(node_X, ph_t)

# run the session to evaluate the node weights
sess = tf.Session()  
dataFeed = {ph_x:x, ph_t:t_observed}  # feed data
w_lstsq = sess.run(node_w, feed_dict=dataFeed)
print(w_lstsq)

sess.close()
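
For comparison, the same least-squares problem can be solved in plain NumPy. The sketch below regenerates the data with a fixed seed (an assumption made here for reproducibility, not part of the tutorial) and solves it twice: with `np.linalg.lstsq` and through the normal equations $(X^\top X)w = X^\top t$. Both should agree to numerical precision and land close to the true weights.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility
n, deg = 100, 3
w_true = np.array([4.0, 3.0, 2.0, 1.0])
x = np.linspace(-1, 1, n)
X = np.vander(x, deg + 1)  # columns x^3, x^2, x^1, x^0
t_observed = X @ w_true + rng.normal(0.0, 0.2, size=n)

# least-squares solution from NumPy's solver
w_lstsq, *_ = np.linalg.lstsq(X, t_observed, rcond=None)

# the same solution from the normal equations (X^T X) w = X^T t
w_normal = np.linalg.solve(X.T @ X, X.T @ t_observed)

print(w_lstsq)
print(w_normal)
```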

## 3.4 Stochastic Gradient Descent Method
Instead of least squares, the weights can be optimised by minimising a loss function between the predicted and observed target values, using [SGD](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). It is not an efficient method for this curve-fitting problem and is included only to demonstrate how an iterative method can be implemented in TensorFlow.

In [None]:
# extend the default graph with placeholders for a single data point
ph_1x = tf.placeholder(tf.float32, [1, 1])
ph_1t = tf.placeholder(tf.float32, [1, 1])

deg = 3
node_X = tf.pow(ph_1x, tf.linspace(tf.to_float(deg),0,deg+1))

# first declare variables that need optimisation
var_w = tf.get_variable('weights', shape=[deg+1,1], 
                        initializer=tf.random_normal_initializer(0, 1e-3))

# complete the computation graph for SGD: predict the target for one sample
node_1t = tf.matmul(node_X, var_w)
# build a squared-error loss
loss = tf.reduce_mean(tf.square(node_1t-ph_1t))
# build a train-op to minimise the loss
train_op = tf.train.GradientDescentOptimizer(learning_rate=1e-1).minimize(loss)

# launch a session
sess = tf.Session()  
sess.run(tf.global_variables_initializer())  # initialise all the variables

# iterate to update the variables with backpropagated gradients
total_iter = int(1e4)
indices_train = list(range(n))
for step in range(total_iter):

    idx = step % n
    if idx == 0:  # shuffle every epoch
        random.shuffle(indices_train)
    
    # single data point feed
    singleDataFeed = {
        ph_1x:x[indices_train[idx],np.newaxis], 
        ph_1t:t_observed[indices_train[idx],np.newaxis] }
    
    # update the variables
    sess.run(train_op, feed_dict=singleDataFeed)
    
    # print training information
    if (step % 200) == 0:
        loss_train = sess.run(loss, feed_dict=singleDataFeed)
        print('Step %d: Loss=%f' % (step, loss_train))
    if (step % 2000) == 0:
        w_sgd = sess.run(var_w)
        print('Estimated weights:')
        print(w_sgd)

w_sgd = sess.run(var_w)
print('Final weights at step %d:' % step)
print(w_sgd)
sess.close()
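
To make explicit what the train-op does at each step, the loop can be written out by hand in NumPy. For a single sample with feature row $x_i$ and target $t_i$, the squared loss is $(x_i^\top w - t_i)^2$ and its gradient with respect to $w$ is $2(x_i^\top w - t_i)\,x_i$. The sketch below is not the TensorFlow implementation; it uses the same learning rate, iteration count and per-epoch shuffling as the cell above, with a fixed seed assumed for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility
n, deg = 100, 3
w_true = np.array([4.0, 3.0, 2.0, 1.0])
x = np.linspace(-1, 1, n)
X = np.vander(x, deg + 1)  # columns x^3, x^2, x^1, x^0
t_observed = X @ w_true + rng.normal(0.0, 0.2, size=n)

w = rng.normal(0.0, 1e-3, size=deg + 1)  # small random initial weights
learning_rate = 1e-1
total_iter = int(1e4)
indices = np.arange(n)

for step in range(total_iter):
    idx = step % n
    if idx == 0:  # shuffle every epoch
        rng.shuffle(indices)
    xi = X[indices[idx]]
    ti = t_observed[indices[idx]]
    residual = xi @ w - ti        # prediction error for this sample
    grad = 2.0 * residual * xi    # gradient of the squared error w.r.t. w
    w -= learning_rate * grad     # gradient-descent update

print(w)
```

Because the step size is constant, the weights keep fluctuating slightly around the least-squares solution rather than settling exactly on it.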

## Questions
- Try other optimisation hyperparameters, such as a different optimiser, learning rate, or number of iterations.
- Try adding regularisers and different loss functions.
- Would batch gradient descent or minibatch gradient descent improve the optimisation?
- Would higher-degree models be more prone to overfitting?
