# Implementation of Linear Regression with TensorFlow

TensorFlow is an industry-grade open-source framework that automates the repetitive work of implementing gradient-based learning algorithms.

In the previous implementation from scratch, we relied only on:
1. Tensors for data storage and linear algebra;
2. Automatic differentiation for calculating gradients.

In practice, because data iterators, loss functions, optimizers,
and neural network layers
are so common, TensorFlow provides optimized implementations of these components for us.

In this section, you will learn how to implement the linear regression model concisely by using the high-level `keras` API in TensorFlow.


## Generating the Dataset

Let us generate a synthetic dataset from a linear model, in the same way as in the previous case study.


In [None]:
import numpy as np
import tensorflow as tf
from dl import tensorflow as dl

In [None]:
true_w = tf.constant([2, -3.4])
true_b = 4.2
features, labels = dl.synthetic_data(true_w, true_b, 1000)
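
The body of `dl.synthetic_data` is not shown in this section. As a rough sketch of what such a generator might look like, the snippet below draws random features and computes noisy labels from a linear model. The function name `synthetic_data_sketch` and the `noise_std` parameter are hypothetical and introduced only for illustration; the actual helper may differ.

```python
import tensorflow as tf

def synthetic_data_sketch(w, b, num_examples, noise_std=0.01):
    """Generate y = Xw + b + noise (hypothetical stand-in, not the dl helper)."""
    # Features drawn from a standard normal distribution
    X = tf.random.normal((num_examples, w.shape[0]))
    # Noise-free labels from the linear model, as a column vector
    y = tf.matmul(X, tf.reshape(w, (-1, 1))) + b
    # Additive Gaussian observation noise
    y += tf.random.normal(y.shape, stddev=noise_std)
    return X, y

X_demo, y_demo = synthetic_data_sketch(tf.constant([2, -3.4]), 4.2, 1000)
print(X_demo.shape, y_demo.shape)
```

Under these assumptions, each row of `X_demo` holds two features and each row of `y_demo` holds the corresponding scalar label.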

## Reading the Dataset

Instead of writing our own iterator (the `data_iter` function),
we call the existing TensorFlow API to read data from tensors and construct a TensorFlow data iterator.


The `load_array` function takes `features` and `labels` as arguments, together with a `batch_size`, and instantiates a data iterator object.

Note that the boolean value `is_train` indicates whether or not
the data iterator object has to shuffle the data on each epoch.



In [None]:
def load_array(data_arrays, batch_size, is_train=True):  #@save
    """Construct a TensorFlow data iterator."""
    dataset = tf.data.Dataset.from_tensor_slices(data_arrays)
    if is_train:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(batch_size)
    return dataset

In [None]:
batch_size = 10
data_iter = load_array((features, labels), batch_size)

Now we can use the `data_iter` variable in much the same way as we called
the `data_iter` function in the previous case study.

To verify that the `data_iter` is working properly, we read and print
the first minibatch of examples.

In [None]:
next(iter(data_iter))

In this code, we use `iter` to construct a Python iterator and use `next` to obtain the first item from the iterator.
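
To make the batching behavior concrete, here is a small sketch on a toy dataset of five examples (the names `features_demo`, `labels_demo`, and `ds` are illustrative only). Shuffling is left out so that the result is deterministic.

```python
import tensorflow as tf

# Five examples with two features each, plus one label per example
features_demo = tf.reshape(tf.range(10, dtype=tf.float32), (5, 2))
labels_demo = tf.range(5, dtype=tf.float32)

# Slice along the first axis, then group consecutive examples into minibatches
ds = tf.data.Dataset.from_tensor_slices((features_demo, labels_demo)).batch(2)

# A dataset can be traversed with a plain for loop, one minibatch at a time
batch_sizes = [int(X.shape[0]) for X, y in ds]
print(batch_sizes)  # [2, 2, 1]
```

Note that the last minibatch is smaller because five examples do not divide evenly into batches of two.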


## Defining the Model

In our implementation of linear regression from scratch
we defined our model parameters explicitly
and coded up the calculations to produce output
using basic linear algebra operations.


It is important that you understand and become familiar with this implementation. Doing it once or twice is rewarding and instructive.

However, in real-world problems models become more complex. For standard operations, we can use TensorFlow's predefined layers,
which allow us to focus on which layers to use rather than on their low-level implementation.

To this end, we use the Keras API in TensorFlow, which provides the necessary classes such as `Sequential`, `Dense`, and `MeanSquaredError`.

Using Keras, we define a model variable `net`,
which refers to an instance of the `Sequential` class. The `Sequential` class chains several layers together: the output of one layer is the input of the next layer, and so forth. Our linear model needs only a single layer.


The single layer of our model is said to be *fully-connected*
because each of its inputs is connected to each of its outputs
by means of a matrix-vector multiplication.

In Keras, a fully-connected layer is defined by the `Dense` class. Its first argument is the number of outputs; since we only want to generate a single scalar output, we set that number to 1.

For convenience, Keras does not require us to specify the input shape for each layer. For this reason, we do not specify how many inputs go into this linear layer. Instead, when we first pass data through our model by calling `net(X)`, Keras automatically infers the number of inputs to each layer.

The following code builds the model concisely.

In [None]:
# `keras` is the high-level API for TensorFlow
net = tf.keras.Sequential()
net.add(tf.keras.layers.Dense(1))
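
As a quick check of the claim that a fully-connected layer is a matrix multiplication plus a bias, the sketch below compares the output of a standalone `Dense(1)` layer with a manual computation. The variable names are illustrative, and the input values are arbitrary.

```python
import tensorflow as tf

dense = tf.keras.layers.Dense(1)
X = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# The first call creates the parameters; the input size (2) is inferred from X
out = dense(X)

# Recompute the output by hand from the layer's weight matrix and bias
kernel, bias = dense.get_weights()
manual = tf.matmul(X, kernel) + bias

max_diff = float(tf.reduce_max(tf.abs(out - manual)))
print(out.shape, kernel.shape, max_diff)
```

The weight matrix has one row per input feature and one column per output, so here it has shape `(2, 1)`.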

## Initializing Model Parameters

To initialize the model parameters, namely the weights and bias in the linear regression model, we specify that each weight parameter
should be randomly sampled from a normal distribution with mean 0 and standard deviation 0.01. In addition, we initialize the bias parameter with zeros.

Deep learning frameworks often have predefined functions for initializing model parameters. TensorFlow has the `initializers` module, which includes various methods for model parameter initialization.

In Keras, the easiest way to specify the initialization method is to pass `kernel_initializer` when creating the layer.

In the following code snippet, we recreate the `net` model with a Keras initializer.

In [None]:
initializer = tf.initializers.RandomNormal(stddev=0.01)
net = tf.keras.Sequential()
net.add(tf.keras.layers.Dense(1, kernel_initializer=initializer))

If you look closely at the code above, you will notice
that we specify the parameter initialization even though Keras does not yet know how many dimensions the input will have!

It might be 2, as in our example, or it might be 2000.
Keras lets us get away without specifying the input dimension because, behind the scenes, the initialization is actually *deferred*: it only takes place
the first time we pass data through the network.

Since the parameters have not been initialized yet, we cannot access or manipulate them at this point.
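
The deferred initialization can be observed on a standalone layer. The following is a minimal sketch, assuming a dummy input with two features; the variable names are illustrative only.

```python
import tensorflow as tf

initializer = tf.keras.initializers.RandomNormal(stddev=0.01)
dense = tf.keras.layers.Dense(1, kernel_initializer=initializer)

# Before any data is passed through, the layer holds no parameters
built_before = dense.built
num_weights_before = len(dense.get_weights())

# The first call triggers initialization with the inferred input size
_ = dense(tf.zeros((1, 2)))
built_after = dense.built
kernel, bias = dense.get_weights()

print(built_before, num_weights_before)
print(built_after, kernel.shape, bias)
```

After the first call, the weights are small random values drawn with standard deviation 0.01, and the bias is zero, which is the default for `Dense`.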

## Defining the Loss Function


In Keras, the `MeanSquaredError` class computes the mean squared error (a.k.a. the squared $L_2$ norm of the prediction error). By default, it returns the average loss over the examples in a minibatch.


In [None]:
loss = tf.keras.losses.MeanSquaredError()
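
As a small worked example with made-up numbers, the sketch below compares the value returned by `MeanSquaredError` with a manual computation. With errors of 0.5 and 0, the squared errors are 0.25 and 0, so the average is 0.125.

```python
import tensorflow as tf

y_true = tf.constant([[1.0], [2.0]])
y_pred = tf.constant([[1.5], [2.0]])

# Keras loss objects are called as loss(y_true, y_pred)
mse = tf.keras.losses.MeanSquaredError()
keras_value = float(mse(y_true, y_pred))

# Manual computation: average of the squared differences
manual_value = float(tf.reduce_mean(tf.square(y_true - y_pred)))

print(keras_value, manual_value)  # 0.125 0.125
```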

## Defining the Optimization Algorithm


In Keras, the `optimizers` module implements minibatch stochastic gradient descent and many of its variants for optimizing neural networks.


Minibatch stochastic gradient descent takes a `learning_rate` argument, which we set to 0.03.


In [None]:
trainer = tf.keras.optimizers.SGD(learning_rate=0.03)
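
To illustrate what a single plain SGD step does, the sketch below applies the update rule by hand to one scalar parameter instead of going through the optimizer object. The toy loss $w^2$ and the starting value are chosen only for illustration.

```python
import tensorflow as tf

learning_rate = 0.03
w = tf.Variable(1.0)

# Toy loss w^2, whose gradient at w = 1 is 2w = 2
with tf.GradientTape() as tape:
    loss_value = w ** 2
grad = tape.gradient(loss_value, w)

# Plain SGD update: w <- w - learning_rate * gradient
w.assign_sub(learning_rate * grad)
print(float(grad), float(w))  # gradient 2.0, updated w is approximately 0.94
```

The optimizer defined above performs this kind of update for every trainable variable when `apply_gradients` is called.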

## Training

Building neural networks with the high-level APIs of a deep learning framework
requires only a few lines of code. The framework takes care of many details for us: allocating parameters, defining loss functions, and implementing minibatch stochastic gradient descent.


All basic components are now defined and initialized. The training loop itself is similar to what we did when implementing the regression from scratch.

For every epoch, we make a complete pass over the dataset (`data_iter`), iteratively grabbing one minibatch of inputs and the corresponding ground-truth labels at a time.


For each minibatch, we perform the following steps:
* Forward propagation
    * Generate predictions by calling `net(X)`.
    * Calculate the loss `l`.

* Backpropagation
    * Calculate gradients by running backpropagation.
    * Update the model parameters by invoking our optimizer.

In the following code, we compute and print the loss after each epoch.

In [None]:
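num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        # Record the forward pass so that gradients can be computed
        with tf.GradientTape() as tape:
            # Keras losses take their arguments in the order (y_true, y_pred)
            l = loss(y, net(X, training=True))
        # Backpropagation: gradients of the loss w.r.t. the parameters
        grads = tape.gradient(l, net.trainable_variables)
        # One minibatch SGD update of the parameters
        trainer.apply_gradients(zip(grads, net.trainable_variables))
    # Evaluate the loss on the full dataset after each epoch
    l = loss(labels, net(features))
    print(f'epoch {epoch + 1}, loss {l:f}')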

The following code compares the model parameters learned by training on finite data and the actual parameters that synthetically generated our dataset.

To do so, we have to access the model's parameters. With `net.get_weights()`, we obtain the weights and the bias of the model's single layer as a list of arrays.

Unsurprisingly, our estimated parameters are close to their ground-truth counterparts. We obtain results similar to those of the from-scratch implementation.


In [None]:
w = net.get_weights()[0]
print('error in estimating w', true_w - tf.reshape(w, true_w.shape))
b = net.get_weights()[1]
print('error in estimating b', true_b - b)
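
The parameters can also be reached through the layer itself rather than through the whole model. The following is a minimal sketch on a fresh, untrained model (named `net_demo` here to avoid clashing with the trained `net`), which is called once on a dummy input so that its parameters exist.

```python
import tensorflow as tf

net_demo = tf.keras.Sequential()
net_demo.add(tf.keras.layers.Dense(1))

# Pass one dummy example with two features to trigger initialization
_ = net_demo(tf.zeros((1, 2)))

# Pick the first (and only) layer, then read its weights and bias
dense = net_demo.layers[0]
w_demo, b_demo = dense.get_weights()
print(w_demo.shape, b_demo.shape)  # (2, 1) (1,)
```

For a model with a single `Dense` layer, this yields the same two arrays as calling `get_weights()` on the model.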

## Summary


* TensorFlow's high-level APIs (Keras) allow us to implement models quickly.
* The `data` module in TensorFlow provides tools for data processing.
* The `keras` module provides a large number of neural network layers and common loss functions.
* TensorFlow's `initializers` module provides various methods for model parameter initialization.
* Dimensionality and storage are automatically inferred.
* Parameters cannot be accessed before they have been initialized.


## Exercises


1. Review the TensorFlow documentation to see what loss functions and initialization methods are provided.
2. Replace the loss with the Huber loss and rerun the model.