# Simulated Linear Regression

## Abstract

To show what TensorFlow can do, this short demo makes up some phony data following a known rule, and then fits a line to it using Linear Regression. In the end, we expect TensorFlow to recover parameters close to the ones used to make up the phony data.

Linear Regression is a Machine Learning algorithm that models the relationship between a dependent variable and one or more independent variables.

## Introduction

This tutorial is taken, with slight modification and different annotations, from [TensorFlow's official documentation](https://www.tensorflow.org/versions/r0.11/get_started/index.html) and [Professor Jordi Torres' _First Contact with TensorFlow_](http://www.jorditorres.org/first-contact-with-tensorflow/).

This tutorial is intended for readers who are new to both machine learning and TensorFlow. 

## Data Preparation

Let's start by creating 1000 phony (x, y) data points using NumPy. NumPy is an extension to the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on them.

In particular, we will take advantage of the _numpy.random.normal()_ function, which draws random samples from a normal (Gaussian) distribution, also called the bell curve because of its characteristic shape. The normal distribution occurs often in nature. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution.
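
The claim about many tiny disturbances can be checked with a short simulation. The sketch below is an illustration rather than part of the original tutorial: it uses only Python's standard library, and the helper name `sum_of_disturbances`, the choice of 50 disturbances per sample, and the fixed seed are all assumptions made for the example. For simplicity, every disturbance here shares the same uniform distribution.

```python
import random
import statistics

def sum_of_disturbances(rng, n_disturbances=50):
    # Each disturbance is uniform on [-1, 1], which is flat rather than bell-shaped.
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_disturbances))

rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
samples = [sum_of_disturbances(rng) for _ in range(5000)]

mean = statistics.mean(samples)
std = statistics.pstdev(samples)

# Fraction of samples that fall within one standard deviation of the mean.
within_one_std = sum(abs(s - mean) <= std for s in samples) / len(samples)
print(mean, std, within_one_std)
```

If the sums are approximately normal, the printed fraction should be close to 0.68, the share of a normal distribution that lies within one standard deviation of its mean.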

The rule that our phony data points will follow is:

_y = x * 0.1 + 0.3_

To this, we will add an "error" drawn from a normal distribution with mean 0.0 and standard deviation 0.03.

In [13]:
import numpy as np

num_points = 1000
vectors_set = []
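for i in range(num_points):
    # x is drawn from a normal distribution with mean 0.0 and standard deviation 0.55
    x1 = np.random.normal(0.0, 0.55)
    # y follows the rule y = x * 0.1 + 0.3, plus a small normally distributed error
    y1 = x1 * 0.1 + 0.3 + np.random.normal(0.0, 0.03)
    vectors_set.append([x1, y1])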

x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]

## Data Analysis

Linear Regression models can be represented with just two parameters: _W_ (the slope) and _b_ (the y-intercept).

We want TensorFlow to find the parameters _W_ and _b_ that best describe the underlying rule relating the input x_data to the output y_data.

First, let's define two _Variable_ ops: one for the slope and one for the y-intercept.

In [14]:
import tensorflow as tf

In [15]:
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))

Then, let's use two other ops to describe the relationship between x_data, _W_ and _b_, that is, the linear function (first-degree polynomial).

In [16]:
y = tf.add(tf.mul(x_data, W), b) # W * x_data + b

In order to find the best _W_ and _b_, we need to minimize the mean squared error between the predicted _y_ and the actual y_data. We accomplish this with a Gradient Descent Optimizer, here with a learning rate of 0.5.

In [17]:
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
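
To make concrete what the optimizer does on each training step, here is a minimal sketch of the same idea in plain Python, without TensorFlow. It is not how TensorFlow works internally: the gradients are derived by hand for this specific loss, and the names `xs`, `ys`, `w`, `learning_rate`, and `mse` are hypothetical. The data are regenerated with `random.gauss` and a fixed seed so that the block runs on its own.

```python
import random

rng = random.Random(42)  # fixed seed (an assumption) so the run is repeatable

# Phony data following y = x * 0.1 + 0.3 plus a small normal error.
num_points = 1000
xs = [rng.gauss(0.0, 0.55) for _ in range(num_points)]
ys = [x * 0.1 + 0.3 + rng.gauss(0.0, 0.03) for x in xs]

def mse(w, b):
    # Mean squared error between the predictions w * x + b and the targets.
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / num_points

w = rng.uniform(-1.0, 1.0)  # random initial slope
b = 0.0                     # initial intercept
learning_rate = 0.5

initial_loss = mse(w, b)
for step in range(200):
    # Residuals of the current line on every data point.
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    # Hand-derived gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / num_points
    grad_b = 2.0 * sum(errors) / num_points
    # Move each parameter against its gradient, scaled by the learning rate.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(w, b, initial_loss, final_loss)
```

The loss should drop sharply from its initial value, and the fitted slope and intercept should end up near 0.1 and 0.3.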

Before starting, we need an op that initializes the variables. This is the first thing we will 'run'.

In [18]:
init = tf.initialize_all_variables()

Then, we launch the graph.

In [19]:
sess = tf.Session()
sess.run(init)

Now, fit the line. To do this, let's run the training op 200 times; each run is one pass (epoch) over the training data.

In [20]:
for step in range(200):
    sess.run(train)

Finally, let's see whether TensorFlow learned that the best fit is near W: [0.1], b: [0.3]. We do not expect an exact match because, in our example, the "phony" input data contained some noise: the "error".

In [21]:
print(sess.run(W), sess.run(b))

[ 0.0989345] [ 0.30105537]
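
As a cross-check, simple linear regression also has a closed-form least-squares solution, so values found by gradient descent can be compared against it. The sketch below is an addition, not part of the original notebook: it regenerates comparable phony data with the standard library (the seed and the names `w_exact` and `b_exact` are assumptions) and computes the slope as the covariance of x and y divided by the variance of x.

```python
import random

rng = random.Random(7)  # fixed seed (an assumption) so the run is repeatable

# Phony data following y = x * 0.1 + 0.3 plus a small normal error.
num_points = 1000
xs = [rng.gauss(0.0, 0.55) for _ in range(num_points)]
ys = [x * 0.1 + 0.3 + rng.gauss(0.0, 0.03) for x in xs]

mean_x = sum(xs) / num_points
mean_y = sum(ys) / num_points

# Covariance of x and y, and variance of x, over the whole data set.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / num_points
var_x = sum((x - mean_x) ** 2 for x in xs) / num_points

# Least-squares slope and intercept.
w_exact = cov_xy / var_x
b_exact = mean_y - w_exact * mean_x
print(w_exact, b_exact)
```

Both estimates should land close to 0.1 and 0.3, though not exactly on them, because the data contain noise.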
