<link rel='stylesheet' href='../assets/css/main.css'/>

[<< back to main index](../README.md)

# Linear Regression using TensorFlow

### Overview
Instructor to demo this on screen.
 
### Builds on
None

### Run time
approx. 20-30 minutes

### Notes

We can do linear regression with ordinary least squares (OLS). In this lab we fit the model with TensorFlow's `LinearRegressor` estimator.

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as sm
import matplotlib.pyplot as plt
import tensorflow as tf


## Example : Tips
Here is our tip data: 10 observations of a bill amount and the tip left on it.

| bill | tip | 
|------|-----| 
| 50   | 12  | 
| 30   | 7   | 
| 60   | 13  | 
| 40   | 8   | 
| 65   | 15  | 
| 20   | 5   | 
| 10   | 2   | 
| 15   | 2   | 
| 25   | 3   | 
| 35   | 4   | 

## Step 1: Let's create a Pandas dataframe with the data


In [None]:
tip_data = pd.DataFrame({'bill' : [50.00, 30.00, 60.00, 40.00, 65.00, 20.00, 10.00, 15.00, 25.00, 35.00],
              'tip' : [12.00, 7.00, 13.00, 8.00, 15.00, 5.00, 2.00, 2.00, 3.00, 4.00]
             })

tip_data


## Step 2: Let's do a quick plot of the data

Let us use matplotlib to do a quick scatter plot of the data.

**=>TODO: plot the bill (X-axis), versus the tip (Y-axis)**

In [None]:
plt.scatter(tip_data.bill, tip_data.tip)
plt.ylabel('tip')
plt.xlabel('bill')
plt.show()

## Step 3: Extract the bill and tip values as arrays

In [None]:
x = tip_data['bill'].values
y = tip_data['tip'].values
print('bill = ' + str(x))
print('tip = ' + str(y))


## Step 4: Define Feature Columns

We need to define our feature columns. Here the only feature is the numeric `bill` column.

In [None]:
feature_columns = [ 
    tf.feature_column.numeric_column(key="bill"),
]

## Step 5: Define Optimizer and Model

In [None]:
my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0002)
linear_regressor = tf.estimator.LinearRegressor(feature_columns=feature_columns, optimizer=my_optimizer)
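
To build intuition for what the optimizer does, here is a minimal plain-Python sketch of full-batch gradient descent on mean squared error, using the tip data and the same learning rate of 0.0002. It is an illustration under stated assumptions, not TensorFlow's implementation: the helper names `mse_loss` and `gradient_descent_step`, the zero starting point, and the step count of 100 are choices made for this sketch.

```python
# Illustrative only: hand-rolled gradient descent for tip = w * bill + b.
# This is NOT how TensorFlow implements its optimizer internally.

bills = [50.0, 30.0, 60.0, 40.0, 65.0, 20.0, 10.0, 15.0, 25.0, 35.0]
tips = [12.0, 7.0, 13.0, 8.0, 15.0, 5.0, 2.0, 2.0, 3.0, 4.0]


def mse_loss(w, b):
    """Mean squared error of the line w * bill + b on the tip data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(bills, tips)) / len(bills)


def gradient_descent_step(w, b, learning_rate):
    """Take one full-batch gradient step on the MSE loss (hypothetical helper)."""
    n = len(bills)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(bills, tips)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(bills, tips)) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


w, b = 0.0, 0.0            # assumed starting point
initial_loss = mse_loss(w, b)
for _ in range(100):       # assumed number of steps
    w, b = gradient_descent_step(w, b, learning_rate=0.0002)
final_loss = mse_loss(w, b)

print("loss before: %0.3f, loss after: %0.3f" % (initial_loss, final_loss))
print("w = %0.4f, b = %0.4f" % (w, b))
```

With unscaled bill values, the weight tends to settle within a few steps while the bias moves much more slowly, which is one reason the fitted intercept can remain far from its best value after a short training run.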

In [None]:
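def input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    # Convert the pandas feature columns into a dict of NumPy arrays
    features = {key : np.array(value) for key,value in dict(features).items()}
    
    # Build a dataset of (features, target) pairs
    ds = tf.data.Dataset.from_tensor_slices((features, targets))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(targets))
    ds = ds.batch(batch_size).repeat(num_epochs)
    
    # Return the next batch (TF 1.x one-shot iterator)
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels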

In [None]:
training_input_fn = tf.estimator.inputs.pandas_input_fn(x = tip_data[['bill']],
                                                        y=tip_data['tip'],
                                                        batch_size=1,
                                                        shuffle= True,
                                                        num_epochs = 1)
linear_regressor.train(training_input_fn)

In [None]:
# Train for 20 more steps using the custom input_fn defined above
linear_regressor.train(input_fn=lambda: input_fn(tip_data[['bill']], tip_data['tip']), steps=20)

## Step 6: Create the prediction function

In [None]:
prediction_input_fn = lambda: input_fn(tip_data[['bill']], tip_data['tip'], num_epochs=1, shuffle=False)

In [None]:
predictions = linear_regressor.predict(input_fn=prediction_input_fn)

predictions = np.array( [item['predictions'][0] for item in predictions])

In [None]:
from sklearn.metrics import mean_squared_error
import math

# mean_squared_error expects (y_true, y_pred)
mse = mean_squared_error(tip_data['tip'], predictions)
rmse = math.sqrt(mse)

print("MSE: %0.3f" % mse)
print("RMSE: %0.3f" % rmse)
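
For reference, MSE is the mean of the squared differences between actual and predicted tips, and RMSE is its square root, which puts the error back in the same units as the tip. The sketch below computes both by hand in plain Python. As an assumption for illustration, the predictions come from the slope and intercept estimates listed under Step 10 rather than from a live `linear_regressor`, so the numbers from your own training run will likely differ.

```python
import math

bills = [50.0, 30.0, 60.0, 40.0, 65.0, 20.0, 10.0, 15.0, 25.0, 35.0]
tips = [12.0, 7.0, 13.0, 8.0, 15.0, 5.0, 2.0, 2.0, 3.0, 4.0]

# Assumed for illustration: the estimates reported under Step 10 of this lab
slope = 0.226334605857
intercept = -0.8217112049846651

predicted = [slope * x + intercept for x in bills]

# MSE: average of the squared (actual - predicted) differences
mse = sum((y - p) ** 2 for y, p in zip(tips, predicted)) / len(tips)
# RMSE: square root of MSE, in the same units as the tip
rmse = math.sqrt(mse)

print("MSE: %0.3f" % mse)
print("RMSE: %0.3f" % rmse)
```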

In [None]:
weight = linear_regressor.get_variable_value('linear/linear_model/bill/weights')[0]
bias = linear_regressor.get_variable_value('linear/linear_model/bias_weights')
print(weight)
print(bias)

## Step 7: Plot the fit line (abline)

**=>TODO: Do a scatterplot of bill versus tip**

In [None]:
# Create a list of values in the best fit line
# TODO: Fill in gradient and intercept in formula
abline_values = [??? * i + ??? for i in tip_data.bill]

# Plot the best fit line over the actual values
plt.scatter(tip_data.bill, tip_data.tip)
plt.plot(tip_data.bill, abline_values, 'b')
plt.ylabel('tip')
plt.xlabel('bill')
plt.title("Fit Line")
plt.show()

In [None]:
# Make a prediction using the slope and the intercept
y_pred = (weight * tip_data.bill.values + bias).reshape(10,1)
y_test = tip_data.tip.values.reshape(10,1)




## Step 8: Print out the Outputs

Here is a sample output:

## Step 9: Plot the residuals

Residuals are the errors: the difference between the value the model predicts and the actual observed value. We'd like these to be as small as possible and roughly balanced around zero. We don't want a model that consistently predicts values too high or too low.

**=>TODO: do a plot of the bill (x-value) versus residuals (y-value)**

In [None]:
resid = (??? * tip_data.bill + ???) - tip_data.tip

In [None]:
plt.scatter(tip_data.bill, resid)
plt.axhline(y=0, color='r', linestyle='-')  # horizontal line at zero
plt.ylabel('Residuals')
plt.xlabel('bill')
plt.title("Residuals")
plt.show()
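
One quick numeric check of whether residuals are roughly balanced is to look at the mean residual and at how many residuals fall on each side of zero. The sketch below does this in plain Python. As an assumption, it uses the slope and intercept estimates listed under Step 10, and the variable names are illustrative.

```python
bills = [50.0, 30.0, 60.0, 40.0, 65.0, 20.0, 10.0, 15.0, 25.0, 35.0]
tips = [12.0, 7.0, 13.0, 8.0, 15.0, 5.0, 2.0, 2.0, 3.0, 4.0]

# Assumed for illustration: the estimates reported under Step 10 of this lab
slope = 0.226334605857
intercept = -0.8217112049846651

# Residual = predicted tip - actual tip
residuals = [(slope * x + intercept) - y for x, y in zip(bills, tips)]

mean_residual = sum(residuals) / len(residuals)
n_positive = sum(1 for r in residuals if r > 0)   # model predicted too high
n_negative = sum(1 for r in residuals if r < 0)   # model predicted too low

print("mean residual: %0.6f" % mean_residual)
print("over-predictions: %d, under-predictions: %d" % (n_positive, n_negative))
```

A mean residual close to zero suggests the line is not biased high or low overall, although individual residuals can still be large.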




## Step 10 : Identify Coefficients

### Intercept and Slope
We can read them from the output:

Coefficients:
            Estimate 
(Intercept) -0.8217112049846651
bill        0.226334605857

- **Slope** (of line) : **0.226334605857**
- **Intercept** (where line meets Y-axis) : **-0.8217112049846651**  (below zero line)

We can also get these programmatically.  
If `tip = a * bill + b`, then `a` is the slope (the weight) and `b` is the intercept (the bias).

In [None]:

# Print the coefficients and intercept for linear regression
print("Coefficients: %s" % str(weight))
print("Intercept: %s" % str(bias))

a = weight
b = bias
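
As a cross-check on gradient-descent estimates, simple linear regression also has a closed-form ordinary least squares solution: the slope is the covariance of bill and tip divided by the variance of bill, and the intercept makes the line pass through the point of means. A minimal sketch in plain Python follows, with illustrative variable names. The closed-form values need not match the TensorFlow estimates exactly, since gradient descent with a small learning rate and few steps may stop short of the optimum.

```python
bills = [50.0, 30.0, 60.0, 40.0, 65.0, 20.0, 10.0, 15.0, 25.0, 35.0]
tips = [12.0, 7.0, 13.0, 8.0, 15.0, 5.0, 2.0, 2.0, 3.0, 4.0]

n = len(bills)
mean_bill = sum(bills) / n
mean_tip = sum(tips) / n

# Sums of products and squares of deviations from the means
s_xy = sum((x - mean_bill) * (y - mean_tip) for x, y in zip(bills, tips))
s_xx = sum((x - mean_bill) ** 2 for x in bills)

ols_slope = s_xy / s_xx
ols_intercept = mean_tip - ols_slope * mean_bill

print("OLS slope: %0.6f" % ols_slope)
print("OLS intercept: %0.6f" % ols_intercept)
```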


**==>  Question: Does bill amount influence tip amount? (Are they strongly linked?)**
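
One way to put a number on how strongly two variables are linked is the Pearson correlation coefficient, which ranges from -1 to 1; values near either end indicate a strong linear relationship. A minimal sketch in plain Python, with illustrative variable names:

```python
import math

bills = [50.0, 30.0, 60.0, 40.0, 65.0, 20.0, 10.0, 15.0, 25.0, 35.0]
tips = [12.0, 7.0, 13.0, 8.0, 15.0, 5.0, 2.0, 2.0, 3.0, 4.0]

n = len(bills)
mean_bill = sum(bills) / n
mean_tip = sum(tips) / n

# Sums of products and squares of deviations from the means
s_xy = sum((x - mean_bill) * (y - mean_tip) for x, y in zip(bills, tips))
s_xx = sum((x - mean_bill) ** 2 for x in bills)
s_yy = sum((y - mean_tip) ** 2 for y in tips)

# Pearson correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

print("Pearson r: %0.4f" % r)
```

Correlation measures linear association only; on its own it does not show that a larger bill causes a larger tip.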




### TODO: Calculate the tip for a bill of 100.

In [None]:
tip_for_100 = ??? * 100 + ??? 
print(tip_for_100)


## Step 12: Add an est_tip column to the pandas dataframe

**=>TODO: create a new pandas column called est_tip**

In [None]:
tip_data['est_tip'] =  ???
tip_data

## Step 13: Perform a prediction


Let's try to run a prediction on some data: $45.00, $55.00, and $65.00 


**=>TODO: use model to transform dataframe with feature vectors to make predictions**

In [None]:
test_data_pd = pd.DataFrame({'bill' : [45., 55., 65.,]
             })

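# Wrap the new bills in an input function, then predict with the trained estimator
test_input_fn = tf.estimator.inputs.pandas_input_fn(x=test_data_pd[['bill']],
                                                    shuffle=False,
                                                    num_epochs=1)
test_data_pd['predicted_tip'] = [item['predictions'][0]
                                 for item in linear_regressor.predict(input_fn=test_input_fn)]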



test_data_pd