## Linear Regression 

**Linear Regression** is a statistical method for fitting a line that models the relationship between two variables. This line, represented by a linear equation of the form *y = mx + b*, can be used to predict the value of the dependent variable *y* given a value of the independent variable *x*.

Let us consider the relationship between the *height* and *weight* of a person. We expect height and weight to be correlated: taller people generally (though not always) weigh more than shorter people. We begin with a list of heights and weights of ten (fictitious) people:

In [3]:
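import numpy

# scikit-learn expects the features as a 2-D array of shape (n_samples, n_features),
# so the heights are reshaped into a single column; the targets can stay 1-D.
heights_in_cm = numpy.array([182, 150, 197, 164, 171, 155, 187, 148, 162, 168]).reshape(-1, 1)
weights_in_kg = numpy.array([81, 55, 90, 60, 65, 57, 86, 52, 61, 62])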

Before attempting to construct a model for this data, we should visualize it and form a qualitative understanding of its form:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(heights_in_cm, weights_in_kg)
plt.title('Weight vs. Height')
plt.show()

The data looks roughly linear, so a linear regression should model it reasonably well. We will now perform linear regression using the `LinearRegression` class from the `sklearn` (scikit-learn) package. This will give us a linear model of our data:

In [None]:
from sklearn import linear_model
linear = linear_model.LinearRegression()
linear.fit(heights_in_cm, weights_in_kg)
# coef_ is an array with one entry per feature; we have a single feature, so take element 0
print("Linear model: y = {}x + {}".format(linear.coef_[0], linear.intercept_))

We can graph this line to see how well it fits our data:

In [None]:
predicted_weights_in_kg = linear.predict(heights_in_cm)
plt.plot(heights_in_cm, predicted_weights_in_kg)
plt.scatter(heights_in_cm, weights_in_kg)
plt.title('Weight vs. Height')
plt.show()

The real power of regression lies not in drawing graphs but in making **predictions**.
For example, what is the most likely weight for someone who is 178 cm tall?

In [10]:
# predict() expects a 2-D array of shape (n_samples, n_features), so wrap the single height
predicted_weight = linear.predict([[178]])
print(predicted_weight)


[74.77637849]
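
As a cross-check, the same prediction can be reproduced without `sklearn` by computing the slope *m* and intercept *b* of *y = mx + b* directly from the standard least-squares formulas. This is only a minimal sketch: the variable names `slope`, `intercept`, and `manual_prediction` are introduced here for illustration and do not appear elsewhere in this lesson.

```python
import numpy

# Same (fictitious) data as above; 1-D arrays are enough for the hand calculation.
heights_in_cm = numpy.array([182, 150, 197, 164, 171, 155, 187, 148, 162, 168])
weights_in_kg = numpy.array([81, 55, 90, 60, 65, 57, 86, 52, 61, 62])

mean_height = heights_in_cm.mean()
mean_weight = weights_in_kg.mean()

# Least-squares slope: sum of co-deviations divided by the sum of squared height deviations.
slope = (numpy.sum((heights_in_cm - mean_height) * (weights_in_kg - mean_weight))
         / numpy.sum((heights_in_cm - mean_height) ** 2))
# The fitted line always passes through the point (mean height, mean weight).
intercept = mean_weight - slope * mean_height

manual_prediction = slope * 178 + intercept
print(manual_prediction)  # approximately 74.776, in line with the output above
```

The agreement with the `sklearn` result reflects that, for a single feature, ordinary least squares has this simple closed form.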


### Exercise: Linear Regression

Let's revisit the account balance vs. income dataset from the previous lesson.

In [1]:
bank_account_balance = [100000, 130000, 40000, 50000, 120000, 48000, 50000, 78000, 150000]
avg_income_zip = [80000, 90000, 40000, 45000, 85000, 36000, 32000, 50000, 100000]

Perform a linear regression on this dataset, and plot both the raw data and the fitted line. Use a different color for each one. 

Now, write a function that accepts a parameter `client_zip_avg_income` and returns a single value: the predicted bank account balance of the client, based on the output of the model.
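
One possible shape for such a function is sketched below. To keep the sketch dependency-light it fits the line with `numpy.polyfit` rather than `sklearn`, which is a substitution made here and not the approach used earlier in this lesson; a fitted `LinearRegression` object would serve equally well. The function name `predict_balance` is hypothetical.

```python
import numpy

bank_account_balance = [100000, 130000, 40000, 50000, 120000, 48000, 50000, 78000, 150000]
avg_income_zip = [80000, 90000, 40000, 45000, 85000, 36000, 32000, 50000, 100000]

# A degree-1 polynomial fit is an ordinary least-squares line.
# numpy.polyfit returns the coefficients highest power first: [slope, intercept].
slope, intercept = numpy.polyfit(avg_income_zip, bank_account_balance, 1)

def predict_balance(client_zip_avg_income):
    """Return the predicted account balance for an average ZIP-code income."""
    # Hypothetical helper: evaluates the fitted line y = mx + b as a plain float.
    return float(slope * client_zip_avg_income + intercept)

print(predict_balance(60000))
```

Returning a plain `float` rather than a NumPy array keeps the result easy to use in later arithmetic or comparisons.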

### Exercise: Productionizing Machine Learning

Now that we have a predictive model, our engineers should incorporate that model into our production code. Using the balance prediction function above, write a new higher-level function called `accept_client` with the following traits:

1. It accepts three parameters: `client_zip_avg_income`, `interest_rate`, and `maintenance_cost`
2. It returns `True` if the interest earnings from the predicted account balance exceed the maintenance costs, `False` otherwise. 
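
The two requirements above could be sketched as follows. Several things here are assumptions rather than part of the exercise statement: `interest_rate` is taken to be a fraction (for example `0.01` for 1%), `maintenance_cost` is taken to cover the same period as the interest, and the helper `predict_balance` is a hypothetical name fitted with `numpy.polyfit` in place of `sklearn`.

```python
import numpy

bank_account_balance = [100000, 130000, 40000, 50000, 120000, 48000, 50000, 78000, 150000]
avg_income_zip = [80000, 90000, 40000, 45000, 85000, 36000, 32000, 50000, 100000]

# Least-squares line through the data; polyfit returns [slope, intercept].
slope, intercept = numpy.polyfit(avg_income_zip, bank_account_balance, 1)

def predict_balance(client_zip_avg_income):
    """Hypothetical helper: predicted balance from the fitted line."""
    return float(slope * client_zip_avg_income + intercept)

def accept_client(client_zip_avg_income, interest_rate, maintenance_cost):
    """Return True if predicted interest earnings exceed the maintenance cost."""
    predicted_balance = predict_balance(client_zip_avg_income)
    # Assumption: interest_rate is a fraction and both amounts cover the same period.
    interest_earnings = predicted_balance * interest_rate
    return bool(interest_earnings > maintenance_cost)

print(accept_client(100000, 0.01, 1000))  # True
print(accept_client(40000, 0.01, 1000))   # False
```

Wrapping the comparison in `bool()` ensures the function returns the built-in `True` or `False` that the exercise asks for, even if the intermediate values are NumPy scalars.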