# Linear Regression
In this section, we'll use linear regression to predict life expectancy from body mass index (BMI).
Before doing that, let's go over the tools required to build this model.

For our linear regression model, we'll be using scikit-learn's [LinearRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) class. This class provides the function [fit()](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit) to fit the model to our data.

```
>>> from sklearn.linear_model import LinearRegression
>>> model = LinearRegression()
>>> model.fit(x_values, y_values)
```

In the example above, `model` is a linear regression model that has been fitted to the data `x_values` and `y_values`. Fitting the model means finding the line that best fits the training data, which for `LinearRegression` is the line that minimizes the sum of squared differences between the observed and predicted values. Let's make two predictions using the model's [predict()](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.predict) function.

```
>>> print(model.predict([ [127], [248] ]))
[[ 438.94308857, 127.14839521]]
```

The model returned an array of predictions, one prediction for each input array. The first input, `[127]`, got a prediction of `438.94308857`. The second input, `[248]`, got a prediction of `127.14839521`. We predict on an array like `[127]` rather than a bare `127` because a model can make a prediction from multiple features, and each input array holds one value per feature. We'll go over using multiple variables in linear regression later. For now, let's stick to a single value.
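
To make the input-shape requirement concrete, here is a minimal sketch on made-up data (the arrays below are illustrative assumptions, not the values behind the predictions above). It shows that scikit-learn expects a 2-D array of shape `(n_samples, n_features)`, and that a flat array of single-feature values can be converted with NumPy's `reshape(-1, 1)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on y = 2x + 1 (illustrative only).
x_flat = np.array([1.0, 2.0, 3.0, 4.0])
y_values = 2.0 * x_flat + 1.0

# scikit-learn expects X with shape (n_samples, n_features),
# so a single feature becomes a column with one value per row.
x_values = x_flat.reshape(-1, 1)

model = LinearRegression()
model.fit(x_values, y_values)

# Each inner list is one sample; here each sample has one feature.
predictions = model.predict([[5.0], [6.0]])
print(predictions)
```

Because the made-up data sits exactly on a line, the two predictions should come out very close to 11 and 13.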

### Linear Regression Exercise
In this example, we'll be working with data on the average life expectancy at birth and the average BMI for males across the world. The data comes from [Gapminder](https://www.gapminder.org/).

The data file "bmi_and_life_expectancy.csv" includes three columns, containing the following data:
* **Country** - The country the person was born in.
* **Life expectancy** - The average life expectancy at birth for a person in that country.
* **BMI** - The mean BMI of males in that country.

#### 0. Import necessary libraries

```
import pandas as pd
from sklearn.linear_model import LinearRegression
```

#### 1. Load the data
* The data is in the file bmi_and_life_expectancy.csv, located in data/
* Use pandas [read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) to load the data into a dataframe.
* Assign the dataframe to the variable `bmi_life_data`

```
bmi_life_data = pd.read_csv('data/bmi_and_life_expectancy.csv')
```
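
After loading, it is worth checking that the dataframe has the columns the rest of the exercise relies on. The sketch below uses a small in-memory CSV with made-up rows (the country names and numbers are placeholders, not Gapminder values) so that it runs without the data file. With the real file you would call `pd.read_csv('data/bmi_and_life_expectancy.csv')` instead.

```python
import io
import pandas as pd

# Placeholder rows for illustration only; these are NOT real Gapminder values.
sample_csv = io.StringIO(
    "Country,Life expectancy,BMI\n"
    "Country A,60.0,21.0\n"
    "Country B,70.0,24.0\n"
    "Country C,75.0,26.0\n"
)
sample_data = pd.read_csv(sample_csv)

# Confirm the three expected columns are present before modelling.
expected_columns = ["Country", "Life expectancy", "BMI"]
missing = [c for c in expected_columns if c not in sample_data.columns]
print("Missing columns:", missing)

# Peek at the first rows and count missing values in the model columns.
print(sample_data.head())
print(sample_data[["BMI", "Life expectancy"]].isna().sum())
```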

#### 2. Build a linear regression model
* Create a regression model using scikit-learn's [LinearRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) and assign it to `bmi_life_model`.

```
bmi_life_model = LinearRegression()
```

* Fit the model to the data.

```
bmi_life_model.fit(bmi_life_data[['BMI']], bmi_life_data[['Life expectancy']])
```

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
```
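
A fitted `LinearRegression` exposes the line it found through its `coef_` (slope) and `intercept_` attributes. The following sketch, on made-up data rather than the Gapminder file, reads them back and reproduces a prediction by hand as `intercept + slope * x`. The variable names and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only (not Gapminder values).
toy_data = pd.DataFrame({
    "BMI": [20.0, 22.0, 24.0, 26.0],
    "Life expectancy": [58.0, 64.0, 69.0, 75.0],
})

toy_model = LinearRegression()
# The target is passed as a Series (1-D), so coef_ is a 1-D array
# and intercept_ is a scalar.
toy_model.fit(toy_data[["BMI"]], toy_data["Life expectancy"])

slope = toy_model.coef_[0]        # change in life expectancy per unit of BMI
intercept = toy_model.intercept_  # value of the fitted line at BMI = 0

# A prediction is just the fitted line evaluated at the input value.
bmi = 23.0
by_hand = intercept + slope * bmi
by_model = toy_model.predict(pd.DataFrame({"BMI": [bmi]}))[0]
print(slope, intercept, by_hand, by_model)
```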

#### 3. Predict using the model
* Predict the life expectancy for a BMI of 21.07931 and assign the result to the variable `laos_life_exp`.

```
laos_life_exp = bmi_life_model.predict([[21.07931]])
print(laos_life_exp)
```

```
[[60.31564716]]
```
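
The nested brackets in the output arise because the target was passed to `fit()` as a one-column dataframe, so `predict()` returns a 2-D array. The sketch below, on made-up data rather than the Gapminder file, contrasts that with passing the target as a Series and shows one way to pull out a plain float. All names and values here are illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only (not Gapminder values).
toy_data = pd.DataFrame({
    "BMI": [20.0, 22.0, 24.0, 26.0],
    "Life expectancy": [58.0, 64.0, 69.0, 75.0],
})
query = pd.DataFrame({"BMI": [23.0]})

# Target as a one-column DataFrame: predictions come back 2-D.
model_2d = LinearRegression().fit(toy_data[["BMI"]], toy_data[["Life expectancy"]])
pred_2d = model_2d.predict(query)
print(pred_2d.shape)

# Target as a Series: predictions come back 1-D.
model_1d = LinearRegression().fit(toy_data[["BMI"]], toy_data["Life expectancy"])
pred_1d = model_1d.predict(query)
print(pred_1d.shape)

# Either way, a single prediction can be extracted as a Python float.
value = float(pred_2d[0, 0])
print(value)
```

Passing the query as a dataframe with the same column name used in training also avoids the feature-name warning that recent scikit-learn versions emit when a model fitted on a dataframe is given a plain list.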
