# Linear Regression
In this section, you'll use linear regression to predict life expectancy from body mass index (BMI). Before you do that, let's go over the tools required to build this model.

For your linear regression model, you'll be using scikit-learn's LinearRegression class. This class provides the method fit() to fit the model to your data.

> from sklearn.linear_model import LinearRegression

> model = LinearRegression()

> model.fit(x_values, y_values)

In the example above, the model variable is a linear regression model that has been fitted to the data x_values and y_values. Fitting the model means finding the best line that fits the training data. Let's make two predictions using the model's predict() function.

> print(model.predict([ [127], [248] ]))

> [[ 438.94308857, 127.14839521]]

The model returned an array of predictions, one prediction for each input array. The first input, [127], got a prediction of 438.94308857. The second input, [248], got a prediction of 127.14839521. You predict on an array like [127] rather than on a bare 127 because a model can make a prediction from multiple features, so each input is a list of feature values even when there is only one. We'll go over using multiple variables in linear regression later in this lesson. For now, let's stick to a single value.
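
The input shape can be checked on a small example. This is a minimal sketch on made-up data (the names x_toy, y_toy, and toy_model are illustrative and not part of the lesson's dataset): each row of the input is one sample, and each column is one feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on the line y = 3x
x_toy = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1): 3 samples, 1 feature
y_toy = np.array([3.0, 6.0, 9.0])
toy_model = LinearRegression().fit(x_toy, y_toy)

# Two inputs: one row per sample, one column per feature
two_inputs = np.array([[4.0], [5.0]])         # shape (2, 1)
predictions = toy_model.predict(two_inputs)   # one prediction per row

# A single value still has to be wrapped into a 2D array
single_input = np.array([[4.0]])              # shape (1, 1)
single_prediction = toy_model.predict(single_input)[0]

print(predictions.shape, single_prediction)
```

With two features, each row would simply hold two values, for example [[4.0, 1.5], [5.0, 2.5]].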

# Linear Regression Quiz
In this quiz, you'll be working with data on the average life expectancy at birth and the average BMI for males across the world. The data comes from Gapminder.

The data file can be found under the "bmi_and_life_expectancy.csv" tab in the quiz below. It includes three columns, containing the following data:

- Country – The country the person was born in.
- Life expectancy – The average life expectancy at birth for a person in that country.
- BMI – The mean BMI of males in that country.

You'll need to complete each of the following steps:
## 1. Load the data

- The data is in the file called "bmi_and_life_expectancy.csv".
- Use pandas read_csv to load the data into a dataframe (don't forget to import pandas!).
- Assign the dataframe to the variable bmi_life_data.
## 2. Build a linear regression model

- Create a regression model using scikit-learn's LinearRegression and assign it to bmi_life_model.
- Fit the model to the data.
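
For a single feature, fitting amounts to learning a slope and an intercept. The sketch below uses made-up points (not the Gapminder data) that lie exactly on a known line, so the fitted values are easy to check. After fit(), scikit-learn exposes the learned parameters as the coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on the line y = 2x + 1
x_toy = np.array([[0.0], [1.0], [2.0], [3.0]])   # shape (4, 1)
y_toy = np.array([1.0, 3.0, 5.0, 7.0])           # shape (4,)

toy_model = LinearRegression()
toy_model.fit(x_toy, y_toy)

slope = toy_model.coef_[0]          # one coefficient per feature
intercept = toy_model.intercept_
print(slope, intercept)
```

On real data the points will not sit exactly on a line, and the fitted slope and intercept are the ones that minimize the squared error between the line and the training points.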
## 3. Predict using the model

Predict using a BMI of 21.07931 and assign it to the variable laos_life_exp.

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Load the data
bmi_life_data = pd.read_csv('BMI_and_Life_expectancy.csv') 

In [2]:
bmi_life_data.head()

Unnamed: 0,Country,Life expectancy,BMI
0,Afghanistan,52.8,20.62058
1,Albania,76.8,26.44657
2,Algeria,75.5,24.5962
3,Andorra,84.6,27.63048
4,Angola,56.7,22.25083


In [3]:
x_values = bmi_life_data[['BMI']]
y_values = bmi_life_data[['Life expectancy']]
print(x_values.shape, y_values.shape)

(129, 1) (129, 1)
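
The double brackets in the cell above matter. As a small sketch on a hypothetical three-row frame (toy_df is illustrative only), selecting with single brackets returns a one-dimensional Series, while double brackets return a DataFrame with an explicit column dimension, which is the 2D shape scikit-learn expects for the features.

```python
import pandas as pd

# Hypothetical three-row frame with the same column names as the quiz data
toy_df = pd.DataFrame({
    "BMI": [20.6, 26.4, 24.6],
    "Life expectancy": [52.8, 76.8, 75.5],
})

as_series = toy_df["BMI"]     # single brackets: Series, shape (3,)
as_frame = toy_df[["BMI"]]    # double brackets: DataFrame, shape (3, 1)

print(as_series.shape, as_frame.shape)
```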


In [4]:
# Create the model, assign it to bmi_life_model, and fit it to the data
bmi_life_model = LinearRegression()
bmi_life_model.fit(x_values, y_values)

# Make a prediction using the model: predict life expectancy for a BMI of 21.07931.
# scikit-learn does not accept a bare scalar (a single value) as input.
# It expects a 2D array of shape (n_samples, n_features): one row per observation, one column per feature. Both are 1 here.
# https://stackoverflow.com/questions/54296377/valueerror-expected-2d-array-got-scalar-array-instead
laos_life_exp = bmi_life_model.predict(np.array([21.07931]).reshape(1, 1))

In [5]:
laos_life_exp

array([[60.1926397]])

In [6]:
# Convert the array back into a float
laos_life_exp.tolist()[0][0]

60.19263970441605
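
As an alternative sketch (on made-up data, with hypothetical names toy_df and query), the input for prediction can also be passed as a one-row DataFrame that uses the same column name as the training data, and the single predicted value can be extracted with NumPy's item() method. Recent versions of scikit-learn emit a warning when a model fitted on a DataFrame is asked to predict on a plain array, and passing a DataFrame avoids it.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on: Life expectancy = 3 * BMI - 5
toy_df = pd.DataFrame({
    "BMI": [20.0, 22.0, 24.0, 26.0],
    "Life expectancy": [55.0, 61.0, 67.0, 73.0],
})
toy_model = LinearRegression().fit(toy_df[["BMI"]], toy_df[["Life expectancy"]])

# One-row DataFrame with the same feature name used during fitting
query = pd.DataFrame({"BMI": [21.0]})
prediction = toy_model.predict(query)   # shape (1, 1), because the target was a 2D DataFrame

# item() returns the single element as a plain Python float
value = prediction.item()
print(value)
```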