# Predict house price using Linear Regression

## 1 - Import necessary packages

Let's first import all the packages that you will need during this assignment.

- **numpy** is the main package for scientific computing with Python.
- **matplotlib** is a library to plot graphs in Python.
- **sklearn** (scikit-learn) provides a range of machine learning algorithms and utilities for Python.

In [None]:
import numpy as np
from sklearn import datasets
from sklearn import model_selection
from sklearn import metrics

import matplotlib.pyplot as plt
%matplotlib inline

## 2 - Load dataset

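Let's load our sample dataset, the Boston house-price dataset provided by sklearn. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the cell below needs an older scikit-learn version.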

In [None]:
bhouse = datasets.load_boston()

The built-in `dir()` function returns a list of the valid attributes of an object.

In [None]:
dir(bhouse)

**DESCR**: str
<br>
The full description of the dataset.

**data**: ndarray of shape (506, 13)
<br>
The data matrix.

**feature_names**: ndarray
<br>
The names of the features.

**filename**: str
<br>
The path to the Boston CSV dataset file.

**target**: ndarray of shape (506,)
<br>
The regression target.

Check the number of samples.

In [None]:
bhouse.target.shape

Check the feature names.

In [None]:
# TODO: Replace {} with your solution to check feature names
bhouse.{}

Load features and targets

In [None]:
# TODO: Replace {} with your solution to load features to "data"
{} = bhouse.data.astype(np.float32)

# TODO: Replace {} with your solution to load targets to "target"
{} = bhouse.target.astype(np.float32)

Import `train_test_split` from `sklearn.model_selection` to split the dataset into training set and test set.

In [None]:
from sklearn.model_selection import train_test_split

Split the dataset

In [None]:
# TODO: Replace {} with your solution to split the dataset into training and testing subset with ratio 7:3
X_train, X_test, y_train, y_test = {}split(data, target, test_size=0.{}, random_state=123)
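To see what `test_size` controls, here is a minimal sketch on made-up data. The toy arrays `X_demo` and `y_demo` and the 8:2 ratio are assumptions chosen only for illustration; they are not part of the assignment.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each
X_demo = np.arange(20, dtype=np.float32).reshape(10, 2)
y_demo = np.arange(10, dtype=np.float32)

# test_size=0.2 holds out 20% of the rows for testing;
# random_state fixes the shuffle so the split is reproducible
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(Xd_train.shape, Xd_test.shape)
```

Features and targets are shuffled together, so each row of `Xd_train` still lines up with its entry in `yd_train`.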

## 3 - Linear Regression

Import `LinearRegression` from `sklearn.linear_model` to use the linear regression model.

In [None]:
from sklearn.linear_model import LinearRegression

Create the linear regression model.

In [None]:
# TODO: Replace {} with your solution to Load Linear Regression Model
model = {}Regression()

Fit the model with training data

In [None]:
# TODO: Replace {} with your solution to fit the model to the training data
model.{}(X_train, y_train)

Print out the coefficients

In [None]:
# TODO: Replace {} with your solution to print out the coefficients.
print(model.{})

> The coefficients are the weights.

In [None]:
print(len(model.coef_))

> There are 13 coefficients because there are 13 columns of features.
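One way to read this: a linear regression model predicts a weighted sum of the features plus a constant. Written out for 13 features, with $w_i$, $x_i$ and $b$ as notation introduced here for the weights, feature values and constant term:

$$\hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_{13} x_{13} + b$$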

Print out the intercept.

In [None]:
# TODO: Replace {} with your solution to print out the intercept
print(model.{})

> The intercept of the equation is also the bias.
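The roles of the weights and the bias can be sketched with plain NumPy. This sketch assumes made-up, noise-free data and uses ordinary least squares via `np.linalg.lstsq` instead of sklearn; the names `X_demo`, `true_w` and `true_b` are hypothetical.

```python
import numpy as np

# Hypothetical noise-free data generated from known weights and bias
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
true_w = np.array([2.0, -3.0])
true_b = 5.0
y_demo = X_demo @ true_w + true_b

# Append a column of ones so the last solved parameter acts as the bias
X_aug = np.hstack([X_demo, np.ones((X_demo.shape[0], 1))])
params, *_ = np.linalg.lstsq(X_aug, y_demo, rcond=None)
w, b = params[:-1], params[-1]

# A prediction is the weighted sum of the features plus the bias
y_hat = X_demo @ w + b
print(w, b)
```

Because the data contain no noise, the solved `w` and `b` match the values used to generate `y_demo` up to floating-point error.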

## 4 - Predict using the model

Predict the results using the trained model.

In [None]:
# TODO: Replace {} with your solution to predict the result using the trained model
# Input data is the test dataset.
predictions = model.{}(X_test)

Plot the result.

In [None]:
plt.scatter(y_test, predictions)
plt.xlabel('Y Test')
plt.ylabel('Predicted Y')

## 5 - Evaluating the model

MAE, MSE and RMSE are common metrics for evaluating a regression model:
- MAE : Mean absolute error regression loss
- MSE : Mean squared error regression loss
- RMSE : Square root of Mean squared error regression loss
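The three definitions above can be worked through by hand on a tiny example. The arrays `y_true_demo` and `y_pred_demo` are made-up values, and the sketch uses plain NumPy rather than `sklearn.metrics`.

```python
import numpy as np

# Hypothetical true values and predictions
y_true_demo = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_demo = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true_demo - y_pred_demo   # [ 1.,  0., -2., -1.]

mae = np.mean(np.abs(errors))        # mean of absolute errors
mse = np.mean(errors ** 2)           # mean of squared errors
rmse = np.sqrt(mse)                  # square root of the MSE

print('MAE:', mae)    # 1.0
print('MSE:', mse)    # 1.5
print('RMSE:', rmse)
```

MSE penalizes large errors more heavily than MAE because each error is squared. RMSE brings the result back to the same units as the target.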

Print out the MAE, MSE and RMSE.

In [None]:
# TODO: Replace {} with your solution to print out MAE
print('MAE:', metrics.mean{}_error(y_test, predictions))

# TODO: Replace {} with your solution to print out MSE
print('MSE:', metrics.mean{}_error(y_test, predictions))

# TODO: Replace {} with your solution to print out RMSE
print('RMSE:', np.{}(metrics.mean_squared_error(y_test, predictions)))

# Exercise: Predict medical cost

> Dataset from Kaggle: [Medical Cost Personal Datasets](https://www.kaggle.com/mirichoi0218/insurance/notebooks)

## 1 - Load Dataset

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("data/insurance.csv")
df.head(10)

In [None]:
# TODO: Replace {} with your solution to check the shape of the dataset
df.{}

> This dataset has 1338 rows and 7 columns: 6 features plus the target, `charges`.

Make sure the dataset has no null values.

In [None]:
# TODO: Replace {} with your solution to check whether the dataset contains null values
df.{}().sum()

Convert the categorical columns into binary indicator (dummy) columns.

In [None]:
dummies = pd.get_dummies(df[['sex', 'smoker', 'region']], drop_first=True) 
df.drop(['sex', 'smoker', 'region'], axis=1, inplace=True)
df = pd.concat([df, dummies], axis=1)
df.head(10)
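To make the effect of `drop_first=True` concrete, here is a small sketch on a hypothetical three-row frame (`toy` is not the insurance data). With `drop_first=True`, pandas drops the first category of each column, so a column with k categories becomes k - 1 indicator columns.

```python
import pandas as pd

# Hypothetical miniature frame with two categorical columns
toy = pd.DataFrame({
    'smoker': ['yes', 'no', 'no'],
    'region': ['southwest', 'southeast', 'northwest'],
})

# Categories are ordered alphabetically, so 'no' and 'northwest' are dropped
toy_dummies = pd.get_dummies(toy, drop_first=True)

print(list(toy_dummies.columns))
```

A dropped category is still represented: a row whose remaining indicators for that column are all zero belongs to it. Dropping one category per column avoids indicator columns that always sum to 1, which would be redundant with the intercept.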

Load the features into 'x' and the targets into 'y'.

In [None]:
x = df.drop(['charges'], axis=1)
y = df.charges

Split the dataset into a training set and a test set with ratio 7:3.

In [None]:
# TODO: Replace {} with your solution to split the dataset into 70% training set and 30% test set
x_train, x_test, y_train, y_test = {}(x, y, test_size={}, random_state=123)

## 2 - Linear Regression

In [None]:
# TODO: Replace {} with your solution to create the linear regression model
model = {}

# TODO: Replace {} with your solution to fit the model to the training data
model.fit({}, {})

Print the model's coefficients.

In [None]:
# TODO: Replace {} with your solution to print the coefficient
print(model.{})

Print the intercept.

In [None]:
# TODO: Replace {} with your solution to print the intercept
print(model.{})

## 3 - Evaluate Model

In [None]:
# TODO: Replace {} with your solution to make predictions on the test set
predictions = model.predict({})

Plot the results.

In [None]:
plt.scatter(y_test, predictions)
plt.xlabel('Y Test')
plt.ylabel('Predicted Y')

Print the model's score.

In [None]:
print(model.score(x_test, y_test))
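For scikit-learn regressors such as `LinearRegression`, `score` returns the coefficient of determination R². A minimal sketch of that quantity on made-up numbers, computed by hand with NumPy (the arrays below are hypothetical):

```python
import numpy as np

# Hypothetical true values and predictions
y_true_demo = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_demo = np.array([2.0, 5.0, 4.0, 8.0])

# Residual sum of squares: squared error left over after prediction
ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)
# Total sum of squares: squared error of always predicting the mean
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)

r2 = 1.0 - ss_res / ss_tot
print('R^2:', r2)
```

An R² of 1 means perfect predictions, 0 means the model does no better than predicting the mean of the targets, and negative values mean it does worse.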

Print the MAE, MSE, RMSE of the model.

In [None]:
# TODO: Replace {} with your solution to print out MAE
print('MAE:', metrics.{}(y_test, predictions))

# TODO: Replace {} with your solution to print out MSE
print('MSE:', metrics.{}(y_test, predictions))

# TODO: Replace {} with your solution to print out RMSE
print('RMSE:', np.{}(metrics.{}(y_test, predictions)))