# Case: Boston Housing Price Forecast

## Introduction to case background

- Data introduction (omitted)

## Analysis of the Case  
- Data segmentation and standardization  
- Regression prediction  
- Effect evaluation of the Linear regression algorithm 

##  Evaluation of regression performance  
- Mean Squared Error (MSE)
- sklearn.metrics.mean_squared_error(y_true, y_pred)  

##  Implementation

1. Load the data set
2. Basic data processing ( Data segmentation)
3. Feature engineering ( Data standardization )
4. Machine learning
5. Dump and load the model
5. Model evaluation


### Import the module

In [2]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge,RidgeCV
from sklearn.metrics import mean_squared_error
import joblib

### Load the data set

In [3]:
data = load_boston()

In [4]:
type(data)

sklearn.utils.Bunch

### Basic data processing ( Data segmentation)

In [5]:
x_train, x_test, y_train, y_test = train_test_split(data.data, 
                                                    data.target,
                                                   random_state = 22)

In [6]:
x_train[:4]

array([[1.80028e+00, 0.00000e+00, 1.95800e+01, 0.00000e+00, 6.05000e-01,
        5.87700e+00, 7.92000e+01, 2.42590e+00, 5.00000e+00, 4.03000e+02,
        1.47000e+01, 2.27610e+02, 1.21400e+01],
       [2.14090e-01, 2.20000e+01, 5.86000e+00, 0.00000e+00, 4.31000e-01,
        6.43800e+00, 8.90000e+00, 7.39670e+00, 7.00000e+00, 3.30000e+02,
        1.91000e+01, 3.77070e+02, 3.59000e+00],
       [2.99160e-01, 2.00000e+01, 6.96000e+00, 0.00000e+00, 4.64000e-01,
        5.85600e+00, 4.21000e+01, 4.42900e+00, 3.00000e+00, 2.23000e+02,
        1.86000e+01, 3.88650e+02, 1.30000e+01],
       [1.88110e+01, 0.00000e+00, 1.81000e+01, 0.00000e+00, 5.97000e-01,
        4.62800e+00, 1.00000e+02, 1.55390e+00, 2.40000e+01, 6.66000e+02,
        2.02000e+01, 2.87900e+01, 3.43700e+01]])

### Feature engineering (Data standardization)

In [7]:
transfer = StandardScaler()

In [8]:
x_train = transfer.fit_transform(x_train)

In [9]:
x_test = transfer.fit_transform(x_test)

### Machine learning (train the model with Gradient descent method)

- sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True,solver="auto", normalize=False)

In [10]:
estimator = Ridge(alpha = 1)

In [20]:
estimator = RidgeCV(alphas = (0.1,1,10))

In [11]:
estimator.fit(x_train,y_train)

Ridge(alpha=1)

### Dump and load the model

- Save the model

In [12]:
joblib.dump(estimator,"./data/test.pkl")

['./data/test.pkl']

- load the model

In [21]:
estimator = joblib.load("./data/test.pkl")

### Model evaluation

#### Parameters of the regresion model

In [22]:
y_predict = estimator.predict(x_test)

In [23]:
y_predict[:4]

array([28.13514381, 31.28742806, 20.54637256, 31.45779505])

In [24]:
# print("the prediction",y_predict)

In [25]:
print("coefficience",estimator.coef_)

coefficience [-0.63591916  1.12109181 -0.09319611  0.74628129 -1.91888749  2.71927719
 -0.08590464 -3.25882705  2.41315949 -1.76930347 -1.74279405  0.87205004
 -3.89758657]


In [26]:
print("Interception", estimator.intercept_)

Interception 22.62137203166228


### Evaluation

- MSE

In [27]:
error = mean_squared_error(y_test, y_predict)

In [28]:
print("error: ", error)

error:  20.064724392806895
