# **House Price Prediction**

Build a machine learning model that predicts median house prices from a set of independent variables.

Each record in the dataset has 14 attributes:

- CRIM - per capita crime rate by town
- ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS - proportion of non-retail business acres per town
- CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX - nitric oxides concentration (parts per 10 million)
- RM - average number of rooms per dwelling
- AGE - proportion of owner-occupied units built prior to 1940
- DIS - weighted distances to five Boston employment centres
- RAD - index of accessibility to radial highways
- TAX - full-value property-tax rate per $10,000
- PTRATIO - pupil-teacher ratio by town
- B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT - % lower status of the population
- MEDV - median value of owner-occupied homes in $1000s

Dataset: https://github.com/ybifoundation/Dataset/raw/main/Boston.csv

**Import Libraries**

In [None]:
import pandas as pd

**Importing Dataset**

In [None]:
House = pd.read_csv("https://github.com/ybifoundation/Dataset/raw/main/Boston.csv")

**Exploring Dataset**

In [None]:
House.head()

In [None]:
House.info()

In [None]:
House.describe()

**Initializing target and feature data**

In [None]:
House.columns

In [None]:
y = House["MEDV"]
X = House[['CRIM', 'ZN', 'INDUS', 'CHAS', 'NX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT']]

**Splitting data into train and test sets**

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8, random_state = 2529)
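
To make the 80/20 split concrete, here is a minimal pure-Python sketch of a seeded shuffle-and-cut split. The helper `shuffle_split` and the ten stand-in rows are hypothetical and only illustrate the idea. This is not how scikit-learn implements `train_test_split`, and it will not reproduce the same row assignment for a given seed.

```python
import random

def shuffle_split(rows, train_size=0.8, seed=2529):
    """Hypothetical helper: shuffle row indices with a fixed seed, then cut."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)   # same seed -> same ordering
    n_train = int(train_size * len(rows))  # floor of the training share
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test

rows = list(range(10))  # stand-in for 10 dataset rows
train, test = shuffle_split(rows)
print(len(train), len(test))
```

Fixing the seed plays the same role as `random_state` above: rerunning the split yields the same partition, which keeps the reported error comparable between runs.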

**Creating model with linear regression**

In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

In [None]:
model.fit(X_train, y_train)

In [None]:
model.intercept_

In [None]:
model.coef_
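
The intercept and coefficients fully describe a fitted linear regression: a prediction is the intercept plus the coefficient-weighted sum of the feature values. The sketch below shows that arithmetic with made-up numbers for three features. The function `linear_predict` and all values are illustrative assumptions, not output from the model above.

```python
def linear_predict(intercept, coefs, features):
    """Return intercept + sum(coef_i * feature_i) for one row."""
    return intercept + sum(c * x for c, x in zip(coefs, features))

intercept = 10.0             # made-up intercept
coefs = [2.0, -0.5, 1.0]     # made-up coefficients, one per feature
features = [3.0, 4.0, 1.0]   # made-up feature values for one row

prediction = linear_predict(intercept, coefs, features)
print(prediction)  # 10 + 6 - 2 + 1 = 15.0
```

A negative coefficient, like the second one here, means the prediction falls as that feature grows, with the other features held fixed.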

**Prediction and error analysis**

In [None]:
y_pred = model.predict(X_test)
y_pred

In [None]:
from sklearn.metrics import mean_absolute_percentage_error

In [None]:
error = mean_absolute_percentage_error(y_test, y_pred)
print(f"{error:.1%}")  # format the fraction as a percentage, avoiding float artifacts
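
For intuition about the metric, mean absolute percentage error is the average of |actual - predicted| / |actual| over the test rows, returned as a fraction. A minimal hand-rolled sketch on made-up values follows. The function `mape` is a hypothetical helper, and unlike a library implementation it does not guard against actual values of zero.

```python
def mape(y_true, y_pred):
    """Mean of |actual - predicted| / |actual|, returned as a fraction."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(ratios) / len(ratios)

y_true = [20.0, 25.0, 50.0]  # made-up actual prices
y_pred = [22.0, 20.0, 50.0]  # made-up predictions

score = mape(y_true, y_pred)  # (0.1 + 0.2 + 0.0) / 3
print(f"{score:.1%}")
```

Because each error is divided by the actual value, the same absolute miss counts for more on a cheap house than on an expensive one.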

## **Deployment**

In [None]:
import pickle

In [None]:
with open('House_predictor.pkl', 'wb') as f:
    pickle.dump(model, f)

In [None]:
with open('House_predictor.pkl', 'rb') as f:
    pickle_model = pickle.load(f)

In [None]:
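sample = pd.DataFrame([[0.0063, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296.0, 15.3, 396.90, 4.98]], columns=X.columns)
pickle_model.predict(sample)  # named columns match the features the model was fitted on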
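
One way to sanity-check a pickle round trip is to confirm that the restored object equals the original. The sketch below does this with a plain dictionary of made-up parameters standing in for the fitted model, written to a temporary directory. The file name `stand_in.pkl` and the values are assumptions for illustration only.

```python
import os
import pickle
import tempfile

# Made-up parameters standing in for a fitted model
params = {"intercept": 10.0, "coef": [2.0, -0.5, 1.0]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stand_in.pkl")
    with open(path, "wb") as f:
        pickle.dump(params, f)       # serialize to disk
    with open(path, "rb") as f:
        restored = pickle.load(f)    # deserialize into a new object

print(restored == params)
```

For the actual model, a comparable check would be to compare `pickle_model.predict(X_test)` with `model.predict(X_test)`.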