## Polynomial Regression

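Polynomial regression is a special case of linear regression in which we fit a polynomial equation to data that show a curvilinear relationship between the target variable and the independent variables. It still counts as linear regression because the model is linear in its coefficients, even though it is nonlinear in the predictors.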

In a curvilinear relationship, the target variable does not change at a constant rate as the predictor(s) change.

In linear regression with a single predictor, we have the following equation:

$$ Y = \theta_0 + \theta_1 x $$

where:

**Y** is the target,

**x** is the predictor,

$\theta_0$ is the bias (intercept),

and $\theta_1$ is the weight (slope) in the regression equation.
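As a small illustration on synthetic data (not the real-estate dataset), the least-squares estimates for this single-predictor equation have a closed form: $\theta_1 = \mathrm{cov}(x, Y)/\mathrm{var}(x)$ and $\theta_0 = \bar{Y} - \theta_1 \bar{x}$. The NumPy sketch below applies it to noise-free points on the line $Y = 3 + 2x$; `fit_simple_linear` is a hypothetical helper written for this example only.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Closed-form least-squares estimates for Y = theta_0 + theta_1 * x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and Y divided by the variance of x
    theta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    theta_0 = y_mean - theta_1 * x_mean
    return theta_0, theta_1

# Synthetic, noise-free points on the line Y = 3 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x
theta_0, theta_1 = fit_simple_linear(x, y)
print(theta_0, theta_1)
```

With real, noisy data the same formulas give the line that minimizes the sum of squared residuals.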

This linear equation can only represent a straight-line relationship. In polynomial regression, we instead use a polynomial equation of degree n:

$$ Y = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \dots + \theta_n x^n $$

Here:

$\theta_0$ is the bias,

$\theta_1, \theta_2, \dots, \theta_n$ are the weights in the polynomial regression equation,

and **n** is the degree of the polynomial.

As **n** increases, the equation gains more higher-order terms, so the model becomes more flexible but also more complex and more prone to overfitting the training data.
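To see why this is still linear regression, note that each power $x^k$ is just another input column: the model is linear in $\theta$. A minimal NumPy sketch on synthetic data builds the columns $1, x, x^2, \dots, x^n$ and solves ordinary least squares; each extra degree adds one more column. `fit_polynomial` is a hypothetical helper, not part of this notebook.

```python
import numpy as np

def fit_polynomial(x, y, n):
    """Least-squares fit of Y = theta_0 + theta_1 x + ... + theta_n x^n (hypothetical helper)."""
    # Design matrix with columns 1, x, x^2, ..., x^n (increasing=True puts x^0 first)
    A = np.vander(np.asarray(x, dtype=float), N=n + 1, increasing=True)
    # The model is linear in theta, so ordinary least squares applies directly
    theta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return theta  # theta[k] is the coefficient of x^k

# Synthetic, noise-free data from Y = 1 - 2x + 0.5x^2 + 0.25x^3
x = np.linspace(-3, 3, 20)
y = 1 - 2 * x + 0.5 * x**2 + 0.25 * x**3
theta = fit_polynomial(x, y, n=3)
print(np.round(theta, 6))
```

Because the data were generated without noise, the recovered coefficients match the generating polynomial up to floating-point error.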

**I previously wrote a notebook that applies simple linear regression to real-estate data; you can check it [HERE](https://www.kaggle.com/zahrajai/linear-regression-real-estate).**

### Import Libraries

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
```

### Load Dataset

```python
realestate_df = pd.read_csv('../input/real-estate-price-prediction/Real estate.csv')
realestate_df.head()
```

```python
realestate_df.shape
```

### Dataset Information

```python
realestate_df.info()
```

```python
realestate_df.describe()
```

```python
sns.pairplot(realestate_df)
```

## Data Preprocessing

### Determine the Features & Target Variable

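```python
# Features: every column except 'No' (a running row number) and the target
X = realestate_df.drop(['No', 'Y house price of unit area'], axis=1)
# Target: house price per unit area
y = realestate_df['Y house price of unit area']
```

```python
# X.head()
# y
```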

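#### Generate Polynomial and Interaction Features

```python
from sklearn.preprocessing import PolynomialFeatures
```

```python
# Degree-2 expansion without the constant column (LinearRegression fits its own intercept)
polynomial_converter = PolynomialFeatures(degree=2, include_bias=False)
# fit_transform replaces the separate fit() and transform() calls
poly_features = polynomial_converter.fit_transform(X)
```

```python
poly_features.shape
# For 3 inputs, degree 2 gives: X1, X2, X3, X1^2, X1X2, X1X3, X2^2, X2X3, X3^2
# In general, d inputs give d + d(d+1)/2 columns at degree 2
```

```python
X.shape
```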

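#### Train-Test Split

```python
from sklearn.model_selection import train_test_split
```

```python
# Hold out 30% of the rows for testing; a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)
```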

## Linear Regression Model
### Train the Model

```python
from sklearn.linear_model import LinearRegression
```

```python
polymodel = LinearRegression()
```

```python
polymodel.fit(X_train, y_train)
```
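After fitting, the parameters of the polynomial equation are stored on the model: `intercept_` is the bias $\theta_0$ and `coef_` holds the weights in the same order as the columns produced by `PolynomialFeatures`. A sketch on synthetic single-predictor data (not the real-estate set) shows the mapping; for `polymodel` above, `polymodel.coef_` holds one weight per polynomial feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data from Y = 4 + 1.5x - 0.5x^2 (single predictor)
x = np.linspace(-2, 2, 15).reshape(-1, 1)
y = 4 + 1.5 * x[:, 0] - 0.5 * x[:, 0] ** 2

# Same preprocessing settings as in this notebook: degree 2, no bias column
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

model = LinearRegression().fit(x_poly, y)

# intercept_ plays the role of theta_0; coef_ holds theta_1 and theta_2 (for x and x^2)
theta_0 = model.intercept_
theta_1, theta_2 = model.coef_
print(theta_0, theta_1, theta_2)
```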

### Predicting Test Data

```python
y_pred = polymodel.predict(X_test)
```

```python
# Actual vs. predicted prices and the residuals (actual minus predicted)
pd.DataFrame({'y_test': y_test, 'y_pred': y_pred, 'Residuals': (y_test - y_pred)})
```

### Evaluating the Model

```python
from sklearn import metrics
```

```python
MAE_Poly = metrics.mean_absolute_error(y_test, y_pred)
MSE_Poly = metrics.mean_squared_error(y_test, y_pred)
RMSE_Poly = np.sqrt(MSE_Poly)

pd.DataFrame([MAE_Poly, MSE_Poly, RMSE_Poly], index=['MAE', 'MSE', 'RMSE'], columns=['Metrics'])
```
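For reference, with $m$ test rows the three metrics are $\mathrm{MAE} = \frac{1}{m}\sum_i |y_i - \hat{y}_i|$, $\mathrm{MSE} = \frac{1}{m}\sum_i (y_i - \hat{y}_i)^2$ and $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$. The NumPy sketch below computes them by hand on three made-up values (not model output); the results should agree with `metrics.mean_absolute_error` and `metrics.mean_squared_error`.

```python
import numpy as np

# Small made-up example: true values and predictions
y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.5, 5.0, 4.0])

errors = y_true - y_hat          # residuals: 0.5, 0.0, -2.0
mae = np.mean(np.abs(errors))    # mean absolute error
mse = np.mean(errors ** 2)       # mean squared error
rmse = np.sqrt(mse)              # root mean squared error, in the same units as the target

print(f"MAE={mae:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}")
```

RMSE is usually the easiest of the three to interpret, since it is in the same units as the target (house price of unit area), while MSE is in squared units.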

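### Compare to Plain Linear Regression

For comparison, fit an ordinary linear regression on the original features (no polynomial terms). Because `test_size` and `random_state` are the same as before, `train_test_split` selects the same rows for the test set, so both models are scored on identical data.

```python
# Split the raw (untransformed) features with the same settings as before
XS_train, XS_test, ys_train, ys_test = train_test_split(X, y, test_size=0.3, random_state=101)
simplemodel = LinearRegression()
simplemodel.fit(XS_train, ys_train)
ys_pred = simplemodel.predict(XS_test)

MAE_simple = metrics.mean_absolute_error(ys_test, ys_pred)
MSE_simple = metrics.mean_squared_error(ys_test, ys_pred)
RMSE_simple = np.sqrt(MSE_simple)
```

```python
pd.DataFrame({'Poly Metrics': [MAE_Poly, MSE_Poly, RMSE_Poly],
              'Simple Metrics': [MAE_simple, MSE_simple, RMSE_simple]},
             index=['MAE', 'MSE', 'RMSE'])
```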

### Choose the Polynomial Degree

To see how the degree affects the fit, the loop below refits the model for degrees 1 to 9 and records the RMSE on both the training and the test set.

```python
# RMSE on the training set for each degree
train_RMSE_list = []
# RMSE on the test set for each degree
test_RMSE_list = []

for d in range(1, 10):
    # Preprocessing: create the polynomial feature set for degree d
    polynomial_converter = PolynomialFeatures(degree=d, include_bias=False)
    poly_features = polynomial_converter.fit_transform(X)

    # Split the dataset with the same settings as before
    X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)

    # Train the model
    polymodel = LinearRegression()
    polymodel.fit(X_train, y_train)

    # Predict on both the train and the test data
    y_train_pred = polymodel.predict(X_train)
    y_test_pred = polymodel.predict(X_test)

    # Evaluate the model with RMSE on both sets
    train_RMSE = np.sqrt(metrics.mean_squared_error(y_train, y_train_pred))
    test_RMSE = np.sqrt(metrics.mean_squared_error(y_test, y_test_pred))

    # Record the RMSE for this degree
    train_RMSE_list.append(train_RMSE)
    test_RMSE_list.append(test_RMSE)
```

```python
train_RMSE_list
```

```python
test_RMSE_list
```

```python
# Plot degrees 1 to 5 (the first five entries of each list)
plt.plot(range(1, 6), train_RMSE_list[:5], label='Train RMSE')
plt.plot(range(1, 6), test_RMSE_list[:5], label='Test RMSE')

plt.xlabel('Polynomial Degree')
plt.ylabel('RMSE')
plt.legend()
```

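**Based on the test RMSE in the plot, degree 2 seems to be the best choice for this model.** The training RMSE tends to keep falling as the degree grows, so the test RMSE is the number to watch: when it starts rising while the training RMSE keeps falling, the model is overfitting.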
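One caveat: choosing the degree by looking at the test RMSE means the test set is no longer a fully independent estimate of performance. A common alternative is k-fold cross-validation on the training data only, keeping the test set for a final check. The sketch below uses synthetic data as a stand-in (the real training features could be substituted) and wraps the expansion in a scikit-learn pipeline; the `StandardScaler` step is an addition not used elsewhere in this notebook, included because unscaled high-degree features can be numerically ill-conditioned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption: 200 rows, 3 inputs, true relationship of degree 2)
rng = np.random.default_rng(0)
X_syn = rng.uniform(-2, 2, size=(200, 3))
y_syn = (1 + X_syn[:, 0] - 2 * X_syn[:, 1] ** 2
         + 0.5 * X_syn[:, 0] * X_syn[:, 2]
         + rng.normal(0, 0.3, 200))

cv_rmse = {}
for d in range(1, 6):
    # Expansion and scaling live inside the pipeline, so each fold fits them on its own training part
    model = make_pipeline(
        PolynomialFeatures(degree=d, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )
    # Scikit-learn scorers are "higher is better", so RMSE is returned negated
    scores = cross_val_score(model, X_syn, y_syn, cv=5, scoring="neg_root_mean_squared_error")
    cv_rmse[d] = -scores.mean()

best_degree = min(cv_rmse, key=cv_rmse.get)
print(cv_rmse, best_degree)
```

The degree with the lowest mean cross-validated RMSE would then be refit on the full training set and evaluated once on the test set.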