<h1> <p style="font-family: 'Times New Roman', Times, serif;">Introduction</p></h1>

<b>Polynomial Regression</b>

<p>One common pattern within machine learning is to use linear models trained on nonlinear functions of the data. This approach maintains the generally fast performance of linear methods, while allowing them to fit a much wider range of data.

For example, <a href="https://www.kaggle.com/mahyamahjoob/real-estate-valuation-using-linear-regression">a simple linear regression</a> can be extended by constructing polynomial features from the input features. In the standard linear regression case, you might have a model that looks like this for two-dimensional data:
<H2>$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2$</H2>
If we want to fit a paraboloid to the data instead of a plane, we can combine the features in second-order polynomials, so that the model looks like this:

<H2>$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2$</H2>
 
The (sometimes surprising) observation is that this is still a linear model: to see this, imagine creating a new variable

<H2>$z = [x_1, x_2, x_1 x_2, x_1^2, x_2^2]$</H2>
 
With this re-labeling of the data, our problem can be written:
    
<H2>$\hat{y}(w, x) = w_0 + w_1 z_1 + w_2 z_2 + w_3 z_3 + w_4 z_4 + w_5 z_5$</H2>
 
We see that the resulting polynomial regression is in the same class of linear models we’d considered above (i.e. the model is linear in w) and can be solved by the same techniques. By considering linear fits within a higher-dimensional space built with these basis functions, the model has the flexibility to fit a much broader range of data.</p>
<p>
<h4><b>Source:</b> <a href="https://scikit-learn.org/stable/modules/linear_model.html#polynomial-regression-extending-linear-models-with-basis-functions">scikit-learn</a>
</h4></p>
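<p>As a minimal sketch of the re-labeling argument above, the following cell builds z by hand for synthetic two-dimensional data and fits an ordinary linear regression on it. The data, the weights in <code>w_true</code>, and all variable names are assumptions made up for illustration; they are not part of the real estate dataset.</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic two-dimensional inputs (illustrative assumption)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 2))
x1, x2 = x[:, 0], x[:, 1]

# Hypothetical weights w0..w5, in the order used in the equations above
w_true = np.array([1.0, 2.0, -3.0, 0.5, 4.0, -1.5])
y_demo = (w_true[0] + w_true[1] * x1 + w_true[2] * x2
          + w_true[3] * x1 * x2 + w_true[4] * x1 ** 2 + w_true[5] * x2 ** 2)

# z = [x1, x2, x1*x2, x1^2, x2^2]: the model is linear in w once z is given
z = np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
linear_on_z = LinearRegression().fit(z, y_demo)
w_hat = np.concatenate([[linear_on_z.intercept_], linear_on_z.coef_])

# PolynomialFeatures produces the same columns, ordered x1, x2, x1^2, x1*x2, x2^2
z_sklearn = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
same_columns = np.allclose(z_sklearn[:, [0, 1, 3, 2, 4]], z)

print(np.round(w_hat, 3), same_columns)
```

<p>Because the synthetic target is noise-free, the plain linear fit on z should recover the weights of the paraboloid, which is the sense in which polynomial regression is still a linear model.</p>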
<h2>Dataset</h2>
The dataset is the Real estate price prediction dataset, which is used for regression analysis, multiple regression, linear regression, and prediction. Since house price is a continuous variable, this is a regression problem. The data contains 8 columns, including six features (X) and one label (y): the house price of unit area.

<h3>Import libraries and dataset.</h3>


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import PolynomialFeatures

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn import metrics

from joblib import dump, load

%matplotlib inline

In [None]:
# Create a DataFrame from the real estate price prediction dataset.

df = pd.read_csv('../input/real-estate-price-prediction/Real estate.csv')


<h3>2-Check out the data</h3>

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.describe()

<h3>3-EDA</h3>

In [None]:
g= sns.pairplot(df)
g.map_upper(plt.scatter)

In [None]:
# find the pairwise correlation of all columns in the dataframe.

df.corr()

In [None]:
#Heatmap for correlation
sns.heatmap(df.corr(), annot=True,cmap='winter')

In [None]:
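# displot is a figure-level function that creates its own figure,
# so the size is set with height/aspect rather than plt.figure().
sns.displot(df['Y house price of unit area'], kde=True, bins=20, height=4, aspect=2)
plt.xlabel('house price of unit area')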

In [None]:
plt.figure(figsize=(8, 8), dpi=50)

sns.rugplot(df['Y house price of unit area'], height=0.2)


In [None]:
plt.figure(figsize=(5, 5), dpi=100)

sns.scatterplot(data=df, x='X1 transaction date', y='Y house price of unit area',
                hue='X2 house age', palette="rocket")


In [None]:
plt.figure(figsize=(5, 5), dpi=100)

sns.scatterplot(data=df, x='X3 distance to the nearest MRT station', y='Y house price of unit area',
                hue='X4 number of convenience stores', palette="rocket")

In [None]:
plt.figure(figsize=(5, 5), dpi=100)

sns.scatterplot(data=df, x='X5 latitude', y='Y house price of unit area',
                hue='X6 longitude', palette="rocket")

<p>First <b>split</b> up the data into an X array that contains the <b>features</b> to train on, and a y array with the <b>target</b> variable, in this case the (Y house price of unit area) column.</p>


In [None]:
X = df.drop('Y house price of unit area',axis=1)
y = df['Y house price of unit area']

<p><b>Split</b> the data into <b>train</b> and <b>test</b> sets.</p>


<h3>Training a Polynomial Regression Model</h3>

In [None]:
from sklearn.preprocessing import PolynomialFeatures
PF=PolynomialFeatures(degree=2, include_bias=False)
poly_features=PF.fit_transform(X)

In [None]:
poly_features.shape
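
<p>The number of columns produced by <code>PolynomialFeatures</code> grows quickly with the degree. As a rough sketch, with n input features and degree d (and no bias column) the count is C(n + d, d) - 1. The cell below checks this against scikit-learn for a placeholder array with six columns; the six-column shape and the helper <code>n_poly_features</code> are assumptions for illustration, and the X used in this notebook may have a different number of columns.</p>

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features, degree):
    """Number of polynomial terms up to `degree`, excluding the bias column."""
    return comb(n_features + degree, degree) - 1


# Placeholder input: only the number of columns matters for the count
X_placeholder = np.zeros((3, 6))

feature_counts = {}
for degree in range(1, 6):
    converter = PolynomialFeatures(degree=degree, include_bias=False).fit(X_placeholder)
    feature_counts[degree] = converter.n_output_features_
    # The fitted transformer should agree with the closed-form count
    assert feature_counts[degree] == n_poly_features(6, degree)

print(feature_counts)
```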

In [None]:
# Split the data so we can train our model on the training set and then use the test set to evaluate it.


X_train, X_test, y_train, y_test = train_test_split(
    poly_features, y, test_size=0.3, random_state=101)


In [None]:
from sklearn.linear_model import LinearRegression
polymodel=LinearRegression()
polymodel.fit(X_train, y_train)
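
<p>The expand-then-fit steps above can also be chained into a single estimator. The following is a minimal sketch using scikit-learn's <code>make_pipeline</code> on synthetic stand-in data; the data and the names <code>X_demo</code>, <code>y_demo</code> and <code>poly_pipeline</code> are assumptions for illustration, not the notebook's actual workflow.</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (an assumption, not the real estate dataset)
rng = np.random.default_rng(42)
X_demo = rng.uniform(-2, 2, size=(300, 2))
y_demo = (1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1]
          + 0.5 * X_demo[:, 0] * X_demo[:, 1] + X_demo[:, 1] ** 2
          + rng.normal(scale=0.1, size=300))

# Split the raw features; the pipeline expands them internally
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101)

poly_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_pipeline.fit(Xd_train, yd_train)

# R^2 on the held-out split
r2_test = poly_pipeline.score(Xd_test, yd_test)
print(r2_test)
```

<p>One possible advantage of this design is that new data can be passed to <code>poly_pipeline.predict</code> in its raw form, without repeating the feature expansion by hand.</p>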

<h3>Test Data Prediction</h3>

In [None]:
y_pred=polymodel.predict(X_test)




In [None]:
pd.DataFrame({'Y_Test': y_test,'Y_Pred':y_pred, 'Residuals':(y_test-y_pred) }).head(5)


<h3>Model Evaluation</h3>

In [None]:
MAE_Poly = metrics.mean_absolute_error(y_test,y_pred)
MSE_Poly = metrics.mean_squared_error(y_test,y_pred)
RMSE_Poly = np.sqrt(MSE_Poly)

pd.DataFrame([MAE_Poly, MSE_Poly, RMSE_Poly],
             index=['MAE', 'MSE', 'RMSE'], columns=['metrics'])
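
<p>For reference, the three metrics above are the mean of the absolute residuals (MAE), the mean of the squared residuals (MSE), and the square root of the MSE (RMSE). The sketch below computes them by hand on four made-up values and compares the result with <code>sklearn.metrics</code>; the numbers are illustrative only and do not come from the dataset.</p>

```python
import numpy as np
from sklearn import metrics

# Made-up targets and predictions (illustrative assumption)
y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true_demo - y_pred_demo

mae_manual = np.mean(np.abs(residuals))   # mean absolute error
mse_manual = np.mean(residuals ** 2)      # mean squared error
rmse_manual = np.sqrt(mse_manual)         # root mean squared error

# The same quantities from scikit-learn
mae_sklearn = metrics.mean_absolute_error(y_true_demo, y_pred_demo)
mse_sklearn = metrics.mean_squared_error(y_true_demo, y_pred_demo)

print(mae_manual, mse_manual, rmse_manual)
```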

<b>Compare Linear Regression vs. Polynomial Regression</b>

In [None]:
XS_train, XS_test, ys_train, ys_test = train_test_split(X, y, test_size=0.3, random_state=101)
simplemodel=LinearRegression()
simplemodel.fit(XS_train, ys_train)
ys_pred=simplemodel.predict(XS_test)

MAE_simple = metrics.mean_absolute_error(ys_test,ys_pred)
MSE_simple = metrics.mean_squared_error(ys_test,ys_pred)
RMSE_simple = np.sqrt(MSE_simple)

In [None]:
pd.DataFrame({'Poly Metrics': [MAE_Poly, MSE_Poly, RMSE_Poly], 'Simple Metrics':[MAE_simple, MSE_simple,
                                                                                 RMSE_simple]}, index=['MAE', 'MSE', 'RMSE'])


<h3> Adjust model parameters</h3>

In [None]:
train_RMSE_list=[]
test_RMSE_list=[]

for d in range(1,10):
    
    polynomial_converter= PolynomialFeatures(degree=d, include_bias=False)
    poly_features= polynomial_converter.fit_transform(X)
    
    X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)
    
    polymodel=LinearRegression()
    polymodel.fit(X_train, y_train)
    
    y_train_pred=polymodel.predict(X_train)
    y_test_pred=polymodel.predict(X_test)
    
    train_RMSE=np.sqrt(metrics.mean_squared_error(y_train, y_train_pred))
    
    test_RMSE=np.sqrt(metrics.mean_squared_error(y_test, y_test_pred))
        
    train_RMSE_list.append(train_RMSE)
    test_RMSE_list.append(test_RMSE)
# Plot only the first five degrees (1-5) of the nine fitted above.
plt.plot(range(1,6), train_RMSE_list[:5], label='Train RMSE')
plt.plot(range(1,6), test_RMSE_list[:5], label='Test RMSE')

plt.xlabel('Polynomial Degree')
plt.ylabel('RMSE')

plt.legend(loc=(1.1, 0.5))
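
<p>One way to turn a degree sweep like the one above into a concrete choice is to pick the degree with the lowest test RMSE. The sketch below does this on synthetic one-dimensional data generated from a cubic; the data and every name in it are assumptions for illustration, and in practice a separate validation set or cross-validation may be preferable to choosing on the test set.</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic cubic relationship with a little noise (illustrative assumption)
rng = np.random.default_rng(0)
x_demo = rng.uniform(-2, 2, size=(200, 1))
y_demo = x_demo[:, 0] ** 3 - x_demo[:, 0] + rng.normal(scale=0.1, size=200)

xd_train, xd_test, yd_train, yd_test = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=101)

degrees = list(range(1, 6))
train_rmse, test_rmse = [], []

for d in degrees:
    converter = PolynomialFeatures(degree=d, include_bias=False)
    train_features = converter.fit_transform(xd_train)
    test_features = converter.transform(xd_test)

    model = LinearRegression().fit(train_features, yd_train)

    train_rmse.append(np.sqrt(mean_squared_error(yd_train, model.predict(train_features))))
    test_rmse.append(np.sqrt(mean_squared_error(yd_test, model.predict(test_features))))

# Degree with the smallest error on the held-out split
best_degree = degrees[int(np.argmin(test_rmse))]
print(best_degree)
```

<p>The training RMSE can only stay flat or decrease as the degree grows, since each model contains the previous one, so the test curve is the one that indicates when added flexibility stops helping.</p>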
