<a href="https://colab.research.google.com/github/arorahemant1020/AIML/blob/main/Linear_and_Polynomial_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear and Polynomial Regression Tutorial

In this notebook, we'll look at a basic implementation of Linear and Polynomial Regression. The dataset used for this tutorial is House Price Prediction. I have modified it to make it simpler to understand.

The modified dataset **"data.csv"** contains 3 independent variables and 1 dependent variable.

### Independent Variables:

1stFlrSF - First Floor square feet <br />
2ndFlrSF - Second floor square feet <br />
YearBuilt - Original construction date

### Dependent Variable:

SalePrice - the property's sale price in dollars. (This is the target variable that you're trying to predict.)

If you wish to work on the complete dataset, it can be found at:
[House Prices: Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/).

Let's get into the code right away!

## 1. Importing Python Libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from math import sqrt

## 2. Reading the Dataset as a Dataframe using Pandas

In [None]:
df = pd.read_csv("data.csv")

## 3. Initializing the Independent and the Dependent Variables

In [None]:
# Features: every column except the target. The keyword form is required
# because pandas 2.0 no longer accepts the axis as a second positional argument.
X = df.drop(columns="SalePrice")
y = df["SalePrice"]
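The feature/target split can be tried on a toy frame first. This is only a sketch: the column names match `data.csv`, but the three rows and the names `toy`, `X_toy` and `y_toy` are invented for illustration.

```python
import pandas as pd

# Hypothetical rows: same column names as data.csv, values made up for illustration.
toy = pd.DataFrame({
    "1stFlrSF": [856, 1262, 920],
    "2ndFlrSF": [854, 0, 866],
    "YearBuilt": [2003, 1976, 2001],
    "SalePrice": [208500, 181500, 223500],
})

# Drop the target by keyword to get the feature matrix.
X_toy = toy.drop(columns="SalePrice")
# Select the target column as a Series.
y_toy = toy["SalePrice"]

print(list(X_toy.columns))
print(y_toy.name)
```

The original frame is left unchanged, because `drop` returns a new DataFrame unless `inplace=True` is passed.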

## 4. Training and Testing Data

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
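To see what `test_size=0.20` does, here is a small sketch on made-up arrays (the `*_demo` names are hypothetical and not part of the notebook's data). With 10 rows, 2 go to the test set and 8 to the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 rows, 2 features.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42
)

print(X_demo_train.shape, X_demo_test.shape)  # (8, 2) (2, 2)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the test set on every run.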

## 5. Linear Regression Model

In [None]:
model_lin = LinearRegression()
model_lin.fit(X_train, y_train)
y_pred_lin = model_lin.predict(X_test)
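A quick way to check what `LinearRegression` learns is to fit it on data whose true relationship is known. The sketch below assumes a synthetic, noise-free target `y = 3*x1 + 2*x2 + 5` (not the house-price data); the fitted `coef_` and `intercept_` should recover those numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features and an exactly linear target (assumption for illustration).
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 10, size=(50, 2))
y_syn = 3.0 * X_syn[:, 0] + 2.0 * X_syn[:, 1] + 5.0

model_syn = LinearRegression().fit(X_syn, y_syn)

# One coefficient per feature, plus the intercept.
print(model_syn.coef_, model_syn.intercept_)
```

The same attributes are available on `model_lin` after fitting, one coefficient per column of `X_train`.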

## 6. Polynomial Features

In [None]:
poly = PolynomialFeatures(degree=4)
# Fit the feature expansion on the training data and apply it
X_poly = poly.fit_transform(X_train)

## 7. Polynomial Regression Model

In [None]:
model_poly = LinearRegression()
model_poly.fit(X_poly, y_train)
# Reuse the already-fitted transformer on the test data
y_pred_poly = model_poly.predict(poly.transform(X_test))
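It helps to see what `PolynomialFeatures` actually produces. As a small sketch on made-up inputs (the names `row`, `expanded` and `n_terms` are hypothetical): a degree-2 expansion of two features `a, b` gives the columns `1, a, b, a², ab, b²`, and a degree-4 expansion of three features, as used above, gives 35 columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One row with two features: a = 2, b = 3.
row = np.array([[2.0, 3.0]])
expanded = PolynomialFeatures(degree=2).fit_transform(row)
print(expanded)  # [[1. 2. 3. 4. 6. 9.]]

# Three features expanded to degree 4, as in this notebook.
n_terms = PolynomialFeatures(degree=4).fit_transform(np.zeros((1, 3))).shape[1]
print(n_terms)  # 35
```

The polynomial model is therefore still a linear regression, just fitted on these extra columns.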

## 8. Tabulating Results

In [None]:
results = pd.DataFrame({
    "LinPred": y_pred_lin,
    "PolyPred": y_pred_poly,
    "TrueValues": y_test
})

## 9. Evaluation Metrics - Root Mean Square Error (RMSE)

In [None]:
rmse_lin = sqrt(mean_squared_error(results["TrueValues"], results["LinPred"]))
rmse_poly = sqrt(mean_squared_error(results["TrueValues"], results["PolyPred"]))

print("RMSE (Linear Regression): ", rmse_lin)
print("RMSE (Polynomial Regression): ", rmse_poly)
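RMSE is the square root of the mean of the squared errors, so it is expressed in the same unit as the target (dollars here). The sketch below uses made-up prices where every prediction is off by exactly 10, and checks the scikit-learn result against a manual NumPy calculation.

```python
import numpy as np
from math import sqrt
from sklearn.metrics import mean_squared_error

# Made-up values: each prediction misses by 10.
y_true_demo = np.array([100.0, 200.0, 300.0, 400.0])
y_pred_demo = np.array([110.0, 190.0, 310.0, 390.0])

rmse_sklearn = sqrt(mean_squared_error(y_true_demo, y_pred_demo))
rmse_manual = np.sqrt(np.mean((y_true_demo - y_pred_demo) ** 2))

print(rmse_sklearn, rmse_manual)  # 10.0 10.0
```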

### Goal Achieved, Linear and Polynomial Regression learned!

So in this notebook, we saw how to implement Linear and Polynomial Regression using Scikit-learn.

I know the RMSEs are still high. In the future, we will use the complete data, perform better feature engineering, and implement more robust algorithms to obtain better results.

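We can observe one thing: the RMSE for Polynomial Regression is lower than that for Linear Regression on this dataset. This is because a polynomial model can fit a wider range of curvature than a straight line. It does not mean Polynomial Regression always outperforms Linear Regression: a high degree can overfit the training data, so always compare the models on held-out test data.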
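One way to check how the degree affects the fit is to compare training and test RMSE for several degrees. The sketch below is an illustration on synthetic data (a noisy quadratic curve, not the house-price dataset), and it uses `make_pipeline` to chain the feature expansion and the regression, which is a swapped-in convenience rather than the approach used earlier in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: quadratic trend plus noise (assumption for illustration).
rng = np.random.default_rng(42)
x_curve = rng.uniform(-3, 3, size=(200, 1))
y_curve = 0.5 * x_curve[:, 0] ** 2 + x_curve[:, 0] + rng.normal(0, 1, size=200)

xc_train, xc_test, yc_train, yc_test = train_test_split(
    x_curve, y_curve, test_size=0.20, random_state=42
)

train_rmse, test_rmse = {}, {}
for degree in (1, 2, 4):
    # The pipeline fits the expansion on training data only and reuses it at predict time.
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    pipe.fit(xc_train, yc_train)
    train_rmse[degree] = np.sqrt(mean_squared_error(yc_train, pipe.predict(xc_train)))
    test_rmse[degree] = np.sqrt(mean_squared_error(yc_test, pipe.predict(xc_test)))
    print(degree, round(train_rmse[degree], 3), round(test_rmse[degree], 3))
```

Training RMSE can only stay the same or go down as the degree grows, because each higher-degree model contains the lower-degree ones. Test RMSE is the number to watch, since it shows whether the extra flexibility generalizes.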

**Till then, Keep coding!**

**Thank You!**