# **Mileage Prediction - Regression Analysis**

**Source:**

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University. The dataset was used in the 1983 American Statistical Association Exposition.


**Data Set Information:**

This dataset is a slightly modified version of the dataset provided in the StatLib library. In line with the use by Ross Quinlan (1993) in predicting the attribute "mpg", 8 of the original instances were removed because they had unknown values for the "mpg" attribute. The original dataset is available in the file "auto-mpg.data-original".

"The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes." (Quinlan, 1993)


**Attribute Information:**

1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete
9. car name: string (unique for each instance)

## **Import Library**

In [1]:
import pandas as pd


In [None]:
import numpy as np

In [None]:
import matplotlib.pyplot as plt

In [None]:
import seaborn as sns

## **Import Data**

In [None]:
df = pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/MPG.csv')

In [None]:
df.head()

In [None]:
df.nunique()

## **Data Preprocessing**

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.corr(numeric_only=True)  # skip text columns such as the car name; required by pandas >= 2.0

## **Remove Missing Values**

In [None]:
df = df.dropna()

In [None]:
df.info()
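Before calling `dropna()`, it can help to count missing values per column and see how many rows the call will drop. Here is a minimal sketch on a small hypothetical frame. The values are made up for illustration. On the real data, `df.isna().sum()` gives the same per-column count.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mimicking a few MPG columns; NaN marks a missing horsepower value
sample = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "horsepower": [130.0, np.nan, 95.0, np.nan],
    "weight": [3504, 3693, 2372, 2587],
})

# Count missing values per column before dropping anything
missing_per_column = sample.isna().sum()
print(missing_per_column)

# dropna() with the default how="any" drops every row that has at least one NaN
cleaned = sample.dropna()
print(len(sample), "->", len(cleaned))  # 4 -> 2
```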

## **Data Visualization**

In [None]:
sns.pairplot(df, x_vars=['displacement', 'horsepower', 'weight', 'acceleration', 'mpg'], y_vars=['mpg']);

In [None]:
sns.regplot(x = 'displacement', y = 'mpg', data = df);

## **Define Target Variable y and Features X**

In [None]:
df.columns

In [None]:
y = df['mpg']

In [None]:
y.shape

In [None]:
X = df[['displacement', 'horsepower', 'weight', 'acceleration']]

In [None]:
X.shape

In [None]:
X

## **Scaling Data**

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
ss = StandardScaler()

In [None]:
X = ss.fit_transform(X)

In [None]:
X

In [None]:
pd.DataFrame(X).describe()

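**After standardization, every feature has mean zero and standard deviation one** (`describe()` shows a standard deviation slightly above one because it divides by n - 1, while `StandardScaler` divides by n).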
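The transformation itself is z = (x - mean) / std, applied separately to each column. The sketch below is a minimal numpy version using a few hypothetical rows. It uses the population standard deviation (`ddof=0`), which scikit-learn documents as the estimator behind `StandardScaler`.

```python
import numpy as np

def standardize(X):
    """Return (X - mean) / std per column, plus the statistics used (population std, ddof=0)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # numpy's default ddof=0
    return (X - mean) / std, mean, std

# Hypothetical rows: displacement, horsepower, weight, acceleration
X_demo = np.array([
    [307.0, 130.0, 3504.0, 12.0],
    [350.0, 165.0, 3693.0, 11.5],
    [113.0,  95.0, 2372.0, 15.0],
    [198.0,  95.0, 2833.0, 15.5],
])
Z, mu, sigma = standardize(X_demo)

print(Z.mean(axis=0).round(10))  # ~0 for every column
print(Z.std(axis=0))             # 1 for every column (ddof=0)
print(Z.std(axis=0, ddof=1))     # sqrt(n / (n - 1)), above 1: the sample std that describe() reports
```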

## **Train Test Split Data**

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.7, random_state = 2529)

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape
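In this notebook, the scaler is fitted on the full `X` before the split. As a result, the test rows influence the means and standard deviations that are used to transform the training data. A stricter setup computes these statistics on the training rows only and reuses them for the test rows. In scikit-learn, this means calling `ss.fit_transform(X_train)` followed by `ss.transform(X_test)`, or wrapping both steps in a `Pipeline`. The sketch below is a minimal numpy illustration on hypothetical random data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 rows, 4 features on very different scales
X_raw = rng.normal(loc=[190.0, 100.0, 3000.0, 15.0],
                   scale=[100.0, 40.0, 800.0, 3.0], size=(100, 4))

# Shuffle row indices and keep 70% of the rows for training
idx = rng.permutation(len(X_raw))
n_train = int(0.7 * len(X_raw))
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_tr, X_te = X_raw[train_idx], X_raw[test_idx]

# Fit: compute the scaling statistics from the training rows only
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)

# Transform: apply the training statistics to both splits
X_tr_scaled = (X_tr - mu) / sigma
X_te_scaled = (X_te - mu) / sigma

print(X_tr_scaled.mean(axis=0).round(6))  # ~0 by construction
print(X_te_scaled.mean(axis=0).round(3))  # close to, but generally not exactly, 0
```

The test split is treated exactly like new, unseen cars: it is transformed with statistics it had no part in computing.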

## **Linear Regression Model**

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
lr = LinearRegression()

In [None]:
lr.fit(X_train, y_train)
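`LinearRegression` fits ordinary least squares. It finds the intercept and coefficients that minimize the sum of squared residuals. The sketch below performs the same computation with `numpy.linalg.lstsq`. It uses hypothetical, noise-free data generated from known coefficients, so the recovered values can be checked exactly.

```python
import numpy as np

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 4))              # hypothetical standardized features
true_intercept = 23.0
true_coef = np.array([-1.0, -1.5, -4.0, -0.1])
y_demo = true_intercept + X_demo @ true_coef   # noise-free target for an exact check

# Prepend a column of ones so the first solved parameter plays the role of the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])
params, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

intercept, coef = params[0], params[1:]
print(intercept)  # same role as lr.intercept_
print(coef)       # same role as lr.coef_
```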

In [None]:
lr.intercept_

In [None]:
lr.coef_

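**Mileage = 23.4 - 1.05 × Displacement - 1.68 × Horsepower - 4.10 × Weight - 0.115 × Acceleration + error**, where each feature is measured in standard deviations from its mean because the model was fitted on standardized data.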
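Because the model was fitted on standardized features, each coefficient is the change in mileage per one standard deviation of that feature, not per original unit. Given the scaler's `mean_` and `scale_` attributes, the coefficients can be mapped back to raw units:

$\beta_j^{\text{orig}} = \beta_j / \sigma_j$

$b^{\text{orig}} = b - \sum_j \beta_j \mu_j / \sigma_j$

The sketch below applies this conversion. The means and standard deviations are hypothetical placeholders, not the values fitted in this notebook.

```python
import numpy as np

def to_original_units(intercept_std, coef_std, mean, scale):
    """Convert a model fitted on (x - mean) / scale into raw-feature intercept and coefficients."""
    coef_std = np.asarray(coef_std, dtype=float)
    coef_orig = coef_std / scale
    intercept_orig = intercept_std - np.sum(coef_std * mean / scale)
    return intercept_orig, coef_orig

# Standardized intercept and coefficients as reported in the equation above
b_std = 23.4
beta_std = np.array([-1.05, -1.68, -4.10, -0.115])

# Hypothetical means and standard deviations (in practice use ss.mean_ and ss.scale_)
mean = np.array([190.0, 100.0, 3000.0, 15.5])
scale = np.array([100.0, 40.0, 850.0, 2.8])

b_orig, beta_orig = to_original_units(b_std, beta_std, mean, scale)

# Both parameterizations must give the same prediction for any raw input row
x_raw = np.array([250.0, 120.0, 3200.0, 14.0])
pred_std = b_std + beta_std @ ((x_raw - mean) / scale)
pred_orig = b_orig + beta_orig @ x_raw
print(pred_std, pred_orig)
```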

## **Predict Test Data**

In [None]:
y_pred = lr.predict(X_test)

In [None]:
y_pred

## **Model Accuracy**

In [None]:
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, r2_score

In [None]:
mean_absolute_error(y_test, y_pred)

In [None]:
mean_absolute_percentage_error(y_test, y_pred)

In [None]:
r2_score(y_test, y_pred)
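The three metrics can also be written out directly, which makes their units clear:

- MAE is measured in miles per gallon.
- MAPE is returned by scikit-learn as a fraction, so 0.1 means 10%.
- R² is unitless: 1 is a perfect fit, and 0 matches always predicting the mean of `y_test`.

The sketch below is a minimal numpy version using hypothetical values.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error, in the units of y (mpg here)
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Fraction, not percent; assumes y_true has no zeros (mpg is always positive)
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

def r2(y_true, y_pred):
    # 1 - residual sum of squares / total sum of squares around the mean
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([20.0, 25.0, 30.0, 15.0])  # hypothetical mpg values
y_hat = np.array([22.0, 24.0, 27.0, 16.0])   # hypothetical predictions

print(mae(y_true, y_hat))  # 1.75
print(mape(y_true, y_hat))
print(r2(y_true, y_hat))
```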

## **Polynomial Regression**

In [None]:
from sklearn.preprocessing import PolynomialFeatures

In [None]:
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
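With `degree=2`, `interaction_only=True` and `include_bias=False`, `PolynomialFeatures` keeps the original columns and adds one product for each pair of distinct features. Squared terms such as displacement² are not generated. For the four features here, that gives 4 + 6 = 10 columns. The sketch below reproduces the expansion with numpy. It places the original columns first, followed by the pairwise products; treat the match with scikit-learn's exact column order as an assumption.

```python
import numpy as np
from itertools import combinations

def pairwise_interactions(X):
    """Original columns followed by the product of every pair of distinct columns."""
    X = np.asarray(X, dtype=float)
    pairs = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + pairs)

X_demo = np.array([[1.0, 2.0, 3.0, 4.0],
                   [0.5, -1.0, 2.0, 0.0]])
X_expanded = pairwise_interactions(X_demo)
print(X_expanded.shape)  # (2, 10)
print(X_expanded[0])     # originals 1 2 3 4, then products 2 3 4 6 8 12
```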

In [None]:
X_train2 = poly.fit_transform(X_train)

In [None]:
X_test2 = poly.transform(X_test)  # reuse the transformer fitted on the training data

In [None]:
lr.fit(X_train2, y_train)

In [None]:
lr.intercept_

In [None]:
lr.coef_

In [None]:
y_pred_poly = lr.predict(X_test2)

## **Polynomial Model Accuracy**

In [None]:
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, r2_score

In [None]:
mean_absolute_error(y_test, y_pred_poly)

In [None]:
mean_absolute_percentage_error(y_test, y_pred_poly)

In [None]:
r2_score(y_test, y_pred_poly)

------------