## Problem
The dataset contains prices of second-hand Hyundai Grand i10 cars together
with their year of manufacture. Find the best linear relationship between year
and price, and use it to predict the likely price of a 2022 model second-hand
Grand i10. Also learn about lasso regression, build a lasso model alongside the
linear regression model, and find out which one performs better.

In [1]:
#importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# loading the dataset
data = pd.read_csv('/content/car_age_price.csv')
data.head()

|   | Year | Price  |
|---|------|--------|
| 0 | 2018 | 465000 |
| 1 | 2019 | 755000 |
| 2 | 2019 | 700000 |
| 3 | 2018 | 465000 |
| 4 | 2018 | 465000 |


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112 entries, 0 to 111
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Year    112 non-null    int64
 1   Price   112 non-null    int64
dtypes: int64(2)
memory usage: 1.9 KB


In [4]:
data.describe()

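|       | Year        | Price         |
|-------|-------------|---------------|
| count | 112.0       | 112.0         |
| mean  | 2016.669643 | 483866.044643 |
| std   | 1.629616    | 91217.450533  |
| min   | 2013.0      | 300000.0      |
| 25%   | 2015.0      | 423750.0      |
| 50%   | 2017.0      | 500000.0      |
| 75%   | 2017.0      | 550000.0      |
| max   | 2020.0      | 755000.0      |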


In [6]:
data.columns

Index(['Year', 'Price'], dtype='object')

In [11]:
# Check for missing values
data.isna().sum()

Year     0
Price    0
dtype: int64

Both columns are complete, so no imputation or other preprocessing is needed. `Year` is the only feature and spans a narrow, fixed range (2013 to 2020), so it is left unscaled. `Price` is the target column, so it is not scaled either.

In [17]:
# Splitting the data into features and labels
from sklearn.model_selection import train_test_split
X = data['Year']
y = data['Price']
# No random_state is set, so the split (and every metric below) changes from run to run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
X_train.shape, y_train.shape, X_test.shape, y_test.shape


((84,), (84,), (28,), (28,))

In [18]:
# Add a column axis to each split; estimators need X with shape (n_samples, n_features)
X_train = np.expand_dims(X_train, axis=1)
X_test = np.expand_dims(X_test, axis=1)
y_train = np.expand_dims(y_train, axis=1)
y_test= np.expand_dims(y_test, axis=1)
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((84, 1), (84, 1), (28, 1), (28, 1))
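
As an alternative to expanding the dimensions by hand, the feature can be selected with double brackets so that it stays two-dimensional while the target stays one-dimensional. The sketch below uses a made-up eight-row frame; `demo` and its values are hypothetical, and `random_state=42` is an assumption added only to make the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the car dataset (values are made up).
demo = pd.DataFrame({
    "Year": [2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020],
    "Price": [300000, 340000, 400000, 450000, 500000, 550000, 600000, 650000],
})

X_demo = demo[["Year"]]   # double brackets -> DataFrame, shape (n_samples, 1)
y_demo = demo["Price"]    # single brackets -> Series, shape (n_samples,)

# random_state is an assumption added here so the split is repeatable.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

shapes = (Xd_train.shape, yd_train.shape, Xd_test.shape, yd_test.shape)
print(shapes)
```

With this layout no `np.expand_dims` call is needed, and estimators that expect a one-dimensional target receive one.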

### Linear Regression

In [19]:
from sklearn.linear_model import LinearRegression
#instantiating the model
lr = LinearRegression()
#Fitting the model
lr.fit(X_train, y_train)
#predicting
y_preds = lr.predict(X_test)
y_preds

array([[504225.40169896],
       [457410.38654156],
       [597855.43201378],
       [597855.43201378],
       [504225.40169896],
       [504225.40169896],
       [504225.40169896],
       [504225.40169896],
       [363780.35622676],
       [597855.43201378],
       [504225.40169896],
       [410595.37138416],
       [644670.44717118],
       [504225.40169896],
       [597855.43201378],
       [504225.40169896],
       [504225.40169896],
       [410595.37138416],
       [551040.41685636],
       [644670.44717118],
       [504225.40169896],
       [504225.40169896],
       [597855.43201378],
       [551040.41685636],
       [551040.41685636],
       [504225.40169896],
       [457410.38654156],
       [597855.43201378]])

In [25]:
# Evaluating the model
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_model(model, y_pred, y_true=y_test):
    '''
    Print MAE, MSE and R2 for y_pred against y_true (defaults to y_test).
    `model` is accepted so every call names the estimator, but it is not used.
    '''
    print(f"Mean Absolute Error : {mean_absolute_error(y_true, y_pred)}")
    print(f"Mean Squared Error : {mean_squared_error(y_true, y_pred)}")
    print(f"R2 Score : {r2_score(y_true, y_pred)}")

evaluate_model(lr, y_pred=y_preds)

Mean Absolute Error : 43304.25433068829
Mean Squared Error : 3226744008.3420434
R2 Score : 0.4162746790122237
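
The problem asks for the linear relationship itself, which a fitted `LinearRegression` exposes through `coef_` and `intercept_`. In the predictions above, neighbouring values differ by about 46,815, which is the fitted price increase per model year. The sketch below shows how to read the line from a model; it uses made-up, perfectly linear data, and the names `years`, `prices` and `demo_lr` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, perfectly linear data: price rises by 50,000 per model year.
years = np.arange(2013, 2021).reshape(-1, 1)       # shape (8, 1)
prices = 300000 + 50000 * (years.ravel() - 2013)   # shape (8,)

demo_lr = LinearRegression().fit(years, prices)

slope = demo_lr.coef_[0]          # price change per model year
intercept = demo_lr.intercept_    # price at Year = 0 (not meaningful by itself)

# The fitted relationship is: price = slope * year + intercept
pred_2022 = slope * 2022 + intercept
print(f"price = {slope:.2f} * year + {intercept:.2f}")
print(f"2022 estimate: {pred_2022:.2f}")
```

Applying the same two attributes to `lr` would give the line fitted on the car data.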


### Lasso Regression

In [27]:
from sklearn.linear_model import Lasso
#instantiating the model instance
lasso_model = Lasso(alpha=0.1)
#fitting the model
lasso_model.fit(X_train,y_train)
#Making predictions
y_preds = lasso_model.predict(X_test)
#evaluating the model
evaluate_model(lasso_model, y_pred=y_preds)

Mean Absolute Error : 43304.243137525125
Mean Squared Error : 3226740847.436344
R2 Score : 0.41627525082724504
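
The near-identical metrics are expected with one feature and a small `alpha`. For a single centred feature, the lasso slope is the least-squares slope after soft-thresholding, which shrinks it by `alpha / var(x)`. The sketch below checks this on made-up prices of a similar scale; the data and variable names are hypothetical, and the closed form assumes scikit-learn's objective `(1 / (2n)) * ||y - Xw||^2 + alpha * |w|`.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Made-up data on the same scale as the car prices.
x = np.arange(2013, 2021, dtype=float)
y = np.array([310000, 335000, 410000, 440000,
              505000, 560000, 590000, 655000], dtype=float)
alpha = 0.1

# Closed form for one centred feature (intercept handled by centring).
xc, yc = x - x.mean(), y - y.mean()
rho = xc @ yc / len(x)    # covariance between year and price
z = xc @ xc / len(x)      # variance of year
w_ols = rho / z
w_lasso = np.sign(rho) * max(abs(rho) - alpha, 0.0) / z   # soft-thresholding

# Same quantities from scikit-learn.
X = x.reshape(-1, 1)
sk_ols = LinearRegression().fit(X, y).coef_[0]
sk_lasso = Lasso(alpha=alpha).fit(X, y).coef_[0]

print(f"OLS slope   : {sk_ols:.4f}")
print(f"Lasso slope : {sk_lasso:.4f}")
print(f"Shrinkage   : {sk_ols - sk_lasso:.4f} (alpha / var(x) = {alpha / z:.4f})")
```

Against a slope in the tens of thousands, a shrinkage of this size is negligible, so a much larger `alpha` would be needed before lasso and plain linear regression diverge visibly.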


### Ridge Regression


In [28]:
from sklearn.linear_model import Ridge
#instantiating the model
ridge_reg = Ridge(alpha=0.1)
#fitting the model
ridge_reg.fit(X_train, y_train)
#making predictions
y_preds = ridge_reg.predict(X_test)
#evaluating the model
evaluate_model(ridge_reg, y_pred=y_preds)

Mean Absolute Error : 43298.019048915376
Mean Squared Error : 3224984674.8086476
R2 Score : 0.41659294644492084


### SVM Regression

In [31]:
from sklearn.svm import SVR
#instantiating the model
svr = SVR(kernel='rbf')
#fitting the model
svr.fit(X_train, y_train.ravel())  # SVR expects a 1-D target of shape (n_samples,)
#predicting
y_preds = svr.predict(X_test)
#evaluating the model
evaluate_model(svr,y_pred=y_preds)

Mean Absolute Error : 61345.97125113085
Mean Squared Error : 5742902331.884842
R2 Score : -0.038904077427383665
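
A likely reason for the negative R2 is scale: with the default `C=1.0` and `epsilon=0.1`, an RBF `SVR` can move its predictions only slightly away from a constant when the targets are in the hundreds of thousands. The sketch below compares a raw `SVR` with one that standardises both the feature and the target. It is an illustration under assumptions, not a result from the car data: the data is made up, and both models are scored on the data they were fitted on.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up data: a linear price trend plus noise, on the scale of the car prices.
rng = np.random.default_rng(0)
years = rng.integers(2013, 2021, size=60)
X = years.reshape(-1, 1).astype(float)
y = 300000 + 50000 * (years - 2013) + rng.normal(0, 30000, size=60)

# SVR with default C and epsilon on raw prices.
raw_svr = SVR(kernel="rbf").fit(X, y)

# Same SVR, but Year and Price are both standardised before fitting;
# predictions are mapped back to the original price scale automatically.
scaled_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    transformer=StandardScaler(),
).fit(X, y)

# In-sample scores, used here only to contrast the two set-ups.
r2_raw = r2_score(y, raw_svr.predict(X))
r2_scaled = r2_score(y, scaled_svr.predict(X))
print(f"R2 raw    : {r2_raw:.3f}")
print(f"R2 scaled : {r2_scaled:.3f}")
```

Raising `C` substantially is another option, but scaling the target keeps the default hyperparameters meaningful.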



### Random Forest Regressor

In [32]:
from sklearn.ensemble import RandomForestRegressor
#instantiating the model
rf_reg = RandomForestRegressor(n_estimators = 100)
#fitting the model
rf_reg.fit(X_train, y_train.ravel())  # the forest expects a 1-D target of shape (n_samples,)
#predicting
y_preds = rf_reg.predict(X_test)
#evaluating the model
evaluate_model(rf_reg, y_pred=y_preds)


Mean Absolute Error : 44417.997402166584
Mean Squared Error : 3419140847.956707
R2 Score : 0.38146965367685215


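Lasso regression (alpha=0.1) performs essentially the same as linear regression on this test split: R2 is 0.41628 versus 0.41627, and MAE is about 43,304 for both. Ridge is marginally ahead (R2 0.41659), while the random forest (R2 0.381) and SVR (R2 -0.039) do worse.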
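
Because the comparison rests on a single random 28-row test split, differences in the fourth decimal place say little. One way to get a steadier comparison is k-fold cross-validation, sketched below with `cross_val_score`. The data is made up (`years`, `y` and the noise level are assumptions), so the printed scores illustrate the procedure rather than the car dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

# Made-up data: a linear price trend plus noise.
rng = np.random.default_rng(0)
years = rng.integers(2013, 2021, size=100)
X = years.reshape(-1, 1)
y = 300000 + 50000 * (years - 2013) + rng.normal(0, 40000, size=100)

models = {
    "LinearRegression": LinearRegression(),
    "Lasso(alpha=0.1)": Lasso(alpha=0.1),
}

cv_means = {}
for name, model in models.items():
    # Five folds; each fold is held out once and scored with R2.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    cv_means[name] = scores.mean()
    print(f"{name:18s} mean R2 = {scores.mean():.4f} (+/- {scores.std():.4f})")
```

Replacing `X` and `y` with the `Year` and `Price` columns would give the cross-validated comparison for the actual data.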

### Calculating Price of 2022 model car

In [35]:
# 2022 lies outside the 2013-2020 range seen in the data, so this is an extrapolation
lasso_model.predict(np.array([[2022]]))

array([738300.25875299])