# Multiple Linear Regression for Car Price Forecasting

This notebook predicts car prices using Multiple Linear Regression (MLR), a standard machine learning technique for modeling an outcome that depends on several factors. My objective is to model the relationship between a car's price and features such as manufacturer, `fueltype`, and `enginesize`. I demonstrate MLR in two ways so that both the theory and the practical implementation of the technique are covered.

##### Section 1: Predicting Car Prices with Scikit-learn

In the first section, I use Scikit-learn, a widely used Python library for machine learning, to build and evaluate my Multiple Linear Regression model. Scikit-learn handles data preprocessing, model training, and evaluation through a consistent API, letting me focus on model performance rather than boilerplate code. This section demonstrates how a predictive model can be developed efficiently with the library's built-in tools.

##### Section 2: Building a Multiple Linear Regression Model from Scratch

The second section takes a more fundamental approach: it develops the Multiple Linear Regression model without Scikit-learn or similar libraries. This involves coding the algorithm's essential components, such as computing the regression coefficients and predicting outcomes from the input variables. The aim is to deepen my understanding of the mechanics of MLR and of the mathematical and computational details that a library normally hides.

Across both sections, I explore model development at two levels, from high-level library use to the algorithm's building blocks, all within the context of car price prediction.
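
As a preview of what "computing the regression coefficients" means, the sketch below fits a linear model by least squares on synthetic data. It is a minimal illustration under stated assumptions, not necessarily the implementation used in Section 2: the helper names `fit_mlr` and `predict_mlr` and the toy data are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Return [intercept, coef_1, ..., coef_p] minimising the squared error."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq solves the least-squares problem without explicitly inverting X^T X.
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

def predict_mlr(X, beta):
    """Predict targets from features X and fitted coefficients beta."""
    return beta[0] + X @ beta[1:]

# Toy data (hypothetical): y = 2 + 3*x1 - 1*x2 with no noise.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))
y_toy = 2 + 3 * X_toy[:, 0] - 1 * X_toy[:, 1]

beta = fit_mlr(X_toy, y_toy)
print(np.round(beta, 3))
```

Because the toy target is an exact linear function of the features, the recovered coefficients should match the generating values up to floating-point error.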

## Importing the Libraries

In [87]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 1000)

## Importing the Data

In [88]:
car_price = pd.read_csv("CarPrice.csv")
car_price.head()

,car_ID,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,carheight,curbweight,enginetype,cylindernumber,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,1,3,alfa-romero giulia,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,2,3,alfa-romero stelvio,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,3,1,alfa-romero Quadrifoglio,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,4,2,audi 100 ls,gas,std,four,sedan,fwd,front,99.8,176.6,66.2,54.3,2337,ohc,four,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950.0
4,5,2,audi 100ls,gas,std,four,sedan,4wd,front,99.4,176.6,66.4,54.3,2824,ohc,five,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450.0


In [89]:
car_price.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   car_ID            205 non-null    int64  
 1   symboling         205 non-null    int64  
 2   CarName           205 non-null    object 
 3   fueltype          205 non-null    object 
 4   aspiration        205 non-null    object 
 5   doornumber        205 non-null    object 
 6   carbody           205 non-null    object 
 7   drivewheel        205 non-null    object 
 8   enginelocation    205 non-null    object 
 9   wheelbase         205 non-null    float64
 10  carlength         205 non-null    float64
 11  carwidth          205 non-null    float64
 12  carheight         205 non-null    float64
 13  curbweight        205 non-null    int64  
 14  enginetype        205 non-null    object 
 15  cylindernumber    205 non-null    object 
 16  enginesize        205 non-null    int64  
 1

## Preprocessing

### Removing/Adding extra columns

In [90]:
Company = car_price["CarName"].apply(lambda x:x.split()[0].lower()).to_list()
Company

['alfa-romero',
 'alfa-romero',
 'alfa-romero',
 'audi',
 'audi',
 'audi',
 'audi',
 'audi',
 'audi',
 'audi',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'chevrolet',
 'chevrolet',
 'chevrolet',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'isuzu',
 'isuzu',
 'isuzu',
 'isuzu',
 'jaguar',
 'jaguar',
 'jaguar',
 'maxda',
 'maxda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'buick',
 'buick',
 'buick',
 'buick',
 'buick',
 'buick',
 'buick',
 'buick',
 'mercury',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'nissan',
 'nissan',
 'nissan',
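
The company name is simply the first whitespace-separated token of `CarName`, lower-cased. The sketch below shows the same extraction with pandas string methods on a hypothetical three-row sample taken from the rows displayed earlier; it is an alternative spelling of the `apply` call, not a change in behavior.

```python
import pandas as pd

# Hypothetical sample of CarName values, copied from the head() output.
names = pd.Series(["alfa-romero giulia", "audi 100 ls", "alfa-romero Quadrifoglio"])

# Vectorised equivalent of apply(lambda x: x.split()[0].lower()).
company = names.str.split().str[0].str.lower()
print(company.tolist())
```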


In [91]:
car_price.drop(columns=["CarName","car_ID"], inplace=True)

In [92]:
car_price.insert(loc=0,column="Company",value=Company)

In [93]:
car_price.columns

Index(['Company', 'symboling', 'fueltype', 'aspiration', 'doornumber',
       'carbody', 'drivewheel', 'enginelocation', 'wheelbase', 'carlength',
       'carwidth', 'carheight', 'curbweight', 'enginetype', 'cylindernumber',
       'enginesize', 'fuelsystem', 'boreratio', 'stroke', 'compressionratio',
       'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price'],
      dtype='object')

### Correcting Spellings

In [94]:
car_price["Company"].nunique()

27

In [95]:
print(pd.DataFrame(car_price["Company"].unique(), columns=["Company"]))

        Company
0   alfa-romero
1          audi
2           bmw
3     chevrolet
4         dodge
5         honda
6         isuzu
7        jaguar
8         maxda
9         mazda
10        buick
11      mercury
12   mitsubishi
13       nissan
14      peugeot
15     plymouth
16      porsche
17     porcshce
18      renault
19         saab
20       subaru
21       toyota
22      toyouta
23    vokswagen
24   volkswagen
25           vw
26        volvo


In [96]:
# Correct misspelled company names
company_fixes = {
    "alfa-romero": "alfa-romeo",
    "maxda": "mazda",
    "porcshce": "porsche",
    "toyouta": "toyota",
    "vokswagen": "volkswagen",
    "vw": "volkswagen",
}
# Assign the result back instead of using inplace=True on a column selection,
# which may not update car_price in recent pandas versions.
car_price["Company"] = car_price["Company"].replace(company_fixes)
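
The misspellings above were spotted by reading the list of unique values. As a complementary check, near-duplicate spellings can be surfaced automatically. The sketch below uses the standard-library `difflib.get_close_matches` on a hypothetical subset of the brand names; the 0.75 similarity cutoff is an assumption chosen for this small list, not a tuned value.

```python
import difflib

# Hypothetical subset of the unique Company values listed above.
brands = ["mazda", "maxda", "porsche", "porcshce", "toyota", "toyouta",
          "volkswagen", "vokswagen", "vw", "volvo"]

# For each brand, collect other brands whose spelling is at least 75% similar.
suspects = {}
for name in brands:
    others = [b for b in brands if b != name]
    close = difflib.get_close_matches(name, others, n=3, cutoff=0.75)
    if close:
        suspects[name] = close

for name, close in suspects.items():
    print(f"{name!r} resembles {close}")
```

Abbreviations such as `vw` share too few characters with `volkswagen` to be flagged this way, so a manual review of the unique values is still needed.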

In [97]:
car_price["drivewheel"].unique()

array(['rwd', 'fwd', '4wd'], dtype=object)

In [98]:
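# Merge the "4wd" level into "fwd".
# Note: "4wd" denotes four-wheel drive rather than a misspelling of "fwd"
# (front-wheel drive), so this step collapses two distinct categories.
car_price["drivewheel"] = car_price["drivewheel"].replace("4wd", "fwd")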

### Data Exploration

In [99]:
car_price.head()

,Company,symboling,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,carwidth,carheight,curbweight,enginetype,cylindernumber,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,alfa-romeo,3,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,alfa-romeo,3,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,alfa-romeo,1,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,audi,2,gas,std,four,sedan,fwd,front,99.8,176.6,66.2,54.3,2337,ohc,four,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950.0
4,audi,2,gas,std,four,sedan,fwd,front,99.4,176.6,66.4,54.3,2824,ohc,five,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450.0


#### Categorical Data

In [100]:
categorical = car_price.select_dtypes(include='object').columns.tolist()
categorical

['Company',
 'fueltype',
 'aspiration',
 'doornumber',
 'carbody',
 'drivewheel',
 'enginelocation',
 'enginetype',
 'cylindernumber',
 'fuelsystem']

In [101]:
for col in categorical:
    print(col.upper()+":")
    print(car_price[col].value_counts())
    print("\n")

COMPANY:
toyota        32
nissan        18
mazda         17
mitsubishi    13
honda         13
volkswagen    12
subaru        12
peugeot       11
volvo         11
dodge          9
buick          8
bmw            8
audi           7
plymouth       7
saab           6
porsche        5
isuzu          4
jaguar         3
chevrolet      3
alfa-romeo     3
renault        2
mercury        1
Name: Company, dtype: int64


FUELTYPE:
gas       185
diesel     20
Name: fueltype, dtype: int64


ASPIRATION:
std      168
turbo     37
Name: aspiration, dtype: int64


DOORNUMBER:
four    115
two      90
Name: doornumber, dtype: int64


CARBODY:
sedan          96
hatchback      70
wagon          25
hardtop         8
convertible     6
Name: carbody, dtype: int64


DRIVEWHEEL:
fwd    129
rwd     76
Name: drivewheel, dtype: int64


ENGINELOCATION:
front    202
rear       3
Name: enginelocation, dtype: int64


ENGINETYPE:
ohc      148
ohcf      15
ohcv      13
dohc      12
l         12
rotor      4
dohcv      1


In [102]:
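# "Company" is the id column, so exclude it from the melted value columns.
value_cols = [col for col in categorical if col != "Company"]
car_melted = car_price.melt(id_vars=["Company"], value_vars=value_cols, var_name="category", value_name="value")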
car_melted.head()

,Company,category,value
0,alfa-romeo,fueltype,gas
1,alfa-romeo,fueltype,gas
2,alfa-romeo,fueltype,gas
3,audi,fueltype,gas
4,audi,fueltype,gas


In [103]:
car_pivot = car_melted.pivot_table(index="Company",columns=["category","value"],aggfunc="size",fill_value=0)
car_pivot.head()

category,aspiration,aspiration,carbody,carbody,carbody,carbody,carbody,cylindernumber,cylindernumber,cylindernumber,cylindernumber,cylindernumber,cylindernumber,cylindernumber,doornumber,doornumber,drivewheel,drivewheel,enginelocation,enginelocation,enginetype,enginetype,enginetype,enginetype,enginetype,enginetype,enginetype,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fueltype,fueltype
value,std,turbo,convertible,hardtop,hatchback,sedan,wagon,eight,five,four,six,three,twelve,two,four,two,fwd,rwd,front,rear,dohc,dohcv,l,ohc,ohcf,ohcv,rotor,1bbl,2bbl,4bbl,idi,mfi,mpfi,spdi,spfi,diesel,gas
Company,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
alfa-romeo,3,0,2,0,1,0,0,0,0,2,1,0,0,0,0,3,0,3,3,0,2,0,0,0,0,1,0,0,0,0,0,0,3,0,0,0,3
audi,5,2,0,0,1,5,1,0,6,1,0,0,0,0,5,2,7,0,7,0,0,0,0,7,0,0,0,0,0,0,0,0,7,0,0,0,7
bmw,8,0,0,0,0,8,0,0,0,2,6,0,0,0,5,3,0,8,8,0,0,0,0,8,0,0,0,0,0,0,0,0,8,0,0,0,8
buick,4,4,1,2,0,4,1,4,4,0,0,0,0,0,5,3,0,8,8,0,0,0,0,4,0,4,0,0,0,0,4,0,4,0,0,4,4
chevrolet,3,0,0,0,2,1,0,0,0,2,0,1,0,0,1,2,3,0,3,0,0,0,1,2,0,0,0,0,3,0,0,0,0,0,0,0,3


In [104]:
car_pivot

category,aspiration,aspiration,carbody,carbody,carbody,carbody,carbody,cylindernumber,cylindernumber,cylindernumber,cylindernumber,cylindernumber,cylindernumber,cylindernumber,doornumber,doornumber,drivewheel,drivewheel,enginelocation,enginelocation,enginetype,enginetype,enginetype,enginetype,enginetype,enginetype,enginetype,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fuelsystem,fueltype,fueltype
value,std,turbo,convertible,hardtop,hatchback,sedan,wagon,eight,five,four,six,three,twelve,two,four,two,fwd,rwd,front,rear,dohc,dohcv,l,ohc,ohcf,ohcv,rotor,1bbl,2bbl,4bbl,idi,mfi,mpfi,spdi,spfi,diesel,gas
Company,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
alfa-romeo,3,0,2,0,1,0,0,0,0,2,1,0,0,0,0,3,0,3,3,0,2,0,0,0,0,1,0,0,0,0,0,0,3,0,0,0,3
audi,5,2,0,0,1,5,1,0,6,1,0,0,0,0,5,2,7,0,7,0,0,0,0,7,0,0,0,0,0,0,0,0,7,0,0,0,7
bmw,8,0,0,0,0,8,0,0,0,2,6,0,0,0,5,3,0,8,8,0,0,0,0,8,0,0,0,0,0,0,0,0,8,0,0,0,8
buick,4,4,1,2,0,4,1,4,4,0,0,0,0,0,5,3,0,8,8,0,0,0,0,4,0,4,0,0,0,0,4,0,4,0,0,4,4
chevrolet,3,0,0,0,2,1,0,0,0,2,0,1,0,0,1,2,3,0,3,0,0,0,1,2,0,0,0,0,3,0,0,0,0,0,0,0,3
dodge,6,3,0,0,5,3,1,0,0,9,0,0,0,0,4,5,9,0,9,0,0,0,0,9,0,0,0,0,6,0,0,1,2,0,0,0,9
honda,13,0,0,0,7,5,1,0,0,13,0,0,0,0,5,8,13,0,13,0,0,0,0,13,0,0,0,11,1,0,0,0,1,0,0,0,13
isuzu,4,0,0,0,1,3,0,0,0,4,0,0,0,0,2,2,2,2,4,0,0,0,0,4,0,0,0,0,3,0,0,0,0,0,1,0,4
jaguar,3,0,0,0,0,3,0,0,0,0,2,0,1,0,2,1,0,3,3,0,2,0,0,0,0,1,0,0,0,0,0,0,3,0,0,0,3
mazda,17,0,0,0,10,7,0,0,0,13,0,0,0,4,8,9,11,6,17,0,0,0,0,13,0,0,4,0,10,3,2,0,2,0,0,2,15
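
Each cell in the table above counts how many cars of a company fall into a given (category, value) pair. The sketch below reproduces the melt-then-pivot steps on a hypothetical three-row frame so the reshaping is easier to follow; the toy values are invented for illustration.

```python
import pandas as pd

# Hypothetical three-row frame mirroring the notebook's column names.
toy = pd.DataFrame({
    "Company": ["audi", "audi", "bmw"],
    "fueltype": ["gas", "diesel", "gas"],
    "aspiration": ["std", "turbo", "std"],
})

# Long format: one row per (Company, category, value) observation.
long_form = toy.melt(id_vars=["Company"], value_vars=["fueltype", "aspiration"],
                     var_name="category", value_name="value")

# Wide format: count how often each (category, value) pair occurs per company.
counts = long_form.pivot_table(index="Company", columns=["category", "value"],
                               aggfunc="size", fill_value=0)
print(counts)
```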


#### Numeric Data

In [131]:
numerical = car_price.select_dtypes(exclude='object').columns.tolist()
numerical.remove("price")  # drop the target so only feature columns remain
numerical

['symboling',
 'wheelbase',
 'carlength',
 'carwidth',
 'carheight',
 'curbweight',
 'enginesize',
 'boreratio',
 'stroke',
 'compressionratio',
 'horsepower',
 'peakrpm',
 'citympg',
 'highwaympg']

In [132]:
for col in numerical:
    print(col.upper()+":")
    print("Max:",car_price[col].max())
    print("Min:",car_price[col].min())
    print("\n")

SYMBOLING:
Max: 3
Min: -2


WHEELBASE:
Max: 120.9
Min: 86.6


CARLENGTH:
Max: 208.1
Min: 141.1


CARWIDTH:
Max: 72.3
Min: 60.3


CARHEIGHT:
Max: 59.8
Min: 47.8


CURBWEIGHT:
Max: 4066
Min: 1488


ENGINESIZE:
Max: 326
Min: 61


BORERATIO:
Max: 3.94
Min: 2.54


STROKE:
Max: 4.17
Min: 2.07


COMPRESSIONRATIO:
Max: 23.0
Min: 7.0


HORSEPOWER:
Max: 288
Min: 48


PEAKRPM:
Max: 6600
Min: 4150


CITYMPG:
Max: 49
Min: 13


HIGHWAYMPG:
Max: 54
Min: 16
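
The same minimum and maximum values can be collected into one compact table with `DataFrame.agg`. The sketch below uses a hypothetical mini-frame built from values in the first rows shown earlier; applying the same call to `car_price[numerical]` is an assumption about how it would be used here.

```python
import pandas as pd

# Hypothetical mini-frame; values are taken from the head() output.
toy = pd.DataFrame({
    "wheelbase": [88.6, 94.5, 99.8],
    "horsepower": [111, 154, 102],
})

# One row per statistic, one column per feature.
summary = toy.agg(["min", "max"])
print(summary)
```

The wide ranges (for example `curbweight` in the thousands versus `boreratio` below 4) are one reason the numeric features are standardized before fitting.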




### Train Test Splitting

In [107]:
X = car_price.drop("price", axis=1)
Y = car_price["price"]

In [108]:
len(X),len(Y)

(205, 205)

In [138]:
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.2,random_state=7)

In [110]:
len(x_train),len(y_train)

(164, 164)
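
With `test_size=0.2`, 41 of the 205 rows are held out and 164 remain for training. The sketch below checks this on a synthetic stand-in with the same number of rows; `X_demo` and `y_demo` are hypothetical arrays, not the car data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the car data (205).
X_demo = np.arange(205 * 2).reshape(205, 2)
y_demo = np.arange(205)

# A fixed random_state makes the shuffle, and therefore the split, reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7
)
print(len(X_tr), len(X_te))
```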

### Data Encoding

In [114]:
numeric_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())
])
categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numerical),
        ('cat', categorical_transformer, categorical)
])

In [139]:
# Apply transformations
X_train = preprocessor.fit_transform(x_train)
X_test = preprocessor.transform(x_test)
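
The preprocessor is fitted on the training rows only, so the scaling statistics and the one-hot categories never see the test set. The sketch below illustrates the effect of `handle_unknown='ignore'` on hypothetical toy frames: a category that was absent during fitting is encoded as all zeros instead of raising an error. The `sparse_threshold=0` argument is added here only to force a dense array for printing.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frames; "hardtop" never appears in the training rows.
train = pd.DataFrame({"enginesize": [100.0, 150.0, 200.0],
                      "carbody": ["sedan", "wagon", "sedan"]})
test = pd.DataFrame({"enginesize": [150.0], "carbody": ["hardtop"]})

demo = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["enginesize"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["carbody"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

demo.fit(train)                 # mean, variance, and categories come from train only
encoded = demo.transform(test)  # one scaled column plus two one-hot columns
print(encoded)
```

The test row's `enginesize` equals the training mean, so it scales to zero, and the unseen `carbody` contributes no active one-hot column.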

In [140]:
Y_train = y_train.values
Y_test = y_test.values

In [145]:
X_train.dtype,y_train.dtype

(dtype('float64'), dtype('float64'))