# Multiple Linear Regression for Car Price Forecasting

This notebook predicts car prices with Multiple Linear Regression (MLR), a standard machine learning technique for modeling an outcome that depends on several factors. My objective is to model the relationship between a car's price and its features, such as make, model, `fueltype`, and `enginesize`. I apply MLR in two ways, once with a library and once from scratch, to cover both the theory and the practical implementation.

##### Section 1: Predicting Car Prices with Scikit-learn

In the first section, I use Scikit-learn, a widely used Python library for machine learning, to build and evaluate my Multiple Linear Regression model. Scikit-learn handles data preprocessing, model training, and evaluation with little code, so I can focus on model performance rather than implementation details.
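As a rough illustration of that workflow, the sketch below fits `LinearRegression` on synthetic data rather than the car dataset. The feature matrix, the coefficients (3.0, 2.0, -1.5), and the 80/20 split are assumptions chosen only so that the result is easy to check; the model built later in this notebook may be set up differently.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the car dataset): two features with known coefficients.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Hold out 20% of the rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit ordinary least squares and predict on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)

print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
print("R^2 on held-out data:", r2)
```

Because the data are generated from a known linear rule with little noise, the fitted intercept and coefficients should land close to the values used to generate them.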

##### Section 2: Building a Multiple Linear Regression Model from Scratch

The second section takes a more fundamental approach and builds the Multiple Linear Regression model without Scikit-learn or similar libraries. I code the algorithm's essential components by hand: computing the regression coefficients and predicting outcomes from the input variables. The aim is to deepen my understanding of how MLR works and to expose the mathematical and computational details that a library normally hides.
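To make "computing the regression coefficients" concrete, here is a minimal NumPy sketch of a least-squares fit. The function names `fit_linear_regression` and `predict` and the toy data are hypothetical, and the from-scratch implementation in Section 2 may take a different route. The sketch uses `np.linalg.lstsq`, which solves the same least-squares problem as the normal equations $(X^\top X)\beta = X^\top y$ without forming an explicit matrix inverse.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return [intercept, coef_1, ..., coef_p] that minimize the squared error."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

def predict(X, beta):
    """Apply the fitted coefficients to new feature rows."""
    return beta[0] + X @ beta[1:]

# Toy data with an exact linear relationship (no noise), chosen for illustration.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

beta = fit_linear_regression(X, y)
print("coefficients:", beta)
print("predictions:", predict(X, beta))
```

On noise-free data like this, the recovered coefficients should match the generating values (1, 2, 3) up to floating-point error.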

Together, the two sections cover model development at both levels, from high-level library use to the algorithm's building blocks, all in the context of car price prediction.

## Importing the Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.max_rows', 1000)

## Importing the Data

In [33]:
car_price = pd.read_csv("CarPrice.csv")
car_price.head()

|   | car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | ... | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.0 |
| 1 | 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.0 |
| 2 | 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.0 |
| 3 | 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950.0 |
| 4 | 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450.0 |


In [34]:
car_price.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   car_ID            205 non-null    int64  
 1   symboling         205 non-null    int64  
 2   CarName           205 non-null    object 
 3   fueltype          205 non-null    object 
 4   aspiration        205 non-null    object 
 5   doornumber        205 non-null    object 
 6   carbody           205 non-null    object 
 7   drivewheel        205 non-null    object 
 8   enginelocation    205 non-null    object 
 9   wheelbase         205 non-null    float64
 10  carlength         205 non-null    float64
 11  carwidth          205 non-null    float64
 12  carheight         205 non-null    float64
 13  curbweight        205 non-null    int64  
 14  enginetype        205 non-null    object 
 15  cylindernumber    205 non-null    object 
 16  enginesize        205 non-null    int64  
 1

## Preprocessing

#### Removing/Adding extra columns

In [35]:
# Extract the manufacturer: the first word of CarName, lowercased
Company = car_price["CarName"].apply(lambda x: x.split()[0].lower()).to_list()
Company

['alfa-romero',
 'alfa-romero',
 'alfa-romero',
 'audi',
 'audi',
 'audi',
 'audi',
 'audi',
 'audi',
 'audi',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'bmw',
 'chevrolet',
 'chevrolet',
 'chevrolet',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'dodge',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'honda',
 'isuzu',
 'isuzu',
 'isuzu',
 'isuzu',
 'jaguar',
 'jaguar',
 'jaguar',
 'maxda',
 'maxda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'mazda',
 'buick',
 'buick',
 'buick',
 'buick',
 'buick',
 'buick',
 'buick',
 'buick',
 'mercury',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'mitsubishi',
 'nissan',
 'nissan',
 'nissan',


In [36]:
car_price.drop(columns=["CarName","car_ID"], inplace=True)

In [37]:
car_price.insert(loc=0,column="Company",value=Company)

In [38]:
car_price.columns

Index(['Company', 'symboling', 'fueltype', 'aspiration', 'doornumber',
       'carbody', 'drivewheel', 'enginelocation', 'wheelbase', 'carlength',
       'carwidth', 'carheight', 'curbweight', 'enginetype', 'cylindernumber',
       'enginesize', 'fuelsystem', 'boreratio', 'stroke', 'compressionratio',
       'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price'],
      dtype='object')

#### Correcting Spellings

In [43]:
car_price["Company"].nunique()

22

In [44]:
print(pd.DataFrame(car_price["Company"].unique(), columns=["Company"]))

       Company
0   alfa-romeo
1         audi
2          bmw
3    chevrolet
4        dodge
5        honda
6        isuzu
7       jaguar
8        mazda
9        buick
10     mercury
11  mitsubishi
12      nissan
13     peugeot
14    plymouth
15     porsche
16     renault
17        saab
18      subaru
19      toyota
20  volkswagen
21       volvo


In [42]:
# Correct misspelled company names (assign back instead of a chained inplace replace)
car_price["Company"] = car_price["Company"].replace({
    "alfa-romero": "alfa-romeo",
    "maxda": "mazda",
    "porcshce": "porsche",
    "toyouta": "toyota",
    "vokswagen": "volkswagen",
    "vw": "volkswagen",
})
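The misspellings above were found by reading through the list of unique names. As a possible aid, the sketch below flags pairs of names that look similar, using `difflib.get_close_matches` from the standard library. The helper `suspicious_names`, the cutoff of 0.75, and the sample list are assumptions for illustration. Abbreviations such as "vw" share too few characters with the full name to be caught this way and still need a manual mapping.

```python
from difflib import get_close_matches

def suspicious_names(names, cutoff=0.75):
    """Map each name to other names similar enough to be possible typos."""
    unique = sorted(set(names))
    flagged = {}
    for name in unique:
        others = [n for n in unique if n != name]
        matches = get_close_matches(name, others, n=3, cutoff=cutoff)
        if matches:
            flagged[name] = matches
    return flagged

# Small hand-made sample, not the full Company column.
sample = ["mazda", "maxda", "toyota", "toyouta", "audi", "bmw"]
result = suspicious_names(sample)
print(result)
```

Each flagged pair is only a candidate: the similarity score cannot tell which spelling is the correct one, so the final mapping still has to be written by hand.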

In [47]:
car_price["drivewheel"].unique()

array(['rwd', 'fwd'], dtype=object)

In [46]:
# Replace "4wd" with "fwd" in drivewheel (assign back instead of a chained inplace replace)
car_price["drivewheel"] = car_price["drivewheel"].replace("4wd", "fwd")

#### Data Exploration

In [48]:
car_price.head()

|   | Company | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | ... | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alfa-romeo | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.0 |
| 1 | alfa-romeo | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.0 |
| 2 | alfa-romeo | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.0 |
| 3 | audi | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950.0 |
| 4 | audi | 2 | gas | std | four | sedan | fwd | front | 99.4 | 176.6 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450.0 |


In [52]:
print("Company Distribution:")
print(car_price["Company"].value_counts())
print("\nFuel Types:")
print(car_price["fueltype"].value_counts())
print("\nAspiration Types:")
print(car_price["aspiration"].value_counts())
print("\nNo. of doors:")
print(car_price["doornumber"].value_counts())
print("\nCarbody Types:")
print(car_price["carbody"].value_counts())
print("\nDrivewheel Types:")
print(car_price["drivewheel"].value_counts())
print("\nEngine Locations:")
print(car_price["enginelocation"].value_counts())
print("\nFuel System Types:")
print(car_price["fuelsystem"].value_counts())

Company Distribution:
toyota        32
nissan        18
mazda         17
mitsubishi    13
honda         13
volkswagen    12
subaru        12
peugeot       11
volvo         11
dodge          9
buick          8
bmw            8
audi           7
plymouth       7
saab           6
porsche        5
isuzu          4
jaguar         3
chevrolet      3
alfa-romeo     3
renault        2
mercury        1
Name: Company, dtype: int64

Fuel Types:
gas       185
diesel     20
Name: fueltype, dtype: int64

Aspiration Types:
std      168
turbo     37
Name: aspiration, dtype: int64

No. of doors:
four    115
two      90
Name: doornumber, dtype: int64

Carbody Types:
sedan          96
hatchback      70
wagon          25
hardtop         8
convertible     6
Name: carbody, dtype: int64

Drivewheel Types:
fwd    129
rwd     76
Name: drivewheel, dtype: int64

Engine Locations:
front    202
rear       3
Name: enginelocation, dtype: int64

Fuel System Types:
mpfi    94
2bbl    66
idi     20
1bbl    11
spdi     9