# Multiple Linear Regression for Car Price Forecasting

This notebook predicts car prices with Multiple Linear Regression (MLR), a standard machine learning technique for modeling an outcome that depends on several factors. My objective is to model the relationship between a car's price and its features, such as make, model, `fueltype`, and `enginesize`. I apply MLR in two different ways so that the notebook covers both the theory and the practical implementation of the technique.
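For reference, the model fitted in both sections can be written in the standard form below. The symbols are notation introduced here for illustration and are not column names from the dataset: $y$ is the price, $x_1, \dots, x_p$ are the (numerically encoded) features, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the regression coefficients, and $\varepsilon$ is the error term.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
$$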

##### Section 1: Predicting Car Prices with Scikit-learn

In the first section, I use Scikit-learn, a widely used Python library for machine learning, to build and evaluate my Multiple Linear Regression model. Scikit-learn provides ready-made tools for data preprocessing, model training, and evaluation, which lets me focus on model performance rather than on implementation details. This section demonstrates how a predictive model can be built efficiently with a high-level library.
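As a rough preview of that workflow, here is a minimal sketch using Scikit-learn's `LinearRegression`, `train_test_split`, and `r2_score`. It runs on synthetic data generated inside the block, not on the car dataset, and the split ratio and random seeds are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the car dataset): two numeric features and a
# target built from known coefficients plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit ordinary least squares and score it on the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
print("R^2 on test set:", r2)
```

Because the synthetic target is almost exactly linear in the features, the fitted intercept and coefficients should land close to the values used to generate it. Real car data will be much noisier.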

##### Section 2: Building a Multiple Linear Regression Model from Scratch

The second section takes a more fundamental approach and builds the Multiple Linear Regression model without Scikit-learn or similar libraries. I code the algorithm's essential components myself: computing the regression coefficients and predicting outcomes from the input variables. The aim is to deepen my understanding of the mechanics of MLR and of the mathematical and computational work that a library normally hides.
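To make "computing the regression coefficients and predicting outcomes" concrete, here is a minimal NumPy sketch of ordinary least squares. The function names `fit_linear_regression` and `predict` are hypothetical and the toy data is made up, so this is one possible way to do it, not necessarily the implementation used in Section 2. It solves the least-squares problem with `np.linalg.lstsq` rather than forming $(X^\top X)^{-1} X^\top y$ directly.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return least-squares coefficients, intercept first (hypothetical helper)."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes ||X_design @ beta - y||^2 and is more numerically
    # stable than explicitly inverting X_design.T @ X_design.
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

def predict(X, beta):
    """Apply fitted coefficients (intercept first) to new rows."""
    return beta[0] + X @ beta[1:]

# Toy data following y = 1 + 2*x1 + 3*x2 exactly (NOT the car dataset).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

beta = fit_linear_regression(X, y)
print(beta)                  # approximately [1. 2. 3.]
print(predict(X[:2], beta))  # approximately [9. 8.]
```

Since the toy target is an exact linear function of the two features, the recovered coefficients match the generating ones up to floating-point error.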

Across both sections, I explore model development at two levels, from using a high-level library to coding the building blocks of the algorithm, all in the context of car price prediction.

## Importing the Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.max_rows', 1000)

## Importing the Data

In [15]:
car_price = pd.read_csv("CarPrice.csv")
car_price.head()

|  | car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | ... | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.0 |
| 1 | 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.0 |
| 2 | 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.0 |
| 3 | 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950.0 |
| 4 | 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450.0 |


In [25]:
car_price.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   symboling         205 non-null    int64  
 1   CarName           205 non-null    object 
 2   fueltype          205 non-null    object 
 3   aspiration        205 non-null    object 
 4   doornumber        205 non-null    object 
 5   carbody           205 non-null    object 
 6   drivewheel        205 non-null    object 
 7   enginelocation    205 non-null    object 
 8   wheelbase         205 non-null    float64
 9   carlength         205 non-null    float64
 10  carwidth          205 non-null    float64
 11  carheight         205 non-null    float64
 12  curbweight        205 non-null    int64  
 13  enginetype        205 non-null    object 
 14  cylindernumber    205 non-null    object 
 15  enginesize        205 non-null    int64  
 16  fuelsystem        205 non-null    object 
 1

## Preprocessing

#### Correcting Spellings

In [20]:
# Count the distinct car names.
car_price["CarName"].nunique()

141

In [21]:
print(pd.DataFrame(car_price["CarName"].unique(), columns=["CarNames"]))

                            CarNames
0                 alfa-romero giulia
1                alfa-romero stelvio
2           alfa-romero Quadrifoglio
3                         audi 100ls
4                           audi fox
5                          audi 5000
6                          audi 4000
7                audi 5000s (diesel)
8                           bmw 320i
9                             bmw x1
10                            bmw x3
11                            bmw z4
12                            bmw x4
13                            bmw x5
14                  chevrolet impala
15             chevrolet monte carlo
16               chevrolet vega 2300
17                     dodge rampage
18               dodge challenger se
19                        dodge d200
20                 dodge monaco (sw)
21                dodge colt hardtop
22                   dodge colt (sw)
23              dodge coronet custom
24                 dodge dart custom
25         dodge coronet custom (sw)
2

In [19]:
# Correct misspelled and inconsistent car names in CarName.
# Assigning the result back avoids calling replace(..., inplace=True) on a
# selected column, which newer pandas versions warn about and which may not
# update the DataFrame.
name_corrections = {
    "audi 100 ls": "audi 100ls",
    "maxda rx3": "mazda rx3",
    "maxda glc deluxe": "mazda glc deluxe",
    "porcshce panamera": "porsche panamera",
    "toyouta tercel": "toyota tercel",
    "vokswagen rabbit": "volkswagen rabbit",
    "vw rabbit": "volkswagen rabbit",
    "vw dasher": "volkswagen dasher",
}
car_price["CarName"] = car_price["CarName"].replace(name_corrections)
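One way to spot misspellings like these is to look only at the make rather than at all 141 full names. The sketch below assumes the make is the first whitespace-separated token of `CarName`, which is an assumption about the naming pattern and not something the dataset guarantees. It uses a small hand-written sample of names, not the full column.

```python
import pandas as pd

# A few CarName values of the kind shown in this notebook (toy sample only).
names = pd.Series(
    ["audi 100ls", "maxda rx3", "mazda glc deluxe", "vw rabbit", "volkswagen dasher"],
    name="CarName",
)

# Assumption: the make is the first whitespace-separated token of CarName.
makes = names.str.split().str[0]

# Listing the distinct makes with their counts makes odd spellings stand out.
print(makes.value_counts().sort_index())
```

In the sorted list, near-duplicates such as `maxda`/`mazda` and `vw`/`volkswagen` sit close together, which makes them easier to catch by eye.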

In [40]:
car_price["drivewheel"].unique()

array(['rwd', 'fwd', '4wd'], dtype=object)

In [41]:
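# Treat "4wd" as "fwd" in drivewheel.
# Note: "4wd" usually denotes four-wheel drive rather than a misspelling of
# "fwd" (front-wheel drive), so this step merges two categories.
car_price["drivewheel"] = car_price["drivewheel"].replace("4wd", "fwd")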

#### Removing extra columns

In [23]:
car_price.drop(columns=["car_ID"], inplace=True)

In [28]:
car_price.columns

Index(['symboling', 'CarName', 'fueltype', 'aspiration', 'doornumber',
       'carbody', 'drivewheel', 'enginelocation', 'wheelbase', 'carlength',
       'carwidth', 'carheight', 'curbweight', 'enginetype', 'cylindernumber',
       'enginesize', 'fuelsystem', 'boreratio', 'stroke', 'compressionratio',
       'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price'],
      dtype='object')

#### Data Exploration

In [29]:
car_price.head()

|  | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | ... | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.0 |
| 1 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.0 |
| 2 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.0 |
| 3 | 2 | audi 100ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950.0 |
| 4 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450.0 |


In [54]:
# Map each printed label to its categorical column, then print the counts.
categorical_columns = {
    "Fuel Types": "fueltype",
    "Aspiration Types": "aspiration",
    "No. of doors": "doornumber",
    "Carbody Types": "carbody",
    "Drivewheel Types": "drivewheel",
    "Engine Locations": "enginelocation",
    "Fuel System Types": "fuelsystem",
}
for i, (label, column) in enumerate(categorical_columns.items()):
    # Separate every block after the first with a blank line.
    print(("\n" if i else "") + f"{label}:")
    print(car_price[column].value_counts())

Fuel Types:
gas       185
diesel     20
Name: fueltype, dtype: int64

Aspiration Types:
std      168
turbo     37
Name: aspiration, dtype: int64

No. of doors:
four    115
two      90
Name: doornumber, dtype: int64

Carbody Types:
sedan          96
hatchback      70
wagon          25
hardtop         8
convertible     6
Name: carbody, dtype: int64

Drivewheel Types:
fwd    129
rwd     76
Name: drivewheel, dtype: int64

Engine Locations:
front    202
rear       3
Name: enginelocation, dtype: int64

Fuel System Types:
mpfi    94
2bbl    66
idi     20
1bbl    11
spdi     9
4bbl     3
mfi      1
spfi     1
Name: fuelsystem, dtype: int64
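The counts above show some very rare levels, for example `rear` engines (3 cars) and the `mfi` and `spfi` fuel systems (1 car each). A level seen only a handful of times gives a regression very little data to estimate its effect from. As a possible shortcut for this kind of check, the sketch below summarizes every non-numeric column at once. It runs on a small made-up frame, and the `summary` layout and `rarest_share` name are illustrative choices, not part of the notebook.

```python
import pandas as pd

# Toy frame with the same kinds of columns as the car data (values made up).
df = pd.DataFrame(
    {
        "fueltype": ["gas", "gas", "diesel", "gas"],
        "enginelocation": ["front", "front", "front", "rear"],
        "enginesize": [130, 152, 109, 136],
    }
)

# Select the non-numeric columns instead of naming each one by hand.
categorical = df.select_dtypes(exclude="number").columns

# For each one, record the number of levels and the share of the rarest level.
summary = pd.DataFrame(
    {
        "n_levels": [df[c].nunique() for c in categorical],
        "rarest_share": [
            df[c].value_counts(normalize=True).min() for c in categorical
        ],
    },
    index=categorical,
)
print(summary)
```

Applied to the real data, the same pattern would be used with `car_price` in place of the toy `df`.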
