# Cars Mileage Prediction

#### Business Problem:
In the automotive industry, fuel efficiency is a significant factor for consumers, affecting their purchasing decisions due to rising fuel costs and environmental concerns. By analyzing the factors that contribute to higher mpg, manufacturers can make informed decisions to optimize their vehicle designs.

#### Objective:
To determine the key factors that influence the fuel efficiency (miles per gallon, mpg) of cars and provide actionable insights for automobile manufacturers to enhance their vehicle designs for improved fuel economy.

#### Constraints:
- Cost Savings for Consumers
- Better Purchase Decisions: Consumers can use mileage predictions to make more informed decisions when purchasing a vehicle, selecting models that offer better fuel economy and lower operational costs.
- Fuel Cost Reduction: Accurate predictions of fuel efficiency help consumers estimate their potential fuel expenses, leading to better financial planning and savings.


### Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df=pd.read_excel(r'C:\Users\Y SAI KUMAR\Downloads\mtcars (1).xlsx')

In [3]:
df.head()

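|   | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |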


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   mpg     32 non-null     float64
 1   cyl     32 non-null     int64  
 2   disp    32 non-null     float64
 3   hp      32 non-null     int64  
 4   drat    32 non-null     float64
 5   wt      32 non-null     float64
 6   qsec    32 non-null     float64
 7   vs      32 non-null     int64  
 8   am      32 non-null     int64  
 9   gear    32 non-null     int64  
 10  carb    32 non-null     int64  
dtypes: float64(5), int64(6)
memory usage: 2.9 KB


In [7]:
df.isna().sum()


mpg     0
cyl     0
disp    0
hp      0
drat    0
wt      0
qsec    0
vs      0
am      0
gear    0
carb    0
dtype: int64

In [8]:
df.duplicated().sum()

0

### Model Building

#### Splitting Into Input and Output

In [9]:
a=df.drop('mpg',axis=1)
b=df['mpg']

In [10]:
a.head()

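|   | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |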


In [11]:
b.head()

0    21.0
1    21.0
2    22.8
3    21.4
4    18.7
Name: mpg, dtype: float64

#### Splitting Into Train and Test

In [12]:
from sklearn.model_selection import train_test_split

In [13]:
a_train,a_test,b_train,b_test=train_test_split(a,b,test_size=0.2,random_state=3)

In [14]:
a_train.shape

(25, 10)

In [15]:
a_test.shape

(7, 10)

In [16]:
b_train.shape

(25,)

In [17]:
b_test.shape

(7,)
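The 25/7 split above follows from how `train_test_split` handles a fractional `test_size`: the test count is rounded up, and the remaining rows go to training. A minimal sketch, using a placeholder array of zeros with the same 32-by-10 shape rather than the actual data (the `X_demo` and `y_demo` names are illustrative):

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and target with the same shape as the notebook's data
# (32 rows, 10 input columns); the values themselves are not meaningful.
X_demo = np.zeros((32, 10))
y_demo = np.zeros(32)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3
)

# With a float test_size, the test count is ceil(0.2 * 32) = ceil(6.4) = 7,
# which leaves 32 - 7 = 25 rows for training.
expected_test = math.ceil(0.2 * 32)
expected_train = 32 - expected_test

print(X_train.shape, X_test.shape)
```

With only 7 test rows, any single error metric computed on this split is likely to vary noticeably with `random_state`.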

## Algorithms: KNN, Linear Regression, SVM, Decision Tree

### KNN

In [18]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error,r2_score

In [19]:
knn=KNeighborsRegressor()
knn.fit(a_train,b_train)
b_pred=knn.predict(a_test)
print(np.sqrt(mean_squared_error(b_test,b_pred)))

3.6015631526959435
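The printed value is the root mean squared error (RMSE): the square root of the mean squared difference between actual and predicted mpg, so it is expressed in mpg units. A small NumPy-only sketch of the same calculation; the helper name `rmse` and the sample numbers are illustrative and do not come from the notebook:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only: errors of -3 and 4 give sqrt((9 + 16) / 2).
print(rmse([20.0, 25.0], [23.0, 21.0]))
```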


### Linear Regression

In [20]:
from sklearn.linear_model import LinearRegression

In [21]:
lr=LinearRegression()
lr.fit(a_train,b_train)
b_pred=lr.predict(a_test)
print(np.sqrt(mean_squared_error(b_test,b_pred)))

9.674109549795201


### Support Vector Machine

In [22]:
from sklearn.svm import SVR

In [23]:
sv=SVR()
sv.fit(a_train,b_train)
b_pred=sv.predict(a_test)
print(np.sqrt(mean_squared_error(b_test,b_pred)))

5.984657094589362


### Decision Tree

In [24]:
from sklearn.tree import DecisionTreeRegressor

In [25]:
dt=DecisionTreeRegressor()
dt.fit(a_train,b_train)
b_pred=dt.predict(a_test)
print(np.sqrt(mean_squared_error(b_test,b_pred)))

4.2441556454561296
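The four models above all follow the same fit, predict, score steps, so the comparison can be written as one loop. A sketch under assumptions: the data here is synthetic (generated with `make_regression`), so the printed scores say nothing about the mtcars results, and the `models` and `scores` names are illustrative:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data with the same shape as the notebook's inputs.
X, y = make_regression(n_samples=32, n_features=10, noise=5.0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=3
)

models = {
    "KNN": KNeighborsRegressor(),
    "Linear Regression": LinearRegression(),
    "SVR": SVR(),
    "Decision Tree": DecisionTreeRegressor(random_state=3),
}

# Fit each model on the training split and record its test-set RMSE.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = float(np.sqrt(mean_squared_error(y_test, y_pred)))

# Print the models from lowest to highest RMSE.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
```

Setting `random_state` on the decision tree makes its result repeatable between runs, which the default constructor does not guarantee.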


### Prediction

In [24]:
dt.predict([[5,100.0,58,3.00,2.3,16.40,0,0,4,1]])



array([24.4])
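Passing a bare nested list relies on the values being in exactly the training column order, and recent scikit-learn versions warn when a model fitted on a DataFrame receives input without feature names. One way to make the order explicit is to wrap the row in a one-row DataFrame. A sketch, assuming the column order shown by `a.head()`; the tiny training frame below reuses only the five rows displayed by `df.head()` so that the example runs on its own, and its prediction is therefore not comparable to the notebook's:

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

feature_names = ["cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]

# The five rows shown by df.head(), used only to make this sketch runnable.
train = pd.DataFrame(
    [
        [21.0, 6, 160.0, 110, 3.90, 2.620, 16.46, 0, 1, 4, 4],
        [21.0, 6, 160.0, 110, 3.90, 2.875, 17.02, 0, 1, 4, 4],
        [22.8, 4, 108.0, 93, 3.85, 2.320, 18.61, 1, 1, 4, 1],
        [21.4, 6, 258.0, 110, 3.08, 3.215, 19.44, 1, 0, 3, 1],
        [18.7, 8, 360.0, 175, 3.15, 3.440, 17.02, 0, 0, 3, 2],
    ],
    columns=["mpg"] + feature_names,
)

tree = DecisionTreeRegressor(random_state=0)
tree.fit(train[feature_names], train["mpg"])

# The same input values as the notebook's prediction cell, with explicit names.
new_car = pd.DataFrame(
    [[5, 100.0, 58, 3.00, 2.3, 16.40, 0, 0, 4, 1]], columns=feature_names
)
prediction = tree.predict(new_car)
print(prediction)
```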

In [25]:
df.head()

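|   | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |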


In [26]:
lr.predict([[5,100.0,58,3.00,2.3,16.40,0,0,4,1]])



array([18.99812452])

In [27]:
lr.coef_

array([ 0.58382872,  0.02094722, -0.02132637,  2.33212927, -3.63737215,
        1.15885272,  0.37576879,  2.50355841,  1.6802353 , -0.43436816])

In [28]:
lr.intercept_

-8.701001424961241
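Since the objective is to identify which factors influence mpg, it helps to read each coefficient next to its column name. A sketch using the values printed above, assuming the coefficients follow the column order of `a`. Raw coefficients depend on each feature's units, so their magnitudes are only a rough guide to importance:

```python
feature_names = ["cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]

# Values copied from the lr.coef_ output above.
coefficients = [
    0.58382872, 0.02094722, -0.02132637, 2.33212927, -3.63737215,
    1.15885272, 0.37576879, 2.50355841, 1.6802353, -0.43436816,
]

# Pair each coefficient with its feature and sort by absolute size.
ranked = sorted(
    zip(feature_names, coefficients), key=lambda pair: abs(pair[1]), reverse=True
)

for name, coef in ranked:
    print(f"{name:>5}: {coef:+.4f}")
```

In this listing `wt` has the largest absolute coefficient and it is negative, meaning that in this fitted model a heavier car is associated with lower predicted mpg when the other inputs are held fixed.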

In [52]:
cyl=5
disp=100.0
hp=58
drat=3.00
wt=2.3
qsec=16.40
vs=0
am=0
gear=4
carb=1
mileage=0.58382872*cyl+0.02094722*disp-0.02132637*hp+2.33212927*drat-3.63737215*wt+1.15885272*qsec+0.37576879*vs+2.50355841*am+1.6802353*gear-0.43436816*carb-8.701001424961241
print(mileage)

18.998124228038762
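The same arithmetic can be written as a dot product of the coefficient vector with the feature vector, plus the intercept, which avoids retyping each term. A sketch using the coefficient and intercept values printed earlier; with a fitted model in scope, `lr.coef_` and `lr.intercept_` could be used directly instead of the copied numbers:

```python
import numpy as np

# Values copied from the lr.coef_ and lr.intercept_ outputs.
coef = np.array([
    0.58382872, 0.02094722, -0.02132637, 2.33212927, -3.63737215,
    1.15885272, 0.37576879, 2.50355841, 1.6802353, -0.43436816,
])
intercept = -8.701001424961241

# Feature order: cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb.
x = np.array([5, 100.0, 58, 3.00, 2.3, 16.40, 0, 0, 4, 1])

# Linear regression prediction: sum of coefficient * feature, plus intercept.
mileage = float(np.dot(coef, x) + intercept)
print(mileage)
```

The result agrees with the hand-written formula to within floating-point rounding, which confirms that `lr.predict` is doing nothing more than this weighted sum.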


### Pickling

In [26]:
import pickle

In [27]:
with open("lin.pkl","wb") as f:
    pickle.dump(lr,f)

In [28]:
with open("lin.pkl","rb") as f:
    model=pickle.load(f)

In [34]:
model.predict([[5,100.0,58,3.00,2.3,16.40,0,0,4,1]])



array([28.08])
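A pickled model is expected to return the same prediction after reloading as the object it was saved from. In the outputs above, `lr.predict` returned about 18.998 for this input while the reloaded `model` returned 28.08, which may mean the cells were run out of order and `lin.pkl` held a different estimator at that point. A round-trip check along the following lines can catch that. This is a sketch on synthetic data with a temporary file, not the notebook's actual model:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 2*x0 - x1 + 3, used only to have a fitted model to save.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 3.0

fitted = LinearRegression().fit(X, y)

# Save and reload through a temporary file; "with" closes each handle.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lin.pkl")
    with open(path, "wb") as f:
        pickle.dump(fitted, f)
    with open(path, "rb") as f:
        reloaded = pickle.load(f)

# The reloaded model should reproduce the original model's prediction.
sample = np.array([[1.0, 2.0]])
same = np.allclose(fitted.predict(sample), reloaded.predict(sample))
print(same)
```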