<h1><p style="text-align:center">Vehicle Fuel Efficiency Analysis and Modelling</p></h1>

## Business Context

Fuel economy is a key driver of total cost of ownership, pricing competitiveness, and regulatory compliance in the automotive industry.

Consumer Reports magazine, a trusted source of information on consumer products and services, seeks to understand which vehicle characteristics influence fuel efficiency.

## Business Problem

Fuel economy is a critical driver of vehicle ownership cost and a key consideration for consumers. To support an upcoming article and inform strategic decision-making, the organization seeks to analyze how automobile characteristics influence fuel efficiency.

Specifically, the organization aims to:

* Identify the vehicle attributes that have the greatest impact on fuel economy
* Develop a predictive model to estimate fuel economy from vehicle characteristics
* Translate analytical and modelling results into actionable insights to inform product positioning, pricing strategies, and consumer guidance

## Analytical Objective

Prepare and explore the data, build and evaluate a multiple linear regression model to predict vehicle fuel efficiency (MPG), and interpret the model outputs.
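
For reference, a multiple linear regression model has the general form below. This is only a sketch of the model family named above: the symbols are generic notation introduced here, and which of the $p$ vehicle characteristics enter the final model is not fixed at this point.

$$
\text{mpg}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i
$$

Here $x_{ij}$ is the value of characteristic $j$ for car $i$, $\beta_0$ is the intercept, $\beta_j$ is the expected change in MPG for a one-unit increase in characteristic $j$ with the others held fixed, and $\varepsilon_i$ is the error term.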

## Data Overview

Field  |  Description
------- | -----------
mpg | The fuel economy of the car, in miles travelled per gallon of gasoline
cylinders | The number of cylinders in the car's engine
displacement | The total volume swept by all the pistons of the engine, in cubic inches
horsepower | The power the engine produces, in horsepower
weight | The total weight of the car, in pounds
acceleration | The time in seconds it takes the car to accelerate from 0 to 60 miles per hour
model year | The year the car model was released, as a two-digit year in the 1900s (e.g., 70 = 1970)
origin | The region where the car was manufactured: 1 = USA, 2 = Europe, 3 = Japan
car name | The name of the car model
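
The coded fields in the table above (`origin` and the two-digit `model year`) can be decoded into readable values before exploration. The sketch below runs on a small illustrative frame, not the real file; `sample_df`, `origin_labels`, `origin_label`, and `year_full` are hypothetical names introduced here.

```python
import pandas as pd

# Illustrative stand-in for the real data (values chosen for the example only).
sample_df = pd.DataFrame({"model year": [70, 82, 76], "origin": [1, 2, 3]})

# Mapping taken from the data dictionary: 1 = USA, 2 = Europe, 3 = Japan.
origin_labels = {1: "USA", 2: "Europe", 3: "Japan"}
sample_df["origin_label"] = sample_df["origin"].map(origin_labels)

# Two-digit model years refer to the 1900s, so 70 becomes 1970.
sample_df["year_full"] = 1900 + sample_df["model year"]

print(sample_df)
```

Codes missing from the mapping would become `NaN` under `Series.map`, which makes unexpected categories easy to spot.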

In [1]:
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor as vif
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import mean_absolute_error as mae, r2_score as r2

# render plots inline in the notebook
%matplotlib inline

In [2]:
# load data
file = "auto-mpg.csv"
auto_df = pd.read_csv(file)

# display the first five rows
auto_df.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name
0,18.0,8,307.0,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140,3449,10.5,70,1,ford torino


In [3]:
# display the last five rows
auto_df.tail()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name
393,27.0,4,140.0,86,2790,15.6,82,1,ford mustang gl
394,44.0,4,97.0,52,2130,24.6,82,2,vw pickup
395,32.0,4,135.0,84,2295,11.6,82,1,dodge rampage
396,28.0,4,120.0,79,2625,18.6,82,1,ford ranger
397,31.0,4,119.0,82,2720,19.4,82,1,chevy s-10


In [4]:
# DataFrame info: column dtypes and non-null counts
auto_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64  
 2   displacement  398 non-null    float64
 3   horsepower    398 non-null    object 
 4   weight        398 non-null    int64  
 5   acceleration  398 non-null    float64
 6   model year    398 non-null    int64  
 7   origin        398 non-null    int64  
 8   car name      398 non-null    object 
dtypes: float64(3), int64(4), object(2)
memory usage: 28.1+ KB
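
The summary above reports `horsepower` as `object` even though every value in the preview rows is numeric, which suggests at least one entry is not a plain number. One way to find such entries and convert the column is sketched below on a small illustrative frame. The `"?"` placeholder is an assumption about how missing values might be encoded, and `sample_df`, `hp_numeric`, and `bad_mask` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Illustrative stand-in for auto_df; the "?" entry is an assumed placeholder.
sample_df = pd.DataFrame({
    "mpg": [18.0, 25.0, 21.0, 32.0],
    "horsepower": ["130", "?", "86", "84"],
})

# Coerce to numeric; anything that cannot be parsed becomes NaN.
hp_numeric = pd.to_numeric(sample_df["horsepower"], errors="coerce")

# Rows where a raw value was present but could not be parsed as a number.
bad_mask = hp_numeric.isna() & sample_df["horsepower"].notna()
print(sample_df.loc[bad_mask, "horsepower"].unique())

# Replace the text column with its numeric version and confirm the dtype.
sample_df["horsepower"] = hp_numeric
print(sample_df.dtypes)
```

Applied to `auto_df`, the same pattern would show which raw strings block the conversion, so that the affected rows can be imputed or dropped deliberately rather than silently.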
