Skip to content

7MustafaAdelIbrahim/Mileage-per-gallon

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 

Repository files navigation

Content

Sources and Relevant Information:

The dataset is downloaded from UCI Machine Learning Repository which contains one row per car model. This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University* and was used in the July 7, 1993 American Statistical Association Exposition. This dataset is a slightly modified version of the dataset provided in the StatLib library. In line with the use by Ross Quinlan (1993) in predicting the attribute "mpg", 8 of the original instances were removed because they had unknown values for the "mpg" attribute. The original dataset is available in the file "auto-mpg.data-original". "The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes." (Quinlan, 1993)

Number of Instances: 398. Number of Attributes: 9

The Data Dictionary

Column Contains
MPG Miles per Gallon.
cylinders No. of Cylinders.
displacement Engine size.
horsepower Engine power.
Weight Car weight.
acceleration how fast can accelerate in seconds
model_year Car year model.
Origin Car country of manufacture.
car_name Car model.

The model that you'll be building will depend on the physical characteristics of the cars rather than the model names or manufacturers, so you'll remove the corresponding columns from the data. [model_year, car_name]

EDA and Data visualization.

The ultimate goal of data visualization is to tell a story. We are trying to convey information about the data as efficiently as possible. Visualizing means encoding information about the data in a graphical way. Doing this well allows the reader can quickly and accurately understand the message we are trying to transmit.

  • 63.5 % of cars are American, about 17.5% are European, and about 19.8% are Japanese made. origin

But they may be important in the analysis. We'll take about year in this readme file. Butfor the name, Somewhat expected observation is that the models differ greatly according to the country in which they are manufactured. As every country has it's own company and it's brands in the field of car. For example, we foundt that:

24322577-ebe6a4c02f5e7948cf0424ff724d13f5

The relation between origin country and car model.

car and model

There are little number cars in our dataset whose high-efficiency, and which can travel a large number of miles per gallon.

performance

Cars with american origin country are the least efficient, as the average is 18 MPG, and reach 39 mpg as the maximum.‎ But the Japanese ones, they reach 46 mpg, and the European reaches 44 mpg.

performance and mpg

This is due to several reasons, but before we talk about the reasons, it's worth to mentioned that the average MPG has increased over time, but with a high standard deviation. Meaning that there is high differences or variation in MPG. This is because the MPG for American cars increased in 1971 from 1970, and decreased in the period from 1971 to 1973 to the point where it decreased from 1970. It increased in the period from 1973 to 1974 and it happened A minor setback in 1975. Then, it continues to increase almost regularly, but it is always less than the Japanese and European. But for the European and Japanese cars, the MPG was randomly increased and decreased and was always higher than the American one, and this is the reason ‎why the std is high.

mpg over time

Back to the reasons that American Arabs are the least efficient.

The 'horsepower' of a car is in an inverse or negative relationship with MPG, as the higher the horsepower, the less MPG; Because the car will burn more fuel. With each increase in horsepower, the MPG will be cut by -0.156. And American cars have large average horsepower compared to Japanese and European ones The number of cylinders, the higher the number of cylinders in a car, the more petrol it burns. As the number of cylinders increases by 1, the mpg decreases by 3.57.

displacment + mpg

The number of cylinders in American vehicles is greater than Japanese and European ones. Whereas, the number of cylinders in American cars starts from 4 and reaches 8, while the Japanese and European cars are almost 4 only.

cylinders+ horsepower no cylinders

cylinders + origin

Weight of vehicles is in an inverse negative relationship with MPG, and this is normal because resistance will increase, fuel consumption will increase, and therefore MPG will decrease. American vehicles are the most in weight. As for Japanese and European Arabic, the MPG changes randomly, increases and decreases with time. And this is the explanation, because it has a great stander-deviation.

weight + mpg

Displacement or engine capacity - the higher the MPG, the lower the MPG And American Arabs were the highest in displacement, ranging from 80 to 460, while the Japanese were from 60 to 170, and the European ones were similar.

displacment + mpg

The amount of acceleration is not very effective and it’s along is not a good indicator for mpg. And the distribution has a bell shaped, and a small number of cars in the dataset which can reach a speed of 60 miles per hours in a few seconds, and they were European.acc + mpg

acc

Our target is a building predictive model based on the physical characteristics of each Arabic, meaning the year, name columns are not important, and we do the machine learning model.

car models

[opel, saab, mercedes-benz, bmw…etc] is with Europe orign country. [Toyota, Datsun, honda…etc] is with japanes origin country. [Ford, Chevrolet, Plymouth, amc, dodge…etc] are American.

wich makes sense, and somewhat expected.

Inspiration

I have used this dataset for practicing my exploratory analysis skills and bulding a machine learning model to predict MPG.

About

Explore, analyze & building predictive model.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published