# Part II - Vehicle Properties and Their Effects on CO2 Emissions
## by Boddington Anesu Muzvidzwa



## Investigation Overview


>The aim of this analysis was to find out which vehicle properties lead to the lowest CO2 emissions and how some of these properties relate to CO2 emissions. This presentation focuses on engine size, combined fuel consumption and vehicle class, and how they affect CO2 emissions.





## Dataset Overview


> The dataset consists of 7385 cars with the following features: Make, Model, Vehicle Class, Engine Size(L), Cylinders, Transmission, Fuel Type, Fuel Consumption City (L/100 km), Fuel Consumption Hwy (L/100 km), Fuel Consumption Comb (L/100 km), Fuel Consumption Comb (mpg) and CO2 Emissions(g/km).




### Data description
#### Fuel type meanings
> D - Diesel

> E - Ethanol

> N - Natural Gas

> X - Regular Gasoline

> Z - Premium Gasoline

#### Transmission Types
> A - Automatic

> AM - Automated Manual

> AS - Automatic with Select Shift

> AV - Continuously Variable

> M - Manual
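The fuel-type and transmission codes listed above can be decoded programmatically, which makes plot legends easier to read. This is a minimal sketch, assuming each raw `Transmission` value is one of the listed prefixes optionally followed by a gear count (for example `AS6`); `FUEL_TYPE_LABELS`, `TRANSMISSION_LABELS` and `decode_transmission` are hypothetical helpers, not part of the original notebook.

```python
import re

# Hypothetical lookup tables built from the legend above
FUEL_TYPE_LABELS = {
    "D": "Diesel",
    "E": "Ethanol",
    "N": "Natural Gas",
    "X": "Regular Gasoline",
    "Z": "Premium Gasoline",
}

TRANSMISSION_LABELS = {
    "A": "Automatic",
    "AM": "Automated Manual",
    "AS": "Automatic with Select Shift",
    "AV": "Continuously Variable",
    "M": "Manual",
}

# Two-letter prefixes are tried first so that "AS6" is read as "AS", not "A"
_TRANSMISSION_PATTERN = re.compile(r"^(AM|AS|AV|A|M)(\d*)$")


def decode_transmission(code):
    """Split a raw code such as 'AS6' into (label, gears); gears is None if absent."""
    match = _TRANSMISSION_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Unrecognised transmission code: {code!r}")
    prefix, gears = match.groups()
    return TRANSMISSION_LABELS[prefix], (int(gears) if gears else None)


print(decode_transmission("AS6"))  # ('Automatic with Select Shift', 6)
print(decode_transmission("AV"))   # ('Continuously Variable', None)
```

For plotting, something like `co2['Fuel_Type'].map(FUEL_TYPE_LABELS)` would swap the one-letter codes for readable names; `decode_transmission` only makes sense on the raw values, before the gear numbers are stripped in the wrangling cells below.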

```python
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")
```

```python
# load the dataset into a pandas DataFrame
co2 = pd.read_csv('CO2 Emissions_Canada.csv')
```


```python
# data wrangling
# rename the columns to remove spaces (to reduce invalid syntax errors)
co2.rename(columns={'Vehicle Class': 'Vehicle_Class',
                    'Engine Size(L)': 'Engine_Size(L)',
                    'Fuel Type': 'Fuel_Type',
                    'Fuel Consumption City (L/100 km)': 'Fuel_Consumption_City(L/100km)',
                    'Fuel Consumption Hwy (L/100 km)': 'Fuel_Consumption_Hwy(L/100km)',
                    'Fuel Consumption Comb (L/100 km)': 'Fuel_Consumption_Comb(L/100km)',
                    'Fuel Consumption Comb (mpg)': 'Fuel_Consumption_Comb(mpg)',
                    'CO2 Emissions(g/km)': 'CO2_Emissions(g/km)'}, inplace=True)
```

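```python
# data wrangling
# replace hyphens (and the spaces around them) with underscores in vehicle classes;
# regex=False makes both replacements literal string matches
co2['Vehicle_Class'] = co2['Vehicle_Class'].str.replace(" - ", "_", regex=False)
co2['Vehicle_Class'] = co2['Vehicle_Class'].str.replace("-", "_", regex=False)
```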



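```python
# not all transmissions include a gear number,
# so the number of gears is removed for the sake of this analysis
# (raw string avoids an invalid escape-sequence warning; \d+ also covers two-digit gear counts)
co2['Transmission'] = co2['Transmission'].str.replace(r'\d+', '', regex=True)
```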


```python
# data wrangling
# convert Vehicle_Class, Fuel_Type, Transmission and Make to categorical datatypes
for col in ['Vehicle_Class', 'Fuel_Type', 'Transmission', 'Make']:
    co2[col] = co2[col].astype('category')
```

> Note that the cells above have been set as "Skip"-type slides, which means they won't show up when the notebook is rendered as HTML slides.
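After the wrangling cells it can help to confirm that each step had its intended effect before plotting. This is a minimal sketch, assuming the renamed columns used above; `check_wrangled` and the two-row `sample` frame are hypothetical and are not drawn from the dataset.

```python
import pandas as pd


def check_wrangled(df):
    """Return a list of problems left after the wrangling steps (empty if none)."""
    problems = []
    # After renaming, no column name should still contain a space
    spaced = [c for c in df.columns if " " in c]
    if spaced:
        problems.append(f"columns with spaces: {spaced}")
    # After stripping gear numbers, transmissions should contain letters only
    if df["Transmission"].astype(str).str.contains(r"\d").any():
        problems.append("Transmission still contains digits")
    # The four converted columns should now be categorical
    for col in ["Vehicle_Class", "Fuel_Type", "Transmission", "Make"]:
        if not isinstance(df[col].dtype, pd.CategoricalDtype):
            problems.append(f"{col} is not categorical")
    return problems


# Hypothetical two-row frame shaped like the wrangled data
sample = pd.DataFrame({
    "Make": ["ACURA", "BMW"],
    "Vehicle_Class": ["COMPACT", "SUV_SMALL"],
    "Fuel_Type": ["Z", "X"],
    "Transmission": ["AS", "M"],
}).astype("category")
print(check_wrangled(sample))  # []
```

On the real data, `check_wrangled(co2)` would be expected to return an empty list once all the cells above have run.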

## Distribution of CO2 Emissions

> First, we need to see how the carbon dioxide emissions are spread out. The histogram shows a symmetric, unimodal distribution, with most vehicles emitting about 250 g/km of carbon dioxide. Emissions range from 96 g/km at the lowest to 522 g/km at the highest, which is quite a wide range.



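```python
# histogram showing the distribution of CO2 emissions
# (plt.hist works with any seaborn version; sb.displot requires seaborn >= 0.11)
plt.hist(co2['CO2_Emissions(g/km)'], bins='auto')
plt.xlabel('CO2_Emissions(g/km)')
plt.ylabel('Number of vehicles')
plt.title('Distribution of CO2 Emissions');
```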
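The range and shape described above can also be checked numerically rather than read off the histogram. This is a minimal sketch, assuming the `co2` frame from the earlier cells; `describe_emissions` is a hypothetical helper. A sample skewness close to zero is consistent with a symmetric distribution, while a clearly positive value would point to a long right tail.

```python
import pandas as pd


def describe_emissions(values):
    """Summarise a CO2 series: range, centre and skewness."""
    s = pd.Series(values, dtype="float64").dropna()
    return {
        "min": s.min(),
        "max": s.max(),
        "median": s.median(),
        "mean": s.mean(),
        # pandas uses the bias-adjusted sample skewness
        "skew": s.skew(),
    }


# Hypothetical values, symmetric around 200
stats = describe_emissions([100, 200, 300])
```

On the real data, `describe_emissions(co2['CO2_Emissions(g/km)'])` should reproduce the minimum and maximum quoted above and put a number on how symmetric the distribution is.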

## Relationship between CO2 emissions and engine size 

The bigger the engine, the more carbon dioxide it produces, with 8 L engines producing around 500 g/km of carbon dioxide.


```python
# scatter plot with transparency (alpha) so overlapping points show up as darker areas
plt.figure(figsize=(20,10))

plt.scatter(data=co2, x='Engine_Size(L)', y='CO2_Emissions(g/km)', alpha=1/10);
plt.xlabel('Engine_Size(L)')
plt.ylabel('CO2_Emissions(g/km)')
plt.title('Relationship between Engine Size and CO2 Emissions');
```
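The "bigger engine, more CO2" trend can be summarised by a least-squares line and a correlation coefficient. This is a minimal sketch using NumPy's `polyfit` and `corrcoef`; `linear_trend` is a hypothetical helper, and the four points in the demo are made up so that they lie exactly on y = 20x + 90.

```python
import numpy as np


def linear_trend(x, y):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, r)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    # Pearson correlation coefficient between x and y
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


# Made-up points on the line y = 20x + 90
slope, intercept, r = linear_trend([1, 2, 3, 4], [110, 130, 150, 170])
```

Applied as `linear_trend(co2['Engine_Size(L)'], co2['CO2_Emissions(g/km)'])`, the slope would read as extra grams of CO2 per kilometre for each additional litre of engine size.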

## CO2 Emissions vs Fuel Consumption

The plot shows a linear relationship between carbon dioxide emissions and combined fuel consumption. However, three distinct clusters of points can be observed: the main band along the line of best fit, a short band above it, and a band below it that is longer than the upper one but shorter than the main regression line.


```python
# convert g/km to g/100km (multiply by 100) so that emissions use the same
# distance unit as fuel consumption (L/100km)
co2['CO2_Emissions(g/100km)'] = 100 * co2['CO2_Emissions(g/km)']

# regression plot of CO2 emissions vs combined fuel consumption
plt.figure(figsize=(20,10))
sb.regplot(data=co2, x='Fuel_Consumption_Comb(L/100km)', y='CO2_Emissions(g/100km)', x_jitter=0.5,
           scatter_kws={'alpha': 1/10});
plt.xlabel('Combined Fuel Consumption (L/100km)')
plt.ylabel('CO2_Emissions(g/100km)');
```
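Because both axes now share the "per 100 km" unit, dividing one by the other cancels the distance: (g/100km) / (L/100km) = grams of CO2 per litre of fuel. Burning a litre of a given fuel releases a roughly fixed amount of carbon, so each fuel type would be expected to sit on its own line through the origin, which is one way to probe the separate clusters. This is a minimal sketch; `co2_per_litre` is a hypothetical helper, and the three-row `sample` uses made-up values chosen only to exercise it.

```python
import pandas as pd


def co2_per_litre(df):
    """Median grams of CO2 emitted per litre of fuel, by fuel type.

    (g/100km) / (L/100km) = g/L, so the distance units cancel.
    """
    grams_per_litre = df["CO2_Emissions(g/100km)"] / df["Fuel_Consumption_Comb(L/100km)"]
    # observed=True skips unused categories when Fuel_Type is categorical
    return grams_per_litre.groupby(df["Fuel_Type"], observed=True).median().sort_values()


# Made-up rows for illustration only
sample = pd.DataFrame({
    "Fuel_Type": ["X", "X", "E"],
    "Fuel_Consumption_Comb(L/100km)": [8.0, 10.0, 12.0],
    "CO2_Emissions(g/100km)": [18400.0, 23000.0, 18000.0],
})
print(co2_per_litre(sample))
```

On the real data, `co2_per_litre(co2)` would show whether the fuel types separate cleanly by their CO2-per-litre ratio.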

## Types of fuel and their CO2 emissions

Fuel type has a strong influence on CO2 emissions. Diesel burns more slowly than gasoline (petrol), but does it produce less carbon dioxide? The violin plot shows that diesel (D) has higher CO2 emissions than regular gasoline (X) and almost the same as premium gasoline (Z). The plot also shows an unexpected result: ethanol (E) has higher CO2 emissions than the three conventional fuels. Clearly something was not adding up, so further analysis was carried out.


```python
# set the figure size to get the bigger picture
plt.figure(figsize=(10,5))

# a single base colour keeps the focus on the shape of each distribution
base_color = sb.color_palette()[0]
sb.violinplot(data=co2, x='Fuel_Type', y='CO2_Emissions(g/km)', color=base_color, inner='quartile');
```
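A violin's shape says little when only a handful of vehicles back it, so it is worth tabulating how many rows each fuel type has alongside its median and quartiles. This is a minimal sketch; `emissions_by_fuel` is a hypothetical helper and the five-row `sample` is made up.

```python
import pandas as pd


def emissions_by_fuel(df, value="CO2_Emissions(g/km)"):
    """Count, median and quartiles of CO2 emissions per fuel type, sorted by median."""
    grouped = df.groupby("Fuel_Type", observed=True)[value]
    summary = pd.DataFrame({
        "count": grouped.size(),
        "median": grouped.median(),
        "q1": grouped.quantile(0.25),
        "q3": grouped.quantile(0.75),
    })
    return summary.sort_values("median")


# Made-up rows for illustration only
sample = pd.DataFrame({
    "Fuel_Type": ["X", "X", "X", "D", "D"],
    "CO2_Emissions(g/km)": [200, 220, 240, 250, 270],
})
print(emissions_by_fuel(sample))
```

Running `emissions_by_fuel(co2)` would put numbers next to the violins, including the vehicle count behind each one.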

## CO2 emissions and Fuel Consumption by Fuel type

The data was split into panels (columns) by vehicle class so that we could get a clearer picture of which factors actually contribute most to CO2 emissions, and this plot resolved several discrepancies. First, it explains the three distinct clusters in the CO2 emissions vs combined fuel consumption plot: they represent different fuel types. Second, within each vehicle class, ethanol (E) has the lowest CO2 emissions but a higher fuel consumption rate. Diesel (D) has the highest CO2 emissions with the lowest fuel consumption. Regular gasoline (X) and premium gasoline (Z) have an identical relationship between CO2 and fuel consumption, but premium gasoline starts at a much higher fuel consumption than regular gasoline.




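```python
# one panel per vehicle class, one colour (and regression line) per fuel type
# (height replaces the deprecated size argument)
f = sb.FacetGrid(data=co2, hue='Fuel_Type', col='Vehicle_Class', col_wrap=4,
                 hue_order=['X', 'Z', 'E', 'D', 'N'], height=4, aspect=1)
f = f.map(sb.regplot, 'Fuel_Consumption_Comb(L/100km)', 'CO2_Emissions(g/100km)', x_jitter=0.5)
f.add_legend()
# label every panel; plt.xlabel/plt.ylabel would only label the last axes
f.set_axis_labels('Fuel_Consumption_Comb(L/100km)', 'CO2_Emissions(g/100km)');
```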
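The facet plots compare regression lines by eye; fitting one least-squares line per panel and fuel type turns that comparison into numbers, for example whether the X and Z slopes really match within a class. This is a minimal sketch, assuming the `CO2_Emissions(g/100km)` column created earlier; `slopes_by_class_and_fuel` and `min_points` are hypothetical names, the three-point threshold is an arbitrary choice, and the `sample` rows are made up.

```python
import numpy as np
import pandas as pd


def slopes_by_class_and_fuel(df, min_points=3):
    """Least-squares slope of CO2 (g/100km) on combined consumption (L/100km)
    for every (Vehicle_Class, Fuel_Type) pair with at least min_points rows."""
    rows = []
    for (vclass, fuel), group in df.groupby(["Vehicle_Class", "Fuel_Type"], observed=True):
        x = np.asarray(group["Fuel_Consumption_Comb(L/100km)"], dtype=float)
        # A line needs several points with distinct x values to be meaningful
        if len(group) < min_points or np.unique(x).size < 2:
            continue
        y = np.asarray(group["CO2_Emissions(g/100km)"], dtype=float)
        slope, _ = np.polyfit(x, y, deg=1)
        rows.append({"Vehicle_Class": vclass, "Fuel_Type": fuel, "slope": slope})
    result = pd.DataFrame(rows, columns=["Vehicle_Class", "Fuel_Type", "slope"])
    # One row per vehicle class, one column per fuel type
    return result.pivot(index="Vehicle_Class", columns="Fuel_Type", values="slope")


# Made-up rows; the SUV_SMALL group has too few points and is skipped
sample = pd.DataFrame({
    "Vehicle_Class": ["COMPACT"] * 6 + ["SUV_SMALL"] * 2,
    "Fuel_Type": ["X"] * 3 + ["E"] * 3 + ["X"] * 2,
    "Fuel_Consumption_Comb(L/100km)": [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 9.0, 11.0],
    "CO2_Emissions(g/100km)": [18400.0, 20700.0, 23000.0, 16500.0, 18000.0, 19500.0,
                               21000.0, 25000.0],
})
table = slopes_by_class_and_fuel(sample)
```

The slope here has units of grams of CO2 per litre, so on the real data similar X and Z columns would support the "identical relationship" reading, while E and D would be expected to differ.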

### Generate Slideshow
Once you're ready to generate your slideshow, use the `jupyter nbconvert` command to generate the HTML slideshow.

```python
# use this command if you are running this file locally
!jupyter nbconvert Part_II_slide_deck_Anesu.ipynb --to slides --post serve --no-input --no-prompt
```

### Submission
If you are using the classroom workspace, you can submit in either of the following two ways:

1. **Submit from the workspace**. Make sure you have removed the example project from the /home/workspace directory. You must submit the following files:
   - Part_I_notebook.ipynb
   - Part_I_notebook.html or pdf
   - Part_II_notebook.ipynb
   - Part_I_slides.html
   - README.md
   - dataset (optional)


2. **Submit a zip file on the last page of this project lesson**. In this case, open the Jupyter terminal and run the command below to generate a ZIP file. 
```bash
zip -r my_project.zip .
```
The command above will zip every file present in your /home/workspace directory. Next, you can download the zip file to your local machine and follow the instructions on the last page of this project lesson.
