# European Car Fleet and CO2 emissions
## (Prepared by Shamil Murzin, GyungYoon Park, Marco Gagliano. February 27, 2023)

This project examines how the automotive market in EU countries changed from 2010 to 2021, in the context of the 2015 Paris Agreement. Using the EU car registration dataset, we analyze the evolution of the vehicle fleet in Europe, study the relationship between car features and CO2 emissions, check whether certain countries or manufacturers produce less carbon-intensive cars, and identify which countries have started to switch to lower-emission vehicles by looking at the historical trend over these years.

The EU car dataset can be downloaded from: http://co2cars.apps.eea.europa.eu/ (12 CSV files in total). It contains about 58 000 000 records of cars registered during 2010-2021. The dataset has several limitations:
1. Not all characteristics are recorded for every vehicle; some values are missing.
2. The dataset does not indicate the current condition of a car (for example, whether it is still in use). We only have data from when the car was registered.
3. The dataset is large: about 58 000 000 records in total.

Features used in this study:
1. Country - in which country the new car was registered (categorical feature)
2. Mk - make, i.e. the name of the manufacturer that produced the car (categorical feature)
3. Mass, kg - mass of the car (numeric feature)
4. ENEDC, g/km - CO2 emissions in g/km from the New European Driving Cycle certification test (numeric feature)
5. Engine capacity, cm3 - engine displacement (numeric feature)
6. Ft - fuel type (categorical feature)
7. Fm - fuel mode (categorical feature)
8. Cn - commercial name of the vehicle model (categorical feature)
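
Because each yearly file is large, it may help to parse only the columns listed above. The following is a minimal sketch, assuming the CSV headers match the column names used later in this notebook (`m (kg)`, `Enedc (g/km)`, `ec (cm3)` and so on). The in-memory CSV reuses two rows from the `df2010.head()` output further down, and the `ID` column is a hypothetical extra column added only for illustration.

```python
import io
import pandas as pd

# Columns from the feature list above (header names as used later in the notebook).
FEATURES = ['Country', 'Mk', 'Cn', 'm (kg)', 'Enedc (g/km)', 'Ft', 'Fm', 'ec (cm3)']

# Stand-in for one yearly CSV file; 'ID' is a hypothetical unused column.
toy_csv = io.StringIO(
    "ID,Country,Mk,Cn,m (kg),Enedc (g/km),Ft,Fm,ec (cm3)\n"
    "1,GB,VAUXHALL,ZAFIRA ACTIVE,1503,177,petrol,M,1796\n"
    "2,GB,VAUXHALL,AGILA DESIGN,1160,120,diesel,M,1248\n"
)

# usecols tells pandas to parse only the requested columns.
df_small = pd.read_csv(toy_csv, usecols=FEATURES)
print(df_small.columns.tolist())
```

With real files, the same `usecols` argument could be passed to each `pd.read_csv` call to reduce memory use.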

The second dataset used in this study is the country boundary geometry, used for visualization purposes (https://github.com/leakyMirror/map-of-europe/blob/master/TopoJSON/europe.topojson).

# Part 1 Data availability

In this notebook we quality-check the data downloaded from the data source and verify data availability for downstream analysis.

Data Source:

CO2 emissions from new passenger cars registered in EU27, Iceland (from 2018) and Norway (from 2019): http://co2cars.apps.eea.europa.eu/

Geographical topojson data for Europe: https://github.com/leakyMirror/map-of-europe/blob/master/TopoJSON/europe.topojson

## 1. Importing libraries

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
import os
import altair as alt
directory = os.getcwd()

In [51]:
# We used the geopandas library to read the spatial geometry of each country and visualize it in altair.
# If geopandas is not installed in your Jupyter environment, uncomment the next line:
#! pip install geopandas

## 2. Importing data

The EU car dataset can be downloaded from: http://co2cars.apps.eea.europa.eu/ (12 CSV files in total). It contains about 58 000 000 records of cars registered during 2010-2021. The overall percentage of missing data matters for the analysis. We first check the number of records per year, and then the percentage of available (non-null) values per feature in each yearly dataset.
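
Availability here means the share of non-null values in a column. As a toy illustration with invented values, `DataFrame.count()` counts non-null entries per column, so dividing by the number of rows gives the percentage:

```python
import numpy as np
import pandas as pd

# Invented four-row frame with some missing values.
toy = pd.DataFrame({
    'Country': ['DE', 'FR', 'IT', 'ES'],
    'm (kg)': [1300.0, np.nan, 1450.0, 1200.0],
    'Enedc (g/km)': [120.0, np.nan, np.nan, 95.0],
})

# count() ignores NaN, len() does not, so the ratio is the non-null share.
availability = toy.count() / len(toy) * 100
print(availability)
```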

### 2a. Checking data availability per year

In [2]:
years = range(2010, 2022)
data_availability = {}
# Features selected for this study
features = ['Country','Mh','Man','Mk','Cn','m (kg)','Enedc (g/km)', 'Ft', 'Fm', 'ec (cm3)']
for year in years:
    # Build the file path portably instead of concatenating strings
    path = os.path.join(directory, 'data', f'data{year}.csv')
    df = pd.read_csv(path, low_memory=False)[features]
    # Percentage of non-null values per feature
    data_availability[year] = (df.count()/len(df)*100).to_dict()
    data_availability[year]['total_rows'] = len(df)


In [11]:
df_data_availability = pd.DataFrame(data_availability)

In [12]:
df_data_availability

Feature,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
Country,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,99.99676,99.99412,100.0
Mh,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
Man,100.0,99.98281,100.0,100.0,100.0,100.0,100.0,100.0,99.99999,99.99776,99.9956,100.0
Mk,99.721798,99.851921,94.975918,95.439064,94.94376,98.426398,99.846597,99.87225,99.77937,99.992,99.85492,99.99709
Cn,99.052015,97.845769,99.291134,99.502119,99.395845,99.704297,99.647051,99.97635,99.97698,99.96543,99.72758,99.86802
m (kg),98.386781,99.713191,99.833144,99.75524,99.930133,99.951435,99.945155,99.99469,99.99084,99.9991,99.99781,99.99968
Enedc (g/km),98.590095,99.659207,99.556787,99.698966,99.750442,99.819129,99.841335,99.96695,99.90783,99.95632,99.94608,24.37105
Ft,98.516608,99.995175,99.996158,97.177242,99.967699,99.935776,99.905287,99.99849,99.84639,100.0,100.0,100.0
Fm,89.164135,94.854334,97.862426,95.311374,98.878544,99.999773,99.996155,99.99978,99.99935,99.98178,99.99865,99.98191
ec (cm3),91.332358,92.339707,94.148771,98.79496,99.353255,99.525922,99.436578,99.12107,99.00519,97.76548,93.65751,89.91332


In [13]:
total_rows_per_dataset = df_data_availability.iloc[-1,:]
total_rows_per_dataset = total_rows_per_dataset.to_frame().reset_index()
total_rows_per_dataset['index'] = pd.to_datetime(total_rows_per_dataset['index'], format='%Y')

In [14]:
visualization1 = alt.Chart(total_rows_per_dataset, title="Number of registered cars in datasets"
                               ).mark_bar(size=20, color = '#96EE77').encode(
    x=alt.X('index:T', title='Year'),
    y=alt.Y('total_rows:Q', title='Number of registered cars'),
    tooltip=[alt.Tooltip('total_rows', title='Registered cars'), alt.Tooltip('index:T', format='%Y')]
).properties(height = 350, width = 350)
visualization1

We checked other sources (https://www.acea.auto/figure/passenger-car-registrations-in-europe-since-1990-by-country/) and found that the downloaded dataset is only complete for 2018, 2019, 2020, and 2021. We also see a decline in newly registered vehicles in 2020-2021 due to the COVID pandemic.

### 2b. Visualizing missing values

In [15]:
features_availability = df_data_availability.iloc[:10,:]

In [16]:
features_availability

Feature,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
Country,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,99.996761,99.994115,100.0
Mh,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
Man,100.0,99.98281,100.0,100.0,100.0,100.0,100.0,100.0,99.999987,99.997761,99.995597,100.0
Mk,99.721798,99.851921,94.975918,95.439064,94.94376,98.426398,99.846597,99.872246,99.779368,99.992,99.854919,99.997087
Cn,99.052015,97.845769,99.291134,99.502119,99.395845,99.704297,99.647051,99.97635,99.976979,99.965432,99.727578,99.868016
m (kg),98.386781,99.713191,99.833144,99.75524,99.930133,99.951435,99.945155,99.994693,99.990844,99.999097,99.997811,99.999677
Enedc (g/km),98.590095,99.659207,99.556787,99.698966,99.750442,99.819129,99.841335,99.966946,99.907832,99.956322,99.946076,24.371045
Ft,98.516608,99.995175,99.996158,97.177242,99.967699,99.935776,99.905287,99.998487,99.846393,100.0,100.0,100.0
Fm,89.164135,94.854334,97.862426,95.311374,98.878544,99.999773,99.996155,99.999778,99.999352,99.98178,99.998646,99.981905
ec (cm3),91.332358,92.339707,94.148771,98.79496,99.353255,99.525922,99.436578,99.121075,99.00519,97.765477,93.65751,89.913315


In [17]:
features_availability = features_availability.reset_index()

In [18]:
features_availability = features_availability.melt(id_vars=['index'])

In [19]:
base = alt.Chart(features_availability, title="Feature availability per year").encode(
    x=alt.X('variable:N', title="Year"),
    y=alt.Y('index:N', title="Feature")
).properties(height = 350, width = 350)

heatmap = base.mark_rect(stroke='white',strokeWidth=1).encode(
    color=alt.Color('value:Q', title="Percentage", scale=alt.Scale(domain=[20, 120],
                                      scheme='viridis', reverse=True)))

text = base.mark_text(baseline='middle', color='white', size=9).encode(
    text=alt.Text ('value:Q', format='.1f'),
    color=alt.condition(alt.expr.datum['value'] > 30,
                            alt.value('white'),
                            alt.value('black')))


visualization2 = (heatmap + text)

In [20]:
visualization2

Each feature has only a small percentage of missing data per year, at most about 11% (Fm in 2010), except for CO2 emissions in the 2021 dataset, where ~75% of the values are missing.
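
To find the gaps without reading the heatmap cell by cell, the availability table can be turned into a list of flagged cells. This is a sketch only: it uses a few values copied from the table above, and the 10% cut-off is an arbitrary assumption, not a threshold from this study.

```python
import pandas as pd

# A few cells copied from the availability table (percent non-null).
avail = pd.DataFrame(
    {2010: [98.590095, 89.164135, 91.332358],
     2021: [24.371045, 99.981905, 89.913315]},
    index=['Enedc (g/km)', 'Fm', 'ec (cm3)'],
)

# Convert availability to percent missing and reshape to one row per cell.
missing = 100 - avail
cells = missing.stack().rename('missing_pct').reset_index()
cells.columns = ['feature', 'year', 'missing_pct']

THRESHOLD = 10  # hypothetical cut-off, in percent
flagged = cells[cells['missing_pct'] > THRESHOLD].sort_values(
    'missing_pct', ascending=False)
print(flagged)
```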

## 3. Combining visualizations

In [21]:
alt.hconcat(visualization1, visualization2).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_title(fontSize=16)

## 4. Feature representativeness

As noted above, the datasets for 2010-2017 are not complete. Before further analysis, we need to make sure these samples are representative, so we compare some statistics of their features with those of the complete datasets (2018-2021).

In [22]:
# Loading incomplete datasets

In [23]:
df2010 = pd.read_csv('data/data2010.csv', low_memory=False)
df2011 = pd.read_csv('data/data2011.csv', low_memory=False)
df2012 = pd.read_csv('data/data2012.csv', low_memory=False)
df2013 = pd.read_csv('data/data2013.csv', low_memory=False)
df2014 = pd.read_csv('data/data2014.csv', low_memory=False)
df2015 = pd.read_csv('data/data2015.csv', low_memory=False)
df2016 = pd.read_csv('data/data2016.csv', low_memory=False)
df2017 = pd.read_csv('data/data2017.csv', low_memory=False)

In [24]:
# Loading complete datasets

In [25]:
df2018 = pd.read_csv('data/data2018.csv', low_memory=False)
df2019 = pd.read_csv('data/data2019.csv', low_memory=False)
df2020 = pd.read_csv('data/data2020.csv', low_memory=False)
df2021 = pd.read_csv('data/data2021.csv', low_memory=False)

In [26]:
# Filtering data (selecting features)
features_list = ['Country','Mh','Man','Mk','Cn','m (kg)','Enedc (g/km)', 'Ft', 'Fm', 'ec (cm3)']

In [27]:
df2010 = df2010[features_list]
df2011 = df2011[features_list]
df2012 = df2012[features_list]
df2013 = df2013[features_list]
df2014 = df2014[features_list]
df2015 = df2015[features_list]
df2016 = df2016[features_list]
df2017 = df2017[features_list]

In [28]:
df2018 = df2018[features_list]
df2019 = df2019[features_list]
df2020 = df2020[features_list]
df2021 = df2021[features_list]

In [29]:
# Let's check what we have

In [30]:
df2010.head()

,Country,Mh,Man,Mk,Cn,m (kg),Enedc (g/km),Ft,Fm,ec (cm3)
0,GB,OPEL,ADAM OPEL GMBH,VAUXHALL,ZAFIRA SRI XP 150 CDTI A,1613.0,191.0,diesel,M,1910.0
1,GB,OPEL,ADAM OPEL GMBH,VAUXHALL,ZAFIRA ACTIVE,1503.0,177.0,petrol,M,1796.0
2,GB,OPEL,ADAM OPEL GMBH,VAUXHALL,AGILA DESIGN,1160.0,120.0,diesel,M,1248.0
3,GB,OPEL,ADAM OPEL GMBH,VAUXHALL,ASTRA SRI TURBO,1393.0,138.0,petrol,M,1364.0
4,GB,OPEL,ADAM OPEL GMBH,VAUXHALL,ZAFIRA ELITE CDTI AUTO,1613.0,186.0,diesel,M,1910.0


### List of countries and country representativeness

In [26]:
# For each dataset, compute each country's share of all records

In [27]:
country2010 = (df2010.pivot_table(columns=['Country'], aggfunc='size'))/len(df2010)
country2011 = (df2011.pivot_table(columns=['Country'], aggfunc='size'))/len(df2011)
country2012 = (df2012.pivot_table(columns=['Country'], aggfunc='size'))/len(df2012)
country2013 = (df2013.pivot_table(columns=['Country'], aggfunc='size'))/len(df2013)
country2014 = (df2014.pivot_table(columns=['Country'], aggfunc='size'))/len(df2014)
country2015 = (df2015.pivot_table(columns=['Country'], aggfunc='size'))/len(df2015)
country2016 = (df2016.pivot_table(columns=['Country'], aggfunc='size'))/len(df2016)
country2017 = (df2017.pivot_table(columns=['Country'], aggfunc='size'))/len(df2017)

In [28]:
country2018 = (df2018.pivot_table(columns=['Country'], aggfunc='size'))/len(df2018)
country2019 = (df2019.pivot_table(columns=['Country'], aggfunc='size'))/len(df2019)
country2020 = (df2020.pivot_table(columns=['Country'], aggfunc='size'))/len(df2020)
country2021 = (df2021.pivot_table(columns=['Country'], aggfunc='size'))/len(df2021)

In [29]:
Country_summary = pd.concat([country2010,country2011,country2012, country2013,country2014,country2015,
                             country2016,country2017,country2018, country2019,country2020,country2021],axis=1)

In [30]:
Country_summary.columns = range(2010, 2022)

In [31]:
# Country representativeness
Country_summary

Country,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
AT,0.046132,0.041682,0.047403,0.039672,0.068575,0.130459,0.136581,0.005394,0.022302,0.021206,0.020969,0.024097
BE,0.060588,0.066032,0.071693,0.06764,0.066452,0.065667,0.059594,0.110642,0.035878,0.035525,0.036786,0.038688
BG,0.009165,0.009877,0.009262,0.008292,0.008803,0.008424,0.007889,0.005231,0.00187,0.002267,0.001887,0.002446
CY,0.006565,0.005121,0.003891,0.0028,0.003278,0.003395,0.003859,0.002571,0.000836,0.000775,0.000838,0.001056
CZ,0.022837,0.019455,0.01867,0.01888,0.027073,0.031502,0.02128,0.044568,0.015578,0.015806,0.016807,0.020265
DE,0.075093,0.096119,0.143143,0.1664,0.167367,0.13689,0.126412,0.013503,0.221682,0.228246,0.241259,0.255051
DK,0.016426,0.021328,0.023955,0.020788,0.02201,0.021074,0.019683,0.044302,0.01403,0.01437,0.016646,0.018217
EE,0.035841,0.008052,0.008348,0.008028,0.006238,0.00787,0.008036,0.005188,0.001726,0.001764,0.001571,0.00225
ES,0.043001,0.038021,0.017866,0.038784,0.069134,0.05346,0.033427,0.003171,0.090359,0.087177,0.078292,0.091576
FI,0.01721,0.025961,0.024809,0.019949,0.021582,0.021857,0.020373,0.002058,0.007595,0.007029,0.00797,0.00959


From the table above, Iceland and Norway are not represented in the first 8 and 9 datasets, respectively, and Hungary is missing from the first 4. There is no UK data in 2021, and there is no data at all for Switzerland. Next, we transform the DataFrame into the long format that Altair expects, using the pandas melt function.

In [32]:
Country_summary = Country_summary.reset_index()
Country_summary = Country_summary.melt(id_vars=['Country'])
Country_summary['variable'] = pd.to_datetime(Country_summary['variable'], format='%Y')
Country_summary.head()

,Country,variable,value
0,AT,2010-01-01,0.046132
1,BE,2010-01-01,0.060588
2,BG,2010-01-01,0.009165
3,CY,2010-01-01,0.006565
4,CZ,2010-01-01,0.022837
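
In the long table, a country that is absent from a yearly file shows up as a row with a missing share, because `pd.concat(..., axis=1)` fills unmatched index labels with `NaN`. A sketch of how such gaps could be listed, using invented shares and placeholder country codes (`AA`, `BB`, `CC`):

```python
import numpy as np
import pandas as pd

# Invented long-format shares; 'AA', 'BB', 'CC' are placeholder country codes.
long_shares = pd.DataFrame({
    'Country':  ['AA', 'BB', 'CC', 'AA', 'BB', 'CC'],
    'variable': pd.to_datetime(['2017', '2017', '2017', '2018', '2018', '2018'],
                               format='%Y'),
    'value':    [0.6, 0.4, np.nan, 0.5, 0.3, 0.2],
})

# Rows with a missing share mark a country absent from that year's file.
gaps = long_shares[long_shares['value'].isna()]
absent = gaps.groupby(gaps['variable'].dt.year)['Country'].apply(list).to_dict()
print(absent)
```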


In [33]:
Country_representativity_per_year = alt.Chart(Country_summary, title="Country representativity per year"
                                             ).mark_bar(size=25).encode(
    x=alt.X('value:Q', stack=True, title='Share', scale=alt.Scale(domain=[0, 1.0])),
    y=alt.Y('variable:T',title='Year'),
    color='Country',
    tooltip=['Country', alt.Tooltip('value', title='Share', format='.1%')]
).properties(width = 800, height = 400).configure_axis(
    grid=False,
    labelFontSize=12,
    titleFontSize=14
).configure_title(fontSize=16)
Country_representativity_per_year

From the visualization above, we can see that the 2017 dataset differs from the others: 50% of its cars were registered in France.
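
The same kind of observation can be checked numerically by looking up the largest share per year. The sketch below uses invented numbers and placeholder country codes. It only illustrates the `groupby` plus `idxmax` pattern and does not reproduce the real shares.

```python
import pandas as pd

# Invented long-format shares for two years.
long_shares = pd.DataFrame({
    'Country':  ['AA', 'BB', 'CC', 'AA', 'BB', 'CC'],
    'variable': [2016, 2016, 2016, 2017, 2017, 2017],
    'value':    [0.30, 0.45, 0.25, 0.50, 0.30, 0.20],
})

# idxmax returns the row label of the largest share within each year.
top = long_shares.loc[long_shares.groupby('variable')['value'].idxmax()]
print(top)
```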

### Mass of the vehicle, kg

In [31]:
mass2010 = df2010['m (kg)']
mass2011 = df2011['m (kg)']
mass2012 = df2012['m (kg)']
mass2013 = df2013['m (kg)']
mass2014 = df2014['m (kg)']
mass2015 = df2015['m (kg)']
mass2016 = df2016['m (kg)']
mass2017 = df2017['m (kg)']
mass2018 = df2018['m (kg)']
mass2019 = df2019['m (kg)']
mass2020 = df2020['m (kg)']
mass2021 = df2021['m (kg)']
mass_list = [mass2010, mass2011, mass2012, 
             mass2013, mass2014, mass2015, mass2016, mass2017, mass2018, mass2019, mass2020, mass2021]

In [32]:
sample_mass_1000 = []
for i in mass_list:
    clean = i.dropna()
    k = clean.sample(n = 1000)
    sample_mass_1000.append(k.reset_index(drop=True))
mass_df=pd.concat(sample_mass_1000,axis=1)
mass_df.columns = range(2010,2022)

In [33]:
mass_df

,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,1055.0,1202.0,1575.0,1635.0,1540.0,2195.0,1395.0,1225.0,1698.0,1200.0,1800.0,1395.0
1,1280.0,2043.0,1298.0,975.0,1510.0,1701.0,1571.0,1165.0,1385.0,1370.0,1607.0,1305.0
2,1585.0,1228.0,2330.0,1146.0,2381.0,1110.0,1772.0,925.0,1829.0,1441.0,2169.0,978.0
3,2275.0,1733.0,1350.0,1289.0,1126.0,1614.0,1278.0,1035.0,1160.0,1755.0,1280.0,1393.0
4,1054.0,1595.0,1264.0,2515.0,1107.0,1505.0,1660.0,1329.0,1254.0,1150.0,1533.0,1639.0
...,...,...,...,...,...,...,...,...,...,...,...,...
995,1597.0,940.0,1698.0,1054.0,1370.0,1337.0,1735.0,1180.0,1157.0,1590.0,1305.0,1636.0
996,2550.0,2650.0,1199.0,1631.0,1345.0,1275.0,1618.0,1265.0,1495.0,1343.0,930.0,930.0
997,1517.0,2254.0,1575.0,1075.0,2141.0,929.0,1891.0,1393.0,1250.0,1445.0,1055.0,1165.0
998,1280.0,1005.0,1179.0,1725.0,1139.0,1465.0,1695.0,1525.0,1805.0,1197.0,1350.0,1627.0


In [34]:
mass_df = mass_df.melt()

In [35]:
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

In [36]:
box_plot_mass = alt.Chart(mass_df, title = 'Vehicle mass box plot').mark_boxplot(size=15, extent=0.5, color = 'red').encode(
    x=alt.X('variable:O', title = 'Year'),
    y=alt.Y('value:Q',scale=alt.Scale(zero=False), title = 'Vehicle mass, kg'),
    #color=alt.Color('variable',legend=None)
).properties(width=250)
box_plot_mass

### Emission New European Driving Cycle (ENEDC), g/km

In [37]:
ENEDC2010 = df2010['Enedc (g/km)']
ENEDC2011 = df2011['Enedc (g/km)']
ENEDC2012 = df2012['Enedc (g/km)']
ENEDC2013 = df2013['Enedc (g/km)']
ENEDC2014 = df2014['Enedc (g/km)']
ENEDC2015 = df2015['Enedc (g/km)']
ENEDC2016 = df2016['Enedc (g/km)']
ENEDC2017 = df2017['Enedc (g/km)']
ENEDC2018 = df2018['Enedc (g/km)']
ENEDC2019 = df2019['Enedc (g/km)']
ENEDC2020 = df2020['Enedc (g/km)']
ENEDC2021 = df2021['Enedc (g/km)']
ENEDC_list = [ENEDC2010, ENEDC2011, ENEDC2012, 
             ENEDC2013, ENEDC2014, ENEDC2015, ENEDC2016, ENEDC2017, ENEDC2018, ENEDC2019, ENEDC2020, ENEDC2021]

In [38]:
sample_ENEDC_1000 = []
for i in ENEDC_list:
    clean = i.dropna()
    k = clean.sample(n = 1000)
    sample_ENEDC_1000.append(k.reset_index(drop=True))
ENEDC_df=pd.concat(sample_ENEDC_1000,axis=1)
ENEDC_df.columns = range(2010,2022)

In [39]:
ENEDC_df = ENEDC_df.melt()

In [40]:
box_plot_ENEDC = alt.Chart(ENEDC_df, title = 'Vehicle ENEDC box plot').mark_boxplot(size=15, extent=0.5, color = 'blue').encode(
    x=alt.X('variable:O', title = 'Year'),
    y=alt.Y('value:Q',scale=alt.Scale(zero=False), title = 'Vehicle ENEDC g/km'),
    #color=alt.Color('variable',legend=None)
).properties(width=250)
box_plot_ENEDC

### Engine capacity, cm3

In [41]:
ENGINE_CAP2010 = df2010['ec (cm3)']
ENGINE_CAP2011 = df2011['ec (cm3)']
ENGINE_CAP2012 = df2012['ec (cm3)']
ENGINE_CAP2013 = df2013['ec (cm3)']
ENGINE_CAP2014 = df2014['ec (cm3)']
ENGINE_CAP2015 = df2015['ec (cm3)']
ENGINE_CAP2016 = df2016['ec (cm3)']
ENGINE_CAP2017 = df2017['ec (cm3)']
ENGINE_CAP2018 = df2018['ec (cm3)']
ENGINE_CAP2019 = df2019['ec (cm3)']
ENGINE_CAP2020 = df2020['ec (cm3)']
ENGINE_CAP2021 = df2021['ec (cm3)']
ENGINE_CAP_list = [ENGINE_CAP2010, ENGINE_CAP2011, ENGINE_CAP2012, 
             ENGINE_CAP2013, ENGINE_CAP2014, ENGINE_CAP2015, ENGINE_CAP2016, ENGINE_CAP2017, ENGINE_CAP2018, ENGINE_CAP2019, ENGINE_CAP2020, ENGINE_CAP2021]

In [42]:
sample_ENGINE_CAP_1000 = []
for i in ENGINE_CAP_list:
    clean = i.dropna()
    k = clean.sample(n = 1000)
    sample_ENGINE_CAP_1000.append(k.reset_index(drop=True))
ENGINE_CAP_df=pd.concat(sample_ENGINE_CAP_1000,axis=1)
ENGINE_CAP_df.columns = range(2010,2022)

In [43]:
ENGINE_CAP_df = ENGINE_CAP_df.melt()

In [44]:
box_plot_ENGINE = alt.Chart(ENGINE_CAP_df, title = 'Vehicle engine capacity box plot').mark_boxplot(size=15, extent=0.5, color = 'green').encode(
    x=alt.X('variable:O', title = 'Year'),
    y=alt.Y('value:Q',scale=alt.Scale(zero=False), title = 'Vehicle engine capacity cm3'),
    #color=alt.Color('variable',legend=None)
).properties(width=250)
box_plot_ENGINE

In [45]:
alt.hconcat(box_plot_mass, box_plot_ENEDC, box_plot_ENGINE).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_title(fontSize=16)

From the plots above, vehicle mass stayed roughly the same throughout 2010-2021, except in 2017. Average emissions and engine capacity both declined over the years. The ENEDC distribution for the 2021 dataset differs from the other years (recall that about 75% of its values are missing).
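
The three sampling cells above follow one pattern: drop missing values, draw 1000 values per year, and line the samples up as columns. A sketch of a shared helper is shown below. The name `sample_per_year` and the `seed` argument are hypothetical additions (the original cells sample without a fixed seed), and the input data is randomly generated rather than taken from the real files.

```python
import numpy as np
import pandas as pd

def sample_per_year(series_by_year, n=1000, seed=0):
    """Draw n non-null values from each year's Series; return a wide DataFrame."""
    samples = {
        year: s.dropna().sample(n=n, random_state=seed).reset_index(drop=True)
        for year, s in series_by_year.items()
    }
    return pd.DataFrame(samples)

# Randomly generated stand-ins for two yearly 'm (kg)' columns.
rng = np.random.default_rng(0)
toy = {
    2010: pd.Series(rng.normal(1400, 200, 5000)),
    2011: pd.Series(np.append(rng.normal(1420, 200, 4999), np.nan)),
}

wide = sample_per_year(toy, n=1000)
# Quartiles per year give a quick numeric counterpart to the box plots.
print(wide.describe().loc[['25%', '50%', '75%']])
```

Fixing the seed makes the box plots reproducible between runs, at the cost of always drawing the same sample.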

## 5. Topography dataset

We used the geopandas library to read the spatial geometry of each country and visualize it in Altair. If geopandas is not installed in your Jupyter environment, uncomment the pip install line in the next cell. The topography dataset was downloaded from https://github.com/leakyMirror/map-of-europe/blob/master/TopoJSON/europe.topojson . TopoJSON files contain both attribute data (country id, NAME) and geospatial data (geometry).

In [46]:
#! pip install geopandas

In [3]:
country_topo = gpd.read_file('europe.topojson')

In [4]:
country_topo.head(10)

,id,NAME,geometry
0,AZ,Azerbaijan,"MULTIPOLYGON (((46.17921 38.84211, 46.07431 38..."
1,AL,Albania,"POLYGON ((19.37115 41.85084, 19.34118 41.91335..."
2,AM,Armenia,"MULTIPOLYGON (((45.51238 40.60901, 45.49739 40..."
3,BA,Bosnia and Herzegovina,"POLYGON ((17.64788 42.88431, 17.58045 42.93848..."
4,BG,Bulgaria,"POLYGON ((28.00996 41.98419, 27.97250 41.98419..."
5,CY,Cyprus,"POLYGON ((33.65180 35.34998, 33.71174 35.38332..."
6,DK,Denmark,"MULTIPOLYGON (((11.51154 54.83172, 11.56399 54..."
7,IE,Ireland,"MULTIPOLYGON (((-9.65469 53.22317, -9.70713 53..."
8,EE,Estonia,"MULTIPOLYGON (((23.99400 58.09882, 23.96403 58..."
9,AT,Austria,"POLYGON ((13.82672 48.77259, 13.85669 48.77259..."
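
The `id` column holds two-letter codes that look like the `Country` codes in the registration data, so the two tables could presumably be joined on them. The sketch below assumes the codes match and uses plain pandas with the geometry column left out. The shares are the 2010 values for AT and BG from the country table earlier in this notebook.

```python
import pandas as pd

# Stand-in for the attribute columns of country_topo (geometry omitted).
topo_attrs = pd.DataFrame({'id': ['AT', 'BG', 'AL'],
                           'NAME': ['Austria', 'Bulgaria', 'Albania']})

# 2010 shares copied from the country representativeness table.
shares_2010 = pd.DataFrame({'Country': ['AT', 'BG'],
                            'share': [0.046132, 0.009165]})

# A left join keeps every map shape; indicator=True shows which ones got data.
joined = topo_attrs.merge(shares_2010, how='left',
                          left_on='id', right_on='Country', indicator=True)
print(joined[['id', 'NAME', 'share', '_merge']])
```

Rows marked `left_only` are countries on the map with no registration data, which is expected for non-EU countries such as Albania.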


In [5]:
alt.Chart(country_topo).mark_geoshape(
    fill='#555566',
    stroke='white'
).project(
    type= 'mercator',
    scale= 350,                          # Magnify
    center= [20,50],                     # [lon, lat]
    clipExtent= [[0, 0], [400, 300]],    # [[left, top], [right, bottom]]
).properties(
    title='Europe (Mercator)',
    width=400, height=300
)

## 6. Test plots

In [59]:
test = df2019[['Country','m (kg)','Enedc (g/km)', 'Ft','ec (cm3)']]
sample = test.sample(15000)

In [60]:
eng_cap_CO2 = alt.Chart(sample).mark_point().encode(
    x='ec (cm3)',
    y='Enedc (g/km)'
)

In [61]:
mass_CO2 = alt.Chart(sample).mark_point().encode(
    x='m (kg)',
    y='Enedc (g/km)'
)

In [62]:
eng_cap_CO2 | mass_CO2

From the quick plots above, we see that cars with low CO2 (ENEDC) emissions form a group that can be separated from the rest. We focused our analysis on vehicles with low CO2 emissions, which include electric, hybrid, and conventional-fuel cars (so not only EVs).
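
One simple way to separate such a group would be a cut-off on the ENEDC value. The sketch below is an illustration only: the 50 g/km threshold, the fuel type labels, and the rows themselves are assumptions made for this example, not values from the study.

```python
import pandas as pd

# Invented rows; fuel type labels are placeholders, not the dataset's actual codes.
toy = pd.DataFrame({
    'Ft': ['diesel', 'petrol', 'electric', 'petrol/electric'],
    'Enedc (g/km)': [191.0, 138.0, 0.0, 45.0],
})

LOW_CO2_MAX = 50.0  # hypothetical cut-off in g/km, not taken from the source

# Keep only the rows at or below the cut-off.
low = toy[toy['Enedc (g/km)'] <= LOW_CO2_MAX]
print(low['Ft'].value_counts())
```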