# Zuhlke - NewAutoMotive Hackathon



## Data sources

### Setup

In [2]:
import requests
import pandas as pd
from pandas_ods_reader import read_ods

### 1. New Vehicle Registrations by vehicle type, quarterly

The Department for Transport (DfT) and their statistical team publishes a large amount of vehicle information.  
The main page for this information is here: https://www.gov.uk/government/collections/vehicles-statistics  
We've selected a few tables that are of particular interest. The code below downloads the data file to the `data` folder, and pulls a relevant part of the spreadsheet into a dataframe.

In [2]:
file_url =  'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/917425/veh0253.ods'
filepath = f'data/veh0253.ods'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [3]:
df_raw = read_ods(filepath, 1)

headers = df_raw.iloc[6, :].values
cols = dict(zip(df_raw.columns, df_raw.iloc[6, :].values))

df = (df_raw
          .iloc[27: 105, :]
          .rename(columns=cols)
          .reset_index(drop=True)
         )

In [4]:
df.columns = ['Date', 'Petrol', 'Diesel', 'Hybrid Electric',
       'Plug-in Hybrid Electric', 'Battery Electric',
       'Range-Extended Electric', 'Fuel Cell Electric', 'Gas', 'Other',
       'Total', 'Alternative Fuels']

In [5]:
df.head()

Unnamed: 0,Date,Petrol,Diesel,Hybrid Electric,Plug-in Hybrid Electric,Battery Electric,Range-Extended Electric,Fuel Cell Electric,Gas,Other,Total,Alternative Fuels
0,2001 Q1,598.295,108.329,0.248,0,0.015,0,0,1.035,0.001,707.923,1.299
1,2001 Q2,512.979,105.089,0.157,0,0.009,0,0,0.321,0.004,618.559,0.491
2,2001 Q3,596.655,132.522,0.155,0,0.043,0,0,0.18,0.002,729.557,0.38
3,2001 Q4,414.583,114.706,0.075,0,0.01,0,0,0.567,0.002,529.943,0.654
4,2002 Q1,591.202,166.614,0.074,0,0.004,0,0,0.794,0.0,758.688,0.872


In [6]:
df.tail()

Unnamed: 0,Date,Petrol,Diesel,Hybrid Electric,Plug-in Hybrid Electric,Battery Electric,Range-Extended Electric,Fuel Cell Electric,Gas,Other,Total,Alternative Fuels
73,2019 Q2,370.189,154.288,25.047,6.548,6.238,0.013,0.012,0.009,0.001,562.345,37.868
74,2019 Q3,387.98,147.991,32.441,7.845,12.727,0.026,0.028,0.012,0.002,589.052,53.081
75,2019 Q4,285.114,113.55,26.604,11.934,12.68,0.007,0.014,0.002,0.001,449.906,51.242
76,2020 Q1,306.607,103.165,38.253,13.518,18.086,0.01,0.011,0.047,0.0,479.697,69.925
77,2020 Q2,105.211,29.892,14.498,5.736,12.639,0.006,0.007,0.074,0.0,168.063,32.96


In [7]:
# what is the % share of battery electric vehicles in 2020 Q2?
round((df.tail(1)[['Battery Electric']].sum().sum() / df[df['Date']=='2020 Q2']['Total']).values[0] * 100, 2)

7.52

## 2. Miles travelled by vehicle make, model and fuelType 2015-2019

In [8]:
file_url =  'https://storage.googleapis.com/new_automotive/yearly_mileage_make-model-fuelType_2015-2019.csv'
filepath = f'data/yearly_mileage_make-model-fuelType_2015-2019.csv'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [9]:
df_miles = pd.read_csv('data/yearly_mileage_make-model-fuelType_2015-2019.csv')

In [10]:
df_miles.head()

Unnamed: 0.1,Unnamed: 0,mileage,y,make,model,fuelType
0,0,10737400000.0,2017,FORD,TRANSIT,Diesel
1,1,10494190000.0,2018,FORD,TRANSIT,Diesel
2,2,10168430000.0,2016,FORD,TRANSIT,Diesel
3,3,9952579000.0,2019,FORD,TRANSIT,Diesel
4,4,9661517000.0,2018,FORD,FIESTA,Petrol


## 3. CO2 Emissions over time

Two datasets, one with just age and fueltype, and one with make and model.  
These have been aggregated from a set of 1.5m vehicles sampled from the DVLA vehicle checker API.  
Try it yourself here: https://vehicleenquiry.service.gov.uk/

In [11]:
file_url =  'https://storage.googleapis.com/new_automotive/avg_co2Emissions_by_fuelType_yearOfManufacture.csv'
filepath = f'data/avg_co2Emissions_by_fuelType_yearOfManufacture.csv'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [12]:
file_url =  'https://storage.googleapis.com/new_automotive/avg_co2Emissions_by_fuelType_yearOfManufacture_make_model.csv'
filepath = f'data/avg_co2Emissions_by_fuelType_yearOfManufacture_make_model.csv'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [13]:
df_emissions = pd.read_csv(filepath)

In [14]:
df_emissions.head()

Unnamed: 0,avg_co2Emissions,avg_engineCapacity,fuelType,yearOfManufacture,make,model
0,,3995.25,PETROL,1967.0,ASTON MARTIN,DB6
1,,6750.0,PETROL,1976.0,ROLLS ROYCE,SILVER SHADOW 1
2,,4235.0,PETROL,1974.0,DAIMLER,DS420
3,,2303.222222,PETROL,1987.0,BMW,3 SERIES
4,,1414.1,PETROL,1973.0,VOLKSWAGEN,BEETLE


In [15]:
df_emissions.tail()

Unnamed: 0,avg_co2Emissions,avg_engineCapacity,fuelType,yearOfManufacture,make,model
49235,0.0,124.0,PETROL,2017.0,PIAGGIO,LIBERTY 125
49236,0.0,6700.0,DIESEL,2017.0,DODGE,UNKNOWN
49237,0.0,125.0,PETROL,2017.0,HONDA,MSX
49238,234.0,5461.0,PETROL,2017.0,MERCEDES-BENZ,SL
49239,113.0,1995.0,DIESEL,2017.0,VAUXHALL,ADAM


## 4. EV Growth by Local Authority (LA) over time

In [3]:
file_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/853462/veh0131.ods'
filepath = f'data/veh0131.ods'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [4]:
df_raw = read_ods(filepath, 1)

headers = df_raw.iloc[5, :].values
cols = dict(zip(df_raw.columns, df_raw.iloc[5, :].values))

df_growth = (df_raw
      .iloc[6:478, :]
      .rename(columns=cols)
      .reset_index(drop=True)
     )

# dropping null regions
df_growth = df_growth[df_growth[['ONS LA Code', 'Region/Local Authority']].isnull().sum(axis=1) == 0]

df_growth.head()

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,2014 Q1,2013 Q4,2013 Q3,2013 Q2,2013 Q1,2012 Q4,2012 Q3,2012 Q2,2012 Q1,2011 Q4
0,K02000001,United Kingdom,300931,283910,253957,230811,211440,199886,186407,172220,...,13616,11868,10905,10122,9213,8606,7843,7211,6563,6228
1,K03000001,Great Britain,297305,280453,250865,227881,208677,197217,183909,169827,...,13427,11706,10760,9995,9119,8530,7778,7169,6537,6206
2,E92000001,England,268326,255106,225804,204890,187854,177784,166265,153128,...,11933,10426,9578,8849,8135,7544,6998,6420,5909,5705
3,E12000001,North East,4666,4565,4112,3917,3613,3448,3384,3198,...,413,343,321,290,256,235,226,202,182,189
4,E06000047,County Durham UA,930,906,786,745,714,673,671,631,...,69,58,53,39,30,27,26,22,20,39


In [5]:
len(df_growth)

441

### EV Growth by LAD Population

In [6]:
# Read LAD lookup
lad_lookup = pd.read_csv("data/ONS/Local_Authority_Districts_(December_2019)_Names_and_Codes_in_the_United_Kingdom.csv")

# Filter out non-LAD rows
df_lad = pd.merge(df_growth, lad_lookup, left_on='ONS LA Code', right_on='LAD19CD')

df_lad.head()

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,2013 Q1,2012 Q4,2012 Q3,2012 Q2,2012 Q1,2011 Q4,FID,LAD19CD,LAD19NM,LAD19NMW
0,E06000047,County Durham UA,930,906,786,745,714,673,671,631,...,30,27,26,22,20,39,45,E06000047,County Durham,
1,E06000005,Darlington UA,244,239,226,211,182,178,167,169,...,9,5,5,5,c,c,5,E06000005,Darlington,
2,E06000001,Hartlepool UA,109,105,96,90,82,77,74,72,...,c,c,c,c,c,c,1,E06000001,Hartlepool,
3,E06000002,Middlesbrough UA,86,84,80,82,72,66,61,53,...,c,c,c,c,c,c,2,E06000002,Middlesbrough,
4,E06000057,Northumberland UA,936,928,826,779,728,684,665,641,...,33,33,34,30,31,30,54,E06000057,Northumberland,


In [7]:
# df_lad = df_lad.drop(columns=['Region/Local Authority', 'FID', 'LAD19NMW'])
df_lad["2020"] = df_lad["2020 Q2"] + df_lad["2020 Q1"]
df_lad["2019"] = df_lad["2019 Q4"] + df_lad["2019 Q3"] + df_lad["2019 Q2"] + df_lad["2019 Q1"]
df_lad["2018"] = df_lad["2018 Q4"] + df_lad["2018 Q3"] + df_lad["2018 Q2"] + df_lad["2018 Q1"]
df_lad["2017"] = df_lad["2017 Q4"] + df_lad["2017 Q3"] + df_lad["2017 Q2"] + df_lad["2017 Q1"]
df_lad["2016"] = df_lad["2016 Q4"] + df_lad["2016 Q3"] + df_lad["2016 Q2"] + df_lad["2016 Q1"]
# df_lad["2015"] = df_lad["2015 Q4"] + df_lad["2015 Q3"] + df_lad["2015 Q2"] + df_lad["2015 Q1"]
# df_lad["2014"] = df_lad["2014 Q4"] + df_lad["2014 Q3"] + df_lad["2014 Q2"] + df_lad["2014 Q1"]
# df_lad["2013"] = df_lad["2013 Q4"] + df_lad["2013 Q3"] + df_lad["2013 Q2"] + df_lad["2013 Q1"]
# df_lad["2012"] = df_lad["2012 Q4"] + df_lad["2012 Q3"] + df_lad["2012 Q2"] + df_lad["2012 Q1"]
# df_lad["2011"] = df_lad["2011 Q4"] + df_lad["2011 Q3"] + df_lad["2011 Q2"] + df_lad["2011 Q1"]

In [8]:
# Read LAD population data
lad_pop = pd.read_csv("data/ONS/LAD_population.csv")
lad_pop = lad_pop[["Code", "All ages"]]
lad_pop["All ages"] = lad_pop["All ages"].str.replace(',', '').astype(float)

# Merge population and EV data
df_pop = pd.merge(df_lad, lad_pop, left_on="LAD19CD", right_on="Code")
df_pop = df_pop.drop(["Code"], axis=1)
df_pop = df_pop.rename(columns={"All ages": "population"})
df_pop.head()

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,FID,LAD19CD,LAD19NM,LAD19NMW,2020,2019,2018,2017,2016,population
0,E06000047,County Durham UA,930,906,786,745,714,673,671,631,...,45,E06000047,County Durham,,1836,2918,2499,2255,1996,530094.0
1,E06000005,Darlington UA,244,239,226,211,182,178,167,169,...,5,E06000005,Darlington,,483,797,639,485,293,106803.0
2,E06000001,Hartlepool UA,109,105,96,90,82,77,74,72,...,1,E06000001,Hartlepool,,214,345,279,203,122,93663.0
3,E06000002,Middlesbrough UA,86,84,80,82,72,66,61,53,...,2,E06000002,Middlesbrough,,170,300,218,180,176,140980.0
4,E06000057,Northumberland UA,936,928,826,779,728,684,665,641,...,54,E06000057,Northumberland,,1864,3017,2501,1981,1460,322434.0


In [9]:
# Normalise by total number of EVs
tot_2020 = df_pop["2020"].sum()
df_pop["norm_2020"] = df_pop["2020"] / tot_2020 * 100

tot_2019 = df_pop["2019"].sum()
df_pop["norm_2019"] = df_pop["2019"] / tot_2019 * 100

tot_2018 = df_pop["2018"].sum()
df_pop["norm_2018"] = df_pop["2018"] / tot_2018 * 100

tot_2017 = df_pop["2017"].sum()
df_pop["norm_2017"] = df_pop["2017"] / tot_2017 * 100

tot_2016 = df_lad["2016"].sum()
df_pop["norm_2016"] = df_pop["2016"] / tot_2016 * 100

df_pop.head()
# df_lad.to_csv("data/")

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,2019,2018,2017,2016,population,norm_2020,norm_2019,norm_2018,norm_2017,norm_2016
0,E06000047,County Durham UA,930,906,786,745,714,673,671,631,...,2918,2499,2255,1996,530094.0,0.323911,0.338247,0.392903,0.507839,0.687967
1,E06000005,Darlington UA,244,239,226,211,182,178,167,169,...,797,639,485,293,106803.0,0.0852118,0.0923862,0.100466,0.109225,0.100989
2,E06000001,Hartlepool UA,109,105,96,90,82,77,74,72,...,345,279,203,122,93663.0,0.0377543,0.0399915,0.0438655,0.0457168,0.0420501
3,E06000002,Middlesbrough UA,86,84,80,82,72,66,61,53,...,300,218,180,176,140980.0,0.0299917,0.0347752,0.0342748,0.0405371,0.0606625
4,E06000057,Northumberland UA,936,928,826,779,728,684,665,641,...,3017,2501,1981,1460,322434.0,0.32885,0.349723,0.393217,0.446133,0.503223


In [10]:
# Normalise by LAD population
df_pop["per_index_2020"] = df_pop["2020"] / df_pop["population"] * 100
df_pop["per_index_2019"] = df_pop["2019"] / df_pop["population"] * 100
df_pop["per_index_2018"] = df_pop["2018"] / df_pop["population"] * 100
df_pop["per_index_2017"] = df_pop["2017"] / df_pop["population"] * 100
df_pop["per_index_2016"] = df_pop["2016"] / df_pop["population"] * 100

# Cumulative
df_pop["cum_per_index_2016"] = df_pop["per_index_2016"]
df_pop["cum_per_index_2017"] = df_pop["per_index_2017"] + df_pop["cum_per_index_2016"]
df_pop["cum_per_index_2018"] = df_pop["per_index_2018"] + df_pop["cum_per_index_2017"]
df_pop["cum_per_index_2019"] = df_pop["per_index_2019"] + df_pop["cum_per_index_2018"]
df_pop["cum_per_index_2020"] = df_pop["per_index_2020"] + df_pop["cum_per_index_2019"]

df_pop.head()

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,per_index_2020,per_index_2019,per_index_2018,per_index_2017,per_index_2016,cum_per_index_2016,cum_per_index_2017,cum_per_index_2018,cum_per_index_2019,cum_per_index_2020
0,E06000047,County Durham UA,930,906,786,745,714,673,671,631,...,0.346354,0.550468,0.471426,0.425396,0.376537,0.376537,0.801933,1.27336,1.82383,2.17018
1,E06000005,Darlington UA,244,239,226,211,182,178,167,169,...,0.452234,0.746234,0.598298,0.454107,0.274337,0.274337,0.728444,1.32674,2.07298,2.52521
2,E06000001,Hartlepool UA,109,105,96,90,82,77,74,72,...,0.228479,0.368342,0.297876,0.216734,0.130254,0.130254,0.346989,0.644865,1.01321,1.24169
3,E06000002,Middlesbrough UA,86,84,80,82,72,66,61,53,...,0.120584,0.212796,0.154632,0.127678,0.12484,0.12484,0.252518,0.40715,0.619946,0.740531
4,E06000057,Northumberland UA,936,928,826,779,728,684,665,641,...,0.578103,0.935695,0.775663,0.614389,0.452806,0.452806,1.0672,1.84286,2.77855,3.35666


In [11]:
df_pop.to_csv("data/ev_growth_analysis.csv")

### Correlation with Index of Multiple Deprivation

In [36]:
# Read deprivation data
deprivation_en = pd.read_csv("data/ONS/Index_of_Multiple_Deprivation_(December_2019)_Lookup_in_England.csv")
deprivation_en = deprivation_en.groupby('LAD19CD').sum('IMD19').drop(columns=['FID'])

deprivation_wa = pd.read_csv("data/ONS/Index_of_Multiple_Deprivation_(December_2019)_Lookup_in_Wales.csv")
deprivation_wa = deprivation_wa.rename(columns={'ladcd': 'LAD19CD', 'wimd_2019': 'IMD19'})
deprivation_wa = deprivation_wa.groupby('LAD19CD').sum('IMD19').drop(columns=['FID'])

deprivation_idx = pd.concat([deprivation_en, deprivation_wa], axis=0)
deprivation_idx.head()

Unnamed: 0_level_0,IMD19
LAD19CD,Unnamed: 1_level_1
E06000001,602966
E06000002,790303
E06000003,1097446
E06000004,1857259
E06000005,913636


In [48]:
# Merge with EV data
df_depr = pd.merge(df_pop, deprivation_idx, how='right', on='LAD19CD')
df_depr["imd_by_pop"] = df_depr['IMD19'] / df_pop["population"]

dep_total = df_depr['IMD19'].sum()
df_depr["imd_normalised"] = df_depr['IMD19'] / dep_total * 100

df_depr.to_csv("data/ev_growth_and_deprivation.csv")
df_depr.tail()

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,per_index_2017,per_index_2016,cum_per_index_2016,cum_per_index_2017,cum_per_index_2018,cum_per_index_2019,cum_per_index_2020,IMD19,imd_by_pop,imd_normalised
334,W06000020,Torfaen,108,110,94,91,80,75,72,66,...,0.208597,0.160705,0.160705,0.369302,0.651334,1.01319,1.2452,51014,0.22309,0.009426
335,W06000021,Monmouthshire,340,332,295,270,243,218,201,197,...,0.571942,0.313987,0.313987,0.885929,1.69151,2.77619,3.48663,71222,0.272662,0.01316
336,W06000022,Newport,248,251,229,202,189,178,159,152,...,0.273475,0.192661,0.192661,0.466136,0.842406,1.35832,1.68093,79832,0.687022,0.014751
337,W06000023,Powys,340,323,294,267,241,216,197,177,...,0.373768,0.252954,0.252954,0.626723,1.15075,1.91943,2.42006,90010,1.048212,0.016631
338,W06000024,Merthyr Tydfil,46,42,43,39,37,33,29,27,...,0.0845407,0.0547028,0.0547028,0.139243,0.305009,0.556974,0.702848,22218,0.431083,0.004105


### EV Growth by Region

In [9]:
# Read LAD lookup
region_lookup = pd.read_csv("data/ONS/Regions__December_2019__Boundaries_EN_BUC.csv")
# Filter out non-LAD rows
df_region = pd.merge(df_growth, region_lookup, left_on='ONS LA Code', right_on='rgn19cd')
df_region= df_region.drop(columns=['rgn19nm', 'bng_e', 'bng_n', 'long', 'lat', 'st_areashape', 'st_lengthshape'])
df_region

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,2013 Q3,2013 Q2,2013 Q1,2012 Q4,2012 Q3,2012 Q2,2012 Q1,2011 Q4,objectid,rgn19cd
0,E12000001,North East,4666,4565,4112,3917,3613,3448,3384,3198,...,321,290,256,235,226,202,182,189,1,E12000001
1,E12000002,North West,15093,14542,13227,12143,11019,10506,9822,9266,...,883,739,627,552,526,469,437,426,2,E12000002
2,E12000003,Yorkshire and The Humber,21526,20653,17283,15611,14545,13562,12049,10931,...,413,388,377,342,315,284,261,244,3,E12000003
3,E12000004,East Midlands,15181,14597,13171,12210,11296,10555,9858,9026,...,590,590,548,526,392,359,332,324,4,E12000004
4,E12000005,West Midlands,42275,39428,34635,31336,28932,27407,26152,24106,...,1531,1370,1229,1149,857,745,546,523,5,E12000005
5,E12000006,East,35721,34125,31011,28470,26329,25588,24312,22755,...,747,661,587,528,615,544,494,347,6,E12000006
6,E12000007,London,37044,35145,32425,28906,25548,23613,20623,18493,...,1866,1789,1720,1666,1616,1559,1533,1576,7,E12000007
7,E12000007,Inner London,15754,14917,13964,12257,10951,9829,8499,7539,...,992,960,949,936,916,910,906,941,7,E12000007
8,E12000007,Outer London,21252,20191,18412,16596,14539,13728,12065,10897,...,856,810,753,710,682,632,608,615,7,E12000007
9,E12000008,South East,62332,59005,52261,47250,43455,40943,38655,35612,...,1994,1925,1853,1774,1790,1720,1642,1613,8,E12000008


In [10]:
# df_lad = df_lad.drop(columns=['Region/Local Authority', 'FID', 'LAD19NMW'])
df_region["2020"] = df_region["2020 Q2"] + df_region["2020 Q1"]
df_region["2019"] = df_region["2019 Q4"] + df_region["2019 Q3"] + df_region["2019 Q2"] + df_region["2019 Q1"]
df_region["2018"] = df_region["2018 Q4"] + df_region["2018 Q3"] + df_region["2018 Q2"] + df_region["2018 Q1"]
df_region["2017"] = df_region["2017 Q4"] + df_region["2017 Q3"] + df_region["2017 Q2"] + df_region["2017 Q1"]
df_region["2016"] = df_region["2016 Q4"] + df_region["2016 Q3"] + df_region["2016 Q2"] + df_region["2016 Q1"]

In [14]:
# Read LAD population data
region_pop = pd.read_csv("data/ONS/LAD_population.csv")
region_pop = region_pop[["Code", "All ages"]]
region_pop["All ages"] = region_pop["All ages"].str.replace(',', '').astype(float)

# Merge population and EV data
df_pop_region = pd.merge(df_region, region_pop, left_on="rgn19cd", right_on="Code")
df_pop_region = df_pop_region.drop(["Code"], axis=1)
df_pop_region = df_pop_region.rename(columns={"All ages": "population"})
df_pop_region

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,2012 Q1,2011 Q4,objectid,rgn19cd,2020,2019,2018,2017,2016,population
0,E12000001,North East,4666,4565,4112,3917,3613,3448,3384,3198,...,182,189,1,E12000001,9231,15090,12580,10508,8564,2669941.0
1,E12000002,North West,15093,14542,13227,12143,11019,10506,9822,9266,...,437,426,2,E12000002,29635,46895,35619,26488,19011,7341196.0
2,E12000003,Yorkshire and The Humber,21526,20653,17283,15611,14545,13562,12049,10931,...,261,244,3,E12000003,42179,61001,42195,29478,18491,5502967.0
3,E12000004,East Midlands,15181,14597,13171,12210,11296,10555,9858,9026,...,332,324,4,E12000004,29778,47232,34422,23643,15322,4835928.0
4,E12000005,West Midlands,42275,39428,34635,31336,28932,27407,26152,24106,...,546,523,5,E12000005,81703,122310,93620,66651,43041,5934037.0
5,E12000006,East,35721,34125,31011,28470,26329,25588,24312,22755,...,494,347,6,E12000006,69846,111398,90580,70377,44391,6236072.0
6,E12000007,London,37044,35145,32425,28906,25548,23613,20623,18493,...,1533,1576,7,E12000007,72189,110492,71080,47903,29938,8961989.0
7,E12000007,Inner London,15754,14917,13964,12257,10951,9829,8499,7539,...,906,941,7,E12000007,30671,47001,29210,19968,13034,8961989.0
8,E12000007,Outer London,21252,20191,18412,16596,14539,13728,12065,10897,...,608,615,7,E12000007,41443,63275,41689,27815,16793,8961989.0
9,E12000008,South East,62332,59005,52261,47250,43455,40943,38655,35612,...,1642,1613,8,E12000008,121337,183909,135184,89553,60494,9180135.0


In [22]:
# Normalise by LAD population
df_pop_region["per_index_2020"] = df_pop_region["2020"] / df_pop_region["population"] * 100
df_pop_region["per_index_2019"] = df_pop_region["2019"] / df_pop_region["population"] * 100
df_pop_region["per_index_2018"] = df_pop_region["2018"] / df_pop_region["population"] * 100
df_pop_region["per_index_2017"] = df_pop_region["2017"] / df_pop_region["population"] * 100
df_pop_region["per_index_2016"] = df_pop_region["2016"] / df_pop_region["population"] * 100

# Cumulative
df_pop_region["cum_per_index_2016"] = df_pop_region["per_index_2016"]
df_pop_region["cum_per_index_2017"] = df_pop_region["per_index_2017"] + df_pop_region["cum_per_index_2016"]
df_pop_region["cum_per_index_2018"] = df_pop_region["per_index_2018"] + df_pop_region["cum_per_index_2017"]
df_pop_region["cum_per_index_2019"] = df_pop_region["per_index_2019"] + df_pop_region["cum_per_index_2018"]
df_pop_region["cum_per_index_2020"] = df_pop_region["per_index_2020"] + df_pop_region["cum_per_index_2019"]

# df_pop_region = df_pop_region.drop(index=[7, 8])
df_pop_region

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,per_index_2020,per_index_2019,per_index_2018,per_index_2017,per_index_2016,cum_per_index_2016,cum_per_index_2017,cum_per_index_2018,cum_per_index_2019,cum_per_index_2020
0,E12000001,North East,4666,4565,4112,3917,3613,3448,3384,3198,...,0.345738,0.565181,0.471171,0.393567,0.320756,0.320756,0.714323,1.18549,1.75068,2.09641
1,E12000002,North West,15093,14542,13227,12143,11019,10506,9822,9266,...,0.403681,0.638792,0.485193,0.360813,0.258963,0.258963,0.619776,1.10497,1.74376,2.14744
2,E12000003,Yorkshire and The Humber,21526,20653,17283,15611,14545,13562,12049,10931,...,0.766477,1.10851,0.766768,0.535675,0.336019,0.336019,0.871693,1.63846,2.74697,3.51345
3,E12000004,East Midlands,15181,14597,13171,12210,11296,10555,9858,9026,...,0.615766,0.976689,0.711797,0.488903,0.316837,0.316837,0.80574,1.51754,2.49423,3.10999
4,E12000005,West Midlands,42275,39428,34635,31336,28932,27407,26152,24106,...,1.37685,2.06116,1.57768,1.1232,0.725324,0.725324,1.84852,3.4262,5.48736,6.86421
5,E12000006,East,35721,34125,31011,28470,26329,25588,24312,22755,...,1.12003,1.78635,1.45252,1.12855,0.711842,0.711842,1.84039,3.29291,5.07926,6.19929
6,E12000007,London,37044,35145,32425,28906,25548,23613,20623,18493,...,0.805502,1.2329,0.793128,0.534513,0.334055,0.334055,0.868568,1.6617,2.89459,3.70009
9,E12000008,South East,62332,59005,52261,47250,43455,40943,38655,35612,...,1.32173,2.00334,1.47257,0.975509,0.658966,0.658966,1.63447,3.10705,5.11038,6.43212
10,E12000009,South West,34488,33046,27679,25047,23117,22162,21410,19741,...,1.20067,1.74241,1.31124,0.853593,0.541967,0.541967,1.39556,2.7068,4.4492,5.64987


In [23]:
df_pop_region.to_csv("data/ev_growth_analysis_by_region.csv")

This data blends Local Authority and Region codes.  
To disentangle, the codes need to map to either a LA or a Region.  

LA codes: https://data.gov.uk/dataset/24d87ad2-0fa9-4b35-816a-89f9d92b0042/local-authority-districts-april-2020-names-and-codes-in-the-united-kingdom

GeoJson and CSV is available.


## 5. What type of vehicles are there?

A DfT spreadsheet provides vehicle Make and Model by vehicle type (the tabs are different vehicle types). But some Makes / Models appear across multiple categories. Can we create a matching for Make / Model to a predominant vehicle type (maybe by count), and use that to slice other datasets by vehicle type?  

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/917200/veh0120.ods





We also have an aggregated set of 35m active vehicles (vehicles with MOTs in date at around September 2020).



In [18]:
file_url = 'https://storage.googleapis.com/new_automotive/active_vehicle_counts.csv'
filepath = f'data/active_vehicle_counts.csv'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [19]:
df_types = pd.read_csv(filepath)

In [20]:
df_types.head()

Unnamed: 0,count,avg_engineSize,make,model,avg_year,fuelType,primaryColour
0,34243,,,,,,
1,2949,996.03,SUZUKI,ALTO,2011.4,Petrol,Pink
2,33,399.85,BSA,UNKNOWN,1955.94,Petrol,Not Stated
3,1,2500.0,LDV,CONVOY,2001.0,Diesel,Pink
4,51,996.0,SUZUKI,ALTO SZ4 AUTO,2012.39,Petrol,Pink
