# Zuhlke - NewAutoMotive Hackathon



## Data sources

### Setup

In [3]:
import requests
import pandas as pd
from pandas_ods_reader import read_ods

## 1. New Vehicle Registrations by fuel type, quarterly

The Department for Transport (DfT) and its statistics team publish a large amount of vehicle information.  
The main page for this information is here: https://www.gov.uk/government/collections/vehicles-statistics  
We've selected a few tables that are of particular interest. The code below downloads the data file to the `data` folder, and pulls a relevant part of the spreadsheet into a dataframe.

In [4]:
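file_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/917425/veh0253.ods'
filepath = 'data/veh0253.ods'

# Download the spreadsheet and save it locally (the `data` folder must already exist)
r = requests.get(file_url)
r.raise_for_status()  # stop here if the download failed
with open(filepath, 'wb') as f:
    f.write(r.content)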

50803
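
The same download pattern is repeated for every data source in this notebook. A minimal sketch of a reusable helper follows; the names `local_path` and `download`, the `overwrite` flag and the 60-second timeout are assumptions introduced here, not part of the original notebook.

```python
from pathlib import Path
from urllib.parse import urlparse

import requests


def local_path(url, folder="data"):
    """Return the local path for `url`: the folder plus the URL's file name."""
    return Path(folder) / Path(urlparse(url).path).name


def download(url, folder="data", overwrite=False):
    """Download `url` into `folder` and return the local path.

    The folder is created if it is missing. An existing file is reused
    unless `overwrite` is True, so re-running a cell does not re-download.
    """
    path = local_path(url, folder)
    if path.exists() and not overwrite:
        return path
    path.parent.mkdir(parents=True, exist_ok=True)
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail loudly rather than saving an error page
    path.write_bytes(response.content)
    return path
```

With this in place, each download cell could shrink to `filepath = download(file_url)`. The helper returns a `Path`, so wrap it in `str(...)` if a reader function expects a plain string.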

In [8]:
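# Load the first sheet of the spreadsheet
df_raw = read_ods(filepath, 1)

# Row 6 holds the column headers; map the placeholder column names onto them
headers = df_raw.iloc[6, :].values
cols = dict(zip(df_raw.columns, headers))

# Rows 27 to 104 hold the quarterly figures (2001 Q1 to 2020 Q2)
df = (df_raw
      .iloc[27:105, :]
      .rename(columns=cols)
      .reset_index(drop=True)
     )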

In [10]:
df.columns = ['Date', 'Petrol', 'Diesel', 'Hybrid Electric',
       'Plug-in Hybrid Electric', 'Battery Electric',
       'Range-Extended Electric', 'Fuel Cell Electric', 'Gas', 'Other',
       'Total', 'Alternative Fuels']

In [18]:
df.head(1)

Unnamed: 0,Date,Petrol,Diesel,Hybrid Electric,Plug-in Hybrid Electric,Battery Electric,Range-Extended Electric,Fuel Cell Electric,Gas,Other,Total,Alternative Fuels
0,2001 Q1,598.295,108.329,0.248,0,0.015,0,0,1.035,0.001,707.923,1.299


In [17]:
df.tail(1)

Unnamed: 0,Date,Petrol,Diesel,Hybrid Electric,Plug-in Hybrid Electric,Battery Electric,Range-Extended Electric,Fuel Cell Electric,Gas,Other,Total,Alternative Fuels
77,2020 Q2,105.211,29.892,14.498,5.736,12.639,0.006,0.007,0.074,0,168.063,32.96


In [15]:
# what is the % share of battery electric vehicles in 2020 Q2?
q2 = df[df['Date'] == '2020 Q2']
round((q2['Battery Electric'] / q2['Total']).values[0] * 100, 2)

7.52
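
The same ratio can be computed for every quarter at once, which gives the battery electric share over time rather than for a single quarter. The sketch below uses only the two rows shown in the head/tail output above; the helper name `bev_share` and the new column name are assumptions.

```python
import pandas as pd

# Two rows copied from the head/tail output above
sample = pd.DataFrame({
    "Date": ["2001 Q1", "2020 Q2"],
    "Battery Electric": [0.015, 12.639],
    "Total": [707.923, 168.063],
})


def bev_share(frame):
    """Percentage of new registrations that are battery electric, per row."""
    battery = frame["Battery Electric"].astype(float)
    total = frame["Total"].astype(float)
    return (battery / total * 100).round(2)


sample["BEV share %"] = bev_share(sample)
print(sample[["Date", "BEV share %"]])
```

Applied to the full `df`, `bev_share(df)` should return one value per quarter, ready for plotting against `Date`.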

## 2. Miles travelled by vehicle make, model and fuelType 2015-2019

In [37]:
file_url =  'https://storage.googleapis.com/new_automotive/yearly_mileage_make-model-fuelType_2015-2019.csv'
filepath = f'data/yearly_mileage_make-model-fuelType_2015-2019.csv'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [45]:
df = pd.read_csv('data/yearly_mileage_make-model-fuelType_2015-2019.csv')

In [46]:
df.head()

Unnamed: 0.1,Unnamed: 0,mileage,y,make,model,fuelType
0,0,10737400000.0,2017,FORD,TRANSIT,Diesel
1,1,10494190000.0,2018,FORD,TRANSIT,Diesel
2,2,10168430000.0,2016,FORD,TRANSIT,Diesel
3,3,9952579000.0,2019,FORD,TRANSIT,Diesel
4,4,9661517000.0,2018,FORD,FIESTA,Petrol


## 3. CO2 Emissions over time

There are two datasets: one broken down by year of manufacture and fuel type only, and one that also includes make and model.  
These have been aggregated from a set of 1.5m vehicles sampled from the DVLA vehicle checker API.  
Try it yourself here: https://vehicleenquiry.service.gov.uk/

In [32]:
file_url =  'https://storage.googleapis.com/new_automotive/avg_co2Emissions_by_fuelType_yearOfManufacture.csv'
filepath = f'data/avg_co2Emissions_by_fuelType_yearOfManufacture.csv'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [47]:
file_url =  'https://storage.googleapis.com/new_automotive/avg_co2Emissions_by_fuelType_yearOfManufacture_make_model.csv'
filepath = f'data/avg_co2Emissions_by_fuelType_yearOfManufacture_make_model.csv'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [48]:
df = pd.read_csv(filepath)

In [49]:
df.head()

Unnamed: 0,avg_co2Emissions,avg_engineCapacity,fuelType,yearOfManufacture,make,model
0,,3995.25,PETROL,1967.0,ASTON MARTIN,DB6
1,,6750.0,PETROL,1976.0,ROLLS ROYCE,SILVER SHADOW 1
2,,4235.0,PETROL,1974.0,DAIMLER,DS420
3,,2303.222222,PETROL,1987.0,BMW,3 SERIES
4,,1414.1,PETROL,1973.0,VOLKSWAGEN,BEETLE
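
The first rows shown above have no `avg_co2Emissions` value, so any trend over time has to skip them. A minimal sketch of that step follows. The first two rows are copied from the output above; the last two rows, including their makes, models and emission values, are hypothetical and exist only to make the example run.

```python
import pandas as pd

# Rows 0-1 are copied from the output above; rows 2-3 are hypothetical,
# invented purely to illustrate the calculation.
sample = pd.DataFrame({
    "avg_co2Emissions": [None, None, 120.0, 100.0],
    "fuelType": ["PETROL", "PETROL", "PETROL", "PETROL"],
    "yearOfManufacture": [1967.0, 1976.0, 2015.0, 2015.0],
    "make": ["ASTON MARTIN", "ROLLS ROYCE", "EXAMPLE MAKE", "EXAMPLE MAKE"],
    "model": ["DB6", "SILVER SHADOW 1", "MODEL A", "MODEL B"],
})

# Keep only the rows that actually carry a CO2 figure
with_co2 = sample.dropna(subset=["avg_co2Emissions"])

# Unweighted mean across models for each fuel type and year
summary = (with_co2
           .groupby(["fuelType", "yearOfManufacture"], as_index=False)["avg_co2Emissions"]
           .mean())
print(summary)
```

The mean here is unweighted because the columns shown above include no vehicle count; joining a count from another dataset would allow a weighted average instead.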


## 4. EV Growth by Local Authority (LA) over time

In [34]:
file_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/853462/veh0131.ods'
filepath = f'data/veh0131.ods'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [35]:
# Load the first sheet of the spreadsheet
df_raw = read_ods(filepath, 1)

# Row 5 holds the column headers; map the placeholder column names onto them
headers = df_raw.iloc[5, :].values
cols = dict(zip(df_raw.columns, headers))

# Rows 6 to 477 hold the data
df = (df_raw
      .iloc[6:478, :]
      .rename(columns=cols)
      .reset_index(drop=True)
     )

# dropping null regions
df = df.dropna(subset=['ONS LA Code', 'Region/Local Authority'])

df.head()

Unnamed: 0,ONS LA Code,Region/Local Authority,2020 Q2,2020 Q1,2019 Q4,2019 Q3,2019 Q2,2019 Q1,2018 Q4,2018 Q3,...,2014 Q1,2013 Q4,2013 Q3,2013 Q2,2013 Q1,2012 Q4,2012 Q3,2012 Q2,2012 Q1,2011 Q4
0,K02000001,United Kingdom,300931,283910,253957,230811,211440,199886,186407,172220,...,13616,11868,10905,10122,9213,8606,7843,7211,6563,6228
1,K03000001,Great Britain,297305,280453,250865,227881,208677,197217,183909,169827,...,13427,11706,10760,9995,9119,8530,7778,7169,6537,6206
2,E92000001,England,268326,255106,225804,204890,187854,177784,166265,153128,...,11933,10426,9578,8849,8135,7544,6998,6420,5909,5705
3,E12000001,North East,4666,4565,4112,3917,3613,3448,3384,3198,...,413,343,321,290,256,235,226,202,182,189
4,E06000047,County Durham UA,930,906,786,745,714,673,671,631,...,69,58,53,39,30,27,26,22,20,39


This data blends Local Authority and Region codes.  
To disentangle them, each code needs to be mapped to either an LA or a Region.  

LA codes: https://data.gov.uk/dataset/24d87ad2-0fa9-4b35-816a-89f9d92b0042/local-authority-districts-april-2020-names-and-codes-in-the-united-kingdom

GeoJSON and CSV versions are available.
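
One way to do the split is to check each code against the LA lookup: codes found in the lookup are Local Authorities, and everything else is a region, country or national total. The sketch below uses rows copied from the output above and a one-row stand-in for the lookup file. The column names `LAD20CD` and `LAD20NM` are an assumption about that file and should be checked after downloading it.

```python
import pandas as pd

# Rows copied from the output above (only the latest quarter kept)
ev = pd.DataFrame({
    "ONS LA Code": ["K02000001", "K03000001", "E92000001", "E12000001", "E06000047"],
    "Region/Local Authority": ["United Kingdom", "Great Britain", "England",
                               "North East", "County Durham UA"],
    "2020 Q2": [300931, 297305, 268326, 4666, 930],
})

# Stand-in for the LA lookup CSV; the column names are assumed, not verified
la_lookup = pd.DataFrame({"LAD20CD": ["E06000047"], "LAD20NM": ["County Durham"]})

# A code that appears in the lookup is a Local Authority; the rest are not
is_la = ev["ONS LA Code"].isin(la_lookup["LAD20CD"])
df_la = ev[is_la].reset_index(drop=True)
df_region = ev[~is_la].reset_index(drop=True)

print(df_la)
print(df_region)
```

Codes that fail to match are worth inspecting by hand, since boundary changes between years can leave a genuine Local Authority out of a single-year lookup.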


## 5. What type of vehicles are there?

A DfT spreadsheet lists vehicle Make and Model by vehicle type, with one tab per vehicle type. Some Makes / Models appear in more than one category. Can we build a mapping from each Make / Model to its predominant vehicle type (for example, the type with the highest count) and use it to slice the other datasets by vehicle type?  

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/917200/veh0120.ods
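
A possible approach to the question above is to stack the tabs into one long table, then keep the vehicle type with the highest count for each Make / Model. The sketch below assumes that long table already exists. Every name and number in it is hypothetical, including the column names `vehicle_type`, `make`, `model` and `count`.

```python
import pandas as pd

# Hypothetical long-format table: one row per (vehicle type, make, model).
# All names and counts are invented for illustration; in practice each tab
# of veh0120.ods would be read and tagged with its vehicle type.
counts = pd.DataFrame({
    "vehicle_type": ["Cars", "Light goods vehicles", "Cars", "Motorcycles"],
    "make": ["MAKE A", "MAKE A", "MAKE B", "MAKE C"],
    "model": ["MODEL X", "MODEL X", "MODEL Y", "MODEL Z"],
    "count": [120, 4500, 300, 50],
})

# For each make/model, keep the row with the largest count
idx = counts.groupby(["make", "model"])["count"].idxmax()
predominant = counts.loc[idx, ["make", "model", "vehicle_type"]].reset_index(drop=True)

# Use the mapping to tag another (also hypothetical) dataset
other = pd.DataFrame({
    "make": ["MAKE A", "MAKE B"],
    "model": ["MODEL X", "MODEL Y"],
    "mileage": [1.0, 2.0],
})
tagged = other.merge(predominant, on=["make", "model"], how="left")
print(tagged)
```

A left merge keeps every row of the other dataset, so Makes / Models missing from the mapping show up with an empty `vehicle_type` rather than disappearing.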





We also have an aggregated set of 35m active vehicles (vehicles with an in-date MOT as of around September 2020).



In [41]:
file_url = 'https://storage.googleapis.com/new_automotive/active_vehicle_counts.csv'
filepath = f'data/active_vehicle_counts.csv'

r = requests.get(file_url)
with open(filepath, 'wb') as f:
    f.write(r.content)

In [42]:
df = pd.read_csv(filepath)

In [43]:
df.head()

Unnamed: 0,count,avg_engineSize,make,model,avg_year,fuelType,primaryColour
0,34243,,,,,,
1,2949,996.03,SUZUKI,ALTO,2011.4,Petrol,Pink
2,33,399.85,BSA,UNKNOWN,1955.94,Petrol,Not Stated
3,1,2500.0,LDV,CONVOY,2001.0,Diesel,Pink
4,51,996.0,SUZUKI,ALTO SZ4 AUTO,2012.39,Petrol,Pink
