# Eurostat
Code to collect and process Eurostat data to create the following indicators:

* Private sector R&D workforce
* Business Enterprise R&D (BERD)
* Share of high growth firms

Raw data is collected from the Eurostat API using the `EuroStat API Client` Python package (https://pypi.org/project/eurostatapiclient/).

## Preamble

In [1]:
from eurostatapiclient import EurostatAPIClient

import numpy as np
import pandas as pd

In [2]:
VERSION = 'v2.1'
FORMAT = 'json'
LANGUAGE = 'en'

In [3]:
client = EurostatAPIClient(VERSION, FORMAT, LANGUAGE)

### Mappings

In [4]:
nuts2_map = {
    'UKC1': 'Tees Valley and Durham',
    'UKC2': 'Northumberland and Tyne and Wear',
    'UKD1': 'Cumbria',
    'UKD6': 'Cheshire',
    'UKD3': 'Greater Manchester',
    'UKD4': 'Lancashire',
    'UKD7': 'Merseyside',
    'UKE1': 'East Riding and North Lincolnshire', 
    'UKE2': 'North Yorkshire',
    'UKE3': 'South Yorkshire',
    'UKE4': 'West Yorkshire', 
    'UKF1': 'Derbyshire and Nottinghamshire',
    'UKF2': 'Leicestershire, Rutland and Northamptonshire', 
    'UKF3': 'Lincolnshire', 
    'UKG1': 'Herefordshire, Worcestershire and Warwickshire',
    'UKG2': 'Shropshire and Staffordshire',
    'UKG3': 'West Midlands', 
    'UKH1': 'East Anglia', 
    'UKH2': 'Bedfordshire and Hertfordshire',
    'UKH3': 'Essex',
    'UKI3': 'Inner London - West', 
    'UKI4': 'Inner London - East',
    'UKI5': 'Outer London - East and North East',
    'UKI6': 'Outer London - South', 
    'UKI7': 'Outer London - West and North West',
    'UKJ1': 'Berkshire, Buckinghamshire, and Oxfordshire', 
    'UKJ2': 'Surrey, East and West Sussex',
    'UKJ3': 'Hampshire and Isle of Wight', 
    'UKJ4': 'Kent', 
    'UKK1': 'Gloucestershire, Wiltshire and Bristol/Bath area', 
    'UKK2': 'Dorset and Somerset', 
    'UKK3': 'Cornwall and Isles of Scilly',
    'UKK4': 'Devon',
    'UKL1': 'West Wales and The Valleys',
    'UKL2': 'East Wales', 
    'UKM2': 'Eastern Scotland', 
    'UKM3': 'South Western Scotland', 
    'UKM5': 'North Eastern Scotland',
    'UKM6': 'Highlands and Islands', 
    'UKN0': 'Northern Ireland', 
    'UKZZ': 'Extra-regio NUTS 2'
    
}

In [5]:
vars_map = {
    'EUR_HAB': 'Euro per inhabitant',
    'MIO_EUR': 'Euros (Millions)',
    'FTE': 'Full time equivalent (FTE)',
    'HC': 'Head Count(HC)', 
    'PC_ACT_FTE': '% of active population- FTE',
    'PC_ACT_HC': '% of active population- HC'
}
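Because `Series.map` with a dictionary returns `NaN` for any key it does not find, a code missing from either mapping would silently become a missing label. The sketch below shows one way to list such codes before mapping. The `codes` series and the two-entry `lookup` are hypothetical stand-ins, not data from the API.

```python
import pandas as pd

# Hypothetical codes; 'UKXX' is deliberately absent from the lookup.
codes = pd.Series(['UKC1', 'UKD1', 'UKXX'])
lookup = {'UKC1': 'Tees Valley and Durham', 'UKD1': 'Cumbria'}

# map() yields NaN wherever a code has no entry in the dictionary.
labels = codes.map(lookup)

# Collect the distinct codes that failed to map.
unmapped = sorted(codes[labels.isna()].unique())
print(unmapped)
```

An empty list would indicate that every code in the column has a label.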

## Data Collection, Processing & Transformation

This section has three subsections, one for each indicator. Each subsection follows the same steps:

* Query the Eurostat API through the Python package and load the flattened result into a dataframe
* Subset the data to UK NUTS2 regions
* Replace the region and unit codes with their labels
* Pivot the data into the desired output format
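The steps listed above can be sketched without network access on a small hand-made dataframe. The column names (`values`, `unit`, `geo`, `time`) follow the flattened output used in this notebook, but the rows and numbers in `raw` are invented for illustration, and the two short dictionaries are cut-down versions of the mappings defined earlier.

```python
import pandas as pd

# Hypothetical flattened records standing in for the API response;
# the numbers are made up for illustration only.
raw = pd.DataFrame({
    'values': [10.0, 20.0, 11.0, 21.0, 5.0],
    'unit': ['FTE', 'HC', 'FTE', 'HC', 'FTE'],
    'geo': ['UKC1', 'UKC1', 'UKC1', 'UKC1', 'AT11'],
    'time': ['2012', '2012', '2013', '2013', '2012'],
})

geo_labels = {'UKC1': 'Tees Valley and Durham'}
unit_labels = {'FTE': 'Full time equivalent (FTE)', 'HC': 'Head Count(HC)'}

# 1. Keep UK regions only; .copy() gives an independent dataframe,
#    so the assignments below do not act on a slice of `raw`.
uk = raw[raw['geo'].str.contains('UK')].copy()

# 2. Replace region and unit codes with their labels.
uk['geo'] = uk['geo'].map(geo_labels)
uk['unit'] = uk['unit'].map(unit_labels)

# 3. One row per region and year, one column per unit.
wide = uk.pivot_table(index=['geo', 'time'], columns='unit', values='values')
wide = wide.reset_index().set_index('geo')
print(wide)
```

The result has one row per region and year, with `time` and one column per unit label.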

### Private sector R&D workforce data

In [7]:
#pull in data
data_priv_nuts2 = client.get_dataset('rd_p_persreg?sinceTimePeriod=2012&geoLevel=nuts2&precision=1&sex=T&sectperf=BES&prof_pos=TOTAL&unit=FTE&unit=HC&unit=PC_ACT_FTE&unit=PC_ACT_HC')

print(data_priv_nuts2.label)

dataframe_priv_nuts2 = data_priv_nuts2.to_dataframe()

Total R&D personnel and researchers by sectors of performance, sex and NUTS 2 regions
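The query strings passed to `get_dataset` repeat a parameter (for example `unit`) once per requested value. A minimal sketch of assembling such a string from a dictionary with the standard library is shown below; `build_query` is a hypothetical helper and not part of the `eurostatapiclient` package.

```python
from urllib.parse import urlencode


def build_query(dataset, params):
    """Hypothetical helper: join a dataset code and its filters.

    With doseq=True, list values are expanded into repeated
    key=value pairs, e.g. unit=FTE&unit=HC.
    """
    return f"{dataset}?{urlencode(params, doseq=True)}"


query = build_query('rd_p_persreg', {
    'sinceTimePeriod': 2012,
    'geoLevel': 'nuts2',
    'precision': 1,
    'sex': 'T',
    'sectperf': 'BES',
    'prof_pos': 'TOTAL',
    'unit': ['FTE', 'HC', 'PC_ACT_FTE', 'PC_ACT_HC'],
})

# The resulting string could then be passed to client.get_dataset(query).
print(query)
```

Keeping the filters in a dictionary makes it easier to see which dimensions differ between the three indicator queries.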


In [9]:
#UK NUTS2 regions subset (copied so the mappings below modify an independent dataframe)
dataframe_priv_nuts2_uk = dataframe_priv_nuts2[dataframe_priv_nuts2['geo'].str.contains('UK')].copy()

In [10]:
#mappings
dataframe_priv_nuts2_uk['geo'] = dataframe_priv_nuts2_uk['geo'].map(nuts2_map)
dataframe_priv_nuts2_uk['unit'] = dataframe_priv_nuts2_uk['unit'].map(vars_map)


In [27]:
#pivot table
d_priv = dataframe_priv_nuts2_uk.pivot_table(index=['geo','time'],
               columns = 'unit',
               values = 'values').reset_index().set_index('geo')

In [29]:
#save data
d_priv.to_csv('../../data/processed/eurostat/17_12_2019_eurostat_private_rd_data.csv')

### Business Enterprise R&D (BERD) data

In [39]:
#pull in data
data_berd_nuts2 = client.get_dataset('rd_e_gerdreg?sinceTimePeriod=2012&geoLevel=nuts2&precision=1&sectperf=BES&unit=EUR_HAB&unit=MIO_EUR')

print(data_berd_nuts2.label)

dataframe_berd_nuts2 = data_berd_nuts2.to_dataframe()

In [42]:
#UK NUTS2 regions subset (copied so the mappings below modify an independent dataframe)
dataframe_berd_nuts2_uk = dataframe_berd_nuts2[dataframe_berd_nuts2['geo'].str.contains('UK')].copy()

In [44]:
#mappings
dataframe_berd_nuts2_uk['geo'] = dataframe_berd_nuts2_uk['geo'].map(nuts2_map)
dataframe_berd_nuts2_uk['unit'] = dataframe_berd_nuts2_uk['unit'].map(vars_map)


In [48]:
#pivot table
d_berd = dataframe_berd_nuts2_uk.pivot_table(index=['geo','time'],
               columns = 'unit',
               values = 'values').reset_index().set_index('geo')

In [49]:
#save data
d_berd.to_csv('../../data/processed/eurostat/17_12_2019_eurostat_berd_data.csv')

### Share of high growth firms

In [86]:
#pull in data
data_share_nuts2 = client.get_dataset('bd_hgnace2_r3?sinceTimePeriod=2012&geoLevel=nuts2&precision=1&indic_sb=V97460&nace_r2=B-E&nace_r2=B-S_X_K642&nace_r2=F&nace_r2=G&nace_r2=H&nace_r2=I&nace_r2=J&nace_r2=K_L_X_K642&nace_r2=M_N&nace_r2=P_Q&nace_r2=R_S')

print(data_share_nuts2.label)

dataframe_share_nuts2 = data_share_nuts2.to_dataframe()

In [76]:
dataframe_share_nuts2[dataframe_share_nuts2['geo'] == 'UKC1']

Unnamed: 0,values,indic_sb,nace_r2,geo,time
0,7.80,V97460,B-S_X_K642,AT11,2012
1,7.13,V97460,B-S_X_K642,AT11,2013
2,6.16,V97460,B-S_X_K642,AT11,2014
3,5.76,V97460,B-S_X_K642,AT11,2015
4,7.03,V97460,B-S_X_K642,AT11,2016
...,...,...,...,...,...
895,9.67,V97460,B-S_X_K642,SK04,2012
896,10.03,V97460,B-S_X_K642,SK04,2013
897,10.04,V97460,B-S_X_K642,SK04,2014
898,11.51,V97460,B-S_X_K642,SK04,2015


In [65]:
#UK NUTS2 regions subset

dataframe_share_nuts2_uk = dataframe_share_nuts2[dataframe_share_nuts2['geo'].str.contains('UK')]

In [67]:
dataframe_share_nuts2_uk

Unnamed: 0,values,indic_sb,nace_r2,geo,time


In [66]:
dataframe_share_nuts2_uk.pivot_table(index=['geo','time'],
               columns = 'indic_sb',
               values = 'values')

geo,time


Note: This dataset does not appear to contain any values for UK NUTS2 regions, so the UK subset and the pivot table above are empty.
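One way to confirm which countries a dataset covers is to look at the two-letter prefix of each `geo` code. The sketch below uses a hypothetical `geo` series containing only the `AT11` and `SK04` codes visible in the output above; on the real dataframe the same expression would be applied to `dataframe_share_nuts2['geo']`.

```python
import pandas as pd

# Hypothetical geo column; the real one would come from the API response.
geo = pd.Series(['AT11', 'AT11', 'SK04', 'SK04'])

# The first two characters of a NUTS code identify the country.
countries = sorted(geo.str[:2].unique())
has_uk = 'UK' in countries

print(countries, has_uk)
```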