In [1]:
from gssutils import *

if is_interactive():
    scraper = Scraper("https://www.gov.uk/government/statistics/regional-trade-in-goods-statistics-dis-aggregated-by-smaller-geographical-areas-2017")

In [2]:
scraper

## Regional trade in goods statistics disaggregated by smaller geographical areas: 2017

International trade in goods data at summary product and country level, by UK areas smaller than NUTS1.

### Description

HM Revenue & Customs (HMRC) collects the UK’s international trade in goods
data, which are published as two National Statistics series - the ‘Overseas
Trade in Goods Statistics (OTS)’ and the ‘Regional Trade in Goods Statistics
(RTS)’. The RTS are published quarterly showing trade at summary product and
country level, split by UK regions and devolved administrations.

This release provides statistics for the 2017 calendar year. It breaks down the
RTS into smaller UK geographical areas. RTS data and related products are
categorised by partner country and [Standard International Trade
Classification,
Rev.4](http://unstats.un.org/unsd/cr/registry/regcst.asp?Cl=28) (SITC).

In this release data is analysed mainly at partner country and SITC section
(1-digit) level. The collection and publication methodology for the RTS and
this release is available on
[www.uktradeinfo.com](https://www.uktradeinfo.com/Pages/Home.aspx).

  *[HMRC]: HM Revenue & Customs
  *[OTS]: Overseas Trade in Goods Statistics
  *[RTS]: Regional Trade in Goods Statistics
  *[SITC]: Standard International Trade Classification



### Distributions

1. Regional trade in goods statistics disaggregated by smaller geographical areas: Commentary 2017 ([application/pdf](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/754754/Local_Area_Commentary_2017.pdf))
1. Regional trade in goods statistics disaggregated by smaller geographical areas: Data Tables 2017 ([MS Excel Spreadsheet](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/763405/Local_Area_Tables_2017.xls))
1. Regional trade in goods statistics disaggregated by smaller geographical areas: Interactive Data Tool 2017 ([application/vnd.ms-excel.sheet.macroEnabled.12](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/763409/Local_Area_Interactive_Tables_2017.xlsm))


In [3]:
tabs = {tab.name: tab for tab in scraper.distribution(title=lambda t: 'Data Tables' in t).as_databaker()}
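The `title=` argument in the cell above is a predicate applied to each distribution title. As a rough illustration of that selection step only, and not of how gssutils implements it, the sketch below filters the three titles listed under Distributions. The `select_one` helper is hypothetical.

```python
# Titles copied from the Distributions list above.
TITLES = [
    "Regional trade in goods statistics disaggregated by smaller geographical areas: Commentary 2017",
    "Regional trade in goods statistics disaggregated by smaller geographical areas: Data Tables 2017",
    "Regional trade in goods statistics disaggregated by smaller geographical areas: Interactive Data Tool 2017",
]


def select_one(titles, predicate):
    """Hypothetical helper: return the single title matching predicate, else raise."""
    matches = [t for t in titles if predicate(t)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


selected = select_one(TITLES, lambda t: 'Data Tables' in t)
print(selected)
```

A looser predicate such as `'Data' in t` would also match the Interactive Data Tool title, so the more specific `'Data Tables'` string avoids that ambiguity.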

In [4]:
tab = tabs['T1 NUTS1 (Summary Data)']

In [5]:
tidy = pd.DataFrame()

In [6]:
# Build header bags by filling down from each column heading. The Flow cells are
# unioned into the geography and NUTS bags so that observations in the summary
# tables, which lack those columns, pick up the Flow cell instead (relabelled later).
flow = tab.filter('Flow').fill(DOWN).is_not_blank().is_not_whitespace()
geography = tab.filter('EU / Non-EU').fill(DOWN).is_not_blank().is_not_whitespace() | flow
nut = tab.filter('NUTS1').fill(DOWN).is_not_blank().is_not_whitespace() | flow
observations = tab.filter('Statistical Value (£ million)').fill(DOWN).is_not_blank().is_not_whitespace()
# Exclude any "Source: HMRC ..." footnote text caught by the fill.
observations = observations.filter(lambda x: not isinstance(x.value, str) or 'HMRC' not in x.value)
Dimensions = [
    HDim(flow, 'Flow', DIRECTLY, LEFT),
    HDim(geography, 'HMRC Partner Geography', DIRECTLY, LEFT),
    HDim(nut, 'NUTS Geography', DIRECTLY, LEFT),
    HDimConst('SITC 4', 'all'),
    HDimConst('Measure Type', 'GBP Total'),
    HDimConst('Unit', 'gbp-million'),
    HDimConst('Year', '2017'),
]
c1 = ConversionSegment(observations, Dimensions, processTIMEUNIT=True)
table1 = c1.topandas()
tidy = pd.concat([tidy, table1])

In [7]:
savepreviewhtml(c1)

0,1,2,3
OBS,Flow,HMRC Partner Geography,NUTS Geography

0,1,2,3,4,5,6,7,8,9,10,11,12,13
Table 1 - NUTS1 by Flow - 2017,,,,,,,,,,,,,
,,,,,,,,,,,,,
Flow,EU / Non-EU,NUTS1,Statistical Value (£ million),Business Count,,Flow,EU / Non-EU,Statistical Value (£ million),Business Count,,Flow,Statistical Value (£ million),Business Count
Exp,EU,East,15248.0,13962.0,,Exp,EU,162271.0,119843.0,,Exp,328380.0,153046.0
Exp,EU,East Midlands,10712.0,10405.0,,Exp,Non-EU,166109.0,75901.0,,Imp,468384.0,232537.0
Exp,EU,London,15181.0,20688.0,,Imp,EU,256208.0,163463.0,,,,Source: HMRC Regional Trade in Goods Statistics data
Exp,EU,North East,7613.0,3377.0,,Imp,Non-EU,212176.0,114498.0,,,,
Exp,EU,North West,14165.0,12781.0,,,,,Source: HMRC Regional Trade in Goods Statistics data,,,,
Exp,EU,Northern Ireland,4964.0,8298.0,,,,,,,,,
Exp,EU,Scotland,13987.0,6783.0,,,,,,,,,


In [8]:
observations1 = tab.filter('Business Count').fill(DOWN).is_not_blank().is_not_whitespace()
# Exclude the "Source: HMRC ..." footnote cells that sit in the Business Count columns.
observations1 = observations1.filter(lambda x: not isinstance(x.value, str) or 'HMRC' not in x.value)
Dimensions = [
    HDim(flow, 'Flow', DIRECTLY, LEFT),
    HDim(geography, 'HMRC Partner Geography', DIRECTLY, LEFT),
    HDim(nut, 'NUTS Geography', DIRECTLY, LEFT),
    HDimConst('Measure Type', 'Count of Businesses'),
    HDimConst('SITC 4', 'all'),
    HDimConst('Unit', 'businesses'),
    HDimConst('Year', '2017'),
]
c2 = ConversionSegment(observations1, Dimensions, processTIMEUNIT=True)
table2 = c2.topandas()
# sort=True keeps the alphabetical column order and silences the pandas FutureWarning
# raised when the two frames list their columns in a different order.
tidy = pd.concat([tidy, table2], sort=True)

In [9]:
savepreviewhtml(c2)


In [10]:
tidy

Unnamed: 0,DATAMARKER,Flow,HMRC Partner Geography,Measure Type,NUTS Geography,OBS,SITC 4,Unit,Year
0,,Exp,EU,GBP Total,East,15248,all,gbp-million,2017
1,,Exp,EU,GBP Total,Exp,162271,all,gbp-million,2017
2,,Exp,Exp,GBP Total,Exp,328380,all,gbp-million,2017
3,,Exp,EU,GBP Total,East Midlands,10712,all,gbp-million,2017
4,,Exp,Non-EU,GBP Total,Exp,166109,all,gbp-million,2017
5,,Imp,Imp,GBP Total,Imp,468384,all,gbp-million,2017
6,,Exp,EU,GBP Total,London,15181,all,gbp-million,2017
7,,Imp,EU,GBP Total,Imp,256208,all,gbp-million,2017
8,,Exp,EU,GBP Total,North East,7613,all,gbp-million,2017
9,,Imp,Non-EU,GBP Total,Imp,212176,all,gbp-million,2017
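Several rows in the output above carry `Exp` or `Imp` in the geography columns because the Flow cells were unioned into the `geography` and `nut` bags. The sketch below reproduces that behaviour in plain Python for the first data row of the preview. It assumes that a `DIRECTLY, LEFT` lookup picks the nearest matching cell to the left in the same row; that reading is inferred from the output, and `nearest_left` and `lookup` are hypothetical helpers rather than databaker functions.

```python
# First data row of the 'T1 NUTS1 (Summary Data)' preview, indexed by column.
ROW = ["Exp", "EU", "East", 15248.0, 13962.0, "",
       "Exp", "EU", 162271.0, 119843.0, "",
       "Exp", 328380.0, 153046.0]

# Columns under each heading, taken from the header row of the preview.
FLOW_COLS = {0, 6, 11}
GEOGRAPHY_COLS = {1, 7} | FLOW_COLS   # mirrors `... | flow` in the notebook
NUTS_COLS = {2} | FLOW_COLS           # mirrors `... | flow` in the notebook


def nearest_left(obs_col, candidate_cols):
    """Value of the closest candidate column to the left of obs_col, or None."""
    left = [c for c in candidate_cols if c < obs_col]
    return ROW[max(left)] if left else None


def lookup(obs_col):
    """Return (Flow, HMRC Partner Geography, NUTS Geography) for one observation."""
    return (
        nearest_left(obs_col, FLOW_COLS),
        nearest_left(obs_col, GEOGRAPHY_COLS),
        nearest_left(obs_col, NUTS_COLS),
    )


# The three 'Statistical Value (£ million)' columns in this row.
for col in (3, 8, 12):
    print(ROW[col], lookup(col))
```

The regional table resolves to `('Exp', 'EU', 'East')`, while the two summary tables fall back to the Flow cell for whichever dimension they lack, which matches rows 0 to 2 of `tidy`.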


In [11]:
tidy['DATAMARKER'] = tidy['DATAMARKER'].map(lambda x: 'Not Applicable' if x == 'N/A' else x)

In [12]:
import numpy as np
# Assign the result back instead of calling replace(..., inplace=True) on a column selection.
tidy['OBS'] = tidy['OBS'].replace('', np.nan)
# tidy.dropna(subset=['OBS'], inplace=True)
# tidy.drop(columns=['DATAMARKER'], inplace=True)
tidy.rename(columns={'OBS': 'Value'}, inplace=True)
# tidy['Value'] = tidy['Value'].astype(int)
# Blank out any suppression markers left in the value column.
tidy['Value'] = tidy['Value'].map(lambda x: '' if x in (':', 'xx', '..', 'N/A') else x)
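The marker handling in the cell above can be tried out without pandas. The following is a minimal sketch on made-up sample values; the `blank_suppressed` helper and the `sample` list are hypothetical and only illustrate the mapping rule.

```python
# Markers that the notebook treats as suppressed or unavailable values.
SUPPRESSION_MARKERS = {':', 'xx', '..', 'N/A'}


def blank_suppressed(value):
    """Return '' for a suppression marker, otherwise the value unchanged."""
    return '' if value in SUPPRESSION_MARKERS else value


# Illustrative values only, not taken from the spreadsheet.
sample = [15248.0, 'xx', 10712.0, '..', 'N/A']
cleaned = [blank_suppressed(v) for v in sample]
print(cleaned)
```

Numeric values pass through untouched because a float never compares equal to any of the marker strings.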

In [13]:
tidy['NUTS Geography'] = tidy['NUTS Geography'].map(
    lambda x: {
        'East':'East of England', 
        'Exp' : 'nuts1/all',
        'Imp': 'nuts1/all'}.get(x, x))

tidy['HMRC Partner Geography'] = tidy['HMRC Partner Geography'].map(
    lambda x: {
        'Exp' : 'europe',
        'Imp': 'europe'}.get(x, x))

In [14]:
for col in tidy.columns:
    if col not in ['Value', 'Year']:
        tidy[col] = tidy[col].astype('category')
        display(col)
        display(tidy[col].cat.categories)

'DATAMARKER'

Index(['Not Applicable'], dtype='object')

'Flow'

Index(['Exp', 'Imp'], dtype='object')

'HMRC Partner Geography'

Index(['EU', 'Non-EU', 'europe'], dtype='object')

'Measure Type'

Index(['Count of Businesses', 'GBP Total'], dtype='object')

'NUTS Geography'

Index(['East Midlands', 'East of England', 'London', 'North East',
       'North West', 'Northern Ireland', 'Scotland', 'South East',
       'South West', 'Unallocated - Known', 'Unallocated - Unknown', 'Wales',
       'West Midlands', 'Yorkshire and The Humber', 'nuts1/all'],
      dtype='object')

'SITC 4'

Index(['all'], dtype='object')

'Unit'

Index(['businesses', 'gbp-million'], dtype='object')

In [15]:
tidy['NUTS Geography'] = tidy['NUTS Geography'].cat.rename_categories({
    'East Midlands' : 'nuts1/UKF', 
    'East of England': 'nuts1/UKH', 
    'London' : 'nuts1/UKI', 
    'North East' : 'nuts1/UKC',
    'North West' : 'nuts1/UKD', 
    'Scotland' : 'nuts1/UKM', 
    'South East' : 'nuts1/UKJ', 
    'South West' : 'nuts1/UKK',
    'Total for functional category' : 'nuts1/all', 
    'Wales' : 'nuts1/UKL', 
    'West Midlands' : 'nuts1/UKG',
    'Yorkshire and The Humber' : 'nuts1/UKE',
    'Northern Ireland' : 'nuts1/UKN',
    'Unallocated - Known' : 'nuts1/unk', 
    'Unallocated - Unknown' : 'nuts1/unu'
})
tidy['HMRC Partner Geography'] = tidy['HMRC Partner Geography'].cat.rename_categories({
        'EU'   : 'C',
        'Non-EU' : 'non-eu'})
tidy['Flow'] = tidy['Flow'].cat.rename_categories({
        'Exp'   : 'exports',
        'Imp' : 'imports'})
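When `rename_categories` is given a dict, categories missing from the mapping pass through unchanged and mapping keys that match no category are ignored, so a typo in either direction goes unnoticed. A quick set comparison can surface both cases. The sketch below uses the NUTS categories displayed earlier and the keys from the mapping above; the variable names are illustrative.

```python
# Categories displayed for 'NUTS Geography' before renaming.
categories = {
    'East Midlands', 'East of England', 'London', 'North East',
    'North West', 'Northern Ireland', 'Scotland', 'South East',
    'South West', 'Unallocated - Known', 'Unallocated - Unknown', 'Wales',
    'West Midlands', 'Yorkshire and The Humber', 'nuts1/all',
}

# Keys and values from the NUTS mapping passed to rename_categories.
mapping = {
    'East Midlands': 'nuts1/UKF',
    'East of England': 'nuts1/UKH',
    'London': 'nuts1/UKI',
    'North East': 'nuts1/UKC',
    'North West': 'nuts1/UKD',
    'Scotland': 'nuts1/UKM',
    'South East': 'nuts1/UKJ',
    'South West': 'nuts1/UKK',
    'Total for functional category': 'nuts1/all',
    'Wales': 'nuts1/UKL',
    'West Midlands': 'nuts1/UKG',
    'Yorkshire and The Humber': 'nuts1/UKE',
    'Northern Ireland': 'nuts1/UKN',
    'Unallocated - Known': 'nuts1/unk',
    'Unallocated - Unknown': 'nuts1/unu',
}

unmapped = categories - set(mapping)   # categories that keep their current label
unused = set(mapping) - categories     # mapping keys that match nothing
print("unmapped:", sorted(unmapped))
print("unused:", sorted(unused))
```

Here the only unmapped category is `nuts1/all`, which is already a code, and the only unused key is `'Total for functional category'`, which does not occur in this table.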

In [16]:
tidy = tidy[['Year', 'NUTS Geography', 'HMRC Partner Geography', 'Flow', 'SITC 4', 'Measure Type', 'Value', 'Unit', 'DATAMARKER']]