## HMRC Regional Trade Statistics

Transform to Tidy Data.

The source data is available from https://www.uktradeinfo.com/Statistics/RTS/Documents/Forms/AllItems.aspx as a series of zip files, one per year, typically named `RTS web YYYY.zip`. At the time of the run shown here, the scraper lists files for the years 2013 to 2017.
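
As a rough illustration of how a downloaded zip can be read without writing it to disk, the sketch below builds a tiny archive in memory and then iterates over its members as text. The member names and contents are invented stand-ins, not real RTS files, and the UTF-8 encoding is an assumption.

```python
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile

# Build a tiny zip in memory as a stand-in for a downloaded yearly archive.
# The member names and contents are invented for illustration only.
buffer = BytesIO()
with ZipFile(buffer, 'w') as archive:
    archive.writestr('q1.txt', 'first quarter\n')
    archive.writestr('q2.txt', 'second quarter\n')

payload = buffer.getvalue()  # plays the role of an HTTP response body

# Re-open the bytes as a zip and decode each member as text.
contents = {}
with ZipFile(BytesIO(payload)) as archive:
    for name in archive.namelist():
        with archive.open(name, 'r') as member:
            text = TextIOWrapper(member, encoding='utf-8')
            contents[name] = text.read()

print(sorted(contents))
```

Wrapping each member in `TextIOWrapper` gives a text stream that parsers such as `pd.read_fwf` can consume directly.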

Each zip file contains fixed-width formatted text files following a layout described in https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20Detailed%20data%20information%20pack.pdf. Each row has two measures: net mass in tonnes and statistical value in thousands of pounds (£1000s). We're assuming each observation has one measure, so we split these out into separate files.
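
A minimal sketch of that split, assuming the column widths and names used in the processing cell of this notebook. The two sample rows are invented for illustration (including the period code format) and are not taken from the real files.

```python
import io

import pandas as pd

# Column widths and names as used in this notebook's processing cell.
WIDTHS = [6, 1, 2, 1, 3, 2, 1, 2, 9, 9]
NAMES = ['Period', 'Flow', 'HMRC Reporter Region', 'HMRC Partner Geography',
         'Codalpha', 'Codseq', 'SITC Section', 'SITC 4', 'Value', 'Netmass']

# Two invented fixed-width rows (36 characters each); the values are hypothetical.
rows = [
    f"{'1/2013'}{'E'}{'EA'}{'A'}{'FR':<3}{'01'}{'0'}{'01'}{123:>9}{45:>9}",
    f"{'2/2013'}{'I'}{'NW'}{'B'}{'DE':<3}{'02'}{'1'}{'12'}{678:>9}{9:>9}",
]
table = pd.read_fwf(io.StringIO('\n'.join(rows)), widths=WIDTHS,
                    names=NAMES, dtype=str)

# Every column except the two measures identifies the observation.
id_cols = [c for c in NAMES if c not in ('Value', 'Netmass')]

# One table per measure, each with a single 'Value' column.
value = table[id_cols].assign(**{'Value': table['Value'],
                                 'Measure Type': 'GBP Total',
                                 'Unit': 'gbp-thousands'})
mass = table[id_cols].assign(**{'Value': table['Netmass'],
                                'Measure Type': 'Net Mass',
                                'Unit': 'kg-thousands'})

print(value[['Period', 'Value', 'Unit']])
print(mass[['Period', 'Value', 'Unit']])
```

Reading with `dtype=str` keeps codes such as `01` intact instead of coercing them to integers.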

In [1]:
from gssutils import *
scraper = Scraper('https://www.uktradeinfo.com/Statistics/RTS/Pages/default.aspx')
scraper

## UK Regional Trade Statistics (RTS)

### Description

RTS data provides a breakdown by standard UK Region geography, of the UK Overseas Trade Statistics (OTS) data collected from UK Customs import and export entries, and the HMRC Intrastat survey.

### Distributions

1. RTS 2013 ([application/zip](https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20web%202013.zip))
1. RTS 2014 ([application/zip](https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20web%202014.zip))
1. RTS 2015 ([application/zip](https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20web%202015.zip))
1. RTS 2016 ([application/zip](https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20web%202016.zip))
1. RTS 2017 ([application/zip](https://www.uktradeinfo.com/Statistics/RTS/Documents/Rtsweb%202017.zip))


In [2]:
for dist in scraper.distributions:
    print(dist.downloadURL)

https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20web%202013.zip
https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20web%202014.zip
https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20web%202015.zip
https://www.uktradeinfo.com/Statistics/RTS/Documents/RTS%20web%202016.zip
https://www.uktradeinfo.com/Statistics/RTS/Documents/Rtsweb%202017.zip


In [3]:
from zipfile import ZipFile
from io import BytesIO, TextIOWrapper

destinationFolder = Path('out')
destinationFolder.mkdir(exist_ok=True, parents=True)

for distribution in scraper.distributions:
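    # Download each yearly zip into memory; the name `archive` avoids shadowing the built-in `zip`.
    with ZipFile(BytesIO(scraper.session.get(distribution.downloadURL).content)) as archive:
        for name in archive.namelist():
            with archive.open(name, 'r') as quarterFile: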
                quarterText = TextIOWrapper(quarterFile, encoding='utf-8')
                table = pd.read_fwf(quarterText, widths=[6, 1, 2, 1, 3, 2, 1, 2, 9, 9], names=[
                    'Period',
                    'Flow',
                    'HMRC Reporter Region',
                    'HMRC Partner Geography',
                    'Codalpha',
                    'Codseq',
                    'SITC Section',
                    'SITC 4',
                    'Value',
                    'Netmass'
                ], dtype=str)
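                # Re-express the raw period code as 'quarter/YYYY-Qn':
                # the first character is the quarter, characters 3 onward the year.
                table['Period'] = table['Period'].map(lambda x: f'quarter/{x[2:]}-Q{x[0]}')
                # 'E' is exports; any other flow code is treated as imports.
                table['Flow'] = table['Flow'].map(lambda x: 'Exports' if x == 'E' else 'Imports')
                # Use Codseq as the partner code unless it starts with '#', then fall back to Codalpha.
                table['HMRC Partner Geography'] = table.apply(
                    lambda x: x['Codseq'] if x['Codseq'][0] != '#' else x['Codalpha'],
                    axis=1)
                # Sanity check: the SITC section must be the leading digit of the SITC code,
                # so the section column is redundant and can be dropped.
                assert table['SITC Section'].equals(table['SITC 4'].apply(lambda x: x[0]))
                table.drop(columns=['Codalpha', 'Codseq', 'SITC Section'], inplace=True)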
                mass = table.drop(columns=['Value'])
                mass['Measure Type'] = 'Net Mass'
                mass['Unit'] = 'kg-thousands'
                mass.rename(columns={'Netmass': 'Value'}, inplace=True)
                textFile = destinationFolder / pathify(name)
                mass.to_csv(textFile.with_suffix('.mass.csv'), index=False)
                value = table.drop(columns=['Netmass'])
                value['Measure Type'] = 'GBP Total'
                value['Unit'] = 'gbp-thousands'
                value.to_csv(textFile.with_suffix('.value.csv'), index=False)
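
The period and partner-geography rules applied in the cell above can be pulled out into small functions for testing. This is a sketch only: the helper names `to_quarter_period` and `partner_code` are hypothetical, the sample codes are invented, and the six-character period check is an assumption based on the field width and slicing used above.

```python
def to_quarter_period(raw: str) -> str:
    """Turn a raw period code such as '1/2013' into 'quarter/2013-Q1'.

    Assumes the quarter is the first character and the year occupies
    characters 3 to 6, as the slicing in the notebook implies.
    """
    if len(raw) != 6:
        raise ValueError(f'unexpected period code: {raw!r}')
    quarter, year = raw[0], raw[2:]
    if quarter not in ('1', '2', '3', '4') or not year.isdigit():
        raise ValueError(f'unexpected period code: {raw!r}')
    return f'quarter/{year}-Q{quarter}'


def partner_code(codalpha: str, codseq: str) -> str:
    """Prefer Codseq, falling back to Codalpha when Codseq starts with '#'."""
    return codalpha if codseq.startswith('#') else codseq


print(to_quarter_period('1/2013'))
print(partner_code('FR', '#1'))
print(partner_code('FR', '01'))
```

Unlike the inline lambdas, the period helper raises on malformed input rather than silently producing an odd label.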



In [4]:
scraper.dataset.family = 'trade'
with open(destinationFolder / 'dataset.trig', 'wb') as metadata:
    metadata.write(scraper.generate_trig())