Table 3: National Insurance number (NINo) registrations to adult overseas nationals entering the UK by world region and nationality

In [1]:
%run lib/scrape_dwp.ipynb

metadata = scrape('https://www.gov.uk/government/statistics/national-insurance-number-allocations-to-adult-overseas-nationals-to-march-2018')
metadata

{'details': 'We also publish data on the [nationality of DWP working age benefit claimants\nat the point of National Insurance number\nregistration](https://www.gov.uk/government/statistics/nationality-at-point-\nof-nino-registration-of-dwp-working-age-benefit-recipients-data-to-feb-2017).\n\nThis quarterly report contains data on National Insurance number allocations\nto adult overseas nationals entering the UK.\n\nThe summary tables, derived from Stat-Xplore, show National Insurance number\nallocations to adult overseas nationals entering the UK by:\n\n  * quarter of registration and world region – January 2002 to March 2018\n  * region and local authority by world area – registrations year to March 2018\n  * registrations by nationality – year to March 2018\n\n### Explore the statistics with our interactive tools\n\nFull statistics on National Insurance number allocations to adult overseas\nnationals entering the UK are available from [Stat-Xplore](https://stat-\nxplore.dwp.gov.uk/)

The source data is published as an OpenDocument spreadsheet (ODS). Databaker can currently read only Excel spreadsheets, so we first convert the file to XLS. For this we use `pyexcel` together with its plugins for the ODS and XLS file formats.

In [2]:
import pyexcel
from io import BytesIO
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse  # imported explicitly rather than relying on the %run above

sourceFolder = Path('in')
sourceFolder.mkdir(exist_ok=True)

ods_files = [f for f in metadata['files'] if f['type'] == 'ODS']
assert len(ods_files) == 1, 'Should be exactly one ODS file'

ods_url = ods_files[0]['url']
ods_title = ods_files[0]['title']
ods_filename = PurePosixPath(urlparse(ods_url).path)

# `session` is not defined in this notebook; it is expected to come from the %run above
ods_file = BytesIO(session.get(ods_url).content)
xls_filename = sourceFolder / ods_filename.with_suffix('.xls').name

pyexcel.save_book_as(file_content=ods_file, file_type='ods', dest_file_name=str(xls_filename))
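
The file-naming step in the cell above can be tried on its own. The following is a minimal sketch, assuming a made-up URL and a hypothetical helper named `xls_target`; neither is part of the scraper library.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def xls_target(url: str, folder: Path) -> Path:
    """Map a download URL to a local .xls path inside `folder` (hypothetical helper)."""
    # urlparse(...).path drops the scheme, host and query string
    remote_name = PurePosixPath(urlparse(url).path)
    # Swap the extension, then keep only the final path component
    return folder / remote_name.with_suffix('.xls').name


# Made-up URL, for illustration only
example_url = 'https://example.org/uploads/nino-tables.ods?download=1'
target = xls_target(example_url, Path('in'))
print(target.name)
```

Using `PurePosixPath` for the URL path keeps the forward-slash semantics of URLs regardless of the operating system, while the local `Path` follows the platform's conventions.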

In [3]:
sheets = loadxlstabs(xls_filename)
tab = sheets[3]

Loading in\nino-registrations-adult-overseas-nationals-march-2018-tables.xls which has size 162304 bytes
Table names: ['CONTENTS', '1', '2', '3', '4']


In [4]:
savepreviewhtml(tab)

0,1,2,3,4,5,6
Table 3 : National Insurance Number Registrations To Adult Overseas Nationals Entering The UK,,,,,,
Word region and nationality,,,,,,
,,Yr to March 2017,Yr to March 2018,,Difference,% Change
,,,,,,
Total,,785722.0,669846.0,,-115876.0,-0.1474770974975882
,,,,,,
,,,,,,
European Union,,593466.0,476785.0,,-116681.0,-0.19660940980612202
Non European Union,,191380.0,192273.0,,893.0,0.004666109311317797
,,,,,,
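
As a quick sanity check on the preview, the `Difference` and `% Change` columns can be recomputed from the two yearly totals. This is only a sketch using the figures from the "Total" row; it suggests that `% Change` is stored as a fraction rather than as a percentage.

```python
# Figures copied from the "Total" row of the preview
yr_2017, yr_2018 = 785722, 669846

difference = yr_2018 - yr_2017
change = difference / yr_2017  # a fraction of the 2017 value, not a percentage

print(difference, change)
```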


In [5]:
observations = tab.excel_ref('C29:D29').expand(DOWN).is_not_blank()
savepreviewhtml(observations)



In [6]:
Citizenship = tab.excel_ref('B29').expand(DOWN).is_not_blank()
savepreviewhtml(Citizenship)



In [7]:
Period = tab.excel_ref('C3:D3')
Period = Period - Period.regex('^INFO').expand(DOWN)
savepreviewhtml(Period)



In [8]:
Dimensions = [
            HDim(Period,'Period',DIRECTLY,ABOVE),
            HDim(Citizenship,'Citizenship', DIRECTLY, LEFT),
            HDimConst('Measure Type', 'Count'),
            HDimConst('Unit','People')
            ]
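
Conceptually, each observation is paired with the period header in its own column and the citizenship label in its own row. The plain-Python sketch below illustrates that idea only; it is not Databaker's implementation, and the grid layout and row numbers are assumptions made for the example. The values are the Romania and Poland figures from the output table later in this notebook.

```python
# Toy grid keyed by (row, column); row numbers are illustrative
header_row = 3
label_col = 'B'
grid = {
    (3, 'C'): 'Yr to March 2017', (3, 'D'): 'Yr to March 2018',
    (29, 'B'): 'Romania', (29, 'C'): 181882, (29, 'D'): 147956,
    (30, 'B'): 'Poland', (30, 'C'): 83589, (30, 'D'): 58370,
}

records = []
for (row, col), value in sorted(grid.items()):
    # Skip header and label cells; everything else is an observation
    if row == header_row or col == label_col:
        continue
    records.append({
        'OBS': value,
        'Period': grid[(header_row, col)],     # header in the same column
        'Citizenship': grid[(row, label_col)],  # label in the same row
        'Measure Type': 'Count',                # constant dimensions
        'Unit': 'People',
    })

for record in records:
    print(record)
```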

In [9]:
c1 = ConversionSegment(observations, Dimensions, processTIMEUNIT=True)
savepreviewhtml(c1)

0,1,2
OBS,Period,Citizenship



In [10]:
new_table = c1.topandas()
new_table




,OBS,DATAMARKER,Period,Citizenship,Measure Type,Unit
0,181882,,Yr to March 2017,Romania,Count,People
1,147956,,Yr to March 2018,Romania,Count,People
2,83589,,Yr to March 2017,Poland,Count,People
3,58370,,Yr to March 2018,Poland,Count,People
4,61751,,Yr to March 2017,Italy,Count,People
5,47887,,Yr to March 2018,Italy,Count,People
6,42052,,Yr to March 2017,Bulgaria,Count,People
7,37223,,Yr to March 2018,Bulgaria,Count,People
8,44075,,Yr to March 2017,Spain,Count,People
9,34599,,Yr to March 2018,Spain,Count,People


In [11]:
new_table.count()

OBS             470
DATAMARKER       75
Period          470
Citizenship     470
Measure Type    470
Unit            470
dtype: int64

In [12]:
new_table.dtypes

OBS             object
DATAMARKER      object
Period          object
Citizenship     object
Measure Type    object
Unit            object
dtype: object
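
Because every column is of type `object`, `OBS` has to be cleaned and cast before it can be used as a number. The sketch below shows one way to do that on made-up rows (not the real figures): `count()` reports non-null cells only, so an empty string still counts, and taking a `.copy()` of the filtered frame means the later column assignment works on an independent frame instead of a slice.

```python
import pandas as pd

# Made-up rows for illustration; the marker text is hypothetical
df = pd.DataFrame({
    'OBS': ['120', '', '45'],
    'DATAMARKER': [None, 'marker', None],
})

# Non-null counts per column: the empty string in OBS is still counted
counts = df.count()
print(counts)

# Keep rows with a real observation, then cast to integers
clean = df[df['OBS'] != ''].copy()
clean['Value'] = clean['OBS'].astype(int)
print(clean[['Value']])
```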

In [13]:
# Drop rows without an observation; .copy() avoids pandas' SettingWithCopyWarning
# when the Value column is added in the next cell
new_table = new_table[new_table['OBS'] != ''].copy()

In [14]:
new_table['Value'] = new_table['OBS'].astype(int)


In [15]:
new_table = new_table[['Period','Citizenship','Measure Type','Value','Unit']]

In [16]:
new_table.head()

,Period,Citizenship,Measure Type,Value,Unit
0,Yr to March 2017,Romania,Count,181882,People
1,Yr to March 2018,Romania,Count,147956,People
2,Yr to March 2017,Poland,Count,83589,People
3,Yr to March 2018,Poland,Count,58370,People
4,Yr to March 2017,Italy,Count,61751,People


In [17]:
destinationFolder = Path('out')
destinationFolder.mkdir(exist_ok=True, parents=True)

new_table.to_csv(destinationFolder / 'nin3.csv', index=False)

In [18]:
writeMetadata(metadata,
              'Adult overseas nationals entering the UK by Region and nationality',
              ods_title, 'Migration')

In [19]:
new_table.count()

Period          395
Citizenship     395
Measure Type    395
Value           395
Unit            395
dtype: int64