## Northern Ireland Statistics and Research Agency

[2017 Mid Year Population Estimates for Northern Ireland (NEW FORMAT TABLES)](https://www.nisra.gov.uk/publications/2017-mid-year-population-estimates-northern-ireland-new-format-tables)

In [1]:
from databaker.framework import *
import pandas as pd

def extractMetadata(sheet):
    metadata = {}
    description = []

    prop = None
    name2prop = {
        'National Statistics Theme:': 'dcat:theme',
        'Data Subset:': 'gss:subset',
        'Dataset Title:': 'dct:title',
        'Coverage:': 'dct:spatial',
        'Source:': 'dct:publisher',
        'Contact:': 'dcat:contactPoint',
        'National Statistics Data?': 'gss:nationalStatistics',
        'Responsible Statistician:': 'dcat:contactPoint'
    }

    section = 'metadata'
    # Iterate over the sheet passed in, not the global `book`
    for row in sheet:
        if section == 'metadata':
            if row[1] in name2prop:
                prop = name2prop[row[1]]
            elif row[1] == 'Description of Data':
                section = 'description'

            if section != 'description' and len(row[2]) != 0 and prop:
                if prop in metadata:
                    metadata[prop] = metadata[prop] + " " + row[2]
                else:
                    metadata[prop] = row[2]
        elif section == 'description':
            description.append(row[1])

    metadata['dct:description'] = '\n'.join(description).strip()
    return metadata
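
The parsing logic is easier to follow on a small input. The sketch below is a compact, standalone variant of the same state machine, run over made-up rows. `parse_rows` and the sample `rows` are hypothetical and assume each row is a list of strings with the label in column 1 and the value in column 2, as the indexing in `extractMetadata` suggests. It shows why a value on a row without a label is appended to the previous property, which is how `dcat:contactPoint` can end up holding several pieces of text.

```python
# Hypothetical rows shaped like the Metadata sheet (label in column 1, value in column 2).
rows = [
    ['', 'Dataset Title:', 'Mid-Year Population Estimates'],
    ['', 'Contact:', 'Customer Services;'],
    ['', '', '02890 255156'],              # continuation row: no label, only a value
    ['', 'Description of Data', ''],       # switches the parser into description mode
    ['', 'First line of description.', ''],
    ['', 'Second line.', ''],
]

def parse_rows(rows, name2prop):
    """Illustrative re-statement of the label/value state machine."""
    metadata, description = {}, []
    prop = None
    in_description = False
    for row in rows:
        label, value = row[1], row[2]
        if in_description:
            # Everything after the marker row is free-text description.
            description.append(label)
            continue
        if label in name2prop:
            prop = name2prop[label]
        elif label == 'Description of Data':
            in_description = True
            continue
        if prop and value:
            # A value without a new label extends the current property.
            metadata[prop] = f"{metadata[prop]} {value}" if prop in metadata else value
    metadata['dct:description'] = '\n'.join(description).strip()
    return metadata

result = parse_rows(rows, {'Dataset Title:': 'dct:title',
                           'Contact:': 'dcat:contactPoint'})
print(result)
```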

In [2]:
%run "NISRA Migration MEY17CoC.ipynb"
all_tidy = tidy.copy()
md1 = extractMetadata(book['Metadata'])

%run "NISRA Migration MYE17 NETMIG AGE BANDS Gender.ipynb"
all_tidy = pd.concat([all_tidy, tidy])
md2 = extractMetadata(book['Metadata'])

%run "NISRA Migration MYE17 NETMIG AGE.ipynb"
all_tidy = pd.concat([all_tidy, tidy])
md3 = extractMetadata(book['Metadata'])

%run "NISRA Migration MYE17 NETMIG FLOW.ipynb"
all_tidy = pd.concat([all_tidy, tidy])
md4 = extractMetadata(book['Metadata'])

We're merging the data from these spreadsheets into a single cube by making their implicit dimensions explicit. The metadata should be the same for each spreadsheet, but it differs slightly: the description for the first table is less verbose, and the reference area for the first table is a specialization of the one used by the rest.
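
As a rough illustration of what "making implicit dimensions explicit" means, the sketch below combines two made-up tables. The frames `by_sex` and `by_age` and their values are hypothetical. The codes `all` and `T` are borrowed from the tidy output in this notebook and are assumed to mean "all ages" and "both sexes". Each table gets the column it lacks, filled with the value it covered implicitly, so that both share one set of columns before concatenation.

```python
import pandas as pd

# Hypothetical table broken down by sex only; age is implicitly "all ages".
by_sex = pd.DataFrame({'Sex': ['F', 'M'], 'Value': [10, 12]})
# Hypothetical table broken down by age only; sex is implicitly "both sexes".
by_age = pd.DataFrame({'Age': ['0-15', '16-64'], 'Value': [7, 15]})

# Make the implicit dimension explicit in each table.
by_sex['Age'] = 'all'
by_age['Sex'] = 'T'

# With identical columns, the tables stack into a single cube.
columns = ['Age', 'Sex', 'Value']
cube = pd.concat([by_sex[columns], by_age[columns]], ignore_index=True)
print(cube)
```

Without the added columns, `pd.concat` would still succeed but would fill the missing dimension with `NaN`, which leaves those observations ambiguous.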

In [3]:
md4

{'dcat:theme': 'Population',
 'gss:subset': 'Population and Migration',
 'dct:title': 'Mid-Year Population Estimates',
 'dct:spatial': 'Northern Ireland',
 'dct:publisher': 'NISRA ',
 'dcat:contactPoint': 'Customer Services; 02890 255156;  census@nisra.gov.uk Brian Green - Head of Demographic Statistics',
 'gss:nationalStatistics': 'Yes',
 'dct:description': 'Mid-2017 population estimates for Northern Ireland were published on 28 June 2018. Migration is one of the components of population change; its estimates are provided to enable understanding of the mid-year population estimates and to inform comment. \n\nNotes:\n1. The estimates are produced using a variety of data sources and statistical models.  Therefore small estimates should not be taken to refer to particular individuals.\n2. The migration element of the components of change have been largely derived from a data source which is known to be deficient in recording young adult males and outflows from Northern Ireland. Therefore
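
To see which properties differ between the metadata dictionaries, rather than reading each one in full, a small helper along these lines could be used. `differing_keys` is a hypothetical function, and the two sample dictionaries contain placeholder values, not the real NISRA metadata. In the notebook it would be called as `differing_keys(md1, md2, md3, md4)`.

```python
def differing_keys(*dicts):
    """Return {key: [value in each dict]} for keys whose values are not all equal."""
    keys = set().union(*dicts)
    return {
        key: [d.get(key) for d in dicts]
        for key in sorted(keys)
        if len({d.get(key) for d in dicts}) > 1
    }

# Placeholder dictionaries standing in for md1..md4.
a = {'dct:title': 'Same title', 'dct:spatial': 'area A'}
b = {'dct:title': 'Same title', 'dct:spatial': 'area B'}

diff = differing_keys(a, b)
print(diff)
```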

In [4]:
all_tidy.count()

Mid Year                       19580
Area                           19580
Age                            19580
Sex                            19580
Population Change Component    19580
Measure Type                   19580
Value                          19580
Unit                           19580
dtype: int64
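
Equal counts in every column show that the concatenation left no missing values, but not that each observation is unique. One further check, sketched here on a made-up frame, is to look for rows that share the same combination of dimension values. The frame `obs` and its contents are hypothetical, and treating every column except `Value` as a dimension is an assumption.

```python
import pandas as pd

# Hypothetical observations; the last row repeats the dimensions of the first.
obs = pd.DataFrame({
    'Mid Year': ['2001', '2001', '2001'],
    'Sex': ['T', 'F', 'T'],
    'Population Change Component': ['Births', 'Births', 'Births'],
    'Value': [100, 48, 101],
})

# Assumption: every column other than the measured value is a dimension.
dimensions = [c for c in obs.columns if c != 'Value']

# keep=False marks every member of a duplicated group, not just the repeats.
dupes = obs[obs.duplicated(subset=dimensions, keep=False)]
print(dupes)
```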

In [5]:
all_tidy

| | Mid Year | Area | Age | Sex | Population Change Component | Measure Type | Value | Unit |
|---|---|---|---|---|---|---|---|---|
| 0 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | Starting population | Count | 1688838 | People |
| 1 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | Births | Count | 21460 | People |
| 2 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | Deaths | Count | 14432 | People |
| 3 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | Natural Change | Count | 7028 | People |
| 4 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | Internal Inflows | Count | 0 | People |
| 5 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | Internal Outflows | Count | 0 | People |
| 6 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | Internal Net | Count | 0 | People |
| 7 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | United Kingdom Inflows | Count | 12510 | People |
| 8 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | United Kingdom Outflows | Count | 11589 | People |
| 9 | 2001-06-30T00:00:00/P1Y | N92000002 | all | T | United Kingdom Net | Count | 921 | People |


In [6]:
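from pathlib import Path  # explicit import, in case the star import above does not provide Path

# Write the merged cube to out/migration_nisra.csv, creating the directory if needed
out = Path('out')
out.mkdir(exist_ok=True)
all_tidy.to_csv(out / 'migration_nisra.csv', index=False)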