## Notebook Intention:
*This basic Jupyter Notebook is intended to demonstrate the end-to-end process using a number of defined steps.
These steps do not need to be coupled within a single Notebook.*
### Section: Define and Load into Memory Notebook Python Libraries / Components:

In [377]:
conda install -c conda-forge cchardet

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [519]:
!pip install gssutils
!pip install csvwlib
!pip install slug
!pip install goodtables

from gssutils import *
from csvwlib import CSVWConverter
import slug

from goodtables import validate
from goodtables import Inspector
inspector = Inspector()
from pprint import pprint

import numpy as np
import pandas as pd  # Imported explicitly rather than relying on the wildcard import above.

import json

from IPython.display import Markdown, display






### Section: Define Notebook Functions:

In [7]:
# Function for markdown Notebook outputs:
def printmd(string, colour=None):
    colourstr = "<span style='color:{}'>{}</span>".format(colour, string)
    display(Markdown(colourstr))

In [8]:
# Determine the Python execution environment (True when running inside a Jupyter kernel):
if get_ipython().__class__.__name__ == 'ZMQInteractiveShell':
    boo_pythonNB_environment = True
else:
    boo_pythonNB_environment = False

printmd('***Execution Environment: ' + get_ipython().__class__.__name__ + '; setting boo_pythonNB_environment to: ' \
      + str(boo_pythonNB_environment) + '.***', colour='Grey')

<span style='color:Grey'>***Execution Environment: ZMQInteractiveShell; setting boo_pythonNB_environment to: True.***</span>

In [590]:
# Join our Slugized transformed data with our REF data - using DataFrames only - no CSV functionality:
def align_REFdata_with_Transform(input_df, source_ref_columns_df, source_ref_components_df):
    
    REFdata_intermediate = pd.merge(input_df, source_ref_columns_df, left_on='REFColumnsCSV Link', right_on='title', how='inner')
    REFdata_linked_successful = pd.merge(REFdata_intermediate, source_ref_components_df, left_on='title', right_on='Label', how='left')
    
    REFdata_linked_UNsuccessful = pd.merge(input_df, source_ref_columns_df, left_on='REFColumnsCSV Link', right_on='title', how='outer', indicator=True).query('_merge=="left_only"')    

    return REFdata_linked_successful, REFdata_linked_UNsuccessful
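
The matched / unmatched split performed by `align_REFdata_with_Transform` can be illustrated on toy data. This is a minimal sketch, assuming pandas is installed. The two frames and their values are hypothetical. Only the column names `REFColumnsCSV Link` and `title` are taken from the function above.

```python
import pandas as pd

# Hypothetical transform columns, each carrying a link into the reference data.
input_df = pd.DataFrame({
    'Column': ['Period', 'Sex', 'Mystery'],
    'REFColumnsCSV Link': ['Period', 'Sex', 'Not In Reference'],
})
# Hypothetical reference table keyed on 'title'.
ref_columns_df = pd.DataFrame({'title': ['Period', 'Sex', 'Age']})

# The inner join keeps only rows whose link exists in the reference data.
linked = pd.merge(input_df, ref_columns_df,
                  left_on='REFColumnsCSV Link', right_on='title', how='inner')

# The outer join with indicator=True labels where each row came from;
# 'left_only' rows are transform columns with no reference match.
unlinked = pd.merge(input_df, ref_columns_df,
                    left_on='REFColumnsCSV Link', right_on='title',
                    how='outer', indicator=True).query('_merge == "left_only"')

print(sorted(linked['Column']))
print(unlinked['Column'].tolist())
```

Rows that end up in the second frame are the ones that need a new reference entry or a corrected link before the data can be published.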


### Stage 1: Harvesting Human Readable Data Sources:
* *Using gssutils we can scrape web-published data sources such as .xlsx spreadsheets.*
* *The Scraper component of gssutils can print out all the identified distributions of data.*

In [10]:
scraper = Scraper('https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/' \
                  'healthandlifeexpectancies/datasets/healthstatelifeexpectancyallagesuk')
display(scraper)

## Health state life expectancy, all ages, UK

Pivot tables for health state life expectancy by sex and area type, divided by three-year intervals starting from 2009 to 2011.

### Description

Pivot tables for health state life expectancy by sex and area type, divided by three-year intervals starting from 2009 to 2011.

### Distributions

1. Health state life expectancy, all ages, UK ([MS Excel Spreadsheet](https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/healthandlifeexpectancies/datasets/healthstatelifeexpectancyallagesuk/current/heestimates.xlsx)) - 2016-11-29
1. Health state life expectancy, all ages, UK ([application/zip](https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/healthandlifeexpectancies/datasets/healthstatelifeexpectancyallagesuk/current/previous/v1/healthexpectanciespivottables.zip)) - 2016-11-29
1. Health state life expectancy, all ages, UK ([application/zip](https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/healthandlifeexpectancies/datasets/healthstatelifeexpectancyallagesuk/current/previous/v2/healthexpectanciespivottablesversion2.zip)) - 2016-11-29
1. Health state life expectancy, all ages, UK ([MS Excel Spreadsheet](https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/healthandlifeexpectancies/datasets/healthstatelifeexpectancyallagesuk/current/previous/v3/refpivottablesfinal.xlsx)) - 2016-11-29
1. Health state life expectancy, all ages, UK ([MS Excel Spreadsheet](https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/healthandlifeexpectancies/datasets/healthstatelifeexpectancyallagesuk/current/previous/v4/hslepivotab1.xlsx)) - 2016-11-29


### Stage 2: Identifying the Distribution to Extract and Process:
*We're selecting the first distribution for processing...*
* *Note that the Scraper component also collects the associated metadata.*

In [11]:
tabs = scraper.distributions[0].as_databaker()
distribution = scraper.distributions[0]
display(distribution)

### Stage 3: Data Wrangling:
**For the purposes of this Notebook we're only going to process two tabs of the spreadsheet.**

*Note: Where multiple tabs are processed and multiple outputs are generated, additional file / code management may be necessary.*

**Transforms (tidy data outputs) are stored in a collection of Pandas DataFrames.**

**Validation, both by the user and through tools, is expected and is carried out in the cells that follow the wrangling code-cells.**
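
The two recurring wrangling steps in this stage, building the period interval string and recoding labels, can be sketched in plain Python. This is an illustration under assumptions rather than the Notebook's implementation. The raw period format `'2009 to 2011'` and the helper names `to_period_interval` and `recode` are hypothetical. The output formats and the recoded labels are the ones used in this Notebook.

```python
def to_period_interval(raw_period, years=3):
    """Build an interval string from the first four characters (the start year)."""
    start_year = str(raw_period)[:4]
    return f'gregorian-interval/{start_year}-03-31T00:00:00/P{years}Y'


# Lookup tables for labels that need recoding; any other label is kept as-is.
SEX_CODES = {'Males': 'M', 'Females': 'F'}
AGE_CODES = {'<1': 'lessthan1', '90+': '90plus'}


def recode(value, mapping):
    """Return the recoded value, or the original value when no recoding applies."""
    return mapping.get(value, value)


# Hypothetical source row; the raw 'Period' format is an assumption.
row = {'Period': '2009 to 2011', 'Sex': 'Males', 'age group': '<1'}
tidy = {
    'Period': to_period_interval(row['Period']),
    'Sex': recode(row['Sex'], SEX_CODES),
    'age group': recode(row['age group'], AGE_CODES),
}
print(tidy)
```

Keeping the recodes in lookup tables makes it easier to review them against the reference codelists than scattering them through the loop.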

In [320]:
str_tabsheetsinfocus = 'H'
if boo_pythonNB_environment == True:
    printmd("**Processing tabs that start with: " + str_tabsheetsinfocus + ".**")
    
i = 0 # Loop Variable.
ii = 0 # Loop Variable.
dataframe_collection = {} # Collection of Pandas DataFrames.


for tab in tabs:
    if tabs[i].name.startswith(str_tabsheetsinfocus):
        
        
        if tabs[i].name == 'HE - Country level estimates':
            ii = 1
            try:
                pd_tab = distribution.as_pandas(sheet_name = tabs[i].name) #, skiprows=1, header=None)
                pd_df_dimensions = pd_tab.iloc[:, :7] # This split is lost in our transformation - but it helped here.
                pd_df_observations = pd_tab.iloc[:, 7:14] # See above.
                pd_df_original = pd.concat([pd_df_dimensions, pd_df_observations], axis=1, sort=False)
                pd_df_transformed = pd_df_original.dropna(how='all')
                pd_df_transformed = pd_df_transformed.drop(columns=['Country', 'sex1', 'ageband'])
                pd_df_transformed['Period'] = pd_df_transformed['Period']\
                    .map(lambda x: f'gregorian-interval/{str(x)[:4]}-03-31T00:00:00/P3Y')
                pd_df_transformed.loc[pd_df_transformed['Sex'] == 'Males', 'Sex'] = 'M'
                pd_df_transformed.loc[pd_df_transformed['Sex'] == 'Females', 'Sex'] = 'F'
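                # Use .loc rather than chained indexing so the assignment is applied to the DataFrame itself.
                pd_df_transformed.loc[pd_df_transformed['age group'] == '<1', 'age group'] = 'lessthan1'
                pd_df_transformed.loc[pd_df_transformed['age group'] == '90+', 'age group'] = '90plus'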
                pd_df_transformed_le =\
                    pd_df_transformed[['Period', 'Code', 'Sex', 'age group', 'Life Expectancy (LE)_',\
                                       'LE Lower CI_', 'LE Upper CI_',\
                                       #'Proportion of Life Spent in "Good" Health (%)_']].copy()
                                      ]].copy()
                pd_df_transformed_hle =\
                    pd_df_transformed[['Period', 'Code', 'Sex', 'age group', 'Healthy Life Expectancy (HLE) _',\
                                       'HLE Lower CI_', 'HLE Upper CI_',\
                                       #'Proportion of Life Spent in "Good" Health (%)_']].copy()
                                       ]].copy()
                pd_df_transformed_le['TransformationType'] = 'LE'
                pd_df_transformed_hle['TransformationType'] = 'HLE'
                dataframe_collection[tabs[i].name + '_LE'] = pd_df_transformed_le
                dataframe_collection[tabs[i].name + '_HLE'] = pd_df_transformed_hle
                printmd('[' + str(i) + '] Processed: ' + tabs[i].name + '.', colour='Green')
            except Exception as err: # Report the failing tab and the underlying error.
                print('Error within ' + tabs[i].name + ' process to extract to Pandas DF: ' + str(err))
            
            
        if tabs[i].name == 'HE - Region level estimates':
            ii = 1
            try:
                pd_tab = distribution.as_pandas(sheet_name = tabs[i].name)
                pd_df_dimensions = pd_tab.iloc[:, :8]
                pd_df_observations = pd_tab.iloc[:, 8:14]
                pd_df_original = pd.concat([pd_df_dimensions, pd_df_observations], axis=1, sort=False)
                pd_df_original.columns = pd_df_original.iloc[0]
                pd_df_original = pd_df_original[1:]
                pd_df_transformed = pd_df_original.dropna(how='all')
                pd_df_transformed = pd_df_transformed.drop(columns=['Area_name', 'sex1', 'ageband'])
                pd_df_transformed['Period'] = pd_df_transformed['Period']\
                    .map(lambda x: f'gregorian-interval/{str(x)[:4]}-03-31T00:00:00/P3Y')
                pd_df_transformed.loc[pd_df_transformed['Sex'] == 'Males', 'Sex'] = 'M'
                pd_df_transformed.loc[pd_df_transformed['Sex'] == 'Females', 'Sex'] = 'F'
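                # Use .loc rather than chained indexing so the assignment is applied to the DataFrame itself.
                pd_df_transformed.loc[pd_df_transformed['Age group'] == '<1', 'Age group'] = 'lessthan1'
                pd_df_transformed.loc[pd_df_transformed['Age group'] == '90+', 'Age group'] = '90plus'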
                pd_df_transformed_le =\
                    pd_df_transformed[['Period', 'Code', 'Sex', 'Age group', 'Life Expectancy (LE)_',\
                                       'LE Lower CI_', 'LE Upper CI_',\
                                       #'Proportion of Life Spent in "Good" Health (%)_']].copy()
                                       ]].copy()
                pd_df_transformed_hle =\
                    pd_df_transformed[['Period', 'Code', 'Sex', 'Age group', 'Healthy Life Expectancy (HLE) _',\
                                       'HLE Lower CI_', 'HLE Upper CI_',\
                                       #'Proportion of Life Spent in "Good" Health (%)_']].copy()
                                        ]].copy()
                pd_df_transformed_le['TransformationType'] = 'LE'
                pd_df_transformed_hle['TransformationType'] = 'HLE'
                dataframe_collection[tabs[i].name + '_LE'] = pd_df_transformed_le
                dataframe_collection[tabs[i].name + '_HLE'] = pd_df_transformed_hle
                printmd('[' + str(i) + '] Processed: ' + tabs[i].name + '.', colour='Green')
            except Exception as err: # Report the failing tab and the underlying error.
                print('Error within ' + tabs[i].name + ' process to extract to Pandas DF: ' + str(err))
    
        if ii == 0:
                printmd('[' + str(i) + '] Ignoring: ' + tabs[i].name + '.', colour='Red')
        
    i += 1
    ii = 0 # Code should be amended to utilise loop break outs...


<span style='color:None'>**Processing tabs that start with: H.**</span>



<span style='color:Green'>[1] Processed: HE - Country level estimates.</span>

<span style='color:Green'>[2] Processed: HE - Region level estimates.</span>

<span style='color:Red'>[3] Ignoring: HE - MC,CA,WHB estimates.</span>

<span style='color:Red'>[4] Ignoring: HE - Local area estimates.</span>

**Checking the result of our data extraction and wrangling of the two data tabs:**

*Note: This is for human visualisation - not for specific testing (or as part of a testing framework).*

If the transformations appear correct we can proceed; otherwise we need to refactor the data wrangling stage.

In [380]:
# Display:
for key in dataframe_collection.keys():
    printmd("\n" + "="*100, colour='Grey')
    printmd('**' + key + '**' + ' *: First five records displayed of*' + ' ' + str(dataframe_collection[key].shape[0]) + ' records.', colour='Blue')
    printmd("="*100, colour='Grey')
    #print(dataframe_collection[key]) #Print like this for Logs...  
    display(dataframe_collection[key].head(5))
    

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>**HE - Country level estimates_LE** *: First five records displayed of* 1600 records.</span>

<span style='color:Grey'>====================================================================================================</span>

| | Period | Code | Sex | age group | Life Expectancy (LE)_ | LE Lower CI_ | LE Upper CI_ | TransformationType |
|---|---|---|---|---|---|---|---|---|
| 0 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | lessthan1 | 78.78073 | 78.75026 | 78.8112 | LE |
| 1 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | 01-04 | 78.17075 | 78.14207 | 78.19942 | LE |
| 2 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | 05-09 | 74.22782 | 74.19942 | 74.25623 | LE |
| 3 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | 10-14 | 69.26178 | 69.23354 | 69.29002 | LE |
| 4 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | 15-19 | 64.29632 | 64.26823 | 64.3244 | LE |


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>**HE - Country level estimates_HLE** *: First five records displayed of* 1600 records.</span>

<span style='color:Grey'>====================================================================================================</span>

| | Period | Code | Sex | age group | Healthy Life Expectancy (HLE) _ | HLE Lower CI_ | HLE Upper CI_ | TransformationType |
|---|---|---|---|---|---|---|---|---|
| 0 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | lessthan1 | 63.02647 | 62.87787 | 63.17508 | HLE |
| 1 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | 01-04 | 62.37935 | 62.23008 | 62.52862 | HLE |
| 2 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | 05-09 | 58.59795 | 58.44976 | 58.74614 | HLE |
| 3 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | 10-14 | 53.86217 | 53.71597 | 54.00838 | HLE |
| 4 | gregorian-interval/2009-03-31T00:00:00/P3Y | E92000001 | M | 15-19 | 49.15668 | 49.01273 | 49.30062 | HLE |


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>**HE - Region level estimates_LE** *: First five records displayed of* 2880 records.</span>

<span style='color:Grey'>====================================================================================================</span>

| | Period | Code | Sex | Age group | Life Expectancy (LE)_ | LE Lower CI_ | LE Upper CI_ | TransformationType |
|---|---|---|---|---|---|---|---|---|
| 1 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | lessthan1 | 77.4021 | 77.2661 | 77.5382 | LE |
| 2 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | 01-04 | 76.7237 | 76.5949 | 76.8526 | LE |
| 3 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | 05-09 | 72.7749 | 72.6472 | 72.9026 | LE |
| 4 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | 10-14 | 67.7985 | 67.6714 | 67.9256 | LE |
| 5 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | 15-19 | 62.8475 | 62.7214 | 62.9737 | LE |


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>**HE - Region level estimates_HLE** *: First five records displayed of* 2880 records.</span>

<span style='color:Grey'>====================================================================================================</span>

| | Period | Code | Sex | Age group | Healthy Life Expectancy (HLE) _ | HLE Lower CI_ | HLE Upper CI_ | TransformationType |
|---|---|---|---|---|---|---|---|---|
| 1 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | lessthan1 | 59.7111 | 59.1905 | 60.2318 | HLE |
| 2 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | 01-04 | 58.9645 | 58.4417 | 59.4873 | HLE |
| 3 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | 05-09 | 55.0573 | 54.5354 | 55.5793 | HLE |
| 4 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | 10-14 | 50.2258 | 49.7077 | 50.7439 | HLE |
| 5 | gregorian-interval/2009-03-31T00:00:00/P3Y | E12000001 | M | 15-19 | 45.5027 | 44.9906 | 46.0148 | HLE |


### Checking Validity of Transforms in CSV (in-memory) Format:

In [414]:
# Good Tables CSV validity tests:
for key in dataframe_collection.keys():
    csv_path_file = str(slug.slug(key)) + '.csv'
    dataframe_collection[key].to_csv(csv_path_file, index = None, header=True)
    report = validate(csv_path_file)
    printmd("\n" + "="*100, colour='Grey')
    printmd('***CSV extract for: ' + csv_path_file + ' has been examined. Validation of the file is:***' + ' ' + str(report['valid']) + '.', colour='Purple')
    printmd("="*100, colour='Grey')
    if report['valid'] == False:
        closer_file_inspection = inspector.inspect(csv_path_file)
        pprint(closer_file_inspection)


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Purple'>***CSV extract for: he-country-level-estimates_le.csv has been examined. Validation of the file is:*** True.</span>

<span style='color:Grey'>====================================================================================================</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Purple'>***CSV extract for: he-country-level-estimates_hle.csv has been examined. Validation of the file is:*** True.</span>

<span style='color:Grey'>====================================================================================================</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Purple'>***CSV extract for: he-region-level-estimates_le.csv has been examined. Validation of the file is:*** True.</span>

<span style='color:Grey'>====================================================================================================</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Purple'>***CSV extract for: he-region-level-estimates_hle.csv has been examined. Validation of the file is:*** True.</span>

<span style='color:Grey'>====================================================================================================</span>
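
The validation cell above writes each transform to a .csv file before checking it. For a check that stays fully in memory, note that pandas `DataFrame.to_csv()` returns the CSV text as a string when called without a path. The sketch below is not goodtables and covers only a small subset of structural checks (blank headers, duplicate headers, ragged rows). The function name `check_csv_text` and the sample strings are hypothetical.

```python
import csv
import io


def check_csv_text(csv_text):
    """Return a list of structural problems found in CSV text (empty list if none)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ['file is empty']
    header = rows[0]
    problems = []
    if any(not name.strip() for name in header):
        problems.append('blank header')
    if len(set(header)) != len(header):
        problems.append('duplicate header')
    # Data rows start on line 2 of the file.
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f'row {line_number} has {len(row)} cells, expected {len(header)}')
    return problems


good = 'Period,Code,Sex\n2009,E92000001,M\n'
bad = 'Period,Code,Code\n2009,E92000001\n'
print(check_csv_text(good))
print(check_csv_text(bad))
```

A check like this could run on the string returned by `to_csv()` so that no intermediate files are left behind, with the file-based goodtables validation kept for the final outputs.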

### Stage 4: Isolated Notebook Tests until a Framework is Developed and Deployed:
**Checking the result of our data extraction and wrangling of data sources / tabs:**

* Note: A testing framework and strategy is required.
* Note: Tests can be implemented programmatically using code / functions OR, as here, via hardcoded values.
    * *Remember: hardcoded test values only apply to a specific data source; for example, record counts may change between revisions of the same data source.*


* *A testing strategy is yet to be defined for our COGS development teams. Currently, checks that mirror those below are a good starting point: counts and sums to ensure no data is lost during the transform process, and spot checks of individual data points to ensure the data wrangling steps have not skewed the data schema / structure.*
    * *A testing framework should be implemented where tests are executed periodically to identify when new source data is available or has been revised - a test suite for a RAP (Reproducible Analytical Pipeline) can highlight failures and a requirement for a code refactor.*
    * ***FOR INFORMATION! BE ADVISED - Intentionally one test here fails for demonstration purposes.***
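
The count and sum checks described above can be sketched with the standard library alone. This is a minimal illustration, assuming the observations are available as plain lists of floats. The function name `reconcile` and the tolerance are hypothetical choices. The sample values are the first three Life Expectancy figures shown earlier in this Notebook.

```python
import math


def reconcile(source_values, transformed_values, rel_tol=1e-9):
    """Compare record counts and totals between source and transformed observations."""
    return {
        'count_matches': len(source_values) == len(transformed_values),
        # Compare totals with a tolerance instead of exact float equality.
        'sum_matches': math.isclose(math.fsum(source_values),
                                    math.fsum(transformed_values),
                                    rel_tol=rel_tol),
    }


source = [78.78073, 78.17075, 74.22782]
reordered = [74.22782, 78.78073, 78.17075]  # same records, different order
truncated = source[:-1]                     # one record lost in the transform

print(reconcile(source, reordered))
print(reconcile(source, truncated))
```

Comparing totals with a tolerance avoids failures caused purely by floating-point rounding, which exact `==` comparisons against long hardcoded decimals are prone to.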
    

In [515]:
# Hard-coded tests:
test_count_int = 0
test_count_successful = 0

if boo_pythonNB_environment == True:
    # Only output to Notebook as per other code-cells.
    # Remember tests only relevant to specific data sources!
    # You could use Python assert for all tests in a testing framework wrapper - lots of options available to us.
    
    test_count_int += 1 # Increment Test Counter...
    if dataframe_collection['HE - Country level estimates_HLE'].shape[0] == 1600:
        printmd('Test ID: [' + str(test_count_int) + '] Successful.', colour='Green')
        test_count_successful += 1 # Increment Successful Test Counter...
    else:
        printmd('Test ID: [' + str(test_count_int) + '] Failed.', colour='Red')

    test_count_int += 1 # Increment Test Counter...
    my_testdata = (dataframe_collection['HE - Country level estimates_HLE'].loc[(dataframe_collection['HE - Country level estimates_HLE']['Period'] == 'gregorian-interval/2009-03-31T00:00:00/P3Y') &
                     (dataframe_collection['HE - Country level estimates_HLE']['Code'] == 'E92000001') &
                     (dataframe_collection['HE - Country level estimates_HLE']['age group'] == 'lessthan1') &
                     (dataframe_collection['HE - Country level estimates_HLE']['Sex'] == 'M')]
                     )
    my_expecteddata = pd.DataFrame({
                    'Period': ['gregorian-interval/2009-03-31T00:00:00/P3Y'],
                    'Code': ['E92000001'],
                    'Sex': ['M'],
                    'age group': ['lessthan1'],
                    'Healthy Life Expectancy (HLE) _': [63.02647],
                    'HLE Lower CI_': [62.87787],
                    'HLE Upper CI_': [63.17508],
                    #'Proportion of Life Spent in "Good" Health (%)_': [80.0024],
                    'TransformationType': ['HLE']
                    })
    if my_testdata.equals(my_expecteddata):
        printmd('Test ID: [' + str(test_count_int) + '] Successful.', colour='Green')
        test_count_successful += 1 # Increment Successful Test Counter...
    else:
        printmd('Test ID: [' + str(test_count_int) + '] Failed.', colour='Red')

    test_count_int += 1 # Increment Test Counter...
    my_testdata = (dataframe_collection['HE - Country level estimates_HLE'].loc[(dataframe_collection['HE - Country level estimates_HLE']['Period'] == 'gregorian-interval/2016-03-31T00:00:00/P3Y') &
                     (dataframe_collection['HE - Country level estimates_HLE']['Code'] == 'W92000004') &
                     (dataframe_collection['HE - Country level estimates_HLE']['Sex'] == 'F') &
                     (dataframe_collection['HE - Country level estimates_HLE']['age group'] == '05-09')]
                    ) 
    my_testdata.index = np.arange(1,len(my_testdata)+1) # To avoid index comparison errors.
    my_expecteddata = pd.DataFrame({
                    'Period': ['gregorian-interval/2016-03-31T00:00:00/P3Y'],
                    'Code': ['W92000004'],
                    'Sex': ['F'],
                    'age group': ['05-09'],
                    'Healthy Life Expectancy (HLE) _': [57.56803],
                    'HLE Lower CI_': [57.10483],
                    'HLE Upper CI_': [58.03124],
                    #'Proportion of Life Spent in "Good" Health (%)_': [74.21282],
                    'TransformationType': ['HLE']
                    })
    my_expecteddata.index = np.arange(1,len(my_expecteddata)+1) # To avoid index comparison errors.
    if my_testdata.equals(my_expecteddata):
        printmd('Test ID: [' + str(test_count_int) + '] Successful.', colour='Green')
        test_count_successful += 1 # Increment Successful Test Counter...
    else:
        printmd('Test ID: [' + str(test_count_int) + '] Failed.', colour='Red')
        
    test_count_int += 1 # Increment Test Counter...
    # Compare floats with a tolerance rather than exact equality:
    if np.isclose(dataframe_collection['HE - Country level estimates_LE']['Life Expectancy (LE)_'].sum(), 65114.21455, rtol=1e-9, atol=0.0):
        printmd('Test ID: [' + str(test_count_int) + '] Successful.', colour='Green')
        test_count_successful += 1 # Increment Successful Test Counter...
    else:
        printmd('Test ID: [' + str(test_count_int) + '] Failed.', colour='Red')        
        
    test_count_int += 1 # Increment Test Counter...
    if dataframe_collection['HE - Region level estimates_HLE'].shape[0] == 2880:
        printmd('Test ID: [' + str(test_count_int) + '] Successful.', colour='Green')
        test_count_successful += 1 # Increment Successful Test Counter...
    else:
        printmd('Test ID: [' + str(test_count_int) + '] Failed.', colour='Red')

    test_count_int += 1 # Increment Test Counter...
    if 1 == 2: # Intentional FAILED test for demo purposes...
        printmd('Test ID: [' + str(test_count_int) + '] Successful.', colour='Green')
        test_count_successful += 1 # Increment Successful Test Counter...
    else:
        printmd('Test ID: [' + str(test_count_int) + '] Failed.', colour='Red')        

        
    test_count_int += 1 # Increment Test Counter...
    # Compare floats with a tolerance rather than exact equality:
    if np.isclose(dataframe_collection['HE - Region level estimates_LE']['Life Expectancy (LE)_'].sum(), 118840.88113, rtol=1e-9, atol=0.0):
        printmd('Test ID: [' + str(test_count_int) + '] Successful.', colour='Green')
        test_count_successful += 1 # Increment Successful Test Counter...
    else:
        printmd('Test ID: [' + str(test_count_int) + '] Failed.', colour='Red')
                      
printmd('**Test Rating: ' + str(round(test_count_successful/test_count_int*100,2)) + '% Successful.**', colour='Magenta')

<span style='color:Green'>Test ID: [1] Successful.</span>

<span style='color:Green'>Test ID: [2] Successful.</span>

<span style='color:Green'>Test ID: [3] Successful.</span>

<span style='color:Green'>Test ID: [4] Successful.</span>

<span style='color:Green'>Test ID: [5] Successful.</span>

<span style='color:Red'>Test ID: [6] Failed.</span>

<span style='color:Green'>Test ID: [7] Successful.</span>

<span style='color:Magenta'>**Test Rating: 85.71% Successful.**</span>

If tests fail, it may be wise to terminate the testing framework / testing component / Notebook, or to communicate the results and messages to downstream processes. For the purposes of this particular Notebook, however, we will just continue.
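
One way to stop a pipeline when tests fail is to collect the failures and raise an exception. The sketch below is a hypothetical wrapper, not an existing framework. The names `run_checks`, `require_all_passed` and `PipelineTestFailure` are invented for illustration, and `record_count` stands in for a value taken from a transform.

```python
class PipelineTestFailure(RuntimeError):
    """Raised to stop downstream stages when any check fails."""


def run_checks(checks):
    """Run (name, callable) pairs and return the names of the checks that failed."""
    failures = []
    for name, check in checks:
        try:
            passed = bool(check())
        except Exception:  # A check that crashes counts as a failure.
            passed = False
        if not passed:
            failures.append(name)
    return failures


def require_all_passed(checks):
    """Raise PipelineTestFailure listing every failed check, if there are any."""
    failures = run_checks(checks)
    if failures:
        raise PipelineTestFailure('Failed checks: ' + ', '.join(failures))


record_count = 1600  # Hypothetical stand-in for a transform's record count.
checks = [
    ('country HLE record count', lambda: record_count == 1600),
    ('intentional failure', lambda: 1 == 2),
]
failed = run_checks(checks)
print(failed)
```

Calling `require_all_passed(checks)` instead of `run_checks(checks)` would raise and halt execution at this point, which is the behaviour a scheduled pipeline would normally want.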

*Having trouble using Python 'assert' or pandas DataFrame equality checks? Often it is the schema (column dtypes or index labels) that doesn't match, even though the data entries appear to be the same.*

***Try the following code-snippet to investigate schemas:***


In [323]:
my_expecteddata.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 1 to 1
Data columns (total 8 columns):
Period                             1 non-null object
Code                               1 non-null object
Sex                                1 non-null object
age group                          1 non-null object
Healthy Life Expectancy (HLE) _    1 non-null float64
HLE Lower CI_                      1 non-null float64
HLE Upper CI_                      1 non-null float64
TransformationType                 1 non-null object
dtypes: float64(3), object(5)
memory usage: 72.0+ bytes
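
A schema mismatch of this kind can be reproduced and resolved on a small example. This is a sketch, assuming pandas is installed. The two frames are hypothetical and reuse one column name and value from the tests above. `pandas.testing.assert_frame_equal` raises an `AssertionError` describing the first difference, which is usually more informative than a bare `equals`.

```python
import pandas as pd

expected = pd.DataFrame({'Code': ['E92000001'], 'HLE Lower CI_': [62.87787]})

# Same values, but with a non-zero index label (as after filtering a larger
# frame) and with the numeric column stored as generic Python objects.
actual = pd.DataFrame({'Code': ['E92000001'], 'HLE Lower CI_': [62.87787]}, index=[5])
actual['HLE Lower CI_'] = actual['HLE Lower CI_'].astype(object)

# The values look identical, yet the comparison fails.
print(actual.equals(expected))

# List the columns whose dtypes differ.
mismatched = [col for col in expected.columns
              if actual[col].dtype != expected[col].dtype]
print(mismatched)

# Align the index and dtypes, then compare; this raises AssertionError on mismatch.
aligned = actual.reset_index(drop=True).astype(expected.dtypes.to_dict())
pd.testing.assert_frame_equal(aligned, expected)
```

Resetting the index serves the same purpose as the `np.arange` re-indexing used in the hardcoded tests.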


***With our older / existing process flow we would transfer the wrangled data outputs to a .csv file, but for our purposes we'll leave the data in memory.***

### Stage 5: Creating or Mapping Reference Data and Data Markers:

**To create an RDF output we need the data (.csv) and associated metadata (.json). Here we focus on the metadata.**
* *We must isolate the dimensions (keys) from the observations (values).*
    * Generate the components from the transforms (i.e. from the tidy data outputs)...
    * Strip out data elements - remove all dimensions to leave only the observations...
        * *...thus providing us with our codelist(s) keys.*
        * ***Note:***
        *This must be done manually!*
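
The manual classification step can be kept reviewable by recording the decisions in a single lookup and defaulting everything else to a dimension. This is a minimal sketch in plain Python. The names `MANUAL_ENTRY_TYPES` and `classify_columns` are hypothetical. The column names and the labels `Dimension`, `Measure` and `Observation` follow the ones used in this Notebook.

```python
DEFAULT_ENTRY_TYPE = 'Dimension'

# Hand-maintained decisions; any column not listed stays a Dimension.
MANUAL_ENTRY_TYPES = {
    'Life Expectancy (LE)_': 'Measure',
    'LE Lower CI_': 'Observation',
    'LE Upper CI_': 'Observation',
}


def classify_columns(columns, overrides=MANUAL_ENTRY_TYPES):
    """Pair each column name with its entry type, defaulting to Dimension."""
    return [(column, overrides.get(column, DEFAULT_ENTRY_TYPE)) for column in columns]


columns = ['Period', 'Code', 'Sex', 'age group', 'Life Expectancy (LE)_',
           'LE Lower CI_', 'LE Upper CI_', 'TransformationType']
catalogue = classify_columns(columns)

# The remaining dimensions are the candidates for codelists.
dimensions = [name for name, entry_type in catalogue if entry_type == 'Dimension']
print(dimensions)
```

The lookup still has to be written by a person, but it gives reviewers one place to confirm which columns were treated as values rather than keys.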
        

In [324]:
codelist_cols = []
dataframe_elements_collection = {}

# Obtain all data elements (columns) from data outputs (tidy data).
# The columns are always collected; only the rendering depends on the environment:
for key in dataframe_collection.keys():
    codelist_cols.append(list(dataframe_collection[key].columns))
    if boo_pythonNB_environment == True:
        printmd('**Extracted columns from dataset ' + key + ":**", colour='Green')
        printmd(codelist_cols[-1], colour='Grey')

# Generate a collection of dataframes - default all entries to dimensions for time being intentionally...
for i, key in enumerate(dataframe_collection.keys()):
    df_components = pd.DataFrame(codelist_cols[i], columns=[key])
    df_components['Entry Type'] = 'Dimension'
    dataframe_elements_collection[key] = df_components

# Display:
i = 0
for key in dataframe_elements_collection.keys():
    printmd("\n" + "="*100, colour='Grey')
    printmd('Initial Component DataFrame for: ' + key + '.', colour='Blue')
    printmd("="*100, colour='Grey')
    #print(dataframe_elements_collection[key]) #Print like this for Logs...  
    display(dataframe_elements_collection[key])
    i += 1

<span style='color:Green'>**Extracted columns from dataset HE - Country level estimates_LE:**</span>

<span style='color:Grey'>['Period', 'Code', 'Sex', 'age group', 'Life Expectancy (LE)_', 'LE Lower CI_', 'LE Upper CI_', 'TransformationType']</span>

<span style='color:Green'>**Extracted columns from dataset HE - Country level estimates_HLE:**</span>

<span style='color:Grey'>['Period', 'Code', 'Sex', 'age group', 'Healthy Life Expectancy (HLE) _', 'HLE Lower CI_', 'HLE Upper CI_', 'TransformationType']</span>

<span style='color:Green'>**Extracted columns from dataset HE - Region level estimates_LE:**</span>

<span style='color:Grey'>['Period', 'Code', 'Sex', 'Age group', 'Life Expectancy (LE)_', 'LE Lower CI_', 'LE Upper CI_', 'TransformationType']</span>

<span style='color:Green'>**Extracted columns from dataset HE - Region level estimates_HLE:**</span>

<span style='color:Grey'>['Period', 'Code', 'Sex', 'Age group', 'Healthy Life Expectancy (HLE) _', 'HLE Lower CI_', 'HLE Upper CI_', 'TransformationType']</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Initial Component DataFrame for: HE - Country level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

| | HE - Country level estimates_LE | Entry Type |
|---|---|---|
| 0 | Period | Dimension |
| 1 | Code | Dimension |
| 2 | Sex | Dimension |
| 3 | age group | Dimension |
| 4 | Life Expectancy (LE)_ | Dimension |
| 5 | LE Lower CI_ | Dimension |
| 6 | LE Upper CI_ | Dimension |
| 7 | TransformationType | Dimension |


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Initial Component DataFrame for: HE - Country level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

| | HE - Country level estimates_HLE | Entry Type |
|---|---|---|
| 0 | Period | Dimension |
| 1 | Code | Dimension |
| 2 | Sex | Dimension |
| 3 | age group | Dimension |
| 4 | Healthy Life Expectancy (HLE) _ | Dimension |
| 5 | HLE Lower CI_ | Dimension |
| 6 | HLE Upper CI_ | Dimension |
| 7 | TransformationType | Dimension |


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Initial Component DataFrame for: HE - Region level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

| | HE - Region level estimates_LE | Entry Type |
|---|---|---|
| 0 | Period | Dimension |
| 1 | Code | Dimension |
| 2 | Sex | Dimension |
| 3 | Age group | Dimension |
| 4 | Life Expectancy (LE)_ | Dimension |
| 5 | LE Lower CI_ | Dimension |
| 6 | LE Upper CI_ | Dimension |
| 7 | TransformationType | Dimension |


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Initial Component DataFrame for: HE - Region level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_HLE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,Age group,Dimension
4,Healthy Life Expectancy (HLE) _,Dimension
5,HLE Lower CI_,Dimension
6,HLE Upper CI_,Dimension
7,TransformationType,Dimension


***Define the observations in the data item catalogues:***

In [326]:
# Manual, non-automatable step:

# This could / should be wrapped in a function / component wrapper etc...
# Using loop as all transforms in this example use same data elements (columns):
for key in dataframe_elements_collection.keys():
    dataframe_elements_collection[key].loc[dataframe_elements_collection[key][key] == 'Life Expectancy (LE)_', ['Entry Type']] = 'Measure'
    dataframe_elements_collection[key].loc[dataframe_elements_collection[key][key] == 'LE Lower CI_', ['Entry Type']] = 'Observation'
    dataframe_elements_collection[key].loc[dataframe_elements_collection[key][key] == 'LE Upper CI_', ['Entry Type']] = 'Observation'
    dataframe_elements_collection[key].loc[dataframe_elements_collection[key][key] == 'Healthy Life Expectancy (HLE) _', ['Entry Type']] = 'Measure'
    dataframe_elements_collection[key].loc[dataframe_elements_collection[key][key] == 'HLE Lower CI_', ['Entry Type']] = 'Observation'
    dataframe_elements_collection[key].loc[dataframe_elements_collection[key][key] == 'HLE Upper CI_', ['Entry Type']] = 'Observation'
    #dataframe_elements_collection[key].loc[dataframe_elements_collection[key][key] == 'Proportion of Life Spent in "Good" Health (%)_', ['Entry Type']] = 'Observation'


# Render the Results:
i = 0
printmd("\n" + "="*115, colour='Grey')
printmd('**WARNING!:**' + ' Please be AWARE that you should not proceed until the data items are correctly assigned as being either a Dimension or an Observation.', colour='Red')
printmd('*If the results below show errors please fix by re-factoring your code and re-run.*', colour='Red')
printmd("\n" + "="*115, colour='Grey')
for key in dataframe_elements_collection.keys():
    printmd("\n" + "="*100, colour='Grey')
    printmd('Revised Component DataFrame for: ' + key + '.', colour='Blue')
    printmd("="*100, colour='Grey')
    #print(dataframe_elements_collection[key]) #Print like this for Logs...  
    display(dataframe_elements_collection[key])
    i += 1


<span style='color:Grey'>
===================================================================================================================</span>

<span style='color:Red'>**WARNING!:** Please be AWARE that you should not proceed until the data items are correctly assigned as being either a Dimension or an Observation.</span>

<span style='color:Red'>*If the results below show errors please fix by re-factoring your code and re-run.*</span>

<span style='color:Grey'>
===================================================================================================================</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Revised Component DataFrame for: HE - Country level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_LE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,age group,Dimension
4,Life Expectancy (LE)_,Measure
5,LE Lower CI_,Observation
6,LE Upper CI_,Observation
7,TransformationType,Dimension


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Revised Component DataFrame for: HE - Country level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_HLE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,age group,Dimension
4,Healthy Life Expectancy (HLE) _,Measure
5,HLE Lower CI_,Observation
6,HLE Upper CI_,Observation
7,TransformationType,Dimension


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Revised Component DataFrame for: HE - Region level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_LE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,Age group,Dimension
4,Life Expectancy (LE)_,Measure
5,LE Lower CI_,Observation
6,LE Upper CI_,Observation
7,TransformationType,Dimension


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Revised Component DataFrame for: HE - Region level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_HLE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,Age group,Dimension
4,Healthy Life Expectancy (HLE) _,Measure
5,HLE Lower CI_,Observation
6,HLE Upper CI_,Observation
7,TransformationType,Dimension
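
The cell above notes that the Entry Type assignment could be wrapped in a function. A minimal sketch of such a wrapper follows; the function name `assign_entry_types` and its parameters are hypothetical, and the small DataFrame is only a stand-in for one entry of `dataframe_elements_collection`.

```python
import pandas as pd


def assign_entry_types(components_df, name_column, entry_types):
    """Return a copy of components_df with 'Entry Type' overridden per entry_types.

    entry_types maps a component name to 'Dimension', 'Measure' or 'Observation'.
    Names that do not occur in components_df are silently ignored.
    """
    result = components_df.copy()
    for component_name, entry_type in entry_types.items():
        result.loc[result[name_column] == component_name, 'Entry Type'] = entry_type
    return result


# Stand-in for one entry of dataframe_elements_collection (illustrative only):
key = 'HE - Country level estimates_LE'
example_components = pd.DataFrame({
    key: ['Period', 'Life Expectancy (LE)_', 'LE Lower CI_'],
    'Entry Type': ['Dimension', 'Dimension', 'Dimension'],
})

revised = assign_entry_types(
    example_components,
    key,
    {'Life Expectancy (LE)_': 'Measure', 'LE Lower CI_': 'Observation'},
)
```

Returning a copy keeps the original component DataFrame unchanged, so the assignment can be re-run with a corrected mapping without reloading the data.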


***Once the Entry Types (Dimension / Measure / Observation) have been correctly assigned we can proceed...***

**Creating the Codelists:**

In [327]:
dataframe_codelists_collection = {}

printmd("\n" + "="*100, colour='Grey')
printmd('Identified Dimension Entries:', colour='Blue')
printmd("="*100, colour='Grey')


for key in dataframe_elements_collection.keys():
    df_result = pd.DataFrame(columns = ['DropMe'])

    df_temp = dataframe_elements_collection[key].loc[dataframe_elements_collection[key]['Entry Type'] == 'Dimension']
    display(df_temp)
    codelist_cols = []
    for rows in df_temp.itertuples():
        codelist_cols_temp = rows[1]
        codelist_cols.append(codelist_cols_temp)

    
    for col in dataframe_collection[key]:
        if col in codelist_cols:
            my_codelist_lst = dataframe_collection[key][col].unique()
            df_codelist = pd.DataFrame(my_codelist_lst, columns = [col])
            df_result = pd.concat([df_result, df_codelist], axis = 1, ignore_index=False, sort=False)
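    # Pass the column by keyword: the positional axis argument is not accepted by recent pandas versions.
    df_result = df_result.drop(columns='DropMe')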
    dataframe_codelists_collection[key] = df_result


for key in dataframe_codelists_collection.keys():
    printmd("\n" + "="*100, colour='Grey')
    printmd('Codelists DataFrame for: ' + key + '.', colour='Blue')
    printmd("="*100, colour='Grey')
    display(dataframe_codelists_collection[key])
                      


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Identified Dimension Entries:</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_LE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,age group,Dimension
7,TransformationType,Dimension


Unnamed: 0,HE - Country level estimates_HLE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,age group,Dimension
7,TransformationType,Dimension


Unnamed: 0,HE - Region level estimates_LE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,Age group,Dimension
7,TransformationType,Dimension


Unnamed: 0,HE - Region level estimates_HLE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,Age group,Dimension
7,TransformationType,Dimension


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Codelists DataFrame for: HE - Country level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,Period,Code,Sex,age group,TransformationType
0,gregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,lessthan1,LE
1,gregorian-interval/2010-03-31T00:00:00/P3Y,K02000001,F,01-04,
2,gregorian-interval/2011-03-31T00:00:00/P3Y,N92000002,,05-09,
3,gregorian-interval/2012-03-31T00:00:00/P3Y,S92000003,,10-14,
4,gregorian-interval/2013-03-31T00:00:00/P3Y,W92000004,,15-19,
5,gregorian-interval/2014-03-31T00:00:00/P3Y,,,20-24,
6,gregorian-interval/2015-03-31T00:00:00/P3Y,,,25-29,
7,gregorian-interval/2016-03-31T00:00:00/P3Y,,,30-34,
8,,,,35-39,
9,,,,40-44,


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Codelists DataFrame for: HE - Country level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,Period,Code,Sex,age group,TransformationType
0,gregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,lessthan1,HLE
1,gregorian-interval/2010-03-31T00:00:00/P3Y,K02000001,F,01-04,
2,gregorian-interval/2011-03-31T00:00:00/P3Y,N92000002,,05-09,
3,gregorian-interval/2012-03-31T00:00:00/P3Y,S92000003,,10-14,
4,gregorian-interval/2013-03-31T00:00:00/P3Y,W92000004,,15-19,
5,gregorian-interval/2014-03-31T00:00:00/P3Y,,,20-24,
6,gregorian-interval/2015-03-31T00:00:00/P3Y,,,25-29,
7,gregorian-interval/2016-03-31T00:00:00/P3Y,,,30-34,
8,,,,35-39,
9,,,,40-44,


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Codelists DataFrame for: HE - Region level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,Period,Code,Sex,Age group,TransformationType
0,gregorian-interval/2009-03-31T00:00:00/P3Y,E12000001,M,lessthan1,LE
1,gregorian-interval/2010-03-31T00:00:00/P3Y,E12000002,F,01-04,
2,gregorian-interval/2011-03-31T00:00:00/P3Y,E12000003,,05-09,
3,gregorian-interval/2012-03-31T00:00:00/P3Y,E12000004,,10-14,
4,gregorian-interval/2013-03-31T00:00:00/P3Y,E12000005,,15-19,
5,gregorian-interval/2014-03-31T00:00:00/P3Y,E12000006,,20-24,
6,gregorian-interval/2015-03-31T00:00:00/P3Y,E12000007,,25-29,
7,gregorian-interval/2016-03-31T00:00:00/P3Y,E12000008,,30-34,
8,,E12000009,,35-39,
9,,,,40-44,


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Codelists DataFrame for: HE - Region level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,Period,Code,Sex,Age group,TransformationType
0,gregorian-interval/2009-03-31T00:00:00/P3Y,E12000001,M,lessthan1,HLE
1,gregorian-interval/2010-03-31T00:00:00/P3Y,E12000002,F,01-04,
2,gregorian-interval/2011-03-31T00:00:00/P3Y,E12000003,,05-09,
3,gregorian-interval/2012-03-31T00:00:00/P3Y,E12000004,,10-14,
4,gregorian-interval/2013-03-31T00:00:00/P3Y,E12000005,,15-19,
5,gregorian-interval/2014-03-31T00:00:00/P3Y,E12000006,,20-24,
6,gregorian-interval/2015-03-31T00:00:00/P3Y,E12000007,,25-29,
7,gregorian-interval/2016-03-31T00:00:00/P3Y,E12000008,,30-34,
8,,E12000009,,35-39,
9,,,,40-44,
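
The codelist tables above place the unique values of each Dimension column side by side, so shorter columns are padded with empty cells. A compact way to get the same shape, sketched here under the assumption that a dict of Series is acceptable in place of the repeated `pd.concat`, is shown below. The function name `build_codelists` and the sample values in the `Value` column are illustrative only.

```python
import pandas as pd


def build_codelists(data_df, dimension_columns):
    """Collect the unique values of each dimension column, side by side."""
    unique_values = {
        col: pd.Series(data_df[col].unique())
        for col in data_df.columns
        if col in dimension_columns
    }
    # Columns with fewer unique values are padded with NaN so the rows align.
    return pd.DataFrame(unique_values)


# Illustrative data; the 'Value' numbers are placeholders, not real estimates.
example_data = pd.DataFrame({
    'Sex': ['M', 'F', 'M', 'F'],
    'Code': ['E92000001', 'E92000001', 'K02000001', 'N92000002'],
    'Value': [1.0, 2.0, 3.0, 4.0],
})

codelists = build_codelists(example_data, ['Sex', 'Code'])

# Drop the NaN padding to recover the codelist for a single dimension:
sex_codes = codelists['Sex'].dropna().tolist()
```

Because the padding is an artefact of the side-by-side layout, each column should be passed through `dropna()` before it is used as a codelist in its own right.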


### Stage 6: Loading and Matching our Transforms with our Reference Data:

**As per Stage 5, to create an RDF output we need the data (.csv) and its associated metadata (.json). Here we focus on the metadata.**
* *We must associate our transformed data dimensions with our reference data repository / master database.*
    * Step (a): Load in reference data.
    * Step (b): Map / associate reference data with our transformed data entities / dimensions.
        * *Note: an automated first-pass mapping is attempted, but manual intervention is likely to be needed.*
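
Step (b) can be sketched as a left merge that flags which components found a match in the reference data. This is not the notebook's `align_REFdata_with_Transform` function, although the `_merge` column in its un-mapped output suggests a similar approach. The two small DataFrames and the `Component` column name are stand-ins chosen for illustration.

```python
import pandas as pd

# Stand-in for the transformed components of one transform:
components = pd.DataFrame({
    'Component': ['Period', 'Sex', 'age group'],
    'Entry Type': ['Dimension', 'Dimension', 'Dimension'],
})

# Stand-in for a few rows of the reference columns.csv:
reference_columns = pd.DataFrame({
    'title': ['Sex', 'Age', 'Period'],
    'name': ['sex', 'age', 'period'],
})

# indicator=True adds a '_merge' column: 'both' for matches, 'left_only' otherwise.
merged = components.merge(
    reference_columns,
    left_on='Component',
    right_on='title',
    how='left',
    indicator=True,
)

matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
unmatched = merged[merged['_merge'] == 'left_only']
```

The match is on the exact title, so `age group` does not match `Age`; entries like this are the ones that need manual mapping.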

In [328]:
# Specify the source of the reference data master (currently COGS has a split-by-data-family configuration):
url_ref_repo_components = "https://raw.githubusercontent.com/GSS-Cogs/family-disability/master/reference/components.csv"
url_ref_repo_columns = "https://raw.githubusercontent.com/GSS-Cogs/family-disability/master/reference/columns.csv"


In [329]:
df_ref_repo_columns = pd.read_csv(url_ref_repo_columns)
printmd('**Displaying: ' + url_ref_repo_columns + ':**')
display(df_ref_repo_columns)

df_ref_repo_components = pd.read_csv(url_ref_repo_components)
printmd('**Displaying: ' + url_ref_repo_components + ':**')
display(df_ref_repo_components)


<span style='color:None'>**Displaying: https://raw.githubusercontent.com/GSS-Cogs/family-disability/master/reference/columns.csv:**</span>

Unnamed: 0,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range
0,Measure Type,measure_type,qb:dimension,http://purl.org/linked-data/cube#measureType,http://gss-data.org.uk/def/measure/{measure_type},string,slugize,,qb:MeasureProperty
1,Area,area,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://statistics.data.gov.uk/id/statistical-g...,string,,,
2,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex
3,Age,age,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://gss-data.org.uk/def/concept/ages/{age},string,slugize,,
4,Household Disability Status,household_disability_status,qb:dimension,http://gss-data.org.uk/def/dimension/household...,http://gss-data.org.uk/def/concept/household-d...,string,slugize,,http://gss-data.org.uk/def/classes/household-d...
5,Workless Household Type,workless_household_type,qb:dimension,http://gss-data.org.uk/def/dimension/workless-...,http://gss-data.org.uk/def/concept/workless-ho...,string,slugize,,http://gss-data.org.uk/def/classes/workless-ho...
6,Unit,unit,qb:attribute,http://purl.org/linked-data/sdmx/2009/attribut...,http://gss-data.org.uk/def/concept/measurement...,string,unitize,,
7,Sample Size,sample_size,qb:attribute,http://gss-data.org.uk/def/attribute/sample-size,,number,,,
8,Lower CI,lower_ci,qb:attribute,http://gss-data.org.uk/def/attribute/lower-ci,,number,,,
9,Upper CI,upper_ci,qb:attribute,http://gss-data.org.uk/def/attribute/upper-ci,,number,,,


<span style='color:None'>**Displaying: https://raw.githubusercontent.com/GSS-Cogs/family-disability/master/reference/components.csv:**</span>

Unnamed: 0,Label,Description,Component Type,Codelist
0,Household Disability Status,,Dimension,http://gss-data.org.uk/def/concept-scheme/hous...
1,Workless Household Type,,Dimension,http://gss-data.org.uk/def/concept-scheme/work...
2,Percentage of People,The percentage of all people measured who meet...,Measure,
3,Identified support needs of homeless households,,Dimension,http://gss-data.org.uk/def/concept-scheme/iden...
4,Reasons for failing to maintain accommodation,,Dimension,http://gss-data.org.uk/def/concept-scheme/reas...
5,Reasons for homelessness application,,Dimension,http://gss-data.org.uk/def/concept-scheme/reas...
6,Indicator,,Dimension,http://gss-data.org.uk/def/concept-scheme/indi...
7,Trend,,Dimension,http://gss-data.org.uk/def/concept-scheme/trend
8,Standard Population,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-...
9,County & UA (pre Apr2019) deprivation deciles ...,,Dimension,http://gss-data.org.uk/def/concept-scheme/coun...


**Mapping between the transformed dimensions (components) and the master reference data is completed manually.**


In [330]:
printmd("\n" + "="*115, colour='Grey')
printmd('**Please use Caution!:**' + ' Displayed below are the current components as defined by automation and manual [Entry Type] assignment.', colour='Red')
printmd('*Through code you may need to map these components with those in the master reference data repository (currently .csv files). See later code sections.*', colour='Red')
printmd("\n" + "="*115, colour='Grey')
for key in dataframe_elements_collection.keys():
    printmd("\n" + "="*100, colour='Grey')
    printmd('Revised Component DataFrame for: ' + key + '.', colour='Blue')
    printmd("="*100, colour='Grey')
    #print(dataframe_elements_collection[key]) #Print like this for Logs...  
    display(dataframe_elements_collection[key])
    i += 1
    

<span style='color:Grey'>
===================================================================================================================</span>

<span style='color:Red'>**Please use Caution!:** Displayed below are the current components as defined by automation and manual [Entry Type] assignment.</span>

<span style='color:Red'>*Through code you may need to map these components with those in the master reference data repository (currently .csv files). See later code sections.*</span>

<span style='color:Grey'>
===================================================================================================================</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Revised Component DataFrame for: HE - Country level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_LE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,age group,Dimension
4,Life Expectancy (LE)_,Measure
5,LE Lower CI_,Observation
6,LE Upper CI_,Observation
7,TransformationType,Dimension


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Revised Component DataFrame for: HE - Country level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_HLE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,age group,Dimension
4,Healthy Life Expectancy (HLE) _,Measure
5,HLE Lower CI_,Observation
6,HLE Upper CI_,Observation
7,TransformationType,Dimension


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Revised Component DataFrame for: HE - Region level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_LE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,Age group,Dimension
4,Life Expectancy (LE)_,Measure
5,LE Lower CI_,Observation
6,LE Upper CI_,Observation
7,TransformationType,Dimension


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Revised Component DataFrame for: HE - Region level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_HLE,Entry Type
0,Period,Dimension
1,Code,Dimension
2,Sex,Dimension
3,Age group,Dimension
4,Healthy Life Expectancy (HLE) _,Measure
5,HLE Lower CI_,Observation
6,HLE Upper CI_,Observation
7,TransformationType,Dimension


In [591]:
# Automated mapping of transformed components with the master reference data:
dataframe_mapped_elements_collection = {}
dataframe_mapped_elements_collection_errors = {}

# Prepare mapping in memory component:
for key in dataframe_elements_collection.keys(): 
    dataframe_mapped_elements_collection[key] = dataframe_elements_collection[key].copy()
    #dataframe_mapped_elements_collection[key]['Linked Component'] = 'default-null'
    for cols in dataframe_mapped_elements_collection[key][key]:
        idx_temp = dataframe_mapped_elements_collection[key].index[dataframe_mapped_elements_collection[key][key] == cols]
        dataframe_mapped_elements_collection[key].loc[idx_temp, 'REFColumnsCSV Link'] = cols #slug.slug(cols) Not matched on slugized values as first believed!
    
# Automated first pass assignment / mapping:
for key in dataframe_elements_collection.keys():
    df_collection_point, df_collection_point_errors = align_REFdata_with_Transform(dataframe_mapped_elements_collection[key], df_ref_repo_columns, df_ref_repo_components)
    dataframe_mapped_elements_collection[key] = df_collection_point
    dataframe_mapped_elements_collection_errors[key] = df_collection_point_errors

    
printmd("\n" + "="*115, colour='Red')
printmd('**WARNING - PLEASE ADDRESS NON-MATCHING REFERENCES!**', colour='Red')
printmd("="*115, colour='Red')
for key in dataframe_mapped_elements_collection_errors.keys():
    printmd("\n" + "="*100, colour='Grey')
    printmd('UN-Mapped Component DataFrame for: ' + key + '.', colour='Red')
    printmd("="*100, colour='Grey')
    #print(dataframe_mapped_elements_collection[key]) #Print like this for Logs...  
    display(dataframe_mapped_elements_collection_errors[key])
    
    
printmd("\n" + "="*115, colour='Grey')
printmd('**Please use Caution!:**' + ' Displayed below are the current components as mapped and linked through code.', colour='Green')
printmd('*Check carefully before proceeding.*', colour='Green')
printmd("\n" + "="*115, colour='Grey')
for key in dataframe_mapped_elements_collection.keys():
    printmd("\n" + "="*100, colour='Grey')
    printmd('Mapped Component DataFrame for: ' + key + '.', colour='Green')
    printmd("="*100, colour='Grey')
    #print(dataframe_mapped_elements_collection[key]) #Print like this for Logs...  
    display(dataframe_mapped_elements_collection[key])
    

<span style='color:Red'>
===================================================================================================================</span>

<span style='color:Red'>**WARNING - PLEASE ADDRESS NON-MATCHING REFERENCES!**</span>

<span style='color:Red'>===================================================================================================================</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Red'>UN-Mapped Component DataFrame for: HE - Country level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_LE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,_merge
1,Code,Dimension,Code,,,,,,,,,,left_only
3,age group,Dimension,age group,,,,,,,,,,left_only
4,Life Expectancy (LE)_,Measure,Life Expectancy (LE)_,,,,,,,,,,left_only
5,LE Lower CI_,Observation,LE Lower CI_,,,,,,,,,,left_only
6,LE Upper CI_,Observation,LE Upper CI_,,,,,,,,,,left_only
7,TransformationType,Dimension,TransformationType,,,,,,,,,,left_only


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Red'>UN-Mapped Component DataFrame for: HE - Country level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_HLE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,_merge
1,Code,Dimension,Code,,,,,,,,,,left_only
3,age group,Dimension,age group,,,,,,,,,,left_only
4,Healthy Life Expectancy (HLE) _,Measure,Healthy Life Expectancy (HLE) _,,,,,,,,,,left_only
5,HLE Lower CI_,Observation,HLE Lower CI_,,,,,,,,,,left_only
6,HLE Upper CI_,Observation,HLE Upper CI_,,,,,,,,,,left_only
7,TransformationType,Dimension,TransformationType,,,,,,,,,,left_only


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Red'>UN-Mapped Component DataFrame for: HE - Region level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_LE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,_merge
1,Code,Dimension,Code,,,,,,,,,,left_only
3,Age group,Dimension,Age group,,,,,,,,,,left_only
4,Life Expectancy (LE)_,Measure,Life Expectancy (LE)_,,,,,,,,,,left_only
5,LE Lower CI_,Observation,LE Lower CI_,,,,,,,,,,left_only
6,LE Upper CI_,Observation,LE Upper CI_,,,,,,,,,,left_only
7,TransformationType,Dimension,TransformationType,,,,,,,,,,left_only


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Red'>UN-Mapped Component DataFrame for: HE - Region level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_HLE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,_merge
1,Code,Dimension,Code,,,,,,,,,,left_only
3,Age group,Dimension,Age group,,,,,,,,,,left_only
4,Healthy Life Expectancy (HLE) _,Measure,Healthy Life Expectancy (HLE) _,,,,,,,,,,left_only
5,HLE Lower CI_,Observation,HLE Lower CI_,,,,,,,,,,left_only
6,HLE Upper CI_,Observation,HLE Upper CI_,,,,,,,,,,left_only
7,TransformationType,Dimension,TransformationType,,,,,,,,,,left_only


<span style='color:Grey'>
===================================================================================================================</span>

<span style='color:Green'>**Please use Caution!:** Displayed below are the current components as mapped and linked through code.</span>

<span style='color:Green'>*Check carefully before proceeding.*</span>

<span style='color:Grey'>
===================================================================================================================</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Green'>Mapped Component DataFrame for: HE - Country level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_LE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,Label,Description,Component Type,Codelist
0,Period,Dimension,Period,Period,period,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://reference.data.gov.uk/id/{+period},string,,^(year/[0-9]{4}|gregorian-interval/.*|month/[0...,http://reference.data.gov.uk/def/intervals/Int...,,,,
1,Sex,Dimension,Sex,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex,Sex,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-sex


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Green'>Mapped Component DataFrame for: HE - Country level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_HLE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,Label,Description,Component Type,Codelist
0,Period,Dimension,Period,Period,period,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://reference.data.gov.uk/id/{+period},string,,^(year/[0-9]{4}|gregorian-interval/.*|month/[0...,http://reference.data.gov.uk/def/intervals/Int...,,,,
1,Sex,Dimension,Sex,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex,Sex,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-sex


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Green'>Mapped Component DataFrame for: HE - Region level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_LE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,Label,Description,Component Type,Codelist
0,Period,Dimension,Period,Period,period,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://reference.data.gov.uk/id/{+period},string,,^(year/[0-9]{4}|gregorian-interval/.*|month/[0...,http://reference.data.gov.uk/def/intervals/Int...,,,,
1,Sex,Dimension,Sex,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex,Sex,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-sex


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Green'>Mapped Component DataFrame for: HE - Region level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_HLE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,Label,Description,Component Type,Codelist
0,Period,Dimension,Period,Period,period,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://reference.data.gov.uk/id/{+period},string,,^(year/[0-9]{4}|gregorian-interval/.*|month/[0...,http://reference.data.gov.uk/def/intervals/Int...,,,,
1,Sex,Dimension,Sex,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex,Sex,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-sex
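
When iterating on the mapping, it can help to reduce the un-mapped tables above to a short list of names per transform. The helper below is a hypothetical sketch: `summarise_unmapped` is not part of the notebook, and the example dictionary only imitates the structure of `dataframe_mapped_elements_collection_errors`.

```python
import pandas as pd


def summarise_unmapped(errors_collection, link_column='REFColumnsCSV Link'):
    """Map each transform name to the component names that still lack a reference match."""
    return {
        key: errors_df[link_column].tolist()
        for key, errors_df in errors_collection.items()
        if len(errors_df) > 0
    }


# Stand-in for dataframe_mapped_elements_collection_errors (illustrative only):
example_errors = {
    'HE - Country level estimates_LE': pd.DataFrame(
        {'REFColumnsCSV Link': ['Code', 'age group']}
    ),
    'HE - Region level estimates_LE': pd.DataFrame({'REFColumnsCSV Link': []}),
}

remaining = summarise_unmapped(example_errors)
all_mapped = len(remaining) == 0
```

An empty result means every component matched, which is a simple condition to check before moving on to the next stage.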


### The next code section may require iteration until all elements are mapped:

In [593]:
# Manual mapping of transformed components with the master reference data:
# Iterate Code until all components are Mapped:


for key in dataframe_elements_collection.keys(): 
    dataframe_mapped_elements_collection[key] = dataframe_elements_collection[key].copy()
    
# Target names used when matching each transformed component against the reference data:
manual_component_mapping = {
    'Code': 'ONS Geography',
    'age group': 'ONS Age Range',
    'Age group': 'ONS Age Range',
    'LE Lower CI_': 'Lower CI',
    'HLE Lower CI_': 'Lower CI',
    'LE Upper CI_': 'Upper CI',
    'HLE Upper CI_': 'Upper CI',
    'Life Expectancy (LE)_': 'Value',
    'Healthy Life Expectancy (HLE) _': 'Value',
    'TransformationType': 'Life Expectancy Estimate Type',
}
for key in dataframe_mapped_elements_collection.keys(): 
    # Assign the whole column back rather than writing through chained indexing (df[col].loc[mask] = ...),
    # which pandas may apply to a temporary copy:
    dataframe_mapped_elements_collection[key][key] = dataframe_mapped_elements_collection[key][key].replace(manual_component_mapping)
    
    
for key in dataframe_mapped_elements_collection.keys():     
    for cols in dataframe_mapped_elements_collection[key][key]:
        idx_temp = dataframe_mapped_elements_collection[key].index[dataframe_mapped_elements_collection[key][key] == cols]
        dataframe_mapped_elements_collection[key].loc[idx_temp, 'REFColumnsCSV Link'] = cols #slug.slug(cols) Not matched on Slugilzed as first believed!


for key in dataframe_elements_collection.keys():
    df_collection_point, df_collection_point_errors = align_REFdata_with_Transform(dataframe_mapped_elements_collection[key], df_ref_repo_columns, df_ref_repo_components)
    dataframe_mapped_elements_collection[key] = df_collection_point
    dataframe_mapped_elements_collection_errors[key] = df_collection_point_errors

mapped_error_flag = True
for key in dataframe_mapped_elements_collection_errors.keys():
    # Report each table that still has unmatched references (once per table):
    if len(dataframe_mapped_elements_collection_errors[key]) > 0:
        mapped_error_flag = False
        printmd("\n" + "="*115, colour='Red')
        printmd('**WARNING - PLEASE ADDRESS NON-MATCHING REFERENCES!**', colour='Red')
        printmd("="*115, colour='Red')
        printmd("\n" + "="*100, colour='Grey')
        printmd('UN-Mapped Component DataFrame for: ' + key + '.', colour='Red')
        printmd("="*100, colour='Grey')
        #print(dataframe_mapped_elements_collection[key]) #Print like this for Logs...  
        display(dataframe_mapped_elements_collection_errors[key])
if mapped_error_flag:
    printmd("\n" + "="*115, colour='Green')
    printmd('**ALL Data Entities Mapped! Safe to Proceed!**', colour='Green')
    printmd("="*115, colour='Green')


printmd("\n" + "="*115, colour='Grey')
printmd('**Please use Caution!:**' + ' Displayed below are the current components as mapped and linked through code.', colour='Green')
printmd('*Check carefully before proceeding.*', colour='Green')
printmd("\n" + "="*115, colour='Grey')
for key in dataframe_mapped_elements_collection.keys():
    printmd("\n" + "="*100, colour='Grey')
    printmd('Mapped Component DataFrame for: ' + key + '.', colour='Blue')
    printmd("="*100, colour='Grey')
    #print(dataframe_mapped_elements_collection[key]) #Print like this for Logs...  
    display(dataframe_mapped_elements_collection[key])


<span style='color:Green'>
===================================================================================================================</span>

<span style='color:Green'>**ALL Data Entities Mapped! Safe to Proceed!**</span>

<span style='color:Green'>===================================================================================================================</span>

<span style='color:Grey'>
===================================================================================================================</span>

<span style='color:Green'>**Please use Caution!:** Displayed below are the current components as mapped and linked through code.</span>

<span style='color:Green'>*Check carefully before proceeding.*</span>

<span style='color:Grey'>
===================================================================================================================</span>

<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Mapped Component DataFrame for: HE - Country level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_LE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,Label,Description,Component Type,Codelist
0,Period,Dimension,Period,Period,period,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://reference.data.gov.uk/id/{+period},string,,^(year/[0-9]{4}|gregorian-interval/.*|month/[0...,http://reference.data.gov.uk/def/intervals/Int...,,,,
1,ONS Geography,Dimension,ONS Geography,ONS Geography,ons_geography,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://statistics.data.gov.uk/id/statistical-g...,string,,[A-Z][0-9]{8},,,,,
2,Sex,Dimension,Sex,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex,Sex,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-sex
3,ONS Age Range,Dimension,ONS Age Range,ONS Age Range,ons_age_range,qb:dimension,http://gss-data.org.uk/def/dimension/ons-age-r...,http://gss-data.org.uk/def/concept/ons-age-ran...,string,slugize,,http://gss-data.org.uk/def/classes/ons-age-ran...,ONS Age Range,,Dimension,http://gss-data.org.uk/def/concept-scheme/ons-...
4,Value,Measure,Value,Value,value,,http://gss-data.org.uk/def/measure/{measure_type},,number,,,,,,,
5,Lower CI,Observation,Lower CI,Lower CI,lower_ci,qb:attribute,http://gss-data.org.uk/def/attribute/lower-ci,,number,,,,,,,
6,Upper CI,Observation,Upper CI,Upper CI,upper_ci,qb:attribute,http://gss-data.org.uk/def/attribute/upper-ci,,number,,,,,,,
7,Life Expectancy Estimate Type,Dimension,Life Expectancy Estimate Type,Life Expectancy Estimate Type,life_expectancy_estimate_type,qb:dimension,http://gss-data.org.uk/def/dimension/life-expe...,http://gss-data.org.uk/def/concept/life-expect...,string,slugize,,http://gss-data.org.uk/def/classes/life-expect...,Life Expectancy Estimate Type,,Dimension,http://gss-data.org.uk/def/concept-scheme/life...


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Mapped Component DataFrame for: HE - Country level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Country level estimates_HLE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,Label,Description,Component Type,Codelist
0,Period,Dimension,Period,Period,period,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://reference.data.gov.uk/id/{+period},string,,^(year/[0-9]{4}|gregorian-interval/.*|month/[0...,http://reference.data.gov.uk/def/intervals/Int...,,,,
1,ONS Geography,Dimension,ONS Geography,ONS Geography,ons_geography,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://statistics.data.gov.uk/id/statistical-g...,string,,[A-Z][0-9]{8},,,,,
2,Sex,Dimension,Sex,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex,Sex,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-sex
3,ONS Age Range,Dimension,ONS Age Range,ONS Age Range,ons_age_range,qb:dimension,http://gss-data.org.uk/def/dimension/ons-age-r...,http://gss-data.org.uk/def/concept/ons-age-ran...,string,slugize,,http://gss-data.org.uk/def/classes/ons-age-ran...,ONS Age Range,,Dimension,http://gss-data.org.uk/def/concept-scheme/ons-...
4,Value,Measure,Value,Value,value,,http://gss-data.org.uk/def/measure/{measure_type},,number,,,,,,,
5,Lower CI,Observation,Lower CI,Lower CI,lower_ci,qb:attribute,http://gss-data.org.uk/def/attribute/lower-ci,,number,,,,,,,
6,Upper CI,Observation,Upper CI,Upper CI,upper_ci,qb:attribute,http://gss-data.org.uk/def/attribute/upper-ci,,number,,,,,,,
7,Life Expectancy Estimate Type,Dimension,Life Expectancy Estimate Type,Life Expectancy Estimate Type,life_expectancy_estimate_type,qb:dimension,http://gss-data.org.uk/def/dimension/life-expe...,http://gss-data.org.uk/def/concept/life-expect...,string,slugize,,http://gss-data.org.uk/def/classes/life-expect...,Life Expectancy Estimate Type,,Dimension,http://gss-data.org.uk/def/concept-scheme/life...


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Mapped Component DataFrame for: HE - Region level estimates_LE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_LE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,Label,Description,Component Type,Codelist
0,Period,Dimension,Period,Period,period,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://reference.data.gov.uk/id/{+period},string,,^(year/[0-9]{4}|gregorian-interval/.*|month/[0...,http://reference.data.gov.uk/def/intervals/Int...,,,,
1,ONS Geography,Dimension,ONS Geography,ONS Geography,ons_geography,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://statistics.data.gov.uk/id/statistical-g...,string,,[A-Z][0-9]{8},,,,,
2,Sex,Dimension,Sex,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex,Sex,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-sex
3,ONS Age Range,Dimension,ONS Age Range,ONS Age Range,ons_age_range,qb:dimension,http://gss-data.org.uk/def/dimension/ons-age-r...,http://gss-data.org.uk/def/concept/ons-age-ran...,string,slugize,,http://gss-data.org.uk/def/classes/ons-age-ran...,ONS Age Range,,Dimension,http://gss-data.org.uk/def/concept-scheme/ons-...
4,Value,Measure,Value,Value,value,,http://gss-data.org.uk/def/measure/{measure_type},,number,,,,,,,
5,Lower CI,Observation,Lower CI,Lower CI,lower_ci,qb:attribute,http://gss-data.org.uk/def/attribute/lower-ci,,number,,,,,,,
6,Upper CI,Observation,Upper CI,Upper CI,upper_ci,qb:attribute,http://gss-data.org.uk/def/attribute/upper-ci,,number,,,,,,,
7,Life Expectancy Estimate Type,Dimension,Life Expectancy Estimate Type,Life Expectancy Estimate Type,life_expectancy_estimate_type,qb:dimension,http://gss-data.org.uk/def/dimension/life-expe...,http://gss-data.org.uk/def/concept/life-expect...,string,slugize,,http://gss-data.org.uk/def/classes/life-expect...,Life Expectancy Estimate Type,,Dimension,http://gss-data.org.uk/def/concept-scheme/life...


<span style='color:Grey'>
====================================================================================================</span>

<span style='color:Blue'>Mapped Component DataFrame for: HE - Region level estimates_HLE.</span>

<span style='color:Grey'>====================================================================================================</span>

Unnamed: 0,HE - Region level estimates_HLE,Entry Type,REFColumnsCSV Link,title,name,component_attachment,property_template,value_template,datatype,value_transformation,regex,range,Label,Description,Component Type,Codelist
0,Period,Dimension,Period,Period,period,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://reference.data.gov.uk/id/{+period},string,,^(year/[0-9]{4}|gregorian-interval/.*|month/[0...,http://reference.data.gov.uk/def/intervals/Int...,,,,
1,ONS Geography,Dimension,ONS Geography,ONS Geography,ons_geography,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://statistics.data.gov.uk/id/statistical-g...,string,,[A-Z][0-9]{8},,,,,
2,Sex,Dimension,Sex,Sex,sex,qb:dimension,http://purl.org/linked-data/sdmx/2009/dimensio...,http://purl.org/linked-data/sdmx/2009/code#sex...,string,,^(M|F|T|U|N)$,http://purl.org/linked-data/sdmx/2009/code#Sex,Sex,,Dimension,http://gss-data.org.uk/def/concept-scheme/phe-sex
3,ONS Age Range,Dimension,ONS Age Range,ONS Age Range,ons_age_range,qb:dimension,http://gss-data.org.uk/def/dimension/ons-age-r...,http://gss-data.org.uk/def/concept/ons-age-ran...,string,slugize,,http://gss-data.org.uk/def/classes/ons-age-ran...,ONS Age Range,,Dimension,http://gss-data.org.uk/def/concept-scheme/ons-...
4,Value,Measure,Value,Value,value,,http://gss-data.org.uk/def/measure/{measure_type},,number,,,,,,,
5,Lower CI,Observation,Lower CI,Lower CI,lower_ci,qb:attribute,http://gss-data.org.uk/def/attribute/lower-ci,,number,,,,,,,
6,Upper CI,Observation,Upper CI,Upper CI,upper_ci,qb:attribute,http://gss-data.org.uk/def/attribute/upper-ci,,number,,,,,,,
7,Life Expectancy Estimate Type,Dimension,Life Expectancy Estimate Type,Life Expectancy Estimate Type,life_expectancy_estimate_type,qb:dimension,http://gss-data.org.uk/def/dimension/life-expe...,http://gss-data.org.uk/def/concept/life-expect...,string,slugize,,http://gss-data.org.uk/def/classes/life-expect...,Life Expectancy Estimate Type,,Dimension,http://gss-data.org.uk/def/concept-scheme/life...
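
Before re-running the mapping cell, it can help to list which transformed column names still have no counterpart in the reference data. The following is a minimal sketch using plain Python collections only; `find_unmapped`, `manual_mapping` and `reference_titles` are hypothetical stand-ins for the notebook's DataFrames and are not part of the notebook itself.

```python
def find_unmapped(column_names, manual_mapping, reference_titles):
    """Return the column names that have no reference title after mapping."""
    unmapped = []
    for name in column_names:
        # Fall back to the raw name when no manual mapping has been defined
        mapped = manual_mapping.get(name, name)
        if mapped not in reference_titles:
            unmapped.append(name)
    return unmapped


# Hypothetical sample values modelled on the tables above
manual_mapping = {'Code': 'ONS Geography', 'age group': 'ONS Age Range'}
reference_titles = {'Period', 'ONS Geography', 'Sex', 'ONS Age Range', 'Value'}
columns = ['Period', 'Code', 'Sex', 'age group', 'LE Lower CI_']

print(find_unmapped(columns, manual_mapping, reference_titles))
```

Any name this returns would need a new entry in the manual mapping before the merge against the reference data can succeed for it.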


### Stage 7: Creating the Meta-Data files (Reference Data):

**This next section is currently hacked together using string parsing. It still requires proper integration with the scraper component for the metadata.**

In [745]:
# We'll just have a string builder (string append) in the absence of a 'proper' component for the moment:

json_metadata_string = ('{ \n"@context": "http://www.w3.org/ns/csvw", ') # {"@language": "en"}') # \n],')

# Apologies for this hack: stringify the property metadata, normalise the quotes, then split on
# double quotes so that every odd-numbered fragment is a quoted value (a property name or a URI):
str_temp_metadata = str(distribution._properties_metadata)
str_temp_metadata = str_temp_metadata.replace("\'", "\"")
str_temp_metadata_pt2 = str_temp_metadata.split('"')

str_parsed_meta = []
for i in range(len(str_temp_metadata_pt2)):
    if i % 2 != 0:
        str_parsed_meta.append(str_temp_metadata_pt2[i])

# Pair each property name with the URI that immediately follows it (stop one short to avoid an IndexError):
str_parsed_meta_links = {}
for i in range(len(str_parsed_meta) - 1):
    if (str_parsed_meta[i][:7] != 'http://') and (str_parsed_meta[i+1][:7] == 'http://'):
        str_parsed_meta_links[str_parsed_meta[i]] = str_parsed_meta[i+1]

# Append one '"name": "URI",' line per property:
hacked_string_baby = ''
for key in str_parsed_meta_links.keys():
    #printmd('**' + key + ':**', colour='Red')
    #printmd('*---> '+ str_parsed_meta_links[key] + '.*', colour='Blue')
    hacked_string_baby = hacked_string_baby + ('\n"' + str(key).rstrip() + '": ' + '"' + str(str_parsed_meta_links[key]).rstrip() + '", ').rstrip()

# The url value is still a placeholder; it should come from the key of the table being written:
hacked_string_baby = hacked_string_baby + '\n"url": "' + 'INSERT KEY FROM LOOP IDIOT' + '",'
hacked_string_baby = hacked_string_baby + '\n"tableSchema": { \n"columns": ['

json_metadata_string = json_metadata_string + hacked_string_baby
print(json_metadata_string)


{ 
"@context": "http://www.w3.org/ns/csvw", 
"label": "http://www.w3.org/2000/01/rdf-schema#label",
"comment": "http://www.w3.org/2000/01/rdf-schema#comment",
"title": "http://purl.org/dc/terms/title",
"description": "http://purl.org/dc/terms/description",
"issued": "http://purl.org/dc/terms/issued",
"modified": "http://purl.org/dc/terms/modified",
"license": "http://purl.org/dc/terms/license",
"rights": "http://purl.org/dc/terms/rights",
"accessURL": "http://www.w3.org/ns/dcat#accessURL",
"downloadURL": "http://www.w3.org/ns/dcat#downloadURL",
"mediaType": "http://www.w3.org/ns/dcat#mediaType",
"byteSize": "http://www.w3.org/ns/dcat#byteSize",
"checksum": "http://spdx.org/rdf/terms#checksum",
"language": "http://purl.org/dc/terms/language",
"url": "INSERT KEY FROM LOOP IDIOT",
"tableSchema": { 
"columns": [


In [746]:
# Append one column description per mapped component (hard-coded to a single table for now):
scary_movie_string = ''

df_columns = dataframe_mapped_elements_collection['HE - Country level estimates_LE']
for idx in range(len(df_columns)):
    scary_movie_string = scary_movie_string + ('\n{ \n"title": "' \
                        + str(df_columns['title'][idx]).rstrip() \
                        + '",')
    scary_movie_string = scary_movie_string + ('\n"name": "' \
                        + str(df_columns['name'][idx]).rstrip() \
                        + '",')
    # Only components with a regex in the reference data get a datatype format:
    if pd.notna(df_columns['regex'][idx]):
        scary_movie_string = scary_movie_string + ('\n"datatype": {"format": "' \
                            + str(df_columns['regex'][idx]).rstrip() \
                            + '"},')
    scary_movie_string = scary_movie_string + ('\n"required": true \n},')

# Drop the trailing comma, then close the columns array, the tableSchema object and the top-level object:
json_metadata_string = json_metadata_string + scary_movie_string[:-1]
json_metadata_string = json_metadata_string + ('\n] \n} \n}')

print(json_metadata_string)


{ 
"@context": "http://www.w3.org/ns/csvw", 
"label": "http://www.w3.org/2000/01/rdf-schema#label",
"comment": "http://www.w3.org/2000/01/rdf-schema#comment",
"title": "http://purl.org/dc/terms/title",
"description": "http://purl.org/dc/terms/description",
"issued": "http://purl.org/dc/terms/issued",
"modified": "http://purl.org/dc/terms/modified",
"license": "http://purl.org/dc/terms/license",
"rights": "http://purl.org/dc/terms/rights",
"accessURL": "http://www.w3.org/ns/dcat#accessURL",
"downloadURL": "http://www.w3.org/ns/dcat#downloadURL",
"mediaType": "http://www.w3.org/ns/dcat#mediaType",
"byteSize": "http://www.w3.org/ns/dcat#byteSize",
"checksum": "http://spdx.org/rdf/terms#checksum",
"language": "http://purl.org/dc/terms/language",
"url": "INSERT KEY FROM LOOP IDIOT",
"tableSchema": { 
"columns": [
{ 
"title": "Period",
"name": "period",
"datatype": {"format": "^(year/[0-9]{4}|gregorian-interval/.*|month/[0-9]{4}-[0-9]{2}|day/[0-9]{4}-[0-9]{2}-[0-9]{2}|quarter/[0-9]{4}-Q[1-4]|gov

In [748]:
# Drop JSON to disc (the with-block closes the file even if the write fails):
with open(slug.slug("HE - Country level estimates_LE--metadata") + '.json', "w") as write_to_file_JSONmetadata:
    write_to_file_JSONmetadata.write(json_metadata_string)
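
As a possible alternative to appending strings, the same document could be assembled as a Python dictionary and serialised with the standard `json` module, which takes care of quoting, escaping and bracket balancing. This is only a sketch: `build_csvw_metadata` and the sample `columns` list are hypothetical, and the keys simply mirror the ones the cells above emit rather than a validated CSVW schema.

```python
import json


def build_csvw_metadata(csv_url, columns):
    """Build a CSVW-style metadata document as a dict.

    `columns` is a list of dicts with 'title', 'name' and an optional 'regex'.
    """
    column_entries = []
    for col in columns:
        entry = {"title": col["title"], "name": col["name"]}
        # Only add a datatype format when a regex is available
        if col.get("regex"):
            entry["datatype"] = {"format": col["regex"]}
        entry["required"] = True
        column_entries.append(entry)
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {"columns": column_entries},
    }


# Hypothetical sample rows modelled on the mapped component tables
columns = [
    {"title": "Sex", "name": "sex", "regex": "^(M|F|T|U|N)$"},
    {"title": "Value", "name": "value", "regex": None},
]
metadata = build_csvw_metadata("he-country-level-estimates_le.csv", columns)
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Because the structure is built before it is serialised, a missing quote or an unclosed bracket cannot occur, and regex values containing backslashes or quotes are escaped automatically.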

#dataframe_collection[key] # Already output to csv...


In [763]:
test = open('/Users/martyn/Python Notebook Experiments/he-country-level-estimates_le.csv', "r")
print(test)
test.read()


<_io.TextIOWrapper name='/Users/martyn/Python Notebook Experiments/he-country-level-estimates_le.csv' mode='r' encoding='UTF-8'>


'Period,Code,Sex,age group,Life Expectancy (LE)_,LE Lower CI_,LE Upper CI_,TransformationType\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,lessthan1,78.78073,78.75026,78.8112,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,01-04,78.17075,78.14207,78.19942,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,05-09,74.22782,74.19942,74.25623,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,10-14,69.26178,69.23354,69.29002,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,15-19,64.29632,64.26823,64.3244,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,20-24,59.40272,59.37503,59.43041,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,25-29,54.5599,54.53272,54.58708,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,30-34,49.72706,49.70037,49.75375,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,35-39,44.92366,44.89751,44.9498,LE\ngregorian-interval/2009-03-31T00:00:00/P3Y,E92000001,M,40-44,40.18422,40.1586

In [768]:
myThing = CSVWConverter.to_rdf('File:///Users/martyn/Python Notebook Experiments/he-country-level-estimates_le.csv', format='ttl')


InvalidSchema: No connection adapters were found for 'File:///Users/martyn/Python Notebook Experiments/he-country-level-estimates_le.csv'
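
The `InvalidSchema` error above comes from the `requests` library, which has no connection adapter for `file://` URLs by default. One possible workaround is to serve the working directory over HTTP on the local machine and pass an `http://` URL instead. The sketch below uses only the standard library; `serve_directory` is a hypothetical helper, and whether `CSVWConverter.to_rdf` then handles the served file and its metadata as hoped is an untested assumption.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def serve_directory(directory):
    """Serve `directory` over HTTP on a free local port; return (server, base_url)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Port 0 asks the operating system for any free port
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:{}/".format(server.server_address[1])


# Demonstration with a throwaway directory and a two-line CSV
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example.csv").write_bytes(b"Period,Sex\nyear/2009,M\n")
    server, base_url = serve_directory(tmp)
    try:
        # Bypass any configured proxy, since the server is local
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(base_url + "example.csv") as response:
            fetched = response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()

print(fetched)
```

In the notebook, the directory would be the one holding the CSV and its metadata JSON, and the resulting `base_url` plus the file name would replace the `File:///` path.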

In [754]:
myThing = CSVWConverter.to_rdf('http://w3c.github.io/csvw/tests/tree-ops.csv', format='ttl')
#CSVWConverter.to_rdf('he-country-level-estimates_le.csv', metadata_url='he-country-level-estimates_le-metadata.json', format='ttl')

In [755]:
myThing

'@prefix : <http://w3c.github.io/csvw/tests/tree-ops.csv#> .\n@prefix as: <https://www.w3.org/ns/activitystreams#> .\n@prefix cc: <http://creativecommons.org/ns#> .\n@prefix csvw: <http://www.w3.org/ns/csvw#> .\n@prefix ctag: <http://commontag.org/ns#> .\n@prefix dc: <http://purl.org/dc/terms/> .\n@prefix dc11: <http://purl.org/dc/elements/1.1/> .\n@prefix dcat: <http://www.w3.org/ns/dcat#> .\n@prefix dcterms: <http://purl.org/dc/terms/> .\n@prefix dqv: <http://www.w3.org/ns/dqv#> .\n@prefix duv: <https://www.w3.org/TR/vocab-duv#> .\n@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n@prefix gr: <http://purl.org/goodrelations/v1#> .\n@prefix grddl: <http://www.w3.org/2003/g/data-view#> .\n@prefix ical: <http://www.w3.org/2002/12/cal/icaltzd#> .\n@prefix ldp: <http://www.w3.org/ns/ldp#> .\n@prefix ma: <http://www.w3.org/ns/ma-ont#> .\n@prefix ns1: <http://w3c.github.io/csvw/tests/tree-ops.csv#Inventory%20> .\n@prefix ns2: <http://w3c.github.io/csvw/tests/tree-ops.csv#Trim%20> .\n@prefix ns3:

In [661]:
# Stringify the property metadata, normalise the quotes, then split on double quotes:
str_temp_metadata = str(distribution._properties_metadata)
str_temp_metadata = str_temp_metadata.replace("\'", "\"")
str_temp_metadata_pt2 = str_temp_metadata.split('"')

str_parsed_meta = []
for i in range(len(str_temp_metadata_pt2)):
    if i % 2 != 0:
        str_parsed_meta.append(str_temp_metadata_pt2[i])

# Pair each property name with the URI that immediately follows it:
str_parsed_meta_links = {}
for i in range(len(str_parsed_meta) - 1):
    if (str_parsed_meta[i][:7] != 'http://') and (str_parsed_meta[i+1][:7] == 'http://'):
        str_parsed_meta_links[str_parsed_meta[i]] = str_parsed_meta[i+1]
 
hacked_string_baby = ''
for key in str_parsed_meta_links.keys():
    printmd('**' + key + ':**', colour='Red')
    printmd('*---> '+ str_parsed_meta_links[key] + '.*', colour='Blue')
    hacked_string_baby = hacked_string_baby + ('"' + str(key) + '": ' + '"' + str(str_parsed_meta_links[key]) + '",')

hacked_string_baby = hacked_string_baby + '"url": "' + 'INSERT KEY FROM LOOP IDIOT' + '",'
print(hacked_string_baby)

<span style='color:Red'>**label:**</span>

<span style='color:Blue'>*---> http://www.w3.org/2000/01/rdf-schema#label.*</span>

<span style='color:Red'>**comment:**</span>

<span style='color:Blue'>*---> http://www.w3.org/2000/01/rdf-schema#comment.*</span>

<span style='color:Red'>**title:**</span>

<span style='color:Blue'>*---> http://purl.org/dc/terms/title.*</span>

<span style='color:Red'>**description:**</span>

<span style='color:Blue'>*---> http://purl.org/dc/terms/description.*</span>

<span style='color:Red'>**issued:**</span>

<span style='color:Blue'>*---> http://purl.org/dc/terms/issued.*</span>

<span style='color:Red'>**modified:**</span>

<span style='color:Blue'>*---> http://purl.org/dc/terms/modified.*</span>

<span style='color:Red'>**license:**</span>

<span style='color:Blue'>*---> http://purl.org/dc/terms/license.*</span>

<span style='color:Red'>**rights:**</span>

<span style='color:Blue'>*---> http://purl.org/dc/terms/rights.*</span>

<span style='color:Red'>**accessURL:**</span>

<span style='color:Blue'>*---> http://www.w3.org/ns/dcat#accessURL.*</span>

<span style='color:Red'>**downloadURL:**</span>

<span style='color:Blue'>*---> http://www.w3.org/ns/dcat#downloadURL.*</span>

<span style='color:Red'>**mediaType:**</span>

<span style='color:Blue'>*---> http://www.w3.org/ns/dcat#mediaType.*</span>

<span style='color:Red'>**byteSize:**</span>

<span style='color:Blue'>*---> http://www.w3.org/ns/dcat#byteSize.*</span>

<span style='color:Red'>**checksum:**</span>

<span style='color:Blue'>*---> http://spdx.org/rdf/terms#checksum.*</span>

<span style='color:Red'>**language:**</span>

<span style='color:Blue'>*---> http://purl.org/dc/terms/language.*</span>

"label": "http://www.w3.org/2000/01/rdf-schema#label","comment": "http://www.w3.org/2000/01/rdf-schema#comment","title": "http://purl.org/dc/terms/title","description": "http://purl.org/dc/terms/description","issued": "http://purl.org/dc/terms/issued","modified": "http://purl.org/dc/terms/modified","license": "http://purl.org/dc/terms/license","rights": "http://purl.org/dc/terms/rights","accessURL": "http://www.w3.org/ns/dcat#accessURL","downloadURL": "http://www.w3.org/ns/dcat#downloadURL","mediaType": "http://www.w3.org/ns/dcat#mediaType","byteSize": "http://www.w3.org/ns/dcat#byteSize","checksum": "http://spdx.org/rdf/terms#checksum","language": "http://purl.org/dc/terms/language","url": "INSERT KEY FROM LOOP IDIOT",


In [622]:
# Example metadata from https://www.w3.org/TR/tabular-metadata/, held as a Python dict so the cell runs:
csvw_example_metadata = {
  "@context": "http://www.w3.org/ns/csvw",
  "tables": [{
    "url": "http://example.org/countries.csv",
    "tableSchema": {
      "columns": [{
        "name": "countryCode",
        "datatype": "string",
        "propertyUrl": "http://www.geonames.org/ontology{#_name}"
      }, {
        "name": "latitude",
        "datatype": "number"
      }, {
        "name": "longitude",
        "datatype": "number"
      }, {
        "name": "name",
        "datatype": "string"
      }],
      "aboutUrl": "http://example.org/countries.csv{#countryCode}",
      "propertyUrl": "http://schema.org/{_name}",
      "primaryKey": "countryCode"
    }
  }, {
    "url": "http://example.org/country_slice.csv",
    "tableSchema": {
      "columns": [{
        "name": "countryRef",
        "valueUrl": "http://example.org/countries.csv{#countryRef}"
      }, {
        "name": "year",
        "datatype": "gYear"
      }, {
        "name": "population",
        "datatype": "integer"
      }],
      "foreignKeys": [{
        "columnReference": "countryRef",
        "reference": {
          "resource": "http://example.org/countries.csv",
          "columnReference": "countryCode"
        }
      }]
    }
  }]
}

In [454]:
type(distribution)

gssutils.metadata.Distribution

In [447]:
display(distribution)

In [452]:
print(distribution.title + ' --- ' + distribution.description + ' --- ' + str(distribution.issued))

Health state life expectancy, all ages, UK --- Pivot tables for health state life expectancy by sex and area type, divided by two-year intervals starting from 2009 to 2011. --- 2016-11-29


In [453]:
print(distribution.downloadURL + ' --- ' + distribution.mediaType)

https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/healthandlifeexpectancies/datasets/healthstatelifeexpectancyallagesuk/current/heestimates.xlsx --- application/vnd.ms-excel


In [493]:
#print(distribution.DCAT.Catalog)
display(scraper.catalog._properties_metadata)

{'label': (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Metadata.<lambda>(s)>),
 'comment': (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#comment'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Metadata.<lambda>(s)>),
 'title': (rdflib.term.URIRef('http://purl.org/dc/terms/title'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Catalog.<lambda>(s)>),
 'description': (rdflib.term.URIRef('http://purl.org/dc/terms/description'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Catalog.<lambda>(s)>),
 'issued': (rdflib.term.URIRef('http://purl.org/dc/terms/issued'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Catalog.<lambda>(d)>),
 'modified': (rdflib.term.URIRef('http://purl.org/dc/terms/modified'),
  <Status.recommended: 2>,
  <function gssutils.metadata.Catalog.<lambda>(d)>),
 'language': (rdflib.term.URIRef('http://purl.org/dc/terms/language'),
  <Status.mandatory

In [495]:
display(distribution._properties_metadata)

{'label': (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Metadata.<lambda>(s)>),
 'comment': (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#comment'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Metadata.<lambda>(s)>),
 'title': (rdflib.term.URIRef('http://purl.org/dc/terms/title'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Distribution.<lambda>(s)>),
 'description': (rdflib.term.URIRef('http://purl.org/dc/terms/description'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Distribution.<lambda>(s)>),
 'issued': (rdflib.term.URIRef('http://purl.org/dc/terms/issued'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Distribution.<lambda>(d)>),
 'modified': (rdflib.term.URIRef('http://purl.org/dc/terms/modified'),
  <Status.recommended: 2>,
  <function gssutils.metadata.Distribution.<lambda>(d)>),
 'license': (rdflib.term.URIRef('http://purl.org/dc/terms/license'),
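
The display above suggests that each entry of `_properties_metadata` is a tuple whose first item is the property URI. If that holds, the name-to-URI mapping could be read directly from the dictionary rather than recovered by splitting its string form on quotes. A minimal sketch under that assumption follows; `property_uris` is a hypothetical helper, and the sample dictionary uses plain strings in place of the `rdflib` and `gssutils` objects so that it runs on its own.

```python
def property_uris(properties_metadata):
    """Map each property name to the string form of its URI (the first tuple item)."""
    return {name: str(entry[0]) for name, entry in properties_metadata.items()}


# Stand-in for distribution._properties_metadata: plain strings replace the
# URIRef, Status and lambda objects shown in the notebook output.
sample_properties = {
    'label': ('http://www.w3.org/2000/01/rdf-schema#label', 'mandatory', None),
    'title': ('http://purl.org/dc/terms/title', 'mandatory', None),
}

links = property_uris(sample_properties)
print(links)
```

Since `rdflib.term.URIRef` is a subclass of `str`, calling `str()` on the first tuple item should give the bare URI without any further parsing.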
 

In [503]:
display(scraper.dataset)

In [527]:
display(scraper.dataset.label)

'Health state life expectancy, all ages, UK'

In [521]:
# some JSON:
x =  '{ "name":"John", "age":30, "city":"New York"}'

# parse x:
y = json.loads(x)

# the result is a Python dictionary:
print(y["name"])

John
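
The same `json.loads` call can double as a quick sanity check on a hand-built metadata string before it is written to disc, since it raises `json.JSONDecodeError` on malformed input. The sketch below is illustrative only: `check_json` is a hypothetical helper and the two sample strings are made up for the demonstration.

```python
import json


def check_json(text):
    """Return (True, None) if `text` parses as JSON, else (False, error message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        message = "line {}, column {}: {}".format(error.lineno, error.colno, error.msg)
        return False, message
    return True, None


good = '{"title": "Period", "name": "period"}'
bad = '{"title": Period", "name": "period"}'  # value is missing its opening quote

print(check_json(good))
print(check_json(bad))
```

Running a check like this on `json_metadata_string` would flag a missing quote or an unclosed bracket at the point where the string is built.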


In [583]:
mytext = str(distribution._properties_metadata)

mytext = mytext.replace("\'", "\"")

pprint(mytext)

mytext2 = mytext.split('"')

str_parsed_meta = []
for i in range(len(mytext2)):
    if i % 2 != 0: 
        print(str(i) + " - " + mytext2[i])
        str_parsed_meta.append(mytext2[i])

print(str_parsed_meta)

#print(mytext2[1])
#print(mytext2[3])
#print(mytext2[5])
#print(mytext2[7])
#y = json.loads(mytext)

# the result is a Python dictionary:
#print(y["name"])

('{"label": (rdflib.term.URIRef("http://www.w3.org/2000/01/rdf-schema#label"), '
 '<Status.mandatory: 1>, <function Metadata.<lambda> at 0x114d0e378>), '
 '"comment": '
 '(rdflib.term.URIRef("http://www.w3.org/2000/01/rdf-schema#comment"), '
 '<Status.mandatory: 1>, <function Metadata.<lambda> at 0x114d0e400>), '
 '"title": (rdflib.term.URIRef("http://purl.org/dc/terms/title"), '
 '<Status.mandatory: 1>, <function Distribution.<lambda> at 0x114cca730>), '
 '"description": (rdflib.term.URIRef("http://purl.org/dc/terms/description"), '
 '<Status.mandatory: 1>, <function Distribution.<lambda> at 0x114cca7b8>), '
 '"issued": (rdflib.term.URIRef("http://purl.org/dc/terms/issued"), '
 '<Status.mandatory: 1>, <function Distribution.<lambda> at 0x114cca840>), '
 '"modified": (rdflib.term.URIRef("http://purl.org/dc/terms/modified"), '
 '<Status.recommended: 2>, <function Distribution.<lambda> at 0x114cca8c8>), '
 '"license": (rdflib.term.URIRef("http://purl.org/dc/terms/license"), '
 '<Status.m
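
The quote-splitting above keeps names and URIs in one flat list. As a minimal sketch of an alternative, a regular expression can pair each property name with its URI and status directly. The `sample` string is a shortened stand-in for `str(distribution._properties_metadata)`, and `ENTRY` and `parse_properties_repr` are hypothetical names introduced for illustration. Because the display output shows a dictionary, iterating over it directly would probably avoid string parsing altogether.

```python
import re

# Stand-in for str(distribution._properties_metadata), shortened to two entries.
sample = (
    "{'label': (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'), "
    "<Status.mandatory: 1>, <function Metadata.<lambda> at 0x114d0e378>), "
    "'modified': (rdflib.term.URIRef('http://purl.org/dc/terms/modified'), "
    "<Status.recommended: 2>, <function Distribution.<lambda> at 0x114cca8c8>)}"
)

# One match per entry: property name, URI, and status name.
ENTRY = re.compile(
    r"'(?P<name>\w+)': \(rdflib\.term\.URIRef\('(?P<uri>[^']+)'\), "
    r"<Status\.(?P<status>\w+): \d+>"
)

def parse_properties_repr(text):
    """Map each property name to a (uri, status) tuple (hypothetical helper)."""
    return {m["name"]: (m["uri"], m["status"]) for m in ENTRY.finditer(text)}

parsed = parse_properties_repr(sample)
for name, (uri, status) in parsed.items():
    print(name, uri, status)
```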

In [534]:
display(distribution._properties_metadata)


{'label': (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Metadata.<lambda>(s)>),
 'comment': (rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#comment'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Metadata.<lambda>(s)>),
 'title': (rdflib.term.URIRef('http://purl.org/dc/terms/title'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Distribution.<lambda>(s)>),
 'description': (rdflib.term.URIRef('http://purl.org/dc/terms/description'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Distribution.<lambda>(s)>),
 'issued': (rdflib.term.URIRef('http://purl.org/dc/terms/issued'),
  <Status.mandatory: 1>,
  <function gssutils.metadata.Distribution.<lambda>(d)>),
 'modified': (rdflib.term.URIRef('http://purl.org/dc/terms/modified'),
  <Status.recommended: 2>,
  <function gssutils.metadata.Distribution.<lambda>(d)>),
 'license': (rdflib.term.URIRef('http://purl.org/dc/terms/license'),
 

In [580]:
from rdflib import Namespace

n = Namespace("http://example.org/")
n.Person # as attribute
# = rdflib.term.URIRef(u'http://example.org/Person')

#n['first%20name'] # as item - for things that are not valid python identifiers
# = rdflib.term.URIRef(u'http://example.org/first%20name')

rdflib.term.URIRef('http://example.org/Person')

In [582]:
from rdflib import Namespace

n = Namespace("http://www.w3.org/2000/01/rdf-schema#label/")
n.Label # as attribute
# = rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label/Label')
# Note: this is not the rdfs:label property URI, because the base already ends in "#label/".
# rdflib ships the RDFS vocabulary, so the property itself is available as rdflib.RDFS.label.

rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#label/Label')

In [None]:
{
  "@context": [
    "http://www.w3.org/ns/csvw",
    {
      "@language": "en"
    }
  ],
  "tables": [
    {
      "url": "https://gss-cogs.github.io/family-disability/reference/codelists/ons-age-range.csv",
      "tableSchema": "https://gss-cogs.github.io/ref_common/codelist-schema.json",
      "suppressOutput": true
    },
    {
      "url": "https://gss-cogs.github.io/family-disability/reference/codelists/life-expectancy-estimate-type.csv",
      "tableSchema": "https://gss-cogs.github.io/ref_common/codelist-schema.json",
      "suppressOutput": true
    },
    {
      "url": "he-country-level-estimates.csv",
      "tableSchema": {
        "columns": [
          {
            "titles": "Period",
            "required": true,
            "name": "period",
            "datatype": {
              "format": "^(year/[0-9]{4}|gregorian-interval/.*|month/[0-9]{4}-[0-9]{2}|day/[0-9]{4}-[0-9]{2}-[0-9]{2}|quarter/[0-9]{4}-Q[1-4]|government-year/[0-9]{4}-[0-9]{4})$"
            }
          },
          {
            "titles": "ONS Geography",
            "required": true,
            "name": "ons_geography",
            "datatype": {
              "format": "[A-Z][0-9]{8}"
            }
          },
          {
            "titles": "Sex",
            "required": true,
            "name": "sex",
            "datatype": {
              "format": "^(M|F|T|U|N)$"
            }
          },
          {
            "titles": "ONS Age Range",
            "required": true,
            "name": "ons_age_range",
            "datatype": "string"
          },
          {
            "titles": "Life Expectancy Estimate Type",
            "required": true,
            "name": "life_expectancy_estimate_type",
            "datatype": "string"
          },
          {
            "titles": "Lower CI",
            "required": false,
            "name": "lower_ci",
            "datatype": "number"
          },
          {
            "titles": "Upper CI",
            "required": false,
            "name": "upper_ci",
            "datatype": "number"
          },
          {
            "titles": "Value",
            "required": false,
            "name": "value",
            "datatype": "number"
          },
          {
            "titles": "Measure Type",
            "required": true,
            "name": "measure_type",
            "datatype": "string"
          }
        ],
        "foreignKeys": [
          {
            "columnReference": "ons_age_range",
            "reference": {
              "resource": "https://gss-cogs.github.io/family-disability/reference/codelists/ons-age-range.csv",
              "columnReference": "notation"
            }
          },
          {
            "columnReference": "life_expectancy_estimate_type",
            "reference": {
              "resource": "https://gss-cogs.github.io/family-disability/reference/codelists/life-expectancy-estimate-type.csv",
              "columnReference": "notation"
            }
          }
        ],
        "primaryKey": [
          "period",
          "ons_geography",
          "sex",
          "ons_age_range",
          "life_expectancy_estimate_type",
          "measure_type"
        ]
      }
    }
  ]
}
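
The `format` entries in the metadata above are regular expressions for string columns. A minimal sketch of checking cell values against them with the standard library follows. The two sample rows are made up for illustration, `invalid_cells` is a hypothetical helper, and whole-value matching via `re.fullmatch` is an assumption about how a validator would apply the unanchored `ons_geography` pattern.

```python
import re

# Column formats copied from the metadata above, keyed by column name.
COLUMN_FORMATS = {
    "period": r"^(year/[0-9]{4}|gregorian-interval/.*|month/[0-9]{4}-[0-9]{2}"
              r"|day/[0-9]{4}-[0-9]{2}-[0-9]{2}|quarter/[0-9]{4}-Q[1-4]"
              r"|government-year/[0-9]{4}-[0-9]{4})$",
    "ons_geography": r"[A-Z][0-9]{8}",
    "sex": r"^(M|F|T|U|N)$",
}

def invalid_cells(row):
    """Return the names of columns whose value does not match its format."""
    return [
        name
        for name, pattern in COLUMN_FORMATS.items()
        # Assumption: the whole cell value must match the pattern.
        if re.fullmatch(pattern, row.get(name, "")) is None
    ]

# Illustrative rows, not taken from he-country-level-estimates.csv.
good_row = {"period": "year/2017", "ons_geography": "E92000001", "sex": "F"}
bad_row = {"period": "2017", "ons_geography": "England", "sex": "F"}

print(invalid_cells(good_row))
print(invalid_cells(bad_row))
```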

In [None]:
{
  "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
  "url": "tree-ops.csv",
  "dc:title": "Tree Operations",
  "dcat:keyword": ["tree", "street", "maintenance"],
  "dc:publisher": {
    "schema:name": "Example Municipality",
    "schema:url": {"@id": "http://example.org"}
  },
  "dc:license": {"@id": "http://opendefinition.org/licenses/cc-by/"},
  "dc:modified": {"@value": "2010-12-31", "@type": "xsd:date"},
  "tableSchema": {
    "columns": [{
      "name": "GID",
      "titles": ["GID", "Generic Identifier"],
      "dc:description": "An identifier for the operation on a tree.",
      "datatype": "string",
      "required": true
    }, {
      "name": "on_street",
      "titles": "On Street",
      "dc:description": "The street that the tree is on.",
      "datatype": "string"
    }, {
      "name": "species",
      "titles": "Species",
      "dc:description": "The species of the tree.",
      "datatype": "string"
    }, {
      "name": "trim_cycle",
      "titles": "Trim Cycle",
      "dc:description": "The operation performed on the tree.",
      "datatype": "string"
    }, {
      "name": "inventory_date",
      "titles": "Inventory Date",
      "dc:description": "The date of the operation that was performed.",
      "datatype": {"base": "date", "format": "M/d/yyyy"}
    }],
    "primaryKey": "GID",
    "aboutUrl": "#gid-{GID}"
  }
}
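
The `aboutUrl` value above is a URI template that is filled in per row. The sketch below shows the idea for simple `{column}` variables only. It is not a full URI Template implementation; `expand_about_url` is a hypothetical helper, and the base URL `http://example.org/tree-ops.csv` and the `GID` value are assumptions for illustration.

```python
from urllib.parse import quote, urljoin

def expand_about_url(template, row, table_url):
    """Substitute {column} variables, then resolve against the table URL."""
    expanded = template
    for name, value in row.items():
        # Percent-encode the value so it is safe inside a URL fragment.
        expanded = expanded.replace("{" + name + "}", quote(str(value), safe=""))
    return urljoin(table_url, expanded)

# Assumed base URL and row value.
row = {"GID": "1"}
about = expand_about_url("#gid-{GID}", row, "http://example.org/tree-ops.csv")
print(about)
```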

In [None]:
{
  "@context": "http://www.w3.org/ns/csvw",
  "rdfs:comment": "The supported date and time formats listed here are expressed in terms of the date field symbols defined in [UAX35] and MUST be interpreted by implementations as defined in that specification.",
  "rdfs:label": "date format (valid date combinations with formats)",
  "url": "test188.csv",
  "tableSchema": {
    "columns": [
      {"titles": "yyyy-MM-dd", "datatype": {"base": "date", "format": "yyyy-MM-dd"}},
      {"titles": "yyyyMMdd",   "datatype": {"base": "date", "format": "yyyyMMdd"}},
      {"titles": "dd-MM-yyyy", "datatype": {"base": "date", "format": "dd-MM-yyyy"}},
      {"titles": "d-M-yyyy",   "datatype": {"base": "date", "format": "d-M-yyyy"}},
      {"titles": "MM-dd-yyyy", "datatype": {"base": "date", "format": "MM-dd-yyyy"}},
      {"titles": "M-d-yyyy",   "datatype": {"base": "date", "format": "M-d-yyyy"}},
      {"titles": "dd/MM/yyyy", "datatype": {"base": "date", "format": "dd/MM/yyyy"}},
      {"titles": "d/M/yyyy",   "datatype": {"base": "date", "format": "d/M/yyyy"}},
      {"titles": "MM/dd/yyyy", "datatype": {"base": "date", "format": "MM/dd/yyyy"}},
      {"titles": "M/d/yyyy",   "datatype": {"base": "date", "format": "M/d/yyyy"}},
      {"titles": "dd.MM.yyyy", "datatype": {"base": "date", "format": "dd.MM.yyyy"}},
      {"titles": "d.M.yyyy",   "datatype": {"base": "date", "format": "d.M.yyyy"}},
      {"titles": "MM.dd.yyyy", "datatype": {"base": "date", "format": "MM.dd.yyyy"}},
      {"titles": "M.d.yyyy",   "datatype": {"base": "date", "format": "M.d.yyyy"}},
      {"titles": "yyyy-MM-ddX",      "datatype": {"base": "date", "format": "yyyy-MM-ddX"}},
      {"titles": "dd.MM.yyyy XXX", "datatype": {"base": "date", "format": "dd.MM.yyyy XXX"}}
    ]
  }
}
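
The date formats above use UAX35 field symbols, whereas Python's `datetime.strptime` uses `%` directives. A minimal sketch of translating the year, month, and day symbols follows. The helper names are hypothetical, timezone patterns such as `X` are deliberately rejected, and `strptime` does not distinguish padded from unpadded fields, so this is more lenient than the specification.

```python
import re
from datetime import date, datetime

# UAX35 field symbols handled by this sketch and their strptime equivalents.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "M": "%m", "dd": "%d", "d": "%d"}
_TOKEN_RE = re.compile(r"yyyy|MM|M|dd|d")

def uax35_to_strptime(pattern):
    """Translate a simple UAX35 date pattern into a strptime format string."""
    # Any letters left after removing known tokens are unsupported symbols.
    if re.search(r"[A-Za-z]", _TOKEN_RE.sub("", pattern)):
        raise ValueError("unsupported date pattern: " + pattern)
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], pattern)

def parse_date(value, pattern):
    """Parse a cell value according to a UAX35 date pattern."""
    return datetime.strptime(value, uax35_to_strptime(pattern)).date()

print(uax35_to_strptime("dd.MM.yyyy"))
print(parse_date("10/18/2010", "M/d/yyyy"))
```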

In [None]:
#Suggestions in the scan?

In [None]:
# TODO: Create the JSON, then run the RDF component.

In [185]:
import json

# Build the nested structure from plain dictionaries:
agent = {'agentid': 'john'}
content = {'eventType': 'view', 'othervar': 'new'}
jsondata = {'agent': agent, 'content': content}

print(json.dumps(jsondata, indent=4))

{
    "agent": {
        "agentid": "john"
    },
    "content": {
        "eventType": "view",
        "othervar": "new"
    }
}
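
The same dictionary-then-`json.dumps` approach could be used to generate CSVW table metadata like the examples earlier. The sketch below is one possible shape under stated assumptions: `column_name` and `build_table_metadata` are hypothetical helpers, and the name rule (lower-case, words joined by underscores) is inferred from pairs such as "ONS Geography" and `ons_geography` above.

```python
import json
import re

def column_name(title):
    """Lower-case a title and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")

def build_table_metadata(csv_url, columns):
    """Build a metadata dict from (title, datatype, required) tuples."""
    return {
        "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {
                    "titles": title,
                    "required": required,
                    "name": column_name(title),
                    "datatype": datatype,
                }
                for title, datatype, required in columns
            ]
        },
    }

metadata = build_table_metadata(
    "he-country-level-estimates.csv",
    [
        ("ONS Age Range", "string", True),
        ("Lower CI", "number", False),
        ("Value", "number", False),
    ],
)
print(json.dumps(metadata, indent=2))
```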


In [None]:
#Data markers and video