# Challenge 3 - Employment and Skills

This notebook demonstrates how to use the Python recipe wrapper to create a basic data pack to get you started with the GLA challenge on Employment and Skills. To learn more about the challenge, visit our [Tombolo website](http://www.tombolo.org.uk/greater-london-authority/). Don't forget that you can use the [City Data Explorer](https://tombolo-staging.emu-analytics.net) web app to visualise and style your results.


## Some Background

**The Tombolo project** is a Future Cities Catapult project funded by InnovateUK. It is a research and development project focused on understanding the value of data to unlock the potential of our cities. A big part of the Tombolo project is the [Digital Connector](http://www.tombolo.org.uk/products/) (DC), an open source piece of software that lets data scientists import and combine datasets into a standard format and model. You can visit the project on [GitHub](https://github.com/FutureCitiesCatapult/TomboloDigitalConnector) for background and for instructions on how to use it.



## The goal

We will use the Python recipe implementation to tell Digital Connector to fetch employment and skills data for the local authorities of Greater London.

The geographical unit of measurement for our exports (the ***Subject*** in DC language) will be the local authority.

The data that we will be fetching are:
* ONS data on employment/unemployment
* ONS data on Business Demography
* Data on Employment and Support Allowance (ESA) claimants
* Data on Gross Annual Income

**Please note that the above data sources are only indicative! You should think more holistically in order to tackle this challenge!**

Our output will be a GeoJSON file of the GLA's local authorities along with the attributes of interest. Feel free to play around with the code, explore the DC and download more resources that will help you tackle the Challenge!

### Let's get started

First, we import some libraries that we will be using, as well as the recipe.py file that contains all the classes necessary to build our recipes.

In [93]:
import os
from pathlib import Path

home_dir = str(Path.home())
tdc = os.path.join(home_dir, 'Desktop/python_library_dc/digital-connector-python')
digital_connector = os.path.join(home_dir, 'Desktop/UptodateProject/TomboloDigitalConnector')
os.chdir(tdc)

In [101]:
from recipe import Recipe, Subject, Dataset, Geo_Match_Rule, Match_Rule, Datasource, GeographicAggregationField, FixedValueField, AttributeMatcherField, AttributeMatcher, LatestValueField, MapToContainingSubjectField, BackOffField, PercentilesField, LinearCombinationField

The first thing we need to do is to create a **Subject**. This represents the core geometry on which all our operations will be based. It also specifies the export geometry of our final GeoJSON file. We are using *localAuthority* and a **match_rule** on the `label` attribute to filter out all local authorities that do not belong to Greater London.

In [102]:
subject_geometry = Subject(subject_type_label='localAuthority', provider_label='uk.gov.ons', 
                           match_rule=Match_Rule(attribute_to_match_on='label', pattern='E0900%'))
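
To make the effect of the `pattern` argument concrete, here is a minimal sketch of how a pattern such as `'E0900%'` selects labels. It assumes that DC treats the pattern like a SQL `LIKE` expression, where `%` matches any run of characters; the helper `like_to_regex` is hypothetical and is not part of the recipe wrapper or of DC.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE-style pattern into a compiled regex.

    ASSUMPTION: '%' matches any run of characters and '_' matches
    exactly one character, as in SQL LIKE. This helper is only an
    illustration and is not part of the recipe wrapper.
    """
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')
        elif ch == '_':
            parts.append('.')
        else:
            parts.append(re.escape(ch))
    return re.compile(''.join(parts))

london = like_to_regex('E0900%')

# Two labels from the output table of this notebook, plus two
# made-up labels with a different prefix.
labels = ['E09000001', 'E09000002', 'E08000003', 'E06000001']
matched = [label for label in labels if london.fullmatch(label)]
print(matched)
```

Under that assumption, only the labels starting with `E0900` survive the filter, which is how the subject is restricted to London's local authorities.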

Next, we need to define our **Datasource** objects. These tell DC what data to download. For more information on DC importers and their datasource ids, consult the [catalogue.json](https://github.com/FutureCitiesCatapult/TomboloDigitalConnector/blob/master/src/main/resources/catalogue.json) or run the following in a terminal:

**gradle info -Pi= *name_of_the_class***

In [None]:
localAuthority = Datasource(importer_class='uk.org.tombolo.importer.ons.OaImporter',
                            datasource_id='localAuthority')

englandGeneralisedBoundaries = Datasource(importer_class='uk.org.tombolo.importer.ons.OaImporter' ,
                                          datasource_id='englandBoundaries')

NOMISIncome = Datasource(datasource_id='ONSGrossAnnualIncome',
                         importer_class='uk.org.tombolo.importer.ons.ONSEmploymentImporter')

ONSBusiness = Datasource(datasource_id='ONSBusiness',
                        importer_class='uk.org.tombolo.importer.ons.ONSBusinessDemographyImporter')

NOMISJobs = Datasource(datasource_id='ONSJobsDensity',
                      importer_class='uk.org.tombolo.importer.ons.ONSEmploymentImporter')

NOMISEmployment = Datasource(datasource_id='APSEmploymentRate',
                            importer_class='uk.org.tombolo.importer.ons.ONSEmploymentImporter')

NOMISUnEmployment = Datasource(datasource_id='APSUnemploymentRate',
                            importer_class='uk.org.tombolo.importer.ons.ONSEmploymentImporter')

NOMISBenefits = Datasource(datasource_id='ESAclaimants',
                          importer_class='uk.org.tombolo.importer.ons.ONSEmploymentImporter')

PopulationDensity = Datasource(datasource_id='qs102ew', 
                              importer_class='uk.org.tombolo.importer.ons.CensusImporter')


importers_list = [localAuthority,englandGeneralisedBoundaries, NOMISIncome, ONSBusiness, NOMISJobs,
                  NOMISEmployment, NOMISUnEmployment,NOMISBenefits, PopulationDensity]
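
It can help to picture what these objects become once the recipe is built. The sketch below serialises two of the datasources into a JSON fragment. The key names `importerClass` and `datasourceId` are an assumption about the DC recipe format, and the wrapper may produce a different layout, so treat this as an illustration rather than the wrapper's actual output.

```python
import json

# (importer class, datasource id) pairs copied from the cell above.
pairs = [
    ('uk.org.tombolo.importer.ons.OaImporter', 'localAuthority'),
    ('uk.org.tombolo.importer.ons.ONSEmploymentImporter', 'APSEmploymentRate'),
]

# ASSUMPTION: the JSON recipe uses the keys 'importerClass' and
# 'datasourceId'; check a recipe built by the wrapper to confirm.
datasources = [
    {'importerClass': importer, 'datasourceId': ds_id}
    for importer, ds_id in pairs
]

fragment = json.dumps({'datasources': datasources}, indent=2)
print(fragment)
```

Comparing a fragment like this with the file produced by `build_recipe` is a quick way to check that every datasource you defined actually made it into the recipe.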



Now that we have defined the datasources, we need to tell the DC which attributes to fetch from the database. To do that we create an **AttributeMatcher** for each attribute of interest. Having specified the attributes that we will be using, we then pass them to DC's **Fields**. There are numerous fields, each with its own properties. Please consult [DC's GitHub repo](https://github.com/FutureCitiesCatapult/TomboloDigitalConnector/blob/master/documentation/fields-and-models.md) for more information on fields.

In [105]:

### Fields ###

### Defining our attributes and passing them to fields ###

### Unemployment 

unemployment_attribute = AttributeMatcher(label='APSUnemploymentRate',
                                                   provider='uk.gov.ons')
unemployment = LatestValueField(attribute_matcher=unemployment_attribute,
                                                label='APSUnemploymentRate')

### Employment 

employment_attribute = AttributeMatcher(label='APSEmploymentRate',
                                                   provider='uk.gov.ons')
employment = LatestValueField(attribute_matcher=employment_attribute,
                                                label='APSEmploymentRate')

### Claiming allowance 

claimants_attribute = AttributeMatcher(label='ESAclaimants',
                                                   provider='uk.gov.ons')
claimants = LatestValueField(attribute_matcher=claimants_attribute,
                                                label='ESAclaimants')


### Transforming them to percentiles after taking care of the missing values ###

# Map each name to the field object defined above (avoids eval)
fields = {'unemployment': unemployment,
          'employment': employment,
          'claimants': claimants}

f = {}
for name, field in fields.items():
    # Mean of the field over each subject geometry
    f['geo_{0}'.format(name)] = GeographicAggregationField(subject=subject_geometry,
                                                           field=field,
                                                           function='mean',
                                                           label='geo_{0}'.format(name))

    # Value of the containing englandBoundaries subject
    f['map_{0}'.format(name)] = MapToContainingSubjectField(field=f['geo_{0}'.format(name)],
                                                            subject=Subject(subject_type_label='englandBoundaries',
                                                                            provider_label='uk.gov.ons'),
                                                            label='map_{0}'.format(name))

    # Use the subject's own value, falling back to the mapped value
    f['backoff_{0}'.format(name)] = BackOffField(fields=[field, f['map_{0}'.format(name)]],
                                                 label='backoff_{0}'.format(name))

    # Deciles across the London subjects; inverse is False only for employment
    f['percentile_{0}'.format(name)] = PercentilesField(field=f['backoff_{0}'.format(name)],
                                                        inverse=(name != 'employment'),
                                                        percentile_count=10,
                                                        normalization_subjects=[subject_geometry],
                                                        label='percentile_{0}'.format(name))


### Combining the resulting fields with a LinearCombinationField and converting the result to percentiles ###

combined_employment = LinearCombinationField(fields=[f['percentile_claimants'],
                                                 f['percentile_employment'],
                                                 f['percentile_unemployment']],
                                         scalars = [1.,1.,1.],
                                         label='Unemployment lower than the East London average')

percentile_combined_employment = PercentilesField(field=combined_employment,
                                                     inverse=False,
                                                     label='unemployment',
                                                     percentile_count=10,
                                                     normalization_subjects=[subject_geometry])


Execution completed Successfully!!!!
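
The field chain above (fall back on a parent value when a subject has none, then convert to percentiles) can be illustrated in plain Python. This is a rough sketch of the idea only: `back_off` and `to_quantiles` are hypothetical helpers, the input numbers are made up, and DC's own bucketing (including its handling of ties and the exact meaning of `inverse`) may differ.

```python
def back_off(*values):
    """Return the first value that is not None, or None if all are missing."""
    return next((v for v in values if v is not None), None)

def to_quantiles(values, count=10, inverse=False):
    """Assign each value a rank-based bucket between 1 and count.

    ASSUMPTION: buckets are derived from the sorted rank of each value,
    and inverse=True flips the ordering so that the largest value lands
    in bucket 1. Ties are not treated specially in this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        bucket = rank * count // len(values) + 1
        buckets[i] = count + 1 - bucket if inverse else bucket
    return buckets

# Made-up rates for four subjects; the second one has no value of its own.
own_values = [4.2, None, 7.9, 5.5]
parent_value = 6.0  # made-up value of the containing geography

filled = [back_off(v, parent_value) for v in own_values]
normal = to_quantiles(filled, count=4)
inverted = to_quantiles(filled, count=4, inverse=True)
print(filled, normal, inverted)
```

With only four subjects the example uses four buckets instead of ten. The point is that the missing value is replaced before ranking, and that the `inverse` flag decides which end of the scale receives the low bucket numbers.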


Now we are in good shape to run our recipe!

In [None]:
### Run the exporter and plot the result ###

importers = [localAuthority,englandGeneralisedBoundaries,
            NOMISEmployment,NOMISUnEmployment,NOMISBenefits]

dataset = Dataset(subjects=[subject_geometry], fields=[f['percentile_claimants'], 
                                                       f['percentile_employment'], f['percentile_unemployment']],
                  datasources=importers)

recipe = Recipe(dataset,timestamp=False)
recipe.build_recipe(console_print=False)

recipe.run_recipe(tombolo_path=digital_connector,
                  output_path = 'Desktop/employment_and_skills.json', console_print=False)

Now let's view the results using geopandas.

In [106]:
import geopandas as gpd

In [107]:
data = gpd.read_file(home_dir + '/Desktop/employment_and_skills.json')
data.head()

| | name | percentile_employment | label | percentile_unemployment | percentile_claimants | geometry |
|---|---|---|---|---|---|---|
| 0 | City of London | 4.0 | E09000001 | 6.0 | 10.0 | POLYGON ((-0.0968 51.5233, -0.0964 51.5228, -0... |
| 1 | Barking and Dagenham | 1.0 | E09000002 | 1.0 | 6.0 | (POLYGON ((0.1482 51.5968, 0.1481 51.5964, 0.1... |
| 2 | Barnet | 4.0 | E09000003 | 10.0 | 4.0 | POLYGON ((-0.199 51.6682, -0.1966 51.6681, -0.... |
| 3 | Bexley | 5.0 | E09000004 | 8.0 | 8.0 | (POLYGON ((0.1439 51.5077, 0.1475 51.5066, 0.1... |
| 4 | Brent | 2.0 | E09000005 | 3.0 | 3.0 | POLYGON ((-0.2671 51.6004, -0.2597 51.5942, -0... |
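
If geopandas is not installed, the attributes can also be inspected with the standard library alone. The sketch below writes a tiny GeoJSON sample to a temporary file and reads it back; the three rows reuse values from the output above, the geometries are left empty, and the flat `properties` layout is an assumption about how the exporter structures each feature.

```python
import json
import os
import tempfile

# Three rows taken from the output above. Geometry is set to None here
# because only the attributes matter for this illustration.
rows = [
    ('City of London', 'E09000001', 4.0, 6.0, 10.0),
    ('Barking and Dagenham', 'E09000002', 1.0, 1.0, 6.0),
    ('Barnet', 'E09000003', 4.0, 10.0, 4.0),
]

# ASSUMPTION: each feature stores its attributes as flat properties.
sample = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature',
         'geometry': None,
         'properties': {'name': name,
                        'label': label,
                        'percentile_employment': emp,
                        'percentile_unemployment': unemp,
                        'percentile_claimants': claim}}
        for name, label, emp, unemp, claim in rows
    ],
}

# Write the sample to a temporary file, then load it back as a real export would be.
path = os.path.join(tempfile.mkdtemp(), 'sample_employment.json')
with open(path, 'w') as handle:
    json.dump(sample, handle)

with open(path) as handle:
    features = json.load(handle)['features']

# Order the subjects by their unemployment percentile, lowest first.
ranked = sorted(features, key=lambda ft: ft['properties']['percentile_unemployment'])
names = [ft['properties']['name'] for ft in ranked]
print(names)
```

To run this against the real export, replace `path` with the location of `employment_and_skills.json` and skip the step that writes the sample.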
