In [1]:
import urllib3
urllib3.disable_warnings() # disable InsecureRequestWarning if using verify=False

# Demo of the census21api module 

To use the census21api tool, you first have to install it:

```bash
$ python -m pip install census21api@git+https://github.com/datasciencecampus/census21api
```

Alternative methods of installation can be found in the [repo](https://github.com/datasciencecampus/census21api). If you also want to make the interactive map at the bottom, I would recommend creating a new conda environment and installing both the census21api tool and the packages listed in ```requirements.txt```.

You can then import the core class from the module and create an instance of it.

In [2]:
from census21api import CensusAPI

api = CensusAPI()

## Querying population, area and feature types

After creating an instance you will need three key ingredients to feed into a query: a population type, an area type and one or more features. A population type must always be chosen first. The tool can help you choose which ingredients to put into a query. In the cell below we fetch information on all of the population types available to us.

This creates a query and sends it to the Create A Custom Dataset API, which returns all of the metadata as JSON. The tool then renders that metadata as a table.
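To make the JSON-to-table step concrete, here is a minimal sketch of how a list of JSON records can be turned into a table with pandas. The payload below, including the `items` key, is a hypothetical stand-in written for illustration. The real response from the API and the internals of `query_population_types` may well differ.

```python
import json

import pandas as pd

# Hypothetical response body for illustration only: the real API schema may differ.
raw = """
{
  "items": [
    {"name": "HH", "label": "All households", "type": "microdata"},
    {"name": "HRP", "label": "All Household Reference Persons", "type": "microdata"}
  ]
}
"""

payload = json.loads(raw)

# Build a table with one row per record and one column per key.
populations_sketch = pd.DataFrame(payload["items"])
print(populations_sketch)
```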

In [3]:
populations = api.query_population_types()
populations

Unnamed: 0,name,label,description,type
0,HH,All households,Either one usual resident living alone or a gr...,microdata
1,HRP,All Household Reference Persons,"A person who serves as a reference point, main...",microdata
2,UR,All usual residents,The main population base for census statistics...,microdata
3,UR_CE,All usual residents in communal establishments,A usual resident who lives in a place that pro...,microdata
4,UR_HH,All usual residents in households,A person who usually lives in England or Wales...,microdata


For this demonstration we are going to look at household reference persons.

In [4]:
population_type = "HRP"

Next we can look at the different area types available to us. Below I query all area types available for the chosen population type, household reference persons.

In [5]:
area_types = api.query_feature(population_type, "area-types")
area_types

Unnamed: 0,id,label,description,total_count,hierarchy_order,population_type
0,nat,England and Wales,Data for both England and Wales.,1,1400,HRP
1,ctry,Countries,Data for either the whole of England or Wales.,2,1300,HRP
2,rgn,Regions,"Data for the nine regions in England, and Wale...",10,1200,HRP
3,lep,Local enterprise partnerships,Local enterprise partnerships (LEPs) are volun...,37,1150,HRP
4,nhser,NHS England regions,Each NHS region is responsible for planning lo...,8,1100,HRP
5,lhb,Local health boards,Local health boards in Wales are responsible f...,8,1000,HRP
6,icb,Integrated care boards,Integrated care boards in England are responsi...,43,900,HRP
7,sicbl,Sub integrated care board locations,Sub integrated care board locations have repla...,107,800,HRP
8,utla23,2023 Upper tier local authorities,Upper tier local authorities provide a range o...,175,750,HRP
9,utla,Upper tier local authorities,Upper tier local authorities provide a range o...,174,700,HRP


The final ingredient is the feature(s). We can choose one or more features for a query. The more features we have, the more likely we are to get missing values, because statistical disclosure control means we can't see very small counts. Instead of listing every available feature, I restrict the search to those beginning with ```hh_deprivation```. The ```hh_``` prefix means only features that are relevant to households. We restrict to features that are measures of deprivation here simply because there are so many possibilities.

In [6]:
dimensions = api.query_feature(population_type, "dimensions")
dimensions[dimensions['id'].str.startswith("hh_deprivation")]

Unnamed: 0,id,label,description,total_count,quality_statement_text,population_type
20,hh_deprivation,Household deprivation (6 categories),The dimensions of deprivation used to classify...,6,Caution should be used in interpreting this va...,HRP
21,hh_deprivation_education,Household deprived in the education dimension ...,A household is classified as deprived in the e...,3,,HRP
22,hh_deprivation_employment,Household deprived in the employment dimension...,A household is classified as deprived in the e...,3,,HRP
23,hh_deprivation_health,Household deprived in the health and disabilit...,A household is classified as deprived in the h...,3,,HRP
24,hh_deprivation_housing,Household deprived in the housing dimension (3...,A household is classified as deprived in the h...,3,,HRP


Now we can choose the area type and feature that we will feed to the query. In this example we choose local authorities as defined with 2023 boundaries (```ltla23```). We also choose to look at the feature for households deprived in the housing dimension.

In [7]:
area_type = "ltla23"
dimension = "hh_deprivation_housing"

We feed these into a query. As we only have one dimension, we wrap it in a list to make it an iterable with a single element. The output shows how many household reference persons live in non-deprived (0) and deprived (1) housing. It also counts those for whom the data is missing or the measure doesn't apply (-8).

In [8]:
table = api.query_table(population_type, area_type, [dimension])
table.head(6)

Unnamed: 0,ltla23,hh_deprivation_housing,count,population_type
0,E06000001,-8,0,HRP
1,E06000001,0,39304,HRP
2,E06000001,1,1626,HRP
3,E06000002,-8,0,HRP
4,E06000002,0,56978,HRP
5,E06000002,1,3284,HRP
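The long-format output above can be easier to read with one row per local authority. The sketch below retypes the six rows shown and pivots them into a wide table. The category labels in `labels` are my own names based on the description in the text, not values returned by the API.

```python
import pandas as pd

# The first six rows of the query output, retyped by hand.
sample = pd.DataFrame(
    {
        "ltla23": ["E06000001"] * 3 + ["E06000002"] * 3,
        "hh_deprivation_housing": [-8, 0, 1, -8, 0, 1],
        "count": [0, 39304, 1626, 0, 56978, 3284],
    }
)

# Assumed, human-readable names for the category codes.
labels = {-8: "missing_or_not_applicable", 0: "not_deprived", 1: "deprived"}

# One row per local authority, one column per category.
wide = sample.assign(
    category=sample["hh_deprivation_housing"].map(labels)
).pivot(index="ltla23", columns="category", values="count")
print(wide)
```

The same pattern should carry over to the full `table`, as long as each pair of area code and category appears only once.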


## Creating an interactive map

After we have the data we can visualise it. We do that here with an interactive map. 

In [9]:
import geopandas as gpd
import pandas as pd
import pyproj
import matplotlib.pyplot as plt

The shapefiles and lookup table we need can be found on the [ONS Open Geography Portal](https://geoportal.statistics.gov.uk/).

The shape files can be found [here](https://geoportal.statistics.gov.uk/datasets/608940e46ed649e3b00a5409befe31f8_0/explore).
The lookup table can be found [here](https://geoportal.statistics.gov.uk/datasets/ons::middle-layer-super-output-area-2021-to-ward-to-lad-may-2023-lookup-in-england-and-wales/about).


In [10]:
bounds = gpd.read_file('data/MSOA_2021_EW_BFC_V6.shp')
bounds.head()

Unnamed: 0,MSOA21CD,MSOA21NM,BNG_E,BNG_N,LONG,LAT,GlobalID,geometry
0,E02000001,City of London 001,532384,181355,-0.09349,51.5156,283e7adc-faef-4736-9a0b-146cb27c72ec,"POLYGON ((532153.703 182165.155, 532158.250 18..."
1,E02000002,Barking and Dagenham 001,548267,189685,0.138756,51.5865,7b32290e-3b18-45b1-b5d5-bf778f71e3ce,"POLYGON ((548881.304 190819.980, 548881.125 19..."
2,E02000003,Barking and Dagenham 002,548259,188520,0.138149,51.576,56f43674-2eda-47c0-819a-0cabeb9595f5,"POLYGON ((548958.555 189072.176, 548954.517 18..."
3,E02000004,Barking and Dagenham 003,551004,186412,0.176828,51.5564,fbedb5c5-b92a-475f-899e-1baf1dbae111,"POLYGON ((551550.056 187364.705, 551528.633 18..."
4,E02000005,Barking and Dagenham 004,548733,186824,0.144267,51.5607,f04829fe-a903-4bca-a88a-ace39fdbd3ac,"POLYGON ((549237.051 187627.941, 549241.319 18..."


In [11]:
lookup = pd.read_csv("data/MSOA_2021_to_LAD_2023_Lookup.csv")
lookup.head()

Unnamed: 0,MSOA21CD,MSOA21NM,MSOA21NMW,WD23CD,WD23NM,WD23NMW,LAD23CD,LAD23NM,LAD23NMW,ObjectId
0,E02002489,Hartlepool 007,,E05013038,Burn Valley,,E06000001,Hartlepool,,1
1,E02002490,Hartlepool 008,,E05013041,Foggy Furze,,E06000001,Hartlepool,,2
2,E02002483,Hartlepool 001,,E05013042,Hart,,E06000001,Hartlepool,,3
3,E02002484,Hartlepool 002,,E05013043,Headland & Harbour,,E06000001,Hartlepool,,4
4,E02002491,Hartlepool 009,,E05013044,Manor House,,E06000001,Hartlepool,,5


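We combine the local authority codes and names with their boundary data. The MSOA boundaries are dissolved into one shape per local authority. The geometries are then simplified to keep the map light (the tolerance of 100 is in the units of the boundary data's coordinates, which appear to be British National Grid metres), and ```buffer(0)``` is a common way to tidy up any invalid geometry that simplification leaves behind.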

In [12]:
auths = bounds.merge(lookup).dissolve(by=["LAD23CD", "LAD23NM"])[["geometry"]].reset_index()
auths["geometry"] = auths["geometry"].simplify(100).buffer(0)
auths.head()

Unnamed: 0,LAD23CD,LAD23NM,geometry
0,E06000001,Hartlepool,"POLYGON ((452352.503 530704.054, 452701.750 52..."
1,E06000002,Middlesbrough,"MULTIPOLYGON (((451933.804 516881.716, 453270...."
2,E06000003,Redcar and Cleveland,"MULTIPOLYGON (((468700.724 522049.101, 469891...."
3,E06000004,Stockton-on-Tees,"MULTIPOLYGON (((446745.187 513828.305, 447079...."
4,E06000005,Darlington,"POLYGON ((435827.188 515305.187, 436038.000 51..."


We combine the above with the calculated proportion of household reference persons in each local authority who live in deprived housing.

In [13]:
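feature = f"prop_{dimension}"
# Share of each local authority's total count that falls in each category
table[feature] = table["count"] / table.groupby("ltla23")["count"].transform("sum")

# Keep only the deprived category (1), dropping 0 and -8
deprived = table[table[dimension] > 0][["ltla23", feature]]

# Attach the proportions to the local authority boundaries
combined = auths.merge(deprived, left_on="LAD23CD", right_on="ltla23")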
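As a check on the arithmetic, the same calculation can be worked through on the six rows shown in the earlier query output. This is a small sketch on hand-typed data, with `sample` and `prop` as illustrative names, rather than a rerun of the real query.

```python
import pandas as pd

# The first six rows of the query output, retyped by hand.
sample = pd.DataFrame(
    {
        "ltla23": ["E06000001"] * 3 + ["E06000002"] * 3,
        "hh_deprivation_housing": [-8, 0, 1, -8, 0, 1],
        "count": [0, 39304, 1626, 0, 56978, 3284],
    }
)

# Total count per local authority, repeated on every row of that authority.
totals = sample.groupby("ltla23")["count"].transform("sum")
sample["prop"] = sample["count"] / totals

# Keep the deprived category only.
deprived_sample = sample[sample["hh_deprivation_housing"] > 0][["ltla23", "prop"]]
print(deprived_sample)
```

For E06000001 this is 1626 / (0 + 39304 + 1626), or roughly 4% of household reference persons. Note that the denominator includes the -8 category. That makes no difference here because its count is zero, but it would lower the proportion in any area where that count is not zero.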

We create an interactive map. We can see hotspots of local authorities with higher proportions of deprived housing in some cities such as London and in rural areas such as Cornwall.

In [14]:
combined.explore(feature, cmap="magma")