## Index of Multiple Deprivation etc.: simple visualisation

The data contains the ranks and deciles for the Index of Multiple Deprivation at Lower-layer Super Output Area (LSOA) level.


The data used in this notebook, which might also be used for the EMA, comes from the following location:

www.gov.uk/government/statistics/english-indices-of-deprivation-2015

Reports are available there that explain how the data has been collated and aggregated to create these indices.
In this notebook, one data file is used as an example.

The data file has been copied and modified slightly to make it easier to work with, and is called example.csv. This is just an example investigation, and you are welcome to explore and use any of the data files provided.

After exploring the data, here are some example visualisations.


In [1]:
!ls ../

 2020J_TMA02	    2020j_green_collection   Solution
 2020J_TMA02_data   Reports		    'TMA 02_ Questions.pdf'


In [2]:
!pwd

/resources/DataScience/2020J_TMA02


In [3]:
import pandas as pd

import matplotlib.pyplot as plt 

furl = '../2020J_TMA02_data/Deprivation_Index/example.csv'

df = pd.read_csv(furl)
df.head()

Unnamed: 0,LSOAcode2011,LSOAname2011,LocalAuthorityDistrictcode2013,LocalAuthorityDistrictname2013,IndexofMultipleDeprivationRank,IndexofMultipleDeprivationDecile,IncomeRank,IncomeDecile,EmploymentRank,EmploymentDecile,EducationSkillsandTrainingRank,EducationSkillsandTrainingDecile,HealthDeprivationandDisabilityRank,HealthDeprivationandDisabilityDecile,CrimeRank,CrimeDecile,BarrierstoHousingandServicesRank,BarrierstoHousingandServicesDecile,LivingEnvironmentRank,LivingEnvironmentDecile
0,E01031349,Adur 001A,E07000223,Adur,21352,7,18992,6,19305,6,13727,5,25876,8,12817,4,28166,9,18367,6
1,E01031350,Adur 001B,E07000223,Adur,8864,3,9233,3,7879,3,6969,3,6883,3,12781,4,11399,4,16242,5
2,E01031351,Adur 001C,E07000223,Adur,22143,7,24539,8,23389,8,10213,4,24693,8,9112,3,24743,8,22299,7
3,E01031352,Adur 001D,E07000223,Adur,17252,6,16087,5,13699,5,10468,4,19627,6,16127,5,19559,6,22819,7
4,E01031370,Adur 001E,E07000223,Adur,15643,5,17918,6,13322,5,7819,3,14893,5,13723,5,22433,7,18191,6


The file has been loaded and has 20 columns. It is worth listing the column names to see what data is available.



In [4]:
list(df.columns)

['LSOAcode2011',
 'LSOAname2011',
 'LocalAuthorityDistrictcode2013',
 'LocalAuthorityDistrictname2013',
 'IndexofMultipleDeprivationRank',
 'IndexofMultipleDeprivationDecile',
 'IncomeRank',
 'IncomeDecile',
 'EmploymentRank',
 'EmploymentDecile',
 'EducationSkillsandTrainingRank ',
 'EducationSkillsandTrainingDecile',
 'HealthDeprivationandDisabilityRank',
 'HealthDeprivationandDisabilityDecile',
 'CrimeRank',
 'CrimeDecile ',
 'BarrierstoHousingandServicesRank ',
 'BarrierstoHousingandServicesDecile ',
 'LivingEnvironmentRank ',
 'LivingEnvironmentDecile ']
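
Several of the names in the list above end with a space (for example 'EducationSkillsandTrainingRank '), which is easy to miss when typing a column name. The following is a minimal sketch of one way to spot and tidy this. It uses a small made-up frame rather than example.csv, and the names `toy`, `padded` and `clean` are hypothetical. The later cells in this notebook use the original names, including the trailing space, so the sketch works on a copy rather than changing `df`.

```python
import pandas as pd

# Toy frame (not the real file) with the same kind of trailing-space column names.
# The values are copied from the first two rows of the table shown earlier.
toy = pd.DataFrame({
    'CrimeRank': [12817, 12781],
    'CrimeDecile ': [4, 4],                    # note the trailing space
    'LivingEnvironmentRank ': [18367, 16242],  # note the trailing space
})

# List the names that carry leading or trailing whitespace
padded = [name for name in toy.columns if name != name.strip()]
print(padded)

# Strip the whitespace on a copy; the original frame keeps its names
clean = toy.rename(columns=str.strip)
print(list(clean.columns))
```

If you do rename the columns of `df` itself, remember to update any later cell that refers to a name with a trailing space.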

In [5]:
ListofNames = df['LocalAuthorityDistrictname2013'].unique()
ListofNames

array(['Adur', 'Allerdale', 'Amber Valley', 'Arun', 'Ashfield', 'Ashford',
       'Aylesbury Vale', 'Babergh', 'Barking and Dagenham', 'Barnet',
       'Barnsley', 'Barrow-in-Furness', 'Basildon',
       'Basingstoke and Deane', 'Bassetlaw',
       'Bath and North East Somerset', 'Bedford', 'Bexley', 'Birmingham',
       'Blaby', 'Blackburn with Darwen', 'Blackpool', 'Bolsover',
       'Bolton', 'Boston', 'Bournemouth', 'Bracknell Forest', 'Bradford',
       'Braintree', 'Breckland', 'Brent', 'Brentwood',
       'Brighton and Hove', 'Bristol, City of', 'Broadland', 'Bromley',
       'Bromsgrove', 'Broxbourne', 'Broxtowe', 'Burnley', 'Bury',
       'Calderdale', 'Cambridge', 'Camden', 'Cannock Chase', 'Canterbury',
       'Carlisle', 'Castle Point', 'Central Bedfordshire', 'Charnwood',
       'Chelmsford', 'Cheltenham', 'Cherwell', 'Cheshire East',
       'Cheshire West and Chester', 'Chesterfield', 'Chichester',
       'Chiltern', 'Chorley', 'Christchurch', 'City of London',
       'Co

The list can be used to find the areas that you are interested in.
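
With several hundred district names, scanning the array by eye is slow. As a rough sketch, a substring search can narrow it down. The short `names` array below is a hypothetical stand-in holding a few values taken from the output above; in the notebook you would search `ListofNames` instead.

```python
import numpy as np

# A few names taken from the array shown above (the full array is much longer)
names = np.array(['Bolton', 'Boston', 'Bradford', 'Braintree', 'Brent'])

# Case-insensitive substring search over the district names
matches = [str(name) for name in names if 'bra' in name.lower()]
print(matches)
```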

In [6]:
df[df.LocalAuthorityDistrictname2013 == 'Bradford'].head()



Unnamed: 0,LSOAcode2011,LSOAname2011,LocalAuthorityDistrictcode2013,LocalAuthorityDistrictname2013,IndexofMultipleDeprivationRank,IndexofMultipleDeprivationDecile,IncomeRank,IncomeDecile,EmploymentRank,EmploymentDecile,EducationSkillsandTrainingRank,EducationSkillsandTrainingDecile,HealthDeprivationandDisabilityRank,HealthDeprivationandDisabilityDecile,CrimeRank,CrimeDecile,BarrierstoHousingandServicesRank,BarrierstoHousingandServicesDecile,LivingEnvironmentRank,LivingEnvironmentDecile
3095,E01010646,Bradford 001A,E08000032,Bradford,30426,10,27564,9,29855,10,31470,10,22927,7,32447,10,25639,8,14518,5
3096,E01010647,Bradford 001B,E08000032,Bradford,26378,9,20319,7,22965,7,28414,9,12912,4,31074,10,31616,10,25406,8
3097,E01010648,Bradford 001C,E08000032,Bradford,29500,9,27109,9,27437,9,30579,10,19905,7,32574,10,19208,6,21196,7
3098,E01010692,Bradford 001D,E08000032,Bradford,29485,9,30820,10,26796,9,32677,10,22549,7,23108,8,20126,7,16628,6
3099,E01010691,Bradford 002A,E08000032,Bradford,32725,10,31852,10,32530,10,32786,10,27915,9,29647,10,29258,9,25899,8


The local authority district code, or the LSOA (area) code, might be useful later.

## Once you have an area of interest, you might want to filter on that area. Here is an example using Bradford

In [7]:
df.query('LocalAuthorityDistrictname2013 =="Bradford"')

Unnamed: 0,LSOAcode2011,LSOAname2011,LocalAuthorityDistrictcode2013,LocalAuthorityDistrictname2013,IndexofMultipleDeprivationRank,IndexofMultipleDeprivationDecile,IncomeRank,IncomeDecile,EmploymentRank,EmploymentDecile,EducationSkillsandTrainingRank,EducationSkillsandTrainingDecile,HealthDeprivationandDisabilityRank,HealthDeprivationandDisabilityDecile,CrimeRank,CrimeDecile,BarrierstoHousingandServicesRank,BarrierstoHousingandServicesDecile,LivingEnvironmentRank,LivingEnvironmentDecile
3095,E01010646,Bradford 001A,E08000032,Bradford,30426,10,27564,9,29855,10,31470,10,22927,7,32447,10,25639,8,14518,5
3096,E01010647,Bradford 001B,E08000032,Bradford,26378,9,20319,7,22965,7,28414,9,12912,4,31074,10,31616,10,25406,8
3097,E01010648,Bradford 001C,E08000032,Bradford,29500,9,27109,9,27437,9,30579,10,19905,7,32574,10,19208,6,21196,7
3098,E01010692,Bradford 001D,E08000032,Bradford,29485,9,30820,10,26796,9,32677,10,22549,7,23108,8,20126,7,16628,6
3099,E01010691,Bradford 002A,E08000032,Bradford,32725,10,31852,10,32530,10,32786,10,27915,9,29647,10,29258,9,25899,8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3400,E01010869,Bradford 061B,E08000032,Bradford,6501,2,7226,3,6851,3,6437,2,5613,2,4803,2,24626,8,5094,2
3401,E01010870,Bradford 061C,E08000032,Bradford,17084,6,18086,6,17443,6,18320,6,9678,3,17352,6,18590,6,10638,4
3402,E01010871,Bradford 061D,E08000032,Bradford,9227,3,11842,4,10403,4,9361,3,10664,4,4970,2,28437,9,2086,1
3403,E01010872,Bradford 061E,E08000032,Bradford,2927,1,3490,2,2625,1,2082,1,3035,1,4144,2,22187,7,9101,3


There are 20 columns in this file, so it is useful to have a list of what type of data is being held; remember list(df.columns).

The reports provided explain how the numerical values are calculated and what they signify.
The lower the value, the more deprived the area: a value of 1 is the most deprived in the Index of Multiple Deprivation factors.

This example looks at extracting those areas in Bradford with a value of 1.
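
Because a lower rank means a more deprived area, sorting by rank in ascending order lists the most deprived areas first. The sketch below illustrates this on a hypothetical frame called `bradford`, built from five rows copied from the Bradford output above; it is not the full data set.

```python
import pandas as pd

# Five Bradford rows copied from the table output above
bradford = pd.DataFrame({
    'LSOAname2011': ['Bradford 061B', 'Bradford 061C', 'Bradford 061D',
                     'Bradford 061E', 'Bradford 001A'],
    'IndexofMultipleDeprivationRank': [6501, 17084, 9227, 2927, 30426],
    'IndexofMultipleDeprivationDecile': [2, 6, 3, 1, 10],
})

# Lowest rank = most deprived, so nsmallest returns the most deprived areas first
most_deprived = bradford.nsmallest(2, 'IndexofMultipleDeprivationRank')
print(most_deprived)
```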


A quick look at the data to see if there is anything interesting:

In [8]:
df_wide = df.pivot(index= 'IndexofMultipleDeprivationDecile', columns='LSOAname2011', values='IndexofMultipleDeprivationRank')
df_wide.head()

LSOAname2011,Adur 001A,Adur 001B,Adur 001C,Adur 001D,Adur 001E,Adur 001F,Adur 002A,Adur 002B,Adur 002C,Adur 002D,...,York 022F,York 023A,York 023B,York 023C,York 024A,York 024B,York 024C,York 024D,York 024E,York 024F
IndexofMultipleDeprivationDecile,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,8864.0,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,15643.0,,,,,,...,,,,,,,,,,


In [None]:
df_wide.describe()

Let's focus on Bradford and filter for IMD deciles 1 and 2.

In [None]:
tmp = df.query('LocalAuthorityDistrictname2013 =="Bradford" & IndexofMultipleDeprivationDecile < 3')
tmp1_wide = tmp.pivot(index= 'IndexofMultipleDeprivationDecile', columns='LSOAname2011', values='IndexofMultipleDeprivationRank')
tmp1_wide.head()

From an initial analysis of the IMD, there are 131 areas in Bradford that fall into IMD deciles 1 and 2. There are a number of columns that would be interesting to investigate.
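
A count like this can be obtained directly from the filtered rows rather than by reading it off a pivot table. The following is a sketch on made-up data: `toy`, `in_scope`, `n_areas` and `per_decile` are hypothetical names, and the six rows are illustrative only, so the result is not the figure quoted above.

```python
import pandas as pd

# Illustrative rows only; in the notebook the filter would be applied to df
toy = pd.DataFrame({
    'LocalAuthorityDistrictname2013': ['Bradford'] * 4 + ['Adur'] * 2,
    'IndexofMultipleDeprivationDecile': [2, 6, 3, 1, 7, 3],
})

# Same filter as the cell above: Bradford areas in IMD deciles 1 and 2
in_scope = toy.query(
    'LocalAuthorityDistrictname2013 == "Bradford" & IndexofMultipleDeprivationDecile < 3'
)

# Number of matching areas, and how they split between the two deciles
n_areas = len(in_scope)
per_decile = in_scope['IndexofMultipleDeprivationDecile'].value_counts().sort_index()
print(n_areas)
print(per_decile)
```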

In [None]:
list(df.columns)

You might want to explore specific data columns for an area.
Filter the data further. Some examples are provided in the data exploration notebook.

In [None]:
df.query('LocalAuthorityDistrictname2013 =="Bradford" & IndexofMultipleDeprivationDecile == 1')

See what kind of count we have across the spectrum for education:

In [None]:
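# Select all Bradford LSOAs
tmp2 = df.query('LocalAuthorityDistrictname2013 == "Bradford"')
# Wide view: one row per LSOA, one column per education decile (the rank column name has a trailing space)
tmp3_wide = tmp2.pivot(index='LSOAname2011', columns='EducationSkillsandTrainingDecile', values='EducationSkillsandTrainingRank ')

# Count the distinct rank values in each education decile and plot them as a bar chart
tmp2.groupby('EducationSkillsandTrainingDecile')['EducationSkillsandTrainingRank '].nunique().plot(kind='bar')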
plt.show()

What happens when we filter on an IMD decile of 1?

In [None]:

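# Select only the Bradford LSOAs in the most deprived IMD decile
tmp2 = df.query('LocalAuthorityDistrictname2013 == "Bradford" & IndexofMultipleDeprivationDecile == 1')
# Wide view: one row per LSOA, one column per education decile (the rank column name has a trailing space)
tmp3_wide = tmp2.pivot(index='LSOAname2011', columns='EducationSkillsandTrainingDecile', values='EducationSkillsandTrainingRank ')

# Count the distinct rank values in each education decile and plot them as a bar chart
tmp2.groupby('EducationSkillsandTrainingDecile')['EducationSkillsandTrainingRank '].nunique().plot(kind='bar')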
plt.show()


When exploring the data in the previous notebook, the results showed that a third of Bradford was classed in the most deprived decile (remember, this is the index decile). The data showed that the Education, Skills and Training decile for the most deprived areas is 1 or 2 (the lowest). The graph helps to show this. However, there is more to be done with the rank value.

For more on this data, see The English Indices of Deprivation 2015 Research report. The following details have been extracted from the report (see page 21):

The 32,844 Lower-layer Super Output Areas in England are ranked according to their deprivation score. For each of the neighbourhood-level indices, the most deprived Lower-layer Super Output Area in England is given a rank of 1, and the least deprived a rank of 32,844.

The deciles are produced by ranking the 32,844 Lower-layer Super Output Areas and dividing them into 10 equal-sized groups. Decile 1 represents the most deprived 10 per cent of areas nationally and decile 10, the least deprived 10 per cent of areas nationally.

The ranks and deciles can straightforwardly be interpreted as showing broadly whether a Lower-layer Super Output Area is more deprived than any other such area in the country. The ranks (and deciles) are relative: they show that one area is more deprived than another but not by how much. For example, if an area has a rank of 1,000, it is not half as deprived as a place with a rank of 500.

The ranks and deciles are based on scores: the larger the score, the more deprived the area. In the case of the Income and Employment deprivation domains and the supplementary Indices, the scores are meaningful and relate to a proportion of the relevant population experiencing that type of deprivation (see relevant sections below for details). This means that in addition to the ranks which show relative deprivation, the scores for these domains can be used to compare areas on an absolute scale.
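
The decile definition quoted above (rank the 32,844 areas, then divide them into 10 equal-sized groups) can be checked against values shown earlier in this notebook. The sketch below assumes the groups are formed by a simple ceiling division of the rank. The report extract does not spell out how the band boundaries are handled, so treat that rule as an assumption. The names `N_LSOAS`, `sample` and `derived_decile` are hypothetical.

```python
import pandas as pd

N_LSOAS = 32844  # number of LSOAs in England, as stated in the report extract

# IMD ranks and published deciles copied from rows shown earlier in this notebook
sample = pd.DataFrame({
    'rank': [21352, 8864, 15643, 30426, 2927],
    'published_decile': [7, 3, 5, 10, 1],
})

# Assumed rule: split ranks 1..N into 10 equal bands using integer ceiling division
sample['derived_decile'] = (sample['rank'] * 10 + N_LSOAS - 1) // N_LSOAS
print(sample)
```

For these five rows the derived decile agrees with the published one, which is consistent with the description in the report.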

Think about how you might further focus the data, e.g. by using the rank data.

It would be interesting to look at the employment and education values, living environment, health factors, etc.
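
As one possible starting point, the rank columns for several domains can be summarised side by side for a single district. The sketch below uses the five Adur rows shown at the top of this notebook in a hypothetical frame called `adur`, and takes the median rank per domain. The median is only one choice of summary, and because ranks are relative it describes ordering rather than how large the differences are. The column names are written as they appear in list(df.columns), including trailing spaces.

```python
import pandas as pd

# The five Adur rows from the df.head() output above (rank columns only)
adur = pd.DataFrame({
    'LSOAname2011': ['Adur 001A', 'Adur 001B', 'Adur 001C', 'Adur 001D', 'Adur 001E'],
    'EmploymentRank': [19305, 7879, 23389, 13699, 13322],
    'EducationSkillsandTrainingRank ': [13727, 6969, 10213, 10468, 7819],
    'HealthDeprivationandDisabilityRank': [25876, 6883, 24693, 19627, 14893],
    'LivingEnvironmentRank ': [18367, 16242, 22299, 22819, 18191],
})

rank_cols = [
    'EmploymentRank',
    'EducationSkillsandTrainingRank ',
    'HealthDeprivationandDisabilityRank',
    'LivingEnvironmentRank ',
]

# Median rank per domain, lowest (most deprived) first
medians = adur[rank_cols].median().sort_values()
print(medians)
```

In this small sample the education domain has the lowest median rank, so it would be a reasonable domain to look at first for these areas.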