# Introduction to Data Analysis with Pandas

In [1]:
# We tend to abbreviate the pandas library as pd
import pandas as pd
# Stop pandas from abbreviating tables to fit in the notebook
pd.options.display.max_columns = 1000
pd.options.display.max_rows = 1000
# Display graphs in the notebook
%matplotlib inline

## Getting the data into Python

The `pandas` library stores data in what it calls a *dataframe*, which is really just a smart table.

We use the `read_csv` function to read in our London Boroughs data.

In [2]:
# Read in our CSV file, treating a dot or a single space in the CSV as a missing value (NaN)
boroughs = pd.read_csv('Boroughs.csv', na_values=['.', ' '])
# Use the head function to see the first few rows
boroughs.head(5)

Unnamed: 0,Borough,InnerOuter,Population,Households,Area,Density,Age,LT15,WorkAge,Over65,Netmig,Bornabroad,Migrant1,Migrant1p,Migrant2,Migrant2p,Migrant3,Migrant3p,BAME,E2L,Arriving,Arriving1,Arriving2,Arriving3,Employ,MEmploy,FEmploy,Unemploy,UnemployY,NEET,OOWBenefit,Disability,NoQual,Degree,Pay,MPay,FPay,Medianpay,Volunteer,Jobs,Publicsector,Jobdensity,Business,BSurvival,Crime,Fires,Ambulance,HPrice,CTax,NewHomes,HOwned,Mortgages,LARent,PRent,Green,Carbon,Recycle,Cars,Cycle,PublicT,GCSEs,LookedAfter,PupilNE,PupilOOW,MLE,FLE,TeenCR,Satisfaction,Worthwhile,Happy,Anxiety,CObesity,Diabetes,PMortality,PControl,Conservative,Labour,Libdem,Turnout
0,City of London,Inner London,8800,5326.0,290,30.3,43.2,11.4,73.1,15.5,665,,United States,2.8,France,2.0,Australia,1.9,27.5,17.1,152.2,India,France,United States,64.6,,,,1.6,,3.4,,,,,,,"£63,620",,500400.0,3.4,84.3,26130,64.3,,12.3,,799999.0,931.2,80.0,,,,,4.8,1036.0,34.4,1692,16.9,7.9,78.6,101.0,,7.9,,,,6.6,7.1,6.0,5.6,,2.6,129.0,,,,,
1,Barking and Dagenham,Outer London,209000,78188.0,3611,57.9,32.9,27.2,63.1,9.7,2509,37.8,Nigeria,4.7,India,2.3,Pakistan,2.3,49.5,18.7,59.1,Romania,Bulgaria,Lithuania,65.8,75.6,56.5,11.0,4.5,5.7,10.5,17.2,11.3,32.2,27886.0,30104.0,24602.0,"£29,420",20.5,58900.0,21.1,0.5,6560,73.0,83.4,3.0,13.7,243500.0,1354.03,730.0,16.4,27.4,35.9,20.3,33.6,644.0,23.4,56966,8.8,3.0,58.0,69.0,41.7,18.7,77.6,82.1,32.4,7.1,7.6,7.1,3.1,28.5,7.3,228.0,Lab,0.0,100.0,0.0,36.5
2,Barnet,Outer London,389600,151423.0,8675,44.9,37.3,21.1,64.9,14.0,5407,35.2,India,3.1,Poland,2.4,Iran,2.0,38.7,23.4,53.1,Romania,Poland,Italy,68.5,74.5,62.9,8.5,1.9,2.5,6.2,14.9,5.2,49.0,33443.0,36475.0,31235.0,"£40,530",33.2,167300.0,18.7,0.7,26190,73.8,62.7,1.6,11.1,445000.0,1397.07,1460.0,32.4,25.2,11.1,31.1,41.3,1415.0,38.0,144717,7.4,3.0,67.3,35.0,46.0,9.3,82.1,85.1,12.8,7.5,7.8,7.4,2.8,20.7,6.0,134.0,Cons,50.8,,1.6,40.5
3,Bexley,Outer London,244300,97736.0,6058,40.3,39.0,20.6,62.9,16.6,760,16.1,Nigeria,2.6,India,1.5,Ireland,0.9,21.4,6.0,14.4,Romania,Poland,Nigeria,75.1,82.1,68.5,7.6,2.9,3.4,6.8,15.9,10.8,33.5,34350.0,37881.0,28924.0,"£36,990",22.1,80700.0,15.9,0.6,9075,73.5,51.8,2.3,11.8,275000.0,1472.43,-130.0,38.1,35.3,15.2,11.4,31.7,975.0,54.0,108507,10.6,2.6,60.3,46.0,32.6,12.6,80.4,84.4,19.5,7.4,7.7,7.2,3.3,22.7,6.9,164.0,Cons,71.4,23.8,0.0,39.6
4,Brent,Outer London,332100,121048.0,4323,76.8,35.6,20.9,67.8,11.3,7640,53.9,India,9.2,Poland,3.4,Ireland,2.9,64.9,37.2,100.9,Romania,Italy,Portugal,69.5,76.0,62.6,7.5,3.1,2.6,8.3,17.7,6.2,45.1,29812.0,30129.0,29600.0,"£32,140",17.3,133600.0,17.6,0.6,15745,74.4,78.8,1.8,12.1,407250.0,1377.24,1050.0,22.2,22.6,20.4,34.8,21.9,1175.0,35.2,87802,7.9,3.7,60.1,45.0,37.6,13.7,80.1,85.1,18.5,7.3,7.4,7.2,2.9,24.3,7.9,169.0,Lab,9.5,88.9,1.6,36.3
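
To see what `na_values` does without needing the real file, here is a minimal sketch on a tiny made-up CSV held in a string. The borough names `A`, `B`, `C` and all the numbers are illustrative assumptions, not values from `Boroughs.csv`.

```python
import io
import pandas as pd

# A tiny made-up CSV: '.' and a single space stand in for missing values
csv_text = "Borough,Population,Happy\nA,100,7.1\nB,.,7.4\nC,300, \n"

# Same na_values as the notebook, but reading from a string instead of a file
toy = pd.read_csv(io.StringIO(csv_text), na_values=['.', ' '])

# isna() marks every missing cell with True; sum() counts them per column
missing_counts = toy.isna().sum()
print(toy)
print(missing_counts)
```

A column with a missing entry is stored as floating point, which is why whole numbers in such columns are displayed with a trailing `.0`.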


### Q1

> What do you think `NaN` stands for?

## Accessing the columns

A single column of the data is accessible using Python dot notation.

In [3]:
boroughs.Borough

0             City of London
1       Barking and Dagenham
2                     Barnet
3                     Bexley
4                      Brent
5                    Bromley
6                     Camden
7                    Croydon
8                     Ealing
9                    Enfield
10                 Greenwich
11                   Hackney
12    Hammersmith and Fulham
13                  Haringey
14                    Harrow
15                  Havering
16                Hillingdon
17                  Hounslow
18                 Islington
19    Kensington and Chelsea
20      Kingston upon Thames
21                   Lambeth
22                  Lewisham
23                    Merton
24                    Newham
25                 Redbridge
26      Richmond upon Thames
27                 Southwark
28                    Sutton
29             Tower Hamlets
30            Waltham Forest
31                Wandsworth
32               Westminster
33              Inner London
34            

Or we can use square brackets, a bit like with a Python list or dictionary.

In [4]:
boroughs['Population']

0         8800
1       209000
2       389600
3       244300
4       332100
5       327900
6       242500
7       386500
8       351600
9       333000
10      280100
11      274300
12      185300
13      278000
14      252300
15      254300
16      301000
17      274200
18      231200
19      159000
20      175400
21      328900
22      303400
23      208100
24      342900
25      304200
26      197300
27      314300
28      202600
29      304000
30      276200
31      321000
32      242100
33     3535700
34     5299800
35     8835500
36    55609600
37    65999100
Name: Population, dtype: int64

### Q2

> Try out both ways of accessing columns.
>
> This isn't as helpful as it could be. Why not?

Square brackets are more flexible. We can give them a list of headings.

In [5]:
# note the nested brackets
boroughs[['Borough','Population','Happy']]

Unnamed: 0,Borough,Population,Happy
0,City of London,8800,6.0
1,Barking and Dagenham,209000,7.1
2,Barnet,389600,7.4
3,Bexley,244300,7.2
4,Brent,332100,7.2
5,Bromley,327900,7.4
6,Camden,242500,7.1
7,Croydon,386500,7.2
8,Ealing,351600,7.3
9,Enfield,333000,7.3
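
The difference between single and nested brackets can be sketched on a small made-up table (the names and numbers below are illustrative assumptions, not taken from the boroughs data). Single brackets give back one column as a `Series`; a list of headings gives back a smaller dataframe.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    'Borough': ['A', 'B', 'C'],
    'Population': [100, 200, 300],
    'Happy': [7.1, 7.4, 7.2],
})

one_column = toy['Population']            # single brackets: a Series
two_columns = toy[['Borough', 'Happy']]   # nested brackets: a DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(list(two_columns.columns))
```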


This is better. But it would be nice if we didn't have to keep including the `Borough` column. So let's make that our *index*.

In [6]:
boroughs = boroughs.set_index(boroughs.Borough)
boroughs.head(5)

Borough,Borough,InnerOuter,Population,Households,Area,Density,Age,LT15,WorkAge,Over65,Netmig,Bornabroad,Migrant1,Migrant1p,Migrant2,Migrant2p,Migrant3,Migrant3p,BAME,E2L,Arriving,Arriving1,Arriving2,Arriving3,Employ,MEmploy,FEmploy,Unemploy,UnemployY,NEET,OOWBenefit,Disability,NoQual,Degree,Pay,MPay,FPay,Medianpay,Volunteer,Jobs,Publicsector,Jobdensity,Business,BSurvival,Crime,Fires,Ambulance,HPrice,CTax,NewHomes,HOwned,Mortgages,LARent,PRent,Green,Carbon,Recycle,Cars,Cycle,PublicT,GCSEs,LookedAfter,PupilNE,PupilOOW,MLE,FLE,TeenCR,Satisfaction,Worthwhile,Happy,Anxiety,CObesity,Diabetes,PMortality,PControl,Conservative,Labour,Libdem,Turnout
City of London,City of London,Inner London,8800,5326.0,290,30.3,43.2,11.4,73.1,15.5,665,,United States,2.8,France,2.0,Australia,1.9,27.5,17.1,152.2,India,France,United States,64.6,,,,1.6,,3.4,,,,,,,"£63,620",,500400.0,3.4,84.3,26130,64.3,,12.3,,799999.0,931.2,80.0,,,,,4.8,1036.0,34.4,1692,16.9,7.9,78.6,101.0,,7.9,,,,6.6,7.1,6.0,5.6,,2.6,129.0,,,,,
Barking and Dagenham,Barking and Dagenham,Outer London,209000,78188.0,3611,57.9,32.9,27.2,63.1,9.7,2509,37.8,Nigeria,4.7,India,2.3,Pakistan,2.3,49.5,18.7,59.1,Romania,Bulgaria,Lithuania,65.8,75.6,56.5,11.0,4.5,5.7,10.5,17.2,11.3,32.2,27886.0,30104.0,24602.0,"£29,420",20.5,58900.0,21.1,0.5,6560,73.0,83.4,3.0,13.7,243500.0,1354.03,730.0,16.4,27.4,35.9,20.3,33.6,644.0,23.4,56966,8.8,3.0,58.0,69.0,41.7,18.7,77.6,82.1,32.4,7.1,7.6,7.1,3.1,28.5,7.3,228.0,Lab,0.0,100.0,0.0,36.5
Barnet,Barnet,Outer London,389600,151423.0,8675,44.9,37.3,21.1,64.9,14.0,5407,35.2,India,3.1,Poland,2.4,Iran,2.0,38.7,23.4,53.1,Romania,Poland,Italy,68.5,74.5,62.9,8.5,1.9,2.5,6.2,14.9,5.2,49.0,33443.0,36475.0,31235.0,"£40,530",33.2,167300.0,18.7,0.7,26190,73.8,62.7,1.6,11.1,445000.0,1397.07,1460.0,32.4,25.2,11.1,31.1,41.3,1415.0,38.0,144717,7.4,3.0,67.3,35.0,46.0,9.3,82.1,85.1,12.8,7.5,7.8,7.4,2.8,20.7,6.0,134.0,Cons,50.8,,1.6,40.5
Bexley,Bexley,Outer London,244300,97736.0,6058,40.3,39.0,20.6,62.9,16.6,760,16.1,Nigeria,2.6,India,1.5,Ireland,0.9,21.4,6.0,14.4,Romania,Poland,Nigeria,75.1,82.1,68.5,7.6,2.9,3.4,6.8,15.9,10.8,33.5,34350.0,37881.0,28924.0,"£36,990",22.1,80700.0,15.9,0.6,9075,73.5,51.8,2.3,11.8,275000.0,1472.43,-130.0,38.1,35.3,15.2,11.4,31.7,975.0,54.0,108507,10.6,2.6,60.3,46.0,32.6,12.6,80.4,84.4,19.5,7.4,7.7,7.2,3.3,22.7,6.9,164.0,Cons,71.4,23.8,0.0,39.6
Brent,Brent,Outer London,332100,121048.0,4323,76.8,35.6,20.9,67.8,11.3,7640,53.9,India,9.2,Poland,3.4,Ireland,2.9,64.9,37.2,100.9,Romania,Italy,Portugal,69.5,76.0,62.6,7.5,3.1,2.6,8.3,17.7,6.2,45.1,29812.0,30129.0,29600.0,"£32,140",17.3,133600.0,17.6,0.6,15745,74.4,78.8,1.8,12.1,407250.0,1377.24,1050.0,22.2,22.6,20.4,34.8,21.9,1175.0,35.2,87802,7.9,3.7,60.1,45.0,37.6,13.7,80.1,85.1,18.5,7.3,7.4,7.2,2.9,24.3,7.9,169.0,Lab,9.5,88.9,1.6,36.3


### Q3 

> What changed?

Now, when we ask for a column, we'll get the borough for free.

In [7]:
boroughs[['Age','WorkAge']]

Borough,Age,WorkAge
City of London,43.2,73.1
Barking and Dagenham,32.9,63.1
Barnet,37.3,64.9
Bexley,39.0,62.9
Brent,35.6,67.8
Bromley,40.2,62.6
Camden,36.4,71.0
Croydon,37.0,64.9
Ealing,36.2,66.8
Enfield,36.3,64.4


Now we can use `loc` to *locate* rows by their index label, for example Haringey and Hackney.

In [10]:
boroughs.loc[['Haringey','Hackney']]

Borough,Borough,InnerOuter,Population,Households,Area,Density,Age,LT15,WorkAge,Over65,Netmig,Bornabroad,Migrant1,Migrant1p,Migrant2,Migrant2p,Migrant3,Migrant3p,BAME,E2L,Arriving,Arriving1,Arriving2,Arriving3,Employ,MEmploy,FEmploy,Unemploy,UnemployY,NEET,OOWBenefit,Disability,NoQual,Degree,Pay,MPay,FPay,Medianpay,Volunteer,Jobs,Publicsector,Jobdensity,Business,BSurvival,Crime,Fires,Ambulance,HPrice,CTax,NewHomes,HOwned,Mortgages,LARent,PRent,Green,Carbon,Recycle,Cars,Cycle,PublicT,GCSEs,LookedAfter,PupilNE,PupilOOW,MLE,FLE,TeenCR,Satisfaction,Worthwhile,Happy,Anxiety,CObesity,Diabetes,PMortality,PControl,Conservative,Labour,Libdem,Turnout
Haringey,Haringey,Inner London,278000,115608.0,2960,93.9,35.1,20.0,70.7,9.3,6675,39.6,Poland,4.3,Turkey,4.0,Jamaica,2.0,38.2,29.7,78.5,Romania,Italy,Bulgaria,71.3,77.6,64.8,5.7,5.6,3.5,9.7,16.3,8.8,49.2,31063.0,,29513.0,"£35,420",29.8,91500.0,17.8,0.5,12675,74.4,90.2,2.1,12.3,432500.0,1484.01,240.0,18.0,24.7,33.4,23.9,25.5,773.0,37.3,61515,14.1,4.3,59.7,67.0,48.0,16.9,80.1,84.9,22.6,7.2,7.5,7.2,3.2,23.8,5.9,183.0,Lab,0.0,84.2,15.8,38.1
Hackney,Hackney,Inner London,274300,115417.0,1905,144.0,33.1,20.7,72.1,7.2,3359,35.8,Turkey,3.6,Nigeria,2.7,Jamaica,1.8,43.6,24.1,46.0,Italy,France,Spain,69.0,72.8,65.3,5.9,4.8,3.0,10.7,17.9,10.8,49.2,32056.0,,31919.0,"£35,140",29.6,132800.0,18.1,0.7,18510,76.8,99.6,2.7,11.5,485000.0,1294.42,830.0,11.1,19.8,45.4,23.3,23.2,813.0,25.3,41800,25.2,4.9,60.6,53.0,44.2,19.7,78.5,83.3,24.7,7.0,7.3,7.0,3.8,27.0,5.8,211.0,Lab,7.0,87.7,5.3,39.4
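
What `loc` gives back depends on what you pass it. The sketch below uses a small made-up table (labels `A`, `B`, `C` and all values are illustrative assumptions). Note that it calls `set_index('Borough')`, which removes the `Borough` column once it becomes the index, whereas the cell above keeps it as a column as well.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    'Borough': ['A', 'B', 'C'],
    'Age': [30.0, 40.0, 50.0],
    'WorkAge': [60.0, 65.0, 70.0],
}).set_index('Borough')

one_row = toy.loc['A']             # one label: that row as a Series
two_rows = toy.loc[['A', 'B']]     # list of labels: a DataFrame of those rows
one_value = toy.loc['B', 'Age']    # row label and column label: a single value

print(one_row)
print(two_rows)
print(one_value)
```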


### Q4

> Pick another borough to retrieve the data for. Compare it to Haringey.

## Sorting and filtering

Let's find out which boroughs have the highest population.

`pandas` dataframes have a `sort_values` method.

### Q5

Remember that in a Jupyter notebook, you can put the cursor inside the function brackets and hit `shift`+`tab` to bring up the documentation for that function.

> Make the sort_values function below work, to put the boroughs in order of population
>
> Now put them in *descending* order
>
> Which borough has the largest population?

In [15]:
# Sort by population, largest first
boroughs.sort_values(by='Population',ascending=False)

Borough,Borough,InnerOuter,Population,Households,Area,Density,Age,LT15,WorkAge,Over65,Netmig,Bornabroad,Migrant1,Migrant1p,Migrant2,Migrant2p,Migrant3,Migrant3p,BAME,E2L,Arriving,Arriving1,Arriving2,Arriving3,Employ,MEmploy,FEmploy,Unemploy,UnemployY,NEET,OOWBenefit,Disability,NoQual,Degree,Pay,MPay,FPay,Medianpay,Volunteer,Jobs,Publicsector,Jobdensity,Business,BSurvival,Crime,Fires,Ambulance,HPrice,CTax,NewHomes,HOwned,Mortgages,LARent,PRent,Green,Carbon,Recycle,Cars,Cycle,PublicT,GCSEs,LookedAfter,PupilNE,PupilOOW,MLE,FLE,TeenCR,Satisfaction,Worthwhile,Happy,Anxiety,CObesity,Diabetes,PMortality,PControl,Conservative,Labour,Libdem,Turnout
United Kingdom,United Kingdom,,65999100,,,,40.1,18.8,63.3,17.8,,13.3,,,,,,,,,20.0,Romania,Poland,Italy,73.5,78.5,68.5,5.3,,,,19.5,8.8,36.9,28213.0,30567.0,24833.0,"£30,600",,,17.6,0.8,2672025,75.0,,,,,,,32.3,31.8,17.9,17.9,,403797.0,,30333100,,,,,,18.2,,,,7.5,7.7,7.3,3.0,-,,,,,,,
England,England,,55609600,,13025967.0,,40.0,19.0,63.3,17.7,313240.0,14.6,India,1.3,Poland,1.1,Pakistan,0.9,,8.0,21.7,Romania,Poland,Italy,73.9,79.1,68.6,5.1,3.1,4.7,8.7,19.2,8.4,36.7,28503.0,30943.0,24965.0,,24.5,28556100.0,16.8,0.8,2348065,75.1,65.7,,,209995.0,,,32.1,31.8,17.5,18.5,87.5,324054.0,43.7,25696833,14.7,,56.8,60.0,15.7,14.0,79.545876,83.19603,22.8,7.5,7.7,7.3,3.0,19.8,6.2,183.0,,,,,
London,London,,8835500,3601963.0,157215.0,56.2,36.0,13.9,73.6,12.5,133901.0,36.6,India,3.2,Poland,1.9,Ireland,1.6,42.5,22.1,53.9,Romania,Italy,Spain,72.9,79.3,66.5,6.1,3.6,3.4,7.7,16.1,7.3,49.9,33776.0,36697.0,30979.0,"£39,110",25.7,5633400.0,15.3,1.0,541310,73.0,84.0,2.3,12.3,399950.0,,,22.0,27.9,23.1,26.8,38.3,35817.0,33.1,2664414,14.7,3.8,61.8,51.0,29.3,14.4,80.3,84.2,21.5,7.3,7.6,7.2,3.3,23.2,6.0,169.0,,33.1,57.3,6.3,38.9
Outer London,Outer London,,5299800,2079422.0,125424.0,42.3,36.9,13.8,71.7,14.5,82685.0,34.2,India,4.1,Poland,2.1,Pakistan,1.6,42.1,20.1,46.8,Romania,Poland,Italy,73.3,80.3,66.4,5.9,3.2,3.4,7.0,16.4,7.3,44.7,,,,"£38,360",,2190400.0,16.8,0.7,253725,75.2,69.4,2.1,11.8,350000.0,,,27.3,32.0,16.7,23.9,42.5,,,1939058,,3.0,,47.0,44.0,12.2,,,20.7,7.3,7.6,7.3,3.2,-,6.5,,,39.2,49.4,7.8,39.6
Inner London,Inner London,,3535700,1522541.0,31929.0,110.7,34.7,38.5,54.7,6.8,78597.0,40.1,Bangladesh,2.5,India,1.8,Ireland,1.7,43.1,25.2,63.1,Italy,Romania,Spain,72.3,78.0,66.6,6.4,4.1,3.3,8.6,15.6,7.2,57.0,,,,"£40,290",,3442500.0,14.4,1.4,287585,71.0,106.4,2.6,13.1,495000.0,,,14.6,22.2,32.1,30.9,21.7,,,725356,,4.9,,56.0,49.6,0.8,,,23.1,7.3,7.5,7.2,3.4,-,5.3,,,23.4,69.7,3.8,37.7
Barnet,Barnet,Outer London,389600,151423.0,8675.0,44.9,37.3,21.1,64.9,14.0,5407.0,35.2,India,3.1,Poland,2.4,Iran,2.0,38.7,23.4,53.1,Romania,Poland,Italy,68.5,74.5,62.9,8.5,1.9,2.5,6.2,14.9,5.2,49.0,33443.0,36475.0,31235.0,"£40,530",33.2,167300.0,18.7,0.7,26190,73.8,62.7,1.6,11.1,445000.0,1397.07,1460.0,32.4,25.2,11.1,31.1,41.3,1415.0,38.0,144717,7.4,3.0,67.3,35.0,46.0,9.3,82.1,85.1,12.8,7.5,7.8,7.4,2.8,20.7,6.0,134.0,Cons,50.8,,1.6,40.5
Croydon,Croydon,Outer London,386500,159010.0,8650.0,44.7,37.0,22.0,64.9,13.0,2438.0,29.4,India,3.6,Jamaica,2.5,Ghana,1.5,49.9,14.5,32.3,Romania,Poland,Italy,75.4,81.8,69.5,4.1,4.8,3.3,7.8,17.5,7.0,40.6,32696.0,35839.0,29819.0,"£37,000",27.2,141600.0,20.1,0.6,15540,75.3,77.0,2.2,12.8,300000.0,1494.13,2040.0,30.8,33.6,16.7,18.6,37.1,1237.0,39.9,140049,12.8,3.2,57.7,86.0,36.7,14.1,80.3,83.6,28.4,7.1,7.6,7.2,3.3,24.5,6.5,178.0,Lab,42.9,57.1,0.0,38.6
Ealing,Ealing,Outer London,351600,132663.0,5554.0,63.3,36.2,21.4,66.8,11.8,4007.0,47.4,India,7.6,Poland,6.4,Ireland,2.3,53.5,33.9,65.2,Poland,Romania,Italy,72.7,81.2,63.8,5.8,3.0,3.0,7.9,15.2,9.1,49.7,31331.0,32185.0,29875.0,"£36,070",32.1,160500.0,13.6,0.7,18700,75.8,75.5,1.9,11.3,430000.0,1335.93,720.0,20.1,30.2,14.3,35.0,30.9,1342.0,40.1,112845,15.0,3.3,62.1,46.0,43.6,13.1,80.6,84.2,17.8,7.3,7.6,7.3,3.6,23.8,6.9,164.0,Lab,17.4,76.8,5.8,41.2
Newham,Newham,Inner London,342900,119172.0,3620.0,94.7,32.1,22.7,70.2,7.0,11182.0,54.1,India,8.7,Bangladesh,6.8,Pakistan,5.3,73.1,41.4,109.6,Romania,India,Bulgaria,66.2,75.0,56.1,9.1,4.1,4.3,8.0,12.7,11.0,43.4,27942.0,30141.0,24006.0,"£28,780",8.4,111100.0,23.1,0.5,11055,70.0,90.8,2.5,12.2,305000.0,1240.54,1440.0,9.4,16.7,31.4,42.5,23.9,1261.0,17.2,61092,5.6,3.9,55.7,42.0,58.8,15.4,78.5,83.0,22.5,7.1,7.4,7.2,3.4,27.6,7.6,193.0,Lab,0.0,100.0,0.0,40.5
Enfield,Enfield,Outer London,333000,130328.0,8083.0,41.2,36.3,22.8,64.4,12.8,3164.0,35.0,Turkey,4.5,Cyprus (Not otherwise specified),3.6,Poland,1.9,42.3,22.9,43.8,Romania,Bulgaria,Poland,73.0,80.4,66.0,3.8,3.3,3.1,9.3,18.4,4.5,43.4,31603.0,35252.0,30222.0,"£33,110",22.4,128800.0,21.7,0.6,13925,74.2,69.4,2.2,12.2,320000.0,1420.17,670.0,25.6,36.2,17.2,21.0,45.6,1245.0,38.5,119653,7.9,3.0,59.9,43.0,55.3,17.4,80.7,84.1,24.6,7.3,7.6,7.3,2.6,25.2,7.0,152.0,Lab,34.9,65.1,0.0,38.2
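
As a minimal sketch of how `sort_values` behaves, here it is on a small made-up table (the labels and populations are illustrative assumptions). It returns a new, sorted dataframe and leaves the original untouched.

```python
import pandas as pd

# Made-up populations for illustration only
toy = pd.DataFrame(
    {'Population': [200, 100, 300]},
    index=pd.Index(['A', 'B', 'C'], name='Borough'),
)

ascending = toy.sort_values(by='Population')                     # smallest first (the default)
descending = toy.sort_values(by='Population', ascending=False)   # largest first
largest = descending.index[0]                                    # index label of the top row

print(descending)
print(largest)
```

In the real table the top rows of the descending sort are the aggregate rows (United Kingdom, England, London, Outer London, Inner London) rather than individual boroughs, which is why filtering is the next step.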


What if we wanted to include only **Inner London** boroughs?

In [None]:
boroughs[boroughs["InnerOuter"] == "Inner London"]

So we can pass a Boolean Series (one `True` or `False` per row) into those square brackets to *filter* the data: only the rows marked `True` are kept. `pandas` square brackets are clearly a bit more powerful than regular Python square brackets.
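
It can help to look at the comparison on its own before using it as a filter. This is a sketch on a made-up table (labels and values are illustrative assumptions); the `Total` row has no `InnerOuter` value, much like the aggregate rows in the real data.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    'InnerOuter': ['Inner London', 'Outer London', 'Inner London', None],
    'Population': [100, 200, 300, 600],
}, index=pd.Index(['A', 'B', 'C', 'Total'], name='Borough'))

# The comparison produces one True/False per row
mask = toy['InnerOuter'] == 'Inner London'
print(mask)

# Passing the mask into square brackets keeps only the True rows
inner = toy[mask]
print(inner)
```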

### Q6

> Filter the data to show only Outer London boroughs
>
> Apply `sort_values` to give the Outer London boroughs in descending order of population

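If you want to combine two Boolean conditions into one filter, wrap each condition in parentheses. The `|` (or) and `&` (and) operators bind more tightly than `==`, so without the parentheses Python would group the expression the wrong way. For example,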

In [None]:
boroughs[(boroughs.InnerOuter=="Inner London") | (boroughs.InnerOuter=="Outer London")]
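
A sketch of combining conditions on a made-up table (labels and values are illustrative assumptions). It shows `|` and `&` with parenthesised conditions, and `isin`, a `pandas` method that gives the same rows as a chain of `|` comparisons against the same column.

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    'InnerOuter': ['Inner London', 'Outer London', 'Inner London', None],
    'Population': [100, 200, 300, 600],
}, index=pd.Index(['A', 'B', 'C', 'Total'], name='Borough'))

# "or": each condition sits in its own parentheses
either = toy[(toy.InnerOuter == 'Inner London') | (toy.InnerOuter == 'Outer London')]

# isin checks membership of a list, which reads more easily for many values
same_rows = toy[toy.InnerOuter.isin(['Inner London', 'Outer London'])]

# "and": both conditions must hold for a row to be kept
big_inner = toy[(toy.InnerOuter == 'Inner London') & (toy.Population > 150)]

print(either)
print(big_inner)
```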

It might be useful to have this table of *just* the individual boroughs, so let's assign that to a variable `justBoroughs`

In [None]:
justBoroughs = boroughs[(boroughs.InnerOuter=="Inner London") | (boroughs.InnerOuter=="Outer London")]
justBoroughs.head()

## Summary statistics

The dataframe has built-in methods for statistical measures like `mean`, `std` and `quantile`, but you need to be careful about whether using them makes sense.

In [None]:
boroughs.loc['London', 'Age']

In [None]:
justBoroughs["Age"].mean()

### Q7

> Why is the mean of the average ages not the same as the London average age?

So use the Inner London, Outer London and London averages rather than applying mean to a column.
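
One way to see the problem is with two made-up areas of very different size. This sketch assumes the `Age` column holds a mean age for each area, which the notebook does not confirm; all names and numbers are illustrative.

```python
import pandas as pd

# Made-up values: area B has three times the population of area A
toy = pd.DataFrame({
    'Population': [100, 300],
    'Age': [30.0, 40.0],
}, index=pd.Index(['A', 'B'], name='Borough'))

# Plain mean of the column treats both areas as equally important
simple_mean = toy['Age'].mean()

# Weighting each area's age by its population gives the overall average
weighted_mean = (toy['Age'] * toy['Population']).sum() / toy['Population'].sum()

print(simple_mean)
print(weighted_mean)
```

The two results differ because the plain mean ignores how many people live in each area.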

## Investigating relationships

We would expect there to be an obvious relationship between unemployment rates and employment rates.

In [None]:
justBoroughs.plot.scatter("Employ", "Unemploy")

Let's quantify that by asking for the correlation coefficient.

In [None]:
justBoroughs.Employ.corr(justBoroughs.Unemploy)
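
By default, `corr` on a `Series` returns the Pearson correlation coefficient. As a sketch of what that number is, the code below computes it by hand on made-up employment figures (illustrative values only) and compares the result with `corr`.

```python
import pandas as pd

# Made-up rates for illustration only
toy = pd.DataFrame({
    'Employ': [65.0, 70.0, 75.0, 80.0],
    'Unemploy': [11.0, 8.0, 7.5, 4.0],
})

# The built-in calculation
r = toy['Employ'].corr(toy['Unemploy'])

# The same thing by hand: deviations from each mean...
dx = toy['Employ'] - toy['Employ'].mean()
dy = toy['Unemploy'] - toy['Unemploy'].mean()
# ...then the sum of their products, scaled so the result lies between -1 and 1
r_manual = (dx * dy).sum() / ((dx ** 2).sum() * (dy ** 2).sum()) ** 0.5

print(r, r_manual)
```

A value near -1 means that one variable tends to fall as the other rises; a value near 0 means there is little linear relationship.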

### Q8

> How would you interpret this?
>
> Why isn't it a perfect correlation?
>
> Look for correlation between some other pairs of variables. Use a scatter plot first, then get the correlation coefficient

In [None]:
# Colour each point by the value in the NEET column
justBoroughs.plot.scatter("Employ", "Unemploy", c="NEET")