# Using `pandas` to fetch data on FOI responses

This notebook details how we can use the `pandas` library to fetch the data on FOI requests and ask it some questions. 

First, we need to import the `pandas` library (naming it `pd` for quicker reference in the code) and install `odfpy`, [which is](https://pypi.org/project/odfpy/) "a library to read and write OpenDocument v. 1.2 files." `pandas` relies on `odfpy` when its `read_excel()` function opens an ODS (OpenDocument spreadsheet) file, so we need it installed before we can import our data.

In [3]:
#import the pandas library
import pandas as pd
#install the library we need
!pip install odfpy

Collecting odfpy
  Downloading odfpy-1.4.1.tar.gz (717 kB)
[?25l[K     |▌                               | 10 kB 15.6 MB/s eta 0:00:01[K     |█                               | 20 kB 21.5 MB/s eta 0:00:01[K     |█▍                              | 30 kB 27.3 MB/s eta 0:00:01[K     |█▉                              | 40 kB 30.7 MB/s eta 0:00:01[K     |██▎                             | 51 kB 33.8 MB/s eta 0:00:01[K     |██▊                             | 61 kB 28.1 MB/s eta 0:00:01[K     |███▏                            | 71 kB 23.5 MB/s eta 0:00:01[K     |███▋                            | 81 kB 24.6 MB/s eta 0:00:01[K     |████▏                           | 92 kB 25.8 MB/s eta 0:00:01[K     |████▋                           | 102 kB 25.9 MB/s eta 0:00:01[K     |█████                           | 112 kB 25.9 MB/s eta 0:00:01[K     |█████▌                          | 122 kB 25.9 MB/s eta 0:00:01[K     |██████                          | 133 kB 25.9 MB/s eta 0:00:01[K    

## Import the data

Next, import the data. We've copied the direct address of the file itself here (note that this is different to the HTML page that it's linked from).

We've also specified which sheet we want using the `sheet_name=` parameter. In this case it's the 12th sheet, which means it's numbered 11 in Python, where counting begins at 0.

In [6]:
#store the URL of the file
odsurl = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1017270/foi-statistics-q2-2021-statistical-tables.ods"
#read the file at that URL, fetching the 12th sheet
foidata = pd.read_excel(odsurl, sheet_name=11)
#check the first few rows
foidata.head()

Unnamed: 0,Worksheet 10: Exemptions and exceptions applied by monitored bodies when withholding non-routine information requests received from 1 April to 30 June 2021,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25
0,This worksheet contains three tables presented...,,,,,,,,,,,,,,,,,,,,,,,,,
1,Table 10a: Total figures,,,,,,,,,,,,,,,,,,,,,,,,,
2,Government body,Total requests where one or more exemptions / ...,S.22 - Information intended for future publica...,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to...",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducte...,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of publi...,"S.37 - Communications with Her Majesty, etc. a...",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,
3,All monitored bodies,3221,347,2,48,73,70,91,5,6,187,301,84,7,0,296,86,12,148,1475,87,21,265,222,288,
4,Departments of State,1886,258,2,38,69,68,82,5,6,11,151,74,1,0,291,80,11,57,767,70,13,243,50,193,


## Cleaning up the data

Already we hit a problem: the data isn't as clean as we'd like. The headings aren't in the first row, and in fact there are multiple tables on this sheet.

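Let's repeat the code that imports the data, but this time add two more parameters. `header=9` tells `pandas` to use the 10th row (index 9) as the column headings, skipping everything above it, and `nrows=20` tells it to read only the 20 rows of data below those headings, so we get just the table we want.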

In [7]:
#read the file at that URL, fetching the 12th sheet, using the 10th row (index 9) as headings and capturing only the 20 rows below it
foidata = pd.read_excel(odsurl, sheet_name=11, header=9, nrows=20)
#check the first few rows
foidata.head()

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25
0,Attorney General's Office,9,0,0,0,0,0,1,0,0,1,2,0,0,0,5,3,0,0,2,1,1,0,0,0,
1,Cabinet Office [note 4],180,41,1,9,14,1,11,1,0,0,11,0,1,0,49,14,7,1,46,9,2,26,0,0,
2,"Department for Business, Energy and Industrial...",108,12,0,0,4,0,4,0,1,1,9,0,0,0,16,3,0,1,24,3,0,28,2,37,
3,"Department for Digital, Culture, Media and Sport",39,5,0,1,0,0,0,0,0,4,0,0,0,0,10,7,1,0,17,1,0,5,0,0,
4,Department for Education [note 4],124,13,0,0,0,0,0,0,0,0,4,0,0,0,24,15,0,0,81,7,3,5,3,4,
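To see what `header=` and `nrows=` do without downloading the spreadsheet, here is a minimal sketch on a small made-up table. It uses `pd.read_csv()` on a text string rather than `read_excel()` on the ODS file, on the assumption that both functions treat these two parameters the same way; the rows and the short column names are invented stand-ins.

```python
import io

import pandas as pd

# A made-up stand-in for the top of the worksheet: two lines of notes,
# then the real headings, then some data rows
raw = """Worksheet 10: title,,
Table 10b: Departments of State,,
Government body,Total,S.27
Attorney General's Office,9,1
Cabinet Office,180,11
Home Office,196,14
"""

# header=2: use the 3rd line (index 2) as column names and skip the lines above it
# nrows=2: read only the first 2 data rows after the headings
sample = pd.read_csv(io.StringIO(raw), header=2, nrows=2)
print(sample)
```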


Let's check the last few rows too.

In [31]:
#show the last 5 rows
foidata.tail()

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25
15,Ministry of Justice,235,29,0,0,1,0,1,0,0,0,12,73,0,0,12,4,0,3,136,5,0,13,33,1,
16,Northern Ireland Office,23,22,0,1,2,0,0,0,0,0,1,0,0,0,5,0,0,0,12,0,0,0,0,0,
17,Scotland Office,3,0,0,0,0,0,1,2,1,0,1,0,0,0,3,0,0,0,3,0,0,1,0,0,
18,UK Export Finance,9,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,3,0,14,
19,Wales Office,4,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,4,0,0,0,0,0,


## Asking a question

What question do we want to ask of the data? Who used the international relations exemption the most?

Which column is that in? We can use `.columns` to see.

In [32]:
#show the column names
foidata.columns

Index(['Government body',
       'Total requests where one or more exemptions / exceptions were applied [note 18] [note 19]',
       'S.22 - Information intended for future publication',
       'S. 22A - Research intended for future publication',
       'S.23 - Information supplied by, or relating to, bodies dealing with security matters',
       'S.24 - National security', 'S.26 - Defence',
       'S.27 - International relations',
       'S.28 - Relations within the United Kingdom', 'S.29 - The economy',
       'S.30 - Investigations and proceedings conducted by public authorities',
       'S.31 - Law enforcement', 'S.32 - Court records, etc.',
       'S.33 - Audit functions', 'S.34 - Parliamentary privilege',
       'S.35 - Formulation of Government policy, etc.',
       'S.36 - Prejudice to effective conduct of public affairs',
       'S.37 - Communications with Her Majesty, etc. and honours',
       'S.38 - Health and Safety', 'S.40 - Personal information',
       'S.41 - Informati

It looks like column 7 (index 6), but we can check by putting that index number in square brackets after `.columns`.

In [33]:
#show the 7th column name
foidata.columns[6]

'S.26 - Defence'

Ah, no: the output above printed two of the names ('S.24 - National security' and 'S.26 - Defence') on the same line, so it's easy to miscount. Let's try the 8th column (index 7).

In [34]:
#show the 8th column name
foidata.columns[7]

'S.27 - International relations'
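Rather than guessing and checking, the `Index.get_loc()` method can report a column's position directly. A minimal sketch, using a cut-down stand-in for `foidata` that has only a few of its real column names:

```python
import pandas as pd

# A small stand-in for foidata with a few of the real column names (no data)
df = pd.DataFrame(columns=[
    "Government body",
    "Total requests where one or more exemptions / exceptions were applied [note 18] [note 19]",
    "S.24 - National security",
    "S.26 - Defence",
    "S.27 - International relations",
])

# .get_loc() returns the position (index) of an exact column name
position = df.columns.get_loc("S.27 - International relations")
print(position)  # 4 in this cut-down example; in the full foidata it is 7
```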

## Sorting by one column

There are [a number of ways to sort a pandas dataframe](https://realpython.com/pandas-sort-python/). We will use the method `.sort_values()`, specifying the name of the column we want to sort by.

It's best to copy and paste that name rather than typing it out manually: a single typo will give you a `KeyError`.

In [35]:
#sort by the column specified - it will default to ascending order
foidata.sort_values('S.27 - International relations')

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25
9,Department of Health and Social Care,128,25,0,0,0,0,0,0,0,0,31,1,0,0,43,1,0,0,34,2,1,33,0,1,
16,Northern Ireland Office,23,22,0,1,2,0,0,0,0,0,1,0,0,0,5,0,0,0,12,0,0,0,0,0,
14,"Ministry of Housing, Communities and Local Gov...",105,7,0,0,0,0,0,0,0,0,0,0,0,0,16,2,0,5,21,0,1,4,0,60,
8,Department for Work and Pensions,99,19,0,0,3,0,0,0,0,0,13,0,0,0,6,0,1,0,53,5,0,8,4,1,
6,Department for International Trade,46,6,0,0,0,8,0,0,0,0,3,0,0,0,8,3,1,0,25,7,0,11,0,0,
19,Wales Office,4,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,4,0,0,0,0,0,
4,Department for Education [note 4],124,13,0,0,0,0,0,0,0,0,4,0,0,0,24,15,0,0,81,7,3,5,3,4,
3,"Department for Digital, Culture, Media and Sport",39,5,0,1,0,0,0,0,0,4,0,0,0,0,10,7,1,0,17,1,0,5,0,0,
7,Department for Transport [note 4],163,17,0,2,2,0,1,0,0,2,14,0,0,0,29,8,0,2,79,9,1,23,5,7,
18,UK Export Finance,9,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,3,0,14,
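Another way to avoid typing errors is to let Python find the column name for you and store it in a variable. A minimal sketch, assuming a cut-down stand-in for `foidata` (the figures are copied from the tables in this notebook):

```python
import pandas as pd

# A cut-down stand-in for foidata; the figures are copied from the tables above
df = pd.DataFrame({
    "Government body": ["Attorney General's Office", "Cabinet Office [note 4]", "Home Office"],
    "S.26 - Defence": [0, 1, 0],
    "S.27 - International relations": [1, 11, 14],
})

# Find every column whose name starts with "S.27" instead of typing the full name
matches = [col for col in df.columns if col.startswith("S.27")]
s27 = matches[0]

# Sort by that column using the variable rather than a typed-out string
print(df.sort_values(s27))
```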


### Sorting in descending order

The results at the top have 0 instances of using this exemption, because it's sorted in **ascending** order (smallest to largest).

To specify we want it in **descending** order (or rather that we *don't* want it in ascending order), we need to add an extra parameter, `ascending=`, and set it to `False`.

In [36]:
#sort by the column specified - specifying we don't want it in ascending order
foidata.sort_values('S.27 - International relations', ascending=False)

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25
10,"Foreign, Commonwealth and Development Office [...",48,8,1,9,15,2,28,1,0,0,3,0,0,0,6,0,1,9,32,4,1,6,0,0,
12,Home Office,196,23,0,6,12,0,14,0,0,1,26,0,0,0,21,16,0,7,77,2,1,14,2,0,
13,Ministry of Defence [note 4],190,18,0,8,15,57,13,0,0,2,7,0,0,0,8,2,0,29,62,3,2,47,1,0,
1,Cabinet Office [note 4],180,41,1,9,14,1,11,1,0,0,11,0,1,0,49,14,7,1,46,9,2,26,0,0,
2,"Department for Business, Energy and Industrial...",108,12,0,0,4,0,4,0,1,1,9,0,0,0,16,3,0,1,24,3,0,28,2,37,
5,"Department for Environment, Food and Rural Aff...",98,2,0,1,0,0,4,0,0,0,4,0,0,0,6,1,0,0,23,3,0,5,0,60,
11,HM Treasury [note 4],79,10,0,1,1,0,3,1,4,0,9,0,0,0,22,1,0,0,36,9,0,11,0,8,
18,UK Export Finance,9,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,3,0,14,
17,Scotland Office,3,0,0,0,0,0,1,2,1,0,1,0,0,0,3,0,0,0,3,0,0,1,0,0,
15,Ministry of Justice,235,29,0,0,1,0,1,0,0,0,12,73,0,0,12,4,0,3,136,5,0,13,33,1,
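If you only want the top few rows, `DataFrame.nlargest()` is a shortcut for sorting in descending order and then taking the first *n* rows. A sketch on a small stand-in for `foidata`, with figures copied from the table above:

```python
import pandas as pd

# Cut-down stand-in for foidata, figures copied from the tables above
df = pd.DataFrame({
    "Government body": ["Cabinet Office [note 4]", "Home Office", "Ministry of Defence [note 4]", "Scotland Office"],
    "S.27 - International relations": [11, 14, 13, 1],
})

# Same rows as sort_values(..., ascending=False).head(3)
top3 = df.nlargest(3, "S.27 - International relations")
print(top3)
```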


## Saving the results

So far we've only *shown* the results of sorting the data. To *save* it we need to assign it to a variable.

That can be a new variable, or we can assign it to the *same* variable, in effect overwriting it with a sorted version.

In [7]:
#sort by the column specified - specifying we don't want it in ascending order
foidata = foidata.sort_values('S.27 - International relations', ascending=False)

#show the first 5 rows
foidata.head()

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25
10,"Foreign, Commonwealth and Development Office [...",48,8,1,9,15,2,28,1,0,0,3,0,0,0,6,0,1,9,32,4,1,6,0,0,
12,Home Office,196,23,0,6,12,0,14,0,0,1,26,0,0,0,21,16,0,7,77,2,1,14,2,0,
13,Ministry of Defence [note 4],190,18,0,8,15,57,13,0,0,2,7,0,0,0,8,2,0,29,62,3,2,47,1,0,
1,Cabinet Office [note 4],180,41,1,9,14,1,11,1,0,0,11,0,1,0,49,14,7,1,46,9,2,26,0,0,
2,"Department for Business, Energy and Industrial...",108,12,0,0,4,0,4,0,1,1,9,0,0,0,16,3,0,1,24,3,0,28,2,37,
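The two options side by side, as a sketch on a small made-up dataframe: `sort_values()` returns a new, sorted dataframe and leaves the original as it was, so nothing is kept unless the result is assigned to a variable.

```python
import pandas as pd

df = pd.DataFrame({
    "Government body": ["Scotland Office", "Home Office", "Cabinet Office [note 4]"],
    "S.27 - International relations": [1, 14, 11],
})

# Option 1: save the sorted version under a new name; df itself is untouched
sorted_df = df.sort_values("S.27 - International relations", ascending=False)
print(df["Government body"].tolist())         # original order
print(sorted_df["Government body"].tolist())  # sorted order

# Option 2: overwrite df with its sorted version (as the notebook does)
df = df.sort_values("S.27 - International relations", ascending=False)
```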


## Exporting the results

Now we can export that as a CSV file, which will appear in the *Files* area on the left.

In [8]:
#export the results
foidata.to_csv("s27exemptionuses.csv")
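By default, `to_csv()` also writes the dataframe's index (the row numbers shown on the left of the tables above) as an unnamed first column. If you don't want those in the file, one option is `index=False`. A sketch with a made-up two-row dataframe; calling `to_csv()` without a filename returns the CSV as a string, which makes it easy to inspect:

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Government body": ["Home Office", "Scotland Office"],
    "S.27 - International relations": [14, 1],
})

# Default: the index is written as an extra, unnamed first column
with_index = df.to_csv()
# index=False leaves the row labels out
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Government body,S.27 - International relations
print(without_index.splitlines()[0])  # Government body,S.27 - International relations

# Reading the CSV back in to check it
check = pd.read_csv(io.StringIO(without_index))
```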

## Putting whole numbers into context: adding a new column of percentages

Just because one department uses an exemption the most often, that doesn't necessarily mean it relies on that exemption most heavily: it might simply receive, or refuse, the most FOI requests overall.

To check that, we need to divide the number of refusals under a particular exemption by the overall number of refusals (or by the total number of FOI requests).

Let's see whether either figure is in our table.

In [9]:
#show the column names
foidata.columns

Index(['Government body',
       'Total requests where one or more exemptions / exceptions were applied [note 18] [note 19]',
       'S.22 - Information intended for future publication',
       'S. 22A - Research intended for future publication',
       'S.23 - Information supplied by, or relating to, bodies dealing with security matters',
       'S.24 - National security', 'S.26 - Defence',
       'S.27 - International relations',
       'S.28 - Relations within the United Kingdom', 'S.29 - The economy',
       'S.30 - Investigations and proceedings conducted by public authorities',
       'S.31 - Law enforcement', 'S.32 - Court records, etc.',
       'S.33 - Audit functions', 'S.34 - Parliamentary privilege',
       'S.35 - Formulation of Government policy, etc.',
       'S.36 - Prejudice to effective conduct of public affairs',
       'S.37 - Communications with Her Majesty, etc. and honours',
       'S.38 - Health and Safety', 'S.40 - Personal information',
       'S.41 - Informati

### Calculating percentages

We have `'Total requests where one or more exemptions / exceptions were applied'`, so let's use that to add some context.

To calculate a percentage proportion, we need to divide the part by the whole. The result will be a decimal, e.g. "5 out of 10" is 5 divided by 10, which will give you a result of 0.5. When formatted as a percentage this will show as 50% ("0.5 of 1").
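The same arithmetic in Python, as a quick check (the variable names are just for illustration):

```python
# 5 out of 10, as in the example above
part = 5
whole = 10

proportion = part / whole        # 0.5
percentage = proportion * 100    # 50.0

# Python's format mini-language can also display a proportion as a percentage
print(f"{proportion:.0%}")       # 50%
```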

We don't have to do that for each row individually, however. Instead, we can just say 'divide this column by that column' and `pandas` will divide each item in the first column by the matching item in the second column. Strictly, it matches items by their index label rather than their position; because both columns come from the same dataframe, the two line up anyway (the row labelled 0 in one column is divided by the row labelled 0 in the other, and so on).

The result is another column (a pandas `Series`) of the same length as the two we divided.

In [8]:
#divide one column by another
foidata['S.27 - International relations']/foidata['Total requests where one or more exemptions / exceptions were applied [note 18] [note 19]']

0     0.111111
1     0.061111
2     0.037037
3     0.000000
4     0.000000
5     0.040816
6     0.000000
7     0.006135
8     0.000000
9     0.000000
10    0.583333
11    0.037975
12    0.071429
13    0.068421
14    0.000000
15    0.004255
16    0.000000
17    0.333333
18    0.111111
19    0.000000
dtype: float64
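A small sketch of that matching in action, using three rows copied from the tables above (index labels 10, 12 and 17). It also shows that the pairing follows the index labels: reversing the order of one column gives the same answers.

```python
import pandas as pd

# Two columns from the same dataframe share the same row labels (index)
s27 = pd.Series([28, 14, 1], index=[10, 12, 17])
total = pd.Series([48, 196, 3], index=[10, 12, 17])
print(s27 / total)

# pandas matches rows by label, not by position: reversing the order of one
# Series does not change which numbers are divided by which
print(s27 / total.iloc[::-1])
```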

### Assigning the results of the calculation to a new column

Now we just need to add those results to our dataframe as a new column. 

To do that, we name the column as we would an existing column, by putting its name as a string inside square brackets after the name of the dataframe, like so:

`foidata['percentage s27']`

Then, an equals sign and the data we want to put into that column.

If the column doesn't exist (as it doesn't here), it will be created.

In [10]:
#add new column showing the proportion of refused requests that used s27
foidata['percentage s27'] = foidata['S.27 - International relations']/foidata['Total requests where one or more exemptions / exceptions were applied [note 18] [note 19]']
#sort by the new column, largest first; the new column will be the last one
foidata.sort_values('percentage s27', ascending=False)

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25,percentage s27
10,"Foreign, Commonwealth and Development Office [...",48,8,1,9,15,2,28,1,0,0,3,0,0,0,6,0,1,9,32,4,1,6,0,0,,0.583333
17,Scotland Office,3,0,0,0,0,0,1,2,1,0,1,0,0,0,3,0,0,0,3,0,0,1,0,0,,0.333333
18,UK Export Finance,9,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,3,0,14,,0.111111
0,Attorney General's Office,9,0,0,0,0,0,1,0,0,1,2,0,0,0,5,3,0,0,2,1,1,0,0,0,,0.111111
12,Home Office,196,23,0,6,12,0,14,0,0,1,26,0,0,0,21,16,0,7,77,2,1,14,2,0,,0.071429
13,Ministry of Defence [note 4],190,18,0,8,15,57,13,0,0,2,7,0,0,0,8,2,0,29,62,3,2,47,1,0,,0.068421
1,Cabinet Office [note 4],180,41,1,9,14,1,11,1,0,0,11,0,1,0,49,14,7,1,46,9,2,26,0,0,,0.061111
5,"Department for Environment, Food and Rural Aff...",98,2,0,1,0,0,4,0,0,0,4,0,0,0,6,1,0,0,23,3,0,5,0,60,,0.040816
11,HM Treasury [note 4],79,10,0,1,1,0,3,1,4,0,9,0,0,0,22,1,0,0,36,9,0,11,0,8,,0.037975
2,"Department for Business, Energy and Industrial...",108,12,0,0,4,0,4,0,1,1,9,0,0,0,16,3,0,1,24,3,0,28,2,37,,0.037037


We can see that the FCDO (Foreign, Commonwealth and Development Office) does indeed have the highest percentage of refusals under that exemption (58%), but also that the Scotland Office has the second highest (33%), largely because it has so few refusals: only 3, one of which used section 27.
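One way to stop very small numbers from topping the list is to leave out bodies below a minimum total before sorting. A sketch, where the threshold of 10 is an arbitrary choice for illustration and the rows are copied from the tables above:

```python
import pandas as pd

total_col = "Total requests where one or more exemptions / exceptions were applied [note 18] [note 19]"

df = pd.DataFrame({
    "Government body": ["Foreign, Commonwealth and Development Office", "Scotland Office", "UK Export Finance", "Home Office"],
    total_col: [48, 3, 9, 196],
    "S.27 - International relations": [28, 1, 1, 14],
})
df["percentage s27"] = df["S.27 - International relations"] / df[total_col]

# Hypothetical threshold: ignore bodies with fewer than 10 requests where exemptions were applied
MIN_TOTAL = 10
filtered = df[df[total_col] >= MIN_TOTAL].sort_values("percentage s27", ascending=False)
print(filtered[["Government body", "percentage s27"]])
```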

We can now export the updated dataset and overwrite the previous version.

In [None]:
#export the results
foidata.to_csv("s27exemptionuses.csv")

## Error messages: problems with mixed data

If you get an error when trying to sort, it may be because the column contains a mix of numeric and text data: `pandas` can't put numbers and text in order relative to each other.

Below I've shown a situation where this happens. 

This time, we import two tables from the spreadsheet that sit above each other, by omitting the `nrows=` parameter which limited us to one table before.



In [37]:
#read the file at that URL, fetching the 12th sheet, using the 10th row (index 9) as headings and reading everything below it
foidata = pd.read_excel(odsurl, sheet_name=11, header=9)
#check the first few rows
foidata.head()

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25
0,Attorney General's Office,9,0,0,0,0,0,1,0,0,1,2,0,0,0,5,3,0,0,2,1,1,0,0,0,
1,Cabinet Office [note 4],180,41,1,9,14,1,11,1,0,0,11,0,1,0,49,14,7,1,46,9,2,26,0,0,
2,"Department for Business, Energy and Industrial...",108,12,0,0,4,0,4,0,1,1,9,0,0,0,16,3,0,1,24,3,0,28,2,37,
3,"Department for Digital, Culture, Media and Sport",39,5,0,1,0,0,0,0,0,4,0,0,0,0,10,7,1,0,17,1,0,5,0,0,
4,Department for Education [note 4],124,13,0,0,0,0,0,0,0,0,4,0,0,0,24,15,0,0,81,7,3,5,3,4,


Now when we try to run the same code sorting by the specified column, we get an error.

In [38]:
#sort by the column specified - it will default to ascending order
foidata.sort_values('S.27 - International relations')

TypeError: ignored

### Checking a column to see the types of data

The error is a `TypeError`, which should give you a clue that the cause relates to the type of data being dealt with.
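The same kind of error can be reproduced with a tiny made-up column that mixes numbers, an empty cell and a piece of text. The values are invented to mimic the point where the first table ends and the second begins:

```python
import pandas as pd

# A column mixing integers, a missing value and a text heading
mixed = pd.Series([0, float("nan"), "S.27 - International relations", 0])

raised = False
try:
    mixed.sort_values()
except TypeError as err:
    # Python cannot say whether a number is bigger or smaller than a piece of text
    raised = True
    print("TypeError:", err)
```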

Let's have a look at that column to see what type of data it contains.

In [39]:
#show the contents of the column
foidata['S.27 - International relations']

0                                  1
1                                 11
2                                  4
3                                  0
4                                  0
5                                  4
6                                  0
7                                  1
8                                  0
9                                  0
10                                28
11                                 3
12                                14
13                                13
14                                 0
15                                 1
16                                 0
17                                 1
18                                 1
19                                 0
20                               NaN
21                               NaN
22    S.27 - International relations
23                                 0
24                                 0
25                                 0
26                                 0
2

### Slicing to test a potential cause

You can see that rows 20 and 21 contain `NaN` (empty cells), and row 22 contains 'S.27 - International relations': it's the heading in that column for the second table.

It's also a string, unlike the integers in the rest of the column.

Is this the cause of the problem? We can test that hypothesis by 'slicing' the data to just the first 22 items (rows 0 to 21, stopping just before the string) and seeing if using `.sort_values()` generates the same error.

In [29]:
#sort the first 22 items in the column (rows 0 to 21)
foidata['S.27 - International relations'][0:22].sort_values()

9       0
16      0
14      0
8       0
6       0
19      0
4       0
3       0
7       1
18      1
15      1
17      1
0       1
11      3
2       4
5       4
1      11
13     13
12     14
10     28
20    NaN
21    NaN
Name: S.27 - International relations, dtype: object

This time it runs fine, without an error. Extend the slice to end at 23, however, so that it includes the string in row 22, and this happens.

In [40]:
#sort the first 23 items in the column (rows 0 to 22)
foidata['S.27 - International relations'][0:23].sort_values()

TypeError: ignored
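In a longer column it may not be obvious where the text is hiding. One way to find it (a sketch, not the method used in this notebook) is to try converting everything to numbers with `pd.to_numeric()` and see which non-blank entries fail. The values below are invented to mimic the end of the first table and the start of the second:

```python
import pandas as pd

# Invented values: some numbers, two empty cells, a text heading, another number
column = pd.Series([28, 0, float("nan"), float("nan"), "S.27 - International relations", 0])

# errors="coerce" turns anything that can't be read as a number into NaN
as_numbers = pd.to_numeric(column, errors="coerce")

# Entries that were present but couldn't be turned into numbers are the text ones
is_text = as_numbers.isna() & column.notna()
print(column[is_text])
```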

### 'Dropping' a row from a dataframe

The solution to this problem, then, is either to narrow our analysis to the first table, or to remove the headings (the strings) of the second table.

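The row we need to drop is labelled 22, even though the slice that triggered the error ended in 23. That's because a Python slice stops *before* its end number: `[0:22]` covers rows 0 to 21, while `[0:23]` covers rows 0 to 22, and the string is in row 22, just as the column output above showed. `.drop()` works with that row label, so we pass it 22.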
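A sketch of both behaviours on an invented column of 24 values: slices stop *before* their end number, and `.drop()` removes rows by their index label. It uses `.iloc[]` to make it explicit that the slice counts positions; on a default 0, 1, 2... index like this one, positions and labels coincide.

```python
import pandas as pd

# 24 invented rows labelled 0 to 23; the text sits in the row labelled 22
values = list(range(22)) + ["heading", 0]
column = pd.Series(values)

# A slice stops *before* its end number: [0:22] is rows 0-21, [0:23] is rows 0-22
print(len(column.iloc[0:22]), column.iloc[0:22].index[-1])  # 22 21
print(len(column.iloc[0:23]), column.iloc[0:23].index[-1])  # 23 22

# .drop() works on index labels, so drop(22) removes the row labelled 22
cleaned = column.drop(22)
print("heading" in cleaned.tolist())  # False
```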

In [44]:
#show what dropping index 22 would look like
foidata.drop(22)

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25
0,Attorney General's Office,9.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,5.0,3.0,0.0,0.0,2.0,1.0,1.0,0.0,0.0,0.0,
1,Cabinet Office [note 4],180.0,41.0,1.0,9.0,14.0,1.0,11.0,1.0,0.0,0.0,11.0,0.0,1.0,0.0,49.0,14.0,7.0,1.0,46.0,9.0,2.0,26.0,0.0,0.0,
2,"Department for Business, Energy and Industrial...",108.0,12.0,0.0,0.0,4.0,0.0,4.0,0.0,1.0,1.0,9.0,0.0,0.0,0.0,16.0,3.0,0.0,1.0,24.0,3.0,0.0,28.0,2.0,37.0,
3,"Department for Digital, Culture, Media and Sport",39.0,5.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,10.0,7.0,1.0,0.0,17.0,1.0,0.0,5.0,0.0,0.0,
4,Department for Education [note 4],124.0,13.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,24.0,15.0,0.0,0.0,81.0,7.0,3.0,5.0,3.0,4.0,
5,"Department for Environment, Food and Rural Aff...",98.0,2.0,0.0,1.0,0.0,0.0,4.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,6.0,1.0,0.0,0.0,23.0,3.0,0.0,5.0,0.0,60.0,
6,Department for International Trade,46.0,6.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,8.0,3.0,1.0,0.0,25.0,7.0,0.0,11.0,0.0,0.0,
7,Department for Transport [note 4],163.0,17.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,2.0,14.0,0.0,0.0,0.0,29.0,8.0,0.0,2.0,79.0,9.0,1.0,23.0,5.0,7.0,
8,Department for Work and Pensions,99.0,19.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,13.0,0.0,0.0,0.0,6.0,0.0,1.0,0.0,53.0,5.0,0.0,8.0,4.0,1.0,
9,Department of Health and Social Care,128.0,25.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,31.0,1.0,0.0,0.0,43.0,1.0,0.0,0.0,34.0,2.0,1.0,33.0,0.0,1.0,


### Replace the data with the 'cleaned' version and sort

Now that we know dropping row 22 works, we can save the result and sort it.

In [47]:
#replace our dataset with a version that omits the row labelled 22
foidata = foidata.drop(22)
#sort by the column specified - not in ascending order
foidata.sort_values('S.27 - International relations', ascending=False)

Unnamed: 0,Government body,Total requests where one or more exemptions / exceptions were applied [note 18] [note 19],S.22 - Information intended for future publication,S. 22A - Research intended for future publication,"S.23 - Information supplied by, or relating to, bodies dealing with security matters",S.24 - National security,S.26 - Defence,S.27 - International relations,S.28 - Relations within the United Kingdom,S.29 - The economy,S.30 - Investigations and proceedings conducted by public authorities,S.31 - Law enforcement,"S.32 - Court records, etc.",S.33 - Audit functions,S.34 - Parliamentary privilege,"S.35 - Formulation of Government policy, etc.",S.36 - Prejudice to effective conduct of public affairs,"S.37 - Communications with Her Majesty, etc. and honours",S.38 - Health and Safety,S.40 - Personal information,S.41 - Information provided in confidence,S.42 - Legal professional privilege,S.43 - Commercial interests,S.44 - Prohibitions on disclosure,All EIR exemptions,Unnamed: 25
10,"Foreign, Commonwealth and Development Office [...",48.0,8.0,1.0,9.0,15.0,2.0,28.0,1.0,0.0,0.0,3.0,0.0,0.0,0.0,6.0,0.0,1.0,9.0,32.0,4.0,1.0,6.0,0.0,0.0,
12,Home Office,196.0,23.0,0.0,6.0,12.0,0.0,14.0,0.0,0.0,1.0,26.0,0.0,0.0,0.0,21.0,16.0,0.0,7.0,77.0,2.0,1.0,14.0,2.0,0.0,
13,Ministry of Defence [note 4],190.0,18.0,0.0,8.0,15.0,57.0,13.0,0.0,0.0,2.0,7.0,0.0,0.0,0.0,8.0,2.0,0.0,29.0,62.0,3.0,2.0,47.0,1.0,0.0,
1,Cabinet Office [note 4],180.0,41.0,1.0,9.0,14.0,1.0,11.0,1.0,0.0,0.0,11.0,0.0,1.0,0.0,49.0,14.0,7.0,1.0,46.0,9.0,2.0,26.0,0.0,0.0,
32,National Archives,294.0,1.0,0.0,9.0,2.0,2.0,9.0,0.0,0.0,0.0,13.0,0.0,0.0,0.0,0.0,2.0,1.0,88.0,259.0,6.0,0.0,3.0,12.0,2.0,
2,"Department for Business, Energy and Industrial...",108.0,12.0,0.0,0.0,4.0,0.0,4.0,0.0,1.0,1.0,9.0,0.0,0.0,0.0,16.0,3.0,0.0,1.0,24.0,3.0,0.0,28.0,2.0,37.0,
5,"Department for Environment, Food and Rural Aff...",98.0,2.0,0.0,1.0,0.0,0.0,4.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,6.0,1.0,0.0,0.0,23.0,3.0,0.0,5.0,0.0,60.0,
11,HM Treasury [note 4],79.0,10.0,0.0,1.0,1.0,0.0,3.0,1.0,4.0,0.0,9.0,0.0,0.0,0.0,22.0,1.0,0.0,0.0,36.0,9.0,0.0,11.0,0.0,8.0,
18,UK Export Finance,9.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,14.0,
17,Scotland Office,3.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,0.0,0.0,1.0,0.0,0.0,
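Dropping row 22 by hand works for this sheet, but it relies on knowing exactly where the stray headings are. An alternative (a different technique from the one above, shown here as a sketch on invented data) is to convert the numeric columns with `pd.to_numeric(errors="coerce")`, which turns any text into `NaN`, and then drop the rows that end up empty. It also leaves the column with a true numeric type, so sorting no longer involves comparing mixed values.

```python
import pandas as pd

# Invented stand-in: the end of one table, two blank rows, a repeated heading row, then more data
df = pd.DataFrame({
    "Government body": ["Wales Office", None, None, "Government body", "National Archives"],
    "S.27 - International relations": [0, None, None, "S.27 - International relations", 9],
})

# Convert each numeric column, turning text headings into NaN
numeric_cols = ["S.27 - International relations"]
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Drop rows where every numeric column is now empty (blank rows and heading rows)
df = df.dropna(subset=numeric_cols, how="all")

ranked = df.sort_values("S.27 - International relations", ascending=False)
print(ranked)
```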
