# Lists, dictionaries - and functions

In this notebook we look at two more common types of variables, **lists** and **dictionaries**: how to recognise them, how to create them, and what you might use them for as a journalist. We then use them to introduce **functions**.

## Lists are like columns of data

A list is a type of variable that allows you to store *multiple* values. It might be a list of numbers, a list of strings, a list of `True/False` values, or a mix of those. 

You can even have a list of lists, but that's a bit too mind-bending to get into now.

To create a list, you need to put **square brackets** around your list of items, and **separate each one with a comma**, like so:

```python
# this is a list of numbers
refusals = [1, 11, 4, 0]
# this is a list of strings
orgs = ["AGO", "Cabinet Office", "DBEIS", "DCMS"]
# this is a list of Booleans
dept_tf = [False, False, True, True]
```

Those square brackets will also show when a list is printed - so it's a key way to recognise that you're dealing with a list (which might be the case if you've imported some data and then extracted one column).

```python
print(dept_tf)
```

```
[False, False, True, True]
```


The data in the first two lists is from the [Freedom of Information statistics: April to June 2021 bulletin](https://www.gov.uk/government/statistics/freedom-of-information-statistics-april-to-june-2021/freedom-of-information-statistics-april-to-june-2021-bulletin) - specifically [the data tables](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1017270/foi-statistics-q2-2021-statistical-tables.ods) in the sheet called '10_Exemptions'.

Storing a series of data points in a list like this allows us to do a number of things:

* We can perform a repetitive action on each item (e.g. divide each one by a total to get a percentage)
* We can perform a calculation with *all* items (e.g. adding up all the refusals)
* We can sort items in a list (e.g. largest to smallest)
* We can filter the list (e.g. those that match a particular condition)
* We can combine it with another list to create a table (for example we can combine the number of refusals with the organisation in the same position to create a two-column table)
* We can access items at a particular position (e.g. the last item)
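Here is a minimal sketch of each of those operations, using the `refusals` and `orgs` lists from above (redefined so the cell runs on its own). It uses a few things we haven't covered yet, such as square brackets containing `for` (a "list comprehension", which applies the same action to every item), so treat it as a preview rather than something to memorise.

```python
refusals = [1, 11, 4, 0]
orgs = ["AGO", "Cabinet Office", "DBEIS", "DCMS"]

# a calculation with all items: add up the refusals
total = sum(refusals)

# a repetitive action on each item: each value as a percentage of the total
percentages = [r / total * 100 for r in refusals]

# sort from largest to smallest
largest_first = sorted(refusals, reverse=True)

# filter: keep only the values above zero
non_zero = [r for r in refusals if r > 0]

# combine with another list, pairing the items in the same position
table = list(zip(orgs, refusals))

# access an item by position: -1 means "the last item"
last = refusals[-1]

print(total, percentages, largest_first, non_zero, last)
print(table)
```

Note that `sorted()` gives you a new, sorted list and leaves `refusals` itself in its original order.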

Lists are also often used in scrapers: for example, to scrape multiple pages you will need to store the URLs of each page in a list; equally if you extract data from a page you may need to store that in a number of lists (and then combine them to create a table).
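For example, a scraper that needs to visit several pages of results might start by building a list of their URLs. The address below is a made-up placeholder (`example.com`), and nothing is actually downloaded: the sketch only shows how such a list can be built and then worked through one item at a time.

```python
# hypothetical base address: a real site's URL pattern will differ
base_url = "https://example.com/results?page="

# build one URL per page number (1, 2 and 3) and store them all in a list
page_urls = []
for page_number in range(1, 4):
    page_urls.append(base_url + str(page_number))

# a real scraper would fetch each page in turn; here we just print the addresses
for url in page_urls:
    print(url)
```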

## Dictionaries are like rows of data (with headings)

A dictionary is a type of variable that also allows you to store multiple values - but, along with each value, it stores a *label* (called a **key**).

These are called **key-value pairs**. Here's an example of what one of those pairs looks like: 

  `"name" : "Paul"` 

You can see that 'name' is the **key**, or label, for the value that follows. The value of 'name' is 'Paul'.

Here's another with a number:

`"age" : 18`

You might say here that `18` is the **value** of 'age'.

A dictionary allows you to store a series of these key-value pairs, much like a row of data that also includes the heading (key) for each column.

To create a dictionary you use **curly brackets** to contain all your key-value pairs, and **commas to separate each pair**. 

Note also that each key-value pair starts with the **key** (usually a string naming it), followed by a **colon**, and then the value (which can be a string, a number, a Boolean, or even a list or dictionary).

Let's create one:

```python
co_refusals = {"Government body" : "Cabinet Office", "Total requests where one or more exemptions / exceptions were applied" : 180, "S.27 - International relations" : 11, "dept_tf" : False}
```

If you want to make it easier to read, you can press Enter after each comma and the dictionary will continue, indented on the next line. The code will still work fine. 

```python
co_refusals = {"Government body" : "Cabinet Office", 
               "Total requests where one or more exemptions / exceptions were applied" : 180, 
               "S.27 - International relations" : 11, 
               "dept_tf" : False}
```

Again, if you print a dictionary the curly brackets are the big giveaway that it's a dictionary.

```python
print(co_refusals)
```

```
{'Government body': 'Cabinet Office', 'Total requests where one or more exemptions / exceptions were applied': 180, 'S.27 - International relations': 11, 'dept_tf': False}
```


In this case we've stored the data from one of the rows in the FOI data, with the column headings used as keys (apart from the last one, which we've created ourselves). 

This is quite common when importing data in Python, as we'll see - but you'll often want to change the keys to something more succinct. 
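To get a value out of a dictionary you put its key in square brackets after the dictionary's name. One way to swap long keys for shorter ones is to build a new dictionary with the new labels, as sketched below. The short names (`body`, `total_exemptions`, `s27`) are just our own choices for illustration.

```python
co_refusals = {"Government body" : "Cabinet Office",
               "Total requests where one or more exemptions / exceptions were applied" : 180,
               "S.27 - International relations" : 11,
               "dept_tf" : False}

# look up a single value by its key
s27_count = co_refusals["S.27 - International relations"]
print(s27_count)

# our own (hypothetical) shorter names, mapped from the original long keys
new_keys = {"Government body": "body",
            "Total requests where one or more exemptions / exceptions were applied": "total_exemptions",
            "S.27 - International relations": "s27",
            "dept_tf": "dept_tf"}

# build a new dictionary with the shorter keys but the same values
short_refusals = {}
for old_key, value in co_refusals.items():
    short_refusals[new_keys[old_key]] = value

print(short_refusals)
```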

It's also quite common to use dictionaries to build up a table of data (a 'dataframe'), by starting with an empty dataframe and then using a dictionary to add a new row, and repeating the process for each row.

We will come back to this in later notebooks.
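Before we get to dataframes, a simple way to picture the idea without any extra libraries is a list of dictionaries: each dictionary is one row, and every row shares the same keys, which act like column headings. The sketch below builds those rows from the `orgs` and `refusals` lists created earlier; the key names `body` and `s27` are our own choice. (The pandas library, which provides dataframes, can build a dataframe directly from a list of dictionaries like this.)

```python
orgs = ["AGO", "Cabinet Office", "DBEIS", "DCMS"]
refusals = [1, 11, 4, 0]

# start with an empty list and add one dictionary (one row) per organisation
rows = []
for org, count in zip(orgs, refusals):
    row = {"body": org, "s27": count}
    rows.append(row)

# every row has the same keys, so we can read the same "column" from each one
for row in rows:
    print(row["body"], row["s27"])
```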

## Using functions

Lists and dictionaries provide a good opportunity to introduce **functions** in Python.

You've probably used functions already, in spreadsheets. For example, `SUM`, `AVERAGE` and `COUNT` are all widely used functions in spreadsheets which you might use to add up a column of numbers, calculate an average, or count how many there are. You might have used more powerful functions like `VLOOKUP` to combine data, too. 

Python uses many of the same, or similar, functions too. Here, for example, we use Python's `sum` function to add up all the numbers in the list we created earlier.

```python
sum(refusals)
```

```
16
```
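Rough Python equivalents of the other spreadsheet functions mentioned above are sketched below. There is no built-in average function, so one common approach is to divide `sum()` by `len()`, which counts the items in a list (unlike spreadsheet `COUNT`, `len()` counts every item, not just numbers). `max()` and `min()` are built in, too.

```python
refusals = [1, 11, 4, 0]

count = len(refusals)            # similar to COUNT
average = sum(refusals) / count  # similar to AVERAGE: 16 divided by 4
largest = max(refusals)          # similar to MAX
smallest = min(refusals)         # similar to MIN

print(count, average, largest, smallest)
```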

Functions are basically **recipes**: a series of steps you might want to repeat more than once. In the case of `sum()` that recipe is "add all the numbers together".

A function is **always followed by parentheses** - even if those parentheses are empty. This is one of the ways to recognise that a word is a function.

And those parentheses **contain any ingredients the function needs**. In the case of `sum()` the ingredient is a list (or similar collection) of the numbers you want to add up.

If you want to supply more than one ingredient **each ingredient is separated by a comma**.
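The built-in `round()` function is a handy way to see this in action: given one ingredient it rounds to a whole number, and given a second ingredient (after a comma) it rounds to that many decimal places. The last line shows that without parentheses Python only *refers* to a function rather than running it.

```python
# one ingredient: the number to round
whole = round(3.14159)

# two ingredients, separated by a comma: the number, then the decimal places
two_places = round(3.14159, 2)

print(whole, two_places)

# no parentheses: this prints a description of the function instead of running it
print(sum)
```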

Note that, like variables, functions in Python are **case-sensitive** and generally all **lower-case**. So while `sum()` will work, `SUM()` and `Sum()` will not because *those are not the names of the function*.

```python
SUM(refusals)
```

```
NameError: name 'SUM' is not defined
```

```python
Sum(refusals)
```

```
NameError: name 'Sum' is not defined
```

Pay attention to the details that the error gives: this is a `NameError` and it says `name 'Sum' is not defined` so we know that `Sum` is causing the problem, and we might pay more attention to making sure we've typed it correctly (including whether letters need to be upper or lower case).

You can also google `NameError` to find out more about it - and how to fix the problem. Some results will be easier to understand than others, so try a few. [This result](https://www.articledesk.net/python-nameerror-name-is-not-defined/) puts it more clearly than most: 

> "A NameError is raised when you try to use a variable or a function name that is not valid."

### Functions only work on the type of values they're designed for

Now we come to one of the reasons I've spent so much time talking about different variable types: if you try to use a function on a type of variable it's not designed for, Python will raise an error.

Here's an example:

```python
# create a new list, with a string in it
newlist = [1, "11", 4, 0]
# try to use sum on that list
sum(newlist)
```

```
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

In this case the list contains three numbers and a string. But the `sum()` function can't add numbers and text, so it raises an error. 

This error, by the way, is similar to the one when we tried to divide a string by a number in a previous notebook: a `TypeError`. 

This time it specifies the `+` operator. Although we haven't used the plus sign ourselves, we can guess that the `sum()` function *is* using the `+` operator in some way: when you use a function you are basically using the code that someone has written for that function. (In later notebooks you will learn how to create your own functions.)

In this sense Python is less tolerant than a spreadsheet tool like Excel, which simply ignores text when calculating a total. 

But this also means you don't miss problems like a number stored as text (which would also be ignored by Excel in any calculations).
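Once Python has flagged the problem, one way to fix a number stored as text is to convert every item with `int()` (covered later in this notebook) before adding up. The sketch below does that with a list comprehension, which builds a new list by applying the same action to each item.

```python
newlist = [1, "11", 4, 0]

# convert every item to an integer: 1 stays 1, and the string "11" becomes 11
cleaned = [int(item) for item in newlist]

print(cleaned)
print(sum(cleaned))
```

This assumes every item really is a whole number, written either as a number or as text; `int()` would raise a `ValueError` on text such as `"n/a"`.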

Another difference between Excel and Python is that, while Excel's `SUM` will accept separate numbers such as `=SUM(1, 11, 4, 0)`, Python's `sum()` function [will only work on lists and similar types of 'iterable' variables](https://www.w3schools.com/python/ref_func_sum.asp) (this means variables that contain multiple items which can be gone through one at a time, like a list). So this won't work, either:

```python
sum(1, 11, 4, 0)
```

```
TypeError: sum expected at most 2 arguments, got 4
```

By the way, you might notice that the error says `sum` "expected at most 2 arguments" (the exact wording varies between versions of Python). This refers to the number of ingredients a function accepts (the ingredients are called **arguments**). In this case we gave four arguments (four numbers), but those numbers need to be contained within one thing, such as a list, and that thing would be the first argument supplied.

*(The `sum()` function also accepts a second, optional argument: a starting number that is added to the total of the list you supply. Most of the time that second argument isn't used.)*
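Here is the working version of the call above, with the four numbers wrapped in a single list, plus an illustration of the optional second argument. The starting value of `100` is an arbitrary number chosen only to make its effect obvious.

```python
# the numbers go inside one list, which is the single first argument
total = sum([1, 11, 4, 0])

# the optional second argument is a starting value added to the total
total_with_start = sum([1, 11, 4, 0], 100)

print(total, total_with_start)
```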

### Checking the type of a variable: the `type()` function

Having introduced functions, it's worth listing some of the functions you might find useful. 

For example, how do we tell what type a variable is? With the `type()` function!

```python
type(refusals)
```

```
list
```
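The same function works on every type of variable we've met so far. When `type()` is the last line of a notebook cell, the notebook shows a short name such as `list`; wrapped in `print()`, as below, it shows the longer `<class 'list'>` form instead. The dictionary here is a shortened version of `co_refusals`, just to keep the example small.

```python
refusals = [1, 11, 4, 0]
co_refusals = {"Government body": "Cabinet Office", "dept_tf": False}

print(type(refusals))          # a list
print(type(co_refusals))       # a dictionary
print(type("Cabinet Office"))  # a string
print(type(180))               # an integer (a whole number)
print(type(False))             # a Boolean
```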

### Converting strings to numbers and vice versa

And I mentioned in a previous notebook that it's possible to convert numbers to strings and vice versa. You can do that by:

* Putting the string you want to convert to a whole number (an integer) inside the function `int()` 
* Putting the number you want to convert to a string inside the function `str()`.

Below we do just that, first creating a string variable, then converting it, and checking the variable type using `type()` at each stage.


```python
# store the string '11'
my_number = "11"
# print the type
print(type(my_number))
# convert the string '11' to an integer
a_new_number = int(my_number)
# print the type
print(type(a_new_number))
```

```
<class 'str'>
<class 'int'>
```
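The other direction works the same way with `str()`. A common reason to need it is joining a number onto some text with `+`: like `sum()` earlier, `+` raises a `TypeError` if you try to combine a string and a number directly. The variable names below are our own.

```python
refusal_count = 11
print(type(refusal_count))

# convert the integer 11 to the string '11'
refusal_text = str(refusal_count)
print(type(refusal_text))

# now it can be joined onto other text with +
sentence = "S.27 count for the Cabinet Office: " + refusal_text
print(sentence)
```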


There are plenty of other useful functions, too, which we'll come across as we go along - such as `len()`, which tells you the length of a variable (the number of characters in a string, or the number of items in a list). Again, this will throw an error if you try to use it on a type of variable that it can't work with, like a number.

```python
len(8)
```

```
TypeError: object of type 'int' has no len()
```
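For comparison, here is `len()` on the types it *is* designed for. The last line shows one workaround for a number: convert it to a string first, and `len()` then counts its digits.

```python
refusals = [1, 11, 4, 0]
org_name = "Cabinet Office"
co_refusals = {"Government body": "Cabinet Office",
               "S.27 - International relations": 11}

print(len(org_name))     # number of characters, including the space
print(len(refusals))     # number of items in the list
print(len(co_refusals))  # number of key-value pairs in the dictionary

# a number has no length, but its string version does
print(len(str(180)))
```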

## Accessing more functions: libraries and user-defined functions

The functions we've used so far are called **built-in functions**: they are built into Python and can be used at any time. 

But there are two other types of functions you can use, too:

* User-defined functions 
* Functions from libraries

User-defined functions are functions that you create yourself in the code. The `def` keyword is used to do this, and we'll be using it to create our own functions in future notebooks.
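As a brief preview, here is a minimal sketch of a user-defined function. The name `percentage` and its two ingredients (`part` and `whole`) are our own invention for illustration: the recipe divides one number by another and multiplies by 100.

```python
def percentage(part, whole):
    """Return part as a percentage of whole."""
    return part / whole * 100

# 11 of the 16 refusals in the refusals list
print(percentage(11, 16))
```

Once defined, it is used exactly like a built-in function: its name, followed by parentheses containing the ingredients, separated by a comma.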

Library functions have been created by other people and stored in a special collection of code called a **library**, which you can import in order to use its functions. We'll look at those in the next notebook...