# Sequences and mappings

In the [previous lesson](types.ipynb), we learned about some of the different data [types](extras/glossary.md#type) that Python recognizes, and we saw that Python treats each of them differently. In this lesson we will continue learning about types, but we will focus on a few particular types that are especially useful for many programming tasks, and whose behavior is worth considering in more detail.

## Sequences

We have already met both [tuples](extras/glossary.md#tuple) and [lists](extras/glossary.md#list).

In [1]:
shopping_list = ['eggs', 'bacon', 'black pudding', 'sausages', 'bread', 'Mazola', 'gravy mix']
shopping_tuple = ('eggs', 'bacon', 'black pudding', 'sausages', 'bread', 'Mazola', 'gravy mix')

Tuples and lists are both [sequences](extras/glossary.md#sequence). That is, they can contain more than one value, and the values are arranged in order. It is the order of the values that allows us to pick them out individually, using positions in the tuple or list as [indices](extras/glossary.md#index). (Don't forget Python's zero-based indexing system.)

In [2]:
shopping_list[0]

'eggs'

### Indexing

Let's begin by looking at a few more ways of using indices to get items from a sequence. We will take a list as our example, but bear in mind that the examples work for tuples as well (you can create yourself a tuple in the Spyder console and check if you like).

As well as literal numbers, we can use [variables](extras/glossary.md#variable) as indices. In this case, the value of the variable is used as the index.

In [3]:
item_num = 1

shopping_list[item_num]

'bacon'
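If you would rather not take our word for it that tuples behave the same way, here is a minimal check. It redefines the `shopping_tuple` from the start of the lesson so that it runs on its own.

```python
shopping_tuple = ('eggs', 'bacon', 'black pudding', 'sausages', 'bread', 'Mazola', 'gravy mix')

item_num = 1

# A variable works as an index for a tuple just as it does for a list.
second_item = shopping_tuple[item_num]

print(second_item)  # bacon
```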

Indices must be [integers](extras/glossary.md#integer). This makes sense; it is unclear what contents would be stored in 'entry two-and-a-half'.

In [4]:
item_num = 2.5

shopping_list[item_num]

TypeError: list indices must be integers or slices, not float

Note the content of the error message above. Python does not accept [float](extras/glossary.md#float) values as indices. This is the case even if the float represents a whole number.

In [5]:
item_num = 2.0

shopping_list[item_num]

TypeError: list indices must be integers or slices, not float

We saw in the last lesson that for basic arithmetic it often doesn't matter whether we represent numbers as integers or as floats; Python will convert them as necessary. But indexing is a situation in which we definitely need integers.

So if for some reason we end up with a float variable, but we want to use it as an index, we must first convert it to an integer.

In [6]:
item_num = 2.0
item_num = int(item_num)

shopping_list[item_num]

'black pudding'

When might we end up with a float value while working with indices? One semi-common case in which this can occur is if we want to get entries that are partway through the sequence, for example halfway. In order to find the halfway point, we can divide the length of the sequence by two. But this will result in a non-whole number if the sequence has an odd number of items (and in fact, as we have seen, the result of division with `/` is always a float, even if the result is a whole number).

In [7]:
n_items = len(shopping_list)
halfway = n_items / 2

print(halfway)

type(halfway)

3.5


float

So we must convert the result back into an integer in order to use it as an index. The `int()` function discards the fractional part of the number, so for a positive number this also 'rounds down' to its whole part.

In [8]:
halfway = int(halfway)

print(halfway)

shopping_list[halfway]

3


'sausages'
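There is a shortcut worth knowing here, offered as a sketch rather than as something this lesson relies on. Python's floor division operator `//` divides and rounds down in one step, and when both numbers are integers the result is already an integer, so no conversion is needed.

```python
shopping_list = ['eggs', 'bacon', 'black pudding', 'sausages', 'bread', 'Mazola', 'gravy mix']

# Floor division of two integers gives an integer, rounded down.
halfway = len(shopping_list) // 2

print(halfway)        # 3
print(type(halfway))  # <class 'int'>

middle_item = shopping_list[halfway]
print(middle_item)    # sausages
```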

Using negative numbers, we can get entries counting from the end of a sequence rather than from the beginning. So entry `-1` is the last entry, `-2` is the second from last entry, and so on.

In [9]:
shopping_list[-1]

'gravy mix'
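One way to convince ourselves of how negative indices line up with the ordinary ones is to compare the two directly. The sketch below assumes nothing beyond the `len()` function we have already used.

```python
shopping_list = ['eggs', 'bacon', 'black pudding', 'sausages', 'bread', 'Mazola', 'gravy mix']

n_items = len(shopping_list)

# Index -1 picks out the same entry as the last ordinary index, n_items - 1.
last_item = shopping_list[-1]
same_item = shopping_list[n_items - 1]
print(last_item == same_item)  # True

# Index -2 is the second from last entry.
second_from_last = shopping_list[-2]
print(second_from_last)  # Mazola
```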

What if we use an index that goes beyond the length of the sequence? As with many such questions, the best response is to try it out and see. Then we will know what to look out for if we suspect we have made this mistake in one of our programs. The content of the error message is pretty clear:

In [10]:
shopping_list[9000]

IndexError: list index out of range
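If we want to avoid this error rather than just recognize it, we can compare an index to the length of the sequence before using it. This is only a rough sketch; the variable names `index` and `index_is_valid` are made up for the illustration.

```python
shopping_list = ['eggs', 'bacon', 'black pudding', 'sausages', 'bread', 'Mazola', 'gravy mix']

n_items = len(shopping_list)
index = 9000

# Valid indices run from -n_items (the first entry, counting from the end)
# up to n_items - 1 (the last entry, counting from the beginning).
index_is_valid = -n_items <= index < n_items

print(index_is_valid)  # False
```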

### Slicing

We can also get multiple entries from a sequence. This is known as taking a '[slice](extras/glossary.md#slice)' of the sequence. We use the same square brackets we always use for indexing, but this time they contain a range of indices, where the beginning and end of the range are separated by the colon character `:`.

So to get entries `1` and `2` (in the zero-based numbering system):

In [11]:
shopping_list[1:3]

['bacon', 'black pudding']

Wait a moment! You might have been expecting `[1:3]` to give you entries `1` to `3`, inclusive. If so, you are a normal, healthy person and your worldview is entirely valid. However, Python understands the end of a slice as meaning 'up-to-but-not-including'. So `[1:3]` means 'starting from `1` up to (but not including) `3`'.

If you are still reeling from the shock of the zero-based numbering system, this new quirk may seem like a slap in the face with a wet kipper. However, there is a logic to it that you may eventually come to love, and that can be explained with the help of the fancy infographic below.

![](images/slicing.png)

The indices in Python's indexing system can be understood as referring to the 'dividing' tick marks on a ruler. Indices refer not to the items in a sequence themselves but to the 'boundaries' between the items. So when we ask for slice `[1:3]` we get the items between tick mark `1` and tick mark `3` on the ruler. As you can see in the image above, this gets us the second and third items.

Single indices can be understood in a similar way. When we ask for item `0` in a sequence, we get the item that immediately *follows* that tick mark on the ruler, which is the first item. The second item comes after tick mark `1`, and so on. This is similar to the way that the floors of a building are numbered (at least in most countries). A floor is the part of the building immediately *above* the actual physical platform that divides the building into vertical sections. So the '1st floor' is the space above the first dividing platform. Below that is the 'ground floor', so called because it is located immediately above the ground. (We could also call the ground floor the '[zeroth](https://en.wikipedia.org/wiki/0th)' floor.)

In [12]:
floors = ['ground', '1st', '2nd', '3rd']

floors[1]

'1st'

If the beginning of our desired slice is the beginning of the whole sequence, we can omit the beginning index before the colon. So `[:3]` means simply 'up to but not including `3`'.

In [13]:
shopping_list[:3]

['eggs', 'bacon', 'black pudding']

And similarly if the end of our desired slice is the end of the sequence.

In [14]:
shopping_list[3:]

['sausages', 'bread', 'Mazola', 'gravy mix']

Note that in the last two examples above, we were able to use the same index (`3`) both to end one slice and to begin another one, without the two slices 'overlapping' (i.e. both containing item `3`). This is one of the subtle advantages of Python's interpretation of indices as 'dividing lines'.
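We can check this 'no overlap' property directly. The sketch below joins the two slices back together with `+` and compares the result to the original list. It also shows a handy consequence of the 'dividing lines' view: the number of items in a slice is just the end index minus the start index.

```python
shopping_list = ['eggs', 'bacon', 'black pudding', 'sausages', 'bread', 'Mazola', 'gravy mix']

first_part = shopping_list[:3]
second_part = shopping_list[3:]

# The two slices meet at index 3 without sharing or losing any item.
rejoined = first_part + second_part
print(rejoined == shopping_list)  # True

# A slice [1:3] contains 3 - 1 = 2 items.
slice_length = len(shopping_list[1:3])
print(slice_length)  # 2
```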

### Comprehensions

We have seen that square brackets can be used to create a list. The items in the list are separated by commas. Often, we will want to create a list without having to write out every individual item explicitly. If there is some regular pattern to the items in our desired list, we can generate them based on that pattern. For example, the pattern might be that the items in our desired list are all related to the items in another list in some consistent way.

We might want to know the length of each item in our shopping list, and to store those lengths in a new list. One way to do this would be to apply the `len()` function to each item separately:

In [15]:
item_lengths = [len(shopping_list[0]), len(shopping_list[1]), len(shopping_list[2])] # ... and so on

But this quickly gets tedious (as you can see, I couldn't even be bothered to finish the full example above). We are also liable to make small mistakes such as missing out one item or putting them in the wrong order. Python provides an alternative. Take a look at the command below:

In [16]:
item_lengths = [len(x) for x in shopping_list]

item_lengths

[4, 5, 13, 8, 5, 6, 9]

This way of constructing a list is known as a 'list [comprehension](extras/glossary.md#comprehension)'. Instead of writing out each item explicitly, we give a sort of 'formula' for creating the items. The basic [syntax](extras/glossary.md#syntax) for a list comprehension is as follows:

* pick a variable name, any variable name (for example `x`)
* write:
  - `for`
  - then your chosen variable name
  - then `in`
  - then the name of the existing list that your new list is based on (for example our `shopping_list`)
* to the left of this, write your chosen variable name again
* do to this variable whatever it is that creates an item in your new list (for example apply the `len()` function)
* enclose the whole thing in square brackets

This sort of formula is almost human-readable, if you are a particular kind of human. Our example above says: 'For every item in the shopping list, give me the length of that item (and put all these lengths in a new list)'. We can (and should) make our programs a bit more human-readable by picking more descriptive variable names than `x`. For example:

In [17]:
item_lengths = [len(item) for item in shopping_list]

item_lengths

[4, 5, 13, 8, 5, 6, 9]

List comprehensions can be an extremely useful tool for manipulating data. So here is another example for you to check your comprehension of comprehensions:

In [18]:
item_initials = [item[0] for item in shopping_list]

item_initials

['e', 'b', 'b', 's', 'b', 'M', 'g']
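And one more for good measure, as a sketch. It makes two assumptions that go slightly beyond what we have covered: that the thing after `in` may be any sequence (here a tuple rather than a list), and that strings have an `upper()` method that returns a capitalized copy. Note that the result is a list even though we started from a tuple, because it is the square brackets that decide what gets created.

```python
shopping_tuple = ('eggs', 'bacon', 'black pudding', 'sausages', 'bread', 'Mazola', 'gravy mix')

# 'For every item in the shopping tuple, give me that item in capitals.'
shouted_items = [item.upper() for item in shopping_tuple]

print(shouted_items)
# ['EGGS', 'BACON', 'BLACK PUDDING', 'SAUSAGES', 'BREAD', 'MAZOLA', 'GRAVY MIX']

print(type(shouted_items))  # <class 'list'>
```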

Though we may choose the name of the variable in a list comprehension (for example `x` or `item`), the roles of the words `for` and `in` are fixed. These are [keywords](extras/glossary.md#keyword) for Python; each has a particular effect on the workings of our program, and we must use that specific word if we want that effect. We will learn more about keywords in later lessons.

### Range

Before we leave sequences behind, let's consider one more data type, a new one this time. Python provides a [built-in](extras/glossary.md#builtin) function `range()` that generates a sequence of integers over a given range. The two [arguments](extras/glossary.md#argument) to `range()` are the start and end of the desired sequence of integers.

For example:

In [19]:
ten_numbers = range(0, 10)

Intuitively, we might expect the [return value](extras/glossary.md#return) of `range()` to be a tuple or list containing the requested integers. Unfortunately, it is instead yet another new [data type](extras/glossary.md#type), the `range` type.

In [20]:
type(ten_numbers)

range

We don't need to worry too much about this. Behind the scenes in the bowels of our computer, the `range` data type 'prepares' the requested integers for use, but it doesn't actually put them into a list or give us any of them until we ask it to do so.

We can see this if we try to print out the range variable. We aren't shown all the integers. Instead, we just see that we have a 'range'.

In [21]:
print(ten_numbers)

range(0, 10)


We only get the integers once we request a specific one:

In [22]:
ten_numbers[3]

3

Understanding the reason for this behavior takes us into some of the details of how computers work. When we create a list (or a tuple), it takes up some space in our computer's memory. Although a decent modern computer usually has squigabytes upon squigabytes of memory, the amount is still limited, and not all of that memory will be accessible for Python, as it will be in use by other programs (or by the many viruses that have infected your computer). So if we are working with an extremely large sequence of numbers, for example in a big data analysis, we might prefer to generate each of those numbers one by one, process them, then discard them, rather than first storing them all in memory.
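To get a feel for the difference, we can ask Python roughly how much memory an object occupies. The sketch below uses `sys.getsizeof()` from the standard library, which we have not met in these lessons. The exact numbers depend on your computer and Python version, so none are shown here, and the figure for the list counts only the list's own bookkeeping, not the integers stored in it. Even so, the range should come out far smaller.

```python
import sys

big_range = range(1_000_000)
big_list = list(big_range)  # forces all one million integers into memory

# Approximate size in bytes of each object itself.
range_size = sys.getsizeof(big_range)
list_size = sys.getsizeof(big_list)

print(range_size)
print(list_size)
print(range_size < list_size)  # True
```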

However, for small sequences, the behavior of `range()` can be a bit of an annoyance. We can demand the full list of integers by just converting a range into a list:

In [23]:
ten_numbers = list(ten_numbers)

ten_numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Notice that the end of a range is interpreted in the same way as the end index of a slice, which we learned about above. `range(5, 10)` means 'the integers from `5` up to *but not including* `10`'. Also like slices, we can omit the starting number if we want to start from `0`. So `range(10)` is a shorthand for `range(0, 10)`.
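Both claims are easy to check by converting the ranges to lists, as in this short sketch:

```python
# The end of the range, 10, is not included.
five_to_nine = list(range(5, 10))
print(five_to_nine)  # [5, 6, 7, 8, 9]

# Omitting the start is the same as starting from 0.
same_numbers = list(range(10)) == list(range(0, 10))
print(same_numbers)  # True
```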

Ranges are often useful for creating a new list that contains some mathematical sequence. For example, we can get a list of square numbers by requesting '`x` squared' for every `x` in a range:

In [24]:
squares = [x*x for x in range(1, 11)]

squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

#### Aside: Python 2

Back in Python 2, the `range()` function did the more intuitive thing and just returned a full list of the requested integers. This was good for the simplicity and clarity of Python programs, but bad for efficient use of the computer's memory, so it was changed in Python 3.

## Mappings

Consider the list below. It contains someone's name, age, and location.

In [25]:
info = ['Mildred Bonk', 22, 'USA']

A list isn't such a great way of storing this information, for a few reasons.

* The list mixes data types (strings and integers). Python allows this, but it usually isn't a good idea. We can make our programs clearer and less error-prone by storing only one type of data in each list we create.
* The order of the items is somewhat arbitrary. It makes some sense to begin with the name, but there is no particular reason to have age before location or the other way around.
* There is nothing in the list that tells us what each of the items refers to in the real world. We just have to remember that `info[0]` is the name, `info[1]` the age, and so on. For another programmer reading our work, `info[1]` could well be Mildred's shoe size instead.

If we find ourselves wanting to gather together data of different types, and we would like each piece of data to have an informative name, then we need to move on from sequences. We need a 'mapping'.
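As a rough preview of where this is heading, here is Mildred's information again, this time stored in Python's built-in dictionary type. This is a sketch under assumptions: the labels `'name'`, `'age'` and `'location'` are ones we have made up for the illustration, and the curly-bracket syntax has not been introduced yet.

```python
# Each value is paired with a label of our choosing.
info = {'name': 'Mildred Bonk', 'age': 22, 'location': 'USA'}

# We ask for a value by its label instead of by its position.
age = info['age']
print(age)  # 22
```

Compare `info['age']` with the `info[1]` of the list version: there is no longer any risk of mistaking the number for a shoe size.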

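A mapping is a collection in which each value is stored under a unique name, called a 'key', rather than at a numbered position. We look a value up by its key instead of by its index. Python's main built-in mapping type is the dictionary.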

## Choosing a data representation



### List or tuple?

Tuples are [immutable](extras/glossary.md#mutability). We can't change a tuple unless we overwrite it completely, which is the behavior we are used to from the basic data types such as [strings](extras/glossary.md#string) and numbers. Lists on the other hand are mutable. We can change single values in a list, and list [methods](extras/glossary.md#method) change the list without our having to re-[assign](extras/glossary.md#assignment) the result back into the list variable.

For example:

In [26]:
shopping_list.reverse()
shopping_list

['gravy mix', 'Mazola', 'bread', 'sausages', 'black pudding', 'bacon', 'eggs']
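The contrast shows up most clearly when we try to change a single value. The sketch below uses shortened versions of our shopping list and tuple. It also uses `try` and `except`, which we have not covered yet. They are there only so that the error message is printed instead of stopping the program.

```python
shopping_list = ['eggs', 'bacon', 'bread']
shopping_tuple = ('eggs', 'bacon', 'bread')

# Changing one value in a list works.
shopping_list[0] = 'tofu'
print(shopping_list)  # ['tofu', 'bacon', 'bread']

# Trying the same with a tuple raises a TypeError.
tuple_changed = True
try:
    shopping_tuple[0] = 'tofu'
except TypeError as error:
    tuple_changed = False
    print(error)  # 'tuple' object does not support item assignment

print(shopping_tuple)  # ('eggs', 'bacon', 'bread')
```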

### List or dictionary?

### Nested data 

## Exercises