# Collections (lists, tuples, dictionaries) and Loops

Collections are extremely important. Collections are objects that group other objects together. The three most important types of collection are `list`, `tuple` and `dict`. Manipulating these three basic collections is fundamental to everyday programming in Python.

Lists and tuples have a lot in common. Lists are enclosed in [] and tuples in (), with the elements separated by commas.

In [None]:
my_list = [100,200,300,400,500,600]

In [None]:
my_tuple = (100,200,300,400,500,600)

We'll look at the differences later. You'll find you work with lists more often than tuples.

Usually you want to do something to each element of a list or tuple in turn. This involves iterating (looping) over the elements, so this notebook also introduces the `for` loop, the most common and useful type of loop in Python.

## Lists

Lists are *ordered* collections. Each element in the list has a fixed index number depending on its relative position: the first element has index `0`, the second element has index `1` and so on (Python is like C and most other languages; in Fortran the first index of an array is 1). 

If you follow the name of the list with the index in *square* brackets `[ ]`, you get the element at that position.

In [None]:
my_list = [100,200,300,400,500,600] # create the list

my_list[0] # get the first element in the list

In [None]:
my_list[2]

In [None]:
my_list[3]

The `len` function returns the length of a list.

In [None]:
len(my_list)

Lists can be 'sliced' to return ranges of elements using the notation `x[start:stop:step]`, where `x` is a list and `start` is the **first index** you want to return and `stop` is the index **after the last one you want to return** (this is confusing at first, don't worry!). The `step` allows you to return either every element (`step=1`) or every 2nd element (`step=2`) or every third (`step=3`), etc.

All three parts of the slice syntax are *optional*, because if you leave them out, they get replaced with the following values:
- if you leave `start` out, it assumes start = 0
- if you leave `stop` out, it gets all elements from start up to *and including* the last one.
- if you leave `step` out, it assumes step = 1

You can also give **negative** numbers for `start`, `stop` and `step`. A negative `start` or `stop` is interpreted **as counting backwards from the end of the list** rather than forwards from the beginning, so `my_list[-1]` returns the **last** element in the list. A negative `step` means the slice walks through the list backwards, so `my_list[::-1]` returns the elements in reverse order.

This syntax for slicing lists is one of the most confusing things for Python beginners. The following examples are supposed to illustrate how this works.

***Understanding the following behaviour is very important*** for using Python to work with scientific data, in particular because the same ideas apply to the multidimensional numerical arrays that we'll look at later in `numpy`.

First, this example shows that the slice `[0:3]` has three elements, those at the indices 0, 1 and 2. `stop=3` here, so the element at index 3 is **not** returned.

In [None]:
my_list[0:3] # elements 0, 1 and 2, NOT INCLUDING element 3!

Play around with the rest of these examples until you get a feel for how slicing works.

In [None]:
my_list[1:] # This is a 'slice' of the list from the 2nd element to the last

In [None]:
my_list[-1] # Negative indices count backwards through the list

In [None]:
my_list[1:-1] # This is also a 'slice', here from the 2nd to the 2nd-last

In [None]:
my_list[:-1]

In [None]:
my_list[0:-1:2]

In [None]:
my_list[:-1:2]

In [None]:
my_list[::2]

In [None]:
my_list[1::2]

In [None]:
my_list[::-1]
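
To check your understanding of the slicing rules above, here is a small sketch that spells out the default values explicitly. It assumes `my_list` is still the six-number list defined earlier; the comments show the output of each line.

```python
my_list = [100, 200, 300, 400, 500, 600]

# Leaving out start, stop or step is the same as writing the defaults explicitly
print(my_list[:3] == my_list[0:3:1])              # True
print(my_list[1:] == my_list[1:len(my_list):1])   # True

# A negative index -k refers to the same element as index len(my_list) - k
print(my_list[-1] == my_list[len(my_list) - 1])   # True
print(my_list[1:-1])                              # [200, 300, 400, 500]

# A step of 2 takes every second element; a step of -1 walks backwards
print(my_list[::2])                               # [100, 300, 500]
print(my_list[::-1])                              # [600, 500, 400, 300, 200, 100]
```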

Like the strings we saw in the Basic Python Review notebook, lists have methods. This example shows how to reverse a list.

In [None]:
my_list.reverse()
print(my_list)
my_list.reverse()
print(my_list)

There are two ways to create an empty list:

In [None]:
empty_list_a = []
empty_list_b = list()

These give the same result: both create a new, empty list. `[]` is the more common spelling; `list()` is also useful for turning another collection (such as a tuple) into a list.

Lists don't have to be collections of numbers. For example, we can also have lists of strings:

In [None]:
my_list = ['alpha','beta','gamma']
print(my_list)

Lists are 'mutable', which is Python jargon for saying we're allowed to change the values of their elements. The next example shows how to replace the 2nd element of `my_list` with a different value.

In [None]:
my_list = ['alpha','beta','gamma']
my_list[1] = 'omega'
print(my_list)

You can add items to a list that already exists using `append`.

In [None]:
my_list = ['alpha','beta','gamma']
my_list.append('delta')
print(my_list)

We can start off with an empty list, and append items to it.

In [None]:
empty_list = list() # initially empty
empty_list.append(99)
empty_list.append(100)
empty_list.append(999)
print(empty_list) # not empty anymore.

Lists have lots of methods and functions to operate on them, like `append`. We'll see more examples as we go on.

### Copies of lists and references to lists

One of the examples above was this:

In [None]:
my_list = ['alpha','beta','gamma']
my_list.append('delta')
print(my_list)

In this example, `my_list` is longer after the append operation. This seems obvious enough -- we say that the `append` function **modifies `my_list` in place**. The expression *in place* is used because there are a lot of other functions that **modify a copy** of the list rather than changing the list itself. These functions return the modified copy, leaving the original list unchanged. 

For example, we can also add two lists together using the `+` operator. If you think about it, it isn't obvious what 'adding two lists' with `+` *should* do. One common guess is what actually happens: the elements from the second list get added to the first list (i.e. the lists are **concatenated**).

In [None]:
print([1,2,3] + [7,8,9])

The result of this addition of two lists is **not** to modify either of them, but rather to return a **new** list (equivalent to modifying a copy of the first list). 

In [None]:
my_list  = ['alpha','beta','gamma']
new_list = my_list + ['delta']

# Check to see if my_list was changed:
print(my_list)
print(new_list)

Of course you could do this:

In [None]:
my_list = ['alpha','beta','gamma']
my_list = my_list + ['delta']

# Check to see if my_list was changed:
print(my_list)

But beware: this operation is still making a copy of `my_list`! If the list is long, this will waste CPU time and memory. It would be faster to modify the list in place using the `extend` function (see the end of the section on tuples below).
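
The difference between modifying in place and making a copy can be seen directly by giving the same list a second name. This is a minimal sketch; the name `same_list` is just an illustration and the `is` operator is used to test whether two names refer to the very same object.

```python
my_list = ['alpha', 'beta', 'gamma']
same_list = my_list               # a second name for the SAME list, not a copy

my_list.extend(['delta'])         # modifies the list in place
print(same_list)                  # ['alpha', 'beta', 'gamma', 'delta']
print(same_list is my_list)       # True

my_list = my_list + ['epsilon']   # builds a NEW list and re-uses the name my_list for it
print(my_list)                    # ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
print(same_list)                  # ['alpha', 'beta', 'gamma', 'delta']
print(same_list is my_list)       # False
```

After the `+`, the old list still exists under the name `same_list`, which shows that `+` never touched it.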

Regarding what `list + list` should do, you might also have guessed either of the following equally reasonable outcomes:

```
[8,10,12]
[1,2,3,[7,8,9]]
```

The first of these isn't consistent with the idea that lists can contain any type of object.

## Tuples

Tuples are very similar to lists. The most important difference is that they are **not mutable**. In other words, a tuple is a sort of 'frozen' list --  you can't change the elements of the tuple after you've made it. What happens if you try?

In [None]:
my_tuple = (100,200,300,400,500,600)
my_tuple[1] = 100

Unlike lists, tuples have no 'append' function, because they can't be changed. You can add two tuples together, but this returns a new tuple, not the old one with some extra elements.

In [None]:
my_tuple + (4,) # Returns a new tuple

Any time you see values separated by a comma (`,`) outside of other brackets or a function call, there is a tuple. The brackets are helpful but they're not the most important thing. The most important thing is the comma.

In [None]:
1, 2, 3, 4 # This makes a tuple
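
The importance of the comma is easiest to see for tuples with a single element. The following sketch uses made-up variable names; the comments show what each `print` gives.

```python
not_a_tuple = (4)     # just the integer 4 wrapped in brackets
one_tuple   = (4,)    # the trailing comma makes a one-element tuple
also_tuple  = 4,      # the brackets are optional

print(type(not_a_tuple))         # <class 'int'>
print(type(one_tuple))           # <class 'tuple'>
print(one_tuple == also_tuple)   # True

empty_tuple = ()                 # the one exception: an empty tuple has no comma
print(len(empty_tuple))          # 0
```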

Like lists, tuples can mix data of different types. Both lists and tuples can include *any* Python object: the standard numerical classes, strings, `None`, or any other more complicated type of object. Because even *functions* are Python objects (as we'll see in a later notebook), they can be included too.

In [None]:
['this', 1, None, True, len, 'is all fine']

If this is your first time working with a language like Python, it might not be obvious what this `len` is that we're including in the list in the cell above. Here's a little more explanation. 

We already know that `len` is a function that we can use to get the length of a list, like this: `len(some_list)`.

When we add brackets to the end of a function name, as we do in `len(some_list)`, we are asking for the *result* of the function (`len`, in this case) given some arguments (`some_list`). When we write the *name* of a function without any `()` after it, as we did in the cell above, we are asking for the *function itself*, rather than some particular result of it. The function itself is an *object* in the same way the number `101` or the string `abc` are objects. We can manipulate functions like any other object in Python; the above example shows we can make lists of functions.

### The `in` operator

The `in` operator is a logical test that returns `True` if the item on the left-hand side is in a list or tuple (or any other sort of collection) on the right-hand side, and `False` otherwise.

In [None]:
'a' in ['a','b','c']
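
A few more cases, as a sketch (the tuple `elements` is only an example): `in` works the same way for tuples, it can be negated with `not in`, and for strings it tests for a substring.

```python
elements = ('carbon', 'nitrogen', 'oxygen')

print('carbon' in elements)       # True
print('helium' in elements)       # False
print('helium' not in elements)   # True

# For strings, `in` tests whether the left-hand side is a substring
print('bon' in 'carbon')          # True
```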

### Nested collections

It follows from the fact that lists and tuples can include anything that they can also include *other* lists and tuples. You can have tuples inside tuples, tuples of lists, and lists of tuples.

In [None]:
print([(1,2),(100,200),('alpha','beta','gamma')])

**Quick exercise**: As well as `append`, lists have a method called `extend`. What is the difference between `append` and `extend`? Read the documentation for `list.append` and `list.extend` and experiment in the blank cell below.

### Packing and unpacking tuples

The next example defines a function that returns a tuple. This is also the first example of how to define a function with `def`. For now, it should be obvious enough what's happening, so details of how to define functions will be given in a later notebook. The important thing about this example is that the function returns not one but *three* values.

You can see that what follows the return statement is a tuple definition (remember, tuples don't have to have brackets, the important sign of a tuple is a `,`). So, this function returns a tuple of values. The values are said to be *packed* into a tuple.

Notice the body of the function following `def` is indented and the block of code corresponding to the function ends when we go back to the same indentation level as the `def`, just like with `if` statements or `while` loops.

In [None]:
def three_powers(x):
    return x**2, x**3, x**4

y = three_powers(4)
print('y is a tuple: %s'%(str(y)))

Quick question: why do we have `str(y)` and not just `y` in the arguments to the string format in that example? The answer will be given in a few cells' time.

If we want the individual values, we can 'unpack' the tuple into separate variables. The next example shows how this is done, by giving a tuple of variables on the left-hand side of the assignment.

In [None]:
alpha, beta, gamma = three_powers(4)
print(alpha)
print(beta)
print(gamma)

So the tuple `y` is equivalent to `y[0], y[1], y[2]`. 

Obviously, if this is going to work, there have to be as many variables on the left as there are elements in the tuple on the right. The next cell shows that an error is raised if this isn't the case.

In [None]:
alpha, beta = three_powers(4)
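
The error raised here is a `ValueError`. As a sketch, the cell below catches it so that the message can be printed without stopping the notebook, and then shows one common convention for ignoring values you don't need: unpack everything and use the throwaway name `_`. The names `square` and `fourth` are only illustrative.

```python
def three_powers(x):
    return x**2, x**3, x**4

try:
    alpha, beta = three_powers(4)   # three values, but only two names
except ValueError as err:
    print('Unpacking failed:', err)

# Unpack all three values, ignoring the middle one by convention
square, _, fourth = three_powers(4)
print(square, fourth)               # 16 256
```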

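Now it should be clear why we needed to give `str(y)` to the string format when we only had one `%s`: when the right-hand side of `%` is a tuple, it is unpacked and each element is matched to its own placeholder. `y` has three elements but there is only one `%s`, so this will give an error: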

In [None]:
y = three_powers(4)
print('y is a tuple: %s'%(y))

`y` is a tuple of three elements, so giving three placeholders in the string will also work. 

In [None]:
print('y is a tuple, the elements of which are: %s %s %s'%(y))
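
If you want the whole tuple to fill a single placeholder without calling `str` yourself, one option is to wrap it in a one-element tuple. This is a small sketch using the values that `three_powers(4)` returns; the `str.format` method shown at the end is an alternative that doesn't unpack tuples in this way.

```python
y = (16, 64, 256)

print('y is a tuple: %s' % (y,))         # y is a tuple: (16, 64, 256)
print('elements: %s %s %s' % y)          # elements: 16 64 256
print('y is a tuple: {}'.format(y))      # y is a tuple: (16, 64, 256)
```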

### For loops

Now is a good time to look at `for` loops, which build on lists and tuples. A `for` loop defines a block of code that is executed once *for each item* in a sequence (e.g. each element of a list or tuple). The definition of the `for` loop assigns a variable name (`x` in the example below) that will be used to represent 'the current item from the sequence' inside the loop block.

In [None]:
for x in [1,2,3,4]:
    print(x)

Earlier we saw the `while` loop, which doesn't involve any sort of sequence in the definition of the loop:

In [None]:
x = 0
i = 0
while i < 10:
    x = x + i
    i = i + 1
print(x)

`for` loops are much more common in Python code than `while` loops. The same idea as the `while` loop in the previous cell is more often expressed like this:

In [None]:
x = 0
for i in range(0,10):
    x = x+i
print(x)

`range` is a useful function; it returns a `range` object with a start point, end point and step size.

In [None]:
range(0,20,2)

In [None]:
x = range(0,20,2)
print(x.start,x.stop,x.step)

`x.start` etc. here are expressions that fetch **attributes** of the `range` object we've called `x`. The relationship between variables and attributes is the same as that between functions and methods: attributes are just variables that are stuck to a particular object and you can get at them using a '`.`'.

`range` is better for loop counters than explicit lists, because it doesn't take up significant memory.
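
A `range` behaves much like a list of numbers even though it never stores them all. The sketch below (with made-up variable names) converts a `range` to a list to see its contents, and uses `sys.getsizeof` to compare the size of a large `range` with the size of the equivalent list. The exact sizes depend on your Python version, so only the comparison is printed.

```python
import sys

counter = range(0, 20, 2)
print(list(counter))    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(len(counter))     # 10
print(counter[3])       # 6
print(14 in counter)    # True

# The range only remembers start, stop and step; the list stores every entry
big_range = range(1000000)
big_list  = list(big_range)
print(sys.getsizeof(big_range) < sys.getsizeof(big_list))
```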

The indented `for` loop above is the most general way to write loops in Python, but in simple cases they can be written in a more compact way using **list comprehensions**, which basically means a for-loop *inside* the syntax for creating a list, like this:

In [None]:
[i**2 for i in range(0,10)]

So the example above can be written

In [None]:
print(sum([i for i in range(0,10)]))

`sum` is a function that takes a list (or any other iterable) as an argument and adds up all the entries.

***List comprehensions can be used to do sophisticated things in a concise way. This is very common in real Python code.***
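
As one example of this, a list comprehension can include an `if` condition to keep only some of the items. The sketch below (variable names are illustrative) builds the squares of the even numbers below 10, and then builds the same list with an ordinary `for` loop for comparison.

```python
# Squares of the even numbers from 0 to 9
even_squares = [i**2 for i in range(0, 10) if i % 2 == 0]
print(even_squares)                        # [0, 4, 16, 36, 64]

# The same thing written as an ordinary loop
even_squares_loop = []
for i in range(0, 10):
    if i % 2 == 0:
        even_squares_loop.append(i**2)

print(even_squares_loop == even_squares)   # True
```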

What happens when we know how many items we want to loop over, but we don't know how long the sequence we're looping over is? In this case we could write a complicated `while` loop, but Python offers a much more elegant and powerful solution called generator expressions. This is a bit advanced for this part of the tutorial -- look at the link given as further reading at the end of this section if you're interested.
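
For the curious, here is only a brief taste of the syntax, not a full explanation. A generator expression looks like a list comprehension with round brackets instead of square ones, and it produces its items one at a time instead of building the whole list first. The built-in `next` function asks for the next item.

```python
# No intermediate list is built here; sum consumes the items one by one
total = sum(i**2 for i in range(0, 10))
print(total)           # 285

squares = (i**2 for i in range(0, 10))
print(next(squares))   # 0
print(next(squares))   # 1
print(list(squares))   # [4, 9, 16, 25, 36, 49, 64, 81]  (only the remaining items)
```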

#### Getting out of loops and skipping iterations

To get out of a loop part way through, use `break`.

In [None]:
for i in range(0,10):
    if i > 5: break
    print(i)

To go straight to the next iteration without breaking the loop, use `continue`.

In [None]:
for i in range(0,15):    
    if i == 5 or (i > 8 and i <13):
        print('Skipping this one ...')
        continue
    else:
        print(i)
    

#### Other types of sequences can be iterated over

So far we've been iterating over lists of numbers, but strings are also iterable, because they are collections of characters.

In [None]:
'abcdefg'[2:5]

The term 'iterable' is used to describe anything in Python that can be iterated over in a `for` loop.
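
As a short sketch of what this means for strings: a `for` loop over a string visits one character at a time, and anything that accepts an iterable (such as `list` or a list comprehension) will accept a string too. The variable names here are only for illustration.

```python
for letter in 'abc':
    print(letter)          # prints a, then b, then c, on separate lines

letters = list('abc')
print(letters)             # ['a', 'b', 'c']

vowels = [c for c in 'carbon dioxide' if c in 'aeiou']
print(vowels)              # ['a', 'o', 'i', 'o', 'i', 'e']
```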

*Advanced tip:* Later we'll see how to define our own classes of objects; in Python it's standard practice to make anything that acts as a collection of many things into an `iterable` class so that it can be used in the same way as the standard collections (list, tuple etc.) in `for` loops. That's easy to do, and an explanation of how to do it can be found [here](https://docs.python.org/3/tutorial/classes.html#iterators), but that's not important for now. The important thing is that, in a `for` loop definition, you should expect to be able to use **anything** that acts as a 'collection of things', not just lists and tuples.

Very often you'll want to loop over something iterable (call it `x`) and count the steps in the loop at the same time (call the step numbers `i`). You could make a counter variable for `i` and explicitly add `+1` in each iteration of the loop, but Python has a function `enumerate` to make things slightly neater:

In [None]:
for i,x in enumerate('carbon dioxide'):
    print('%3d : %s'%(i,x))

If you want to loop over two (or more) iterable things at the same time, use `zip()`. For example, `enumerate` is equivalent to this:

In [None]:
my_string = 'carbon dioxide'
for i,x in zip(range(0,len(my_string)), my_string):
    print('%3d : %s'%(i,x))

**Quick exercise**: What happens when the arguments to `zip` have different lengths? Experiment in the cell below.

Occasionally you want strings to be treated like scalar variables while still treating lists as lists. This next example shows that just assuming this will work because strings are iterable doesn't always work like we might want.


In [None]:
def add_together_strings(s):
    """
    Arguments:
        s: a list of strings
    
    Returns:
        A single string with 'and' between each element of s
    """
    return '%s'%(' and '.join(s))

print(add_together_strings(['carbon','nitrogen','oxygen'])) # works ok for a list of strings
print(add_together_strings('carbon')) # Doesn't work as we want for a single string!

In this case the simplest thing to do is check the type explicitly.

In [None]:
def add_together_strings(s):
    """
    Arguments:
        s: a list of strings
    
    Returns:
        A single string with 'and' between each element of s
    """
    if isinstance(s,str):
        return s
    else:
        return '%s'%(' and '.join(s))

print(add_together_strings(['carbon','nitrogen','oxygen']))
print(add_together_strings('carbon'))
            

-----
*Further reading:*

- [StackOverflow question on iteration in Python](http://stackoverflow.com/questions/9884132/what-exactly-are-pythons-iterator-iterable-and-iteration-protocols)
- [generator expressions](https://wiki.python.org/moin/Generators)
- [how generators work and the yield keyword (more advanced stuff)](http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do)
- the [itertools](https://pymotw.com/2/itertools/) standard library module
-----

## Dictionaries (`dicts`)

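'dict' is short for **dictionary**. These are collections of labels (**keys**) with associated data (**values**). Unlike lists, values are looked up by key rather than by position (although since Python 3.7 a `dict` does remember the order in which its keys were inserted). Other languages sometimes call these 'hashes'. There are two basic ways to create them: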

In [None]:
morphologies = dict(ngc4849 = 'spiral', ngc550 = 'elliptical', ngc4994 = 'spiral', ngc2337 = 'irregular')

`dict()` here is a function that turns all the arguments you give it into (key,value) pairs. The next example shows an alternative syntax using special brackets `{}`, similar to the `[]` for `list`:


In [None]:
morphologies = {'ngc4849': 'spiral', 'ngc550': 'elliptical', 'ngc4994' : 'spiral', 'ngc2337' : 'irregular'}

Long expressions on single lines like this can be hard to read. Here's how I usually write them. You don't have to line things up as neatly as this if you don't want to, the important thing is that you can break lines at commas. Python doesn't care about the whitespace *within* lines, only at the start.


In [None]:
morphologies = {'ngc4849' : 'spiral',
                'ngc550'  : 'elliptical', 
                'ngc4994' : 'spiral', 
                'ngc2337' : 'irregular'}

Notice:

- in the first (`dict()`) form, the keys are given as keyword arguments, so they have to be strings that are also valid Python names. In the `{}` form they don't, but they do have to be *hashable* (roughly speaking, immutable), otherwise you will get an error about 'unhashable types'. The keys of dictionaries are usually strings but they can also be numbers or tuples.

- the values can be any mix of types, functions, collections, `None`, classes, whatever you like, including `list`s and other `dict`s. Nested `dict`s are very common. 
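
The points above can be sketched as follows. The dictionary `mixed_keys` is invented for illustration: it mixes string, number and tuple keys, and the second part shows the 'unhashable type' error you get if you try to use a list as a key.

```python
mixed_keys = {'ngc550': 'elliptical',         # a string key
              42:       'a number as a key',
              (1, 2):   'a tuple as a key'}

print(mixed_keys[42])        # a number as a key
print(mixed_keys[(1, 2)])    # a tuple as a key

# Lists are mutable, so they can't be used as keys
try:
    bad = {[1, 2]: 'a list as a key'}
except TypeError as err:
    print('Not allowed:', err)
```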

We can look up the value associated with a particular key like this:

In [None]:
morphologies['ngc550'] # get the value for a given key
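
Looking up a key that isn't in the dictionary raises a `KeyError`. The sketch below shows two common ways to deal with that: testing for the key with `in`, or using the `get` method, which returns `None` (or a default of your choice) instead of raising an error. The key `'ngc1'` is simply one that doesn't appear in `morphologies`.

```python
morphologies = {'ngc4849': 'spiral', 'ngc550': 'elliptical',
                'ngc4994': 'spiral', 'ngc2337': 'irregular'}

print('ngc550' in morphologies)              # True
print('ngc1' in morphologies)                # False

try:
    morphologies['ngc1']
except KeyError as err:
    print('No such key:', err)

print(morphologies.get('ngc1'))              # None
print(morphologies.get('ngc1', 'unknown'))   # unknown
```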

Both the keys and values can be returned as list-like 'view' objects (wrap them in `list()` if you need real lists). Compare the order to the order in the expression that creates the dictionary above.

In [None]:
print(morphologies.keys())
print(morphologies.values())

The `items()` method of a `dict` returns a view object; iterating over it yields (key, value) pairs.

In [None]:
for k,v in morphologies.items():
    print("%s is %s"%(k,v))

**The `dict` is extremely important in Python**. Since making a simple structure with named 'properties' using a `dict` is so easy, it's much less common to see **classes** (which we'll look at later) used to represent simple 'bundles' of data in Python compared to some other 'object oriented' languages (e.g. Java). This is particularly true for the common case of returning 'structured' results from functions. For example:

In [None]:
def my_complicated_function(x):
    """
    This function is obviously over-complicated for what it does.
    
    A realistic version might be, for example, reading 
    the header from a data file.
    """
    results                  = dict()
    results['powers']        = dict()
    results['odd_multiples'] = dict()
    
    results['powers']['square'] = x**2
    results['powers']['cube']   = x**3
    
    for m in range(1,12,2):
        results['odd_multiples'][m] = x*m

    return results

answer_three = my_complicated_function(3)
answer_four  = my_complicated_function(4)

print(answer_three)
print(answer_four)
print(answer_four['powers']['cube'])

However, for lots of more complicated cases, creating classes with methods will be much more useful. The point is that `dict` can be used in a lot of cases where other languages would force you to write your own class (or use something like C's `struct`).

We can use `update` to include the elements of one `dict` in another, like this:

In [None]:
letters      = {'a':'alpha', 'b':'beta','c':'gamma', 'd':'delta'}
more_letters = {'e':'epsilon','z':'zeta'} # no keys in common with letters


print('Before update :', letters)
letters.update(more_letters)
print('After update  :', letters)

Notice that `update` changes the original dictionary, it doesn't make a copy.

What if some of the keys and/or items are duplicated in the two dicts?

In [None]:
letters      = {'a':'alpha', 'b':'beta','c':'gamma', 'd':'delta'}
more_letters = {'c':'chi','g':'gamma'}

print('Before update :', letters)
letters.update(more_letters)
print('After update  :', letters)

Where a key appears in both dicts, its old value is overwritten with the new value.

Update modifies the first dictionary in place. If you want to concatenate two dictionaries together following the same logic that we introduced above for lists, you could first make a copy, then update that.

In [None]:
letters      = {'a':'alpha', 'b':'beta','c':'gamma', 'd':'delta'}
more_letters = {'c':'chi','g':'gamma'}

print('Before update :', letters)

new_letters  = letters.copy()     # work on a copy so that letters is left unchanged
new_letters.update(more_letters)

print('After update  :', letters)
print('New dict      :', new_letters)

There is no '+' operator for dicts (from Python 3.9 there is a `|` operator, which returns a new dict and handles duplicate keys in the same way as `update`). The question of what it means to 'add' two dictionaries has even more equally valid answers than the same question for lists! If you want to combine two dictionaries in any other way, it's up to you to write your own function that deals with things like duplicate keys in a way that's suited to your application.
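
As an illustration of what such a function might look like, here is a minimal sketch of one possible choice: keep *every* value, by collecting the values for each key into a list. The function name `merge_keep_all` is made up for this example; a different application might prefer a different rule entirely.

```python
def merge_keep_all(first, second):
    """
    Hypothetical helper: combine two dicts without losing any values.

    Returns:
        A new dict in which each key maps to a list of all the
        values found for that key in first and second.
    """
    merged = dict()
    for d in (first, second):
        for key, value in d.items():
            # setdefault returns the existing list, or stores a new empty one
            merged.setdefault(key, []).append(value)
    return merged

letters      = {'a':'alpha', 'b':'beta','c':'gamma', 'd':'delta'}
more_letters = {'c':'chi','g':'gamma'}

combined = merge_keep_all(letters, more_letters)
print(combined['a'])   # ['alpha']
print(combined['c'])   # ['gamma', 'chi']
print(letters['c'])    # gamma  (the original dicts are unchanged)
```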

*Further reading:*

- `collections.OrderedDict`
- the other items in the `collections` standard library module
- the `struct` module
- the internal dictionary `__dict__`
- named tuples

## End of Notebook