# 7PAVITPR: Introduction to Statistical Programming
# Python practical 8

_Angus Roberts<br/>
Department of Biostatistics and Health Informatics<br/>
Institute of Psychiatry, Psychology and Neuroscience<br/>
King's College London<br/>_

# Data structures: list comprehensions, sets, dictionaries

### List comprehensions

We often want to compute one list from the values in others. So far, we have done this with `for` loops. For example, try this code, which creates a list of weights in kg from a list of weights in lbs:


In [None]:
w_lbs = [120, 256, 145, 190, 173, 201]
w_kgs = []

for x in w_lbs:
    w_kgs.append(round(x/2.2,1))
    
print(w_kgs)
    

__List comprehensions__ are a syntactic shortcut for this. Here is the above example, written as a list comprehension - try it:

In [None]:
w_kgs = [round(x / 2.2, 1) for x in w_lbs]
print(w_kgs)

The list comprehension, on the first line of the cell, has the following parts:

- an assignment - the resulting list is assigned to a name, a new list `w_kgs` in our example (strictly speaking, the assignment is not part of the comprehension itself)
- `[`
- an _expression_, written in terms of some variable (`x` in our example), which is computed for each value of the variable and appended to the list
- `for`
- a _variable_ - `x` in our example
- `in`
- a _sequence_ that will be iterated over to provide values for the variable
- `]`

List comprehensions can have more than one variable in their initial expression, more than one `for`, and also conditional `if` clauses. You can imagine each `for` or `if` clause as a statement indented on the line below the previous one, with a final `append()` of the initial expression indented at the end.
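As a sketch of that mental picture, here is a small comprehension with two `for` clauses and an `if` clause, next to the nested loops it can be imagined as. The numbers are arbitrary and chosen only to keep the output short.

```python
# A comprehension with two `for` clauses and an `if` clause
products = [x * y for x in [1, 2, 3] for y in [10, 20] if x != 2]

# The same computation unrolled: each clause becomes a statement
# indented below the previous one, with append() innermost
products_loop = []
for x in [1, 2, 3]:
    for y in [10, 20]:
        if x != 2:
            products_loop.append(x * y)

print(products)                   # [10, 20, 30, 60]
print(products == products_loop)  # True
```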

For example, try this code:


In [None]:
# Print all possible two-base combinations in which the two bases differ

bases = ['C', 'T', 'G', 'A']

combinations = [x+y for x in bases for y in bases if x != y]

print( combinations )

## <font color=green>💬 Discussion point</font>

Can you explain in words what the above list comprehension is doing?


It is also possible for list comprehensions to be nested - the initial expression can itself be a list comprehension. These can get hard to understand, and so it is not a bad thing to avoid them! 
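A small illustration of a nested comprehension, using arbitrary numbers: the outer comprehension builds one row per value of `r`, and each row is itself a list comprehension over `c`. The unrolled loops underneath do the same thing, which may help if the one-line version is hard to read.

```python
# Outer comprehension: one row per r; inner comprehension: one value per c
table = [[r * c for c in range(1, 4)] for r in range(1, 4)]
print(table)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

# The same table built with explicit nested loops
table_loop = []
for r in range(1, 4):
    row = []
    for c in range(1, 4):
        row.append(r * c)
    table_loop.append(row)

print(table == table_loop)  # True
```

Compare this with `[r * c for r in range(1, 4) for c in range(1, 4)]`, which has two `for` clauses but no nesting, and so produces a single flat list of nine numbers.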

## <font color=green>❓ Question</font>

A model takes two parameters:

- _t_, which has possible values True and False
- _w_, which has possible values 10, 20, 30, 40
- _w_ cannot be greater than 20 if _t_ is False

Write a list comprehension to compute a list of all possible parameter pairs as tuples, (t, w).

- Remember, a tuple is an immutable sequence of the form (x, y)
- You will need an `if` at the end, to satisfy the requirement that _w_ cannot be greater than 20 if _t_ is False. Try using an `or` in its condition
- The lists of individual parameters are short, so you could include them in the comprehension as explicit lists, e.g. `[.... for x in [1, 2, 3]...]`

## <font color=green>⌨️ Your answer</font>

In [None]:
# Complete this code

parameters = [ REPLACE ]

# Print out the results
print( parameters )

# Is the result right?
print( parameters == [(True, 10), (True, 20), (True, 30), (True, 40), (False, 10), (False, 20)])

### Sets

Python sets model mathematical sets - unordered collections of objects with no duplication of members.

Sets can be made from a sequence by passing it to `set()`, or by enclosing comma-separated objects in curly braces `{ }`.

Try the code below:

In [None]:
# Create a list - note that it contains some duplicates
schizo = ['PT145', 'PT766', 'PT901', 'PT773', 'PT274', 'PT274', 'PT274']

# Create a set from the list
schizo = set(schizo)

# Create a set using braces - note that it contains some duplicates
diabetes = {'PT145', 'PT766', 'PT901', 'PT872', 'PT634', 'PT634'}

# Take a look at the sets - note that there are no duplicates
print(schizo)
print(diabetes)
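Two details about creating sets are not shown in the cell above. Empty braces `{}` create an empty dictionary (dictionaries are introduced later in this practical), not an empty set, so an empty set has to be written `set()`. And adding a member that is already in a set has no effect. A short sketch, reusing one of the patient identifiers from above:

```python
# Empty braces make an empty dictionary, not an empty set
empty = {}
print(type(empty))   # <class 'dict'>

# An empty set has to be created with set()
cohort = set()
print(type(cohort))  # <class 'set'>

# Adding a member that is already present leaves the set unchanged
cohort.add('PT145')
cohort.add('PT145')
print(len(cohort))   # 1
```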

## <font color=green>❓ Question</font>

Sets support several useful operations. Using the sets above, write code to show what each of these does:

- member `in` set
- set `-` set
- set `|` set
- set `&` set
- set `^` set

## <font color=green>⌨️ Your answer</font>

In [None]:
# Give your answer below

## <font color=green>💬 Discussion point</font>

- Where can you find documentation for `set`?
- What other operators does `set` support?
- Why don't sets support slices?
- What is the difference between `set` and `frozenset`?
- Which set operations are not available for `frozenset`, and why not?

### Dictionaries

- In lists, each value is associated with a positional index - 0, 1, 2, 3 etc.
- In dictionaries, the values are also associated with indices
- In dictionaries, the indices are known as _keys_
- _Keys_ can be any immutable object (strictly, any hashable object) - they are usually strings or numbers
- Dictionaries associate  _keys_ with their _values_
- You can think of them as sets of _key=value_ pairs
- Each _key_ must be unique, and is associated with a value
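To see the rule about immutable keys in action, here is a short sketch. A tuple such as `('PT145', 2)` (a hypothetical patient and visit number) can be used as a key, but a list cannot, because lists are mutable; Python raises a `TypeError` if you try.

```python
# A tuple is immutable, so it can be used as a key,
# here indexing results by (patient, visit number)
visits = {('PT145', 1): 4.5, ('PT145', 2): 4.7}
print(visits[('PT145', 2)])  # 4.7

# A list is mutable, so Python refuses to use it as a key
try:
    bad = {['PT145', 1]: 4.5}
    list_key_allowed = True
except TypeError as err:
    list_key_allowed = False
    print('TypeError:', err)
```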

Below is an example - a dictionary associating patient identifiers with haemoglobin A1C results. It shows how to add, change and retrieve values. Try it:


In [None]:
# Create a dictionary
hba1c = {'PT145':4.5, 'PT766':4.2, 'PT901':6.1, 'PT872':4.5, 'PT634':5.1}

# Retrieve a value for a key
print(hba1c['PT901'])

# Change the value for a key
hba1c['PT901'] = 8.0
print(hba1c['PT901'])

# Add a new key/ value pair
hba1c['PT999'] = 3.0
print(hba1c)
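The cell above only retrieves keys that exist. A short sketch of what happens otherwise, using a separate small dictionary so that `hba1c` is left alone; `'PT000'` is a made-up identifier that is deliberately missing.

```python
hba1c_sample = {'PT145': 4.5, 'PT766': 4.2, 'PT901': 6.1}

# Square brackets raise a KeyError if the key is missing
try:
    print(hba1c_sample['PT000'])
except KeyError as err:
    print('Missing key:', err)

# get() returns None (or a default we supply) instead of raising an error
print(hba1c_sample.get('PT000'))         # None
print(hba1c_sample.get('PT000', 'n/a'))  # n/a
print(hba1c_sample.get('PT145', 'n/a'))  # 4.5
```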

## <font color=green>💬 Discussion point</font>

There are other ways to construct dictionaries. Two common ones are shown below.

- When might the first of these be useful?
- When might the second of these be useful?

In [None]:
# Construct a dictionary from a list of key/value tuples
hba1c = dict([('PT145', 4.5), ('PT766', 4.2), ('PT901', 6.1)])
print(hba1c)

# Construct a dictionary from keywords and values
hba1c_again = dict(PT145=4.5, PT766=4.2, PT901=6.1)
print(hba1c_again)

## <font color=green>💬 Discussion point</font>

Below is another way to construct a dictionary. Can you explain what is happening?

In [None]:
patients = ['PT145', 'PT766', 'PT901']
results = [4.5, 4.2, 6.1]

hba1c = {x: y for x, y in zip(patients, results)}

print(hba1c)


So far we have looked at dictionaries with strings as keys, and numbers as values. We can use anything immutable as a key, and we can use any object as a value, e.g.

- Dictionaries of lists
- Dictionaries of lists of dictionaries
- Dictionaries of sets of tuples
- ...

Most of these are unwieldy to handle, and if you think you need them, you should probably learn how to write your own classes in Python (or find a package that does what you want already).

We will, however, look at dictionaries of lists, as they are a simple and useful example.
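As a sketch of how a dictionary of lists is often built up, suppose results arrive one at a time as (patient, result) pairs. The pairs below are made-up illustration data, and `setdefault()` is just one of several ways to do this (a `collections.defaultdict(list)` is another).

```python
# Made-up (patient, result) records, oldest first
records = [('PT145', 4.5), ('PT766', 4.2), ('PT145', 4.7), ('PT766', 4.3)]

by_patient = {}
for patient, value in records:
    # setdefault() returns the list for this patient,
    # creating an empty one first if the patient is new
    by_patient.setdefault(patient, []).append(value)

print(by_patient)  # {'PT145': [4.5, 4.7], 'PT766': [4.2, 4.3]}
```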

## <font color=green>❓ Question</font>

A dictionary holds the last three HbA1C results for patients, oldest first. Its construction is shown in the code below.

Write code to:

- print the second result for PT145, remembering that:
  - hba1c[key] will return a list, and
  - list[n] will return a value from the list
- for patient PT901, remove the oldest result, and add a new result of 5.7 to the end
  - `pop(n)` removes the value at position n from a list and returns it
- Print how many patients there are in the dictionary
  - This is the dictionary's length - look it up in the Python Standard Library docs if you are unsure
- Print out the number of results PT766 has
- Test if there are any results for PT564


## <font color=green>⌨️ Your answer</font>

In [None]:
# Complete the code that manipulates the dictionary below,
# following the instructions in the comments
hba1c = dict(PT145=[4.5,4.5,4.7], PT766=[4.2,4.3,4.4], PT901=[6.1,6.0,5.8])

# Print the second result for PT145
# YOUR CODE HERE

# Remove the oldest result from PT901 and print it
# then add a new latest result of 5.7 and print all 
# the results for this person
# YOUR CODE HERE

# Print how many patients there are in the dictionary
# YOUR CODE HERE

# Print how many results PT766 has
# YOUR CODE HERE

# Test if the dictionary has any results for PT564
# YOUR CODE HERE


