# Dictionaries

In the previous chapter we talked about lists. Lists can contain any value, and if you want to extract a specific value from a list you need to know its position in the list. In this chapter I want to introduce another *container* type in Python that can also store values: dictionaries. A dictionary stores each value under a so-called *key*. A key must be *hashable*, which in practice means an immutable value such as a string, a number or a tuple of those (a list cannot be a key); usually it is a string. Let's take a look at an example:

In [None]:
capitals = { 'Germany': 'Berlin', 'France': 'Paris', 'England': 'London' }

When you want to define a dictionary, you use curly braces ({}) to surround the key-value pairs of that dictionary. Each key is separated from its value by a colon, and the pairs are separated by commas. In the above example the *values* are:

In [None]:
capitals.values()

And the *keys* are:

In [None]:
capitals.keys()

If you want to know what the capital of Germany is, you can use the string "Germany" as a key:

In [None]:
capitals[ 'Germany' ]

Just like with lists, you access the elements contained in a dictionary by using square brackets ([]). You can also use the `in` operator to check if a *key* is in a dictionary:

In [None]:
'Spain' in capitals
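
Checking with `in` first is useful because looking up a key that is not in the dictionary raises a `KeyError`. The following sketch repeats the definition of `capitals` so that it runs on its own. The `get` method is a standard dictionary method that returns a default value instead of raising an error; the fallback text `'unknown'` is only an illustrative choice.

```python
capitals = { 'Germany': 'Berlin', 'France': 'Paris', 'England': 'London' }

# look up a key only if it is present
if 'Spain' in capitals:
    spain_capital = capitals[ 'Spain' ]
else:
    spain_capital = 'unknown' # illustrative fallback value

# `get` does the same in one step: it returns the second argument if the key is missing
france_capital = capitals.get( 'France', 'unknown' )
italy_capital = capitals.get( 'Italy', 'unknown' )

print( spain_capital, france_capital, italy_capital )
```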

The `in` operator only looks at the keys. To check whether a *value* is contained in a dictionary, you have to access the values first:

In [None]:
'Paris' in capitals.values()

Why are dictionaries important for us? One common use case for biological data analysis is mapping of identifiers from one type to another:

In [None]:
human_gene_to_protein = { 'ENSG00000180914': 'OXYR_HUMAN',
                          'ENSG00000163914': 'OPSD_HUMAN',
                          'ENSG00000184845': 'DRD1_HUMAN'
                        }

human_gene_to_protein[ 'ENSG00000163914' ]

And sometimes you even want to chain together mappings because you do not have a direct mapping:

In [None]:
human_protein_to_readable_name = {
    'DRD1_HUMAN': 'D(1A) dopamine receptor',
    'OXYR_HUMAN': 'Oxytocin receptor',
    'OPSD_HUMAN': 'Rhodopsin'
} #note that the order of the entries does not matter

human_protein_to_readable_name[ human_gene_to_protein[ 'ENSG00000163914' ] ] # note the nested lookup: the result of the inner lookup is the key for the outer one

If a direct mapping is not available, you can also create it yourself:

In [None]:
human_gene_to_readable_name = {} #start with an empty dictionary

for key, value in human_gene_to_protein.items(): # the method `items` gives us each key-value pair
    human_gene_to_readable_name[ key ] = human_protein_to_readable_name[ value ] # insert new key-value pair

human_gene_to_readable_name

To insert a value into a dictionary, you can simply say `dictionary[ key ] = value`.
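
Assigning to a key that already exists does not add a second entry; it replaces the old value. Here is a small sketch with a made-up dictionary `gene_counts` (the name and the numbers are purely illustrative):

```python
gene_counts = {} # hypothetical example dictionary

gene_counts[ 'ENSG00000180914' ] = 3 # new key: the pair is inserted
gene_counts[ 'ENSG00000163914' ] = 5 # another new key
gene_counts[ 'ENSG00000180914' ] = 8 # existing key: the old value 3 is replaced

print( gene_counts )
print( len( gene_counts ) ) # the dictionary still has two entries
```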

<span style="color:teal">Task:</span> The above example uses the method `items` to get the key-value pairs. Your task is to create a new cell with a `for` loop that goes over `human_gene_to_readable_name`. This time do not use the `items` method (or any other method). Which values do you get from the for loop?

Another common use case of dictionaries with biological data is joining datasets together. Let's say you have two tables:

<table>
 <thead>
  <tr><th>RNA molecule</th><th>expression</th></tr>
 </thead>
 <tbody>
  <tr><td>hsa-miR-3200-3p</td><td>12</td></tr>
  <tr><td>hsa-let-7f-5p</td><td>0</td></tr>
  <tr><td>hsa-miR-4781-3p</td><td>7</td></tr>
 </tbody>
</table>

<table>
 <thead>
  <tr><th>RNA molecule</th><th>position</th><th>sequence</th></tr>
 </thead>
 <tbody>
  <tr><td>hsa-miR-3200-3p</td><td>chr22:30731610-30731631</td><td>CACCUUGCGCUACUCAGGUCUG</td></tr>
  <tr><td>hsa-let-7f-5p</td><td>chrX:53557246-53557267</td><td>UGAGGUAGUAGAUUGUAUAGUU</td></tr>
  <tr><td>hsa-miR-4781-3p</td><td>chr1:54054124-54054145</td><td>AAUGUUGGAAUCCUCGCUAGAG</td></tr>
 </tbody>
</table>

Then you could join these two tables together based on the common field "RNA molecule".

In [None]:
rna_expressions = {
    'hsa-miR-3200-3p': 12,
    'hsa-let-7f-5p': 0,
    'hsa-miR-4781-3p': 7
}

rna_details = {
    'hsa-miR-3200-3p': { 'position': 'chr22:30731610-30731631', 'sequence': 'CACCUUGCGCUACUCAGGUCUG' },
    'hsa-let-7f-5p':   { 'position': 'chrX:53557246-53557267', 'sequence': 'UGAGGUAGUAGAUUGUAUAGUU' },
    'hsa-miR-4781-3p': { 'position': 'chr1:54054124-54054145', 'sequence': 'AAUGUUGGAAUCCUCGCUAGAG' }
}

# and now join these together:
rna_in_more_detail = {}
for rna, expression in rna_expressions.items():
    details = rna_details[ rna ].copy() # we make a copy because we will make changes to the dictionary
    details[ 'expression' ] = expression # dictionary[ key ] = value
    rna_in_more_detail[ rna ] = details # put the dictionary `details` into the dictionary `rna_in_more_detail` using the key `rna`

# let's print the new "table" in a more readable way:
print( 'RNA name\texpr.\tnucleotide sequence\tposition' ) # I renamed the headers a bit so that they print nicely
for rna, details in rna_in_more_detail.items():
    print( rna, end = "\t" ) # print automatically ends its output with a new line; use the `end` parameter to change this
    for key in [ 'expression', 'sequence', 'position' ]:
        print( details[ key ], end = '\t' )
    print() # just output an end-of-line marker

That was quite a big example! We also saw a new aspect of the `print` function: the `end` parameter. Normally, `print` also outputs an end-of-line marker so that any further output starts on a new line. In the above example we output a table, so we do not want a new line after every value we print. Instead, we want a new column, and that is why we pass `'\t'` (a tab character) as the `end` parameter.
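
To see the effect of `end` in isolation, here is a minimal sketch. The list `row` is a made-up variable that holds the values of the first table row; nothing else is assumed beyond the built-in `print`.

```python
row = [ 'hsa-miR-3200-3p', 12 ] # illustrative values taken from the first table

# default behaviour: every call to print ends with a new line
for value in row:
    print( value )

# with end = '\t' the next output continues on the same line, one tab further
for value in row:
    print( value, end = '\t' )
print() # finish the row with a new line
```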

<span style="color:teal">Task:</span> Remove the `copy` in the above example and run the whole code again. Then investigate the `rna_details` variable. What do you notice? Why could this be a bad thing?

In [None]:
from IPython.display import display, HTML # in the next chapter you will learn what this line means

#what is wrong here?

chapter_links = {
    'moduls': '<a href="11 Modules.ipynb" target="_blank">Time for the next chapter!</a>'
}

display( HTML( chapter_links[ 'modules' ] ) )