# Dictionaries

Dictionaries store values, each of which is associated with a key. Keys can be of any hashable type, including strings, integers, and floats, while values can be of any type. Dictionaries can be very useful for storing data that does not naturally fall into a list-type structure.

## Creating a Dictionary

To create a dictionary, we use curly braces ```{}```. If these braces are empty, the dictionary will be empty. We can also provide a series of key-value pairs, separated by commas. The key and value are separated by a colon. For example:

In [1]:
empty_dict = {} # Create an empty dictionary
fruity_dict = {"melons": 5, "peaches": 4} # Create a dictionary with two key-value pairs

print(empty_dict)
print(fruity_dict) # When printing a dictionary, key-value pairs are separated by commas
# Each key-value pair is represented as the key followed by a colon and then the value

{}
{'melons': 5, 'peaches': 4}


## Accessing Values from a Dictionary

We can access individual values of the dictionary by using the key in square brackets ```[]```. Just like values from a list, we can use these values as part of an expression. For example:

In [2]:
print(fruity_dict["melons"]) # Access a value by using its key
b = fruity_dict["peaches"] + 5 # You can use a value in an expression
print(b)

5
9


If we try to access a value using a key that is not in the dictionary, a ```KeyError``` exception will be raised.

In [3]:
print(fruity_dict["plums"])

KeyError: 'plums'
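
One way to avoid this exception is to check for the key first with the ```in``` operator, or to use the dictionary's ```get()``` method, which returns ```None``` (or a default value we supply) when the key is missing. The snippet below is a minimal sketch; ```example_dict``` and the variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical example data, mirroring the dictionary used above
example_dict = {"melons": 5, "peaches": 4}

# Check whether a key is present before accessing it
if "plums" in example_dict:
    print(example_dict["plums"])
else:
    print("No plums in the dictionary")

# get() returns None instead of raising a KeyError when the key is missing
plums = example_dict.get("plums")

# A second argument to get() is used as the default value
plums_or_zero = example_dict.get("plums", 0)

print(plums)
print(plums_or_zero)
```

Which approach is preferable depends on whether a missing key is an error in your program or an expected case.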

## Updating a Dictionary

We can add new key-value pairs or update the value of existing key-value pairs by writing the name of the dictionary, followed by the key in square brackets, followed by an equals sign and the new value. For example:

In [4]:
fruity_dict["bananas"] = 2 # Add a new key-value pair to a dictionary
fruity_dict["peaches"] = 6 # Change the value associated with an existing key
print(fruity_dict)

{'melons': 5, 'peaches': 6, 'bananas': 2}


We can delete a key-value pair from a dictionary using the ```del``` keyword.

In [5]:
del fruity_dict["melons"] # Delete a key-value pair from a dictionary
print(fruity_dict)

{'peaches': 6, 'bananas': 2}
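
If we also need the value that was removed, the ```pop()``` method is an alternative to ```del```: it deletes the key-value pair and returns the value. The sketch below uses a separate, hypothetical dictionary named ```veg_dict``` so that it does not alter ```fruity_dict```.

```python
# Hypothetical example data, kept separate from fruity_dict
veg_dict = {"carrots": 3, "leeks": 1}

# pop() removes the key-value pair and returns the value
removed = veg_dict.pop("carrots")
print(removed)
print(veg_dict)

# With a default supplied, pop() does not raise a KeyError for a missing key
missing = veg_dict.pop("turnips", None)
print(missing)
```

Like accessing a missing key, using ```del``` or ```pop()``` without a default on a key that is not present raises a ```KeyError```.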


## Looping over a Dictionary

We can loop over the keys or values of a dictionary.

In [6]:
for x in fruity_dict: # Looping over a dictionary gives the keys
    print(x)

for x in fruity_dict.values(): # To loop over the values, we need the "values" method
    print(x)

peaches
bananas
6
2


We can also loop over the keys and values of a dictionary at the same time using the ```items()``` method.

In [7]:
for fruit, number in fruity_dict.items(): # To loop over both keys and values, we need the "items" method
    print(f"There are {number} {fruit}")

There are 6 peaches
There are 2 bananas


## Order of Dictionaries

Note that, in older versions of Python, the order of the keys may not be the same as the order in which they were added to the dictionary. This is because dictionaries are implemented using hash tables, which do not by themselves preserve order. CPython 3.6 preserves insertion order as an implementation detail, and from Python 3.7 onwards the language guarantees that the order of the keys is the same as the order in which they were added. However, relying on the "order" of a dictionary is not normally good practice, as it can make your code less portable and tends to encourage design patterns which do not use dictionaries in the best way.
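
As a small illustration, assuming Python 3.7 or later, the sketch below builds two dictionaries with the same key-value pairs added in a different order. The names ```first``` and ```second``` are hypothetical. Iteration follows insertion order, but the two dictionaries still compare as equal, because equality only considers the key-value pairs.

```python
# Same key-value pairs, added in a different order (assumes Python 3.7+)
first = {}
first["melons"] = 5
first["peaches"] = 4

second = {}
second["peaches"] = 4
second["melons"] = 5

print(list(first))      # Keys in the order they were added to first
print(list(second))     # Keys in the order they were added to second
print(first == second)  # Equality ignores the order of the pairs
```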

## Exercise: Counting Bases

In genetics, DNA can be represented as a string of characters representing the four bases: adenine (A), cytosine (C), guanine (G), and thymine (T). For example, the DNA sequence ```ATGCGATACGCTTGA``` has 4 adenine bases, 3 cytosine bases, 4 guanine bases, and 4 thymine bases. Knowing the number of each of these bases present in a DNA sequence can be useful for some types of genetic analysis.

In the code cell below write a function named ```base_counter``` which accepts a single string as an argument representing a genetic sequence, and returns a dictionary with the number of each base in the string. If a base doesn't appear in the genetic sequence, a key-value pair representing it should still be present in the dictionary, with a value of 0.

You may assume the argument provided will always be a string consisting only of the characters A, C, G, and T. 

Avoid using the ```count``` method of the string class to begin with. Once you have a solution which works, see if you can make it any simpler by using this method.

The code cell below also contains a number of test calls, which you can use to check that your function works correctly.

In [None]:
# Write your code here






# Some tests
print(base_counter("")) # Should print {'A': 0, 'C': 0, 'G': 0, 'T': 0}
print(base_counter("ACGT")) # Should print {'A': 1, 'C': 1, 'G': 1, 'T': 1}
print(base_counter("TGCAACGT")) # Should print {'A': 2, 'C': 2, 'G': 2, 'T': 2}
print(base_counter("GATTACA")) # Should print {'A': 3, 'C': 1, 'G': 1, 'T': 2}

A sample solution can be found in the file [```Sample Solutions/Sample Solutions 5 - Dictionaries.ipynb```](Sample%20Solutions/Sample%20Solutions%205%20-%20Dictionaries.ipynb).

## Exercise: Read a FASTA File

Biological sequences such as DNA and protein are often stored in a simple text format known as a [FASTA file](https://en.wikipedia.org/wiki/FASTA_format). 

The file [`aroK.fasta`](Data/aroK.fasta) contains the DNA sequence for one gene from the bacterium *Escherichia coli*.

Complete the code below to read in this file and print out the complete DNA sequence. You will probably want to look back at the Strings notebook for some useful methods.

In [None]:
PATH_TO_FILE = "Data/aroK.fasta"

# a variable to store the DNA string
dna = ""

with open(PATH_TO_FILE) as file:
    # read the first line and print it to the screen.
    print(file.readline())  

    # read in the DNA line by line ...

# print out the DNA ...


What is the [GC content](https://en.wikipedia.org/wiki/GC-content) (percentage of the DNA that is G or C) of this gene?

Can you write a function to calculate the GC content of any DNA sequence provided?

In [None]:
# Write your code here


## Extension: Translating from DNA to Protein

To find the [amino acid](https://en.wikipedia.org/wiki/Amino_acid) sequence for the protein that this gene encodes, there are two steps:

1. Chop the DNA sequence into a list of [codons](https://rosalind.info/glossary/codon/), where each codon is 3 consecutive nucleotides.

2. Translate each codon into its corresponding amino acid. *E. coli* uses this translation table: [`codon_table_11.txt`](Data/codon_table_11.txt)

In [10]:
# Write your code here


Finally, save your translated protein as a FASTA file.

In [11]:
# Write your code here
