# Data Structures

In this notebook, you will learn about storing structured data using dictionaries. If you finish early, the advanced exercises at the end of the notebook will get you thinking about classes, and how you might want to design your own data structure to create useful objects.

## Dictionaries

Dictionaries consist of key-value pairs, where each key is unique and maps to a value.

In [7]:
age = {"John": 20, "Marie": 22, "Charlie": 24}

You can access a value in a dictionary by indexing using the corresponding key.

In [3]:
age['John']

20

You can iterate over the keys in a dictionary, just as you would iterate over the items in a list.

In [4]:
for person in age:
    print(person, age[person])

John 20
Marie 22
Charlie 24
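
If you need both the key and the value inside the loop, the built-in `.items()` method avoids the second lookup. The following is a minimal sketch; `items_demo`, `names` and `ages` are illustrative names that do not appear elsewhere in this notebook.

```python
# Stand-alone copy of the example data, so the notebook's `age` is left untouched.
items_demo = {"John": 20, "Marie": 22, "Charlie": 24}

# .items() yields (key, value) pairs in insertion order.
for person, years in items_demo.items():
    print(person, years)

# .keys() and .values() give the keys and the values on their own.
names = list(items_demo.keys())
ages = list(items_demo.values())
```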


You can add entries to, and remove entries from, a dictionary. To add an item, assign the value you want to a new key. To remove an item, use the 'del' keyword with the corresponding key.

In [5]:
age['Amy'] = 27
del age['John']
for person in age:
    print(person, age[person])

Marie 22
Charlie 24
Amy 27
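
As an alternative to 'del', the built-in `.pop()` method removes an entry and hands back its value, which is handy when you still need the value afterwards. A small sketch, using a separate hypothetical dictionary `removal_demo` so the running example is not affected:

```python
# Hypothetical data, independent of the `age` dictionary used above.
removal_demo = {"Marie": 22, "Charlie": 24, "Amy": 27}

# pop() removes the key and returns the value that was stored under it.
removed_age = removal_demo.pop("Amy")

# Supplying a default avoids a KeyError when the key is not present.
absent = removal_demo.pop("Zoe", None)

print(removed_age, absent, removal_demo)
```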


The keys of a dictionary are ordered by insertion: the first item inserted comes first when iterating, and so on. This ordering is only guaranteed by the language in Python 3.7 and above. For example, if we add John back into the dictionary, he now appears last when iterating over it. Note that changing the value of an existing key does not change the ordering.

In [6]:
age['John'] = 20
for person in age:
    print(person, age[person])

Marie 22
Charlie 24
Amy 27
John 20
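
The difference between updating an existing key and re-inserting a deleted one can be checked directly. This is a sketch on a separate, hypothetical dictionary `order_demo`, assuming Python 3.7 or later:

```python
# Hypothetical data, independent of the `age` dictionary used above.
order_demo = {"Marie": 22, "Charlie": 24, "Amy": 27}

order_demo["Marie"] = 23   # update an existing key: its position is unchanged
order_demo["John"] = 20    # new key: appended at the end
keys_after_update = list(order_demo)

del order_demo["Marie"]    # deleting and re-adding moves the key to the end
order_demo["Marie"] = 23
keys_after_reinsert = list(order_demo)

print(keys_after_update)
print(keys_after_reinsert)
```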


As with any collection, we can use 'in' to check whether a key is present in a dictionary. If we try to access an entry that does not exist, a KeyError is raised.

In [11]:
print('John' in age, 'Charlie' not in age)
age['The Universe']

True False


KeyError: 'The Universe'
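
When a missing key is expected rather than a mistake, there are two common ways to avoid the exception: the built-in `.get()` method, or catching the KeyError. A minimal sketch, using a hypothetical dictionary `lookup_demo`:

```python
# Hypothetical data, independent of the `age` dictionary used above.
lookup_demo = {"John": 20, "Marie": 22}

# .get() returns None (or a default you supply) instead of raising KeyError.
missing = lookup_demo.get("The Universe")
fallback = lookup_demo.get("The Universe", 0)

# Alternatively, catch the KeyError explicitly.
try:
    value = lookup_demo["The Universe"]
except KeyError:
    value = None
    print("No entry for 'The Universe'")
```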

Dictionary keys can be of any hashable type, which in practice means immutable built-in types (e.g. ints, floats, strings, and even tuples, provided their elements are also hashable), while values can be of any type. As with a list, entries do not have to be of the same type. A value can even be another dictionary! This can be a powerful way of keeping track of heterogeneous data. For example, we might have a dictionary of personal details.

In [17]:
people = {}
people['John'] = {'age': 20, 'height': 172, 'likes': ['Python', 'Dictionaries'], 'dislikes': ['Java', 'Ruby', 'Arrays'], 'favourite animal': 'python'}
people['Amy'] = {'age': 27, 'height': 167, 'likes': ['Java'], 'dislikes': ['Python', 'John'], 'favourite colour': 'magenta', 'favourite sport': 'hockey'}

In [19]:
people['John']

{'age': 20,
 'height': 172,
 'likes': ['Python', 'Dictionaries'],
 'dislikes': ['Java', 'Ruby', 'Arrays'],
 'favourite animal': 'python'}

In [20]:
people['John']['likes']

['Python', 'Dictionaries']
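
The earlier remark that tuples can serve as keys, while mutable types such as lists cannot, can be illustrated as follows. This is a sketch with made-up data; `grid`, `label` and `error_message` are hypothetical names.

```python
# Hypothetical example: grid coordinates stored as tuple keys.
grid = {(0, 0): "origin", (1, 2): "treasure"}
label = grid[(1, 2)]
print(label)

# A list is mutable and therefore unhashable, so it cannot be used as a key.
try:
    grid[[3, 4]] = "invalid"
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```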

### Exercise 1

Download the file 'Blosum62.txt' from Canvas. This contains the Blosum62 matrix for amino acid substitutions, where the i, j-th entry is the score associated with substituting amino acid i with amino acid j. Select a 4x4 subset of the matrix, including Alanine (A) and Glycine (G), and create a dictionary that returns the score for substituting one amino acid with another. Verify that your dictionary returns the correct score for substituting Alanine with Glycine.

In [None]:
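blosum = {}
# Fill in your 4x4 subset here.

# Once filled in, this returns the score for substituting Alanine with Glycine.
blosum['A']['G']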

### Exercise 2 (Optional, but recommended)

Load the contents of Blosum62.txt and, using a loop, create a dictionary to represent the entire Blosum62 matrix.


### Exercise 3

Download the file 'codon_table.txt' from Canvas. This contains the standard genetic code, mapping each DNA codon to an amino acid. Load the contents of this file and create a dictionary that maps each codon to the corresponding amino acid. Make sure the start and stop codons are mapped appropriately. Create a second dictionary that maps each amino acid to a tuple containing all codons that code for that amino acid, again ensuring the start and stop codons are mapped appropriately.

### Exercise 4

Referring back to Regex.ipynb from Lesson 5, create a dictionary that maps each DNA base to the corresponding mRNA base. Using this dictionary, write a function that transcribes a DNA sequence to an mRNA sequence. Using this dictionary and your dictionary from Exercise 3, write a piece of code to generate a dictionary that maps mRNA codons to their corresponding amino acid.


### Exercise 5

Using your dictionaries, write a function that detects whether a sequence is DNA or mRNA, locates the first start codon and the corresponding stop codon, and translates the sequence between the start and stop codon into an amino acid sequence. Hint: find the start codon, then break the following part of the sequence into codons to identify the stop codon.

In [None]:
sequence = "GTGCTCAATGGATAATACTGAGCTCGAGGTGGACTTCTATAGTTGCGTACACTCGATGAC"

## Classes

Referring back to Lesson 08 on multidimensional arrays and raster graphics, let's create a class to represent an image. If you've made it this far with time to spare, have a go at the following exercise, referring to the [documentation](https://docs.python.org/3/tutorial/classes.html#a-first-look-at-classes) as needed.

### Exercise 6

Here is a template for a class to represent a PBM image. Complete the \_\_init\_\_ method so that, when passed the magic number, title, dimensions, and pixels for a PBM image, it assigns each of these to an instance variable. Then modify your class so that you can create an empty instance of PBMImage by not passing any arguments.

Using your code from Lesson 08, load feep.pbm and use the values to create an instance of PBMImage.

In [1]:
class PBMImage:
    
    def __init__(self, magic_number, title, dimensions, pixels):
        pass

### Exercise 7

Give your class an instance method which reads a .pbm file and sets the instance variables to the values taken from the file.

### Exercise 8

Give your class an instance method which writes a correctly-formatted .pbm file to a user-specified file name.

### Exercise 9

Give your class an instance method which prints an informative message about the type and dimensions of the image.

### Exercise 10

Give your class an instance method which computes the histogram of the image and returns the result. 

### Exercise 11

Give your class an instance method which inverts the PBM image and returns the result as a new instance of PBMImage.

### Exercise 12

The \_\_str\_\_(self) instance method tells Python how to represent an instance of a class as a string, allowing you to print a meaningful representation of your object. Give your class a \_\_str\_\_(self) method which converts the PBM image into a human-readable string and returns the result. You could simply use the same format as the image file, or you might choose to omit the pixels and simply describe the format of the image. Check that you can print an instance of your class.
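
To show the mechanism without giving away the exercise, here is a minimal sketch on an unrelated, hypothetical class `Point`. It is not part of the PBM exercises and is only meant to show how `print` and `str` pick up `__str__`.

```python
class Point:
    """Hypothetical 2D point, used only to illustrate __str__."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        # Return (rather than print) the human-readable description.
        return f"Point(x={self.x}, y={self.y})"


p = Point(3, 4)
print(p)               # print() calls str(p), which calls p.__str__()
text = str(Point())    # the default arguments give an "empty" point
```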