# Dictionaries

## Topics
1. What is a dictionary
2. Using dictionaries
3. Some dictionary uses
4. Dictionary psets

# What is a dictionary?

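A dictionary is a hash table (https://en.wikipedia.org/wiki/Hash_table). It associates each key with a value and lets us look up the value for a given key quickly. Note that, unlike lists, dictionaries are not indexed by position, and in versions of Python before 3.7 they do not keep their items in any particular order (the outputs shown here come from such a version). Since Python 3.7, dictionaries preserve insertion order.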

In [2]:
'''
Let's take a quick look at how we can create dictionaries
and what they look like.
'''
myDict = {'key1': 'value1',
          'key2': 'value2',
          'key3': 'value3'}

print(myDict)
print(type(myDict))

{'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}
<type 'dict'>


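Dictionary keys must be hashable, which in practice means immutable objects such as strings, integers, floats, and tuples (a tuple is hashable only if everything inside it is hashable too).
Dictionary values can be any object, including lists and other dictionaries. Here are some examples.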

In [3]:
'''
Here, I create a dictionary with a variety of keys and values
'''
testDictA = {1: 'hi',
             2.0: [5,6,'c'],
             'a': myDict}

print(testDictA)

{'a': {'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}, 1: 'hi', 2.0: [5, 6, 'c']}


What happens if we use mutable objects as keys in dictionaries? By mutable objects, I mean structures like lists and dictionaries.

In [4]:
'''
Let's use a list as a key
'''
testDictB = {[2,3,4]:'a list'}

TypeError: unhashable type: 'list'

In [7]:
'''
Let's use a dictionary as a key
'''
anotherDict = {'a': 'hi'}

testDictC = {anotherDict: 'a dictionary'}

TypeError: unhashable type: 'dict'
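
If a key really needs to hold several items, one common workaround is to convert the list to a tuple first, since a tuple of hashable items is itself hashable. The sketch below is only an illustration; the names `coords`, `gridDict`, and `badDict` are made up for this example.

```python
# A list cannot be a key, but a tuple built from that list can
coords = [2, 3, 4]
gridDict = {tuple(coords): 'a tuple made from a list'}

print(gridDict[(2, 3, 4)])

# A tuple that contains a list is still unhashable
try:
    badDict = {(1, [2, 3]): 'will not work'}
except TypeError as err:
    print('TypeError:', err)
```

The tuple is a snapshot of the list: changing `coords` afterwards does not change the key.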

# Using Dictionaries

We have a dictionary. How do we use it?

In [8]:
'''
We have a dictionary. How can we get the value for a given key?
'''
myDict = {'key1': 'value1',
          'key2': 'value2',
          'key3': 'value3'}

# to get a value, use the dictionary[key] format
print(myDict['key1'])

value1
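
Asking for a key that is not in the dictionary with the `dictionary[key]` format raises a `KeyError`. The sketch below shows two standard ways to guard against that: the `in` operator and the `get` method. The dictionary `sampleDict` and the key `'key9'` are hypothetical, chosen only for illustration.

```python
sampleDict = {'key1': 'value1', 'key2': 'value2'}

# check whether a key exists before looking it up
hasKey9 = 'key9' in sampleDict
print(hasKey9)

# get() returns None (or a default that you supply) instead of raising KeyError
missing  = sampleDict.get('key9')
fallback = sampleDict.get('key9', 'no such key')
print(missing)
print(fallback)

# get() behaves like a normal lookup when the key is present
present = sampleDict.get('key1')
print(present)
```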


In [28]:
'''
But what if I want to get all the values? Or all the keys?
As with a list, we can loop through a dictionary
'''
### Getting Values ###
# looping through dictionary
for i in myDict:
    print(myDict[i])

# to get a list of values, use the values method
# (list() is needed in Python 3, where values() returns a view rather than a list)
dictVal = list(myDict.values())
print(dictVal)

### Getting Keys ###
# Looping through a dictionary gives you its keys directly
for i in myDict:
    print(i)

# to get a list of keys, you can use the keys method
# (list() is needed in Python 3, where keys() returns a view rather than a list)
dictKeys = list(myDict.keys())
print(dictKeys)

'''
You can be creative with this. If you want, you can get a list of values and then loop through that.
'''

value3
value2
value1
value4
['value3', 'value2', 'value1', 'value4']
key3
key2
key1
key4
['key3', 'key2', 'key1', 'key4']


In [13]:
'''
Now I want key and value pairs!
'''
for i in myDict:
    print(i, myDict[i])

('key3', 'value3')
('key2', 'value2')
('key1', 'value1')
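
Another way to get key and value pairs is the `items` method, which hands you both at once so you do not need a second lookup inside the loop. This is a small sketch; `pairDict` and `pairs` are names invented for the example.

```python
pairDict = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

pairs = []
# items() yields one (key, value) tuple per entry, unpacked here into two names
for key, value in pairDict.items():
    pairs.append((key, value))
    print(key, value)
```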


In [15]:
'''
I'm missing key4 and value4, so let's add them to myDict
'''
# to do this, simply use the following format: dictionary[newkey] = newvalue
myDict['key4'] = 'value5'

print(myDict)

{'key3': 'value3', 'key2': 'value2', 'key1': 'value1', 'key4': 'value5'}


In [16]:
'''
I accidentally assigned value5 to key4 when I actually meant value4.
You can change the value of a key by assigning a new value to that key
'''
myDict['key4'] = 'value4'

print(myDict)

{'key3': 'value3', 'key2': 'value2', 'key1': 'value1', 'key4': 'value4'}


In [29]:
'''
I actually want neither key2 nor its value in my dictionary.
Let's remove it using the del statement
'''
print(myDict)

del myDict['key2']

print(myDict)

{'key3': 'value3', 'key2': 'value2', 'key1': 'value1', 'key4': 'value4'}
{'key3': 'value3', 'key1': 'value1', 'key4': 'value4'}
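
If you also want the value that was removed, or you are not sure the key exists, the `pop` method may be a better fit than `del`. A minimal sketch, using a made-up dictionary `trimDict`:

```python
trimDict = {'key1': 'value1', 'key2': 'value2'}

# pop() removes the key and returns its value
removed = trimDict.pop('key2')
print(removed)
print(trimDict)

# del on a missing key raises KeyError; pop() with a default does not
notThere = trimDict.pop('key2', None)
print(notThere)
```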


In [7]:
'''
I have 2 lists of corresponding values, and I'm too lazy to rewrite my code.
How can I make a dictionary?
'''
# A dictionary can be made from a list of [key, value] pairs using the dict function
print([[1,2],[3,4]])
print(dict([[1,2],[3,4]]))

# 2 lists: the first three letters of the alphabet and their positions
listA = ['a','b','c']
listB = [1,2,3]

# the zip() function pairs up the elements of 2 lists as tuples
# (in Python 3 zip() returns an iterator, so wrap it in list() to get a list)
listOfTuples = list(zip(listA, listB))
print(listOfTuples)

# convert the "zipped" lists into a dictionary
print(dict(listOfTuples))

[[1, 2], [3, 4]]
{1: 2, 3: 4}
[('a', 1), ('b', 2), ('c', 3)]
{'a': 1, 'c': 3, 'b': 2}
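
A dictionary comprehension is another way to build a dictionary from existing lists, and it is handy when you want to transform the keys or values along the way. The sketch below uses hypothetical names (`letters`, `numbers`, `letterDict`, `squareDict`) and is just one way to do it.

```python
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

# same result as dict(zip(letters, numbers))
letterDict = {letter: number for letter, number in zip(letters, numbers)}
print(letterDict)

# the value can be computed from the key
squareDict = {number: number**2 for number in numbers}
print(squareDict)
```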


In addition to creating dictionaries and modifying their values, you can perform operations on the values themselves, as long as the type of the value allows it. Let's look at an example of appending elements to a list.

In [26]:
'''
We have a new dictionary of the letters of the alphabet, split into bins of 10 letters each.
However, we made a mistake and forgot to add the letter j to the first bin.
Let's fix that problem
'''
# I'm just generating the dictionary I described above here
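def bad_alpha(chunkSize, badLetter):
    alphabet  = 'abcdefghijklmnopqrstuvwxyz'
    # ceiling division, so no empty bin is created when chunkSize divides 26 evenly
    bins      = (len(alphabet) + chunkSize - 1)//chunkSize
    alphaDict = {}
    for i in range(bins):
        # slice out one chunk of letters and turn it into a list
        chunk = list(alphabet[i*chunkSize:(i + 1)*chunkSize])
        if badLetter in chunk:
            chunk.remove(badLetter)
        alphaDict[i+1] = chunk  # bins are numbered starting from 1
    return alphaDict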


alphaDict = bad_alpha(10, 'j')

print(alphaDict) # let's make sure that this worked


# let's add a 'j' to the first bin
alphaDict[1].append('j')

print(alphaDict)

{1: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'], 2: ['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't'], 3: ['u', 'v', 'w', 'x', 'y', 'z']}
{1: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], 2: ['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't'], 3: ['u', 'v', 'w', 'x', 'y', 'z']}
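
Appending to a list stored in a dictionary only works if the key already exists. When it might not, the `setdefault` method can create an empty list the first time a key is seen. The sketch below groups words by their first letter; the word list and the name `wordBins` are invented for this example.

```python
words = ['apple', 'avocado', 'banana', 'cherry', 'blueberry']

wordBins = {}
for word in words:
    # setdefault returns the existing list for this letter,
    # or stores and returns a new empty list if the letter is new
    wordBins.setdefault(word[0], []).append(word)

print(wordBins)
```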


# Nested Dictionaries

Nested dictionaries are those in which the value of a key is another dictionary. This can be useful when you're working with data that have multiple categories.

In [18]:
'''
We want to look at some descriptors for multiple organisms.
We can divide them by broad group, then by more specific categories, using nested dictionaries
'''
# fish and birds are not mammals, so the sub-dictionary is named for eukaryotes
eukaryoteDict = {'rat': 'small and furry',
                 'fish': 'tasty and aquatic',
                 'bird': 'not all fly'}
bacteriaDict = {'ecoli': 'some are bad',
                'listeria': 'uses actin'}

lifeDict = {'eukaryotes': eukaryoteDict,
            'bacteria' : bacteriaDict}

print(lifeDict)

# pull the sub-dictionary stored under 'eukaryotes'
print(lifeDict['eukaryotes'])

# pull data on listeria in life dictionary
print(lifeDict['bacteria']['listeria'])

{'bacteria': {'ecoli': 'some are bad', 'listeria': 'uses actin'}, 'eukaryotes': {'rat': 'small and furry', 'fish': 'tasty and aquatic', 'bird': 'not all fly'}}
{'rat': 'small and furry', 'fish': 'tasty and aquatic', 'bird': 'not all fly'}
uses actin
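
To visit every entry in a nested dictionary, you can nest one loop inside another: the outer loop walks the top-level keys and the inner loop walks each sub-dictionary. This is a sketch on a smaller, made-up copy of the data called `nestedDict`.

```python
nestedDict = {'eukaryotes': {'rat': 'small and furry'},
              'bacteria': {'ecoli': 'some are bad',
                           'listeria': 'uses actin'}}

descriptions = []
for group, organisms in nestedDict.items():
    # organisms is itself a dictionary, so we can loop over its items too
    for organism, description in organisms.items():
        descriptions.append((group, organism, description))
        print(group, organism, description)
```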


# Some Dictionary Uses

To reiterate, dictionaries are good for looking up the values associated with keys. Here, we go over some examples of how dictionaries can be used.

## Let's get a protein sequence from a cDNA sequence

Let's open cdna2prot.py and work from there. I'll put some snippets of code below to show how it works.

In [11]:
'''
Here's the codon table function
'''
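def codon_table():
    '''
    This function will create the codon table
    Returns a dictionary of codons and their corresponding AA
    '''
    bases = ['T', 'C', 'A', 'G']
    # all 64 three-letter combinations, in the same order as amino_acids below
    codons = [a+b+c for a in bases for b in bases for c in bases]
    amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
    # pair each codon with its amino acid ('*' marks a stop codon);
    # the local name differs from the function name to avoid shadowing it
    table = dict(zip(codons, amino_acids))

    return table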

codonTable = codon_table()

print(codonTable)

{'CTT': 'L', 'TAG': '*', 'ACA': 'T', 'ACG': 'T', 'ATC': 'I', 'AAC': 'N', 'ATA': 'I', 'AGG': 'R', 'CCT': 'P', 'ACT': 'T', 'AGC': 'S', 'AAG': 'K', 'AGA': 'R', 'CAT': 'H', 'AAT': 'N', 'ATT': 'I', 'CTG': 'L', 'CTA': 'L', 'CTC': 'L', 'CAC': 'H', 'AAA': 'K', 'CCG': 'P', 'AGT': 'S', 'CCA': 'P', 'CAA': 'Q', 'CCC': 'P', 'TAT': 'Y', 'GGT': 'G', 'TGT': 'C', 'CGA': 'R', 'CAG': 'Q', 'TCT': 'S', 'GAT': 'D', 'CGG': 'R', 'TTT': 'F', 'TGC': 'C', 'GGG': 'G', 'TGA': '*', 'GGA': 'G', 'TGG': 'W', 'GGC': 'G', 'TAC': 'Y', 'TTC': 'F', 'TCG': 'S', 'TTA': 'L', 'TTG': 'L', 'TCC': 'S', 'ACC': 'T', 'TAA': '*', 'GCA': 'A', 'GTA': 'V', 'GCC': 'A', 'GTC': 'V', 'GCG': 'A', 'GTG': 'V', 'GAG': 'E', 'GTT': 'V', 'GCT': 'A', 'GAC': 'D', 'CGT': 'R', 'GAA': 'E', 'TCA': 'S', 'ATG': 'M', 'CGC': 'R'}


In [13]:
'''
Let's see if the amino acid converter works
'''
def convert_AA(sequence = 'AAACCCGGGTTT'):
    # initialize codon table and amino acid sequence variable
    codonTable = codon_table()
    aaSeq      = []

    # loop through the sequence one codon (3 bases) at a time;
    # stopping at len(sequence) - 2 skips a trailing partial codon,
    # which would otherwise raise a KeyError
    for i in range(0, len(sequence) - 2, 3):
        codon     = sequence[i:i+3]
        aminoAcid = codonTable[codon]

        if aminoAcid != '*': # if the codon is not a stop codon, add AA
            aaSeq.append(aminoAcid)
        else: # if codon is a stop codon, stop the loop--we'll only translate up to this point
            break

    return ''.join(aaSeq)

print(convert_AA())

KPGF
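
Real sequences are not always clean: they may contain lowercase letters, ambiguous bases such as N, or a length that is not a multiple of 3. The sketch below is one possible variation on the converter, not the version in cdna2prot.py. The function names `build_codon_table` and `translate_safe` are hypothetical, and marking an unrecognized codon with 'X' is an assumption made for this example.

```python
def build_codon_table():
    # same construction as the codon table shown earlier
    bases = ['T', 'C', 'A', 'G']
    codons = [a + b + c for a in bases for b in bases for c in bases]
    amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
    return dict(zip(codons, amino_acids))

def translate_safe(sequence):
    table = build_codon_table()
    aaSeq = []
    # stopping at len(sequence) - 2 ignores a trailing partial codon
    for i in range(0, len(sequence) - 2, 3):
        codon = sequence[i:i+3].upper()
        # get() avoids a KeyError; 'X' (an assumption) marks an unknown codon
        aminoAcid = table.get(codon, 'X')
        if aminoAcid == '*':  # stop codon: end translation here
            break
        aaSeq.append(aminoAcid)
    return ''.join(aaSeq)

print(translate_safe('ATGAAATAGCCC'))  # stops at the TAG stop codon
print(translate_safe('atgnnnttt'))     # lowercase input with an unknown codon
print(translate_safe('ATGAA'))         # trailing partial codon is ignored
```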


# Dictionary psets