# Dictionaries

The final major **data structure** we'll teach you in this class is called a Python **dictionary**. Essentially, a dictionary acts as a "lookup table" -- just like a real-life dictionary. If you're wondering what the definition of a particular word is (pretend this is 20ish years ago, before Google), you open up a dictionary around where you expect that word to be alphabetically, and flip the pages forwards or backwards until you find it. The definition is listed right next to that word.

Let's start with an example that hopefully illustrates how a dictionary can be useful in programming. Pretend you have a data file of occurrences that lists the locality and country where a particular species was found. It looks something like this:

| occurrenceID | locality | country |
|-------------|----------------|--------------------------|
| ABC001 | Washington, DC | United States of America |
| ABC002 | Mexico City | Mexico |
| ABC003 | New York City | USA |
| ABC004 | Toronto | Canada |
| ABC005 | Miami | USA |
| ... | ... | ... |

Through the magic of Python, you've parsed this file and now you have all of the values for the country column in a Python list.

In [None]:
countries = ['United States of America',
            'Mexico',
            'USA',
            'Canada',
            'USA']

Now, knowing what we've covered so far in class, how would you answer the following question:

*How many of these occurrences were in the USA?*

In [None]:
usa_counter = 0
for country in countries:
    if country == 'United States of America':
        usa_counter += 1
    elif country == 'USA':
        usa_counter += 1
print(usa_counter)

OK, we were able to figure that one out, but now imagine that the list is actually much longer, and that it contains a mix of country codes and full country names for other countries too. We could keep adding conditions, but trying to handle every variant with *if* statements quickly gets out of control.

A big advantage of programming is that we can write code once and use it over and over again. Let's say we know that this organism only exists in North America, so let's create a **dictionary** that stores the code and full country name for the countries of North America. The codes on the left are called dictionary **keys**, and the full names on the right are called dictionary **values**.

In [None]:
country_codes = {'CAN': 'Canada',
                 'MEX': 'Mexico',
                 'USA': 'United States of America'}

Now that we've created a Python dictionary, let's go through some of the things we can do with it.

To access a value, we use syntax similar to list indexing: put the key inside square brackets.

In [None]:
print(country_codes['USA'])
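
One thing to watch for: asking for a key that isn't in the dictionary raises a `KeyError`. Here is a minimal sketch of two common ways to check first, the `in` operator and the `.get()` method. The `'FRA'` code and the `'unknown'` fallback are made up for illustration and are not part of the class data.

```python
country_codes = {'CAN': 'Canada',
                 'MEX': 'Mexico',
                 'USA': 'United States of America'}

# 'in' checks whether a key exists (it does not look at the values)
print('USA' in country_codes)   # True
print('FRA' in country_codes)   # False

# .get() returns the value, or a fallback if the key is missing
known = country_codes.get('USA', 'unknown')
missing = country_codes.get('FRA', 'unknown')
print(known)
print(missing)

# Square brackets raise a KeyError when the key is missing
try:
    country_codes['FRA']
except KeyError as err:
    print('KeyError:', err)
```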

To add a new entry to a dictionary, assign a value to a new key.

*If the key is already in the dictionary, the value will be overwritten.*

In [None]:
country_codes['GTM'] = 'Guatemala'
print(country_codes['GTM'])
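
To see the overwriting behavior in action, here is a small sketch that assigns to a key that already exists. The shorter name `'United States'` is just an illustrative replacement value.

```python
country_codes = {'CAN': 'Canada',
                 'MEX': 'Mexico',
                 'USA': 'United States of America'}

size_before = len(country_codes)

# 'USA' is already a key, so this replaces its value instead of adding an entry
country_codes['USA'] = 'United States'

size_after = len(country_codes)
print(country_codes['USA'])
print(size_before, size_after)
```

The dictionary has the same number of entries before and after, because each key can appear only once.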

To combine dictionaries, we use the .update() method. Unfortunately, it's not as easy as dict1 + dict2.

In [None]:
more_country_codes = {'BLZ': 'Belize',
                      'SLV': 'El Salvador',
                      'HND': 'Honduras',
                      'NIC': 'Nicaragua',
                      'USA': 'United States of America'}
country_codes.update(more_country_codes)
print(country_codes)
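
Note that `.update()` changes the dictionary in place, and a key that appears in both dictionaries (like `'USA'` here) is stored only once. If you want to keep the original untouched, one option is to copy it first. A minimal sketch, using shortened versions of the dictionaries above; `combined` is a name introduced just for this example.

```python
country_codes = {'CAN': 'Canada',
                 'MEX': 'Mexico',
                 'USA': 'United States of America'}
more_country_codes = {'BLZ': 'Belize',
                      'USA': 'United States of America'}

# Copy first so the original dictionary is left unchanged
combined = country_codes.copy()
combined.update(more_country_codes)

print(len(country_codes))  # 3
print(len(combined))       # 4: 'USA' is in both, but is stored once
```

In Python 3.9 and later, `country_codes | more_country_codes` builds a merged dictionary in a single step.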

Running a for-loop on a dictionary by itself only gives you the keys.

In [None]:
for code in country_codes:
    print(code)

To access both the key and the value of each entry at the same time, use the `.items()` method.

In [None]:
for key, value in country_codes.items():
    print(key, value)
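
If you only care about the values, there is also a `.values()` method. The sketch below loops over the full names and shows that `in` on the dictionary itself checks keys, while `in` on `.values()` checks values. The variable names `is_known_name` and `is_known_key` are made up for this example.

```python
country_codes = {'CAN': 'Canada',
                 'MEX': 'Mexico',
                 'USA': 'United States of America'}

# Loop over the values only
for name in country_codes.values():
    print(name)

# 'Mexico' is a value, not a key
is_known_name = 'Mexico' in country_codes.values()
is_known_key = 'Mexico' in country_codes
print(is_known_name, is_known_key)
```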

Just like with lists, we can have *nested* dictionaries. Pretend that we also want to handle 2-letter country codes, but we want to keep them separate from the 3-letter codes.

In [None]:
two_letter_codes = {'BZ': 'Belize',
                    'CA': 'Canada',
                    'HN': 'Honduras',
                    'MX': 'Mexico',
                    'NI': 'Nicaragua',
                    'US': 'United States of America',
                    'SV': 'El Salvador',
                    'GT': 'Guatemala'}

In [None]:
country_lookup = {2: two_letter_codes,
                  3: country_codes}

In [None]:
import json
# Pretty-print the nested dictionary; JSON writes the integer keys 2 and 3 as strings
print(json.dumps(country_lookup, indent=2, sort_keys=True))
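
To pull a value out of a nested dictionary, chain two sets of brackets: the first picks the inner dictionary, the second picks the entry inside it. A minimal sketch, using cut-down versions of the dictionaries above (`three_letter_codes` stands in for `country_codes`):

```python
two_letter_codes = {'MX': 'Mexico',
                    'US': 'United States of America'}
three_letter_codes = {'MEX': 'Mexico',
                      'USA': 'United States of America'}

country_lookup = {2: two_letter_codes,
                  3: three_letter_codes}

# The outer key is the length of the code, the inner key is the code itself
print(country_lookup[2]['US'])
print(country_lookup[3]['MEX'])

# The length of the code can pick the right inner dictionary for us
code = 'MX'
name = country_lookup[len(code)][code]
print(name)
```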

Now let's bring it all together in a final example. 

We now have a bigger list that contains full names, 3-letter codes, and 2-letter codes. Also, we don't just want to count the number of USA samples, but we want a count for each country that shows up.

In [None]:
country_list = ['MEX','United States of America', 'Mexico', 'US', 'Canada', 'USA', 'NIC',
                'SV', 'El Salvador', 'CAN', 'GTM', 'Guatemala', 'BEL']

Here's a quick skeleton of how we would handle this problem, but it contains some bugs. Let's try and work through those as a class...

In [None]:
country_counter = {}
for country in country_list:
    if len(country) == 2:
        country = country_lookup[2][country]
    elif len(country) == 3:
        country = country_lookup[3][country]
    print(country)
    country_counter[country] += 1
        
print(country_counter)
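
Once you've had a go at the skeleton above, here is one possible way to fix it. This is a sketch, not the only answer. It assumes two things: that a key has to exist in `country_counter` before `+= 1` can work on it, and that any code we can't find (such as `'BEL'`, which is not in our 3-letter dictionary, where Belize is `'BLZ'`) should be counted under its original spelling rather than guessed at. The name `three_letter_codes` stands in for the updated `country_codes`.

```python
two_letter_codes = {'BZ': 'Belize', 'CA': 'Canada', 'HN': 'Honduras',
                    'MX': 'Mexico', 'NI': 'Nicaragua',
                    'US': 'United States of America',
                    'SV': 'El Salvador', 'GT': 'Guatemala'}
three_letter_codes = {'CAN': 'Canada', 'MEX': 'Mexico',
                      'USA': 'United States of America',
                      'GTM': 'Guatemala', 'BLZ': 'Belize',
                      'SLV': 'El Salvador', 'HND': 'Honduras',
                      'NIC': 'Nicaragua'}
country_lookup = {2: two_letter_codes,
                  3: three_letter_codes}

country_list = ['MEX', 'United States of America', 'Mexico', 'US', 'Canada',
                'USA', 'NIC', 'SV', 'El Salvador', 'CAN', 'GTM', 'Guatemala',
                'BEL']

country_counter = {}
for country in country_list:
    # Pick the lookup table by length; full names get an empty table
    codes = country_lookup.get(len(country), {})
    # Fall back to the original string for full names and unknown codes
    name = codes.get(country, country)
    # Start from 0 the first time we see a name, then add 1
    country_counter[name] = country_counter.get(name, 0) + 1

print(country_counter)
```

Using `.get()` with a default in both places avoids the `KeyError` that the skeleton runs into, and the leftover `'BEL'` entry in the result is a useful flag that the input data needs a closer look.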