# Module 3: Working with data - Dictionaries and Tuples



In this lesson, we will learn how to use Python to work with information stored in files. Solving the world's problems requires having the right information at hand to make decisions. In our camp we will use data (information) that has been collected by other people and stored in special files that we can use.

Before we dive into data files, we need to understand a little bit about how Python can help us use and understand information. One way to do this is to use the lists that we covered in Module 1. Another is to use *tuples* to group things and *dictionaries* to relate information together.

Let's start by looking at an example problem. Imagine we have health information collected from several people and are trying to understand the relationships between smoking and cancer. Let's say we have the following data:

| Name    | Age | Location  | Smoker? | Has Cancer? |
|---------|-----|-----------|:-------:|:-----------:|
| Alice   | 50  | Tennessee |    Y    |      Y      |
| Bob     | 55  | Arkansas  |    N    |      Y      |
| Charlie | 33  | Tennessee |    Y    |      N      |


How do we use this data in Python? There isn't an obvious way to do this with the tools we have so far, but we *could* make do. Check out this approach:

## The hard way.

In [1]:
names     = [ "Alice", "Bob", "Charlie" ]
ages      = [ 50, 55, 33 ]
locations = [ "Tennessee", "Arkansas", "Tennessee" ]
smokes    = [ True, False, True ]
cancer    = [ True, True, False ]

We created *five* lists, one for each column of the table. Let's see how we can work with this data. Let's try to print the people in our table that are smokers. We'll need to use the `range` and `len` functions and a loop.

In [2]:
for index in range(len(smokes)):
    if (smokes[index] == True):
        print(names[index])


Alice
Charlie


Let's try to unpack all of this. First let's look at `len(smokes)`. This should give us back the number 3, since there are three entries in the `smokes[]` list.

In [3]:
len(smokes)

3

When we pass the number 3 to `range`, it produces a sequence of numbers: 0, 1, 2. There are three numbers, starting at zero. For a list with 3 elements, the last element in the list will be element 2. (This is why `range` stops one short of the number we provide.) So let's take a look at our loop structure now:

In [4]:
for index in range(len(smokes)):
    print(index)

0
1
2
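
To see all of the numbers that `range` produces at once, one option is to wrap it in `list`. This is just a small sketch; the variable name `numbers` is made up for illustration.

```python
# range(3) produces 0, 1, 2 -- it starts at zero and stops before 3
numbers = list(range(3))
print(numbers)
```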


When we use `smokes[index]` we are looking at that element in the list. Let's change our loop to look at the values explicitly:

In [5]:
for index in range(len(smokes)):
    print(smokes[index])

True
False
True


Now we can use an `if` statement to check whether the current value in the list (the one at position `index`) is true or not. If it is true, then the person is a smoker.

The first elements of all five lists correspond to Alice, the second elements go with Bob, etc.

So, if the first and third elements of `smokes[]` are true, that means that Alice and Charlie are the smokers. We can use the same index values 0 and 2 (for the first and third list items) to get their names from the `names[]` list.

We could change this just a little bit to get a count of the number of smokers in our data set. Notice that here we're changing the `if` statement slightly: since the things in the `smokes[]` list are already `True` or `False`, we can skip the `== True` part.

In [6]:
count = 0
for index in range(len(smokes)):
    if (smokes[index]):
        count = count + 1
print("The number of smokers is",count)

The number of smokers is 2
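
To see how the five lists line up, here is a small sketch that gathers everything we know about one person using a single index. The names `index` and `bob_info` are made up for this example.

```python
names     = [ "Alice", "Bob", "Charlie" ]
ages      = [ 50, 55, 33 ]
locations = [ "Tennessee", "Arkansas", "Tennessee" ]
smokes    = [ True, False, True ]
cancer    = [ True, True, False ]

index = 1   # the second entry in every list belongs to Bob

# we have to visit all five lists to collect one row of the table
bob_info = [ names[index], ages[index], locations[index], smokes[index], cancer[index] ]
print(bob_info)
```

Notice that one row of the table is spread across five different variables, which is easy to get wrong as the data grows.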


## The easier way -- Using tuples.

Keeping track of multiple lists is tricky and can be complicated if we have a lot of data to work with. To help us keep things simple, we can use two other features in Python. 

The first feature is called a *tuple*, and is used to group related information together. We can create a tuple variable for Alice like this:

In [7]:
aTuple = ("Alice", 50, "Tennessee", True, True)

We can access a part of the tuple much like a list. Starting from 0, we can pick the parts of the tuple we want.

In [8]:
print(aTuple[0], "is", aTuple[1], "years old and is from", aTuple[2])

Alice is 50 years old and is from Tennessee


We can also take a tuple and break it back out into its component parts:

In [9]:
(name, age, loc, smokes, cancer) = aTuple   # note: smokes and cancer now hold Alice's values, not the lists from before
print(name, "is", age, "years old and is from", loc)

Alice is 50 years old and is from Tennessee


With tuples, we can store an entire row of our initial data table in a single variable. We *could* make a big list of tuples and work with that. Let's try it and see how it would look to print out the names of all of the smokers. First we have to create our data structure. Normally we can get this data from a file, so we won't have to type it in ourselves.

In [10]:
people_list = [ ("Alice", 50, "Tennessee", True, True),
                ("Bob", 55, "Arkansas", False, True),
                ("Charlie", 33, "Tennessee", True, False)
              ]
print(people_list)

[('Alice', 50, 'Tennessee', True, True), ('Bob', 55, 'Arkansas', False, True), ('Charlie', 33, 'Tennessee', True, False)]


Now let's write a little bit of Python to print the names of the smokers:

In [11]:
for tup in people_list:    # for every tuple in the list
    if tup[3]:             # if the fourth field in the tuple (smokes) is true
        print(tup[0])

Alice
Charlie


Not bad, eh? We can even make it a little easier to read, by combining the things we've learned. Instead of setting `tup` to be the whole tuple, let's break it out into parts:

In [12]:
for (name, age, loc, smokes, cancer) in people_list:
    if smokes:
        print(name)

Alice
Charlie
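
The counting example from the parallel-lists approach can be written in the same style. A minimal sketch, reusing the same three rows of data:

```python
people_list = [ ("Alice", 50, "Tennessee", True, True),
                ("Bob", 55, "Arkansas", False, True),
                ("Charlie", 33, "Tennessee", True, False) ]

count = 0
for (name, age, loc, smokes, cancer) in people_list:
    if smokes:                  # smokes is already True or False
        count = count + 1
print("The number of smokers is", count)
```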


## Dictionaries

The last big Python feature we'll look at before we start working on some real data sets is the *dictionary*. A dictionary that you might use for school is a collection of words and their definitions. 

In Python, dictionaries do a very similar thing, but we don't have to limit ourselves to words and definitions. A dictionary contains a set of *keys* and each key is matched to a *value*. In a regular dictionary, the word we're looking up is the key, and the definition is the value. Let's see how this works by using a Python dictionary as a regular dictionary. There are a couple of ways to do it.

In [None]:
dictionary = {}      # create an empty dictionary
dictionary['duck'] = "A waterbird with a broad blunt bill, short legs, webbed feet, and a waddling gait."

print("The definition of duck is: '", dictionary['duck'],"'")

We can also create a dictionary with multiple entries at once. We just need to provide both the key and value for each entry in the dictionary.

In [None]:
mascots = { 'Rhodes' : 'Lynx', 'Memphis' : 'Tigers', 'CBU' : 'Buccaneers' }

print("The mascot for Memphis is the", mascots['Memphis'])

One of the best things about dictionaries is that you can find things in them by name, instead of having to remember the position number (index) that we have to use with tuples and lists.

In [None]:
mascotList = [ 'Lynx', 'Tigers', 'Buccaneers' ]

print("The mascot for Memphis is the", mascotList[1])

Same result, but with a dictionary we can refer to the Tigers as the Memphis mascot, while with a list we have to remember that Memphis is the second entry in the list (index 1).
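
One thing to watch for: asking a dictionary for a key it doesn't have causes an error (a `KeyError`). A small sketch of one way to check first, using Python's `in` keyword. The school `'Vanderbilt'` is only here as an example of a key that is missing from `mascots`.

```python
mascots = { 'Rhodes' : 'Lynx', 'Memphis' : 'Tigers', 'CBU' : 'Buccaneers' }

for school in [ 'Memphis', 'Vanderbilt' ]:
    if school in mascots:       # True only if school is one of the keys
        print("The mascot for", school, "is the", mascots[school])
    else:
        print("We don't have a mascot for", school)
```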

Finally, let's go back to our original data table and create it with a dictionary of tuples, instead of a list of tuples.

In [None]:
people_dict = { "Alice" : (50, "Tennessee", True, True), 
                "Bob" : (55, "Arkansas", False, True), 
                "Charlie" : (33, "Tennessee", True, False) 
              }

print(people_dict['Bob'])

To loop over a dictionary, we need a new helper, `keys()`, which gives us all the keys in the dictionary:

In [None]:
people_dict.keys()
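
In Python 3, `keys()` hands back a special "view" of the keys rather than a plain list. To see the keys as an ordinary list, one option is to wrap the call in `list`. This is a sketch; the name `key_list` is made up for illustration.

```python
people_dict = { "Alice" : (50, "Tennessee", True, True),
                "Bob" : (55, "Arkansas", False, True),
                "Charlie" : (33, "Tennessee", True, False)
              }

key_list = list(people_dict.keys())   # turn the keys into a regular list
print(key_list)
```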

Now we can use this in a loop to go over all of the keys (names) in the dictionary.

In [None]:
for person in people_dict.keys():
    print(person, "is", people_dict[person][0], "years old") # the zero is the first part of the tuple, the age

If we wanted to print out all of the smokers just like our list/tuple example, it would look like this:

In [None]:
for person in people_dict.keys():
    (age, loc, smokes, cancer) = people_dict[person]  # break out the tuple into parts
    if smokes:                                        # if the current person smokes
        print(person)                                 #   print out their name
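
A possible variation: the dictionary helper `items()` gives back each key together with its value, so we can skip the extra lookup. This sketch also collects the names in a list called `smoker_names` (a name made up for this example) instead of only printing them.

```python
people_dict = { "Alice" : (50, "Tennessee", True, True),
                "Bob" : (55, "Arkansas", False, True),
                "Charlie" : (33, "Tennessee", True, False)
              }

smoker_names = []
for (person, info) in people_dict.items():   # person is the key, info is the tuple
    (age, loc, smokes, cancer) = info        # break out the tuple into parts
    if smokes:
        smoker_names.append(person)

print(smoker_names)
```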

## Congrats! You are a Python Master!