<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

Adapted by Sarah Connell from a notebook created by [Nathan Kelber](http://nkelber.com) and Ted Lawless for [JSTOR Labs](https://labs.jstor.org/) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/). See [here](https://ithaka.github.io/tdm-notebooks/book/all-notebooks.html) for the original version. Some contents were adapted from teaching notebooks created by Laura Nelson, University of British Columbia, and from [Python for Everybody](https://www.py4e.com/). Warm thanks to Kate Kryder, Data Analysis & Visualization Specialist at Northeastern University, for helping to develop these notebooks.<br />
___

# Python Data Structures

So far, we've seen a few of Python's built-in data types: integers, floats, strings, and Booleans. This lesson will cover two additional data types: [tuples](https://constellate.org/docs/key-terms/#tuple) and [dictionaries](https://constellate.org/docs/key-terms/#dictionary). These help us store many values inside of a single variable.

The fundamental difference between a tuple and a dictionary is that a tuple stores items in sequential order (starting from 0) while a dictionary stores items in key/value pairs. When we want to retrieve an item in a tuple, we use an index number or a set of index numbers as a reference. When we want to retrieve an item from a dictionary, we supply a key that returns the value (or set of values) associated with that key. Another important difference is that tuples are **immutable**, which means they can't be changed, but dictionaries are **mutable** and can be changed after they are initialized.

Each of these approaches can be beneficial depending on what kind of data we are working with, and what we intend to do with the data.

## Tuples

A tuple can store anywhere from zero to millions of items. The items that can be stored in a tuple include the data types we have already learned: integers, floats, and strings. A single tuple may also contain a mix of different data types. A tuple assignment statement takes the form:

`my_tuple = (item1, item2, item3, item4)`

with the items separated by commas and the tuple enclosed in parentheses.

In [None]:
# A tuple containing integers
my_favorite_numbers = (17, 19, 100)
print(my_favorite_numbers)

In [None]:
# A tuple containing strings
my_favorite_philosophers = ('Margaret Cavendish', 'Mary Wollstonecraft', 'Hannah Arendt')
print(my_favorite_philosophers)

Both `my_favorite_numbers` and `my_favorite_philosophers` have three items, but we could also have initialized them with no items (`my_favorite_numbers = ()`) or with many more items. Each item has an **index number** determined by its position in the tuple. The first item is 0, the second item is 1, the third item is 2, etc. In the `my_favorite_philosophers` tuple, `'Hannah Arendt'` is item 2.
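
As a small sketch of the points above, the cell below builds an empty tuple and a tuple that mixes data types. The names `empty_tuple` and `mixed_tuple` and their contents are made up for illustration.

```python
# An empty tuple: nothing between the parentheses
empty_tuple = ()

# A tuple that mixes a string, an integer, and a float
mixed_tuple = ('green', 19, 2.5)

print(empty_tuple)
print(mixed_tuple)

# Each item keeps its own data type
print(type(mixed_tuple[0]))
print(type(mixed_tuple[1]))
print(type(mixed_tuple[2]))
```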

To retrieve an item from a tuple, we put the name of the tuple, followed by the index number for the item we want to retrieve in square brackets.

In [None]:
# Retrieving an item in a tuple
my_favorite_philosophers[2]

What do you think will happen if we change the index number to 1? What about 3? 

Tuples can also contain other tuples. To retrieve a value from a tuple within a tuple, we use two indexes (or indices).

In [None]:
# Retrieving an item from a tuple within a tuple
my_favorite_philosophers = (('Margaret Cavendish', 'Mary Wollstonecraft', 'Hannah Arendt'),
                            ('Anne Conway', 'Mary Astell', 'Judith Butler'))
my_favorite_philosophers[0][2]

How would you change the index above to retrieve Mary Astell?

We can retrieve a group of consecutive items from a tuple using [slices](https://constellate.org/docs/key-terms/#slice) instead of a single index number. We create a **slice** by indicating a starting and ending index number, separated by a colon. The slice contains all the items from the starting index up to, but not including, the stopping index.

In [None]:
# Taking a slice of a tuple
historical_periods = ('Classical Antiquity',
                      'Early Middle Ages', 
                      'High Middle Ages', 
                      'Late Middle Ages', 
                      'Early Modern Period', 
                      'Late Modern Period', 
                      'Contemporary History')
historical_periods[3:5]

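Notice that the second index in a slice is the stopping point: the slice stops just before the item at that index, so `historical_periods[3:5]` returns the items at index 3 and index 4 but not the item at index 5. This can be confusing if you were expecting three items instead of two. One way to remember how many items you will get is to subtract the indexes in your head (5 - 3 = 2 items).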
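
To see the stopping point in action, here is a minimal sketch that repeats the slice and then looks up the item at the stopping index on its own. The variable name `late_periods` is made up for this example; the tuple is the same one defined above.

```python
historical_periods = ('Classical Antiquity',
                      'Early Middle Ages',
                      'High Middle Ages',
                      'Late Middle Ages',
                      'Early Modern Period',
                      'Late Modern Period',
                      'Contemporary History')

# The slice holds the items at index 3 and index 4
late_periods = historical_periods[3:5]
print(late_periods)

# The item at index 5 is the stopping point, so it is left out of the slice
print(historical_periods[5])
```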

It is not uncommon for tuples to be hundreds or thousands of items long.  If you want to know the length of a tuple, you can use the `len()` function.

In [None]:
# Using the len() function to discover the number of items in the tuple
len(historical_periods)

### The `in` and `not in` Operators

If we have a long tuple, it may be helpful to check whether a value is in the tuple. We can do this with the `in` and `not in` operators, which return a boolean value: **True** or **False**.

In [None]:
# Checking whether an item is in a tuple using the `in` operator

# Create a tuple called `restaurants_near_northeastern`
restaurants_near_northeastern = ('B.GOOD',
 'Starbucks',
 'Dunkin Donuts',
 'Amelias Taqueria',
 'Tatte',
 'Sweet Tomatoes',
 'Mamacita',
 'Kigo Kitchen',
 'QDOBA',
 'Popeyes',
 'University House of Pizza',
 'Boston Shawarma',
 'Gyroscope',
 'Our House East',
 'Caffe Strega',
 'Ginger Exchange',
 'Pho and I',
 'Panera')

# Check whether a restaurant is in restaurants_near_northeastern
'Tatte' in restaurants_near_northeastern

True

In [None]:
# Is "Wagamama" in restaurants_near_northeastern? 
# What about "Qdoba"? 
# What happens if we change `in` to `not in`?


**Note:** We won't be covering them in this lesson, but if you continue working with Python you will probably encounter another important data type: **lists**. Lists are very similar to tuples; the main difference is that lists are **mutable**, so they can be changed after they are initialized, while tuples are **immutable** and cannot. If you want to learn more about lists, see [here](https://www.w3schools.com/python/python_lists.asp).
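
If you are curious what immutability looks like in practice, the sketch below tries the same change on a list and on a tuple. The names `periods_list` and `periods_tuple` are made up for illustration. The cell also uses `try` and `except`, which this lesson does not cover; they are here only so that the cell keeps running after the error.

```python
periods_list = ['Early Middle Ages', 'High Middle Ages']
periods_tuple = ('Early Middle Ages', 'High Middle Ages')

# Lists are mutable, so we can assign a new value to an index
periods_list[0] = 'Classical Antiquity'
print(periods_list)

# Tuples are immutable, so the same assignment raises a TypeError
try:
    periods_tuple[0] = 'Classical Antiquity'
except TypeError as error:
    print('TypeError:', error)

# The tuple is unchanged
print(periods_tuple)
```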

## Dictionaries

Like a tuple, a [dictionary](https://docs.constellate.org/key-terms/#dictionary) can hold many values within a single variable. We have seen that the items of a tuple are stored in a strictly ordered fashion, starting from item 0. In a dictionary, each [value](https://docs.constellate.org/key-terms/#key-value-pair) is stored in relation to a descriptive [key](https://docs.constellate.org/key-terms/#key-value-pair), forming a [key/value pair](https://docs.constellate.org/key-terms/#key-value-pair). This structure makes dictionaries very useful, because you can supply a key and receive a value without needing to refer to a specific index number. Keys in a Python dictionary must be unique: each key can appear only once, and assigning a value to an existing key replaces the old value.

Whereas a tuple is typed with parentheses `()`, a dictionary is typed with braces `{}`.  

`example_dictionary = {key1 : value1, key2 : value2, key3 : value3}`

Another important difference between dictionaries and tuples is that dictionaries are mutable: they can be changed after they are initialized. 

The values in dictionaries can be any data type. The keys can be most data types, including integers, floats, and strings; however, dictionary keys cannot be **mutable** data types like lists or dictionaries. You can have a mix of different data types within a dictionary.
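
The rules about keys can be sketched in a few lines. The dictionary `drink_prices` and its contents are hypothetical, invented for this example, and the `try` and `except` lines (not covered in this lesson) are there only so the cell keeps running after the error.

```python
# Strings and tuples are immutable, so both can be keys
drink_prices = {'Tea': 3.0, ('Tea', 'large'): 3.5}
print(drink_prices[('Tea', 'large')])

# If the same key is typed twice, only the last value is kept
repeated_key = {'Tea': 3.0, 'Tea': 3.5}
print(repeated_key)

# Lists are mutable, so using one as a key raises a TypeError
list_key_failed = False
try:
    bad_prices = {['Tea', 'large']: 3.5}
except TypeError as error:
    list_key_failed = True
    print('TypeError:', error)
```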

Here is an example dictionary with the menu items from a restaurant as **keys** and their prices as **values**. 


In [None]:
# An example of a dictionary storing menu items and prices
breakfast_menu = {
 'Breakfast Sandwich': 9.75,
 'Croissant Breakfast Sandwich': 11.0,
 'Biscuit Sandwich': 9.0,
 'Spinach, Sunchoke, & Egg Plate': 11.0,
 'Salmon, Avocado, and Egg Sandwich': 11.50,
 'Scrambled Egg Plate': 9.75,
 'Museli': 6.50,
 'Hash': 14.50,
 'Egg in a Hole': 12,
 'Croque Madame': 13.50,
 'Bread & Butter': 6.0}

In [None]:
# Let's take a look at what we just created
print(breakfast_menu)

In [None]:
# If you don't like the look of that, you can use this code instead
# We'll talk more about what is happening here in our lesson on importing functions
from pprint import pprint
pprint(breakfast_menu)

We relied on order to reference the items in our tuples by their index numbers. Dictionaries are different. We use the keys to look up corresponding values, in this format:
`dictionary_name[key]`

For example: 

In [None]:
breakfast_menu['Breakfast Sandwich']

The key `'Breakfast Sandwich'` always maps to the value `9.75` so the order of the items doesn’t matter.

This is very different from tuples, where changing the order of the items will also change the item that is retrieved for a particular index number. 

For example:

In [None]:
# Re-initializing my_favorite_philosophers if this is a new session
my_favorite_philosophers = ('Margaret Cavendish', 'Mary Wollstonecraft', 'Hannah Arendt')

In [None]:
# Retrieving an item from the tuple
my_favorite_philosophers[0]

'Margaret Cavendish'

In [None]:
# Overwriting the tuple with a different order of items
my_favorite_philosophers = ('Hannah Arendt', 'Mary Wollstonecraft', 'Margaret Cavendish')


In [None]:
# Retrieving a different item with the same index number
my_favorite_philosophers[0]

Try it yourself! In the code block below, call up the values for `Museli` and for `Egg in a Hole`. Make sure to pay attention to the capitalization; remember that dictionary keys are case sensitive!

In [None]:
# Fill in your code here
# What happens if you enter a key that is not in our dictionary?

We noted above that dictionaries, unlike tuples, are mutable: we can change them. For example, we might want to add a key/value pair to our dictionary, like so:

In [None]:
breakfast_menu['Waffle'] = 12.00

In [None]:
# When we print breakfast_menu, we can see that it has been updated:
print(breakfast_menu)

# Or, un-comment-out the line below to use the "pretty print" function
# You will need to comment out the first line, unless you want to print the dictionary twice!
# And, be careful about the indentation!
#pprint(breakfast_menu)

You can also change the values in a dictionary. Maybe we want to raise the price of the Scrambled Egg Plate:

In [None]:
breakfast_menu['Scrambled Egg Plate'] = 20.00

In [None]:
# In this code block, call up the value of the 'Scrambled Egg Plate' key to confirm our change


We can use the `type` function with dictionaries and tuples, just like we did with other data types.

In [None]:
# Run this to check the type of a tuple, then check the type of breakfast_menu to check a dictionary
type(my_favorite_philosophers)

The `len` function works on dictionaries; it returns the number of key/value pairs:

In [None]:
len(breakfast_menu)

The `in` operator works on dictionaries; it tells you whether something appears as a key in the dictionary (appearing as a value is not good enough).

In [None]:
'Hash' in breakfast_menu

In [None]:
9.75 in breakfast_menu

**Note**: You won't need it for this project, but if you're curious, here is how you would check to see if a value is in a dictionary:

In [None]:
9.75 in breakfast_menu.values()

We can also use dictionaries and tuples in conditional statements. For example, the code below initializes a variable called `menu_item` and then uses an `if` statement to check whether `menu_item` is a key in the `breakfast_menu` dictionary. 

If it is, the code prints "The price is $" followed by the value associated with `menu_item`. If `menu_item` is not a key in the `breakfast_menu` dictionary, the code prints instead "Sorry, this is not in our menu!"

**Note**: This example shows a simpler approach to printing than concatenation: we pass the string and the value to `print()` as two arguments separated by a comma, and `print()` puts a space between them. This means that we don't need to convert the value we are retrieving to a string before printing it. However, if the space after the dollar sign bothers you, you could fix this by using concatenation instead.

In [None]:
# Run this, then try changing menu_item to look up a different string
menu_item = 'Hash'
if menu_item in breakfast_menu:
    print('The price is $',breakfast_menu[menu_item])
else:
    print('Sorry, this is not in our menu!')
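
If the space after the dollar sign bothers you, here is one way the same program might look with concatenation. This is only a sketch: `short_menu` is a made-up, shortened stand-in for `breakfast_menu` so that the cell runs on its own, and `message` is a name introduced for illustration.

```python
# A shortened stand-in for breakfast_menu
short_menu = {'Hash': 14.50, 'Museli': 6.50}

menu_item = 'Hash'
if menu_item in short_menu:
    # str() converts the float to a string so that + can join the two pieces
    message = 'The price is $' + str(short_menu[menu_item])
else:
    message = 'Sorry, this is not in our menu!'
print(message)
```

Notice that `str()` turns `14.50` into `'14.5'`, so the trailing zero is not printed.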

The code above might also have been written using the `input()` function to make it a bit more flexible:

In [None]:
# Using `input()` to initialize the menu_item variable
menu_item = input("What would you like for breakfast? ")
if menu_item in breakfast_menu:
    print('The price is $',breakfast_menu[menu_item])
else:
    print('Sorry, this is not in our menu!')

As we saw above, Python data structures can be nested within each other. For example, you might use tuples as the keys in a dictionary.

**Note**: Just a quick reminder that mutable data types like lists and dictionaries cannot be keys in dictionaries.

In [None]:
breakfast_menu_sizes = {
 ('Breakfast Sandwich', 'regular'): 9.75,
 ('Breakfast Sandwich', 'large'): 10.75,
 ('Croissant Breakfast Sandwich', 'regular'): 11.00,
 ('Croissant Breakfast Sandwich', 'large'): 12.00,
 ('Biscuit Sandwich', 'regular'): 9.00,
 ('Biscuit Sandwich', 'large'): 10.00,}

The code above initializes a dictionary that contains several key/value pairs. Each of the keys is a tuple, which itself contains two strings. Each of the values is a float. This kind of nesting can be confusing at first, but it gets easier if you read through the structure carefully and you remember how the different data types are expressed.

You can use the same tools you've already learned to retrieve or modify the items in this dictionary. For example:

In [None]:
breakfast_menu_sizes[('Breakfast Sandwich', 'regular')]
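
Modifying works the same way as it did with string keys. In the sketch below, `size_prices` is a made-up, shortened stand-in for `breakfast_menu_sizes`, and the new prices are invented for illustration.

```python
# A shortened stand-in for breakfast_menu_sizes
size_prices = {
    ('Biscuit Sandwich', 'regular'): 9.00,
    ('Biscuit Sandwich', 'large'): 10.00}

# Change the value for an existing tuple key
size_prices[('Biscuit Sandwich', 'large')] = 10.50

# Add a new key/value pair with a tuple key
size_prices[('Biscuit Sandwich', 'small')] = 8.00

print(size_prices[('Biscuit Sandwich', 'large')])
print(len(size_prices))
```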

Or, you might have dictionaries that contain other dictionaries. These are called **nested** dictionaries. 

For example, each of the values in `cavendish_publications` below is itself a dictionary. 

In [None]:
# Initialize the cavendish_publications dictionary
cavendish_publications = {
  "observations" : {
    "name" : "Observations Upon Experimental Philosophy",
    "year" : 1667,
    "genre" : "natural philosophy"
  },
  "william" : {
    "name" : "The Life of the Thrice Noble, High and Puissant Prince William Cavendishe",
    "year" : 1666,
    "genre": "biography"
  },
  "blazing" : {
    "name" : "The Blazing World",
    "year" : 1666,
    "genre": "science fiction"
  }
}

In [None]:
# You can retrieve items from nested dictionaries like so
cavendish_publications['blazing']['genre']

## Sets 

Our focus here has been on tuples and dictionaries, but we are briefly going to discuss one more data type: **sets**. Sets are the last of the four main built-in data types in Python that can store collections of data. To recap: these are tuples, lists, sets, and dictionaries.

Sets are expressed like so:

`example_set = {"item1", "item2", "item3"}`

Sets are **unordered** and **unindexed**. That means Python does not guarantee any particular order for the items in a set, so they may appear in a different order from the one you typed. You cannot refer to the items in a set by index numbers, as you do with tuples. You'll also note from our example that the items in a set are not keyed, as with dictionaries. Finally, every item in a set must be unique: you are not allowed to have duplicate values.

You cannot change an item in a set in place once the set has been initialized. However, you can remove items or add new ones. Taking our generic example, this means that we could remove `item3` or add `item4`, but we couldn't directly change `item1` to `item_one`; we would have to remove `item1` and then add `item_one`.

Sets may contain a mix of different data types. However, sets cannot contain **mutable** data types like dictionaries or lists.

In [None]:
# Initializing an example set
my_favorites = {"green", 19, "lily of the valley", "blackberries", "cookie dough"}

In [None]:
# We can use the `len()` function to determine the length of a set
len(my_favorites)

In [None]:
# And, we can use `type()` as well
type(my_favorites)

One thing that makes sets useful is that they do not allow repeat items: this means that putting items into a set is an effective way of removing any duplicates. Sets are also useful for certain math operations that we aren't going to get into here but that you'll see in the future.

In [None]:
# If you do put duplicate values in a set, they will be ignored
my_absolute_favorites = {"green", 19, "lily of the valley", "blackberries", "cookie dough","green"}
print(my_absolute_favorites)
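
Here is a brief sketch of adding and removing items, and of using a set to drop duplicates from a tuple. It relies on the set methods `add()` and `remove()` and on the built-in `set()` function, none of which are covered elsewhere in this lesson. The `visits` tuple is hypothetical data made up for this example.

```python
example_set = {"item1", "item2", "item3"}

# Remove one item and add another
example_set.remove("item3")
example_set.add("item4")
print(example_set)  # the order of the items may vary

# Converting a tuple to a set drops the repeated items
visits = ('Tatte', 'Panera', 'Tatte', 'Starbucks', 'Panera')
unique_visits = set(visits)
print(len(visits))
print(len(unique_visits))
```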

That's it on tuples, dictionaries, sets, and lists! In our next lesson, we'll cover how to write code that can iterate through these data structures and perform repeated tasks.

# Practice Exercises
As with the previous two lessons, you should first try running the quick exercises in this notebook, and practice making changes and testing their results. 

**Exercise One**

Initialize a tuple called `class_topics` that contains the following strings, in this exact order: "Introduction to the Course", "Types of Inference", "Probability Theory", "Programming Fundamentals", "Bayesian Inference", "Probabilistic Graphical Models", "Information Theory", and "Advanced Topics".

In [None]:
# Initialize the class_topics tuple here

Now, using its index number, retrieve "Probability Theory" from the `class_topics` tuple.

In [None]:
# Fill in your code here

Finally, take a slice that retrieves "Bayesian Inference", "Probabilistic Graphical Models", and "Information Theory".

In [None]:
# Fill in your code here

**Exercise Two**

Initialize a dictionary called `snowfall_totals` with the following key/value pairs:
<br>Boston: 24.5
<br>Brookline: 15
<br>Cambridge: 14
<br>Framingham: 12.2
<br>Malden: 20
<br>Wakefield: 21.2

Make sure to change the town names to strings!

In [None]:
# Initialize your dictionary 
# Then, use `len` to check the length


Now, add a new key/value pair to your dictionary:
<br>Norwood: 19.5

In [None]:
# Add the new key/value pair for Norwood's snowfall
# Then use `len` to check the length


This just in! Wakefield actually got 22.4 inches of snowfall. Update the value in the code block below. 

In [None]:
# Update the value for Wakefield to 22.4


Finally, call up the value of Wakefield to confirm that your change went through. 

In [None]:
# Call up the value of Wakefield


**Exercise Three**

Below is a dictionary with information on the Boston subway system, lightly modified for convenience. Each **key** is a tuple with two strings: the line and the direction (in Boston, "C" can be a direction). Each **value** is a string with the last stop of that line and direction.

In [None]:
t_stops = {("orange", "north"):"Oak Grove",
           ("orange", "south"):"Forest Hills",
           ("blue", "north"):"Wonderland",
           ("blue", "south"):"Bowdoin",
           ("red", "north"):"Alewife",
           ("red", "southeast"):"Braintree",
           ("red", "southwest"):"Mattapan",          
           ("green", "north"):"Lechmere",
           ("green", "B"):"Boston College",
           ("green", "C"):"Cleveland Circle",
           ("green", "D"):"Riverside",
           ("green", "E"):"Heath Street"}

Now, we will construct a program using `if` and `else` that does one of two things:
1. If the user input matches a key from the dictionary, the program prints out "Your last stop is " concatenated with the value for that key.
2. If the user input does not match a key from the dictionary, the program prints "Are you sure you're on the T?"

We've provided some code to get you started with initializing the `my_direction` variable based on user input. You might want to run the template code first and then print `my_direction` to get a better sense of what you will be working with.

**Hint**: You can look above in this notebook to the program for printing prices based on menu items for a model.

In [None]:
# This code will initialize the my_direction tuple based on user input
my_line = input("What line are you on? ")
my_heading = input("Which way are you going? ")
my_direction = (my_line, my_heading) 

# Fill in your code here to write a program that either tells the user their last stop or asks if they are on the T

# Solutions
These are some sample solutions, but (as we've already noted) you might have taken a different approach. 

In [None]:
# Exercise One
# Initialize the class_topics tuple
class_topics = ("Introduction to the Course", "Types of Inference", "Probability Theory", "Programming Fundamentals", "Bayesian Inference", "Probabilistic Graphical Models", "Information Theory", "Advanced Topics")

In [None]:
# Exercise One
# Retrieve "Probability Theory"
class_topics[2]

In [None]:
# Exercise One
# Retrieve "Bayesian Inference", "Probabilistic Graphical Models", and "Information Theory"
class_topics[4:7]

In [None]:
# Exercise Two
# Initialize your dictionary 
snowfall_totals = {"Boston": 24.5,
"Brookline": 15,
"Cambridge": 14,
"Framingham": 12.2,
"Malden": 20,
"Wakefield": 21.2}

In [None]:
# Exercise Two
# Check the length of the dictionary
len(snowfall_totals)

In [None]:
# Exercise Two
# Add the new key/value pair for Norwood's snowfall 
# Then use `len` to check the length 
snowfall_totals['Norwood'] = 19.5
len(snowfall_totals)

In [None]:
# Exercise Two
# Update the value for Wakefield to 22.4
snowfall_totals['Wakefield'] = 22.4

In [None]:
# Exercise Two
# Call up the value for Wakefield
snowfall_totals['Wakefield']

In [None]:
# Exercise Three
# Initialize the t_stops dictionary
t_stops = {("orange", "north"):"Oak Grove",
           ("orange", "south"):"Forest Hills",
           ("blue", "north"):"Wonderland",
           ("blue", "south"):"Bowdoin",
           ("red", "north"):"Alewife",
           ("red", "southeast"):"Braintree",
           ("red", "southwest"):"Mattapan",          
           ("green", "north"):"Lechmere",
           ("green", "B"):"Boston College",
           ("green", "C"):"Cleveland Circle",
           ("green", "D"):"Riverside",
           ("green", "E"):"Heath Street",}

# Initialize the my_direction tuple based on user input
my_line = input("What line are you on? ")
my_heading = input("Which way are you going? ")
my_direction = (my_line, my_heading)

# Use `if` and `else` to either inform the user of their last stop or ask if they are on the T
if my_direction in t_stops:
    print("Your last stop is " + t_stops[my_direction]) 
else:
    print("Are you sure you're on the T?")