<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

Adapted by Sarah Connell, Dipa Desai, and Emre Tapan from a notebook created by [Nathan Kelber](http://nkelber.com) and Ted Lawless for [JSTOR Labs](https://labs.jstor.org/) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/). See [here](https://ithaka.github.io/tdm-notebooks/book/all-notebooks.html) for the original version. Some contents were adapted from teaching notebooks created by Laura Nelson, University of British Columbia, and from [Python for Everybody](https://www.py4e.com/). Warm thanks to Kate Kryder, Data Analysis & Visualization Specialist at Northeastern University, for helping to develop these notebooks.<br />
___

# Python Data Structures

So far, we've seen a few of Python's inbuilt data types: integers, floats, and strings. **We've also used lists without fully defining them. This lesson will talk about how lists store data**, and cover two additional data types: [tuples](https://constellate.org/docs/key-terms/#tuple) and [dictionaries](https://constellate.org/docs/key-terms/#dictionary). These help us store many values inside of a single variable.

The fundamental difference between a tuple and a dictionary is that a tuple stores items in sequential order (starting from 0) while a dictionary stores items in key/value pairs. When we want to retrieve an item in a tuple, we use an index number or a set of index numbers as a reference. When we want to retrieve an item from a dictionary, we supply a key that returns the value (or set of values) associated with that key. Another important difference is that tuples are **immutable**, which means they can't be changed, but dictionaries are **mutable** and can be changed after they are initialized.

Each of these approaches can be beneficial depending on what kind of data we are working with, and what we intend to do with the data.

For your class assignments using Python, you will be asked to retrieve and change data that is stored in mutable data containers like lists and dictionaries. We will purposefully be spending more time on lists and dictionaries to get you ready for these assignments.

## Lists

In our last workshop, we used iterations to execute commands on each element within a list. We saw that lists can store integers, float values, and strings. We also saw that lists are changeable, or **mutable**. 

Lists can store anywhere from zero to millions of values, and they store these values in order.

To initialize a list, we use a list definition statement, in the following format: 

`my_list = [value0, value1, value2, value3, ...]`

Since data in lists are stored in a set order, individual values can be referenced by the value's [index number](https://constellate.org/docs/key-terms/#index-number), or position in the list. 

Recall that in the last workshop, we used `for` loops to count each value in a list, and we saw that Python begins counting the elements in a list at index number 0.


In [None]:
#Let's initialize a list. 
my_list = [85, 89, 81, 90, 96, 100]

So, if we want to retrieve only `value1` in `my_list`, we would use the following code block:

`my_list[index number]`

In [None]:
# Retrieving an item in a list
my_list[1]  #We want to retrieve the second element in the list. Since the index number begins at 0, we refer to index number 1 to retrieve the second element in the list.

What happens when you change the index number to 3? Try it for yourself!

We can also retrieve slices of lists by referring to the index numbers. To do this, we use the following code block:

`my_list[starting index number : ending index number]`

This is especially useful when you want to retrieve a subset of data from a larger list of values. 

In [None]:
#Retrieving a slice from a list
my_list[1:4] 

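In the above code, the second index number is the stopping point: the slice includes the item at the starting index number but stops just before the item at the ending index number. This can be confusing if we were expecting 4 values as the output. One easy way to keep track of this is to subtract the starting index number from the ending index number to check how many values should be in the output: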

(4 - 1 = 3)
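
A quick way to confirm this rule is to compare the length of a slice with the difference of its index numbers. The sketch below re-creates `my_list` so that it runs on its own; `start`, `stop`, and `grade_slice` are names chosen just for this illustration.

```python
# Re-create the list so this cell runs on its own
my_list = [85, 89, 81, 90, 96, 100]

start = 1   # first index number included in the slice
stop = 4    # stopping point: the item at this index is NOT included

grade_slice = my_list[start:stop]
print(grade_slice)                       # the items at index 1, 2, and 3
print(len(grade_slice) == stop - start)  # True when both index numbers fall inside the list
```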


We saw last week that we can modify lists. To replace an individual element in a list, we can refer to the element's index number and use an assignment statement to re-assign the element's value. Let's take a look at this.

For example, let's say `my_list` represents your grades over the past few assignments. You did some extra credit to get 5 points added to your lowest grade. Let's use the index number to specify the lowest value in `my_list` and tell Python to re-assign it with a new value.

In [None]:
my_list = [85, 89, 81, 90, 96, 100] #The lowest value has an index number of 2.
my_list[2]= (81+5) #Re-assigning the value to be 81+5
my_list #Recall my_list to double-check the change was executed
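
In the cell above, we found the lowest grade by eye. For a longer list, one possible approach is to let Python find it. This sketch uses the built-in `min()` function and the list method `index()`, neither of which has been introduced in this lesson; `lowest_grade` and `lowest_index` are illustrative names.

```python
my_list = [85, 89, 81, 90, 96, 100]

lowest_grade = min(my_list)                 # smallest value in the list
lowest_index = my_list.index(lowest_grade)  # position of the first item equal to that value

my_list[lowest_index] = lowest_grade + 5    # add the 5 extra-credit points
print(lowest_index, my_list)
```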

We can also use the len function on lists to check how long our lists are. This is useful if you have been making changes to your lists and want to keep track of how many data elements are stored in your list.

In [None]:
#Check how many elements are in your list with the len function
len(my_list)

Let's move on to looking at tuples! We will see in the next section that lists and tuples look similar, but a key difference between lists and tuples is that lists are mutable. Tuples are not changeable after they are initialized.

## Tuples

A tuple can store anywhere from zero to millions of items. The items that can be stored in a tuple include the data types we have already learned: integers, floats, and strings—and a tuple may contain different data types. A tuple assignment statement takes the form:

`my_tuple = (item0, item1, item2, item3)`

with the items separated by commas and the tuple enclosed in parentheses.

In [None]:
# A tuple containing integers
my_favorite_numbers = (17, 19, 100)
print(my_favorite_numbers)

In [None]:
# A tuple containing strings
my_favorite_philosophers = ('Margaret Cavendish', 'Mary Wollstonecraft', 'Hannah Arendt')
print(my_favorite_philosophers)

Both `my_favorite_numbers` and `my_favorite_philosophers` have three items, but we could have also initialized them with no items (`my_favorite_numbers = ()`) or with many more items. Each item has an [index number](https://constellate.org/docs/key-terms/#index-number) that depends on its position in the tuple. The first item is 0, the second item is 1, the third item is 2, etc. In the `my_favorite_philosophers` tuple, `'Hannah Arendt'` is item 2.

To retrieve an item from a tuple, we put the name of the tuple, followed by the index number for the item we want to retrieve in square brackets.

`Retrieving an item in a tuple:`
  
  &nbsp;&nbsp;&nbsp;&nbsp; `tuple name[index number]`

In [None]:
# Retrieving an item in a tuple
my_favorite_philosophers[2]

What do you think will happen if we change the index number to 1? What about 3? 

We can retrieve a group of consecutive items from a tuple using [slices](https://constellate.org/docs/key-terms/#slice) instead of a single index number. We create a **slice** by indicating a starting and ending index number, separated by a colon.

`Taking a slice of a tuple:`
  
  &nbsp;&nbsp;&nbsp;&nbsp; `tuple name[starting index number: ending index number]`

The slice contains the items from our starting index number up to, but not including, our stopping index number.

In [None]:
# Taking a slice of a tuple
historical_periods = ('Classical Antiquity',
                      'Early Middle Ages', 
                      'High Middle Ages', 
                      'Late Middle Ages', 
                      'Early Modern Period', 
                      'Late Modern Period', 
                      'Contemporary History')
historical_periods[3:5]

Notice that, again, the second index in a slice is the stopping point. Remember a way to check this is by subtracting the indexes in your head (5 - 3 = 2 items).

It is not uncommon for tuples to be hundreds or thousands of items long.  If you want to know the length of a tuple, you can use the `len()` function.

In [None]:
# Using the len() function to discover the number of items in the tuple
len(historical_periods)
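
Unlike the list example earlier, re-assigning an item in a tuple does not work. As a small sketch of what immutability means in practice, the cell below tries to replace one item and catches the resulting error. The `try`/`except` statement has not been covered in these lessons; it is used here only so the cell finishes running. `short_periods` and `error_message` are illustrative names.

```python
short_periods = ('Classical Antiquity', 'Early Middle Ages', 'High Middle Ages')

try:
    short_periods[0] = 'Ancient History'   # tuples do not support item assignment
    error_message = None
except TypeError as error:
    error_message = str(error)
    print("Python raised a TypeError:", error_message)

print(short_periods)   # the tuple is unchanged
```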

## Dictionaries

Like a tuple, a [dictionary](https://constellate.org/docs/key-terms/#dictionary) can hold many values within a single variable. We have seen that the items of a tuple are stored in a strictly-ordered fashion, starting from item 0. In a dictionary, each [value](https://constellate.org/docs/key-terms/#key-value-pair) is stored in relation to a descriptive [key](https://constellate.org/docs/key-terms/#key-value-pair) forming a [key/value pair](https://constellate.org/docs/key-terms/#key-value-pair). This structure makes dictionaries very useful, because you can supply a key and receive a value without needing to refer to a specific index number. You are not allowed to have duplicate keys in Python dictionaries; each key can be used only once.

Whereas a tuple is typed with parentheses `()`, a dictionary is typed with braces `{}`.  The following examples show how key/value pairs can be used to store different kinds of data in a dictionary.

`example_dictionary = {keyA : valueA, keyB : valueB, keyC : valueC}`

`menu_dictionary = {itemA : priceA, itemB : priceB, ...}`

`gradebook_dictionary = {nameA : gradeA, nameB : gradeB, ...}`

Another important difference between dictionaries and tuples is that dictionaries are **mutable**, meaning that data in dictionaries can be changed after the dictionary is initialized. Tuples, on the other hand, cannot be changed after being initialized. 

The values in dictionaries can be any data type. You can have a mix of different data types within a dictionary. The keys can be most data types, including integers, floats, and strings, but not mutable types such as lists or other dictionaries.
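
To make the rules about keys concrete, here is a short sketch; the dictionary names are invented for illustration. It shows a dictionary whose keys mix data types, and what happens when the same key is typed twice in a definition: Python does not raise an error, it simply keeps the last value given for that key.

```python
# Keys of different data types in one dictionary
mixed_keys = {1: 'an integer key', 2.5: 'a float key', 'three': 'a string key'}
print(mixed_keys['three'])

# The key 'Hash' appears twice; only the last value is kept
duplicate_example = {'Hash': 14.50, 'Museli': 6.50, 'Hash': 15.00}
print(duplicate_example)
print(len(duplicate_example))
```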

Here is an example dictionary with the menu items from a restaurant as **keys** and their prices as **values**. 


In [None]:
# An example of a dictionary storing menu items and prices
breakfast_menu ={
 'Breakfast Sandwich': 9.75,
 'Croissant Breakfast Sandwich': 11.0,
 'Biscuit Sandwich': 9.0,
 'Spinach, Sunchoke, & Egg Plate': 11.0,
 'Salmon, Avocado, and Egg Sandwich': 11.50,
 'Scrambled Egg Plate': 9.75,
 'Museli': 6.50,
 'Hash': 14.50,
 'Egg in a Hole': 12,
 'Croque Madame': 13.50,
 'Bread & Butter': 6.0}

In [None]:
# Let's take a look at what we just created
print(breakfast_menu)

In [None]:
# If you don't like the look of that, you can use this code instead
# We'll talk more about what is happening here in our lesson on importing functions
from pprint import pprint
pprint(breakfast_menu)

We relied on order to reference the items in our tuples by their index numbers. Dictionaries are different. We use the keys to look up corresponding values, in this format:
`dictionary_name[key]`

For example: 

In [None]:
breakfast_menu['Breakfast Sandwich']

The key `'Breakfast Sandwich'` always maps to the value `9.75` so the order of the items doesn’t matter.

This is very different from tuples, where changing the order of the items will also change the item that is retrieved for a particular index number. 

For example:

In [None]:
# Re-initializing my_favorite_philosophers if this is a new session
my_favorite_philosophers = ('Margaret Cavendish', 'Mary Wollstonecraft', 'Hannah Arendt')

In [None]:
# Retrieving an item from the tuple
my_favorite_philosophers[0]

In [None]:
# Overwriting the tuple with a different order of items
my_favorite_philosophers = ('Hannah Arendt', 'Mary Wollstonecraft', 'Margaret Cavendish')


In [None]:
# Retrieving a different item with the same index number
my_favorite_philosophers[0]

Try it yourself! In the code block below, call up the values for `Museli` and for `Egg in a Hole`. Make sure to pay attention to the capitalization–remember that everything is case sensitive!

In [None]:
# Fill in your code here
# What happens if you enter a key that is not in our dictionary?

We noted above that dictionaries, unlike tuples, are mutable: we can change them. For example, we might want to add a key/value pair to our dictionary, like so:

In [None]:
breakfast_menu['Waffle'] = 12.00

In [None]:
# When we print breakfast_menu, we can see that it has been updated:
print(breakfast_menu)

# Or, un-comment-out the line below to use the "pretty print" function
# You will need to comment out the first line, unless you want to print the dictionary twice!
# And, be careful about the indentation!
#pprint(breakfast_menu)

You can also change the values in a dictionary. Maybe we want to raise the price of the Scrambled Egg Plate:

In [None]:
breakfast_menu['Scrambled Egg Plate'] = 20.00

In [None]:
# In this code block, call up the value of the 'Scrambled Egg Plate' key to confirm our change


The `len` function works on dictionaries; it returns the number of key/value pairs:

In [None]:
len(breakfast_menu)

We can use the `type` function with dictionaries and tuples, just like we did with other data types.

In [None]:
# Run this to check the type of a tuple, then check the type of breakfast_menu to check a dictionary
type(my_favorite_philosophers)


### Writing a generalized function to modify a data container

You may want to use a function multiple times to change a data container like a dictionary. You can write a generalized function, a function that is written to accept different inputs. 

Recall that in the last workshop, we learned that writing a function first requires us to define a function name. Then, we write the code block of the task that we want the function to execute.

You may want a generalized function to add new key/value pairs to an existing dictionary. For example, let's add a new menu item to the `breakfast_menu`. In the code blocks below, we will define a function called `update_breakfast_menu` to add Pancakes that cost 11 dollars to our original `breakfast_menu` and return the updated menu. Let's look at a specific example of how to add Pancakes and their price to the menu, and then write our generalized function.

In [None]:
#Write code to add Pancakes that cost $10 to the breakfast_menu dictionary
key = 'Pancakes'
value = 10 #Note that we are adding Pancakes that cost $10 in this step, so when we run our generalized function, we can check to see if the value is updated.
breakfast_menu[key] = value

In [None]:
#Writing a generalized function to add key/value pairs to a dictionary
def update_breakfast_menu(key, value):   #Define the function name and inputs
  breakfast_menu[key] = value    #Add the input key/value pair to the breakfast_menu
  return breakfast_menu    #Return the updated breakfast_menu

update_breakfast_menu('Pancakes', 11)    #Run the function to add Pancakes that cost $11 to the breakfast_menu to see if the price has been updated



In the `update_breakfast_menu` function, you specify the key/value pair you want to add to the original `breakfast_menu` dictionary. You can re-run this function with different key/value pairs to add multiple new menu items and prices.

Generalized functions can also be written to modify the values in a dictionary. For example, let's say the cost of avocados has gone up, and you need to double the price of the Salmon, Avocado, and Egg Sandwich in the `breakfast_menu`.

In the following generalized function, you specify the function inputs, telling the function to multiply the value of a specific key/value pair in the dictionary. You can re-use this function to change the costs of different key/value pairs by changing the function inputs.

In [None]:
#Writing a generalized function to change a value in a dictionary
def double_price(key, value, multiplier):   #Define the function name and inputs
  breakfast_menu[key] = value * multiplier  #Multiply the input value by the multiplier and store the result under the input key
  return breakfast_menu    #Return the updated breakfast_menu

double_price('Salmon, Avocado, and Egg Sandwich', 11.5, 2)    #Run the function with specified inputs to double the price of the Salmon, Avocado, and Egg Sandwich
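
The function above takes the current price as one of its inputs. A possible variation, sketched here under assumed names, looks up the current price in the dictionary itself, so the caller only supplies the key and the multiplier. `multiply_price` and `sample_menu` are hypothetical names used for illustration, and the dictionary is passed in as an input so the function could be used on any menu.

```python
def multiply_price(menu, key, multiplier):
    # Look up the current price, multiply it, and store the result under the same key
    menu[key] = menu[key] * multiplier
    return menu

sample_menu = {'Salmon, Avocado, and Egg Sandwich': 11.50, 'Hash': 14.50}
multiply_price(sample_menu, 'Salmon, Avocado, and Egg Sandwich', 2)
print(sample_menu)
```

Because this version reads the stored price, running it a second time would double the price again, whereas the version above always starts from the value you pass in.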


## `for` loops and dictionaries
In last week's workshop, we learned how to use for loops to iterate a task over a list. You can also write `for` loops to iterate a task over tuples and dictionaries. 

By default, if you put a dictionary into a `for` loop, it will iterate over the keys in that dictionary. Here's an example of how that works.

In [None]:
short_menu ={
 'Breakfast Sandwich': 9.75,
 'Croissant Breakfast Sandwich': 11.00,
 'Biscuit Sandwich': 9.00}

for key in short_menu:
    print(key)

`key` is the iteration variable in the block above. Confirm this for yourself by updating the code below to use a different name for your iteration variable (make sure to change both instances!).

In [None]:
# Update this code to use a different iteration variable
short_menu ={
 'Breakfast Sandwich': 9.75,
 'Croissant Breakfast Sandwich': 11.00,
 'Biscuit Sandwich': 9.00}

for sandwich in short_menu:
    print(sandwich)
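
The same loop pattern works for tuples, where the iteration variable takes each item in order. A minimal sketch, using an illustrative tuple and the made-up names `philosopher` and `count`:

```python
philosophers = ('Margaret Cavendish', 'Mary Wollstonecraft', 'Hannah Arendt')

count = 0   # keeps track of how many items we have seen
for philosopher in philosophers:
    count = count + 1
    print(count, philosopher)
```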

We can also use the tools we learned earlier in the lesson to work with dictionaries. For example, we might want to retrieve values, instead of keys: 

In [None]:
# A program that prints the prices of menu items
short_menu ={
 'Breakfast Sandwich': 9.75,
 'Croissant Breakfast Sandwich': 11.00,
 'Biscuit Sandwich': 9.00}

for key in short_menu:
    print("Price: $",short_menu[key])

Or, pulling these together, we might want both values and keys:

In [None]:
# A program that prints the names and prices of menu items
short_menu ={
 'Breakfast Sandwich': 9.75,
 'Croissant Breakfast Sandwich': 11.00,
 'Biscuit Sandwich': 9.00}

for key in short_menu:
    print("For our " + key + ", the price is $" + str(short_menu[key])) # We put the `short_menu[key]` inside `str()` to turn the float data type into a string and print the phrase
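
Another way to get both keys and values is the dictionary method `items()`, which this lesson has not introduced; the sketch below is offered as an optional alternative. Each pass through the loop assigns one key/value pair to two iteration variables, here given the illustrative names `item` and `price`. The `total` variable is also an addition for this example.

```python
short_menu = {
    'Breakfast Sandwich': 9.75,
    'Croissant Breakfast Sandwich': 11.00,
    'Biscuit Sandwich': 9.00}

total = 0   # running total of all the prices
for item, price in short_menu.items():
    print("For our " + item + ", the price is $" + str(price))
    total = total + price

print("Total: $" + str(total))
```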
    

## Sets 

Our focus here has been on lists, tuples, and dictionaries, but we are briefly going to discuss one more data type: **sets**. Sets are the last of the four data types in Python that can store collections of data. To recap: these are tuples, lists, sets, and dictionaries.

Sets are expressed like so:

`example_set = {"item1", "item2", "item3"}`

Sets are **unordered** and **unindexed**. That means that Python does not guarantee the order in which the items of a set are stored or displayed, so you should not rely on it. You cannot refer to the items in a set by index numbers, as you do with tuples. You'll also note from our example that the items in a set are not keyed, as with dictionaries. A set also stores each distinct value only once, so you are not allowed to have duplicate values.

You cannot change the items in a set once the set has been initialized. However, you can remove items or add new ones. Taking our generic example, this means that we could remove `item3` or add `item4`, but we couldn't change `item1` to `item_one`.

Sets may contain a mix of different data types. However, sets cannot contain **mutable** data types like dictionaries or lists.
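
As a brief sketch of these rules, the cell below adds and removes items using the set methods `add()` and `remove()`, and then shows that Python refuses to store a list inside a set. The set contents are illustrative, and the `try`/`except` statement (not covered in this lesson) is used only so the cell finishes running.

```python
example_set = {"item1", "item2", "item3"}

example_set.remove("item3")   # remove an existing item
example_set.add("item4")      # add a new item
print(example_set)            # the printed order may vary

try:
    example_set.add(["a", "list"])   # lists are mutable, so they cannot be set items
    list_was_added = True
except TypeError:
    list_was_added = False
    print("A list cannot be added to a set")
```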

In [None]:
# Initializing an example set
my_favorites = {"green", 19, "lily of the valley", "blackberries", "cookie dough"}

In [None]:
# We can use the `len()` function to determine the length of a set
len(my_favorites)

In [None]:
# And, we can use `type()` as well
type(my_favorites)

One thing that makes sets useful is that they do not allow repeat items: this means that putting items into a set is an effective way of removing any duplicates. Sets are also useful for certain math operations that we aren't going to get into here, but that you'll see in the future.

In [None]:
# If you do put duplicate values in a set, they will be ignored
my_absolute_favorites = {"green", 19, "lily of the valley", "blackberries", "cookie dough","green"}
print(my_absolute_favorites)
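
The same idea can be used to remove duplicates from a list: passing the list to the built-in `set()` function keeps one copy of each value. This is a minimal sketch with made-up data; `sorted()` is used at the end only to get a list back in a predictable order, since a set does not keep one.

```python
towns_visited = ["Boston", "Malden", "Boston", "Cambridge", "Malden"]

unique_towns = set(towns_visited)   # duplicates are dropped
print(len(towns_visited), len(unique_towns))

unique_list = sorted(unique_towns)  # back to a list, in alphabetical order
print(unique_list)
```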

That's it on lists, tuples, dictionaries, and sets! In our next lesson, we'll cover how to write functions with conditional statements that can perform calculations on a certain subset of data.

# Practice Exercises
As with the previous two lessons, you should first try running the quick exercises in this notebook, and practice making changes and testing their results. 

**Exercise One**

Initialize a tuple called `class_topics` that contains the following strings, in this exact order: "Introduction to the Course", "Types of Inference", "Probability Theory", "Programming Fundamentals", "Bayesian Inference", "Probabilistic Graphical Models", "Information Theory", and "Advanced Topics".

In [None]:
# Initialize the class_topics tuple here

Now, using its index number, retrieve "Probability Theory" from the `class_topics` tuple.

In [None]:
# Fill in your code here

Finally, take a slice that retrieves "Bayesian Inference", "Probabilistic Graphical Models", and "Information Theory".

In [None]:
# Fill in your code here

**Exercise Two**

Initialize a dictionary called `snowfall_totals` with the following key/value pairs:
<br>Boston: 24.5
<br>Brookline: 15
<br>Cambridge: 14
<br>Framingham: 12.2
<br>Malden: 20
<br>Wakefield: 21.2

Make sure to change the town names to strings!

In [None]:
# Initialize your dictionary 
# Then, use `len` to check the length


Now, add a new key/value pair to your dictionary:
<br>Norwood: 19.5

In [None]:
# Add the new key/value pair for Norwood's snowfall
# Then use `len` to check the length


This just in! Wakefield actually got 22.4 inches of snowfall. Update the value in the code block below. 

In [None]:
# Update the value for Wakefield to 22.4


Finally, call up the value of Wakefield to confirm that your change went through. 

In [None]:
# Call up the value of Wakefield


**Exercise Three**

Below is a dictionary with information on the Boston subway system, lightly modified for convenience. Each **key** is a tuple with two strings: the line and the direction (in Boston, "C" can be a direction). Each **value** is a string with the last stop of that line and direction.

In [None]:
t_stops = {("orange", "north"):"Oak Grove",
           ("orange", "south"):"Forest Hills",
           ("blue", "north"):"Wonderland",
           ("blue", "south"):"Bowdoin",
           ("red", "north"):"Alewife",
           ("red", "southeast"):"Braintree",
           ("red", "southwest"):"Mattapan",          
           ("green", "north"):"Lechmere",
           ("green", "B"):"Boston College",
           ("green", "C"):"Cleveland Circle",
           ("green", "D"):"Riverside",
           ("green", "E"):"Heath Street"}

Now, we will construct a function using a for loop that iterates through the t_stops dictionary and prints each tuple key.

**Hint**: For a model, look back at the program earlier in this notebook that prints the prices from a menu.

In [None]:
# This code stores two user inputs as strings in t_line and t_heading
t_line = input("What are the major subway lines of the MBTA? ")
t_heading = input("Which directions do the major lines run? ")

# Fill in your code here to write a function that prints all the tuple keys from the t_stops dictionary

# Solutions
These are some sample solutions, but (as we've already noted) you might have taken a different approach. 

In [None]:
# Exercise One
# Initialize the class_topics tuple
class_topics = ("Introduction to the Course", "Types of Inference", "Probability Theory", "Programming Fundamentals", "Bayesian Inference", "Probabilistic Graphical Models", "Information Theory", "Advanced Topics")

In [None]:
# Exercise One
# Retrieve "Probability Theory"
class_topics[2]

In [None]:
# Exercise One
# Retrieve "Bayesian Inference", "Probabilistic Graphical Models", and "Information Theory"
class_topics[4:7]

In [None]:
# Exercise Two
# Initialize your dictionary 
snowfall_totals = {"Boston": 24.5,
"Brookline": 15,
"Cambridge": 14,
"Framingham": 12.2,
"Malden": 20,
"Wakefield": 21.2}

In [None]:
# Exercise Two
# Check the length of the dictionary
len(snowfall_totals)

In [None]:
# Exercise Two
# Add the new key/value pair for Norwood's snowfall 
# Then use `len` to check the length 
snowfall_totals['Norwood'] = 19.5
len(snowfall_totals)

In [None]:
# Exercise Two
# Update the value for Wakefield to 22.4
snowfall_totals['Wakefield'] = 22.4

In [None]:
# Exercise Two
# Call up the value for Wakefield
snowfall_totals['Wakefield']

In [None]:
# Exercise Three
# Initialize the t_stops dictionary
t_stops = {("orange", "north"):"Oak Grove",
           ("orange", "south"):"Forest Hills",
           ("blue", "north"):"Wonderland",
           ("blue", "south"):"Bowdoin",
           ("red", "north"):"Alewife",
           ("red", "southeast"):"Braintree",
           ("red", "southwest"):"Mattapan",          
           ("green", "north"):"Lechmere",
           ("green", "B"):"Boston College",
           ("green", "C"):"Cleveland Circle",
           ("green", "D"):"Riverside",
           ("green", "E"):"Heath Street",}

# Store two user inputs as strings in t_line and t_heading
t_line = input("What are the major subway lines of the MBTA? ")
t_heading = input("Which directions do the major lines run? ")

# A function that prints every tuple key, and its last stop, from the dictionary it is given
def print_direction(stops):
  for key in stops:
    print(key, stops[key])
print_direction(t_stops)