<a href="https://colab.research.google.com/github/Princeton-CDH/python4poets/blob/main/3_Lists_and_dictionaries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 3. Data Types II: Putting Things Together -- or, on Lists and Dictionaries

This lesson describes a second group of data types, grouped together here because they collect (and structure) individual data entries. Specifically, we will look in more detail at two kinds of data collections:

*   Lists;
*   and Dictionaries.


There are also two more common data types, which we will touch upon only briefly:

*   Tuples;
*   and Sets.




## 3.1 Lists



Lists (`list`) are among the data types that can be used to store collections of data. Specifically, lists differ from the other three data types introduced above in the following ways:

*   lists are **ordered**;
*   lists are **changeable**;
*   and lists allow **duplicate values**.

We'll go over these in more detail below.

The general syntax for declaring lists is as follows:

In [None]:
# declare a list
list_1 = []


Pay attention to the use of square brackets `[]` to form the list. But the example above forms an *empty* list. Let's check the *contents* and the *type* of that list: 

In [None]:
print(list_1)
type(list_1)

[]


list

So yes, we formed a list that is... empty. Perhaps not too useful, but it's a start!

To actually include something in a list, we'd have to add it, either when declaring the list, or later on:


*   Declare a list with content like this:
`list_2 = ["item_1", "item_2"]` (Note that separate items in a list are separated by a comma `,`)

*  Add new items to an existing list like this: `list_2.append("item_3")` 

The latter is what we call a *method*. Essentially, it's "[a function that 'belongs to' an object](https://docs.python.org/3/tutorial/classes.html#instance-objects)." We'll learn more about functions later, but for now, note how the method `append` -- in `list_2.append("X")` -- is itself *appended* to the object we'd like to manipulate. 

Let's try that process on `list_1` that we declared above:

In [None]:
list_1.append("This list used to be empty!")
print(list_1)

['This list used to be empty!']


∇  *What do you think happens if you execute the kernel above multiple times?*

In other words, lists can be changed.



### 3.1.1 Adding to Lists

So far we've added only a `str` to our list. Let's add more, including other data types:

In [None]:
# add a str
list_1.append("clutter")

# add an int
list_1.append(100)

# add a float
list_1.append(4.123)

print(list_1)

['This list used to be empty!', 'clutter', 100, 4.123]


You might have also noticed that a `list` can contain different data types. `list_1`, our example list, contains items that are `str`, `int`, and `float`. Note that the **order** of the added items remains the same. We could even add another list as an item to a list, or any other data type:


In [None]:
# add a list to a list
list_within_list = ["This", "is", "a", "list"]
list_1.append(list_within_list)

print(list_1)

['This list used to be empty!', 'clutter', 100, 4.123, ['This', 'is', 'a', 'list']]


### 3.1.2 Removing from Lists

Maybe we overdid it now... so let's remove the last item -- the `list` within the `list` -- again. You can remove items from a list in the same way that you append items, specifically through a method. This time we're using the following syntax:

In [None]:
list_1.remove(list_within_list)

print(list_1)

['This list used to be empty!', 'clutter', 100, 4.123]


∇  *What do you think happens if you execute the kernel above a second time?*



### 3.1.3 Use Case & List Index

Now, let's move on from our initial list and form a list of some of Langston Hughes' poems: 

In [None]:
poems_selection = ["50-50", "Black Maria", "Blues in Stereo", "Blues on a Box", "Catch", "Crossing Jordan", "Dreams", "Motto", "Seashore through Dark Glasses", "Snail", "Winter Moon", "You and your whole race."]

You may notice that that's a rather long list and not really legible anymore. To provide better readability, one way you can format your lists is as follows:

In [None]:
poems_selection_2 = [
                   "Boogie: 1 A.M.",
                   "Brass Spittoons", 
                   "Cross",
                   "Crowing Hen Blues",
                   "Daybreak in Alabama",
                   "Dream Boogie",
                   "Dying Beast",
                   "Harlem Sweeties",
                   "I, Too",
                   "Lincoln Theatre",
                   "Out of Work",
                   "Sailor",
                   "Snail",
                   "Sylvester’s Dying Bed",
                   "Theme for English B", 
                   "The Weary Blues",
                   "Yesterday and Today"
                   ]

If we wanted to attach the second list to the first, we could do this as follows:

In [None]:
poems_selection.extend(poems_selection_2)

print(poems_selection)

['50-50', 'Black Maria', 'Blues in Stereo', 'Blues on a Box', 'Catch', 'Crossing Jordan', 'Dreams', 'Motto', 'Seashore through Dark Glasses', 'Snail', 'Winter Moon', 'You and your whole race.', 'Boogie: 1 A.M.', 'Brass Spittoons', 'Cross', 'Crowing Hen Blues', 'Daybreak in Alabama', 'Dream Boogie', 'Dying Beast', 'Harlem Sweeties', 'I, Too', 'Lincoln Theatre', 'Out of Work', 'Sailor', 'Snail', 'Sylvester’s Dying Bed', 'Theme for English B', 'The Weary Blues', 'Yesterday and Today']


The code above highlights yet another list method, and you'll notice that we kept the order of the two lists and just connected them. So let's **sort** them alphabetically, through yet another list method:




In [None]:
poems_selection.sort()

print(poems_selection)

['50-50', 'Black Maria', 'Blues in Stereo', 'Blues on a Box', 'Boogie: 1 A.M.', 'Brass Spittoons', 'Catch', 'Cross', 'Crossing Jordan', 'Crowing Hen Blues', 'Daybreak in Alabama', 'Dream Boogie', 'Dreams', 'Dying Beast', 'Harlem Sweeties', 'I, Too', 'Lincoln Theatre', 'Motto', 'Out of Work', 'Sailor', 'Seashore through Dark Glasses', 'Snail', 'Snail', 'Sylvester’s Dying Bed', 'The Weary Blues', 'Theme for English B', 'Winter Moon', 'Yesterday and Today', 'You and your whole race.']
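
The two methods used above, `extend` and `sort`, each have a close relative that is easy to mix up. The following is a minimal sketch on two short, hypothetical lists (`first` and `second` are not part of the lesson's data): it contrasts `extend` with `append`, and the `sort` method with the built-in function `sorted`.

```python
# Hypothetical example lists (not part of the lesson's data)
first = ["Motto", "Catch"]
second = ["Snail", "Dreams"]

# append adds its argument as ONE new item (here: a nested list)
appended = first.copy()
appended.append(second)
print(appended)      # ['Motto', 'Catch', ['Snail', 'Dreams']]

# extend adds each item of its argument individually
extended = first.copy()
extended.extend(second)
print(extended)      # ['Motto', 'Catch', 'Snail', 'Dreams']

# sorted() returns a NEW sorted list and leaves the original untouched
alphabetical = sorted(extended)
print(alphabetical)  # ['Catch', 'Dreams', 'Motto', 'Snail']
print(extended)      # ['Motto', 'Catch', 'Snail', 'Dreams']

# .sort() changes the list itself (and returns None)
extended.sort()
print(extended)      # ['Catch', 'Dreams', 'Motto', 'Snail']
```

If you want to keep the original order around, `sorted` is the safer choice; if you only need the sorted version, `.sort()` saves you a variable.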


Ok, so far so good. We created two lists, combined them, and changed the order. But if you look closely at the output, you'll notice that one item appears twice, because both lists contained it: specifically, we now list the poem [*Snail*](https://www.poetryfoundation.org/poems/150988/snail-5d73bf7842530) twice. Let's double-check that:



In [None]:
poems_selection.count("Snail")

2

We can check where the *first* occurrence of this item happens through the **index**, which functions the same way as in a string (remember, Python starts counting at `0`). We can call the index of the *first* occurrence of `"Snail"` in our list as follows:

In [None]:
poems_selection.index("Snail")

21

Given that our list is now sorted alphabetically, the two occurrences are at index `21` and `22`. We can **slice** the list (again like a `str`) to show the two `"Snail"`s:

In [None]:
poems_selection[21:23]

['Snail', 'Snail']

∇  *Why did we specify* `23`*?*

If we're bothered by the fact that we now have the title of this poem twice in our list, we could remove one of these occurrences. You already learned the `list.remove(item)` method, so write it out in the kernel below and print the list, so you can check manually: 

In [None]:
# remove the str "Snail"

# print list


Another thing you can do with slices is to include **steps**, which means that you "jump" through the list and return every n-th item. The basic syntax of slicing is therefore as follows:

`list[start:stop:step]`

You don't have to actually fill in every part of the above syntax, and you could just specify the last part as follows:

In [None]:
poems_selection[::12]

['50-50', 'Dreams', 'The Weary Blues']

In other words, we start with index `0` until the end of the list, listing every `12`th item of the list. 
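
To see how `start`, `stop`, and `step` interact, here is a small sketch on a hypothetical list of letters (the list `letters` and the variable names are made up for illustration). Note that the item at the `stop` index itself is never included.

```python
# Hypothetical example list; the indices run from 0 to 7
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]

# start and stop: indices 0, 1, 2 (index 3 is excluded)
first_three = letters[0:3]
print(first_three)     # ['a', 'b', 'c']

# only start: from index 5 to the end of the list
from_index_5 = letters[5:]
print(from_index_5)    # ['f', 'g', 'h']

# only step: indices 0, 3, 6
every_third = letters[::3]
print(every_third)     # ['a', 'd', 'g']

# all three parts: indices 1, 3, 5 (index 6 is excluded)
middle_by_two = letters[1:6:2]
print(middle_by_two)   # ['b', 'd', 'f']
```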

Let's store a slice -- starting at index `6`, with step `6` -- as a new list for now:

In [None]:
poems_short_list = poems_selection[6::6]
print(poems_short_list)

['Catch', 'Dreams', 'Out of Work', 'The Weary Blues']


## 3.2 Dictionaries



Dictionaries (`dict`) go one step further than lists in that they assign *values* to corresponding *keys*. Dictionaries are declared as follows:

`my_dict = {"key_1": "value_1", "key_2": "value_2"}`


Note the use of curly braces `{}` and the internal **key:value** structure that are needed to form a dictionary. Overall, the following features are noteworthy about dictionaries:

*   they are ordered;
*   they are changeable;
*   and they cannot have duplicate keys.

In other words, you can store data in dictionary pairs and call *values* by their corresponding *keys*. While keys must be immutable data types (which includes `int`, `float`, `str`, or `bool`), values do not have this constraint. In other words, you can store any other data type as a value of a dictionary, including more comprehensive types such as a `list` or yet another `dict`.

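> Dictionaries are *ordered* as of Python version 3.7; before that, dictionaries were *unordered*. Ordered here means that the items keep a fixed order, namely the order in which they were inserted. Unlike in a list, however, you refer to items by their key rather than by a positional index.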
> 
> *As a side note, it's generally a good idea to pay attention to the topic of versions.*
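
The three features listed above can be checked with a short sketch. All names and values below (`reading_status`, `duplicate_keys`, `mixed_values`) are hypothetical and only serve as illustration.

```python
# Ordered: keys come back in the order in which they were inserted
reading_status = {}
reading_status["Dreams"] = "read"
reading_status["Catch"] = "unread"
reading_status["Snail"] = "unread"
keys_in_order = list(reading_status.keys())
print(keys_in_order)       # ['Dreams', 'Catch', 'Snail']

# Changeable: assigning to an existing key replaces its value,
# but the key keeps its original position
reading_status["Dreams"] = "reread"
keys_after_update = list(reading_status.keys())
print(keys_after_update)   # ['Dreams', 'Catch', 'Snail']

# No duplicate keys: if a key is written twice, the last value wins
duplicate_keys = {"Snail": 1, "Snail": 2}
print(duplicate_keys)      # {'Snail': 2}

# Values can be of any data type, including a list
mixed_values = {
    "title": "Dreams",
    "stanzas": 2,
    "first_lines": ["Hold fast to dreams", "For if dreams die"],
}
print(mixed_values["first_lines"][0])   # Hold fast to dreams
```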

In sum, dictionaries can be really useful. But let's revisit the basic structure:

In [None]:
Wintersession_students = {
    "Python for Poets": 46,
    "Introduction to Machine Learning": 152,
    "Put Census data into R to make maps": 42
}

∇  *What does the dictionary in the kernel above mean?*



### 3.2.1 Use Case 1: Bibliographic Entry

Compare also the following bibliographic entry, formatted as a dictionary:

In [None]:
bibliographic_entry_1 = {
    "Author": "Michel Foucault",
    "Title": "The Order of Things: an Archaeology of the Human Sciences",
    "Publication Place": "London",
    "Publisher": "Routledge", 
    "Year": 2002
}

While this dictionary is still manually readable, imagine a much longer dictionary -- how would you get data out of this format again? Fortunately, dictionary methods can help with that -- to *get* the value of a specified key, just type the following:

In [None]:
bibliographic_entry_1.get("Author")

'Michel Foucault'

But what if you don't even know the key? Well, another method allows you to call all the keys:

In [None]:
bibliographic_entry_1.keys()

dict_keys(['Author', 'Title', 'Publication Place', 'Publisher', 'Year'])

There's also the corresponding method for all values:

In [None]:
bibliographic_entry_1.values()

dict_values(['Michel Foucault', 'The Order of Things: an Archaeology of the Human Sciences', 'London', 'Routledge', 2002])

The following method helps if you're interested in the dictionary items as pairs:

In [None]:
bibliographic_entry_1.items()

dict_items([('Author', 'Michel Foucault'), ('Title', 'The Order of Things: an Archaeology of the Human Sciences'), ('Publication Place', 'London'), ('Publisher', 'Routledge'), ('Year', 2002)])

If you wanted to add another key:value pair to a given dictionary -- remember, dictionaries are *mutable* -- you can use the `update` method:

In [None]:
bibliographic_entry_1.update({"My own Reading Notes": "Really insightful but pretty dense (usual for this author!)"})

print(bibliographic_entry_1.get("My own Reading Notes"))

Really insightful but pretty dense (usual for this author!)
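
Besides `update` and `get`, Python also lets you read and write dictionary entries with square brackets. The sketch below uses a shortened copy of the entry above under the hypothetical name `entry_sketch`, so that the original `bibliographic_entry_1` stays untouched.

```python
# A shortened, hypothetical copy of the bibliographic entry
entry_sketch = {"Author": "Michel Foucault", "Year": 2002}

# Add a new key:value pair with square brackets ...
entry_sketch["Publisher"] = "Routledge"

# ... and read a value the same way
publisher = entry_sketch["Publisher"]
print(publisher)    # Routledge

# .get() returns None for a missing key instead of raising an error
# (entry_sketch["Translator"] would raise a KeyError)
missing = entry_sketch.get("Translator")
print(missing)      # None

# .get() also accepts a fallback value for missing keys
fallback = entry_sketch.get("Translator", "n/a")
print(fallback)     # n/a
```

Square brackets are the shorter way to write a single pair; `update` is handy when you want to add several pairs at once.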


### 3.2.2 Use Case 2: Longer Bibliography 

So far so good -- but a single entry might not be too helpful, so let's put that into a longer format:


In [None]:
bibliography = [
    {
        "Author": "Michel Foucault",
        "Title": "The Order of Things: an Archaeology of the Human Sciences",
        "Publication Place": "London",
        "Publisher": "Routledge", 
        "Year": 2002
    },
    {
        "Author": "郑张尚芳",
        "Title": "温州方言志",
        "Publication Place": "Beijing",
        "Publisher": "中华书局",
        "Year": 2008
    },
    {
        "Author": "中村祐司",
        "Title": "2020年東京オリンピックとは何だったのか：欺瞞の祭典が残したもの",
        "Publication Place": "Tokyo",
        "Publisher": "成文堂",
        "Year": 2022
    }
]

∇  *Can you describe the structure of the above example?*

As you can see, Python does not (really) have an issue with handling different scripts, as the bibliography above also includes works in Chinese and Japanese. Depending on the writing system you are working with, you might have to make some adjustments (right-to-left scripts, for example, require some additional handling).


### 3.2.3 Use Case 3: Poems?

Of course, you can also use dictionaries for other forms of data. We could even store poems, as in the following example, which holds two poems by Langston Hughes (*Catch* and [*Dreams*](https://www.poetryfoundation.org/poems/150995/dreams-5d767850da976)):

In [4]:
poems_dict = {
    "Catch": [
        "Big Boy came",
        "Carrying a mermaid",
        "On his shoulders",
        "And the mermaid", 
        "Had her tail",
        "Curved",
        "Beneath his arm.",
        "\n",
        "Being a fisher boy,",
        "He’d found a fish",
        "To carry—",
        "Half fish,",  
        "Half girl", 
        "To marry."],
    "Dreams": [
        "Hold fast to dreams",
        "For if dreams die",
        "Life is a broken-winged bird",
        "That cannot fly.",
        "\n",
        "Hold fast to dreams",
        "For when dreams go",
        "Life is a barren field",
        "Frozen with snow."]
        }

∇  *What do you think will be the output of the kernel below?*


In [None]:
print(poems_dict.get("Dreams")[:4])

['Hold fast to dreams', 'For if dreams die', 'Life is a broken-winged bird', 'That cannot fly.']


As you can see, lists and dictionaries can get relatively long. Often, you would want to store this sort of data in a separate file. These files store data in other formats, which can be imported to and adapted to Python data types. While you might think of a spreadsheet such as an Excel file, that format entails a [few more steps](https://www.geeksforgeeks.org/reading-excel-file-using-python/) to adapt to native Python. Easier -- and just as common -- are examples such as [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) files or [JSON](https://www.json.org/json-en.html) files. To process those formats, you'd need to import a Python module, something we'll introduce in a later module.
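
As a small preview, and only as a sketch: the example below uses Python's built-in `json` module to turn JSON text into a list of dictionaries. To keep it self-contained, the JSON is written as a string (`json_text`, a made-up name) rather than read from a file; the two records are shortened versions of entries from the bibliography above.

```python
import json

# JSON text as it might appear in a file (here: a plain string)
json_text = '[{"Author": "Michel Foucault", "Year": 2002}, {"Author": "郑张尚芳", "Year": 2008}]'

# json.loads turns the text into native Python data types
records = json.loads(json_text)
print(type(records))          # <class 'list'>
print(type(records[0]))       # <class 'dict'>
print(records[1]["Author"])   # 郑张尚芳

# json.dumps goes the other way: from Python data back to JSON text
# (ensure_ascii=False keeps non-Latin scripts readable)
round_trip = json.dumps(records, ensure_ascii=False)
print(round_trip)
```

Notice how closely JSON resembles the lists and dictionaries from this lesson: that similarity is what makes the format convenient to work with in Python.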

## 3.3 Tuples and Sets in a Nutshell


There are two more collection data types that we'll briefly touch upon: tuples and sets.





### 3.3.1 Tuples

The first one, `tuple`, is an ordered, unchangeable collection that allows duplicate items. You form a tuple as follows:

`genre = ("poetry", "prose")`

Note the use of round brackets `()` to form a tuple. Some of us may always use lists over tuples; that's ok. Tuples usually come in handy when you're thinking about code efficiency (as they are unchangeable). So that's nothing you have to worry about yet!
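
Here is a brief sketch of what you can (and cannot) do with the `genre` tuple from above. The other variable names are hypothetical.

```python
genre = ("poetry", "prose")

# Reading works just like with a list
first_genre = genre[0]
print(first_genre)        # poetry
print(len(genre))         # 2

# Changing does not: the following line would raise a TypeError
# genre[0] = "drama"

# A tuple can be "unpacked" into separate variables
poetry_label, prose_label = genre
print(prose_label)        # prose

# A tuple with a single item needs a trailing comma
single = ("poetry",)
not_a_tuple = ("poetry")
print(type(single))       # <class 'tuple'>
print(type(not_a_tuple))  # <class 'str'>
```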



### 3.3.2 Sets

The second one, `set`, is an unordered collection of *unique* values. Sets are therefore unindexed. You can add or remove items, but the items themselves must be immutable (a `str` or an `int`, for instance, but not a `list`). You form a set as follows:

`books_read_for_generals = {"book C", "book A", "book B"}`

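Note the use of curly braces `{}` to form a set (an empty pair of braces `{}`, however, creates an empty dictionary; use `set()` for an empty set). Since a set is built for uniqueness rather than order, checking whether an item is in a set is usually faster than checking a `list`. That being said, you may at first also not find yourself using sets nearly as much as a list.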
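
To make the uniqueness point concrete, here is a small sketch that turns a hypothetical list with a duplicate title into a set. The names `poems_with_duplicate` and `unique_poems` are made up for this example.

```python
# A hypothetical list in which "Snail" appears twice
poems_with_duplicate = ["Snail", "Dreams", "Snail", "Catch"]

# Converting the list to a set drops the duplicate
unique_poems = set(poems_with_duplicate)
print(len(poems_with_duplicate))   # 4
print(len(unique_poems))           # 3

# Membership checks work with the keyword `in`
print("Dreams" in unique_poems)    # True
print("Motto" in unique_poems)     # False

# A set has no fixed order; sorted() gives you an ordered list back
unique_sorted = sorted(unique_poems)
print(unique_sorted)               # ['Catch', 'Dreams', 'Snail']
```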


### 3.3.3 General Overview & Summary
So, in summary...

...use a **dictionary** when:
- ...you need a key:value pair and the logical association they provide;
- ...you need to lookup your data based on a custom key;
- ...you need to modify your data; dictionaries are mutable, after all.

...use a **list** when:
- ...you have a collection of data that needs a set order; 
- ...you need to modify an iterable collection frequently.

...use a **tuple** when:
- ...your data is immutable, and therefore cannot change.

...use a **set** when:
- ...you need the elements to be unique. 

## 3.4 Loops



Loops are used to iterate over a sequence. That means you can loop over a list, a dictionary, a tuple, a set -- or even a string. More specifically, loops are used to repeat a given section of code, either a certain number of times or until a particular condition is met. Expressed differently: with a loop, you're "looking" through each individual entry that you provide.

There are two loops you need to know about:

*   a *while loop* repeats a statement while a condition is `True`;
*   a *for loop* executes a statement for each instance in a sequence.







### 3.4.1 *For* Loops

Let's look at the *for loop* first. Essentially, a simple for loop consists of two lines of code:
1.   a line that begins with the word `for`, followed by a new variable for each item in the sequence, the word `in`, the name of the sequence, and a colon `:` 
2.   a line that specifies what should be done.

A `for` loop executes the *second* part *for each instance* in a sequence.

So let's try that out on a list:

In [None]:
# declaring a new list
existential_list = [True, False, True, False, False, True]

# loop in two lines:
for boolean in existential_list:
  print("To be, or not to be, that is the question")

To be, or not to be, that is the question
To be, or not to be, that is the question
To be, or not to be, that is the question
To be, or not to be, that is the question
To be, or not to be, that is the question
To be, or not to be, that is the question


∇  *Can you describe in your own words what happened above?*

Also: Note that *indentation matters* (we'll return to that).
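
In the loop above, the variable `boolean` is never actually used inside the loop. More often, you will want to do something *with* each item. The following sketch uses a hypothetical list of titles (`poem_titles`) and works with the loop variable `title` in every round.

```python
# Hypothetical example list
poem_titles = ["Dreams", "Catch", "Snail"]

# An empty list to collect results in
title_lengths = []

for title in poem_titles:
    # `title` holds a different item in each round of the loop
    print(title)
    title_lengths.append(len(title))

print(title_lengths)   # [6, 5, 5]
```

The name of the loop variable is up to you; choosing a descriptive one (such as `title` for items in `poem_titles`) makes the loop easier to read.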

As mentioned at the beginning of this section, you can also loop over strings. Compare the following example:

In [None]:
# declare a new string:
example_string = "गते गते पारगते पारसंगते बोधि स्वाहा"
 
# iterate over the string:
character_count = 0
for character in example_string:
  character_count+=1

print(character_count)

35


∇  *Can you explain what the loop above does?*

Of course, Python has a built-in function -- `len(str)` -- to achieve the same result in a single line. Whenever possible, try to express yourself in a more concise way:

In [None]:
len(example_string)

35

### 3.4.2 *While* Loops

Returning to loops -- `while` loops execute code as long as a certain condition is met. 

These loops contain -- minimally -- the following elements:

1.   a line that begins with the word `while`, followed by the condition that must be met for the loop to run, followed by a colon `:`
2.   a line that will be executed.

The second part will be executed *while* the specified condition is `True`.

In [None]:
text = "Write me!"
counter = 0

while counter < 5:
  print(text)
  counter += 1

Write me!
Write me!
Write me!
Write me!
Write me!


∇  *Can you explain what the loop above does?*
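
The condition of a `while` loop does not have to compare against a fixed number. As a sketch, the hypothetical example below walks through a list by index and stops once the index reaches the length of the list; it does the same job that a `for` loop would do more concisely.

```python
# Hypothetical example list
titles = ["Dreams", "Catch", "Snail"]

position = 0
visited = []

# Keep going as long as `position` is a valid index of the list
while position < len(titles):
    visited.append(titles[position])
    position += 1

print(visited)    # ['Dreams', 'Catch', 'Snail']
print(position)   # 3
```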

Again, *indentation matters.* Compare what happens in the following kernel:



In [None]:
text = "Write me!"
counter = 0

while counter < 5:
  print(text)
counter += 1

∇  *Yikes, what happened here?*





### 3.4.3 Loop Use Case

You can also loop through a dictionary, but for this data type, you need to pay attention to the keys:values pairing. In the following example, we are printing *each line* in *each poem* in the dictionary `poems_dict` that we declared earlier.

In [None]:
# print 
for poem in poems_dict.values():
  for line in poem:
    print(line)
  print("\n---------\n")

Big Boy came
Carrying a mermaid
On his shoulders
And the mermaid
Had her tail
Curved
Beneath his arm.


Being a fisher boy,
He’d found a fish
To carry—
Half fish,
Half girl
To marry.

---------

Hold fast to dreams
For if dreams die
Life is a broken-winged bird
That cannot fly.


Hold fast to dreams
For when dreams go
Life is a barren field
Frozen with snow.

---------



∇  *Let's read the code above carefully and really understand what's going on. Can you read it against the background of the dictionary* `poems_dict` *itself?*

(to make that a bit easier, print that dictionary below:)

In [None]:
print(poems_dict)

{'Catch': ['Big Boy came', 'Carrying a mermaid', 'On his shoulders', 'And the mermaid', 'Had her tail', 'Curved', 'Beneath his arm.', '\n', 'Being a fisher boy,', 'He’d found a fish', 'To carry—', 'Half fish,', 'Half girl', 'To marry.'], 'Dreams': ['Hold fast to dreams', 'For if dreams die', 'Life is a broken-winged bird', 'That cannot fly.', '\n', 'Hold fast to dreams', 'For when dreams go', 'Life is a barren field', 'Frozen with snow.']}


If you wanted to iterate over *both* keys and values in a given dictionary, an extra step is necessary, as [described here](https://stackoverflow.com/questions/3294889/iterating-over-dictionaries-using-for-loops).
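
A minimal sketch of that extra step, using the `.items()` method introduced earlier: each round of the loop receives a key *and* its value, which we store in two loop variables. The dictionary `enrollment` is a shortened, hypothetical copy of the course dictionary from section 3.2.

```python
# Shortened, hypothetical copy of the course dictionary
enrollment = {
    "Python for Poets": 46,
    "Put Census data into R to make maps": 42,
}

summary_lines = []

# .items() yields (key, value) pairs; we unpack them into two variables
for course, students in enrollment.items():
    summary = course + ": " + str(students) + " students"
    print(summary)
    summary_lines.append(summary)
```

This prints `Python for Poets: 46 students` first, followed by the second course, in the order in which the pairs were inserted.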

### 3.4.4 Looping to Infinity and Beyond

One possible problem you may encounter with loops is the following scenario: your loop never ends. This happens when the condition of a loop never evaluates to `False`. Sure, there may be uses for such loops, but for now, you'll mostly want to avoid those.

(When such an infinite loop occurs, you'll have to interrupt it; otherwise, the code will keep running indefinitely, although in practice your machine is much more likely to be interrupted first.)

So compare the following `while` loop. Nothing to stop this loop from continuing, right? (Before you try it out, note that you can interrupt a kernel the same way that you execute it.)

In [None]:
text = "Write me!"
counter = 0

while counter > -1:
  print(text)
  counter += 1

One way to stop a loop is to include a `break` statement. For example, we could include the following clause in the loop above:

In [3]:
text = "Write me!"
counter = 0

while counter > -1:
  print(text)
  counter += 1
  if counter == 10:
    break

Write me!
Write me!
Write me!
Write me!
Write me!
Write me!
Write me!
Write me!
Write me!
Write me!


In a nested loop (a loop within a loop), the `break` statement only exits the innermost loop (the loop that `break` is in) and returns to the outer loop.

In [7]:
for poem in poems_dict.values():
  for line in poem:
    print(line)
    if line == "Curved":
      break
  print("\n---------\n")

Big Boy came
Carrying a mermaid
On his shoulders
And the mermaid
Had her tail
Curved

---------

Hold fast to dreams
For if dreams die
Life is a broken-winged bird
That cannot fly.


Hold fast to dreams
For when dreams go
Life is a barren field
Frozen with snow.

---------



∇  *Can you explain what happened above?* 


# Practice Section







1. In the kernel below, you should get the following output: `list`

In [None]:
this_list = {1, 2, 3, 4}

type(this_list)

set

2. You should get the following output: `set`

In [None]:
this_set = ("Alpha", "Beta", "Gamma")

type(this_set)

tuple

In the following section, let's use some of the lists and dictionaries we built. Make sure that you executed the relevant kernels above!

3. You should get the following output: `['Blues on a Box', 'Boogie: 1 A.M.']`

In [None]:
poems_selection

4. You should get the following output: `['Brass Spittoons', 'Dream Boogie']`

In [None]:
poems_selection[5:]

5. You should get the following output: `dict`

In [None]:
this_dict = ("First Name": "Toni", "Last Name": "Morrison")
type(this_dict)

dict

6. You should get the following output: `dict_keys(['Author', 'Title', 'Publication Place', 'Publisher', 'Year'])`

In [None]:
bibliography.keys()

AttributeError: ignored

7. You should get the following output: `中村祐司`

In [None]:
bibliography[0].values("Author")

8. Fix the following loop (make sure you print different titles):


In [None]:
for book in bibliography:
  print("The book '" + bibliography[0].get("Title") + "' is in my bibliography")

The book 'The Order of Things: an Archaeology of the Human Sciences' is in my bibliography
The book 'The Order of Things: an Archaeology of the Human Sciences' is in my bibliography
The book 'The Order of Things: an Archaeology of the Human Sciences' is in my bibliography



9. Fix the following loop:

In [None]:
counter = 0

while poem in poems_selection()
  counter + 1
  
print("There are " + str(counter) + " poems in this list")

TypeError: ignored

10. Fix the loop below:

In [None]:
for books in bibliography:
  print(book.get("Author") + " is the author of " + book.get("Title"))

Michel Foucault is the author of The Order of Things: an Archaeology of the Human Sciences
郑张尚芳 is the author of 温州方言志
中村祐司 is the author of 2020年東京オリンピックとは何だったのか：欺瞞の祭典が残したもの


11. Fix the following loop. You should get the following output `Catch Dreams`:

In [21]:
for k in poems_dict.values():
  print(k)

['Big Boy came', 'Carrying a mermaid', 'On his shoulders', 'And the mermaid', 'Had her tail', 'Curved', 'Beneath his arm.', '\n', 'Being a fisher boy,', 'He’d found a fish', 'To carry—', 'Half fish,', 'Half girl', 'To marry.']
['Hold fast to dreams', 'For if dreams die', 'Life is a broken-winged bird', 'That cannot fly.', '\n', 'Hold fast to dreams', 'For when dreams go', 'Life is a barren field', 'Frozen with snow.']
