# Module 8: Data Structures II
March 3, 2023

Last time we discussed recursion and the most fundamental data structure in Python: the list.

Today we discuss more data structures (tuples, dictionaries, sets). We will also discuss the important difference between call by value and call by reference.

Next time we will cover how to read and write files in general, how to deal with CSV files in particular, and how to handle runtime errors that can for example be caused by user inputs or file operations.

## Data Structures in Python 

| Data Structure | Ordered | Mutable | Unique | Iterable |
|----------------|:-------:|:-------:|:------:|:--------:|
| List           |   Yes   |   Yes   |   No   |    Yes   |
| Tuple          |   Yes   |    No   |   No   |    Yes   |
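| Dictionary     |  Yes**  |   Yes   |  Yes*  |    Yes   |
| Set            |    No   |   Yes   |   Yes  |    Yes   |

*Keys are unique, while values can be repeated (more on this below).
**Since Python 3.7, dictionaries preserve the order in which key-value pairs were inserted, but the pairs cannot be accessed by a numerical position.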

### Tuples
Tuples are very similar to lists, except that they are immutable, that is, they cannot be changed after they have been created. Thus, they only support operations that read from them, but no operations that would change, add or remove elements.
In practice, tuples are frequently obtained as results from library functions, but they can also be created directly by using round brackets:

```
<tuple_name> = (<value1>, <value2>, …, <valueN>)
```

For example:


```python
sample = ("Thursday", "lunch", "pasta", 3.95)
print(sample)
```

```
('Thursday', 'lunch', 'pasta', 3.95)
```
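Strictly speaking, it is the commas rather than the round brackets that create a tuple. The following short sketch (with illustrative variable names) shows the consequences, in particular that a tuple with only one element needs a trailing comma:

```python
# the commas create the tuple; the round brackets are optional here
coordinates = 3, 4
print(type(coordinates))    # <class 'tuple'>

# a single value in brackets is NOT a tuple, just the value itself
not_a_tuple = ("pasta")
print(type(not_a_tuple))    # <class 'str'>

# a trailing comma turns it into a one-element tuple
one_element = ("pasta",)
print(type(one_element))    # <class 'tuple'>
print(len(one_element))     # 1

# an empty tuple is written as a pair of empty brackets
empty = ()
print(len(empty))           # 0
```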


All reading operations (such as indexing, slicing, iteration…) work in the same way as on lists, for example:

```python
print(sample[0])
print(sample[len(sample)//2:])

for s in sample:
    print(s)
```

```
Thursday
('pasta', 3.95)
Thursday
lunch
pasta
3.95
```
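Besides indexing, slicing and iteration, membership tests and the two methods that tuples provide, `index()` and `count()`, also only read from the tuple. A small sketch using the `sample` tuple from above:

```python
sample = ("Thursday", "lunch", "pasta", 3.95)

# membership tests work just like on lists
print("pasta" in sample)        # True
print("dinner" not in sample)   # True

# index() returns the position of the first occurrence of a value
print(sample.index("lunch"))    # 1
# count() returns how often a value occurs in the tuple
print(sample.count("pasta"))    # 1
```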


However, writing operations are not possible on tuples, that is, no changing of elements, no deletions, no appending, no sorting, etc.

```python
s = (1, 2, 3)
try:
    s[0] = 10   # attempt to change the first element
except TypeError as e:
    print(e)
```

```
'tuple' object does not support item assignment
```
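Since a tuple cannot be changed in place, the usual workaround is to create a new tuple and assign it to the variable. A minimal sketch of two common options (the variable name `temp` is just illustrative):

```python
s = (1, 2, 3)

# option 1: copy the values into a list, change the list, convert it back
temp = list(s)
temp[0] = 10
s = tuple(temp)
print(s)    # (10, 2, 3)

# option 2: build a new tuple from a one-element tuple and a slice
s = (20,) + s[1:]
print(s)    # (20, 2, 3)

# in both cases a NEW tuple object is created and assigned to s;
# the old tuple objects themselves are never modified
```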


In practice, tuples are frequently used, for example by web services or other APIs, to return results. You cannot manipulate these tuples directly, but you can of course access them and copy the contained values to other data structures. Because they are immutable, tuples are also somewhat more lightweight than lists (they typically need a little less memory and are faster to create), so when working with large collections of data that do not change, tuples may be preferred over lists. Furthermore, tuples are used to let functions return more than one value. For example:

```python
def integer_division(a, b):
    quotient = a // b
    remainder = a % b
    return quotient, remainder   # returned as the tuple (quotient, remainder)

print(integer_division(20, 6))
```

```
(3, 2)
```


Note that the return statement does not explicitly define the pair of numbers as a tuple, but any comma-separated list of return values as shown here will automatically be turned into a tuple.
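The returned tuple can also be unpacked directly into separate variables, which is usually more convenient than indexing. A short sketch; as it happens, Python's built-in `divmod()` function returns exactly the same pair:

```python
def integer_division(a, b):
    quotient = a // b
    remainder = a % b
    return quotient, remainder

# unpack the returned tuple into two separate variables
q, r = integer_division(20, 6)
print(q)               # 3
print(r)               # 2

# the built-in divmod() computes the same (quotient, remainder) pair
print(divmod(20, 6))   # (3, 2)
```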

### Dictionaries

Dictionaries are another complex data structure in Python that can be used to store several values, or more precisely key-value pairs. Keys must be unique and immutable (to be safe it is best to only use simple data types such as strings or numbers as keys), while values can occur repeatedly and be any kind of data type. Dictionaries can be defined as follows:

```
<dictionary_name> = {<key1>:<value1>,… ,<keyN>:<valueN>}
```

For example:

```python
person_details = {"First name": "Bob", "Last name": "Smith", "Building": "BBG", "Room": 223}
```

The ```print()``` function can also print out dictionaries, for example:

```python
print(person_details)
```

```
{'First name': 'Bob', 'Last name': 'Smith', 'Building': 'BBG', 'Room': 223}
```


While in lists a numerical index is used to access the element at a certain position, with dictionaries the key is used to access a particular value. Since Python 3.7, dictionaries remember the order in which the pairs were inserted, but there is no numerical index to access a pair by its position. The basic syntax for accessing a value is:

```
<dictionary_name>[<key>]
```
    
For example, to print out the first name of the person, we can use the following code:

```python
print(f"First name: {person_details['First name']}")
```

```
First name: Bob
```
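Accessing a key that is not in the dictionary raises a `KeyError`. When it is not certain that a key exists, the `get()` method can be used instead; it returns `None` or a given default value. A small sketch (the key `"Email"` is just an example of a missing key):

```python
person_details = {"First name": "Bob", "Last name": "Smith", "Building": "BBG", "Room": 223}

# accessing a missing key with square brackets raises a KeyError
try:
    print(person_details["Email"])
except KeyError as e:
    print("Missing key:", e)                     # Missing key: 'Email'

# get() returns None (or the given default) instead of raising an error
print(person_details.get("Email"))               # None
print(person_details.get("Email", "unknown"))    # unknown
print(person_details.get("Room", "unknown"))     # 223
```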


To change the value for a key or to add a new key-value pair to a dictionary, the assignment statement can be used, for example:

```python
person_details["Last name"] = "Tailor"   # change the value of an existing key
person_details["Phone"] = 1234           # add a new key-value pair
```

Resulting in a changed dictionary:

```python
print(person_details)
```

```
{'First name': 'Bob', 'Last name': 'Tailor', 'Building': 'BBG', 'Room': 223, 'Phone': 1234}
```


Elements can also be deleted from a dictionary via their key, for example:

```python
del person_details["Building"]
del person_details["Room"]
```

Resulting in:

```python
print(person_details)
```

```
{'First name': 'Bob', 'Last name': 'Tailor', 'Phone': 1234}
```


Just like lists, dictionaries also know their length, that is, the number of key-value pairs in them:

```python
print(len(person_details))
```

```
3
```


The operators "in" and "not in" can be used to check if a **key** is contained in a dictionary (or not):

```python
print("First name" in person_details)
print("Bob" in person_details)
```

```
True
False
```
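To check whether a **value** occurs in a dictionary, the `in` operator can be applied to the result of the `values()` method. Similarly, `items()` returns the key-value pairs as tuples, which is convenient in loops. A short sketch using the dictionary from above:

```python
person_details = {"First name": "Bob", "Last name": "Tailor", "Phone": 1234}

# "in" on the dictionary itself only looks at the keys ...
print("Bob" in person_details)             # False
# ... but it can also be applied to the values
print("Bob" in person_details.values())    # True

# items() yields (key, value) tuples, which can be unpacked in a for loop
for key, value in person_details.items():
    print(f"{key}: {value}")
```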


Now let's look at a more comprehensive example with dictionaries. The following small program lets the user enter a number of term-definition pairs for a glossary, then prints the glossary alphabetically sorted by the terms:

```python
glossary = {}

while True:
    new_key = input("Please enter term: ")
    new_value = input("Please enter definition: ")
    glossary[new_key] = new_value
    more_entries = input("Do you want to add another entry? (y/n) ")
    if more_entries != "y":
        break

# collect the keys in a list and sort them alphabetically
keys = list(glossary.keys())
keys.sort()

for key in keys:
    print(f"{key}: {glossary[key]}")
```

```
Please enter term:  list
Please enter definition:   abstract data type that represents a finite number of ordered values
Do you want to add another entry? (y/n)  y
Please enter term:  tree
Please enter definition:  a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes.
Do you want to add another entry? (y/n)  n

list:  abstract data type that represents a finite number of ordered values
tree: a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes.
```
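Since iterating over a dictionary yields its keys, the two lines that collect and sort the keys can also be replaced by a single call of the built-in `sorted()` function. A sketch with two made-up, pre-filled entries instead of user input:

```python
# made-up sample entries instead of user input
glossary = {
    "tree": "hierarchical structure of connected nodes",
    "list": "finite number of ordered values",
}

# sorted() accepts the dictionary directly (iterating over it yields the keys)
# and returns a new list with the keys in alphabetical order
for key in sorted(glossary):
    print(f"{key}: {glossary[key]}")
```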


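As with lists, there is a difference between assigning a dictionary to another variable with the assignment operator (which only copies the reference, so both variables refer to the same dictionary), a "shallow" copy (for example with the `copy()` method, which creates a new dictionary whose values are still shared with the original), and a thorough deep copy with the ```copy.deepcopy()``` function (which requires `import copy`).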
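The following sketch illustrates the three variants with a small dictionary that contains a nested list (the names are just illustrative):

```python
import copy

original = {"name": "Bob", "rooms": [101, 102]}

alias = original                  # no copy: both names refer to the same dictionary
shallow = original.copy()         # new dictionary, but the inner list is shared
deep = copy.deepcopy(original)    # new dictionary AND a new inner list

original["name"] = "Alice"        # change a top-level value
original["rooms"].append(201)     # change the nested list in place

print(alias)     # {'name': 'Alice', 'rooms': [101, 102, 201]}
print(shallow)   # {'name': 'Bob', 'rooms': [101, 102, 201]}
print(deep)      # {'name': 'Bob', 'rooms': [101, 102]}
```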

### Sets

Sets in Python correspond to sets in mathematics. They contain each element only once, and set operations like union and intersection can be performed on them. Sets support membership tests (```in, not in```), but they are unordered and have no index to access individual elements. Sets can be defined as in the following example:

```python
set1 = {3, 1, 2}
set2 = set([5, 6, 4])
```

That is, a comma-separated sequence of elements in curly braces defines a set. However, an empty pair of curly braces is already reserved for creating an empty dictionary, so alternatively a set can be created as shown in the second line, by calling the `set()` function with a list (or another iterable) of elements; calling `set()` without arguments creates an empty set.
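A short sketch of the difference between empty braces and `set()`, and of creating a set from another iterable:

```python
empty_dict = {}      # empty braces create an empty DICTIONARY
empty_set = set()    # set() without arguments creates an empty set
print(type(empty_dict))   # <class 'dict'>
print(type(empty_set))    # <class 'set'>

# set() accepts any iterable, e.g. a string; duplicate letters are dropped
letters = set("hello")
print(len(letters))       # 4
```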

Sets define no order themselves, so the order in which their elements are printed is not guaranteed (for small integers like these, they often happen to appear sorted):

```python
print(set1)
print(set2)
```

```
{1, 2, 3}
{4, 5, 6}
```


Elements can be added to and removed from sets with the corresponding methods, such as `add()`. Adding an element that is already contained in a set simply has no effect:

```python
set1.add(1)   # 1 is already in set1, so nothing changes
print(set1)
set1.add(4)
print(set1)
```

```
{1, 2, 3}
{1, 2, 3, 4}
```
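For removing elements, there are two methods that differ only in how they treat elements that are not in the set: `remove()` raises a `KeyError`, while `discard()` does nothing. A minimal sketch:

```python
numbers = {1, 2, 3, 4}

numbers.remove(4)        # removes an element that is in the set
print(numbers)           # {1, 2, 3}

numbers.discard(10)      # discard() silently ignores a missing element

try:
    numbers.remove(10)   # remove() raises a KeyError for a missing element
except KeyError:
    print("10 is not in the set")
```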


The operators |, &, - and ^ can be used to compute the union, intersection, difference and symmetric difference between sets, respectively:

```python
print(set1 | set2)   # union
print(set1 & set2)   # intersection
print(set2 - set1)   # difference
print(set2 ^ set1)   # symmetric difference
```

```
{1, 2, 3, 4, 5, 6}
{4}
{5, 6}
{1, 2, 3, 5, 6}
```
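A common practical use of sets is removing duplicates from a list. Each of the operators above also has an equivalent method (e.g. `intersection()` for `&`), which, unlike the operator, also accepts lists and other iterables as arguments. A small sketch with a made-up list of guest names:

```python
guests = ["Bob", "Alice", "Bob", "Carol", "Alice"]

# converting to a set drops the duplicates (the original order is lost)
unique_guests = set(guests)
print(len(unique_guests))                            # 3

# sorted() turns the set back into an ordered list
print(sorted(unique_guests))                         # ['Alice', 'Bob', 'Carol']

# the method form accepts any iterable, here a list
print(unique_guests.intersection(["Bob", "Dave"]))   # {'Bob'}
```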


## Call by Reference vs. Call by Value

There is an important difference between passing complex data objects (like the data structures discussed this week) as arguments to a function, compared to passing variables of simple types. Have a look at the following code and try to guess what it does before you (execute it and) check the actual output:

```python
# function that manipulates the string passed as argument
def add_to_string(string, addition):
    string = string + addition
    print(f"\t {string}")

# function that manipulates the dictionary passed as argument
def add_to_dictionary(dictionary, key, value):
    dictionary[key] = value
    print(f"\t {dictionary}")

# main program
a_string = "Hello!"
print(a_string)
add_to_string(a_string, " Hello World!")
print(a_string)
a_dictionary = {}
print(a_dictionary)
add_to_dictionary(a_dictionary, "greeting", "Hello World!")
print(a_dictionary)
```

```
Hello!
	 Hello! Hello World!
Hello!
{}
	 {'greeting': 'Hello World!'}
{'greeting': 'Hello World!'}
```


A string and a dictionary are defined and then passed to a string and a dictionary manipulation function, respectively. The printouts within the functions show the effects of the manipulation. However, the printouts after the function calls show a difference: the string is still the same as before the manipulation, while the dictionary has changed. The reason for this lies in the way that parameters are passed to functions. Passing variables of the basic data types (such as strings, integers, floats and booleans) behaves like call-by-value: inside the function, the parameter acts like a new local variable with the same value. Because these types are immutable, a statement such as `string = string + addition` does not change the original string, but creates a new string that is only assigned to the local variable. Since this variable is only visible in the scope of the function, the change is not visible after the function. To achieve that, the new value would have to be returned by the function and, e.g., assigned to a variable by the calling code.

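In contrast, passing variables of complex data types (such as lists and dictionaries) behaves like call-by-reference. Just as when such an object is assigned to another variable, only the reference to (the address of) the object in memory is copied to a new variable and passed as argument, but no copy of the object itself is created. Thus, manipulations of the object that happen inside a function (such as `dictionary[key] = value`) will also be visible afterwards, even without returning and re-assigning the result of the function. If a function is not supposed to be able to change the object passed as an argument, a (deep) copy needs to be made before it is passed to the function. Strictly speaking, Python always passes references to objects (this is sometimes called "call by sharing"); whether changes made inside a function are visible to the caller depends on whether a mutable object is modified in place, or whether the parameter is simply assigned a new object.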
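The following sketch (with hypothetical function names) illustrates the difference between modifying a list in place and assigning a new list to the parameter, and shows how a deep copy made before the call protects the original:

```python
import copy

def append_item(items, item):
    items.append(item)          # modifies the caller's list in place

def replace_list(items, item):
    items = items + [item]      # creates a NEW list, assigned only to the local name
    return items

shopping = ["bread"]
append_item(shopping, "milk")
print(shopping)                 # ['bread', 'milk']

new_list = replace_list(shopping, "eggs")
print(shopping)                 # ['bread', 'milk']  (unchanged)
print(new_list)                 # ['bread', 'milk', 'eggs']

# make a copy BEFORE passing the list if the original must not change
protected = copy.deepcopy(shopping)
append_item(protected, "cheese")
print(shopping)                 # ['bread', 'milk']  (unchanged)
print(protected)                # ['bread', 'milk', 'cheese']
```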

## Exercises

Please use Quarterfall to submit and check your answers. 

### 1. Anagram Test (★★★★☆)
An anagram is a word or phrase that is made by rearranging the letters of another word or phrase. For example, "secure" is an anagram of "rescue". Write a function `is_anagram(word1, word2)` that checks if the two words are anagrams of each other. If so, the function should return `True`, and `False` otherwise. You can use the following code to test your function:
```
# Test program
print(is_anagram("rescue", "secure"))
print(is_anagram("Rescue", "Secure"))
print(is_anagram("Rescue", "Anchor"))
print(is_anagram("Ship", "Secure"))
```
The output should be:
```
True
True
False
False
```
That is, the function should **not** distinguish between upper- and lower-case letters.


### 2. Room Occupancy (★★★★☆)
Imagine a small hostel with four four-bed rooms (with the arbitrarily chosen numbers 101, 102, 201, and 202). You want to write a little program for the hostel staff to help them keep track of the room occupancy and check guests in and out. The code for the user interaction already exists (see below), but you still need to implement the missing functions:
* `print_occupancy` should simply print out a list of all rooms and the guests that are currently checked in.
* `check_in` should add a guest to a room. If a non-existing room number is given or if the chosen room is already full, a corresponding message should be printed. It is allowed to have two (or more) guests with the same name in one room. 
* `check_out` should remove a guest from a room. If a wrong room number or guest name is passed, a corresponding message should be printed. 

The following code shows how the functions are used. You can also use it to test your implementation:

```
# Main program
room_occupancy = {101:[], 102:[], 201:[], 202:[]} 
while True:
    print("These are your options:")
    print("1 - View current room occupancy.")
    print("2 - Check guest in.")
    print("3 - Check guest out.")
    print("4 - Exit program.")
    choice = input("Please choose what you want to do: ") 
    if choice == "1":
        print_occupancy(room_occupancy)
    elif choice == "2":
        guest = input("Enter name of guest: ")
        room = int(input("Enter room number: "))
        check_in(room_occupancy, guest, room)
    elif choice == "3":
        guest = input("Enter name of guest: ")
        room = int(input("Enter room number: "))
        check_out(room_occupancy, guest, room)
    elif choice == "4":
        print("Goodbye!")
        break
    else:
        print("Invalid input, try again.")
```
