
<img src="https://raw.githubusercontent.com/abchapman93/delphi-python-2025-dev/refs/heads/main/media/DELPHI-long.png">
</br>

<h1 valign="center" align="center"><font size="+150">Introduction to Python</br>December 2025</font></h1>

In [None]:
!pip install https://github.com/abchapman93/delphi-python-2025/releases/download/v2/uu_delphi_python_dec25-0.2.tar.gz

In [None]:
from uu_delphi_python_dec25.helpers import *
from uu_delphi_python_dec25.quizzes.module1_quizzes import *

# Data Structures Part 2

## Sets
Sets, like lists and tuples, are collections of Python objects. Two of the key characteristics of lists and tuples are:
1. **Ordering and indexing**: The order matters in lists and tuples. Two lists which have the same elements but in different orders are not considered the same (can you show this using Python code?). Because the elements are ordered, you can access individual elements using their positional index.
2. **There can be duplicate values**: Because lists and tuples are defined by both the elements in them and their order, it's perfectly valid for a list to have the same element more than once. For example, `["a", "c", "c"]` or `[1, 1, 2]`.

**Sets** differ in both of these qualities. Sets in Python are closer to the idea of [mathematical sets](https://en.wikipedia.org/wiki/Set_(mathematics)) than lists and tuples are: a set is an unordered collection of objects in which each element appears at most once, and what matters is which elements are **members** of the set.

We declare a set in Python much like a list or tuple, but with curly brackets:

```
x = {x1, x2, ...}
```

We can also take another collection and create a set out of it:

```
set([1, 2, 3])
```
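A short sketch of how the two set properties described above play out in practice (the values are arbitrary, chosen only for illustration). It also shows one easy-to-miss detail: empty curly brackets `{}` create an empty *dictionary* (covered later in this notebook), so an empty set has to be written as `set()`.

```python
# Order does not matter: these two sets have the same members
same_members = {1, 2, 3} == {3, 2, 1}
print(same_members)  # True

# Duplicates are dropped when a list is converted to a set
from_list = set([1, 1, 2, 3, 3])
print(from_list)  # {1, 2, 3}

# Empty curly brackets make an empty dictionary, not a set;
# use set() to create an empty set
empty = set()
print(type({}), type(empty))  # <class 'dict'> <class 'set'>
```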

#### TODO
Declare three sets below: 
1. `evens`, which contains the elements `[2, 4, 6, 8, 10]`
2. `odds`, which contains the elements `[1, 3, 5, 7, 9]`
3. `primes`, which contains `[2, 3, 5, 7]`

In [None]:
evens = ____
____ = {1, 3, 5, 7, 9}
____ = ____

To check if an object is a **member** of a set, we can use the `in` keyword, which returns `True` if the element is in the set and `False` otherwise.

```python
# Example:
x_set = {1, 2, 3}
1 in x_set  # True
0 in x_set  # False
```

#### TODO
What code would check whether `1` is in `primes`? What value would this code return?

In [None]:
# RUN CELL TO SEE QUIZ
quiz_code_1_in_primes

In [None]:
# RUN CELL TO SEE QUIZ
quiz_1_in_primes

#### TODO
What value would we get if we executed `evens[0]`? Why?

In [None]:
# RUN CELL TO SEE QUIZ
quiz_evens0 

#### Comparing sets
We often want to compare two sets with each other. For example, the first 10 [natural numbers](https://en.wikipedia.org/wiki/Natural_number), 1 through 10, form a set of 10 numbers:

```python
naturals_10 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
```

We could create subsets such as:
- All **odd** numbers in the first 10 natural numbers
- All **even** numbers
- All **prime** numbers (i.e., numbers greater than one that are divisible only by one and themselves)

That's exactly what we did when we declared our three sets above.

Let's say we want to ask: Which numbers are both **odd** and **prime**?


This kind of comparison can be visualized using [Venn diagrams](https://en.wikipedia.org/wiki/Venn_diagram).

One way we could answer this is to check each number and see whether it's in both `odds` and `primes`:

In [None]:
(1 in odds) & (1 in primes)

In [None]:
(2 in odds) & (2 in primes)

In [None]:
(3 in odds) & (3 in primes)

But this gets old pretty quickly. A faster way is to use the `set.intersection()` method:

In [None]:
odds.intersection(primes)
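Python also provides the `&` operator as shorthand for `intersection()` when both sides are sets. A minimal sketch using copies named `odds_demo` and `primes_demo` (hypothetical names, so the notebook's own variables are untouched) with the same members as `odds` and `primes`:

```python
odds_demo = {1, 3, 5, 7, 9}
primes_demo = {2, 3, 5, 7}

# The method and the & operator give the same result
via_method = odds_demo.intersection(primes_demo)
via_operator = odds_demo & primes_demo
print(via_method)  # {3, 5, 7}
print(via_method == via_operator)  # True

# Intersection is symmetric: swapping the two sets gives the same result
print(primes_demo.intersection(odds_demo) == via_method)  # True
```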

#### TODO
Create a set called `even_primes` of all numbers between 1 and 10 which are both even and prime. Don't manually write out the set, but instead use existing sets and methods.

In [None]:
____

In [None]:
# Solution
even_primes = evens.intersection(primes)

In [None]:
# RUN CELL TO TEST VALUE
test_even_primes.test(even_primes)

#### TODO
What is the length of `odds.intersection(evens)`?

In [None]:
# RUN CELL TO SEE QUIZ
quiz_len_odds_evens

If we think of a Venn diagram, the intersection is the part in the middle where two circles overlap (i.e., where they **"intersect"**).

The other parts of a Venn diagram are:

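**The difference**: These are the elements that are in one set but not the other. The order matters here: `evens.difference(primes)` returns the elements of `evens` that are not in `primes`.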

In [None]:
evens.difference(primes)
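Unlike intersection, difference depends on which set you call it on. A sketch with two small hypothetical sets `a` and `b`; the `-` operator is shorthand for `difference()`:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

# Elements of a that are not in b
print(a.difference(b))  # {1, 2}
# Elements of b that are not in a: a different result
print(b.difference(a))  # {5}
# The - operator does the same thing as difference()
print(a - b == a.difference(b))  # True
```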

#### TODO
Create a set called `primes_not_odd` which is the set of all prime numbers that are not odd.

In [None]:
____

In [None]:
primes_not_odd

In [None]:
# RUN CELL TO TEST VALUE
test_primes_not_odd.test(primes_not_odd)

#### TODO
What is the value of `len(primes_not_odd.difference(even_primes))`?

In [None]:
# RUN CELL TO SEE QUIZ
test_len_pno_ep

**The union:** These are all elements that are in either set (i.e., all of the Venn diagram). Elements that appear in both sets are included only once.

In [None]:
odds.union(primes)
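Because a set has no duplicates, the size of a union is generally smaller than the two sizes added together. A sketch with two small hypothetical sets; the `|` operator is shorthand for `union()`:

```python
a = {1, 2, 3}
b = {3, 4}

combined = a.union(b)
print(combined)  # {1, 2, 3, 4}
# 3 is in both sets but appears only once, so this is 4, not len(a) + len(b) = 5
print(len(combined), len(a) + len(b))  # 4 5
# The | operator does the same thing as union()
print((a | b) == combined)  # True
```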

#### TODO
Create a new set called `naturals_10` which contains all of the natural numbers from 1 to 10. Don't manually write out the set as `{1, 2, ...}`, but instead use existing sets to create a **superset**.

In [None]:
naturals_10 = ____

In [None]:
# RUN CELL TO TEST VALUE
test_naturals_10.test(naturals_10)

#### Deduplicating with sets
Another common way of using sets is **deduplicating collections**. We said earlier that lists and tuples can have duplicate values, while sets contain only unique values. That means that if we want to know what all the unique values in a list with duplicates are, we can do this by turning a list into a set.

For example, let's say we had a class with a lot of people whose names start with **"A"** and put their names in a list. To get the unique list of first names, we could do the following:

In [None]:
first_names = ["Alex", "Alec", "Alex", "Aaron", "Alex", "Alek", "Alexis"]
first_names_unq = set(first_names)
first_names_unq
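Since sets are unordered, the output above may not list the names in the order they appeared. If you want a list back with a predictable order, one option is to pass the set to the built-in `sorted()`, which returns a new sorted list:

```python
first_names = ["Alex", "Alec", "Alex", "Aaron", "Alex", "Alek", "Alexis"]

# set() drops the duplicates; sorted() returns them as an alphabetized list
unique_sorted = sorted(set(first_names))
print(unique_sorted)  # ['Aaron', 'Alec', 'Alek', 'Alex', 'Alexis']
```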

#### TODO
Let's say we had a list of the cities where a group of patients live. We want to know how many unique cities our patients reside in. Create an object called `unq_cities` containing the unique cities, then count how many values are in it. Save that number as `num_unq_cities`.

In [None]:
pt_cities = [
    "Salt Lake City",
    "Ogden",
    "Evanston",
    "Salt Lake City",
    "Salt Lake Cit, UT",
    "Provo",
    "Provo"
]

In [None]:
unq_cities = ____
num_unq_cities = ____

In [None]:
# RUN CELL TO TEST VALUE
test_num_unq_cities.test(num_unq_cities)

#### Discussion
What is an issue with our count above?

## Dictionaries
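Like sets, **dictionaries** contain no duplicates (each key appears only once), and you don't look up their contents by position, although since Python 3.7 they do remember the order in which keys were added. However, while sets are collections of individual elements, dictionaries are collections of **key/value pairs**.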

A key/value pair is a unique mapping from one item (a key) to another (a value). An example of this in real life is a mapping from states to their capitals:

- Utah --> Salt Lake City
- Pennsylvania --> Harrisburg
- New York --> Albany

The states are the keys and the capitals are values. Let's see what this would look like in Python.

Dictionaries are declared using curly brackets (just like sets). But we signify the key/value mapping using a colon **":"**.

In [None]:
state_capitals = {
    "Utah": "Salt Lake City",
    "Pennsylvania": "Harrisburg",
    "New York": "Albany"
}

In [None]:
state_capitals

Let's say we want to know the capital of a particular state. We can get this by using the key (i.e., state) as the index:

In [None]:
state_capitals["Utah"]

#### TODO
What code would we use to get the capital of Pennsylvania?

In [None]:
# RUN CODE TO SEE QUIZ
test_capital_pa

In [None]:
quiz_check_len_dict

If you try to index using a key that isn't in the dictionary, you get a `KeyError`:

In [None]:
state_capitals["California"]
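Two common ways to avoid this error are to check first with `in` (which checks a dictionary's keys) or to use the `dict.get()` method, which returns a default value instead of raising. A sketch using a small hypothetical dictionary named `capitals`:

```python
capitals = {"Utah": "Salt Lake City", "Pennsylvania": "Harrisburg"}

# `in` on a dictionary checks its keys
print("California" in capitals)  # False

# .get() returns None for a missing key instead of raising KeyError
print(capitals.get("California"))  # None
# ...or a default value of your choice
print(capitals.get("California", "unknown"))  # unknown
# For a key that exists, .get() behaves like indexing
print(capitals.get("Utah", "unknown"))  # Salt Lake City
```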

You can add a key/value pair to a dictionary like this:

In [None]:
state_capitals["California"] = "Sacramento"

In [None]:
state_capitals

And you can remove a key/value pair using the `.pop(key)` method (similar to lists), which also returns the removed value:

In [None]:
state_capitals.pop("California")
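A sketch of what `.pop()` does step by step, using a small hypothetical dictionary named `capitals`. Like indexing, popping a missing key raises a `KeyError`, unless you pass a second argument to use as a default:

```python
capitals = {"Utah": "Salt Lake City", "California": "Sacramento"}

# pop() removes the key and returns its value
removed = capitals.pop("California")
print(removed)  # Sacramento
print(capitals)  # {'Utah': 'Salt Lake City'}

# With a default, popping a missing key returns the default instead of raising
print(capitals.pop("Nevada", None))  # None
```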

#### TODO
Add `"Idaho"` and its capital city to our dictionary.

In [None]:
____

In [None]:
# Solution
state_capitals["Idaho"] = "Boise"

In [None]:
# RUN CODE TO TEST VALUE
test_capital_idaho.test(state_capitals["Idaho"])

#### Uniqueness
Just like the elements of a set, dictionary keys are unique, so each key maps to a single value. This makes sense in our case: a state can't have more than one capital!

#### TODO

What value would be printed out by the code below?

```python
state_capitals2 = {
    "California": "Sacramento",
    "Utah": "Salt Lake City",
    "Pennsylvania": "Harrisburg",
    "New York": "Albany"
}


state_capitals2["New York"] = "New York City"
print(state_capitals2["New York"])
```

In [None]:
# RUN CELL TO SEE QUIZ
quiz_state_capitals2_ny 



#### Discussion
Let's say we were mapping states to **all** cities in that state, not just the capital. How could we do this with a Python dictionary? Implement it below using the following cities:

- **New York**: Albany, Buffalo, New York City 
- **California**: Los Angeles, Sacramento, San Francisco, San Diego
- **Pennsylvania**: Harrisburg, Pittsburgh, Philadelphia
- **Utah**: Ogden, Provo, Salt Lake City

### Emergency room wait times

Let's think back to our example of emergency room patients. Earlier, all we recorded was each patient's name. But let's say we started to record some additional information as well:

| name | arrival_time | age | severity |
|------|--------------|-----|----------|
|   Jim   |      6:00        |  40   |     40     |
|   Mary   |       6:30       | 31   |     10     |
|   Rachel   |        7:00      |   27  |    20      |
|    Laura  |        7:30      |   38  |     15     |
|   Chloe   |         8:00     |   25  |     50     |

We can use dictionaries to map each name to the respective value. Let's start by just mapping names to arrival times:

In [None]:
pt_arrivals = {
    "Jim": "6:00",
    "Mary": "6:30",
    "Rachel": "7:00",
    "Laura": "7:30",
    "Chloe": "8:00"
}

In [None]:
pt_arrivals
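The other columns of the table can be mapped the same way, and the values don't have to be strings. One possible sketch builds each mapping from two lists using the built-in `zip()`, which pairs up items by position; the names `pt_ages` and `pt_severity` are our own choices for illustration:

```python
names = ["Jim", "Mary", "Rachel", "Laura", "Chloe"]
ages = [40, 31, 27, 38, 25]
severities = [40, 10, 20, 15, 50]

# zip() pairs each name with the value in the same position;
# dict() turns those pairs into key/value entries
pt_ages = dict(zip(names, ages))
pt_severity = dict(zip(names, severities))

print(pt_ages["Mary"])  # 31
print(pt_severity["Chloe"])  # 50
```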

#### TODO

In [None]:
# RUN CELL TO SEE QUIZ
quiz_code_rachel_arrival 

We can separate out the keys and values using the `dict.keys()` and `dict.values()` methods:

In [None]:
pt_arrivals.keys()

In [None]:
pt_arrivals.values()

This can be useful if we just want to check if there is a patient by a certain name in our records, or if anyone arrived at a certain time.
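Dictionaries also have an `.items()` method that returns the key/value pairs together as tuples. These "view" objects can be converted to lists like any other collection. A sketch using the first three patients from the table, stored under the hypothetical name `arrivals_demo`:

```python
arrivals_demo = {"Jim": "6:00", "Mary": "6:30", "Rachel": "7:00"}

# .items() gives each key/value pair as a tuple
print(list(arrivals_demo.items()))
# [('Jim', '6:00'), ('Mary', '6:30'), ('Rachel', '7:00')]

# Keys come back in the order they were added
print(list(arrivals_demo.keys()))  # ['Jim', 'Mary', 'Rachel']
```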

#### TODO

In [None]:
# RUN CODE TO SEE QUIZ
test_check_pt_name

In [None]:
# RUN CELL TO SEE QUIZ
quiz_jacob_in_dict

In [None]:
# RUN CELL TO SEE QUIZ
test_check_pt_time

All of the examples we've seen so far have had strings as both the keys and values. But there are lots of other options for what data we put in a dictionary. Without getting too much into the details, the **keys** can be any data type that is **hashable**, which includes numbers (ints and floats), strings, and tuples (as long as everything inside the tuple is hashable); lists, sets, and dictionaries are not hashable. The **values** can be any data type.

So in our earlier example where we wanted to map each state to multiple cities, we could have used sets as the values:

```python
{
    "Utah": {"Salt Lake City", "Provo", "Ogden", "Park City"},
    "Pennsylvania": {"Pittsburgh", "Philadelphia", "Erie", "Harrisburg"},
    "California": {"San Francisco", "San Diego", "Los Angeles", "Sacramento"}
}
```

In [None]:
# RUN CELL TO SEE QUIZ
quiz_dict_data_types
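The hashability rule for keys can be checked directly. In this sketch, a tuple of a state name and its postal abbreviation works as a key, while a list raises a `TypeError`. The exact wording of the error message varies between Python versions, so the code only records whether the error happened:

```python
# Tuples of strings are hashable, so they can be dictionary keys
capitals_by_state = {("New York", "NY"): "Albany"}
print(capitals_by_state[("New York", "NY")])  # Albany

# Lists are not hashable, so using one as a key fails
list_key_failed = False
try:
    bad = {["Utah"]: "Salt Lake City"}
except TypeError as err:
    list_key_failed = True
    print("TypeError:", err)

print(list_key_failed)  # True
```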

One especially useful way of structuring data is **nested dictionaries**, where the keys are some basic data type, such as strings, and the values are other dictionaries. This is a common way of mapping each key to multiple attributes.

For example, if we wanted to map state names to several different facts about them, we could do that as:

In [None]:
states = {
    "Utah": {
        "cities": {"Salt Lake City", "Provo", "Ogden", "Park City"},
        "capital": "Salt Lake City",
        "population": 3.15 # 3.15 million
    },
    "Pennsylvania": {
        "cities": {"Pittsburgh", "Philadelphia", "Erie", "Harrisburg"},
        "capital": "Harrisburg",
        "population": 12.79
    },
    "California": {
        "cities": {"San Francisco", "San Diego", "Los Angeles", "Sacramento"},
        "capital": "Sacramento",
        "population": 39.35
    }
}

In [None]:
# RUN CELL TO SEE QUIZ
quiz_type_states_utah

Once we access the inner dictionary in a nested dictionary, we can then use the keys in that dictionary to look up the properties of the state:

In [None]:
states["Utah"]["capital"]
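Each lookup in the chain returns an ordinary Python object, so whatever works on that object works here too. In this sketch (using a trimmed copy named `states_demo`, a hypothetical name), the `"cities"` value is a set, so we can test membership with `in` or add a city with the set method `.add()`; the change is made to the set stored inside the nested dictionary:

```python
states_demo = {
    "Utah": {
        "cities": {"Salt Lake City", "Provo", "Ogden", "Park City"},
        "capital": "Salt Lake City",
    }
}

# Chain the lookups: outer key first, then inner key
utah_cities = states_demo["Utah"]["cities"]
print("Provo" in utah_cities)  # True

# .add() modifies the set stored inside the nested dictionary
states_demo["Utah"]["cities"].add("Logan")
print(len(states_demo["Utah"]["cities"]))  # 5
```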

#### TODO

In [None]:
# RUN CELL TO SEE QUIZ
quiz_ca_population

#### TODO
Let's say we wanted to get the set of all cities in either Utah or Pennsylvania. Using `states`, create a new variable called `cities_ut_pa` which contains all the corresponding elements.

In [None]:
____ = ____
cities_ut_pa

In [None]:
# RUN CELL TO TEST VALUE
test_cities_ut_pa.test(cities_ut_pa)