# Sets

**In this notebook, we cover the following subjects:**
- Defining a Set;
- Set Methods;
- Let's Think.
___________________________________________________________________________________________________________________________

In [None]:
# To enable type hints for lists, dicts, tuples, and sets we need to import the following:
from typing import List, Dict, Tuple, Set

<h2 style="color:#4169E1">Defining a Set</h2>

Another **mutable** built-in type provided by Python is a [set][set]. A set is an **unordered** collection of elements that contains **no duplicate** items. We can create a set by listing its elements within curly braces (`{}`) or by using the `set()` function. The syntax looks as follows:

```python
my_set: Set = {element1, element2, element3}
```

Let's look at an example.

[set]:https://programming-pybook.github.io/introProgramming/chapters/sets.html

In [None]:
# Creating a set with elements
coffee_types: Set[str] = {"Espresso", "Latte", "Cappuccino", "Americano"}

print(coffee_types)
type(coffee_types)

In [None]:
# Creating an empty set
empty_set: Set = set()
print(empty_set)
type(empty_set)
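
A common pitfall is that empty curly braces create a dictionary rather than a set, which is why the cell above uses `set()`. The short sketch below also shows that `set()` accepts any iterable and drops duplicates; the `orders` list is made up purely for illustration.

```python
from typing import List, Set

# Empty curly braces create a dict, not a set
empty_braces = {}
print(type(empty_braces))

# set() also accepts an iterable, such as a list, and removes duplicates
orders: List[str] = ["Latte", "Espresso", "Latte", "Americano", "Espresso"]
unique_orders: Set[str] = set(orders)
print(len(orders), len(unique_orders))
```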

With this new knowledge, we can update our overview of the data types and their properties.

| Property                          | Set        | Tuple | List | Dict Keys              | Dict Values             |
|-----------------------------------|------------|-------|------|------------------------|-------------------------|
| **Mutable** (can you add/remove?) | yes        | no    | yes  | yes                    | yes                     |
| **Can** contain duplicates        | no         | yes   | yes  | no                     | yes                     |
| **Ordered**                       | no         | yes   | yes  | yes (since Python 3.7) | yes (follows key order) |
| **Can** contain                   | immutables | all   | all  | immutables             | all                     |



<div class="alert" style="background-color: #ffecb3; color: #856404;">
    <b>Note</b><br>
Since sets are unordered, you can’t use indexing to access an element or slicing to get a subset.</div>
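
As a small illustration of the note above, the sketch below uses a hypothetical `ice_cream_flavors` set. Instead of indexing, you can test membership with `in`, loop over the elements, or use `sorted()` to obtain a list in a predictable order.

```python
from typing import List, Set

ice_cream_flavors: Set[str] = {"vanilla", "chocolate", "pistachio"}

# Membership tests work without knowing any position
print("vanilla" in ice_cream_flavors)

# Looping works too, but the order of the elements is not guaranteed
for flavor in ice_cream_flavors:
    print(flavor)

# sorted() returns a new list, which does support indexing
sorted_flavors: List[str] = sorted(ice_cream_flavors)
print(sorted_flavors[0])
```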

One of the most interesting properties of a set is that it doesn’t allow duplicate elements. This means that if you add an element twice to a set, it will only keep one occurrence. Let's look at an example:

In [None]:
spotify_tags: Set[str] = {"country", "rock", "hiphop", 
                          "indie", "country", "pop"}

A set is a great way to store tags as you don’t really care about their order and only want each tag stored once. So, what will happen when we print this?

In [None]:
# What happens to the order?
print(spotify_tags)

Now suppose we want to group tags that are somewhat similar, and we do this using a list. What will happen when we run this cell?

In [None]:
spotify_tags: Set = {"country", "rock", "hiphop",
                     "indie", "country",
                     ["pop", "dance-pop", "alt-pop",
                      "new wave", "dream pop"]}

<h2 style="color:#4169E1">Set Methods</h2>

Let's look into some fundamental set methods.

<h4 style="color:#B22222">The <code>.add()</code> Method</h4>

This method is used to add elements to a set.

In [None]:
dutch_cities: Set[str] = {
    "Amsterdam", "Rotterdam", "The Hague", "Utrecht", "Eindhoven", 
}
print(dutch_cities)

In [None]:
dutch_cities.add("Groningen")
print(dutch_cities)
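
Because a set holds no duplicates, adding an element that is already present leaves the set unchanged. A small sketch with a hypothetical `belgian_cities` set:

```python
from typing import Set

belgian_cities: Set[str] = {"Brussels", "Antwerp", "Ghent"}

belgian_cities.add("Bruges")    # new element: the set grows
belgian_cities.add("Antwerp")   # already present: nothing changes

print(len(belgian_cities))
```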

<h4 style="color:#B22222">The <code>.remove()</code> Method</h4>

As the name suggests, this method removes an element from a set. It raises a `KeyError` when the element isn’t present.

In [None]:
movie_tags: Set[str] = {"comedy", "fantasy", "science fiction", 
                        "action", "drama"}
print(movie_tags)

In [None]:
# When a tag is present
movie_tags.remove("action")
print(movie_tags)

In [None]:
# When a tag is not present
movie_tags.remove("documentary")
print(movie_tags)

<h4 style="color:#B22222">The <code>.discard()</code> Method</h4>

This method is almost identical to `.remove()`, but it does not raise an error if the element cannot be found in the set. So, in what scenarios would you choose `.discard()` over `.remove()`?

In [None]:
pasta_types: Set[str] = {
    "lasagna", "penne", "ravioli", "spaghetti", 
    "macaroni", "orzo", "tagliatelle"
}
print(pasta_types)

In [None]:
# When a pasta type is present
pasta_types.discard("lasagna")
print(pasta_types)

In [None]:
# When a pasta type is not present
pasta_types.discard("gnocchi")
print(pasta_types)
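
To compare the two approaches side by side, here is a sketch with a hypothetical `shopping_list` set. With `.remove()` you have to guard against a `KeyError` yourself (done here with `try`/`except`, which may not have been covered yet), whereas `.discard()` is convenient when you only care that the element is gone afterwards.

```python
from typing import Set

shopping_list: Set[str] = {"milk", "bread", "eggs"}

# .remove() raises a KeyError for a missing element, so we catch it
try:
    shopping_list.remove("butter")
except KeyError:
    print("butter was not on the list")

# .discard() silently does nothing when the element is missing
shopping_list.discard("butter")
print(len(shopping_list))
```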

<h4 style="color:#B22222">The <code>.union()</code> Method or <code>|</code></h4>

When you take the union of two sets, you get a new set with all the unique elements from both sets.

In [None]:
set1: Set[int] = {1, 2, 3}
set2: Set[int] = {3, 4, 5}
union_set: Set[int] = set1.union(set2)

print(union_set)
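
The heading also mentions the `|` operator. As a sketch with the hypothetical sets `weekday_drinks` and `weekend_drinks`, the operator gives the same result as the method and leaves both original sets unchanged.

```python
from typing import Set

weekday_drinks: Set[str] = {"coffee", "tea"}
weekend_drinks: Set[str] = {"tea", "juice"}

# The | operator builds a new set, just like .union()
all_drinks: Set[str] = weekday_drinks | weekend_drinks

print(all_drinks == weekday_drinks.union(weekend_drinks))
print(len(weekday_drinks), len(weekend_drinks))
```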

<h4 style="color:#B22222">The <code>.intersection()</code> Method or <code>&</code></h4>

The intersection of two sets returns a new set with the elements that are present in both sets.

In [None]:
set1: Set[int] = {1, 2, 3, 4}
set2: Set[int] = {3, 4, 5}
intersection_set: Set[int] = set1.intersection(set2)

print(intersection_set)
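
The `&` operator from the heading can be used in the same way. The sketch below uses two hypothetical sets, `alice_courses` and `bob_courses`, and also shows that the result is an empty set when nothing overlaps.

```python
from typing import Set

alice_courses: Set[str] = {"Python", "Logic", "Statistics"}
bob_courses: Set[str] = {"Logic", "Statistics", "Philosophy"}

# The & operator builds a new set, just like .intersection()
shared_courses: Set[str] = alice_courses & bob_courses
print(shared_courses == alice_courses.intersection(bob_courses))

# When there is no overlap, the result is an empty set
no_overlap: Set[str] = alice_courses & {"History"}
print(len(no_overlap))
```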

<h4 style="color:#B22222">The <code>.difference()</code> Method or <code>-</code></h4>

This method returns a new set with elements from the first set that aren't in the second set.

In [None]:
set1: Set[int] = {1, 2, 3, 4}
set2: Set[int] = {3, 4, 5}
difference_set: Set[int] = set1.difference(set2)

print(difference_set)
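
Unlike union and intersection, the order of the two sets matters for the difference. The sketch below uses the `-` operator on two hypothetical sets, `in_stock` and `ordered`, to show that swapping them changes the result.

```python
from typing import Set

in_stock: Set[str] = {"apples", "pears", "plums"}
ordered: Set[str] = {"pears", "plums", "kiwis"}

not_ordered: Set[str] = in_stock - ordered     # in stock but not ordered
not_in_stock: Set[str] = ordered - in_stock    # ordered but not in stock

print(not_ordered)
print(not_in_stock)
```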

<h2 style="color:#4169E1">Let's Think!</h2>

What are the outcomes of the following expressions?

In [None]:
{1, 2, 3} <= {1, 2, 3, 4}

In [None]:
{1, 2, 3} & {3, 4, 5}

In [None]:
{1, 2, 3}.isdisjoint({4, 5, 6})

In [None]:
{1, 2, 3} | {3, 4, 5} == {1, 2, 3, 4, 5}

In [None]:
{1, 2, 3}.symmetric_difference({2, 4, 5}) == {1, 4}

In [None]:
{1, 2, 3}.issubset({1, 2, 3, 4, 5})

In [None]:
{1, 2, 3}.difference({3, 4, 5}) == {1, 2}

In [None]:
{1, 2, 3} | {3, 4, 5} < {1, 2, 3, 4, 5, 6}

In [None]:
(not {2, 5, 4}) != (2, 5, 4)

<h2 style="color:#3CB371">Exercises</h2>

Let's practice! Note that each exercise is designed with multiple levels to help you progressively build your skills. <span style="color:darkorange;"><strong>Level 1</strong></span> is the foundational level, designed to be straightforward so that everyone can successfully complete it. In <span style="color:darkorange;"><strong>Level 2</strong></span>, we step it up a notch, expecting you to use more complex concepts or combine them in new ways. Finally, in <span style="color:darkorange;"><strong>Level 3</strong></span>, we get closest to exam-level questions, but we may use some concepts that are not covered in this notebook. In programming, you often encounter situations where you’re unsure how to proceed. Fortunately, you can often solve these problems by starting to work on them and figuring things out as you go. Practicing this skill is extremely helpful, so we highly recommend completing these exercises.

For each of the exercises, make sure to add a `docstring` and `type hints`, and **do not** import any libraries unless specified otherwise.
<br>

### Exercise 1

<span style="color:darkorange;"><strong>Level 1</strong>:</span> In this exercise, you’ll be working with a data structure that holds the names of students who passed different high school subjects like history, math, physics, economics, etc. Your goal is to create a function called `students_who_passed_all_subjects()` that identifies the students who passed every subject. To do this, you’ll use **set methods** to pinpoint the names that appear in all the subjects. The function should work as follows:

- It should take a **dictionary** as input, with each key representing a subject and the value being a set of student names who passed that subject.
- It should **return** a set containing the names of students who passed every subject.

**Print** the set **outside** the function in a clear and readable format.

**Example input**: you pass this argument to the parameter in the function call.
```python
subjects: Dict[str, Set[str]] = {
    'history': {'Einstein', 'Curie', 'Tesla', 'Hopper', 'Goodall'},
    'math': {'Einstein', 'Curie', 'Turing', 'Hopper'},
    'physics': {'Einstein', 'Curie', 'Hopper', 'Goodall'},
    'economics': {'Einstein', 'Curie', 'Tesla', 'Hopper'},
    'biology': {'Curie', 'Goodall', 'Turing', 'Einstein'},
    'literature': {'Hopper', 'Turing', 'Einstein', 'Curie'}
}
```

**Example output**:
```python
"Students who passed all subjects: Einstein, Curie"
```

In [None]:
# TODO.

<span style="color:darkorange;"><strong>Level 2</strong>:</span> Next, you’ll enhance your program by creating a more advanced version, called `students_who_passed_minimum_subjects`. This time, your task is to identify which students passed at least a specified minimum number of subjects. You’ll still be working with the same data structure, but now you’ll add a parameter to define the minimum number of subjects a student needs to pass to be included in the results. Your function should work as follows:

- The function should take two parameters: the same **dictionary** of subjects and a minimum threshold (as **int**) that represents the number of subjects a student must pass.
- It should **return** a set of students who passed at least the given number of subjects.

**Print** the set **outside** the function in a clear and readable format.

**Example input**: you pass these arguments to the parameters in the function call.

```python
subjects: Dict[str, Set[str]] = {
    'history': {'Einstein', 'Curie', 'Tesla', 'Hopper', 'Goodall'},
    'math': {'Einstein', 'Curie', 'Turing', 'Hopper'},
    'physics': {'Einstein', 'Curie', 'Hopper', 'Goodall'},
    'economics': {'Einstein', 'Curie', 'Tesla', 'Hopper'},
    'biology': {'Curie', 'Goodall', 'Turing', 'Einstein'},
    'literature': {'Hopper', 'Turing', 'Einstein', 'Curie'}
}
minimum_subjects: int = 4
```
**Example output**:
```python
"Students who passed a minimum of 4 subjects: Einstein, Curie, Hopper"
```


In [None]:
# TODO.

<span style="color:darkorange;"><strong>Level 3</strong>:</span> For the final level, you will work with data stored in a [JSON][json] file. Create a function called `top_students_by_subjects()` that identifies the students who passed the highest number of subjects, highlighting the top performers. The data is provided in a JSON file named `subjects.json`, which contains a dictionary where each key represents a subject and each value is a **list** of student names who passed that subject. Since working with files is new to you, we’ll break down the expected steps more clearly.

1. First, outside of any function, you need to read the `subjects.json` file to get the data. Open the file, read its contents, and parse it into a dictionary. This dictionary will initially have the same format as the JSON file. However, you need to modify this dictionary so that it has subject names as keys and **sets** of students who passed those subjects as values. **Note** that you need to convert each list of student names to a set of student names. This conversion is also done outside of any function.

2. After extracting and modifying the data, pass this dictionary into the function `top_students_by_subjects()`. 

The function should work as follows:
- It should take a **dictionary** of subjects as input and **return** a **set of tuples**. Each tuple should contain the name of a student and the maximum number of subjects that student passed.
- It should identify the maximum number of subjects passed by any student and include this number in each tuple for the top students.
- If multiple students share the top number of subjects passed, all their names should be included in the result.

**Print** the set **outside** the function in a clear and readable format.

**Example JSON file**:
```json
{
    "history": ["Serena", "Simone", "Naomi", "Megan", "Allyson", "Marta", "Caster"],
    "math": ["Serena", "Simone", "Allyson", "Megan", "Marta", "Caster"],
    "physics": ["Serena", "Simone", "Allyson", "Marta", "Shelly-Ann"],
    "economics": ["Serena", "Simone", "Naomi", "Megan", "Allyson", "Shelly-Ann"],
    "biology": ["Simone", "Allyson", "Naomi", "Serena", "Caster", "Shelly-Ann"],
    "literature": ["Megan", "Marta", "Caster"]
}
```

**Example input**: you pass this argument to the parameter in the function call.
```python
subjects: Dict[str, Set[str]] = {
    "history": {"Serena", "Simone", "Naomi", "Megan", "Allyson", "Marta", "Caster"},
    "math": {"Serena", "Simone", "Allyson", "Megan", "Marta", "Caster"},
    "physics": {"Serena", "Simone", "Allyson", "Marta", "Shelly-Ann"},
    "economics": {"Serena", "Simone", "Naomi", "Megan", "Allyson", "Shelly-Ann"},
    "biology": {"Simone", "Allyson", "Naomi", "Serena", "Caster", "Shelly-Ann"},
    "literature": {"Megan", "Marta", "Caster"}
}
```

**Example return value**:
```python
{('Serena', 5), ('Simone', 5), ('Allyson', 5)}
```

**Example output**:
```python
"Students who passed a maximum number of 5 subjects: Serena, Simone, Allyson"
```

[json]:https://programming-pybook.github.io/introProgramming/chapters/data_preparation.html?javascript-object-notation-json#javascript-object-notation-json

In [None]:
# TODO.

___________________________________________________________________________________________________________________________

*Material for the VU Amsterdam course “Introduction to Python Programming” for BSc Artificial Intelligence students. These notebooks are created using the following sources:*
1. [Learning Python by Doing][learning python]: This book, developed by teachers of TU/e Eindhoven and VU Amsterdam, is the main source for the course materials. Code snippets or text explanations from the book may be used in the notebooks, sometimes with slight adjustments.
2. [Think Python][think python]
3. [GeeksforGeeks][geekforgeeks]

[learning python]: https://programming-pybook.github.io/introProgramming/intro.html
[think python]: https://greenteapress.com/thinkpython2/html/
[geekforgeeks]: https://www.geeksforgeeks.org