# Set

So far, we have covered two types of collections in Python: `list` and `tuple`. The advantage of a tuple is that it is immutable; the advantage of a list is that it can be sorted. The disadvantage of both is that finding a specific value is slow, because Python has to compare the value with the elements one by one. For this purpose, Python has a `set`: a collection in which all elements are unique. The values are not stored at indices; instead, a set uses hashing to decide where each value is stored, which makes finding an element very efficient.

A literal set is created by listing elements between curly braces `{}`. Duplicates are allowed in the literal, but as you can see, every value appears only once in the result.

In [None]:
places = {'Delft', 'Amsterdam', 'Delft', 'Leiden'}

In [None]:
places

Typical uses of sets are maintaining a collection of unique values, quickly checking whether a value is in the set, and iterating over the set. The following table lists common operations on sets; each row starts from the sets `a` and `b` defined below:

In [None]:
a = {2, 5, 3, 5}
b = {5, 4, 2}

| code | result | comment | mathematical notation |
|:-----|:-------|:--------|:--------|
| 2 in a | True | True if the given element is in the set | $2\in a$<br>$2$ is an element of $a$ |
| a.add(9) | a == {2, 3, 5, 9} | adds an element to the set | $a$ becomes $a\cup\{9\}$<br>$a$ becomes the union of set $a$ and the set $\{9\}$;<br>mathematics has notation for the union of sets,<br>but not for adding a single element to a set |
| a.remove(2) | a == {3, 5} | removes an element from the set | $a$ becomes $a\setminus \{2\}$<br>$a$ becomes the set difference of set $a$ and the set $\{2\}$ |
| len(a) | 3 | works as for any collection;<br>the duplicate 5 in the literal is counted only once | $\lvert a\rvert$<br>the number of elements in $a$ |
| sum(a) | 10 | works as for any collection of numbers | no math notation |
| set([1, 1, 2]) | {1, 2} | creates a set from the unique elements in a list | no math notation |
| a.union(b) | {2, 3, 4, 5} | returns the union of two sets (a remains unchanged) | $a\cup b$<br>set $a$ united with set $b$ |
| a.intersection(b) | {2, 5} | returns the intersection of two sets (a remains unchanged) | $a\cap b$<br>set $a$ intersected with set $b$ |
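
The rows of the table can be checked directly in Python. This is a small sketch: because `add` and `remove` change the set itself, it works on a copy `c` (a name introduced only for this example) so that every row starts from the original `a`.

```python
a = {2, 5, 3, 5}
b = {5, 4, 2}

print(2 in a)              # True
print(len(a), sum(a))      # 3 10
print(set([1, 1, 2]))      # {1, 2}
print(a.union(b))          # {2, 3, 4, 5}
print(a.intersection(b))   # {2, 5}

# add and remove modify the set in place, so work on a copy to keep a intact
c = a.copy()
c.add(9)
print(c == {2, 3, 5, 9})   # True

c = a.copy()
c.remove(2)
print(c == {3, 5})         # True
```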

In mathematics it is customary to use capital letters for sets ($A, B, C$). When programming, we can use lowercase names if we like.

The last few examples show that Python supports the usual mathematical operations on sets. There are also methods such as `issubset`, to check whether one set is a subset of another, and `difference`, to get the set difference of two sets.
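
A short sketch of these methods, using the sets `a` and `b` from the table above. Most set methods also have an operator form: `|` for union, `&` for intersection, `-` for difference and `<=` for subset.

```python
a = {2, 3, 5}
b = {5, 4, 2}

# issubset: is every element of the first set also in the second?
print({2, 5}.issubset(a))   # True
print(a.issubset(b))        # False, because 3 is not in b

# difference: the elements of a that are not in b
print(a.difference(b))      # {3}

# The same operations written with operators
print(a | b == a.union(b))          # True
print(a & b == a.intersection(b))   # True
print(a - b == a.difference(b))     # True
print({2, 5} <= a)                  # True, same as issubset
```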

Some list operations cannot be applied to sets: indexing, slicing, sorting in place and inserting at a position. Although a set may look sorted when it is printed, Python does not guarantee any particular order.
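
A small sketch of what happens when we treat a set like a list: indexing raises a `TypeError`, and sets have no `sort` method, but the built-in `sorted()` accepts a set and returns a new, sorted list.

```python
s = {3, 1, 2}

# A set has no positions, so indexing raises a TypeError
try:
    s[0]
except TypeError as error:
    print('TypeError:', error)

# There is no s.sort() method ...
print(hasattr(s, 'sort'))   # False

# ... but sorted() accepts any iterable and returns a new list
ordered = sorted(s)
print(ordered)              # [1, 2, 3]
```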

Note that set difference is not commutative: the order of the operands matters, just as in ordinary subtraction ($5-3\neq 3-5$).

$\{0, 1, 3, 8\}\setminus\{3, 8\} = \{0, 1\}$

$\{3,8\}\setminus\{0, 1, 3, 8\} = \{\} = \emptyset$

This last symbol, $\emptyset$, denotes the empty set: the set with no elements. In mathematics there is exactly one empty set.
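
The two differences above can be reproduced in Python with the `-` operator, which does the same as the `difference` method. Note that the empty set is printed as `set()`, and that an empty pair of braces `{}` creates an empty dictionary rather than an empty set.

```python
x = {0, 1, 3, 8}
y = {3, 8}

print(x - y)   # {0, 1}
print(y - x)   # set(), the empty set

# {} is an empty dict, so use set() to create an empty set
print(type({}))          # <class 'dict'>
print(y - x == set())    # True
print(len(set()))        # 0
```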

# Assignments

#### Check whether the values 3 and 5 are both in `b`, in one expression

In [None]:
b = {3, 5, 7}

In [None]:
%%assignment
### BEGIN SOLUTION
3 in b and 5 in b
### END SOLUTION

In [None]:
%%check
b = {3, 7, 8}
not result
b = {5, 7, 9}
not result
b = {3, 5, 7}
result

#### Create a literal set `vowel` with the vowels 'a', 'e', 'i', 'o' and 'u'.

In [None]:
%%assignment
### BEGIN SOLUTION
vowel = {'a', 'e', 'i', 'o', 'u'}
### END SOLUTION

In [None]:
%%check
len(vowel) == 5
'aeiou' == ''.join(sorted(vowel))

#### Write a function `check_vowel` that takes a `character` and returns `True` if the character is a vowel and `False` otherwise.

Note: you do not need an `if` statement that returns `True` or `False`. A membership test or comparison already evaluates to a Boolean, so you can return its result directly.
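
To illustrate with a different function (the names `is_even_verbose` and `is_even` are made up for this example), the two versions below behave identically; the second one simply returns the comparison.

```python
# Verbose version: an if statement that returns True or False
def is_even_verbose(n):
    if n % 2 == 0:
        return True
    else:
        return False


# Equivalent: the comparison already evaluates to a bool, so return it directly
def is_even(n):
    return n % 2 == 0


print(is_even(4), is_even(7))   # True False
print(all(is_even(n) == is_even_verbose(n) for n in range(10)))   # True
```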

In [None]:
%%assignment
### BEGIN SOLUTION
def check_vowel(character):
    return character in {'a', 'e', 'i', 'o', 'u'}
### END SOLUTION

In [None]:
%%check
signature check_vowel character
vowel = 'aeiou'
all(map(check_vowel, 'aeiou')) # Your check_vowel function does not return True for every vowel
import string
sum(map(check_vowel, string.ascii_lowercase)) == 5

# Set vs List

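Note that for this last assignment, we could have used a `list` instead of a `set`. However, a lookup with the `in` operator is much faster in a `set`: a list is searched element by element, so the time grows with its length, whereas a set lookup takes roughly constant time on average, however large the set is. Therefore, for large collections you will prefer sets for lookups.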

As a small experiment, we create a set and a list that both contain all even numbers smaller than one million.

In [None]:
numbers_set = { i for i in range(1000000) if i % 2 == 0 }

In [None]:
numbers_list = [ i for i in range(1000000) if i % 2 == 0 ]

Then we do a thousand lookups in each of them. The `%%time` cell magic reports how long it takes to execute the cell.

In [None]:
%%time
for i in range(1000):
    if i in numbers_set:
        pass

In [None]:
%%time
for i in range(1000):
    if i in numbers_list:
        pass

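A µs is $10^{-6}$ s and a ms is $10^{-3}$ s. Comparing the reported times, it is easy to see why we use sets: the lookups in the set are around 40,000 times faster. The exact numbers depend on your computer.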
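
Outside a notebook, the `%%time` magic is not available. Below is a sketch of the same experiment with the standard-library `timeit` module. The helper `lookups` is a name introduced here, the collections are built with `range(0, 1000000, 2)` (the same even numbers as above), and it does only 100 lookups so that the slow list version finishes quickly. The measured times, and therefore the ratio, will differ from machine to machine.

```python
import timeit

# The same even numbers as above, built with a step of 2
numbers_set = set(range(0, 1000000, 2))
numbers_list = list(range(0, 1000000, 2))


def lookups(collection):
    """Hypothetical helper: 100 membership tests (fewer than the 1000 above)."""
    for i in range(100):
        if i in collection:
            pass


# Run each experiment once and measure the elapsed time in seconds
set_time = timeit.timeit(lambda: lookups(numbers_set), number=1)
list_time = timeit.timeit(lambda: lookups(numbers_list), number=1)
print(f'set:   {set_time:.6f} s')
print(f'list:  {list_time:.6f} s')
print(f'ratio: {list_time / set_time:.0f}x faster with the set')
```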

# A peculiar mathematical result
In mathematics, two sets $A$ and $B$ are defined to have the same size when there is a function that maps each element of $A$ to an element of $B$ in such a way that the elements of the two sets are in exact one-to-one correspondence.
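
For finite sets, this definition can be checked by brute force. The function `is_one_to_one_correspondence` below is a hypothetical helper written for illustration: it computes the image of `a` under `f`, checks that no two elements are mapped to the same value, and checks that the image is exactly `b`.

```python
def is_one_to_one_correspondence(f, a, b):
    """Hypothetical helper: does f map the finite set a one-to-one onto the finite set b?"""
    image = {f(x) for x in a}
    # No two elements of a share an image, so the image is as large as a
    one_to_one = len(image) == len(a)
    # Every element of b is reached, and nothing outside b is produced
    onto = image == b
    return one_to_one and onto


letters = {'a', 'b', 'c'}
print(is_one_to_one_correspondence(lambda x: 'abc'[x - 1], {1, 2, 3}, letters))  # True
print(is_one_to_one_correspondence(lambda x: 'a', {1, 2, 3}, letters))           # False
```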

So, which do you think is the larger set: the set of all positive whole numbers or the set of positive even numbers?

# Assignment
Write a function `get_evens` that receives a set `numbers` as its argument and returns a set in which each element is double the value of an element in `numbers`.

In [None]:
%%assignment
### BEGIN SOLUTION
def get_evens(numbers):
    return {2 * i for i in numbers}
### END SOLUTION

In [None]:
%%check
numbers = set(range(20))
evens = get_evens(numbers)
len(numbers) == len(evens)
2 in evens
not 3 in evens
numbers = set(range(5))
evens = get_evens(numbers)
len(evens) == 5

Execute your function with a set of the numbers 1 to 1000 and assign the result to the variable `evens`. Check whether `numbers` and `evens` contain the same number of elements, and assign the (Boolean) result to the variable `same_length`.

In [None]:
%%assignment
numbers = set(range(1, 1001))  # the numbers 1 to 1000
### BEGIN SOLUTION
evens = get_evens(numbers)
same_length = len(numbers) == len(evens)
### END SOLUTION
same_length

In [None]:
%%check
mandatory get_evens len
same_length == True

Note that the maximum of 1000 for the set `numbers` is chosen arbitrarily. Any size would have given the same result: the set `numbers` always has the same size as the set `evens`, because doubling never maps two different numbers to the same even number.
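
A quick sketch that repeats the comparison for a few other sizes (chosen arbitrarily); it uses the new names `nums` and `doubled` so that `numbers` and `evens` from the assignment are left untouched.

```python
# Compare the sizes of the original set and the doubled set for several sizes
results = {}
for size in (10, 100, 1000, 100000):
    nums = set(range(1, size + 1))
    doubled = {2 * i for i in nums}
    results[size] = len(nums) == len(doubled)

print(results)   # {10: True, 100: True, 1000: True, 100000: True}
```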

In a program we cannot extend this to infinity, but in mathematics we can. This gives a peculiar mathematical result: there are exactly as many positive whole numbers as there are positive even numbers!

$|\{1, 2, 3, \ldots\}| = |\{2, 4, 6, \ldots\}|$
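
The correspondence behind this result can be written down explicitly. Doubling is the function $f$ below. It is one-to-one, because different numbers have different doubles, and every positive even number is reached, because halving it gives the number it comes from:

$f:\{1, 2, 3, \ldots\}\to\{2, 4, 6, \ldots\},\qquad f(n) = 2n$

$f(n) = f(m) \;\Rightarrow\; 2n = 2m \;\Rightarrow\; n = m$

$m \in \{2, 4, 6, \ldots\} \;\Rightarrow\; m = 2k \text{ with } k = \tfrac{m}{2} \in \{1, 2, 3, \ldots\}, \text{ so } m = f(k)$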