### Python Sets

Python has an implementation of sets that supports many set operations:
- cardinality -> len(s)
- membership testing -> in, not in
- unions -> s1 | s2, s1.union(s2)
- intersections -> s1 & s2, s1.intersection(s2)
- differences -> s1 - s2, s1.difference(s2)
- symmetric differences -> s1 ^ s2, s1.symmetric_difference(s2)
- subsets -> s1 <= s2, s1.issubset(s2), s1 < s2 (proper subset)
- supersets -> s1 >= s2, s1.issuperset(s2), s1 > s2 (proper superset)
- disjointness -> s1.isdisjoint(s2)
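As a quick illustration of the operations listed above, here is a minimal sketch. The sets `s1` and `s2` and all variable names are made up for the example.

```python
s1 = {1, 2, 3}
s2 = {3, 4, 5}

cardinality = len(s1)                   # 3
contains_two = 2 in s1                  # True
union = s1 | s2                         # {1, 2, 3, 4, 5}
intersection = s1 & s2                  # {3}
difference = s1 - s2                    # {1, 2}
symmetric_difference = s1 ^ s2          # {1, 2, 4, 5}
is_subset = {1, 2} <= s1                # True
is_proper_subset = s1 < s1              # False: a set is not a proper subset of itself
is_superset = s1 >= {1, 2}              # True
is_disjoint = s1.isdisjoint({10, 20})   # True

# The method forms accept any iterable, while the operators need sets on both sides.
union_with_list = s1.union([3, 4])      # {1, 2, 3, 4}
try:
    s1 | [3, 4]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
```

One practical difference between the two spellings: the methods take any iterable, while the operators raise a TypeError unless both operands are sets (or frozensets).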

The type of sets is *set*

Set literals -> {'a', 10, 10.5}

Empty set -> set() (note that {} creates an empty dictionary, not an empty set)

Notice how the literal notation for sets uses the same braces {} as dictionaries  
In fact, sets are hash tables as well

Unlike dictionaries, sets only contain the "keys", not the values

A set can also be created from any iterable of hashable elements -> set(iterable)
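A small sketch of building sets from different iterables, with illustrative variable names. It also shows that empty braces give a dictionary, not a set.

```python
from_list = set([1, 2, 2, 3])        # duplicates are dropped: {1, 2, 3}
from_string = set("hello")           # one element per distinct character
from_range = set(range(3))           # {0, 1, 2}
from_dict = set({"a": 1, "b": 2})    # iterating a dict yields its keys: {'a', 'b'}

empty_set = set()
empty_braces = {}                    # this is a dict, not a set

print(type(empty_set).__name__)
print(type(empty_braces).__name__)
```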

##### Elements of a Set
- must be unique (distinct)
- must be hashable
- have no guaranteed order
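The first two rules above can be checked directly. This is only a sketch with made-up values. Note that elements count as duplicates when they compare equal (and therefore hash equal), which is why `1`, `1.0` and `True` collapse into a single element.

```python
# 1, 1.0 and True are all equal to each other, and "a" appears twice
values = {1, 1.0, True, "a", "a"}
distinct_count = len(values)         # 2

# A list is mutable and not hashable, so it cannot be a set element
try:
    bad = {[1, 2], 3}
    list_allowed = True
except TypeError:
    list_allowed = False

# A tuple of hashable items is hashable, so it works fine
points = {(0, 0), (1, 2), (0, 0)}
```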

A set is a mutable collection -> can add and remove elements

Therefore a set is not hashable!
- cannot be used as a dictionary key
- cannot be used as an element in another set
  - no set of sets!
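To see these restrictions in action, here is a minimal sketch. The helper `is_hashable` is a hypothetical name introduced just for this example; it is not a built-in.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for this example)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

inner = {1, 2}
set_is_hashable = is_hashable(inner)      # False

# Using a set as a dictionary key fails
try:
    d = {inner: "value"}
    key_allowed = True
except TypeError:
    key_allowed = False

# Using a set as an element of another set fails too
try:
    outer = {inner, 3}
    element_allowed = True
except TypeError:
    element_allowed = False
```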

##### Frozen Sets

Frozen sets are the immutable equivalent of sets

- Think of how tuples relate to lists

A frozen set is created using frozenset()

frozenset() accepts an iterable, just like set() does

The elements of a frozen set:
- must be unique
- must be hashable
- have no guaranteed order

We can convert any set to a frozen set by simply passing the set to the frozenset() function

There is **no** literal for a frozen set

A frozenset is hashable!
- Can be used as a dictionary key
- Can be used as an element of a set (or frozenset)
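The points above can be sketched as follows, again with illustrative names only. Because a frozenset is hashable, it can do the two things a plain set cannot.

```python
fs = frozenset([1, 2, 2, 3])              # from any iterable, duplicates dropped
from_set = frozenset({1, 2, 3})           # converting an existing set
same_elements = fs == from_set            # True
equal_to_plain_set = fs == {1, 2, 3}      # True: equality ignores set vs frozenset

# Usable as an element of a set: a "set of sets"
set_of_sets = {frozenset({1, 2}), frozenset({3})}

# Usable as a dictionary key; element order does not matter
lookup = {frozenset({"a", "b"}): "pair"}
found = lookup[frozenset({"b", "a"})]     # 'pair'

# No mutating methods exist on a frozenset
can_add = hasattr(fs, "add")              # False
```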

##### Membership Testing

Testing membership of an element in a set is extremely efficient: it is a hash table lookup, O(1) on average regardless of the size of the set
- in, not in

For example, code like this:

In [None]:
if a in [10, 20, 30]:
    ...

or even:

In [None]:
if a in (10, 20, 30):
    ...

has to scan through the list or tuple element by element, which takes O(n) time in the worst case

It is much faster to just do a lookup in a set, which does not need to scan:

In [None]:
if a in {10, 20, 30}:
    ...

The trade-off is a higher storage cost
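A rough way to see the storage cost is `sys.getsizeof`. This is only a sketch: the numbers are specific to the Python implementation and version (the comparison below assumes CPython), and `getsizeof` measures the container itself, not the integer objects it refers to.

```python
import sys

n = 1_000
items = list(range(n))

list_size = sys.getsizeof(items)
tuple_size = sys.getsizeof(tuple(items))
set_size = sys.getsizeof(set(items))

# On CPython the set is noticeably larger: its hash table stores a hash
# alongside each element and keeps spare empty slots.
print(f"list:  {list_size} bytes")
print(f"tuple: {tuple_size} bytes")
print(f"set:   {set_size} bytes")
```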

#### Some Timings

In [1]:
n = 10_000_000
s = set(range(n))
l = list(range(n))
t = tuple(range(n))

def test_set(s, value):
    return value in s

def test_list(l, value):
    return value in l

def test_tuple(t, value):
    return value in t

In [2]:
from timeit import timeit

In [5]:
value = 100
print(timeit('test_set(s, value)', globals=globals(), number=10_000))
print(timeit('test_list(l, value)', globals=globals(), number=10_000))
print(timeit('test_tuple(t, value)', globals=globals(), number=10_000))

0.0008760000000620494
0.007611500000166416
0.007470200000170735


In [6]:
value = 9_999_999
print(timeit('test_set(s, value)', globals=globals(), number=10_000))
print(timeit('test_list(l, value)', globals=globals(), number=10_000))
print(timeit('test_tuple(t, value)', globals=globals(), number=10_000))

0.0010180999997828621
676.8737766999993
681.7997606999998


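The set lookup takes about the same time in both cases, while the list and tuple lookups become hundreds of thousands of times slower when the value is the last element, since all 10 million elements have to be scanned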