### Creating sets

In [1]:
# set literals
s = {'a', 100, (1,2)}

print(type(s))
type(s)

<class 'set'>


set

In [2]:
# empty set
s = set()

s

set()
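
Empty braces do not create a set, which is why `set()` is needed here. A minimal check (the variable names are chosen for illustration):

```python
# {} is an empty dict literal, so an empty set has to be built with set()
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>
```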

In [3]:
# 'set()' and pass it an iterable
s1 = set([10, 20, 30])
s2 = set('abc')
s3 = set(range(6))

print(s1)
print(s2)
print(s3)

{10, 20, 30}
{'b', 'c', 'a'}
{0, 1, 2, 3, 4, 5}


In [4]:
# Restriction: the iterable may contain only hashable elements

# s = set([[1,2], [3,4]])  # TypeError: unhashable type: 'list'
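
A runnable sketch of the same restriction: the error is caught so that the cell completes, and tuples are used as a hashable alternative to the inner lists.

```python
# Lists are mutable and therefore unhashable, so they cannot be set elements.
try:
    s = set([[1, 2], [3, 4]])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Tuples of hashable values are themselves hashable, so this works.
pairs = set([(1, 2), (3, 4)])
print(pairs == {(1, 2), (3, 4)})  # True
```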

In [5]:
# we can create a set from a dictionary, but it will contain only the keys
d = {'a': 1, 'b': 2}
s = set(d)

s

{'a', 'b'}

In [6]:
# set comprehension
s = {c for c in 'abcd'}

s

{'a', 'b', 'c', 'd'}

In [7]:
# but in this simple case, passing the iterable to set() is clearer
s = set('abcd')

s

{'a', 'b', 'c', 'd'}
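
A comprehension becomes the better choice once elements need to be filtered or transformed on the way in. A small sketch (the sample string is made up for illustration):

```python
text = 'Hello, World'

# transform and filter while building the set: lower-cased letters only
letters = {c.lower() for c in text if c.isalpha()}

# sorted() is used only to get a stable display order
print(sorted(letters))  # ['d', 'e', 'h', 'l', 'o', 'r', 'w']
```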

In [8]:
# unpacking

s1 = {'a', 'b', 'c'}
s2 = {'c', 10, 20, 30}

s = {*s1, *s2}

s

{10, 20, 30, 'a', 'b', 'c'}

<br>

Usage example.<br>
We have a string, and we want to score it by the fraction of the alphabet's letters that it uses at least once.

In [9]:
def scorer(s):
    alphabet = set('abcdefghijklmnopqrstuvwxyz')
    s = s.lower()
    distinct = set(s)
    # we want to only count characters that are in our alphabet
    effective = distinct & alphabet
    return len(effective) / len(alphabet)

In [10]:
scorer('baa baa baa!!! 123')  # this string contains only 2 distinct characters of the alphabet

0.07692307692307693

In [11]:
# check
2 / 26

0.07692307692307693

In [12]:
scorer('the quick brown fox jumps over the lazy dog')

1.0

<br>

In [13]:
# The size of a set is also called its cardinality
s = set('abc')

cardinality = len(s)
cardinality

3

<br>
<br>

### Some set operations

#### Intersections

In [14]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}

s1.intersection(s2)

{2, 3}

In [15]:
s1 & s2

{2, 3}

#### Unions

In [16]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}

s1.union(s2)

{1, 2, 3, 4}

In [17]:
s1 | s2

{1, 2, 3, 4}

#### Disjointedness

Two sets are disjoint if their intersection is empty:

In [18]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}
s3 = {30, 40, 50}

print(s1.isdisjoint(s2))
print(s2.isdisjoint(s3))

False
True


Or we can use the cardinality of the intersection instead:

In [19]:
print(len(s1 & s2))
print(len(s1 & s3))

2
0


Or, since empty sets are falsy, we can use the associated truth value:

In [20]:
if {1, 2} & {2, 3}:
    print('sets are not disjoint')
    
if not {1, 2} & {3, 4}:
    print('sets are disjoint')

sets are not disjoint
sets are disjoint


#### Differences

In [21]:
s1 = {1, 2, 3, 4, 5}
s2 = {4, 5}

s1.difference(s2)

{1, 2, 3}

In [22]:
s1 - s2

{1, 2, 3}

Note that the difference operator is not commutative, i.e. it does not hold in general that
```
s1 - s2 = s2 - s1
```

In [23]:
s2 - s1

set()

#### Symmetric Difference

In [24]:
s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}

s1.symmetric_difference(s2)

{1, 2, 3, 6, 7, 8}

In [25]:
s1 ^ s2

{1, 2, 3, 6, 7, 8}

The symmetric difference of two sets is the difference between their union and their intersection:

In [26]:
(s1 | s2) - (s1 & s2)

{1, 2, 3, 6, 7, 8}
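
An equivalent way to read it, shown as a small sketch: the elements found only in the first set, together with the elements found only in the second.

```python
s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}

# elements only in s1, united with elements only in s2
one_sided = (s1 - s2) | (s2 - s1)
print(one_sided == s1 ^ s2)  # True

# unlike the plain difference, the symmetric difference is commutative
print(s1 ^ s2 == s2 ^ s1)    # True
```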

#### Subsets and Supersets

With containment we distinguish proper containment (i.e. strictly contained, not equal) from plain containment (contained, possibly equal).<br>
This is analogous to `i < j` versus `i <= j`.

In [27]:
s1 = {1, 2, 3}
s2 = {1, 2, 3}
s3 = {1, 2, 3, 4}
s4 = {10, 20, 30}

In [28]:
print(s1.issubset(s2))
print(s1 <= s2)

True
True


In [29]:
# there is no set method for strict containment, only the < operator
print(s1 < s2)

False


In [30]:
# An analogous situation with supersets

print(s2.issuperset(s1))
print(s2 >= s1)

print(s2 > s1)

True
True
False


Be careful with these set containment operators: they do not work quite the same way as they do with numbers.

With numbers, if
```
a <= b --> False
```
then
```
a > b --> True
```

This is not the case with set containment:

In [31]:
s1 = {1, 2, 3}
s2 = {10, 20, 30}

print(s1 <= s2)
print(s1 > s2)

False
False
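
Set containment is only a partial order: two sets can be incomparable. The hypothetical helper `relation` below (not part of the set API) is a sketch that spells out the possible outcomes:

```python
def relation(a, b):
    """Describe how set a relates to set b (illustrative helper)."""
    if a == b:
        return 'equal'
    if a < b:
        return 'proper subset'
    if a > b:
        return 'proper superset'
    # neither set contains the other
    return 'incomparable'

print(relation({1, 2, 3}, {1, 2, 3}))     # equal
print(relation({1, 2}, {1, 2, 3}))        # proper subset
print(relation({1, 2, 3}, {1, 2}))        # proper superset
print(relation({1, 2, 3}, {10, 20, 30}))  # incomparable
```

Overlapping sets such as `{1, 2}` and `{2, 3}` are incomparable as well.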


<br>
<br>

### Update operations

In [32]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}
print(s1, id(s1))

s1 = s1 | s2
print(s1, id(s1))  # id has changed: a new set object was created

{1, 2, 3} 140625029233128
{1, 2, 3, 4} 140625029230888


In [33]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}
print(s1, id(s1))

s1 |= s2
print(s1, id(s1))  # id is the same

{1, 2, 3} 140625029233128
{1, 2, 3, 4} 140625029233128


All these operations **mutate** the original set:

* union updates: `s1.update(s2)` or `s1 |= s2`

* intersection updates: `s1.intersection_update(s2)` or `s1 &= s2`

* difference updates: `s1.difference_update(s2)` or `s1 -= s2`

* symmetric difference updates: `s1.symmetric_difference_update(s2)` or `s1 ^= s2`
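
One practical difference between the two spellings, shown as a small sketch: the methods accept any iterable (`update` and `difference_update` even take several at once), while the augmented operators require a set or frozenset on the right.

```python
s = {1, 2, 3}
original_id = id(s)

s.update([3, 4], (5, 6))          # methods accept arbitrary iterables
s.difference_update(range(5, 7))  # removes 5 and 6 again
print(sorted(s))                  # [1, 2, 3, 4]

try:
    s |= [7, 8]                   # the operator rejects a list
except TypeError:
    print('|= needs a set on the right-hand side')

s |= {7, 8}
print(sorted(s))                  # [1, 2, 3, 4, 7, 8]
print(id(s) == original_id)       # True: same object throughout
```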

In [34]:
# Be careful with differences: `-` groups left to right,
# while `-=` evaluates its whole right-hand side first

s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}

s1a = s1 - s2 - s3
print(s1a)

s1b = (s1 - s2) - s3
print(s1b)

s1c = s1 - (s2 - s3)
print(s1c)

s1 -= s2 - s3
print(s1)

{1}
{1}
{1, 3, 4}
{1, 3, 4}
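
To remove the elements of both `s2` and `s3` in place, one option (a sketch, not the only way) is `difference_update` with several arguments, or subtracting the union:

```python
s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}

expected = s1 - s2 - s3          # {1}, computed without mutating s1

in_place = set(s1)               # work on a copy so s1 stays intact
in_place.difference_update(s2, s3)
print(in_place == expected)      # True

augmented = set(s1)
augmented -= s2 | s3             # subtracting the union is equivalent
print(augmented == expected)     # True
```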


<br>
<br>

### Frozen sets

`frozenset` is the **immutable** equivalent of the plain `set`.

Apart from the fact that you cannot mutate the collection (i.e. add or remove elements), the interesting thing is that frozen sets are hashable (as long as each contained element is also hashable).

This means that whereas we cannot create a set of sets, we can create a set of frozen sets. We also can use frozen sets as dictionary keys.

There is no literal for frozen sets - we have to use the `frozenset()` callable.
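
A short sketch of both uses; the names and the distance data are made up for illustration:

```python
# a set of frozen sets: equal frozen sets collapse into one element
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2

# frozen sets as dictionary keys, e.g. for unordered pairs
distances = {frozenset({'A', 'B'}): 5, frozenset({'B', 'C'}): 7}
print(distances[frozenset({'B', 'A'})])  # 5

# a plain set is rejected as an element of another set
try:
    {{1, 2}}
except TypeError:
    print('plain sets are unhashable')
```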

In [35]:
s1 = {'a', 'b', 'c'}  # ordinary set for comparison

print(s1)
print(type(s1))

# hash(s1)  # TypeError: unhashable type: 'set'

{'b', 'c', 'a'}
<class 'set'>


In [36]:
fs1 = frozenset('abc')

print(fs1)
print(type(fs1))

hash(fs1)

frozenset({'b', 'c', 'a'})
<class 'frozenset'>


-8195199803699504459

All the non-mutating set operations also apply to frozen sets.

We can mix sets and frozen sets when performing these operations. The type of the result is determined by the left operand.

In [37]:
s1 = {1, 2}
fs1 = frozenset('ab')

x = s1 | fs1
y = fs1 | s1

print(x, type(x))
print(y, type(y))

{1, 2, 'b', 'a'} <class 'set'>
frozenset({'b', 2, 1, 'a'}) <class 'frozenset'>


The same holds for the other similar operations (`&`, `-`, `^`).
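
A quick check of those operators, as a sketch with made-up operands:

```python
s = {1, 2, 3}
fs = frozenset({2, 3, 4})

# the result takes the type of the left operand
for name, result in [('s & fs', s & fs), ('fs & s', fs & s),
                     ('s - fs', s - fs), ('fs - s', fs - s),
                     ('s ^ fs', s ^ fs), ('fs ^ s', fs ^ s)]:
    print(name, type(result).__name__)

# equality ignores the concrete type and compares elements only
print({1, 2} == frozenset({1, 2}))  # True
```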

<br>

Usage example: memoization.

Recall that memoization is a technique for caching the results of a (deterministic) function call based on the provided arguments. A cache stores the result of calling the function with a particular set of arguments. The next time the function is called, the arguments are checked against the cache: if they are found there, the cached value is returned instead of re-executing the function.

Although Python's `functools` provides the `lru_cache` decorator, it has one drawback: the order of the keyword arguments matters.

Let's see this:

In [38]:
from functools import lru_cache

In [39]:
@lru_cache()
def foo(*, a, b):
    print('calculating a+b...')
    return a + b

In [40]:
foo(a=1, b=2)

calculating a+b...


3

In [41]:
foo(a=1, b=2)

3

In [42]:
# but if we change the order of the keyword arguments
foo(b=2, a=1)

calculating a+b...


3
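
The cache statistics make the extra computation visible. A sketch using `cache_info()`; the function name `add` is chosen for illustration, and the counts reflect CPython's behavior, since the documentation only says that a different keyword order may produce separate cache entries:

```python
from functools import lru_cache

@lru_cache()
def add(*, a, b):
    return a + b

add(a=1, b=2)   # miss: first call
add(a=1, b=2)   # hit: same keyword order
add(b=2, a=1)   # treated as a new call on CPython

info = add.cache_info()
print(info.hits, info.misses)  # 1 2 on CPython
```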

In [43]:
# create our own decorator
def memoizer(fn):
    cache = {}
    
    def inner(*args, **kwargs):
        key = (*args, frozenset(kwargs.items()))
        if key not in cache:
            result = fn(*args, **kwargs)
            cache[key] = result
        return cache[key]
    return inner

In [44]:
@memoizer
def goo(*, a, b):
    print('calculating a + b...')
    return a + b

In [45]:
goo(a=1, b=2)

calculating a + b...


3

In [46]:
goo(a=1, b=2)

3

In [47]:
# now change the order of the keyword arguments
goo(b=2, a=1)

3

<br>

In [48]:
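# modification: also ignore the order of positional arguments
# (only valid for functions whose result does not depend on that order)
from collections import Counter

def memoizer2(fn):
    cache = {}
    
    def inner(*args, **kwargs):
        # Counter keeps repeated arguments apart: with frozenset(args),
        # adder(1, 2) and adder(1, 2, 2) would share one cache entry
        key = (frozenset(Counter(args).items()), frozenset(kwargs.items()))
        if key not in cache:
            result = fn(*args, **kwargs)
            cache[key] = result
        return cache[key]
    return inner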

In [49]:
@memoizer2
def adder(*args):
    print('calculating...')
    return sum(args)

In [50]:
adder(1, 2, 3)

calculating...


6

In [51]:
adder(3, 1, 2)

6
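
Ignoring the order of positional arguments is only safe when the function's result does not depend on that order. A sketch of what goes wrong otherwise, using a hypothetical `unordered_memoizer` and `subtract` defined just for this illustration:

```python
def unordered_memoizer(fn):
    cache = {}

    def inner(*args):
        key = frozenset(args)  # the order of args is lost here
        if key not in cache:
            cache[key] = fn(*args)
        return cache[key]
    return inner

@unordered_memoizer
def subtract(a, b):
    return a - b

first = subtract(5, 3)   # computed: 2
second = subtract(3, 5)  # wrong: served from the cache, although 3 - 5 is -2
print(first, second)     # 2 2
```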

<br>
<br>

### Views: Keys, Values and Items

In [52]:
# we cannot add or remove a dictionary's keys while iterating over it

# dd = {'a': 1, 'b': 2, 'c': 3}
#
# for k, v in dd.items():
#     print(k, v)
#     del dd[k]        # RuntimeError: dictionary changed size during iteration
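
A runnable version of the cell above, as a sketch: the first deletion succeeds, and the error is raised when the loop asks the iterator for the next key.

```python
dd = {'a': 1, 'b': 2, 'c': 3}

try:
    for k in dd:
        del dd[k]  # removes 'a'; fetching the next key then fails
except RuntimeError as exc:
    print(exc)

print(dd)  # only the first key was removed
```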

One way to solve this is to create a static list or tuple of all the keys, and iterate over that instead:

In [53]:
dd = {'a': 1, 'b': 2, 'c': 3}

keys = tuple(dd.keys())
print(keys, '\n')

for k in keys:
    print(k, dd[k])
    del dd[k]
    
dd

('a', 'b', 'c') 

a 1
b 2
c 3


{}

In [54]:
# better: pop() returns the value and removes the key in one step
dd = {'a': 1, 'b': 2, 'c': 3}

for k in list(dd.keys()):
    v = dd.pop(k)
    print(k, v)
    
dd

a 1
b 2
c 3


{}

In [55]:
# or
dd = {'a': 1, 'b': 2, 'c': 3}

for _ in range(len(dd)):
    k, v = dd.popitem()
    print(k, v)
    
dd

c 3
b 2
a 1


{}

In [56]:
# or
dd = {'a': 1, 'b': 2, 'c': 3}

while len(dd) > 0:
    k, v = dd.popitem()
    print(k, v)
    
dd

c 3
b 2
a 1


{}

In [57]:
# or we can keep iterating indefinitely until a KeyError exception occurs
dd = {'a': 1, 'b': 2, 'c': 3}

while True:
    try:
        k, v = dd.popitem()
    except KeyError:
        break
    else:
        print(k, v)
        
dd

c 3
b 2
a 1


{}