# Chapter 3. Dictionaries and Sets
---

## ToC


1. [Set Theory](#set-theory)  
    1.1. [Set Literals](#set-literals)  
    1.2. [Set Comprehensions](#set-comprehensions)  
2. [Practical Consequences of How Sets Work](#practical-consequences-of-how-sets-work)  
    2.1. [Set Operations](#set-operations)
3. [Set Operations on dict Views](#set-operations-on-dict-views)
---

## Set Theory

A set is a collection of unique objects. A basic use case is removing duplication:

In [3]:
l = ['spam', 'spam', 'eggs', 'spam', 'bacon', 'eggs']
set(l)

{'bacon', 'eggs', 'spam'}

In [3]:
list(set(l))

['eggs', 'spam', 'bacon']

![Figure 41](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/41.PNG)

Set elements must be hashable. The `set` type is not hashable, so you can’t build a `set` with nested `set` instances. But `frozenset` is hashable, so you can have `frozenset` elements inside a `set`.
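The restriction above can be checked directly. This is a minimal sketch; the variable names are illustrative only:

```python
# A set cannot contain another set, because set objects are unhashable.
try:
    nested = {{1, 2}, {3, 4}}
except TypeError as exc:
    error = str(exc)  # the message names the unhashable type
    print("Cannot nest sets:", error)

# frozenset is immutable and hashable, so it can be an element of a set.
nested = {frozenset({1, 2}), frozenset({3, 4})}
print(frozenset({1, 2}) in nested)
```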

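To remove duplicates while preserving the order of first occurrence, a plain `dict` can be used instead, since `dict` keys keep insertion order:

In [11]: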
dict.fromkeys(l)

{'spam': None, 'eggs': None, 'bacon': None}

In [12]:
dict.fromkeys(l).keys()

dict_keys(['spam', 'eggs', 'bacon'])

In [13]:
list(dict.fromkeys(l).keys())

['spam', 'eggs', 'bacon']

#### Set Operations

- `a | b`: union
- `a & b`: intersection
- `a - b`: difference
- `a ^ b`: symmetric difference
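As a quick illustration of the four operators, here is a small sketch using two made-up sets `a` and `b`:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b                 # elements in a, in b, or in both
intersection = a & b          # elements in both a and b
difference = a - b            # elements in a but not in b
symmetric_difference = a ^ b  # elements in exactly one of a and b

print(union, intersection, difference, symmetric_difference)
```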

#### Methods for finding needles in a haystack

For example, imagine you have a large set of email addresses (the `haystack`) and a
smaller set of addresses (the `needles`) and you need to count how many `needles`
occur in the `haystack`.

In [21]:
# Method 1: Requires both needles and haystack objects to be sets
needles = {1, 2, 3, 4}
haystack = {2, 4, 6, 3, 1, 3, 4}
found = len(needles & haystack)
found

4

In [22]:
# Method 2: Works with any iterable, but is slower
found = 0
for n in needles:
    if n in haystack:
        found += 1
found

4

In [23]:
# Method 3: Build sets on the fly
found = len(set(needles) & set(haystack))
found

4

In [None]:
# Method 4: When one object is a set and the other isn't
found = len(set(needles).intersection(haystack))
found

4
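The difference between the operator and the method form can be sketched as follows. The example reuses the `needles` and `haystack` names but assumes, for illustration, that the haystack is a list rather than a set:

```python
needles = {1, 2, 3, 4}
haystack = [2, 4, 6, 3, 1, 3, 4]  # a list, not a set

# The & operator requires both operands to be sets (or frozensets).
try:
    needles & haystack
except TypeError:
    operator_failed = True
else:
    operator_failed = False

# The intersection method accepts any iterable as its argument.
found = len(needles.intersection(haystack))
print(operator_failed, found)
```

This is why Method 4 only needs one of the two objects to be a set already.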

### Set Literals

![Figure 42](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/42.PNG)

In [25]:
s = {1}
type(s)

set

In [26]:
s

{1}

In [27]:
s.pop()

1

In [28]:
s

set()

In [30]:
d = {}
type(d)

dict

Literal set syntax like `{1, 2, 3}` is both faster and more readable than calling the
constructor (e.g., `set([1, 2, 3])`). The latter form is slower because, to evaluate it,
Python has to look up the `set` name to fetch the constructor, then build a list, and
finally pass it to the constructor. In contrast, to process a literal like `{1, 2, 3}`,
Python runs a specialized `BUILD_SET` bytecode.
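One way to see this difference is to inspect the bytecode with the standard `dis` module. This is a sketch only: the helper `opnames` is a hypothetical name introduced here, and the exact instruction names vary between CPython versions.

```python
import dis


def opnames(source):
    """Return the bytecode operation names for an expression (hypothetical helper)."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("{1, 2, 3}")
constructor_ops = opnames("set([1, 2, 3])")

# The literal uses BUILD_SET; the constructor call loads a name,
# builds a list, and then performs a call.
print("literal:    ", literal_ops)
print("constructor:", constructor_ops)
```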

There is no special syntax to represent `frozenset` literals—they must be created by
calling the constructor:

In [31]:
frozenset(range(10))

frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9})

### Set Comprehensions

The idea of listcomps was adapted to build sets as well.

**Example:** Build a set of Latin-1 characters that have the word “SIGN” in their
Unicode names.

In [5]:
from unicodedata import name
{name(chr(i),'') for i in range(31,35)}

{'', 'EXCLAMATION MARK', 'QUOTATION MARK', 'SPACE'}

In [7]:
{chr(i) for i in range(32, 256) if 'SIGN' in name(chr(i),'')}

{'#',
 '$',
 '%',
 '+',
 '<',
 '=',
 '>',
 '¢',
 '£',
 '¤',
 '¥',
 '§',
 '©',
 '¬',
 '®',
 '°',
 '±',
 'µ',
 '¶',
 '×',
 '÷'}

## Practical Consequences of How Sets Work

The `set` and `frozenset` types are both implemented with a hash table. This has these effects:

- Set elements must be hashable objects. They must implement proper `__hash__` and `__eq__` methods.
- Membership testing is very efficient.
- Sets have a significant memory overhead compared to a low-level array of pointers to its elements, which would be more compact but also much slower to search beyond a handful of elements.
- Element ordering depends on insertion order, but not in a useful or reliable way. If two elements are different but have the same hash code, their position depends on which element is added first.

More info on [Internals of sets and dicts](https://fpy.li/hashint)
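To illustrate the hashability requirement, here is a minimal sketch of a user-defined class that can be stored in a set. `Point` is a hypothetical class invented for this example; it is not from the source.

```python
class Point:
    """Hypothetical value class that is usable as a set element."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Objects that compare equal must have the same hash.
        return hash((self.x, self.y))


# The two Point(1, 2) objects are equal and hash alike, so only one is kept.
points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))
print(Point(3, 4) in points)
```

Because the hash is computed from `x` and `y`, instances should not be mutated while they are stored in a set.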

### Set Operations

Note that some operators and methods perform in-place changes on the target set (e.g., `&=`, `difference_update`, etc.). Such operations make no sense in the ideal world of mathematical sets, and are not implemented in `frozenset`.
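The contrast between `set` and `frozenset` under an augmented operator can be sketched like this (the variable names are illustrative):

```python
s = {1, 2, 3, 4}
alias = s
s &= {2, 3, 5}  # in-place: the same set object is modified
print(alias is s, alias)

fs = frozenset({1, 2, 3, 4})
fs_alias = fs
fs &= {2, 3, 5}  # no in-place support: a new frozenset is built and fs is rebound
print(fs_alias is fs, fs, fs_alias)

# In-place methods such as difference_update do not exist on frozenset.
has_update = hasattr(fs, "difference_update")
print(has_update)
```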

![Figure 43](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/43.PNG)

![Figure 44](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/44.PNG)

![Figure 45](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/45.PNG)

![Figure 46](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/46.PNG)

![Figure 47](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/47.PNG)

## Set Operations on dict Views

![Figure 48](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/48.PNG)

In [8]:
d1 = dict(a=1, b=2, c=3, d=4)
d2 = dict(b=20, d=40, e=50)
d1.keys() & d2.keys()

{'b', 'd'}

Note that the return value of `&` is a `set`. Even better: the set operators in dictionary
views are compatible with `set` instances:

In [9]:
s = {'a', 'e', 'i'}
d1.keys() & s

{'a'}

In [10]:
d1.keys() | s

{'a', 'b', 'c', 'd', 'e', 'i'}

![Figure 49](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/49.PNG)

In [11]:
# Hashable values (strings)
dict1 = {"a": "apple", "b": "banana"}
items1 = dict1.items()

# Works fine as a set
print("items1 as set:", set(items1))

# Unhashable values (lists)
dict2 = {"a": [1, 2], "b": [3, 4]}
items2 = dict2.items()

try:
    print("items2 as set:", set(items2))
except TypeError as e:
    print("Error with items2:", e)

# Now demonstrate dict_keys, which always works
keys1 = dict1.keys()
keys2 = dict2.keys()

print("keys1 as set:", set(keys1))
print("keys2 as set:", set(keys2))  # no error, because all keys must be hashable


items1 as set: {('b', 'banana'), ('a', 'apple')}
Error with items2: unhashable type: 'list'
keys1 as set: {'b', 'a'}
keys2 as set: {'b', 'a'}
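When all values are hashable, the same set operators also work on `dict_items` views. As a sketch, the following compares two hypothetical configuration dicts (`old` and `new` are made-up names and data):

```python
old = {"host": "localhost", "port": 8080, "debug": True}
new = {"host": "localhost", "port": 9090, "verbose": False}

added = new.keys() - old.keys()        # keys only in new
removed = old.keys() - new.keys()      # keys only in old
unchanged = old.items() & new.items()  # identical (key, value) pairs

print(added, removed, unchanged)
```

Each result is a plain `set`, so it can be combined further with other sets.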
