### Python Sets (advanced, but not too much)

Python sets model mathematical sets: **unordered** collections of **unique**, **hashable** elements.
- Fast membership tests (`O(1)` average)
- No duplicates
- Mutability: `set` is mutable; `frozenset` is immutable & hashable
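
To make the `O(1)` average membership claim concrete, here is a rough timing sketch using the standard `timeit` module. The collection size and repeat count are arbitrary choices for illustration, and the absolute numbers will vary by machine, so no expected timings are shown.

```python
import timeit

n = 10_000                      # arbitrary size, chosen for illustration
as_list = list(range(n))
as_set = set(as_list)
missing = -1                    # worst case for the list: every element is compared

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.6f}s   set: {set_time:.6f}s")
```

The list lookup scans every element, while the set lookup hashes the value and checks a single bucket on average.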

#### Creating Sets

In [1]:
s1 = {'a', 'b', 'c'}              # literal
s2 = set(['a', 'b', 'c'])         # from iterable
s3 = set('python')                # from string
empty = set()                     # empty set ({} would be an empty dict)
s1, s2, s3, empty

({'a', 'b', 'c'}, {'a', 'b', 'c'}, {'h', 'n', 'o', 'p', 't', 'y'}, set())
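
Two details from the cell above are easy to trip over, so here is a small sketch of both: `{}` creates a dict, and `set(...)` iterates over its argument while a literal takes each item whole. The variable names are only illustrative.

```python
empty_dict = {}
empty_set = set()
print(type(empty_dict).__name__, type(empty_set).__name__)  # dict set

chars = set('python')   # iterates over the string: six one-character elements
whole = {'python'}      # literal: a single element, the whole string
print(len(chars), len(whole))  # 6 1
```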

#### Uniqueness & Hashability

In [2]:
set(['a', 'a', 'b', 'b'])  # duplicates removed -> {'a', 'b'} (not displayed: not the cell's last expression)
# set elements must be hashable:
try:
    set([[1,2], [1,2]])    # lists are unhashable -> TypeError
except TypeError as e:
    print('Error:', e)

fs = {frozenset({1,2}), frozenset({1,2})}  # OK: frozenset is hashable
fs

Error: unhashable type: 'list'


{frozenset({1, 2})}
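
Uniqueness is decided by equality together with hash value, not by type. The sketch below shows two consequences: values that compare equal collapse into one element, and a tuple is only hashable if everything inside it is.

```python
# 1, 1.0 and True compare equal and share a hash, so only one survives.
mixed = {1, 1.0, True}
print(len(mixed))            # 1

# Tuples of hashable items are hashable and can be set elements.
pairs = {(1, 2), (1, 2), (2, 1)}
print(len(pairs))            # 2

# A tuple that contains a list is not hashable.
try:
    bad = {(1, [2])}
except TypeError as e:
    print('Error:', e)
```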

#### Membership, Iteration, Size (order is arbitrary)

In [3]:
s = set('python')
'p' in s, 'x' not in s, len(s)   # (True, True, 6); not displayed because it is not the cell's last expression
for item in s:
    print(item)  # iteration order is not guaranteed

p
n
o
h
y
t


#### Basic Mutating Operations (`add`, `update`, `remove`, `discard`, `pop`, `clear`)

In [4]:
s = {'a', 'b'}
s.add('c')                    # add single element
s.update(['c', 'd'])          # add many (any iterable)
print('after add/update:', s)

s.discard('x')                # safe (no error if missing)
try:
    s.remove('x')             # KeyError if missing
except KeyError:
    print('remove failed on missing key')

_ = s.pop()                   # remove and return an arbitrary element
print('after pop():', s)
s.clear(); s

after add/update: {'b', 'a', 'd', 'c'}
remove failed on missing key
after pop(): {'a', 'd', 'c'}


set()
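
Since `pop()` removes an arbitrary element, it pairs naturally with a loop that drains a set. A minimal sketch, with `pending` and `processed` as illustrative names; note that `pop()` on an empty set raises `KeyError`, just like `remove()` on a missing element.

```python
pending = {'a', 'b', 'c'}
processed = []

while pending:                        # an empty set is falsy
    processed.append(pending.pop())   # removal order is arbitrary

print(sorted(processed))              # ['a', 'b', 'c']

try:
    pending.pop()                     # nothing left to remove
except KeyError:
    print('pop failed on empty set')
```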

#### Set Algebra (union, intersection, difference, symmetric difference)

In [5]:
a = set('abcde')
b = set('bdxyz')

a_union_b        = a | b          # union
a_inter_b        = a & b          # intersection
a_minus_b        = a - b          # elements in a not in b
sym_diff         = a ^ b          # elements in exactly one set

a_union_b, a_inter_b, a_minus_b, sym_diff

({'a', 'b', 'c', 'd', 'e', 'x', 'y', 'z'},
 {'b', 'd'},
 {'a', 'c', 'e'},
 {'a', 'c', 'e', 'x', 'y', 'z'})
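
Each operator above also has a method form (`union`, `intersection`, `difference`, `symmetric_difference`). One practical difference, sketched below: the methods accept any iterable, while the operators need a set or frozenset on both sides. The augmented operators (`|=`, `-=`, and so on) update the left-hand set in place.

```python
a = set('abcde')

# Method forms accept any iterable, not only sets.
with_list = a.union(['x', 'y'])
common = a.intersection('bdxyz')
print(sorted(with_list))   # ['a', 'b', 'c', 'd', 'e', 'x', 'y']
print(sorted(common))      # ['b', 'd']

# Operator forms require a set (or frozenset) on both sides.
try:
    a | ['x', 'y']
except TypeError:
    print('TypeError: | needs a set on both sides')

# Augmented operators modify the left-hand set in place.
a |= {'x'}
a -= {'a', 'b'}
print(sorted(a))           # ['c', 'd', 'e', 'x']
```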

#### Relational Tests (`issubset`, `issuperset`, `isdisjoint`)

In [6]:
digits   = set('0123456789')
even     = {'0','2','4','6','8'}
print(even.issubset(digits), digits.issuperset(even))
vowels   = set('aeiou')
print(vowels.isdisjoint(set('xyz')))  # True (no overlap)

True True
True
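
The relational methods have operator equivalents as well: `<=` and `>=` correspond to `issubset` and `issuperset`, while `<` and `>` test for a *proper* subset or superset (the two sets must differ). A short sketch reusing the same data:

```python
digits = set('0123456789')
even = {'0', '2', '4', '6', '8'}
odd = digits - even

print(even <= digits)     # True: subset
print(even < digits)      # True: proper subset (digits has extra elements)
print(digits <= digits)   # True: every set is a subset of itself
print(digits < digits)    # False: but not a proper subset of itself
print(not (even & odd))   # True: empty intersection, same as even.isdisjoint(odd)
```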


#### Practical Patterns

**1) Deduplicate while preserving original order** (use a seen set)

In [7]:
def dedupe_preserve_order(seq):
    seen = set()
    out = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

dedupe_preserve_order([3,1,2,3,2,1,4])

[3, 1, 2, 4]
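
An alternative sketch of the same idea relies on dict keys being unique and, from Python 3.7 on, keeping insertion order. The function name `dedupe_with_dict` is illustrative; like the seen-set version, it requires hashable items.

```python
def dedupe_with_dict(seq):
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(seq))

print(dedupe_with_dict([3, 1, 2, 3, 2, 1, 4]))  # [3, 1, 2, 4]
```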

**2) Fast filtering / membership tests**

In [8]:
allowed = {'jpg','png','gif'}
files = ['a.jpg','b.txt','c.png','d.docx','e.gif']
[f for f in files if f.rsplit('.',1)[-1] in allowed]

['a.jpg', 'c.png', 'e.gif']
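
The `rsplit` approach treats a name with no dot (for example a file literally called `jpg`) as its own extension, and it is case-sensitive. If that matters, one possible variant uses `pathlib.Path.suffix` and lowercases the result. The helper name `has_allowed_extension` and the sample file names are made up for this sketch.

```python
from pathlib import Path

allowed = {'jpg', 'png', 'gif'}
files = ['a.JPG', 'b.txt', 'archive.tar.gif', 'jpg', 'c.png']  # hypothetical names

def has_allowed_extension(name, allowed):
    # Path.suffix is '' when there is no extension, so 'jpg' alone is rejected.
    ext = Path(name).suffix.lower().lstrip('.')
    return ext in allowed

kept = [f for f in files if has_allowed_extension(f, allowed)]
print(kept)  # ['a.JPG', 'archive.tar.gif', 'c.png']
```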

**3) Set comprehension** (readable filtering & transformation)

In [9]:
words = {'Alpha', 'beta', 'GAMMA', 'Beta'}
normalized = {w.lower() for w in words}   # dedupe, normalize case
normalized

{'alpha', 'beta', 'gamma'}
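
A set comprehension can also filter with an `if` clause. The length threshold below is an arbitrary condition picked for illustration:

```python
words = {'Alpha', 'beta', 'GAMMA', 'Beta'}

# Keep only words longer than four characters, then normalize case.
long_words = {w.lower() for w in words if len(w) > 4}
print(sorted(long_words))  # ['alpha', 'gamma']
```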

**4) Unique characters / tokens**

In [10]:
text = 'bananas are bananas'
unique_chars  = set(text)
unique_words  = set(text.split())
unique_chars, unique_words

({' ', 'a', 'b', 'e', 'n', 'r', 's'}, {'are', 'bananas'})

#### Immutable Sets: `frozenset` (hashable; can be dict keys or set elements)

In [11]:
fs = frozenset({'read','write'})
permissions_map = {fs: 'rw'}   # OK since frozenset is hashable
permissions_map[fs]

'rw'
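
A few more properties of `frozenset` are worth seeing in code: lookups do not depend on element order, a `frozenset` compares equal to a `set` with the same elements, and set algebra returns a new object instead of mutating. The names below are illustrative.

```python
read_write = frozenset(['read', 'write'])
permissions_map = {read_write: 'rw'}

# Element order is irrelevant for equality and hashing.
print(permissions_map[frozenset(['write', 'read'])])  # rw
print(read_write == {'read', 'write'})                # True

# There are no mutating methods on a frozenset.
try:
    read_write.add('execute')
except AttributeError:
    print('frozenset has no add()')

# Set algebra still works and returns a new frozenset.
extended = read_write | {'execute'}
print(type(extended).__name__, sorted(extended))  # frozenset ['execute', 'read', 'write']
```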

#### Pitfalls & Notes
- Set iteration order is **arbitrary**; for string elements it can change between runs because string hashing is randomized per process.
- Only hashable elements are allowed.
- Use `discard` instead of `remove` if you don't want errors on missing items.
- For **stable ordering**, use `sorted(s)`, which returns a new sorted list (the elements must be mutually comparable).
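
The ordering and `discard` notes can be sketched briefly. Passing `key=str` for mixed types is just one possible workaround, shown here as an assumption about what ordering is acceptable:

```python
s = set('python')
print(sorted(s))                 # ['h', 'n', 'o', 'p', 't', 'y']
print(sorted(s, reverse=True))   # ['y', 't', 'p', 'o', 'n', 'h']

s.discard('x')                   # missing element: silently ignored
print(len(s))                    # 6

# Elements that cannot be compared with each other cannot be sorted directly.
mixed = {1, 'a'}
try:
    sorted(mixed)
except TypeError:
    print('cannot compare int and str')

print(sorted(mixed, key=str))    # [1, 'a']
```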

#### Example: Simple Venn-like Summary

In [12]:
A = set(range(0, 10))
B = set(range(5, 15))
summary = {
    'A_only': sorted(A - B),
    'B_only': sorted(B - A),
    'intersection': sorted(A & B),
    'union_size': len(A | B)
}
summary

{'A_only': [0, 1, 2, 3, 4],
 'B_only': [10, 11, 12, 13, 14],
 'intersection': [5, 6, 7, 8, 9],
 'union_size': 15}
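
As a sanity check on the numbers above, the union size should satisfy the inclusion-exclusion identity |A ∪ B| = |A| + |B| − |A ∩ B|, and the symmetric difference should equal the union minus the intersection. A short sketch:

```python
A = set(range(0, 10))
B = set(range(5, 15))

union_size = len(A | B)
by_formula = len(A) + len(B) - len(A & B)   # 10 + 10 - 5
print(union_size, by_formula)               # 15 15

# Symmetric difference = everything in the union that is not shared.
print((A ^ B) == (A | B) - (A & B))         # True
```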