# Python Fundamentals: Sets - Unique Unordered Collections

## Introduction

A **set** is a mutable, unordered collection of unique, immutable (hashable) elements. Think of it like a mathematical set: duplicates are automatically removed, and the order of elements is not guaranteed. Iteration order may look consistent within a single run, but it depends on element hashes and insertion history (and string hashes change between interpreter runs), so it should not be relied upon.

**Key Characteristics:**
*   **Unordered:** Elements have no fixed position or index.
*   **Unique Elements:** Duplicates are automatically discarded.
*   **Mutable Set:** Sets themselves can be modified (elements added or removed).
*   **Immutable Elements:** Elements within a set *must* be immutable (hashable) types (e.g., numbers, strings, tuples of immutables).
*   **No Indexing:** Elements cannot be accessed using index numbers.
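A quick check of the last two characteristics: the sketch below uses an illustrative helper (`collect_set_errors`, not part of any library) that tries positional indexing and then adds a tuple containing a list. It collects the `TypeError` message each attempt raises. The exact wording of those messages can differ slightly between Python versions.

```python
from typing import List, Set


def collect_set_errors(items: Set[str]) -> List[str]:
    """Illustrative helper: gather the TypeErrors raised by two common set misuses."""
    errors: List[str] = []

    # Sets have no positions, so subscripting fails.
    try:
        items[0]  # type: ignore[index]
    except TypeError as exc:
        errors.append(str(exc))

    # A tuple is hashable only if every element inside it is hashable.
    try:
        items.add((1, [2, 3]))  # type: ignore[arg-type]
    except TypeError as exc:
        errors.append(str(exc))

    return errors


tags: Set[str] = {"python", "data"}
for message in collect_set_errors(tags):
    print(message)
```

Neither failed operation changes `tags`.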

## Real-World Analogies & Use Cases

*   **Unique Items:** Collecting unique usernames, email addresses, or product IDs from a list.
*   **Membership Testing:** Quickly checking if an item exists in a large collection (e.g., "Is this user ID already processed?"). Sets offer very fast O(1) average time complexity for this.
*   **Mathematical Set Operations:** Finding intersections (common elements), unions (all unique elements), differences (elements in one set but not another).
*   **Removing Duplicates:** The easiest way to remove duplicates from a list is often to convert it to a set and back: `unique_list = list(set(original_list))` (Note: this loses original order).
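To see the order caveat in the last bullet, the sketch below compares `list(set(...))` with `list(dict.fromkeys(...))`. The second approach relies on dicts preserving insertion order, which is guaranteed since Python 3.7. The email addresses are made-up sample data.

```python
from typing import List

emails: List[str] = ["b@x.com", "a@x.com", "b@x.com", "c@x.com", "a@x.com"]

# Fast, but the resulting order is arbitrary.
unique_any_order: List[str] = list(set(emails))

# Same uniqueness, but keeps the first-seen order.
unique_in_order: List[str] = list(dict.fromkeys(emails))

print(sorted(unique_any_order))  # sorted only to make the printout stable
print(unique_in_order)           # ['b@x.com', 'a@x.com', 'c@x.com']
```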

## 1. Explain & Demonstrate: Creating Sets

Sets are created using curly braces `{}` with comma-separated elements, or the `set()` constructor. 

**Important:** Using empty curly braces `{}` creates an *empty dictionary*, not an empty set. Use `set()` for an empty set.

In [1]:
from typing import Set, Any, Tuple, FrozenSet

# Using curly braces (literal syntax)
# Duplicates are automatically removed.
unique_tags: Set[str] = {"python", "data", "web", "python", "api"}
print(f"Literal syntax (duplicates removed): {unique_tags}")
print(f"Type: {type(unique_tags)}\n")

mixed_set: Set[Any] = {1, "hello", 3.14, True, 0, False, (1, 2)}
# Note: True == 1 and False == 0, so each pair counts as one element; the value inserted first (here 1 and 0) is the one kept.
# Order is NOT guaranteed.
print(f"Mixed types (duplicates, bool/int treated same): {mixed_set}\n")

# Using the set() constructor with an iterable (e.g., list, string)
numbers_list = [1, 2, 2, 3, 4, 3, 5]
set_from_list: Set[int] = set(numbers_list)
print(f"Set from list: {set_from_list}")

set_from_string: Set[str] = set("hello world") # Creates set of unique characters
print(f"Set from string: {set_from_string}\n")

# --- Pitfall Alert: Empty Set Creation ---
empty_dict = {}
empty_set = set()
print(f"Empty dict: {empty_dict}, Type: {type(empty_dict)}")
print(f"Empty set: {empty_set}, Type: {type(empty_set)}")

Literal syntax (duplicates removed): {'python', 'api', 'data', 'web'}
Type: <class 'set'>

Mixed types (duplicates, bool/int treated same): {0, 1, (1, 2), 3.14, 'hello'}

Set from list: {1, 2, 3, 4, 5}
Set from string: {'o', 'r', 'l', 'w', 'e', 'd', ' ', 'h'}

Empty dict: {}, Type: <class 'dict'>
Empty set: set(), Type: <class 'set'>


## 2. Explain & Demonstrate: Modifying Sets

Sets are mutable, allowing elements to be added or removed.

In [2]:
active_users: Set[str] = {"alice", "bob"}
print(f"Initial set: {active_users}\n")

# --- Adding Elements ---

# .add(element): Adds a single element. Does nothing if element is already present.
active_users.add("charlie")
active_users.add("alice") # Adding existing element has no effect
print(f"After adding 'charlie' and 'alice': {active_users}")

# Pitfall: Adding mutable elements raises TypeError
try:
    active_users.add(["david", "eve"]) # Cannot add a list
except TypeError as e:
    print(f"Error adding mutable list: {e}")
# Can add immutable elements like tuples
active_users.add(("group", 1))
print(f"After adding tuple: {active_users}\n")

# .update(iterable): Adds all elements from an iterable (list, tuple, set, string).
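new_users = ["frank", "grace", "bob"] # 'bob' is already in the set
inactive_set = {"heidi"}
# Can take multiple iterables. A string is iterated character by character,
# so "ivan" adds 'i', 'v', 'a' and 'n' rather than the whole name.
active_users.update(new_users, inactive_set, "ivan")
print(f"After update with list, set, string: {active_users}\n")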

# --- Removing Elements ---

# .remove(element): Removes the element. Raises KeyError if the element is not found.
active_users.remove("bob")
print(f"After removing 'bob': {active_users}")
try:
    active_users.remove("zelda") # 'zelda' is not in the set
except KeyError as e:
    print(f"Error removing non-existent 'zelda' with remove(): {e}\n")

# .discard(element): Removes the element if present. Does *nothing* if the element is not found (safer).
active_users.discard("alice")
active_users.discard("zelda") # No error for non-existent element
print(f"After discarding 'alice' and 'zelda': {active_users}\n")

# .pop(): Removes and returns an *arbitrary* element from the set.
# Raises KeyError if the set is empty. Order is not guaranteed.
if active_users:
    removed_user = active_users.pop()
    print(f"After pop() (removed '{removed_user}'): {active_users}\n")
else:
    print("Set is empty, cannot pop.")

# .clear(): Removes all elements from the set.
active_users.clear()
print(f"After clear(): {active_users}")

Initial set: {'bob', 'alice'}

After adding 'charlie' and 'alice': {'charlie', 'bob', 'alice'}
Error adding mutable list: unhashable type: 'list'
After adding tuple: {'charlie', 'bob', ('group', 1), 'alice'}

After update with list, set, string: {'charlie', 'heidi', 'bob', 'alice', 'n', 'i', ('group', 1), 'frank', 'v', 'grace', 'a'}

After removing 'bob': {'charlie', 'heidi', 'alice', 'n', 'i', ('group', 1), 'frank', 'v', 'grace', 'a'}
Error removing non-existent 'zelda' with remove(): 'zelda'

After discarding 'alice' and 'zelda': {'charlie', 'heidi', 'n', 'i', ('group', 1), 'frank', 'v', 'grace', 'a'}

After pop() (removed 'charlie'): {'heidi', 'n', 'i', ('group', 1), 'frank', 'v', 'grace', 'a'}

After clear(): set()
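The `'i'`, `'v'`, `'a'`, `'n'` entries in the output above come from passing the string `"ivan"` to `.update()`, which iterates over it character by character. A small sketch contrasting that with two ways of adding the whole string (the names are sample data):

```python
from typing import Set

chars_added: Set[str] = {"alice"}
chars_added.update("ivan")          # iterates the string: adds 'i', 'v', 'a', 'n'

whole_added: Set[str] = {"alice"}
whole_added.add("ivan")             # adds the string as one element

list_added: Set[str] = {"alice"}
list_added.update(["ivan"])         # wrapping it in a list also adds it whole

print(sorted(chars_added))  # ['a', 'alice', 'i', 'n', 'v']
print(sorted(whole_added))  # ['alice', 'ivan']
print(sorted(list_added))   # ['alice', 'ivan']
```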


## 3. Demonstrate: Set Operations (Mathematical)

Sets support standard mathematical operations like union, intersection, difference, etc. These can be performed using methods or operators.
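One practical difference between the two spellings: the named methods accept any iterable, while the binary operators require both operands to be sets (or frozensets). A short sketch with made-up values:

```python
from typing import List, Set

base: Set[int] = {1, 2, 3}
extra: List[int] = [3, 4]

# The method converts any iterable for you.
print(base.union(extra))         # {1, 2, 3, 4}

# The operator refuses a list operand.
try:
    base | extra  # type: ignore[operator]
except TypeError as exc:
    print(f"TypeError: {exc}")

# Convert explicitly to use the operator form.
print(base | set(extra))         # {1, 2, 3, 4}
```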

In [3]:
set_a: Set[int] = {1, 2, 3, 4, 5}
set_b: Set[int] = {4, 5, 6, 7, 8}
set_c: Set[int] = {1, 2}

print(f"Set A: {set_a}")
print(f"Set B: {set_b}")
print(f"Set C: {set_c}\n")

# --- Union (Elements in either set) ---
# Method: .union() - Returns a new set
union_ab_method: Set[int] = set_a.union(set_b)
print(f"A union B (method): {union_ab_method}")
# Operator: | - Returns a new set
union_ab_operator: Set[int] = set_a | set_b
print(f"A union B (operator): {union_ab_operator}\n")

# --- Intersection (Elements in both sets) ---
# Method: .intersection() - Returns a new set
intersect_ab_method: Set[int] = set_a.intersection(set_b)
print(f"A intersection B (method): {intersect_ab_method}")
# Operator: & - Returns a new set
intersect_ab_operator: Set[int] = set_a & set_b
print(f"A intersection B (operator): {intersect_ab_operator}\n")

# --- Difference (Elements in first set, but not in second) ---
# Method: .difference() - Returns a new set (Order matters!)
diff_ab_method: Set[int] = set_a.difference(set_b) # A - B
diff_ba_method: Set[int] = set_b.difference(set_a) # B - A
print(f"A difference B (method): {diff_ab_method}")
print(f"B difference A (method): {diff_ba_method}")
# Operator: - - Returns a new set
diff_ab_operator: Set[int] = set_a - set_b
diff_ba_operator: Set[int] = set_b - set_a
print(f"A difference B (operator): {diff_ab_operator}")
print(f"B difference A (operator): {diff_ba_operator}\n")

# --- Symmetric Difference (Elements in either set, but not both) ---
# Method: .symmetric_difference() - Returns a new set
sym_diff_ab_method: Set[int] = set_a.symmetric_difference(set_b)
print(f"A symmetric_difference B (method): {sym_diff_ab_method}")
# Operator: ^ - Returns a new set
sym_diff_ab_operator: Set[int] = set_a ^ set_b
print(f"A symmetric_difference B (operator): {sym_diff_ab_operator}\n")

# --- Subset and Superset ---
# Method: .issubset(), .issuperset()
print(f"Is C subset of A? {set_c.issubset(set_a)}")     # True
print(f"Is A subset of C? {set_a.issubset(set_c)}")     # False
print(f"Is A superset of C? {set_a.issuperset(set_c)}") # True
# Operator: <= (subset), >= (superset), < (proper subset), > (proper superset)
print(f"Is C <= A? {set_c <= set_a}") # True
print(f"Is A >= C? {set_a >= set_c}") # True
print(f"Is C < A? {set_c < set_a}")   # True (C is subset, but not equal to A)
print(f"Is A < A? {set_a < set_a}")   # False (A is not a *proper* subset of itself)
print(f"Is A <= A? {set_a <= set_a}") # True (A is a subset of itself)

# --- Disjoint (No common elements) ---
# Method: .isdisjoint()
set_d: Set[int] = {10, 11}
print(f"Are A and B disjoint? {set_a.isdisjoint(set_b)}") # False (share 4, 5)
print(f"Are A and D disjoint? {set_a.isdisjoint(set_d)}") # True

Set A: {1, 2, 3, 4, 5}
Set B: {4, 5, 6, 7, 8}
Set C: {1, 2}

A union B (method): {1, 2, 3, 4, 5, 6, 7, 8}
A union B (operator): {1, 2, 3, 4, 5, 6, 7, 8}

A intersection B (method): {4, 5}
A intersection B (operator): {4, 5}

A difference B (method): {1, 2, 3}
B difference A (method): {8, 6, 7}
A difference B (operator): {1, 2, 3}
B difference A (operator): {8, 6, 7}

A symmetric_difference B (method): {1, 2, 3, 6, 7, 8}
A symmetric_difference B (operator): {1, 2, 3, 6, 7, 8}

Is C subset of A? True
Is A subset of C? False
Is A superset of C? True
Is C <= A? True
Is A >= C? True
Is C < A? True
Is A < A? False
Is A <= A? True
Are A and B disjoint? False
Are A and D disjoint? True


## 4. Demonstrate: In-Place Set Operations (Update Methods)

Each mathematical operation also has an in-place counterpart that modifies the set itself instead of returning a new set. Apart from `.update()`, these method names end in `_update`, and each has an augmented-assignment operator (`|=`, `&=`, `-=`, `^=`).
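Why the in-place form matters: `x |= y` mutates the existing set object, so every name referring to it sees the change, while `x = x | y` builds a new set and rebinds only `x`. A sketch using two hypothetical variable names, `shared` and `alias`:

```python
from typing import Set

shared: Set[int] = {1, 2}
alias = shared                 # both names refer to the same set object

shared |= {3}                  # in-place union: the object itself changes
print(alias, alias is shared)  # {1, 2, 3} True

shared = shared | {4}          # regular union: a new set is bound to 'shared'
print(alias, alias is shared)  # {1, 2, 3} False
```

This is mostly relevant when a set is passed into a function or stored in several places: the in-place form updates every reference, the regular form does not.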

In [4]:
set_x: Set[int] = {1, 2, 3}
set_y: Set[int] = {3, 4, 5}
print(f"Initial Set X: {set_x}")
print(f"Initial Set Y: {set_y}\n")

# --- Update (In-place Union) ---
# Method: .update() (also used for adding from any iterable)
# set_x.update(set_y)
# Operator: |=
set_x |= set_y
print(f"X after |= Y (Union Update): {set_x}") # set_x is now {1, 2, 3, 4, 5}

# Reset X for next demo
set_x = {1, 2, 3}

# --- Intersection Update (In-place Intersection) ---
# Method: .intersection_update()
# set_x.intersection_update(set_y)
# Operator: &=
set_x &= set_y
print(f"X after &= Y (Intersection Update): {set_x}") # set_x is now {3}

# Reset X for next demo
set_x = {1, 2, 3}

# --- Difference Update (In-place Difference) ---
# Method: .difference_update()
# set_x.difference_update(set_y) # X - Y
# Operator: -=
set_x -= set_y
print(f"X after -= Y (Difference Update): {set_x}") # set_x is now {1, 2}

# Reset X for next demo
set_x = {1, 2, 3}

# --- Symmetric Difference Update (In-place Symmetric Difference) ---
# Method: .symmetric_difference_update()
# set_x.symmetric_difference_update(set_y)
# Operator: ^=
set_x ^= set_y
print(f"X after ^= Y (Symmetric Difference Update): {set_x}") # set_x is now {1, 2, 4, 5}

Initial Set X: {1, 2, 3}
Initial Set Y: {3, 4, 5}

X after |= Y (Union Update): {1, 2, 3, 4, 5}
X after &= Y (Intersection Update): {3}
X after -= Y (Difference Update): {1, 2}
X after ^= Y (Symmetric Difference Update): {1, 2, 4, 5}


## 5. Apply: Iteration and Membership Testing

Sets are iterable, and checking for membership is highly efficient.

In [5]:
permissions: Set[str] = {"read", "write", "admin", "guest"}

# --- Iteration ---
# Note: Order is NOT guaranteed!
print("Available permissions (iteration order not guaranteed):")
for perm in permissions:
    print(f"- {perm}")
print("\n")

# --- Membership Testing (Fast!) ---
user_has_admin = "admin" in permissions
user_has_delete = "delete" in permissions

print(f"Does user have 'admin' permission? {user_has_admin}")
print(f"Does user have 'delete' permission? {user_has_delete}\n")

# Example: Checking required permissions
required_perms = {"read", "write"}
if required_perms.issubset(permissions):
    print("User has required read/write permissions.")
else:
    missing = required_perms - permissions
    print(f"User is missing required permissions: {missing}")

Available permissions (iteration order not guaranteed):
- read
- guest
- admin
- write


Does user have 'admin' permission? True
Does user have 'delete' permission? False

User has required read/write permissions.


## Performance Considerations

*   **Membership Testing:** Checking if an element is in a set (`element in my_set`) has an average time complexity of **O(1)**, thanks to hashing (degrading toward O(n) only with heavy hash collisions). This is significantly faster than searching a list, which is O(n).
*   **Add/Remove:** Adding and removing elements are also O(1) on average.
*   **Set Operations:** Cost scales with the sizes of the sets involved: union `s | t` is O(len(s) + len(t)), intersection `s & t` is O(min(len(s), len(t))) on average, and difference `s - t` is O(len(s)).
*   **Memory:** Like dictionaries, sets have memory overhead due to the underlying hash table.
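A rough way to see the membership gap on your own machine, sketched with `timeit` from the standard library. The collection size and repeat count are arbitrary choices, and absolute timings will vary with hardware and Python version.

```python
import timeit
from typing import Dict, List, Set


def time_membership(size: int = 100_000, number: int = 200) -> Dict[str, float]:
    """Time 'x in collection' for a list and a set holding the same integers."""
    data_list: List[int] = list(range(size))
    data_set: Set[int] = set(data_list)
    target = size - 1  # worst case for the list: the last element

    list_time = timeit.timeit(lambda: target in data_list, number=number)
    set_time = timeit.timeit(lambda: target in data_set, number=number)
    return {"list": list_time, "set": set_time}


timings = time_membership()
print(f"list: {timings['list']:.4f}s, set: {timings['set']:.6f}s")
```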

## Best Practices & Enterprise Context

*   **Use Sets for Uniqueness:** When you need to store a collection of items where duplicates are irrelevant or must be prevented, use a set.
*   **Fast Membership Checks:** If your code frequently needs to check for the existence of an item in a large collection, a set is often the most performant choice.
*   **Removing Duplicates:** `list(set(my_list))` is a common idiom, but it doesn't preserve order. For order-preserving de-duplication, use `list(dict.fromkeys(my_list))` (dicts keep insertion order since Python 3.7) or loop over the list while tracking already-seen items in a set.
*   **Use `.discard()` for Safe Removal:** Prefer `.discard(x)` over `.remove(x)` if you don't want a `KeyError` when attempting to remove an element that might not be present.
*   **Understand Immutability of Elements:** You cannot put mutable items like lists or dictionaries into a set. Use tuples (of immutables) if you need compound elements.
*   **Set Operations:** Leverage built-in set operations for efficient comparisons and combinations of collections (e.g., finding common users between two groups).
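As an illustration of the tuple advice and of fast membership checks, here is a sketch that tracks visited grid cells as `(row, col)` tuples. The `count_revisits` function and the walk are made-up examples.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def count_revisits(path: List[Cell]) -> int:
    """Count how many steps land on a cell that was already visited."""
    visited: Set[Cell] = set()
    revisits = 0
    for cell in path:
        if cell in visited:   # O(1) average lookup on a hashable tuple
            revisits += 1
        else:
            visited.add(cell)
    return revisits


walk: List[Cell] = [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0)]
print(count_revisits(walk))  # 2
```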

## Common Pitfalls & Interview Questions

*   **Pitfall: Unordered Nature:** Do not rely on any particular iteration order, even if it sometimes looks predictable (sets of small integers, for example, often print in ascending order as a side effect of how integers hash). If order matters, use a list or other ordered sequence.
*   **Pitfall: Mutable Elements:** Trying to add a list or dictionary to a set will raise a `TypeError`. Elements must be hashable.
*   **Pitfall: Creating Empty Sets:** Using `{}` creates an empty dictionary, not a set. Use `set()`.
*   **Pitfall: `remove()` vs `discard()`:** Forgetting that `remove()` raises `KeyError` if the element is absent, while `discard()` does not.
*   **Pitfall: `pop()` Behavior:** Remembering that `pop()` removes an *arbitrary* element, not necessarily the first or last.
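When output order matters (logs, reports, tests), a common remedy for the ordering and `pop()` pitfalls is to sort explicitly rather than depend on iteration order. A minimal sketch; using `min()` to pick which element to remove is just one deterministic choice, not a set feature.

```python
from typing import List, Set

roles: Set[str] = {"viewer", "admin", "editor"}

# sorted() always returns a list in a well-defined order.
ordered: List[str] = sorted(roles)
print(ordered)  # ['admin', 'editor', 'viewer']

# To remove a predictable element instead of an arbitrary one, choose it explicitly.
first = min(roles)
roles.remove(first)
print(first, sorted(roles))  # admin ['editor', 'viewer']
```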

*   **Interview Question:** "What are the main characteristics of a Python set?"
    *   *Answer:* Unordered, unique immutable elements, mutable set itself, fast membership testing (O(1) average).
*   **Interview Question:** "When would you use a set instead of a list?"
    *   *Answer:* When uniqueness is required, when order doesn't matter, and especially when frequent membership testing is needed.
*   **Interview Question:** "How do you remove duplicates from a list while preserving order?"
    *   *Answer:* `list(set(my_list))` removes duplicates but *loses* order. To preserve order, iterate through the list and append each element to a result list only if it is not yet in a `seen` set (the set keeps each check O(1) on average). Since Python 3.7, dicts preserve insertion order, so `list(dict.fromkeys(my_list))` is a concise alternative.
*   **Interview Question:** "What's the difference between `set.remove(x)` and `set.discard(x)`?"
    *   *Answer:* `remove(x)` raises `KeyError` if `x` is not in the set, `discard(x)` does nothing if `x` is not in the set.
*   **Interview Question:** "Can you have a set of lists? Why or why not?"
    *   *Answer:* No, because lists are mutable and therefore not hashable. Set elements must be immutable.
*   **Interview Question:** "What is the time complexity for checking if an element is in a set?"
    *   *Answer:* Average time complexity is O(1).
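The order-preserving de-duplication answer above, written out as a runnable function. The name `dedupe_preserving_order` is illustrative; the `seen` set gives O(1) average checks, so the whole pass is O(n) for hashable items.

```python
from typing import Hashable, Iterable, List, Set, TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe_preserving_order(items: Iterable[T]) -> List[T]:
    """Return the items with duplicates removed, keeping first occurrences in order."""
    seen: Set[T] = set()
    result: List[T] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(dedupe_preserving_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
print(dedupe_preserving_order("mississippi"))    # ['m', 'i', 's', 'p']
```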

## Advanced Topic: `frozenset`

Python also provides `frozenset`, which is an **immutable** version of a set. Once created, a `frozenset` cannot be changed (no adding or removing elements).

**Why use `frozenset`?**
*   Because they are immutable and hashable, `frozenset` objects *can* be used as elements in other sets or as keys in dictionaries.
*   Useful for representing fixed collections of unique items.
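Hashability also means a `frozenset` can be an element of an ordinary set, which is handy when the *group* matters but its internal order does not. A sketch that deduplicates unordered pairs (the pairing data is made up):

```python
from typing import FrozenSet, List, Set, Tuple

pairings: List[Tuple[str, str]] = [("ana", "ben"), ("ben", "ana"), ("ana", "cy")]

# A frozenset ignores order, so ("ana", "ben") and ("ben", "ana") collapse to one pair.
unique_pairs: Set[FrozenSet[str]] = {frozenset(pair) for pair in pairings}
print(len(unique_pairs))  # 2

# A plain set cannot be nested this way because it is unhashable.
try:
    {{"ana", "ben"}}  # type: ignore[arg-type]
except TypeError as exc:
    print(f"TypeError: {exc}")
```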

In [6]:
from typing import Dict, FrozenSet

# Create a frozenset (duplicates are removed, just like with set())
frozen_perms: FrozenSet[str] = frozenset(["read", "execute", "read"])
print(f"Frozenset: {frozen_perms}")
print(f"Type: {type(frozen_perms)}\n")

# Mutating methods like .add(), .remove(), .update(), .pop() do not exist on a frozenset
try:
    frozen_perms.add("write")
except AttributeError as e:
    print(f"Error calling add() on a frozenset: {e}\n")

# Mathematical operations still work; called on a frozenset, they return a new frozenset
other_frozen = frozenset(["read", "write"])
union_frozen = frozen_perms.union(other_frozen)
intersection_frozen = frozen_perms.intersection(other_frozen)
print(f"Union of frozensets: {union_frozen} (Type: {type(union_frozen)})")
print(f"Intersection of frozensets: {intersection_frozen} (Type: {type(intersection_frozen)})\n")

# Using frozenset as a dictionary key
permission_groups: Dict[FrozenSet[str], str] = {
    frozenset(["read"]): "Guest Access",
    frozenset(["read", "write"]): "Editor Access",
    frozen_perms: "Execute Access" # Using the one created earlier
}
print(f"Dictionary with frozenset keys: {permission_groups}")
print(f"Accessing via frozenset key: {permission_groups[frozenset(['read', 'write'])]}")

Frozenset: frozenset({'read', 'execute'})
Type: <class 'frozenset'>

Error calling add() on a frozenset: 'frozenset' object has no attribute 'add'

Union of frozensets: frozenset({'read', 'write', 'execute'}) (Type: <class 'frozenset'>)
Intersection of frozensets: frozenset({'read'}) (Type: <class 'frozenset'>)

Dictionary with frozenset keys: {frozenset({'read'}): 'Guest Access', frozenset({'read', 'write'}): 'Editor Access', frozenset({'read', 'execute'}): 'Execute Access'}
Accessing via frozenset key: Editor Access


## 6. Challenge: Finding Common and Unique Elements

You have two lists representing features supported by two different product versions.

1.  Write a function `compare_features` that takes two lists of features (strings) as input.
2.  Convert these lists into sets to easily manage uniqueness.
3.  The function should determine and return three things:
    *   A set of features common to both versions (intersection).
    *   A set of features unique to the first version (difference).
    *   A set of features unique to the second version (difference).
4.  Return these three sets, perhaps as a tuple.

In [None]:
from typing import List, Set, Tuple

FeatureSet = Set[str]
ComparisonResult = Tuple[FeatureSet, FeatureSet, FeatureSet]

def compare_features(version1_features: List[str], version2_features: List[str]) -> ComparisonResult:
    """Compares feature lists from two versions.

    Args:
        version1_features: List of feature names for version 1.
        version2_features: List of feature names for version 2.

    Returns:
        A tuple containing:
        - Set of common features.
        - Set of features unique to version 1.
        - Set of features unique to version 2.
    """
    set_v1: FeatureSet = set(version1_features)
    set_v2: FeatureSet = set(version2_features)

    common_features: FeatureSet = set_v1.intersection(set_v2) # Or set_v1 & set_v2
    unique_to_v1: FeatureSet = set_v1.difference(set_v2)   # Or set_v1 - set_v2
    unique_to_v2: FeatureSet = set_v2.difference(set_v1)   # Or set_v2 - set_v1

    return common_features, unique_to_v1, unique_to_v2

# --- Test the function ---
features_v1 = ["login", "search", "profile", "upload", "report", "search"]
features_v2 = ["login", "search", "dashboard", "profile", "export", "login"]

common, unique1, unique2 = compare_features(features_v1, features_v2)

print(f"Version 1 Features (raw): {features_v1}")
print(f"Version 2 Features (raw): {features_v2}\n")

print(f"Common Features: {common}")
print(f"Unique to Version 1: {unique1}")
print(f"Unique to Version 2: {unique2}")

## Quiz

1.  What is the primary characteristic that distinguishes a set from a list?
    a) Sets are ordered.
    b) Sets allow duplicate elements.
    c) Sets contain only unique elements.
    d) Sets can only store strings.

2.  How do you create an empty set in Python?
    a) `my_set = {}`
    b) `my_set = []`
    c) `my_set = ()`
    d) `my_set = set()`

3.  Which operation finds elements present in EITHER set A OR set B (or both)?
    a) `A.intersection(B)`
    b) `A.union(B)`
    c) `A.difference(B)`
    d) `A.symmetric_difference(B)`

4.  If `my_set = {1, 2, 3}` and you call `my_set.discard(4)`, what happens?
    a) A `KeyError` is raised.
    b) The set remains unchanged (`{1, 2, 3}`).
    c) `4` is added to the set.
    d) The set becomes empty.

*(Answers: 1-c, 2-d, 3-b, 4-b)*

## Conclusion

Sets provide an efficient way to work with collections of **unique** items. Their strength lies in fast membership testing and the ability to perform standard mathematical set operations (union, intersection, difference). While their unordered nature and restriction to immutable elements are key considerations, sets are invaluable tools for tasks involving duplicate removal, data comparison, and existence checks in Python.