# 🟢 7c. Data Structures: Sets

**Goal:** Learn to use sets for storing unique items and performing mathematical set operations.

A **set** is an **unordered**, mutable collection of **unique** elements. Its main advantages are very fast membership testing (checking whether an element is in the set) and support for mathematical set operations such as union, intersection, and difference.

This notebook covers:
1.  **Creating Sets.**
2.  **Adding and Removing Elements.**
3.  **Set Operations.**
4.  **When to Use a Set.**

### 1. Creating Sets
You can create a set from a list (or any other iterable) with `set()`, or by writing the elements inside curly braces `{}`. Duplicates are removed automatically.

In [1]:
# Create a set from a list with duplicates
numbers_list = [1, 2, 2, 3, 4, 4, 4, 5]
unique_numbers = set(numbers_list)
print(f"Set from list: {unique_numbers}")

# Create a set directly with curly braces
fruits = {"apple", "banana", "cherry"}
print(f"Set of fruits: {fruits}")

# To create an empty set, you MUST use set(), not {}
# because {} creates an empty dictionary!
empty_set = set()
empty_dict = {}
print(f"Type of empty_set: {type(empty_set)}")
print(f"Type of empty_dict: {type(empty_dict)}")

Set from list: {1, 2, 3, 4, 5}
Set of fruits: {'banana', 'apple', 'cherry'}
Type of empty_set: <class 'set'>
Type of empty_dict: <class 'dict'>
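
One detail the examples above rely on implicitly is that set elements must be hashable. The following is a minimal sketch (the variable names `letters`, `points`, and `bad` are illustrative and not part of the notebook) showing that `set()` accepts any iterable, and that a tuple can be a set element while a list cannot:

```python
# set() accepts any iterable, not just lists; a string yields its unique characters
letters = set("hello")
print(f"Unique letters: {sorted(letters)}")  # sorted() only gives a stable display order

# Elements must be hashable: tuples work, and the duplicate (0, 0) is dropped
points = {(0, 0), (1, 2), (0, 0)}
print(f"Number of unique points: {len(points)}")

# Lists are mutable and therefore unhashable, so this raises TypeError
try:
    bad = {[1, 2], [3, 4]}
except TypeError as err:
    print(f"TypeError: {err}")
```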


---

### 2. Adding and Removing Elements

In [2]:
s = {1, 2, 3}
print(f"Original set: {s}")

# .add() adds a single element
s.add(4)
print(f"After add(4): {s}")

# .update() adds multiple elements from an iterable
s.update([4, 5, 6]) # Note that 4 is already there, so it's ignored
print(f"After update([4, 5, 6]): {s}")

# .remove() removes an element. Raises a KeyError if the element is not found.
s.remove(6)
print(f"After remove(6): {s}")

# .discard() also removes an element, but does NOT raise an error if it's not found.
s.discard(10) # No error
print(f"After discard(10): {s}")

# .pop() removes and returns an arbitrary element from the set.
popped_element = s.pop()
print(f"Popped element: {popped_element}")
print(f"Set after pop: {s}")

Original set: {1, 2, 3}
After add(4): {1, 2, 3, 4}
After update([4, 5, 6]): {1, 2, 3, 4, 5, 6}
After remove(6): {1, 2, 3, 4, 5}
After discard(10): {1, 2, 3, 4, 5}
Popped element: 1
Set after pop: {2, 3, 4, 5}
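
Both `.remove()` and `.pop()` can raise `KeyError`, so it may help to see how that is typically guarded against. This is a small sketch under the assumption that a missing element or an empty set is a normal situation in your program; the name `empty` is illustrative:

```python
s = {1, 2, 3}

# .remove() on a missing element raises KeyError; catch it if absence is possible
try:
    s.remove(10)
except KeyError:
    print("10 was not in the set")

# .pop() on an empty set also raises KeyError, so check for emptiness first
empty = set()
if empty:
    empty.pop()
else:
    print("Nothing to pop: the set is empty")

# .clear() removes every element at once
s.clear()
print(f"After clear(): {s}")
```

If you do not care whether the element was present, `.discard()` is usually the simpler choice than wrapping `.remove()` in `try`/`except`.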


---

### 3. Set Operations
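This is where sets are most powerful: each operation below combines or compares two sets in a single expression and returns a new set, leaving the originals unchanged.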

In [3]:
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

print(f"Set A: {set_a}")
print(f"Set B: {set_b}")
print("---")

# Union: All elements from both sets ( | operator or .union() )
print(f"Union: {set_a | set_b}")

# Intersection: Elements that are in BOTH sets ( & operator or .intersection() )
print(f"Intersection: {set_a & set_b}")

# Difference: Elements in A but NOT in B ( - operator or .difference() )
print(f"Difference (A - B): {set_a - set_b}")

# Symmetric Difference: Elements in either A or B, but NOT in both ( ^ operator or .symmetric_difference() )
print(f"Symmetric Difference: {set_a ^ set_b}")

Set A: {1, 2, 3, 4}
Set B: {3, 4, 5, 6}
---
Union: {1, 2, 3, 4, 5, 6}
Intersection: {3, 4}
Difference (A - B): {1, 2}
Symmetric Difference: {1, 2, 5, 6}
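
The comments above mention that each operator has a method equivalent. One practical difference, sketched here with an illustrative `list_b` and `set_c`, is that the methods accept any iterable while the operators require both operands to be sets:

```python
set_a = {1, 2, 3, 4}
list_b = [3, 4, 5, 6]  # a plain list, not a set

# Method forms accept any iterable and return a new set
print(f"union with a list: {set_a.union(list_b)}")
print(f"intersection with a list: {set_a.intersection(list_b)}")

# Operator forms require both operands to be sets
try:
    set_a | list_b
except TypeError as err:
    print(f"TypeError: {err}")

# In-place variants (|=, &=, -=, ^=) modify the left-hand set instead
set_c = {1, 2, 3, 4}
set_c &= {3, 4, 5, 6}
print(f"set_c after &=: {set_c}")
```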


---

### 4. When to Use a Set

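1.  **Removing duplicates from a list:** This is a very common and efficient use case. Because a set is unordered, the resulting list is not guaranteed to keep the original element order.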

In [4]:
my_list = [1, 1, 2, 3, 3, 3, 4, 5, 5]
unique_list = list(set(my_list))
print(f"Original list: {my_list}")
print(f"List with duplicates removed: {unique_list}")

Original list: [1, 1, 2, 3, 3, 3, 4, 5, 5]
List with duplicates removed: [1, 2, 3, 4, 5]
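
Since a set is unordered, `list(set(...))` makes no promise about the order of the result. If the original order matters, one common alternative (not a set technique, shown here only for contrast) is `dict.fromkeys()`, which relies on dictionaries keeping insertion order in Python 3.7 and later:

```python
my_list = [3, 1, 3, 2, 1, 5, 2]

# Dict keys are unique and keep insertion order, so the first occurrence of each value wins
unique_in_order = list(dict.fromkeys(my_list))
print(f"Order preserved: {unique_in_order}")
```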


2.  **Membership Testing:** Checking whether an element exists in a collection. A set lookup takes roughly constant time on average, whereas a list has to be scanned element by element, so the set is much faster, especially for large collections.

In [5]:
import time

large_list = list(range(1000000))
large_set = set(large_list)

# Test for an element at the end of the list (worst case: the whole list is scanned)
start_time = time.perf_counter()
_ = 999999 in large_list
end_time = time.perf_counter()
print(f"Time to check in list: {end_time - start_time:.7f} seconds")

# Test for the same element in the set
start_time = time.perf_counter()
_ = 999999 in large_set
end_time = time.perf_counter()
print(f"Time to check in set:  {end_time - start_time:.7f} seconds") # Much faster!

Time to check in list: 0.0156807 seconds
Time to check in set:  0.0002684 seconds
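
A single `perf_counter()` measurement can be noisy, so the exact numbers above will vary from run to run. As a hedged alternative, the sketch below uses the standard-library `timeit` module to repeat each lookup many times. The collection size (100,000) and repetition count (100) are arbitrary choices made to keep the cell fast:

```python
import timeit

large_list = list(range(100_000))
large_set = set(large_list)
target = 99_999  # last element: worst case for the list scan

# Each call runs the lookup 100 times and returns the total elapsed seconds
list_time = timeit.timeit(lambda: target in large_list, number=100)
set_time = timeit.timeit(lambda: target in large_set, number=100)

print(f"list: {list_time:.6f} s for 100 lookups")
print(f"set:  {set_time:.6f} s for 100 lookups")
```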


---

### ✍️ Exercises

**Exercise 1:** You have two lists of students: one for the math club and one for the science club. Find out which students are in both clubs.

In [6]:
math_club = ["Alice", "Bob", "Charlie", "David"]
science_club = ["Charlie", "Eve", "Frank", "Alice"]
# Your code here

**Exercise 2:** Find out which students are in the math club but *not* in the science club.

In [7]:
# Your code here