# 16 - Tuples and Sets

## Introduction

Tuples and sets round out Python's core collection types alongside lists and dictionaries. Tuples suit fixed records that should not change, while sets suit deduplication and membership checks; both needs come up often in data engineering tasks.

## What You'll Learn

- Creating and using tuples
- Tuple operations and methods
- Creating and using sets
- Set operations (union, intersection, difference)
- When to use tuples vs lists vs sets


## Tuples

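Tuples are ordered sequences like lists, but they are **immutable** (cannot be changed after creation). They're usually written with parentheses `()`, although it is the commas that actually create the tuple: a single-element tuple needs a trailing comma, as in `(10,)`.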


In [1]:
# Create a tuple
coordinates = (10, 20)
print("Coordinates:", coordinates)

# Access tuple elements (same as lists)
print("X coordinate:", coordinates[0])
print("Y coordinate:", coordinates[1])

# Tuples can contain different types
person = ("Alice", 25, "Engineer")
print("Person:", person)


Coordinates: (10, 20)
X coordinate: 10
Y coordinate: 20
Person: ('Alice', 25, 'Engineer')
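
The creation rules above have a few edge cases worth seeing once. The following is a small sketch (the variable names `not_a_tuple`, `single`, and `empty` are illustrative, not part of the original notebook) showing that the comma, not the parentheses, makes the tuple:

```python
# Parentheses are optional: the comma creates the tuple
coordinates = 10, 20
print(type(coordinates).__name__)   # tuple

# Without a trailing comma, parentheses are just grouping
not_a_tuple = (42)
single = (42,)
print(type(not_a_tuple).__name__)   # int
print(type(single).__name__)        # tuple

# An empty tuple is written with bare parentheses
empty = ()
print(len(empty))                   # 0
```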


## Tuple Immutability

Unlike lists, tuples cannot be modified after creation: you cannot assign to an index, append, or remove items. Any attempt raises an error.


In [2]:
# This will work (accessing)
my_tuple = (1, 2, 3)
print("First element:", my_tuple[0])

# This will cause an error (uncomment to see)
# my_tuple[0] = 10  # TypeError: 'tuple' object does not support item assignment


First element: 1
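
To see the error without stopping the notebook, the assignment can be wrapped in `try`/`except`. This sketch also shows two related points that are easy to miss: "changing" a tuple really means building a new one, and immutability is shallow, so a mutable object stored inside a tuple can still change. The names `updated` and `record` are illustrative.

```python
my_tuple = (1, 2, 3)

# Item assignment is rejected with a TypeError
try:
    my_tuple[0] = 10
except TypeError as err:
    print("Error:", err)

# To "change" a tuple, build a new one from the old pieces
updated = (10,) + my_tuple[1:]
print(updated)   # (10, 2, 3)

# Immutability is shallow: a list inside a tuple is still mutable
record = ("Alice", [90, 85])
record[1].append(70)
print(record)    # ('Alice', [90, 85, 70])
```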


## Tuple Unpacking

You can unpack a tuple into multiple variables in a single assignment. The number of variables on the left must match the number of items in the tuple.


In [3]:
# Unpacking a tuple
point = (5, 10)
x, y = point
print(f"X: {x}, Y: {y}")

# Multiple values
person_info = ("Bob", 30, "Data Engineer")
name, age, job = person_info
print(f"{name} is {age} years old and works as a {job}")


X: 5, Y: 10
Bob is 30 years old and works as a Data Engineer
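
Unpacking shows up in a few other common idioms. The sketch below (with illustrative variable names) covers swapping two values, collecting leftover items with a starred name, and what happens when the counts do not match:

```python
# Swap two variables without a temporary
a, b = 1, 2
a, b = b, a
print(a, b)          # 2 1

# A starred name collects the remaining items into a list
first, *rest = (1, 2, 3, 4)
print(first, rest)   # 1 [2, 3, 4]

# A mismatch between names and items raises a ValueError
try:
    x, y = (1, 2, 3)
except ValueError as err:
    print("Error:", err)
```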


## Sets

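Sets are unordered collections of unique items. They're created using curly braces with at least one item, such as `{1, 2}`, or with the `set()` function. An empty pair of braces `{}` creates a dictionary, not a set, so use `set()` for an empty set.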


In [4]:
# Create a set
fruits = {"apple", "banana", "orange", "apple"}  # Duplicate removed
print("Fruits set:", fruits)

# Create set from list (removes duplicates)
numbers = [1, 2, 2, 3, 3, 3, 4]
unique_numbers = set(numbers)
print("Unique numbers:", unique_numbers)


Fruits set: {'apple', 'banana', 'orange'}
Unique numbers: {1, 2, 3, 4}
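
Two details about set creation are worth checking directly: empty braces give a dictionary, and a set has no guaranteed display order. A minimal sketch (the names `empty_dict` and `empty_set` are illustrative):

```python
# {} is an empty dict; set() is an empty set
empty_dict = {}
empty_set = set()
print(type(empty_dict).__name__)   # dict
print(type(empty_set).__name__)    # set

# Sets are unordered, so sort them when a stable order matters
fruits = {"apple", "banana", "orange", "apple"}
print(len(fruits))                 # 3
print(sorted(fruits))              # ['apple', 'banana', 'orange']
```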


## Set Operations

Sets support mathematical operations like union, intersection, and difference.


In [5]:
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# Union (all elements from both sets)
union = set1 | set2  # or set1.union(set2)
print("Union:", union)

# Intersection (common elements)
intersection = set1 & set2  # or set1.intersection(set2)
print("Intersection:", intersection)

# Difference (elements in set1 but not in set2)
difference = set1 - set2  # or set1.difference(set2)
print("Difference:", difference)


Union: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection: {4, 5}
Difference: {1, 2, 3}
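
Beyond the three operations above, sets also support symmetric difference and subset checks. The second half of this sketch applies set difference to a hypothetical schema check; the column names are made up for illustration.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# Symmetric difference: elements in exactly one of the two sets
sym_diff = set1 ^ set2   # or set1.symmetric_difference(set2)
print(sorted(sym_diff))  # [1, 2, 3, 6, 7, 8]

# Subset check: is every element of the left set in the right set?
print({4, 5} <= set1)    # True

# Hypothetical example: compare expected vs. actual column names
expected_columns = {"id", "name", "email"}
actual_columns = {"id", "name", "phone"}
missing = expected_columns - actual_columns
unexpected = actual_columns - expected_columns
print("Missing:", missing)         # Missing: {'email'}
print("Unexpected:", unexpected)   # Unexpected: {'phone'}
```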


## Set Methods

Sets have useful methods for adding, removing, and checking elements.


In [6]:
# Set methods
my_set = {1, 2, 3}

# Add element
my_set.add(4)
print("After adding 4:", my_set)

# Remove element
my_set.remove(2)
print("After removing 2:", my_set)

# Check if element exists
print("Is 3 in set?", 3 in my_set)
print("Is 5 in set?", 5 in my_set)


After adding 4: {1, 2, 3, 4}
After removing 2: {1, 3, 4}
Is 3 in set? True
Is 5 in set? False
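
One caveat about `remove()`: it raises a `KeyError` when the element is not in the set. The sketch below contrasts it with `discard()`, which silently does nothing in that case, and shows `update()` for adding several elements at once.

```python
my_set = {1, 3, 4}

# remove() raises KeyError if the element is missing
try:
    my_set.remove(99)
except KeyError:
    print("remove() raised KeyError: 99 is not in the set")

# discard() ignores missing elements
my_set.discard(99)
print(sorted(my_set))   # [1, 3, 4]

# update() adds every element from an iterable
my_set.update([5, 6])
print(sorted(my_set))   # [1, 3, 4, 5, 6]
```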


## When to Use Each Data Structure

- **Lists**: When you need an ordered, mutable collection
- **Tuples**: When you need an ordered, immutable collection (e.g., coordinates, database records)
- **Sets**: When you need unique elements and fast membership testing
- **Dictionaries**: When you need key-value pairs
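
One practical consequence of the list above: because tuples are immutable, they are hashable (as long as their items are), so they can serve as dictionary keys and set elements, while lists cannot. A minimal sketch with made-up data:

```python
# Tuples can be dictionary keys
grid_labels = {(0, 0): "origin", (1, 1): "diagonal"}
print(grid_labels[(0, 0)])   # origin

# Lists cannot, because they are mutable and therefore unhashable
try:
    bad = {[0, 0]: "origin"}
except TypeError as err:
    print("Error:", err)

# A set of tuples deduplicates whole records (hypothetical rows)
rows = [("alice", 1), ("bob", 2), ("alice", 1)]
unique_rows = set(rows)
print(len(unique_rows))      # 2
```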


In [7]:
# Practical example: Finding unique values
data = [1, 2, 2, 3, 3, 3, 4, 5, 5]
unique_data = list(set(data))  # Convert to a set to drop duplicates, then back to a list (order is not guaranteed)
print("Unique values:", unique_data)

# Practical example: Tuple for coordinates
points = [(0, 0), (1, 1), (2, 2)]
for x, y in points:
    print(f"Point at ({x}, {y})")


Unique values: [1, 2, 3, 4, 5]
Point at (0, 0)
Point at (1, 1)
Point at (2, 2)
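
Converting through a set does not promise any particular order in the result. When order matters, two common alternatives are sketched below: `dict.fromkeys()` keeps the first occurrence of each value in its original position (dictionaries preserve insertion order in Python 3.7+), and `sorted()` gives a sorted list. The sample data is illustrative.

```python
data = [3, 1, 3, 2, 1, 5]

# Keep first occurrences in their original order
unique_in_order = list(dict.fromkeys(data))
print(unique_in_order)   # [3, 1, 2, 5]

# Or deduplicate and sort explicitly
unique_sorted = sorted(set(data))
print(unique_sorted)     # [1, 2, 3, 5]
```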


## Key Points to Remember

- Tuples are immutable sequences - use them when data shouldn't change
- Sets store unique elements and are great for removing duplicates
- Sets are faster than lists for membership testing (`in` operator): a set lookup takes roughly constant time on average, while a list is scanned item by item
- Tuple unpacking is a powerful feature for working with multiple values
- These data structures appear in PySpark as well, for example tuples as the (key, value) pairs in RDD operations
