# Set Comprehensions

**Creating sets efficiently with comprehensions**
- **Concise**: Write less code than loops
- **Readable**: Clear intent and logic
- **Fast**: Usually somewhat faster than an equivalent loop that calls `.add()`
- **Pythonic**: Idiomatic Python style

**Basic Syntax**
- `{expression for item in iterable if condition}`

**Components:**
- `expression`: What to include in the set
- `item`: Variable representing each element
- `iterable`: Source of data (list, range, string, etc.)
- `condition`: Optional filter (if clause)
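To make the components concrete, here is a small illustrative sketch (the data is made up) showing that a comprehension builds the same set as a loop that calls `.add()` for each item that passes the condition:

```python
values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

# Comprehension form: {expression for item in iterable if condition}
odd_doubles = {v * 2 for v in values if v % 2 == 1}

# Equivalent explicit loop
odd_doubles_loop = set()
for v in values:
    if v % 2 == 1:                   # condition
        odd_doubles_loop.add(v * 2)  # expression

print(odd_doubles == odd_doubles_loop)  # True
print(sorted(odd_doubles))              # [2, 6, 10, 18]
```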

**Common Patterns:**
- **Transform**: `{func(x) for x in iterable}`
- **Filter**: `{x for x in iterable if condition}`
- **Transform + Filter**: `{func(x) for x in iterable if condition}`
- **Nested**: `{expr for x in iter1 for y in iter2}`
- **Conditional Expression**: `{expr_a if condition else expr_b for x in iterable}`
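A quick run through each pattern on toy data (the `words` and `grid` values are illustrative). The results are wrapped in `sorted()` because a set's display order is not guaranteed:

```python
words = ["apple", "Banana", "cherry", "Apple"]
grid = [[1, 2], [2, 3]]

transform = {w.lower() for w in words}                       # Transform
filtered = {w for w in words if w[0].isupper()}              # Filter
both = {w.lower() for w in words if len(w) > 5}              # Transform + Filter
nested = {n for row in grid for n in row}                    # Nested
conditional = {w if len(w) > 5 else "short" for w in words}  # Conditional expression

print(sorted(transform))    # ['apple', 'banana', 'cherry']
print(sorted(filtered))     # ['Apple', 'Banana']
print(sorted(both))         # ['banana', 'cherry']
print(sorted(nested))       # [1, 2, 3]
print(sorted(conditional))  # ['Banana', 'cherry', 'short']
```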

**Use When**
- simple transformations and filtering
- mathematical operations
- data cleaning and validation
- extracting unique values
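As a sketch of the data cleaning and validation case, assume a hypothetical list of user-entered US ZIP codes, where a valid entry is exactly five digits after trimming whitespace:

```python
# Hypothetical raw input: ZIP codes with stray whitespace, duplicates and typos
raw_zips = [" 10001", "94105 ", "10001", "ABCDE", "9410", "60614"]

cleaned = {z.strip() for z in raw_zips}                       # dedupe after trimming
valid = {z for z in cleaned if len(z) == 5 and z.isdigit()}   # assumed validity rule
invalid = cleaned - valid

print(sorted(valid))    # ['10001', '60614', '94105']
print(sorted(invalid))  # ['9410', 'ABCDE']
```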


**Best Practices:**
- keep comprehensions readable (break complex ones into functions)
- use meaningful variable names
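- for very large inputs, remember that a comprehension builds the whole set in memory at once
- remember that sets automatically remove duplicates (and do not preserve order)
- use comprehensions for simple transformations, functions for complex logic
- use regular loops when the body exists for its side effects (printing, appending, writing files)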
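To illustrate the last point: a comprehension used only for its side effect still builds a set, here a useless `{None}`, because `list.append` returns `None`. A plain loop states the intent directly:

```python
log = []

# Anti-pattern: list.append returns None, so the "result" is just {None}
result = {log.append(n * n) for n in range(3)}
print(result)  # {None}
print(log)     # [0, 1, 4]

# Preferred: a plain loop makes the side effect explicit
log2 = []
for n in range(3):
    log2.append(n * n)
print(log2)    # [0, 1, 4]
```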

## Topics Covered
- transformations
- filtering
- nested iterations
- complex conditions with nested loops
- nested data structures

### Basic Set Comprehensions
- number transformations
- string transformations
- from strings (character sets)

In [None]:
# Simple number transformations
numbers = [1, 2, 3, 4, 5]

# Squares
squares = {x**2 for x in numbers}
print(f"Original: {numbers}")
print(f"Squares: {squares}")

# Cubes
cubes = {x**3 for x in range(1, 6)}
print(f"Cubes: {cubes}")

# Double each number
doubled = {x * 2 for x in numbers}
print(f"Doubled: {doubled}")
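
Because a set keeps only distinct values, a transformation that maps several inputs to the same output can return fewer elements than it received. A small check with negative numbers:

```python
numbers = [-2, -1, 0, 1, 2]

# -2 and 2 (and -1 and 1) square to the same value, so they collapse
squares = {x**2 for x in numbers}
print(len(numbers), len(squares))  # 5 3
print(sorted(squares))             # [0, 1, 4]
```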

In [None]:
# String transformations
words = ["hello", "world", "python", "sets"]

# Uppercase
uppercase = {word.upper() for word in words}
print(f"Original: {words}")
print(f"Uppercase: {uppercase}")

# String lengths
lengths = {len(word) for word in words}
print(f"Lengths: {lengths}")

# First character of each word
first_chars = {word[0] for word in words}
print(f"First characters: {first_chars}")
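
Normalizing inside the expression is a simple way to get case-insensitive uniqueness. The sample list below is illustrative:

```python
languages = ["Python", "PYTHON", "python", "Rust", "rust", "Go"]

distinct_raw = {lang for lang in languages}         # case-sensitive: 6 values
distinct_ci = {lang.lower() for lang in languages}  # case-insensitive: 3 values

print(len(distinct_raw))    # 6
print(sorted(distinct_ci))  # ['go', 'python', 'rust']
```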

In [None]:
# From strings (character sets)
text = "hello world"

# All characters
all_chars = {char for char in text}
print(f"Text: '{text}'")
print(f"All characters: {all_chars}")

# Only letters
letters_only = {char for char in text if char.isalpha()}
print(f"Letters only: {letters_only}")

# Vowels from text
vowels = {char for char in text.lower() if char in 'aeiou'}
print(f"Vowels found: {vowels}")
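
When the expression is just the item itself, the comprehension is equivalent to calling `set()` on the iterable; the comprehension syntax pays off once a filter or transformation is added:

```python
text = "hello world"

# An identity comprehension and the set() constructor give the same result
print({char for char in text} == set(text))  # True

# The comprehension earns its keep once you filter or transform
letters = {char for char in text if char.isalpha()}
print(len(set(text)), len(letters))  # 8 7 (the space is the extra element)
```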

### Conditional Set Comprehensions
- filtering with conditions
- string filtering

In [None]:
# Filtering with conditions
numbers = range(1, 21)  # 1 to 20

# Even numbers
evens = {x for x in numbers if x % 2 == 0}
print(f"Even numbers: {evens}")

# Odd squares
odd_squares = {x**2 for x in numbers if x % 2 == 1}
print(f"Odd squares: {odd_squares}")

# Numbers divisible by 3 or 5
div_3_or_5 = {x for x in numbers if x % 3 == 0 or x % 5 == 0}
print(f"Divisible by 3 or 5: {div_3_or_5}")

# Perfect squares under 100 (math.isqrt, Python 3.8+, avoids floating-point rounding)
import math
perfect_squares = {x for x in range(1, 100) if math.isqrt(x)**2 == x}
print(f"Perfect squares under 100: {perfect_squares}")
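
A comprehension may also contain several `if` clauses in a row; each must pass, so chaining them behaves like joining the conditions with `and`:

```python
# Two chained `if` clauses behave like a single `and` condition
chained = {x for x in range(1, 31) if x % 2 == 0 if x % 3 == 0}
combined = {x for x in range(1, 31) if x % 2 == 0 and x % 3 == 0}

print(chained == combined)  # True
print(sorted(chained))      # [6, 12, 18, 24, 30]
```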

In [None]:
# String filtering
words = ["apple", "banana", "cherry", "date", "elderberry", "fig"]

# Words longer than 4 characters
long_words = {word for word in words if len(word) > 4}
print(f"Long words: {long_words}")

# Words containing 'a'
with_a = {word for word in words if 'a' in word}
print(f"Words with 'a': {with_a}")

# Words starting with vowels
vowel_start = {word for word in words if word[0].lower() in 'aeiou'}
print(f"Words starting with vowels: {vowel_start}")

# Uppercase words with even length
even_length_upper = {word.upper() for word in words if len(word) % 2 == 0}
print(f"Even length (uppercase): {even_length_upper}")
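
Filtering against another set is a common variant. Membership tests on a set are average O(1), which makes a set a natural container for an exclusion list. The stop-word list here is a hypothetical example:

```python
sentence = "the cat and the dog sat on the mat"
stop_words = {"the", "and", "on", "a"}  # hypothetical stop-word list

# Keep each distinct word that is not in the exclusion set
content_words = {w for w in sentence.split() if w not in stop_words}
print(sorted(content_words))  # ['cat', 'dog', 'mat', 'sat']
```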

### Advanced Set Comprehensions
- nested iterations
- complex conditions with nested loops
- nested data structures

In [None]:
# Nested iterations
# Cartesian product as coordinate pairs
coords = {(x, y) for x in range(3) for y in range(3)}
print(f"3x3 coordinates: {coords}")

# Multiplication table results
mult_results = {x * y for x in range(1, 4) for y in range(1, 4)}
print(f"Multiplication results: {mult_results}")

# Letter-number combinations
combinations = {f"{letter}{num}" for letter in 'ABC' for num in range(1, 4)}
print(f"Letter-number combinations: {combinations}")
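
The `for` clauses run left to right exactly like nested loops, with the rightmost clause varying fastest. For a plain Cartesian product, `itertools.product` from the standard library yields the same pairs:

```python
from itertools import product

coords = {(x, y) for x in range(3) for y in range(3)}

# The same pairs via itertools.product (repeat=2 means range(3) x range(3))
print(coords == set(product(range(3), repeat=2)))  # True
print(len(coords))                                  # 9
```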

In [None]:
# Complex conditions with nested loops
# Pythagorean triples (a² + b² = c²)
triples = {(a, b, c) for a in range(1, 13) 
           for b in range(a, 13) 
           for c in range(b, 13) 
           if a*a + b*b == c*c}
print(f"Pythagorean triples: {triples}")

# Primes, using a helper function as the filter condition
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

primes_under_50 = {n for n in range(2, 50) if is_prime(n)}
print(f"Primes under 50: {primes_under_50}")
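
Later `for` clauses can use variables bound by earlier ones, as the triples example does with `range(a, 13)`. Starting the inner range at the outer value skips mirrored duplicates such as `(5, 1)`:

```python
# Unordered pairs (a, b) with a <= b that sum to 6
pairs = {(a, b) for a in range(1, 6) for b in range(a, 6) if a + b == 6}
print(sorted(pairs))  # [(1, 5), (2, 4), (3, 3)]
```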

In [None]:
# Working with nested data structures
students_grades = [
    {"name": "Alice", "grades": [85, 90, 92]},
    {"name": "Bob", "grades": [78, 85, 80]},
    {"name": "Charlie", "grades": [90, 95, 88]},
    {"name": "Diana", "grades": [92, 88, 90]}
]

# All grades from all students
all_grades = {grade for student in students_grades for grade in student["grades"]}
print(f"All grades: {all_grades}")

# Students with at least one grade above 90
high_achievers = {student["name"] for student in students_grades 
                  if any(grade > 90 for grade in student["grades"])}
print(f"High achievers: {high_achievers}")

# Distinct average grades (the set does not record which student each belongs to)
averages = {round(sum(student["grades"]) / len(student["grades"]), 1) 
           for student in students_grades}
print(f"Average grades: {averages}")
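
The `averages` set above no longer says which student each value belongs to, and two students with the same average would collapse into one element. Since tuples are hashable, one option is to store `(name, average)` pairs instead:

```python
students_grades = [
    {"name": "Alice", "grades": [85, 90, 92]},
    {"name": "Bob", "grades": [78, 85, 80]},
    {"name": "Charlie", "grades": [90, 95, 88]},
    {"name": "Diana", "grades": [92, 88, 90]},
]

# Pair each average with its student so the association survives
name_avg = {
    (s["name"], round(sum(s["grades"]) / len(s["grades"]), 1))
    for s in students_grades
}
print(sorted(name_avg))
# [('Alice', 89.0), ('Bob', 81.0), ('Charlie', 91.0), ('Diana', 90.0)]
```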

### Set Comprehensions vs Other Methods

In [None]:
import timeit

# Performance comparison (absolute timings vary by machine and Python version)
data = range(10000)

# Method 1: Set comprehension
def with_comprehension():
    return {x**2 for x in data if x % 2 == 0}

# Method 2: Loop with set operations
def with_loop():
    result = set()
    for x in data:
        if x % 2 == 0:
            result.add(x**2)
    return result

# Method 3: Using map and filter
def with_map_filter():
    evens = filter(lambda x: x % 2 == 0, data)
    return set(map(lambda x: x**2, evens))

# timeit repeats each call and uses a high-resolution clock,
# which is far less noisy than timing a single run with time.time()
runs = 100
comp_time = timeit.timeit(with_comprehension, number=runs) / runs
loop_time = timeit.timeit(with_loop, number=runs) / runs
map_time = timeit.timeit(with_map_filter, number=runs) / runs

squares_comp = with_comprehension()
squares_loop = with_loop()
squares_map = with_map_filter()

print(f"Set comprehension: {comp_time:.6f} seconds per run")
print(f"Loop method: {loop_time:.6f} seconds per run")
print(f"Map/filter method: {map_time:.6f} seconds per run")
print(f"\nAll methods produce same result: {squares_comp == squares_loop == squares_map}")
print(f"Result size: {len(squares_comp)}")
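
Another alternative is passing a generator expression to `set()`. It produces the same set; the literal comprehension is typically a little faster because it avoids creating a generator object and making the extra `set()` call, though the difference is small:

```python
data = range(10)

from_comprehension = {x**2 for x in data if x % 2 == 0}
from_generator = set(x**2 for x in data if x % 2 == 0)

print(from_comprehension == from_generator)  # True
print(sorted(from_comprehension))            # [0, 4, 16, 36, 64]
```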

In [None]:
# Readability comparison
text = "The quick brown fox jumps over the lazy dog"

# Set comprehension (most readable)
unique_letters_comp = {char.lower() for char in text if char.isalpha()}

# Traditional approach
unique_letters_trad = set()
for char in text:
    if char.isalpha():
        unique_letters_trad.add(char.lower())

# Using filter and map
letters_only = filter(str.isalpha, text)
unique_letters_func = set(map(str.lower, letters_only))

print(f"Set comprehension: {unique_letters_comp}")
print(f"Traditional loop: {unique_letters_trad}")
print(f"Functional approach: {unique_letters_func}")
print(f"\nAll same? {unique_letters_comp == unique_letters_trad == unique_letters_func}")

### Real World Use Cases

#### Email Domain Analysis

In [None]:
# Analyzing email domains
emails = [
    "alice@gmail.com", "bob@company.com", "charlie@gmail.com",
    "diana@yahoo.com", "eve@company.com", "frank@hotmail.com",
    "grace@gmail.com", "henry@company.com", "iris@outlook.com"
]

# Extract unique domains
domains = {email.split('@')[1] for email in emails}
print(f"All domains: {domains}")

# Company domains only
company_domains = {email.split('@')[1] for email in emails 
                   if 'company' in email.split('@')[1]}
print(f"Company domains: {company_domains}")

# Users with Gmail addresses (match '@gmail.com' so a domain like 'notgmail.com' is excluded)
gmail_users = {email.split('@')[0] for email in emails
               if email.endswith('@gmail.com')}
print(f"Gmail users: {gmail_users}")

# All usernames (before @)
usernames = {email.split('@')[0] for email in emails}
print(f"All usernames: {usernames}")
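
Real-world address lists are messier than the sample above. A sketch, assuming domains should be compared case-insensitively and that entries without an `@` should be skipped; `str.rpartition` splits only once, at the last `@`:

```python
emails = ["Alice@Gmail.com", "bob@company.com", "not-an-email", "carol@gmail.com"]

# rpartition returns (before, '@', after); lower() folds case-only duplicates
domains = {e.rpartition("@")[2].lower() for e in emails if "@" in e}
print(sorted(domains))  # ['company.com', 'gmail.com']
```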

#### File Extension Analysis

In [None]:
# Analyzing file types in a directory
files = [
    "document.pdf", "image.jpg", "script.py", "data.csv",
    "photo.png", "code.js", "report.docx", "backup.zip",
    "config.json", "style.css", "index.html", "app.py"
]

# All file extensions
extensions = {file.split('.')[-1].lower() for file in files if '.' in file}
print(f"All extensions: {extensions}")

# Programming files
code_extensions = {'py', 'js', 'html', 'css', 'json'}
code_files = {file for file in files 
              if file.split('.')[-1].lower() in code_extensions}
print(f"Code files: {code_files}")

# Image files
image_files = {file for file in files 
               if file.split('.')[-1].lower() in {'jpg', 'jpeg', 'png', 'gif', 'bmp'}}
print(f"Image files: {image_files}")

# Files without extensions
no_extension = {file for file in files if '.' not in file}
print(f"Files without extension: {no_extension}")
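
`split('.')[-1]` treats a dotfile such as `.bashrc` as having the extension `bashrc`. The standard library's `os.path.splitext` handles these cases: it returns `(root, ext)`, with `ext` empty when there is no extension and a leading dot not counted as one. The file names below are illustrative:

```python
import os

files = ["archive.tar.gz", "README", ".bashrc", "photo.PNG", "script.py"]

# Only the last suffix counts, so 'archive.tar.gz' yields '.gz'
extensions = {os.path.splitext(f)[1].lower() for f in files}
extensions.discard("")  # drop the empty "extension" of README and .bashrc
print(sorted(extensions))  # ['.gz', '.png', '.py']

no_extension = {f for f in files if not os.path.splitext(f)[1]}
print(sorted(no_extension))  # ['.bashrc', 'README']
```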

#### Text Analysis

In [None]:
# Analyzing text patterns
sentences = [
    "The quick brown fox jumps over the lazy dog",
    "Python is a powerful programming language",
    "Set comprehensions are very useful tools",
    "Data analysis requires careful attention to detail"
]

# All unique words (case-insensitive)
all_words = {word.lower().strip('.,!?') for sentence in sentences 
             for word in sentence.split()}
print(f"Unique words ({len(all_words)}): {sorted(all_words)}")

# Words longer than 5 characters
long_words = {word.lower().strip('.,!?') for sentence in sentences 
              for word in sentence.split() 
              if len(word.strip('.,!?')) > 5}
print(f"\nLong words: {long_words}")

# Words starting with specific letters
p_words = {word.lower().strip('.,!?') for sentence in sentences 
           for word in sentence.split() 
           if word.lower().startswith('p')}
print(f"Words starting with 'p': {p_words}")

# All letters used in the text
all_letters = {char.lower() for sentence in sentences 
               for char in sentence if char.isalpha()}
print(f"\nAll letters used: {sorted(all_letters)}")
print(f"Missing letters: {set('abcdefghijklmnopqrstuvwxyz') - all_letters}")
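
`split()` plus `strip('.,!?')` misses other punctuation such as `;`, quotes or parentheses. A regular-expression tokenizer is one alternative; the pattern `[a-z']+` is an assumption about what counts as a word (lowercase letters plus apostrophes):

```python
import re

sentences = [
    "Sets don't keep order!",
    "Order? Use sorted(), not sets.",
]

# [a-z']+ keeps contractions like "don't" and drops punctuation and "()"
words = {w for s in sentences for w in re.findall(r"[a-z']+", s.lower())}
print(sorted(words))  # ["don't", 'keep', 'not', 'order', 'sets', 'sorted', 'use']
```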

### Advanced Patterns and Techniques

#### Conditional expressions inside comprehensions

In [None]:
numbers = range(-5, 6)  # -5 to 5

# Keep non-negative numbers; replace each negative with twice its absolute value
processed = {x if x >= 0 else -2*x for x in numbers}
print(f"Numbers: {list(numbers)}")
print(f"Processed: {processed}")

# Flip case: all-uppercase words become lowercase, all others become uppercase
words = ["Hello", "WORLD", "python", "PROGRAMMING"]
flipped = {word.lower() if word.isupper() else word.upper() for word in words}
print(f"\nOriginal: {words}")
print(f"Flipped: {flipped}")
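
Where the `if` goes changes its meaning. A trailing `if` filters items out, while a leading `if ... else` is part of the expression, so every input still produces an element:

```python
values = range(5)

# Trailing `if` filters: odd values are dropped entirely
filtered = {x for x in values if x % 2 == 0}

# Leading `if ... else`: every value produces an element (odd ones become None)
mapped = {x if x % 2 == 0 else None for x in values}

print(sorted(filtered))             # [0, 2, 4]
print(None in mapped, len(mapped))  # True 4
```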

#### Using functions in comprehensions

In [None]:
def process_word(word):
    """Custom word processing function"""
    return word.lower().replace('a', '@').replace('e', '3')

def is_valid_word(word):
    """Check if word meets criteria"""
    return len(word) > 3 and word.isalpha()

words = ["apple", "banana", "cat", "elephant", "123", "dog", "computer"]

# Apply custom processing to valid words only
processed_words = {process_word(word) for word in words if is_valid_word(word)}
print(f"Original: {words}")
print(f"Processed valid words: {processed_words}")
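
When the filter needs the processed value, calling the helper in both the expression and the condition does the work twice. An assignment expression (`:=`, Python 3.8+) can bind the result once; the tag list is illustrative:

```python
raw_tags = ["  Python ", "", "sets", "   ", "PYTHON", "Sets"]

# Bind the cleaned value once, keep it only if non-empty, then reuse it
tags = {clean for tag in raw_tags if (clean := tag.strip().lower())}
print(sorted(tags))  # ['python', 'sets']
```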

#### Mathematical set operations with comprehensions

In [None]:
# Generate sets and perform operations
set_a = {x for x in range(1, 11) if x % 2 == 0}  # Even numbers 1-10
set_b = {x for x in range(1, 11) if x % 3 == 0}  # Multiples of 3
set_c = {x**2 for x in range(1, 6)}  # Perfect squares

print(f"Even numbers (1-10): {set_a}")
print(f"Multiples of 3 (1-10): {set_b}")
print(f"Perfect squares (1-5)²: {set_c}")

# Complex combinations
union_all = set_a | set_b | set_c
intersection_ab = set_a & set_b
print(f"\nUnion of all: {union_all}")
print(f"Even AND multiple of 3: {intersection_ab}")

# Create sets based on relationships to other sets
related_to_a = {x * 2 for x in set_a if x in set_b}
print(f"Double the numbers that are both even and multiples of 3: {related_to_a}")
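
The set operators have direct comprehension spellings. The operators are shorter for plain set algebra; the comprehension form is handy when you also want to transform the kept elements:

```python
evens = {x for x in range(1, 11) if x % 2 == 0}
threes = {x for x in range(1, 11) if x % 3 == 0}

# Comprehension spelling vs. operator spelling
print({x for x in evens if x in threes} == evens & threes)      # True (intersection)
print({x for x in evens if x not in threes} == evens - threes)  # True (difference)
print(sorted(evens - threes))                                   # [2, 4, 8, 10]
```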