# Mathematics for Data Science I - Week 1 Practice

## Set Theory, Relations, and Functions

**Course Code**: BSMA1001  
**Date**: 2025-11-15  
**Topics**: Number Systems, Sets, Relations, Functions

---

### 📚 Prerequisites
- Review `notes/week-01-notes.md` first
- Python 3.8+ with NumPy, Pandas, Matplotlib, Seaborn, and matplotlib-venn

### 🎯 Learning Goals
1. Implement set operations in Python
2. Visualize set relationships with Venn diagrams
3. Work with relations and check their properties
4. Create and analyze functions
5. Apply concepts to data science problems

### 📝 Notebook Structure
1. **Set Operations** - Implement union, intersection, difference
2. **Venn Diagrams** - Visualize set relationships
3. **Relations** - Check reflexive, symmetric, transitive properties
4. **Functions** - Test injective, surjective, bijective
5. **Real-World Applications** - Customer segmentation example
6. **Practice Problems** - Exercises to reinforce learning

---

Let's get started! 🚀

In [None]:
# Setup: Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib_venn import venn2, venn3, venn2_circles, venn3_circles
import itertools

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("✅ Libraries imported successfully")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## Part 1: Set Operations in Python

Python's built-in `set` type models finite mathematical sets directly: elements are unique and unordered, and union, intersection, and difference are available as operators.

In [None]:
# Define sets
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}  # Universal set

print("Set A:", A)
print("Set B:", B)
print("Universal Set U:", U)
print()

# Basic Set Operations
print("=" * 50)
print("BASIC SET OPERATIONS")
print("=" * 50)

# Union
union_ab = A | B  # or A.union(B)
print(f"A ∪ B = {union_ab}")

# Intersection
intersection_ab = A & B  # or A.intersection(B)
print(f"A ∩ B = {intersection_ab}")

# Difference
difference_ab = A - B  # or A.difference(B)
print(f"A − B = {difference_ab}")

difference_ba = B - A
print(f"B − A = {difference_ba}")

# Symmetric Difference
sym_diff = A ^ B  # or A.symmetric_difference(B)
print(f"A △ B = {sym_diff}")

# Complement
complement_a = U - A
print(f"Aᶜ (complement of A) = {complement_a}")

# Subset checks
print()
print("=" * 50)
print("SUBSET RELATIONSHIPS")
print("=" * 50)
print(f"Is A ⊆ U? {A.issubset(U)}")
print(f"Is B ⊆ U? {B.issubset(U)}")
print(f"Is A ⊆ B? {A.issubset(B)}")
print(f"Is {{1, 2}} ⊆ A? {set([1, 2]).issubset(A)}")

# Cardinality
print()
print("=" * 50)
print("CARDINALITY (Size of Sets)")
print("=" * 50)
print(f"|A| = {len(A)}")
print(f"|B| = {len(B)}")
print(f"|A ∪ B| = {len(union_ab)}")
print(f"|A ∩ B| = {len(intersection_ab)}")

# Verify Inclusion-Exclusion Principle
print()
print("Verifying Inclusion-Exclusion Principle:")
print(f"|A ∪ B| = |A| + |B| − |A ∩ B|")
calculated = len(A) + len(B) - len(intersection_ab)
print(f"{len(union_ab)} = {len(A)} + {len(B)} − {len(intersection_ab)}")
print(f"{len(union_ab)} = {calculated}")
print(f"✅ Principle verified!" if len(union_ab) == calculated else "❌ Error!")
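
The complement taken relative to a universal set also lets us check De Morgan's laws on the same example. The following is a minimal sketch: the helper names `complement` and `check_de_morgan` are introduced here for illustration and are not part of the notebook.

```python
def complement(s, universe):
    """Complement of s relative to the given universal set (illustrative helper)."""
    return universe - s


def check_de_morgan(a, b, universe):
    """Return (first_law_holds, second_law_holds) for the given finite sets."""
    # First law: (a ∪ b)ᶜ = aᶜ ∩ bᶜ
    first = complement(a | b, universe) == complement(a, universe) & complement(b, universe)
    # Second law: (a ∩ b)ᶜ = aᶜ ∪ bᶜ
    second = complement(a & b, universe) == complement(a, universe) | complement(b, universe)
    return first, second


# Same values as A, B, and U above, passed as literals so no notebook variable is overwritten
print(check_de_morgan({1, 2, 3, 4, 5}, {4, 5, 6, 7}, set(range(1, 11))))
```

Both entries should be `True` whenever the two sets are subsets of the universal set.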

## Part 2: Visualizing Sets with Venn Diagrams

Venn diagrams help us visualize relationships between sets.

In [None]:
# Install matplotlib-venn if not already installed
# !pip install matplotlib-venn

# Create Venn diagram for two sets
fig, axes = plt.subplots(2, 2, figsize=(14, 12))

# Plot 1: Basic Venn Diagram
ax1 = axes[0, 0]
v1 = venn2([A, B], set_labels=('A', 'B'), ax=ax1)
venn2_circles([A, B], ax=ax1, linewidth=1.5)
ax1.set_title('Venn Diagram: Sets A and B', fontsize=14, fontweight='bold')

# Plot 2: Highlighting Union
ax2 = axes[0, 1]
v2 = venn2([A, B], set_labels=('A', 'B'), ax=ax2)
venn2_circles([A, B], ax=ax2, linewidth=1.5)
if v2.get_patch_by_id('10'): v2.get_patch_by_id('10').set_alpha(0.8)
if v2.get_patch_by_id('11'): v2.get_patch_by_id('11').set_alpha(0.8)
if v2.get_patch_by_id('01'): v2.get_patch_by_id('01').set_alpha(0.8)
ax2.set_title(f'Union: A ∪ B = {union_ab}', fontsize=14, fontweight='bold')

# Plot 3: Highlighting Intersection
ax3 = axes[1, 0]
v3 = venn2([A, B], set_labels=('A', 'B'), ax=ax3)
venn2_circles([A, B], ax=ax3, linewidth=1.5)
if v3.get_patch_by_id('10'): v3.get_patch_by_id('10').set_alpha(0.2)
if v3.get_patch_by_id('11'): v3.get_patch_by_id('11').set_alpha(0.8)
if v3.get_patch_by_id('01'): v3.get_patch_by_id('01').set_alpha(0.2)
ax3.set_title(f'Intersection: A ∩ B = {intersection_ab}', fontsize=14, fontweight='bold')

# Plot 4: Highlighting Difference
ax4 = axes[1, 1]
v4 = venn2([A, B], set_labels=('A', 'B'), ax=ax4)
venn2_circles([A, B], ax=ax4, linewidth=1.5)
if v4.get_patch_by_id('10'): v4.get_patch_by_id('10').set_alpha(0.8)
if v4.get_patch_by_id('11'): v4.get_patch_by_id('11').set_alpha(0.2)
if v4.get_patch_by_id('01'): v4.get_patch_by_id('01').set_alpha(0.2)
ax4.set_title(f'Difference: A − B = {difference_ab}', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("📊 Venn diagrams created successfully!")
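
The three patches addressed above by the ids `'10'`, `'01'`, and `'11'` correspond to the regions A − B, B − A, and A ∩ B. As a rough sketch, their sizes can be computed directly from the sets; `venn2_region_sizes` is a hypothetical helper name, not a matplotlib-venn function.

```python
def venn2_region_sizes(a, b):
    """Return (|a − b|, |b − a|, |a ∩ b|), ordered like the '10', '01', '11' patches."""
    return (len(a - b), len(b - a), len(a & b))


# Same values as A and B from Part 1, passed as literals
region_sizes = venn2_region_sizes({1, 2, 3, 4, 5}, {4, 5, 6, 7})
print(dict(zip(("10", "01", "11"), region_sizes)))
```

The three counts always add up to |A ∪ B|, which is a quick sanity check on any two-set diagram.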

## Part 3: Relations - Properties and Checking

Let's implement functions to check if a relation has specific properties.

In [None]:
def is_reflexive(relation, domain):
    """Check if relation is reflexive"""
    for a in domain:
        if (a, a) not in relation:
            return False
    return True

def is_symmetric(relation):
    """Check if relation is symmetric"""
    for (a, b) in relation:
        if (b, a) not in relation:
            return False
    return True

def is_antisymmetric(relation):
    """Check if relation is antisymmetric"""
    for (a, b) in relation:
        if a != b and (b, a) in relation:
            return False
    return True

def is_transitive(relation):
    """Check if relation is transitive"""
    for (a, b) in relation:
        for (c, d) in relation:
            if b == c:  # found (a,b) and (b,d)
                if (a, d) not in relation:
                    return False
    return True

def check_relation_properties(relation, domain, name="R"):
    """Comprehensive check of relation properties"""
    print(f"{'=' * 60}")
    print(f"Analyzing Relation {name}")
    print(f"{'=' * 60}")
    print(f"Domain: {domain}")
    print(f"Relation: {relation}")
    print()
    
    reflexive = is_reflexive(relation, domain)
    symmetric = is_symmetric(relation)
    antisymmetric = is_antisymmetric(relation)
    transitive = is_transitive(relation)
    
    print(f"✓ Reflexive:      {reflexive}" if reflexive else f"✗ Reflexive:      {reflexive}")
    print(f"✓ Symmetric:      {symmetric}" if symmetric else f"✗ Symmetric:      {symmetric}")
    print(f"✓ Antisymmetric:  {antisymmetric}" if antisymmetric else f"✗ Antisymmetric:  {antisymmetric}")
    print(f"✓ Transitive:     {transitive}" if transitive else f"✗ Transitive:     {transitive}")
    print()

    # Check for special types (a relation such as equality can be both at once)
    is_equivalence = reflexive and symmetric and transitive
    is_partial_order = reflexive and antisymmetric and transitive
    if is_equivalence:
        print("🌟 This is an EQUIVALENCE RELATION!")
    if is_partial_order:
        print("🌟 This is a PARTIAL ORDER!")
    if not (is_equivalence or is_partial_order):
        print("This is a general relation.")
    print()

# Use a separate name for the domain so that set A from Part 1 is not overwritten
S = {1, 2, 3}

# Example 1: Equivalence Relation
R1 = {(1,1), (2,2), (3,3), (1,2), (2,1)}
check_relation_properties(R1, S, "R₁")

# Example 2: Partial Order (≤ on {1,2,3})
R2 = {(1,1), (2,2), (3,3), (1,2), (1,3), (2,3)}
check_relation_properties(R2, S, "R₂ (≤)")

# Example 3: Not an equivalence relation (not symmetric), but still a partial order
R3 = {(1,1), (2,2), (3,3), (1,2)}
check_relation_properties(R3, S, "R₃")
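
An equivalence relation splits its domain into disjoint equivalence classes. Here is a minimal sketch of computing those classes, assuming the input really is an equivalence relation (the function does not verify this); `equivalence_classes` and `R_equiv` are names made up for this example.

```python
def equivalence_classes(relation, domain):
    """Group domain elements into classes; assumes relation is an equivalence relation."""
    classes = []
    for a in sorted(domain):
        # The class of a contains every b that a is related to
        cls = frozenset(b for b in domain if (a, b) in relation)
        if cls not in classes:
            classes.append(cls)
    return classes


# Same pairs as the first example above
R_equiv = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}
print([sorted(c) for c in equivalence_classes(R_equiv, {1, 2, 3})])  # [[1, 2], [3]]
```

Elements 1 and 2 are related to each other and land in one class, while 3 is related only to itself.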

## Part 4: Functions - Testing Types

Let's create functions and test if they're injective, surjective, or bijective.

In [None]:
def is_injective(func, domain):
    """Check if function is one-to-one (injective)"""
    outputs = [func(x) for x in domain]
    return len(outputs) == len(set(outputs))

def is_surjective(func, domain, codomain):
    """Check if function is onto (surjective)"""
    range_set = {func(x) for x in domain}
    return range_set == set(codomain)

def analyze_function(func, domain, codomain, name="f"):
    """Comprehensive analysis of a function"""
    print(f"{'=' * 60}")
    print(f"Analyzing Function {name}")
    print(f"{'=' * 60}")
    print(f"Domain:   {domain}")
    print(f"Codomain: {codomain}")
    print()
    
    # Create mapping table
    mapping = {x: func(x) for x in domain}
    range_set = set(mapping.values())
    
    print("Mapping:")
    for x, y in sorted(mapping.items()):
        print(f"  {name}({x}) = {y}")
    print()
    print(f"Range: {range_set}")
    print()
    
    # Check properties
    injective = is_injective(func, domain)
    surjective = is_surjective(func, domain, codomain)
    bijective = injective and surjective
    
    print(f"‚úì Injective (One-to-One):  {injective}" if injective else f"‚úó Injective (One-to-One):  {injective}")
    print(f"‚úì Surjective (Onto):       {surjective}" if surjective else f"‚úó Surjective (Onto):       {surjective}")
    print(f"‚úì Bijective:               {bijective}" if bijective else f"‚úó Bijective:               {bijective}")
    print()
    
    if bijective:
        print("üåü This is a BIJECTIVE function! It has an inverse.")
    print()

# Example 1: f₁(x) = 2x (injective, but not surjective onto {1, ..., 14})
domain1 = range(1, 6)
codomain1 = range(1, 15)
analyze_function(lambda x: 2*x, domain1, codomain1, "f₁")

# Example 2: f₂(x) = x² (not injective, since f₂(−2) = f₂(2))
domain2 = range(-2, 3)
codomain2 = range(0, 10)
analyze_function(lambda x: x**2, domain2, codomain2, "f₂")

# Example 3: f₃(x) = x + 3 (bijective)
domain3 = range(1, 6)
codomain3 = range(4, 9)
analyze_function(lambda x: x + 3, domain3, codomain3, "f₃")

# Visualize the functions
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Plot f₁(x) = 2x
ax1 = axes[0]
x1 = list(domain1)
y1 = [2*x for x in x1]
ax1.scatter(x1, y1, s=100, c='blue', marker='o', edgecolors='black', linewidths=2)
ax1.plot(x1, y1, 'b--', alpha=0.5)
ax1.set_xlabel('x', fontsize=12)
ax1.set_ylabel('f₁(x)', fontsize=12)
ax1.set_title('f₁(x) = 2x\n(Injective, Not Surjective)', fontsize=12, fontweight='bold')
ax1.grid(True, alpha=0.3)

# Plot f₂(x) = x²
ax2 = axes[1]
x2 = list(domain2)
y2 = [x**2 for x in x2]
ax2.scatter(x2, y2, s=100, c='red', marker='s', edgecolors='black', linewidths=2)
ax2.plot(x2, y2, 'r--', alpha=0.5)
ax2.set_xlabel('x', fontsize=12)
ax2.set_ylabel('f₂(x)', fontsize=12)
ax2.set_title('f₂(x) = x²\n(Not Injective)', fontsize=12, fontweight='bold')
ax2.grid(True, alpha=0.3)

# Plot f₃(x) = x + 3
ax3 = axes[2]
x3 = list(domain3)
y3 = [x + 3 for x in x3]
ax3.scatter(x3, y3, s=100, c='green', marker='^', edgecolors='black', linewidths=2)
ax3.plot(x3, y3, 'g--', alpha=0.5)
ax3.set_xlabel('x', fontsize=12)
ax3.set_ylabel('f₃(x)', fontsize=12)
ax3.set_title('f₃(x) = x + 3\n(Bijective)', fontsize=12, fontweight='bold')
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("📊 Function visualizations created!")
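
A bijective function on a finite domain has an inverse that can be built by swapping inputs and outputs. The sketch below is one possible way to do that; `inverse_mapping` is a hypothetical helper, and it only checks injectivity, so whether the result covers the whole codomain still has to be checked separately.

```python
def inverse_mapping(func, domain):
    """Return {func(x): x} for x in domain; raise ValueError if func is not injective."""
    inverse = {}
    for x in domain:
        y = func(x)
        if y in inverse:
            # Two different inputs share an output, so no inverse exists
            raise ValueError(f"not injective: {inverse[y]} and {x} both map to {y}")
        inverse[y] = x
    return inverse


# Same rule and domain as the x + 3 example above
f3_inverse = inverse_mapping(lambda x: x + 3, range(1, 6))
print(f3_inverse)  # {4: 1, 5: 2, 6: 3, 7: 4, 8: 5}
```

Calling the helper with `lambda x: x**2` on `range(-2, 3)` raises `ValueError`, because −2 and 2 share the output 4.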

## Part 5: Real-World Data Science Application

### Customer Segmentation Using Set Theory

Let's apply set theory to a simulated e-commerce customer segmentation problem. The customer data below is randomly generated.

In [None]:
# Create sample customer data
np.random.seed(42)
n_customers = 100

customer_ids = [f"C{i:04d}" for i in range(1, n_customers + 1)]

# Define customer segments using sets
purchased_last_month = set(np.random.choice(customer_ids, size=40, replace=False))
clicked_email = set(np.random.choice(customer_ids, size=35, replace=False))
high_loyalty_points = set(np.random.choice(customer_ids, size=25, replace=False))

print("=" * 70)
print("E-COMMERCE CUSTOMER SEGMENTATION")
print("=" * 70)
print(f"Total Customers: {len(customer_ids)}")
print(f"Purchased Last Month: {len(purchased_last_month)}")
print(f"Clicked Email Campaign: {len(clicked_email)}")
print(f"High Loyalty Points (>1000): {len(high_loyalty_points)}")
print()

# Segment Analysis using Set Operations
print("=" * 70)
print("CUSTOMER SEGMENTS")
print("=" * 70)

# Segment 1: High-value engaged customers
high_value_engaged = purchased_last_month & clicked_email & high_loyalty_points
print(f"1. High-Value Engaged (P ∩ E ∩ L): {len(high_value_engaged)} customers")
print(f"   → VIP treatment, exclusive offers")
print()

# Segment 2: Potential churn risk
potential_churn = high_loyalty_points - purchased_last_month
print(f"2. Potential Churn Risk (L − P): {len(potential_churn)} customers")
print(f"   → Have points but no recent purchase - send win-back campaign")
print()

# Segment 3: Email responsive but not converted
email_responsive = clicked_email - purchased_last_month
print(f"3. Email Responsive (E − P): {len(email_responsive)} customers")
print(f"   → Clicked but didn't buy - might need better offers")
print()

# Segment 4: Active but not engaged with email
active_no_email = purchased_last_month - clicked_email
print(f"4. Active Non-Email (P − E): {len(active_no_email)} customers")
print(f"   → Buy without emails - try different channels")
print()

# Segment 5: Recently active with any engagement
recently_active = purchased_last_month | clicked_email
print(f"5. Recently Active (P ∪ E): {len(recently_active)} customers")
print(f"   → Any recent interaction")
print()

# Segment 6: Completely inactive
all_customers = set(customer_ids)
completely_inactive = all_customers - (purchased_last_month | clicked_email | high_loyalty_points)
print(f"6. Completely Inactive: {len(completely_inactive)} customers")
print(f"   → No recent activity - re-engagement campaign")
print()

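# Coverage check: these segments overlap (e.g. L − P and E − P) and miss some
# customers (e.g. those in P ∩ E but not L), so they do not form a partition
total_accounted = len(high_value_engaged | potential_churn | email_responsive |
                      active_no_email | completely_inactive)
print(f"Customers in at least one plotted segment: {total_accounted} of {len(all_customers)}")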

# Create visualization
fig = plt.figure(figsize=(14, 6))

# Left: Venn diagram
ax1 = fig.add_subplot(121)
venn3([purchased_last_month, clicked_email, high_loyalty_points], 
      set_labels=('Purchased\nLast Month', 'Clicked\nEmail', 'High Loyalty\nPoints'),
      ax=ax1)
ax1.set_title('Customer Segments Venn Diagram', fontsize=14, fontweight='bold')

# Right: Bar chart of segment sizes
ax2 = fig.add_subplot(122)
segments = [
    'High-Value\nEngaged',
    'Potential\nChurn',
    'Email\nResponsive',
    'Active\nNon-Email',
    'Completely\nInactive'
]
sizes = [
    len(high_value_engaged),
    len(potential_churn),
    len(email_responsive),
    len(active_no_email),
    len(completely_inactive)
]
colors = ['#2ecc71', '#e74c3c', '#f39c12', '#3498db', '#95a5a6']

bars = ax2.barh(segments, sizes, color=colors, edgecolor='black', linewidth=1.5)
ax2.set_xlabel('Number of Customers', fontsize=12)
ax2.set_title('Customer Segment Sizes', fontsize=14, fontweight='bold')
ax2.grid(axis='x', alpha=0.3)

# Add value labels on bars
for i, (bar, size) in enumerate(zip(bars, sizes)):
    ax2.text(size + 1, i, str(size), va='center', fontweight='bold')

plt.tight_layout()
plt.show()

# Create actionable insights
print()
print("=" * 70)
print("ACTIONABLE INSIGHTS")
print("=" * 70)
print(f"✅ Focus on {len(high_value_engaged)} VIP customers for upselling")
print(f"⚠️  {len(potential_churn)} customers at churn risk - immediate action needed")
print(f"📧 {len(email_responsive)} clicked emails but didn't buy - improve offers")
print(f"📱 {len(active_no_email)} buy without email - try SMS/push notifications")
print(f"😴 {len(completely_inactive)} inactive customers - re-engagement campaign")

# Calculate inclusion-exclusion for verification
print()
print("=" * 70)
print("VERIFICATION: Inclusion-Exclusion Principle")
print("=" * 70)
p_e_l = len(purchased_last_month | clicked_email | high_loyalty_points)
individual = len(purchased_last_month) + len(clicked_email) + len(high_loyalty_points)
pairs = (len(purchased_last_month & clicked_email) + 
         len(purchased_last_month & high_loyalty_points) + 
         len(clicked_email & high_loyalty_points))
triple = len(purchased_last_month & clicked_email & high_loyalty_points)

calculated = individual - pairs + triple
print(f"|P ∪ E ∪ L| = {p_e_l}")
print(f"|P| + |E| + |L| - |P∩E| - |P∩L| - |E∩L| + |P∩E∩L|")
print(f"= {individual} - {pairs} + {triple}")
print(f"= {calculated}")
print(f"✅ Verified!" if p_e_l == calculated else "❌ Error in calculation")
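
The business segments above can overlap. When a true partition is needed, one option is to group customers by their exact membership pattern across the sets, which yields up to eight disjoint regions for three sets. This is a sketch only: `partition_by_membership` and the toy customer IDs are invented for illustration and are not the simulated data above.

```python
from collections import defaultdict


def partition_by_membership(universe, named_sets):
    """Map each membership signature (tuple of set names) to the items that have it."""
    regions = defaultdict(set)
    for item in universe:
        # Signature: names of all sets this item belongs to, in dict order
        signature = tuple(name for name, s in named_sets.items() if item in s)
        regions[signature].add(item)
    return dict(regions)


# Tiny hand-made example with hypothetical IDs
toy_universe = {"C1", "C2", "C3", "C4", "C5"}
toy_sets = {"P": {"C1", "C2"}, "E": {"C2", "C3"}, "L": {"C2", "C4"}}
toy_regions = partition_by_membership(toy_universe, toy_sets)

for signature, members in sorted(toy_regions.items(), key=lambda kv: kv[0]):
    print(signature or ("none",), sorted(members))
```

Every item has exactly one signature, so the regions are disjoint and their sizes add up to the size of the universe.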

## Part 6: Practice Problems

### Try solving these problems on your own!

**Problem 1**: Given A = {1, 3, 5, 7, 9} and B = {2, 3, 5, 8, 9}, find:
- A ∪ B
- A ∩ B
- A − B
- B − A
- A △ B

**Problem 2**: In a survey of 200 data scientists:
- 120 use Python
- 80 use R
- 50 use both

How many use neither?

**Problem 3**: Define R on {1,2,3,4} by: a R b if a divides b
- Write R as a set of ordered pairs
- Check if R is reflexive, symmetric, antisymmetric, transitive

**Problem 4**: For f: {1,2,3,4,5} → {1,2,3,4,5}, f(x) = 6 - x
- Is f injective?
- Is f surjective?
- Is f bijective?
- If bijective, find f⁻¹(x)
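
To check answers to relation problems like Problem 3 in code, a relation can be generated from a rule rather than typed by hand. The sketch below deliberately uses a different rule (same parity) so the exercise stays open; `relation_from_rule` is a hypothetical helper name.

```python
from itertools import product


def relation_from_rule(domain, rule):
    """Build the set of pairs (a, b) with a, b in domain for which rule(a, b) is true."""
    return {(a, b) for a, b in product(domain, repeat=2) if rule(a, b)}


# a is related to b when both are even or both are odd
same_parity = relation_from_rule({1, 2, 3, 4}, lambda a, b: (a - b) % 2 == 0)
print(sorted(same_parity))
```

The resulting set of pairs can be passed to the property checkers from Part 3 along with the domain.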

---

## Key Takeaways

### 🎯 What We Learned

1. **Set Operations are Fundamental**
   - Python's built-in `set` type makes operations easy
   - Venn diagrams visualize relationships clearly
   - Inclusion-exclusion principle solves counting problems

2. **Relations Have Structure**
   - Reflexivity, symmetry, antisymmetry, and transitivity determine what kind of relation we have
   - Equivalence relations partition sets into classes
   - Partial orders give us ordering structures

3. **Functions are Special Relations**
   - Injective = no two inputs give the same output (one-to-one)
   - Surjective = every element of the codomain is reached (onto)
   - Bijective = both properties (has inverse)

4. **Real-World Applications**
   - Customer segmentation uses set operations
   - Database queries use set theory (SQL)
   - Feature engineering in ML uses membership tests
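
The membership-test idea in the last bullet can be sketched in plain Python: each set becomes a 0/1 feature column. The function name, customer IDs, and set names here are made up for illustration.

```python
def membership_features(ids, named_sets):
    """One row per id, with a 0/1 flag for membership in each named set."""
    return [
        {"id": i, **{name: int(i in s) for name, s in named_sets.items()}}
        for i in ids
    ]


feature_rows = membership_features(
    ["C1", "C2", "C3"],
    {"purchased": {"C1", "C2"}, "clicked_email": {"C2"}},
)
for row in feature_rows:
    print(row)
```

With pandas, `Series.isin` produces the same kind of flag for a whole column at once.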

### 💡 Insights from Coding

- Visualizations make abstract concepts concrete
- Small test cases help verify understanding
- Real data makes mathematics meaningful

### ❓ Questions for Next Session

- How do we handle infinite sets in practice?
- What's the connection between functions and transformations in ML?
- How does graph theory (Weeks 10-11) relate to sets?

### 📚 Next Steps

- [ ] Complete all practice problems above
- [ ] Review week-01-notes.md thoroughly
- [ ] Watch IIT Madras Week 1 lecture videos
- [ ] Solve textbook exercises: Rosen Ch. 2, problems 1-30
- [ ] Move to Week 2: Coordinate Geometry & Straight Lines

---

**Last Updated**: 2025-11-15  
**Next**: Week 2 Practice Notebook

---

### 🎓 Assignment Checklist

Before moving to Week 2, ensure you can:

- [ ] Perform union, intersection, difference operations
- [ ] Apply inclusion-exclusion principle
- [ ] Check if a relation is reflexive/symmetric/transitive
- [ ] Determine if a function is injective/surjective/bijective
- [ ] Create Venn diagrams for 2-3 sets
- [ ] Apply set theory to solve real-world problems
- [ ] Explain why equivalence relations partition sets

**If you checked all boxes**: You're ready for Week 2! 🚀  
**If not**: Review the relevant sections and try more examples.