# Function 8: Multi-Condition Filtering 🔍

**🤖 AI-Enhanced Learning: Advanced Pandas Filtering Techniques**

In this notebook, you'll learn how to build the `multi_condition_filtering()` function using AI assistance. This function applies multiple filtering conditions using advanced pandas techniques including boolean indexing, lambda functions, and complex logical operations.

## 🎯 What This Function Does
- Applies multiple filtering conditions from a configuration
- Uses complex boolean operations and logical combinations
- Implements dynamic filtering based on column types
- Supports custom filter functions with lambda expressions
- Provides detailed filtering reports and statistics

## 🤖 AI Learning Objectives
By the end of this notebook, you will:
1. **Use Copilot CHAT** to understand complex boolean operations
2. **Use AGENT mode** to implement multiple filtering conditions
3. **Use EDIT mode** to add dynamic filter configuration parsing
4. **Learn about** lambda functions, query() method, and advanced pandas filtering
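
As a preview of the `query()` method and lambda-based filtering named in the last objective, here is a minimal sketch that expresses one rule three ways. The small `sample` DataFrame is an assumption made up for illustration; its column names (`temperature`, `quality_flag`) mirror the data used later in this notebook, but the values are not from the real dataset.

```python
import pandas as pd

# Hypothetical sample data; the notebook itself loads real data from CSV.
sample = pd.DataFrame({
    "temperature": [12.0, 18.5, 22.0, 30.5, 16.0],
    "quality_flag": ["good", "good", "poor", "good", "fair"],
})

# 1) Boolean indexing: each condition is wrapped in its own parentheses.
by_mask = sample[(sample["temperature"] >= 15) & (sample["quality_flag"] == "good")]

# 2) query(): the same rule written as a string expression.
by_query = sample.query("temperature >= 15 and quality_flag == 'good'")

# 3) A lambda passed to .loc receives the DataFrame and returns a boolean mask.
by_lambda = sample.loc[
    lambda d: (d["temperature"] >= 15) & (d["quality_flag"] == "good")
]

print(by_mask)
```

All three select the same rows here. Which form reads best is mostly a matter of taste: `query()` is compact for simple rules, while explicit masks are easier to build up programmatically.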

## 🔧 Function Signature
```python
def multi_condition_filtering(df, filters_config):
    """
    Args:
        df (pandas.DataFrame): Environmental data to filter
        filters_config (dict): Dictionary defining filtering rules and conditions
    
    Returns:
        pandas.DataFrame: Rows satisfying the combined conditions
            (all of them for AND logic, any of them for OR logic)
    """
```

## 🚀 Let's Master Advanced Filtering!


## 🚀 Step 1: Import Required Libraries and Load Data

Let's start by importing pandas and loading our environmental data:


In [None]:
import pandas as pd
import numpy as np

print(f"✅ Pandas version: {pd.__version__}")

# Load our environmental data
df = pd.read_csv('../data/temperature_readings.csv')
stations_df = pd.read_csv('../data/weather_stations.csv')

print(f"📊 Temperature data shape: {df.shape}")
print(f"📊 Stations data shape: {stations_df.shape}")
print(f"📈 Sample data:")
print(df.head())


## 📚 Step 2: Understanding Multi-Condition Filtering

**Multi-condition filtering** allows you to apply complex filtering logic using multiple criteria simultaneously. This is essential for:

- **Environmental data quality control**: Filter by temperature range AND quality flag
- **Spatial filtering**: Select data within geographic bounds AND specific stations  
- **Temporal filtering**: Filter by date range AND measurement type
- **Dynamic filtering**: Apply different rules based on configuration

### 🔍 **Types of Multi-Condition Filtering:**

1. **AND conditions**: All conditions must be true `(temp > 20) & (quality == 'good')`
2. **OR conditions**: Any condition can be true `(temp > 35) | (temp < -5)`  
3. **Complex combinations**: Mix of AND/OR with parentheses
4. **Dynamic conditions**: Conditions defined by configuration dictionaries
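
The first three types can be sketched on a tiny, made-up DataFrame. The data below is hypothetical (only the column names match this notebook), and naming each mask before combining it is a style choice, not a requirement.

```python
import pandas as pd

# Hypothetical readings used only to illustrate the combinations.
sample = pd.DataFrame({
    "temperature": [-8.0, 5.0, 18.0, 22.0, 37.5, 40.0],
    "quality_flag": ["good", "good", "good", "poor", "good", "poor"],
})

# Name each elementary condition so the combinations stay readable.
is_good = sample["quality_flag"] == "good"
is_extreme = (sample["temperature"] > 35) | (sample["temperature"] < -5)
in_range = sample["temperature"].between(15, 25)  # inclusive on both ends

# AND: in the comfortable range and good quality.
good_in_range = sample[in_range & is_good]

# OR: extreme temperature or anything that is not good quality.
extreme_or_not_good = sample[is_extreme | ~is_good]

# Complex combination: good quality AND (extreme OR in range).
combo = sample[is_good & (is_extreme | in_range)]

print(combo)
```

The parentheses in the last expression matter: `&` binds more tightly than `|`, so without them the mask would be read as `(is_good & is_extreme) | in_range`.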


In [None]:
# Let's examine our data structure for filtering
print("🔍 Column information:")
print(df.info())
print(f"\n📊 Unique values in key columns:")
print(f"Quality flags: {df['quality_flag'].unique()}")
print(f"Temperature range: {df['temperature'].min():.1f} to {df['temperature'].max():.1f}")
print(f"Unique stations: {df['station_id'].nunique()}")


## 🎯 Step 3: Basic Multi-Condition Filtering

Let's start with simple multi-condition filters using the `&` (AND) and `|` (OR) operators:

### **💡 Important Note:**
When combining conditions in pandas, you **must use parentheses** around each condition!

✅ **Correct**: `(df['temp'] > 20) & (df['quality'] == 'good')`  
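❌ **Wrong**: `df['temp'] > 20 & df['quality'] == 'good'`

Without parentheses, `&` binds more tightly than the comparison operators, so Python evaluates `20 & df['quality']` first and the expression raises an error instead of filtering.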
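
To see the effect of missing parentheses without breaking the notebook, the sketch below wraps the unparenthesized version in a `try`/`except`. It uses a made-up numeric column; the exact exception type can differ with the column dtypes involved, so both `ValueError` and `TypeError` are caught.

```python
import pandas as pd

# Hypothetical data for the demonstration.
sample = pd.DataFrame({"temperature": [10, 18, 22, 30]})

# Correct: each comparison is evaluated first, then combined with &.
ok = sample[(sample["temperature"] >= 15) & (sample["temperature"] <= 25)]

# Without parentheses Python evaluates `15 & sample["temperature"]` first,
# then treats the rest as a chained comparison, which pandas rejects.
error_name = None
try:
    sample[sample["temperature"] >= 15 & sample["temperature"] <= 25]
except (TypeError, ValueError) as exc:
    error_name = type(exc).__name__

print(f"Rows kept with parentheses: {len(ok)}")
print(f"Exception without parentheses: {error_name}")
```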


In [None]:
# Example 1: AND filtering - temperature range AND good quality
print("🔍 Example 1: Temperature between 15-25°C AND good quality")

# Create the filter conditions
temp_condition = (df['temperature'] >= 15) & (df['temperature'] <= 25)
quality_condition = (df['quality_flag'] == 'good')

# Combine conditions with AND
combined_filter = temp_condition & quality_condition

# Apply the filter
filtered_data = df[combined_filter]

print(f"Original data: {len(df)} rows")
print(f"After filtering: {len(filtered_data)} rows")
print(f"Percentage kept: {len(filtered_data)/len(df)*100:.1f}%")

print(f"\n📊 Sample of filtered data:")
print(filtered_data.head())


In [None]:
# Example 2: OR filtering - extreme temperatures OR poor quality
print("🔍 Example 2: Extreme temperatures (< 5°C OR > 35°C) OR poor quality")

# Create the filter conditions  
extreme_cold = (df['temperature'] < 5)
extreme_hot = (df['temperature'] > 35)
poor_quality = (df['quality_flag'] == 'poor')

# Combine conditions with OR
combined_filter = extreme_cold | extreme_hot | poor_quality

# Apply the filter
filtered_data = df[combined_filter]

print(f"Original data: {len(df)} rows")
print(f"After filtering: {len(filtered_data)} rows")
print(f"Percentage kept: {len(filtered_data)/len(df)*100:.1f}%")

# Show breakdown by condition
print("\n📊 Breakdown of filter matches (a row can match more than one):")
print(f"Extreme cold (< 5°C): {extreme_cold.sum()} rows")
print(f"Extreme hot (> 35°C): {extreme_hot.sum()} rows")
print(f"Poor quality: {poor_quality.sum()} rows")


## 🚀 Step 4: Dynamic Filtering with Configuration

Multi-condition filtering becomes most useful when it is **dynamic**: the filtering rules come from a configuration rather than from hard-coded values.

### **Configuration-Driven Approach:**
```python
filters_config = {
    "temperature": {"min": 10, "max": 30},
    "quality_flag": {"include": ["good", "fair"]}, 
    "station_type": {"exclude": ["temporary"]},
    "logic": "AND"  # or "OR"
}
```
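
One way such a configuration could be turned into a single filter is to build one boolean mask per column and then fold the masks together according to the `logic` key. The sketch below shows only that folding step using `functools.reduce`; the helper name `combine_masks` and the sample data are hypothetical and are not part of the assignment code.

```python
from functools import reduce
import operator

import pandas as pd


def combine_masks(masks, logic="AND"):
    """Fold a list of boolean Series into one mask (hypothetical helper)."""
    if not masks:
        raise ValueError("at least one mask is required")
    # operator.and_ / operator.or_ are the function forms of & and |.
    op = operator.and_ if logic.upper() == "AND" else operator.or_
    return reduce(op, masks)


# Hypothetical data with the same column names as the notebook's dataset.
sample = pd.DataFrame({
    "temperature": [8.0, 15.0, 28.0, 33.0],
    "quality_flag": ["good", "poor", "fair", "poor"],
})

masks = [
    sample["temperature"].between(10, 30),          # {"min": 10, "max": 30}
    sample["quality_flag"].isin(["good", "fair"]),  # {"include": ["good", "fair"]}
]

and_rows = sample[combine_masks(masks, "AND")]
or_rows = sample[combine_masks(masks, "OR")]

print(f"AND keeps {len(and_rows)} row(s), OR keeps {len(or_rows)} row(s)")
```

Because `reduce` builds a new Series at each step, the individual masks are left unchanged and can still be inspected or reported on afterwards.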


In [None]:
# Step 4 (continued): Building the complete function
def multi_condition_filtering(df, filters_config):
    """
    Apply multiple filtering conditions dynamically based on configuration.
    
    Args:
        df (pandas.DataFrame): Environmental data to filter
        filters_config (dict): Dictionary defining filtering rules
    
    Returns:
        pandas.DataFrame: Rows satisfying the combined conditions
            (all of them for AND logic, any of them for OR logic)
    """
    print("=" * 50)
    print("APPLYING MULTI-CONDITION FILTERING")
    print("=" * 50)
    
    # Start with all data
    filtered_df = df.copy()
    initial_count = len(filtered_df)
    
    print(f"📊 Starting with {initial_count} rows")
    
    # Apply each filter condition
    conditions_applied = []
    
    for column, config in filters_config.items():
        if column == 'logic':
            continue  # Skip logic configuration
            
        if column not in df.columns:
            print(f"⚠️  Warning: Column '{column}' not found in data")
            continue
            
        # Handle different filter types
        if 'min' in config or 'max' in config:
            # Numeric range filtering
            # Start from an all-True mask and narrow it with each bound
            condition = pd.Series(True, index=filtered_df.index)
            
            if 'min' in config:
                condition &= (filtered_df[column] >= config['min'])
                
            if 'max' in config:
                condition &= (filtered_df[column] <= config['max'])
                
            conditions_applied.append(condition)
            print(f"🔢 Applied numeric filter to {column}: {config}")
            
        elif 'include' in config:
            # Include specific values
            condition = filtered_df[column].isin(config['include'])
            conditions_applied.append(condition)
            print(f"✅ Applied include filter to {column}: {config['include']}")
            
        elif 'exclude' in config:
            # Exclude specific values  
            condition = ~filtered_df[column].isin(config['exclude'])
            conditions_applied.append(condition)
            print(f"❌ Applied exclude filter to {column}: {config['exclude']}")
    
    # Combine all conditions
    if conditions_applied:
        logic = filters_config.get('logic', 'AND').upper()
        
        if logic == 'AND':
            # All conditions must be true
            # Build a new Series each time instead of modifying the first mask in place
            final_condition = conditions_applied[0]
            for condition in conditions_applied[1:]:
                final_condition = final_condition & condition
        else:  # OR logic
            # Any condition can be true
            final_condition = conditions_applied[0]
            for condition in conditions_applied[1:]:
                final_condition = final_condition | condition
        
        # Apply the combined filter
        filtered_df = filtered_df[final_condition]
        
        print(f"🔗 Combined conditions using {logic} logic")
    
    final_count = len(filtered_df)
    percentage = (final_count / initial_count) * 100 if initial_count > 0 else 0
    
    print(f"📈 Result: {final_count} rows ({percentage:.1f}% of original data)")
    print(f"🚫 Filtered out: {initial_count - final_count} rows")
    
    return filtered_df

# Test the function
test_config = {
    "temperature": {"min": 15, "max": 25},
    "quality_flag": {"include": ["good", "fair"]},
    "logic": "AND"
}

result = multi_condition_filtering(df, test_config)
print(f"\n📊 Sample of filtered data:")
print(result.head())


## 🎯 **Your Assignment Task**

### **✅ STEP-BY-STEP INSTRUCTIONS:**

#### **1. COPY YOUR WORKING FUNCTION**
- **FROM**: The complete function in the cell above
- **TO**: `src/pandas_basics.py` 
- **REPLACE**: All the TODO comments in the `multi_condition_filtering()` function

#### **2. TEST YOUR IMPLEMENTATION:**
```bash
# Test just this function
uv run pytest tests/test_pandas_basics.py::test_multi_condition_filtering -v
```

#### **3. ⚠️ COMMON MISTAKES TO AVOID:**
- ❌ **Forgetting parentheses** around conditions: always write `(df['col'] > 5) & (df['col'] < 10)`
- ❌ **Using `and/or` instead of `&/|`** for pandas boolean operations
- ❌ **Not handling missing columns** gracefully
- ✅ **Do test with different configurations** - numeric ranges, include/exclude lists
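
The `and`/`or` mistake is easy to reproduce. In this small sketch (the `temps` Series is made up for illustration), the element-wise operator works while the Python keyword raises a `ValueError`, because `and` asks for a single True/False value for the whole Series.

```python
import pandas as pd

# Hypothetical temperatures for the demonstration.
temps = pd.Series([10, 18, 22, 30])

# Element-wise AND: produces one boolean per row.
mask = (temps >= 15) & (temps <= 25)

# Python's `and` needs a single truth value, which a Series cannot provide.
message = None
try:
    (temps >= 15) and (temps <= 25)
except ValueError as exc:
    message = str(exc)

print(mask.tolist())
print(message)
```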

---

## 🔑 Key Learning Points

- **Multi-condition filtering** enables complex data selection with multiple criteria
- **Configuration-driven filtering** makes functions flexible and reusable
- **Boolean operators** (`&`, `|`, `~`) are essential for combining conditions
- **Dynamic filtering** allows users to specify rules without changing code
- **Error handling** for missing columns makes functions robust

## 🚀 Next Steps

Once this function works and passes tests, move on to:
- **Function 8**: `notebooks/08_function_analyze_temporal_patterns.ipynb`

**Remember: Learn in notebooks → Copy to Python file → Test with pytest → Repeat! 🔄**
