# Walmart Product Strategy with Type-Aware Filtering

## 🚀 Quick Start
**IMPORTANT: Run the cells in order!** Start with the import cell below to load all the filtering functionality.

## 📋 Overview
1. Read the dataset description (`amazon_products.md`).
2. Read the Filter API reference (`Filter API.md`), then use both to develop the product strategy for Walmart.

## 🎯 Type-Aware Filtering Features
- **Numerical fields**: `RATING >= 4.5`, `REVIEWS_COUNT > 100`
- **Boolean fields**: `IS_AVAILABLE.is_true()`, `IS_AVAILABLE.is_false()`
- **String fields**: `TITLE.contains("wireless")`, `BRAND.in_list(["Apple", "Samsung"])`
- **Array fields**: `CATEGORIES.includes("Electronics")`, `CATEGORIES.includes(["Electronics", "Home"])`
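
One way such type-specific methods could be provided is a small class per field type, so that `contains()` exists on string fields but not on numeric ones. The sketch below is illustrative only: `Condition`, `Field`, and the four field classes are hypothetical names, not the classes exported by `import_util`. The string values (`'4.5'`, `'true'`) follow the filter repr printed by the import cell, while the operator names `contains`, `in`, and `includes` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Hypothetical stand-in for a single filter condition."""
    name: str
    operator: str
    value: object


class Field:
    def __init__(self, name):
        self.name = name

    def _cond(self, operator, value):
        return Condition(self.name, operator, value)


class NumericField(Field):
    # Comparisons build Condition objects instead of returning booleans.
    def __ge__(self, other): return self._cond(">=", str(other))
    def __gt__(self, other): return self._cond(">", str(other))
    def __le__(self, other): return self._cond("<=", str(other))
    def __lt__(self, other): return self._cond("<", str(other))


class BooleanField(Field):
    def is_true(self): return self._cond("=", "true")
    def is_false(self): return self._cond("=", "false")


class StringField(Field):
    # Operator names "contains" and "in" are assumptions for illustration.
    def contains(self, text): return self._cond("contains", text)
    def in_list(self, options): return self._cond("in", list(options))


class ArrayField(Field):
    # Accepts a single value or a list of values.
    def includes(self, values): return self._cond("includes", values)


RATING = NumericField("rating")
IS_AVAILABLE = BooleanField("is_available")
TITLE = StringField("title")

print(RATING >= 4.5)
print(IS_AVAILABLE.is_true())
print(hasattr(RATING, "contains"))  # False: string methods are not defined on numeric fields
```

With this layout, calling a method on the wrong field type fails immediately with an `AttributeError` rather than producing an invalid filter.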

## ⚡ Native Operators
- `&` for AND operations
- `|` for OR operations
- `+` for AND operations (alternative)
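
These operators rely on Python operator overloading. The following is a minimal sketch of how `&`, `|`, and `+` could build a nested group, loosely modelled on the `FilterGroup(operator=..., filters=[...])` repr this notebook prints. `Node`, `Cond`, and `Group` are hypothetical names; the real classes may flatten or validate groups differently.

```python
from dataclasses import dataclass, field
from typing import List


class Node:
    """Base class giving every filter node the combining operators."""

    def __and__(self, other):
        return Group("and", [self, other])

    def __or__(self, other):
        return Group("or", [self, other])

    # `+` is treated as an alias for AND, matching the list above.
    __add__ = __and__


@dataclass
class Cond(Node):
    name: str
    operator: str
    value: str


@dataclass
class Group(Node):
    operator: str
    filters: List[Node] = field(default_factory=list)


high_rating = Cond("rating", ">=", "4.5")
available = Cond("is_available", "=", "true")
many_reviews = Cond("reviews_count", ">", "5000")

both = high_rating & available   # AND group of two conditions
either = both | many_reviews     # OR group whose first member is the AND group
print(either)
```

Because every node inherits the same operators, groups can be combined with other groups or with single conditions to any depth.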

## ⚠️ Important: Operator Precedence
**Always use parentheses around comparison operations!**
- ✅ **Correct**: `(RATING >= 4.5) & (REVIEWS_COUNT > 100)`
- ❌ **Wrong**: `RATING >= 4.5 & REVIEWS_COUNT > 100`
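
The reason is that Python's `&` binds more tightly than comparison operators, so the unparenthesized form is parsed as the chained comparison `RATING >= (4.5 & REVIEWS_COUNT) > 100`. The standard-library `ast` module can confirm this without running any filter code. The snippet only parses the two expressions, so the field names do not need to be defined.

```python
import ast

wrong = ast.parse("RATING >= 4.5 & REVIEWS_COUNT > 100", mode="eval").body
right = ast.parse("(RATING >= 4.5) & (REVIEWS_COUNT > 100)", mode="eval").body

# Unparenthesized: one chained comparison (>= then >) ...
print(type(wrong).__name__, [type(op).__name__ for op in wrong.ops])
# ... whose middle operand is the bitwise-and `4.5 & REVIEWS_COUNT`.
print(type(wrong.comparators[0]).__name__)

# Parenthesized: a single `&` whose two operands are the comparisons.
print(type(right).__name__, type(right.left).__name__, type(right.right).__name__)
```

At runtime the wrong form would first try to evaluate `4.5 & REVIEWS_COUNT`, which typically raises a `TypeError` unless the field object happens to define a reflected `&`.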

In [None]:
# Import the core filtering functionality from util modules
# This handles the path setup automatically
from import_util import *

# Test imports and basic functionality
print("Testing imports...")
try:
    print(f"RATING: {RATING}")
    print(f"REVIEWS_COUNT: {REVIEWS_COUNT}")
    print(f"IS_AVAILABLE: {IS_AVAILABLE}")
    print(f"CATEGORIES: {CATEGORIES}")
    print("✅ All imports working correctly!")
    
    # Test basic functionality
    print("\nTesting basic filter creation...")
    test_filter = (RATING >= 4.5) & IS_AVAILABLE.is_true()
    print(f"Test filter: {test_filter}")
    print("✅ Filter creation working correctly!")
    
except NameError as e:
    print(f"❌ Import error: {e}")
    print("Please check that import_util.py is working correctly!")


✅ Successfully imported all modules from util package
📁 Project root: /Users/derek/Documents/Projects/walmart insights
🐍 Python path includes: ['/Users/derek/Documents/Projects/walmart insights']

🧪 Testing imports:
  FilterFields.RATING = rating
  RATING = rating
  FilterOperator.EQUAL = =
  Available fields: 24
✅ All imports working correctly!
Testing imports...
RATING: rating
REVIEWS_COUNT: reviews_count
IS_AVAILABLE: is_available
CATEGORIES: categories
✅ All imports working correctly!

Testing basic filter creation...
Test filter: FilterGroup(operator=<LogicalOperator.AND: 'and'>, filters=[FilterCondition(name='rating', operator=<FilterOperator.GREATER_THAN_EQUAL: '>='>, value='4.5'), FilterCondition(name='is_available', operator=<FilterOperator.EQUAL: '='>, value='true')])
✅ Filter creation working correctly!


In [None]:
# Type-aware syntax examples - different methods for different data types!
def example_usage():
    """Demonstrate type-aware syntax, e.g. (RATING >= 4.5) & IS_AVAILABLE.is_true()."""
    
    # Verify imports are working
    try:
        print(f"RATING: {RATING}")
        print(f"REVIEWS_COUNT: {REVIEWS_COUNT}")
        print(f"IS_AVAILABLE: {IS_AVAILABLE}")
        print("✅ All field imports verified!")
    except NameError as e:
        print(f"❌ Import error: {e}")
        print("Please run the import cell first!")
        return None
    
    # Initialize the filter using secrets from YAML file
    try:
        api_key = get_brightdata_api_key()
        print("✅ Successfully loaded API key from secrets.yaml")
    except ValueError as e:
        print(f"❌ Error loading API key: {e}")
        print("Please copy secrets.example.yaml to secrets.yaml and add your API key")
        return None
    
    filter_tool = AmazonProductFilter(api_key)
    
    # Example 1: High-rated products with good reviews (type-aware syntax!)
    high_rated_filter = (
        (RATING >= 4.5) &
        (REVIEWS_COUNT > 100) &
        IS_AVAILABLE.is_true()
    )
    
    # Example 2: Electronics products under $100
    electronics_filter = (
        CATEGORIES.includes("Electronics") &
        (FINAL_PRICE <= 100) &
        (CURRENCY == "USD")
    )
    
    # Example 3: Long-tail products (good rating, moderate review count)
    long_tail_filter = (
        (RATING >= 4.0) &
        REVIEWS_COUNT.in_range(50, 500) &
        IS_AVAILABLE.is_true() &
        (CURRENCY == "USD")
    )
    
    # Example 4: OR operation - high rating OR many reviews
    high_performance_filter = (
        (RATING >= 4.8) |
        (REVIEWS_COUNT > 5000)
    )
    
    # Example 5: Complex nested logic - (high rating AND available) OR (many reviews AND electronics)
    complex_filter = (
        ((RATING >= 4.5) & IS_AVAILABLE.is_true()) |
        ((REVIEWS_COUNT > 1000) & CATEGORIES.includes("Electronics"))
    )
    
    # Example 6: Brand-specific filters
    brand_filter = (
        BRAND.in_list(["Apple", "Samsung", "Sony"]) &
        (RATING >= 4.0) &
        DISCOUNT("is_not_null", None)
    )
    
    # Example 7: String operations
    title_filter = (
        TITLE.contains("wireless") &
        (RATING >= 4.0) &
        FINAL_PRICE.in_range(20, 200)
    )
    
    # Example 8: Array operations with lists
    multi_category_filter = (
        CATEGORIES.includes(["Electronics", "Home & Garden", "Sports & Outdoors"]) &
        (RATING >= 4.0) &
        IS_AVAILABLE.is_true()
    )
    
    print("✅ Type-aware syntax examples created successfully!")
    print("🎯 RATING >= 4.5 & IS_AVAILABLE.is_true() - type-specific methods!")
    print("🔧 Numerical: >, >=, <, <=, ==, !=, in_range()")
    print("🔧 Boolean: is_true(), is_false(), ==, !=")
    print("🔧 String: contains(), in_list(), ==, !=")
    print("🔧 Array: includes(single_value), includes([list]), not_includes()")
    print("🚀 Perfect for data scientists - intuitive and type-safe!")
    
    return {
        "high_rated": high_rated_filter,
        "electronics": electronics_filter,
        "long_tail": long_tail_filter,
        "high_performance": high_performance_filter,
        "complex": complex_filter,
        "brand": brand_filter,
        "title": title_filter,
        "multi_category": multi_category_filter
    }

# Test the filter creation
if __name__ == "__main__":
    examples = example_usage()
    if examples:
        print("Amazon Product Filter function imported successfully!")
        print("Available filter examples:", list(examples.keys()))


RATING: rating
REVIEWS_COUNT: reviews_count
IS_AVAILABLE: is_available
✅ All field imports verified!
✅ Successfully loaded API key from secrets.yaml
✅ Type-aware syntax examples created successfully!
🎯 RATING >= 4.5 & IS_AVAILABLE.is_true() - type-specific methods!
🔧 Numerical: >, >=, <, <=, ==, !=, in_range()
🔧 Boolean: is_true(), is_false(), ==, !=
🔧 String: contains(), in_list(), ==, !=
🔧 Array: includes(single_value), includes([list]), not_includes()
🚀 Perfect for data scientists - intuitive and type-safe!
Amazon Product Filter function imported successfully!
Available filter examples: ['high_rated', 'electronics', 'long_tail', 'high_performance', 'complex', 'brand', 'title', 'multi_category']
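
To inspect what a composed filter might look like as a request payload, a nested filter can be walked recursively into plain dictionaries. This is a sketch under assumptions: the `name`/`operator`/`value` and `operator`/`filters` keys are taken from the `FilterGroup` repr printed earlier in this notebook, while `Cond`, `Group`, `to_payload`, and the `includes` operator string are hypothetical. How `AmazonProductFilter` actually serializes filters is not shown here.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Cond:
    name: str
    operator: str
    value: str


@dataclass
class Group:
    operator: str   # "and" or "or"
    filters: List[object]


def to_payload(node):
    """Recursively convert a Cond/Group tree into JSON-ready dicts."""
    if isinstance(node, Cond):
        return {"name": node.name, "operator": node.operator, "value": node.value}
    return {"operator": node.operator, "filters": [to_payload(f) for f in node.filters]}


# Mirrors Example 5: (high rating AND available) OR (many reviews AND electronics)
complex_filter = Group("or", [
    Group("and", [Cond("rating", ">=", "4.5"), Cond("is_available", "=", "true")]),
    Group("and", [Cond("reviews_count", ">", "1000"),
                  Cond("categories", "includes", "Electronics")]),  # operator name assumed
])

payload = to_payload(complex_filter)
print(json.dumps(payload, indent=2))
```

Printing the payload this way makes it easy to check that parentheses produced the intended nesting before any request is sent.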


In [None]:
# Field Reference - Just the essentials for IDE autocomplete
print("=== Available Filter Fields ===")
for field in FilterFields.get_all_fields():
    print(f"FilterFields.{field.field_name.upper()} = {field}")

print(f"\nTotal fields available: {FilterFields.get_field_count()}")
print("🔧 Use these enums for IDE autocomplete - you specify the filter logic!")


=== Available Filter Fields ===
FilterFields.TITLE = title
FilterFields.ASIN = asin
FilterFields.BRAND = brand
FilterFields.DESCRIPTION = description
FilterFields.CATEGORIES = categories
FilterFields.INITIAL_PRICE = initial_price
FilterFields.FINAL_PRICE = final_price
FilterFields.CURRENCY = currency
FilterFields.DISCOUNT = discount
FilterFields.RATING = rating
FilterFields.REVIEWS_COUNT = reviews_count
FilterFields.AVAILABILITY = availability
FilterFields.IS_AVAILABLE = is_available
FilterFields.SELLER_NAME = seller_name
FilterFields.BUYBOX_SELLER = buybox_seller
FilterFields.NUMBER_OF_SELLERS = number_of_sellers
FilterFields.BS_RANK = bs_rank
FilterFields.ROOT_BS_RANK = root_bs_rank
FilterFields.DEPARTMENT = department
FilterFields.ITEM_WEIGHT = item_weight
FilterFields.PRODUCT_DIMENSIONS = product_dimensions
FilterFields.MODEL_NUMBER = model_number
FilterFields.MANUFACTURER = manufacturer
FilterFields.UPC = upc

Total fields available: 24
🔧 Use these enums for IDE autocomplete - you specify the filter logic!

In [None]:
# Syntax Evolution: From Verbose to Type-Aware

def syntax_evolution():
    """Show the evolution from verbose to type-aware syntax"""
    
    try:
        api_key = get_brightdata_api_key()
        filter_tool = AmazonProductFilter(api_key)
    except ValueError as e:
        print(f"❌ Error loading API key: {e}")
        return
    
    print("=== SYNTAX EVOLUTION ===\n")
    
    print("🔴 VERSION 1 - VERBOSE SYNTAX:")
    print("""
    # Old way - verbose and hard to read
    old_filter = filter_tool.create_filter_group(
        LogicalOperator.AND,
        [
            filter_tool.create_filter(FilterFields.RATING.value, FilterOperator.GREATER_THAN_EQUAL, "4.5"),
            filter_tool.create_filter(FilterFields.REVIEWS_COUNT.value, FilterOperator.GREATER_THAN, "100"),
            filter_tool.create_filter(FilterFields.IS_AVAILABLE.value, FilterOperator.EQUAL, "true")
        ]
    )
    """)
    
    print("🟡 VERSION 2 - CALLABLE FIELDS:")
    print("""
    # Better - callable fields but still string-based
    new_filter = (
        RATING(">=", "4.5") &
        REVIEWS_COUNT(">", "100") &
        IS_AVAILABLE("=", "true")
    )
    """)
    
    print("🟢 VERSION 3 - TYPE-AWARE SYNTAX:")
    print("""
    # Best - type-aware with appropriate methods for each data type
    type_aware_filter = (
        (RATING >= 4.5) &
        (REVIEWS_COUNT > 100) &
        IS_AVAILABLE.is_true()
    )
    """)
    
    print("🎯 TYPE-AWARE FIELD CATEGORIES:")
    print("  📊 NUMERICAL: RATING, REVIEWS_COUNT, FINAL_PRICE, DISCOUNT")
    print("     Methods: >, >=, <, <=, ==, !=, in_range()")
    print("  ✅ BOOLEAN: IS_AVAILABLE")
    print("     Methods: is_true(), is_false(), ==, !=")
    print("  📝 STRING: TITLE, BRAND, CURRENCY, SELLER_NAME")
    print("     Methods: contains(), in_list(), ==, !=")
    print("  📋 ARRAY: CATEGORIES")
    print("     Methods: includes(), not_includes()")
    
    print("\n🔧 TYPE-SPECIFIC EXAMPLES:")
    print("  RATING >= 4.5                    # Numerical comparison")
    print("  IS_AVAILABLE.is_true()           # Boolean check")
    print("  TITLE.contains('wireless')       # String contains")
    print("  CATEGORIES.includes('Electronics') # Array includes single value")
    print("  CATEGORIES.includes(['Electronics', 'Home']) # Array includes list")
    print("  REVIEWS_COUNT.in_range(50, 500)  # Numerical range")
    print("  BRAND.in_list(['Apple', 'Samsung']) # String in list")
    
    print("\n🚀 COMPLEX EXAMPLE - TYPE-AWARE:")
    print("""
    # Complex nested logic with type-aware syntax
    complex_filter = (
        ((RATING >= 4.5) & IS_AVAILABLE.is_true()) |
        ((REVIEWS_COUNT > 1000) & CATEGORIES.includes("Electronics"))
    )
    """)
    
    print("✅ Perfect for data scientists - intuitive, type-safe, and powerful!")

# Run the evolution demo
syntax_evolution()


=== SYNTAX EVOLUTION ===

🔴 VERSION 1 - VERBOSE SYNTAX:

    # Old way - verbose and hard to read
    old_filter = filter_tool.create_filter_group(
        LogicalOperator.AND,
        [
            filter_tool.create_filter(FilterFields.RATING.value, FilterOperator.GREATER_THAN_EQUAL, "4.5"),
            filter_tool.create_filter(FilterFields.REVIEWS_COUNT.value, FilterOperator.GREATER_THAN, "100"),
            filter_tool.create_filter(FilterFields.IS_AVAILABLE.value, FilterOperator.EQUAL, "true")
        ]
    )
    
🟡 VERSION 2 - CALLABLE FIELDS:

    # Better - callable fields but still string-based
    new_filter = (
        RATING(">=", "4.5") &
        REVIEWS_COUNT(">", "100") &
        IS_AVAILABLE("=", "true")
    )
    
🟢 VERSION 3 - TYPE-AWARE SYNTAX:

    # Best - type-aware with appropriate methods for each data type
    type_aware_filter = (
        (RATING >= 4.5) &
        (REVIEWS_COUNT > 100) &
        IS_AVAILABLE.is_true()
    )
    
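🎯 TYPE-AWARE FIELD CATEGORIES:
  📊 NUMERICAL: RATING, REVIEWS_COUNT, FINAL_PRICE, DISCOUNT
     Methods: >, >=, <, <=, ==, !=, in_range()
  ✅ BOOLEAN: IS_AVAILABLE
     Methods: is_true(), is_false(), ==, !=
  📝 STRING: TITLE, BRAND, CURRENCY, SELLER_NAME
     Methods: contains(), in_list(), ==, !=
  📋 ARRAY: CATEGORIES
     Methods: includes(), not_includes()

🔧 TYPE-SPECIFIC EXAMPLES:
  RATING >= 4.5                    # Numerical comparison
  IS_AVAILABLE.is_true()           # Boolean check
  TITLE.contains('wireless')       # String contains
  CATEGORIES.includes('Electronics') # Array includes single value
  CATEGORIES.includes(['Electronics', 'Home']) # Array includes list
  REVIEWS_COUNT.in_range(50, 500)  # Numerical range
  BRAND.in_list(['Apple', 'Samsung']) # String in list

🚀 COMPLEX EXAMPLE - TYPE-AWARE:

    # Complex nested logic with type-aware syntax
    complex_filter = (
        ((RATING >= 4.5) & IS_AVAILABLE.is_true()) |
        ((REVIEWS_COUNT > 1000) & CATEGORIES.includes("Electronics"))
    )
    
✅ Perfect for data scientists - intuitive, type-safe, and powerful!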

In [None]:
# Secrets Configuration Demo

def demonstrate_secrets_config():
    """Demonstrate how to use the secrets configuration system"""
    
    print("=== Secrets Configuration Demo ===")
    
    # Validate that required secrets are present
    try:
        validate_required_secrets()
        print("✅ All required secrets are properly configured")
    except ValueError as e:
        print(f"❌ Missing required secrets: {e}")
        print("\nTo fix this:")
        print("1. Copy secrets.example.yaml to secrets.yaml")
        print("2. Fill in your actual API keys and credentials")
        print("3. Run this cell again")
        return
    
    # Get specific secrets
    print("\n=== Configuration Values ===")
    
    # Bright Data configuration (only a masked prefix of the API key is shown)
    raw_api_key = get_secret('brightdata.api_key')
    brightdata_config = {
        'api_key': f"{raw_api_key[:10]}..." if raw_api_key else 'Not set',
        'dataset_id': get_secret('brightdata.dataset_id', 'Not set'),
        'base_url': get_secret('brightdata.base_url', 'Not set')
    }
    
    print("Bright Data Configuration:")
    for key, value in brightdata_config.items():
        print(f"  {key}: {value}")
    
    # Environment configuration
    env_config = {
        'debug': get_secret('environment.debug', False),
        'log_level': get_secret('environment.log_level', 'INFO'),
        'timeout': get_secret('environment.timeout', 30)
    }
    
    print("\nEnvironment Configuration:")
    for key, value in env_config.items():
        print(f"  {key}: {value}")
    
    # Test API key loading
    print("\n=== API Key Test ===")
    try:
        api_key = get_brightdata_api_key()
        print(f"✅ API key loaded successfully: {api_key[:10]}...")
        
        # Test creating a filter with the loaded API key
        filter_tool = AmazonProductFilter(api_key)
        print("✅ AmazonProductFilter initialized successfully")
        
    except ValueError as e:
        print(f"❌ Error loading API key: {e}")
    
    print("\n=== Secrets Configuration Complete ===")

# Run the demo
demonstrate_secrets_config()


=== Secrets Configuration Demo ===
✅ All required secrets are properly configured

=== Configuration Values ===
Bright Data Configuration:
  api_key: 1b1837e37e...
  dataset_id: gd_l7q7dkf244hwjntr0
  base_url: https://api.brightdata.com/datasets

Environment Configuration:
  debug: False
  log_level: INFO
  timeout: 30

=== API Key Test ===
✅ API key loaded successfully: 1b1837e37e...
✅ AmazonProductFilter initialized successfully

=== Secrets Configuration Complete ===
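
The dotted keys passed to `get_secret` suggest a lookup over nested configuration. A minimal sketch of that idea follows, assuming the YAML file has already been parsed into a nested dict. `lookup` and `mask` are hypothetical helpers and the sample values are placeholders; this is not the project's actual `get_secret` implementation.

```python
from collections.abc import Mapping


def lookup(config, dotted_key, default=None):
    """Follow 'a.b.c' through nested mappings; return default if any part is missing."""
    current = config
    for part in dotted_key.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return default
        current = current[part]
    return current


def mask(secret, visible=10):
    """Show only a short prefix of a secret, as the demo output does."""
    return f"{secret[:visible]}..." if secret else "Not set"


# Placeholder data standing in for a parsed secrets.yaml.
config = {
    "brightdata": {"api_key": "example-key-0123456789", "dataset_id": "example_dataset"},
    "environment": {"debug": False},
}

print(mask(lookup(config, "brightdata.api_key", "")))  # example-ke...
print(lookup(config, "environment.timeout", 30))       # 30 (key missing, default used)
```

Returning the default for any missing path segment keeps optional settings such as `environment.timeout` from raising when they are absent from the file.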
