# Week 6 Python Exercises: Regular Expressions & Pattern Matching
## Wednesday Python Class - Regex Patterns
### Business Context: Advanced Text Analysis & Validation

---

## BUSINESS SCENARIO:
As a Senior Data Analyst at NaijaMart, you've been tasked with implementing advanced data validation rules and customer segmentation based on text patterns. Your analysis will help improve data quality and identify customer behavior patterns for targeted marketing campaigns.

## ADVANCED CONCEPTS COVERED:
✓ Regular expression fundamentals
✓ Pattern compilation and matching
✓ Text extraction and substitution
✓ Data validation using regex
✓ Advanced pattern analysis for business intelligence

## DATASETS USED:
- customer_reviews.csv (Customer review text analysis)
- customer_data.csv (Customer information validation)
- business_deals.csv (Business segment classification)
- product_data.csv (Product information extraction)

---

## Setup Instructions

Import the necessary libraries for regex pattern matching and data analysis.

In [None]:
# Required Imports
import pandas as pd
import numpy as np
import re
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

---
# SECTION 1: REGEX FUNDAMENTALS
Master basic regular expression patterns
---

### Exercise 1.1: Email Pattern Validation

**Business Context:** Validate email addresses using regex patterns

**Task:** Implement email validation and domain analysis

**Instructions:**
1. Create sample customer data with simulated email addresses
2. Define regex patterns for email validation
3. Identify valid vs invalid emails
4. Categorize by domain types (gmail, yahoo, business domains)
5. Generate validation report

**Regex patterns to use:**
- Basic email: `r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'`
- Domain extraction: `r'@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$'`
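
**Example sketch:** The snippet below shows one possible way to apply the two patterns listed above. The helper names `is_valid_email` and `extract_domain` are illustrative assumptions and are not part of the exercise skeleton; treat it as a starting point rather than the reference solution.

```python
import re

# Patterns copied from the exercise text.
EMAIL_RE = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
DOMAIN_RE = re.compile(r'@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$')

def is_valid_email(value):
    """Return True only for strings that match EMAIL_RE (None and '' are invalid)."""
    return isinstance(value, str) and EMAIL_RE.match(value) is not None

def extract_domain(value):
    """Return the lower-cased domain, or None when the value is not a valid email."""
    if not is_valid_email(value):
        return None
    return DOMAIN_RE.search(value).group(1).lower()

for candidate in ['ana.silva@gmail.com', 'JOAO@YAHOO.COM', 'user@domain', None]:
    print(candidate, is_valid_email(candidate), extract_domain(candidate))
```

Lower-casing the domain before categorizing means `YAHOO.COM` and `yahoo.com` end up in the same group.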

In [None]:
def exercise_1_1_email_validation():
    """
    Exercise 1.1: Email Pattern Validation
    """
    
    # Sample email data (mix of valid and invalid)
    sample_emails = [
        'ana.silva@gmail.com', 'JOAO@YAHOO.COM', 'maria.oliveira@company.com.br',
        'invalid.email', 'pedro@', '@domain.com', 'user@domain',
        'carlos.123@business.org', 'lucia_ferreira@enterprise.net',
        'user@sub.domain.co.uk', '', None, 'not-an-email',
        'fernanda@local.com', 'roberto.alves@tech-company.com'
    ]

    customer_ids = [f'CUST_{i:04d}' for i in range(1, len(sample_emails) + 1)]

    # YOUR CODE HERE:
    # 1. Create DataFrame with customer data
    # 2. Define regex patterns for validation
    # 3. Validate email formats
    # 4. Extract domains from valid emails
    # 5. Categorize by domain types
    # 6. Generate comprehensive validation report
    
    pass

# Test your function
# result_1_1 = exercise_1_1_email_validation()
# print(result_1_1)

### Exercise 1.2: Brazilian Phone Number Pattern Matching

**Business Context:** Extract and validate Brazilian phone numbers from text

**Task:** Pattern matching for phone number extraction

**Instructions:**
1. Create sample review text containing phone numbers
2. Define regex patterns for Brazilian phone formats:
   - (11) 99999-9999 (with area code in parentheses)
   - 11 99999-9999 (space-separated)
   - +55 11 99999-9999 (international format)
   - 11999999999 (compact format)
3. Extract all phone numbers from text
4. Standardize to common format
5. Validate area codes for major Brazilian cities

**Use:** re.findall(), re.search(), re.sub()
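
**Example sketch:** A minimal illustration of how a single pattern might cover the four formats listed above. It uses `re.finditer()` with named groups, which is a design choice of this sketch rather than a requirement of the exercise. The names `PHONE_RE` and `extract_phones`, and the `(AA) NNNNN-NNNN` output format, are assumptions.

```python
import re

PHONE_RE = re.compile(r'''
    (?<!\d)                 # do not start in the middle of a longer number
    (?:\+55\s?)?            # optional country code
    \(?(?P<area>\d{2})\)?   # area code, with or without parentheses
    \s?
    (?P<prefix>9?\d{4})     # optional leading 9 used by mobile numbers
    -?
    (?P<line>\d{4})
    (?!\d)                  # do not stop in the middle of a longer number
''', re.VERBOSE)

def extract_phones(text):
    """Return every phone number found in text, standardized as '(AA) NNNNN-NNNN'."""
    if not isinstance(text, str):      # covers None entries in the sample data
        return []
    return [f"({m['area']}) {m['prefix']}-{m['line']}" for m in PHONE_RE.finditer(text)]

print(extract_phones("Loja física: 47 3333-4444 ou (47) 99876-5432"))
print(extract_phones("Whatsapp 11987654321 - produto novo na caixa"))
```

The `area` group can then be looked up in a dictionary of known area codes. A pattern this permissive may also match other 10 or 11 digit sequences, so it should be tested against text that contains order numbers or dates.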

In [None]:
def exercise_1_2_phone_number_extraction():
    """
    Exercise 1.2: Brazilian Phone Number Pattern Matching
    """
    
    # Sample review texts containing phone numbers
    sample_reviews = [
        "Produto ótimo! Qualquer dúvida me liguem: (11) 98765-4321",
        "Vendedor confiável, contato 21 99888-7777 para mais informações",
        "Para dúvidas: +55 85 91234-5678 ou email",
        "Whatsapp 11987654321 - produto novo na caixa",
        "Sem telefone nesta review, apenas feedback positivo",
        "Loja física: 47 3333-4444 ou (47) 99876-5432",
        "Contato comercial +55 11 94567-8901 horário comercial",
        None,
        "Multiple numbers: 11 91111-2222 and (21) 93333-4444",
        "International: +55 85 98888-9999 - excellent service"
    ]

    # Brazilian area codes for validation
    valid_area_codes = {
        '11': 'São Paulo', '21': 'Rio de Janeiro', '31': 'Belo Horizonte',
        '41': 'Curitiba', '47': 'Joinville', '51': 'Porto Alegre',
        '61': 'Brasília', '71': 'Salvador', '81': 'Recife', '85': 'Fortaleza'
    }

    # YOUR CODE HERE:
    # 1. Define regex patterns for different phone formats
    # 2. Extract phone numbers from each review
    # 3. Standardize formats
    # 4. Validate area codes
    # 5. Create extraction report
    
    pass

# Test your function
# result_1_2 = exercise_1_2_phone_number_extraction()
# print(result_1_2)

### Exercise 1.3: Product Code Pattern Recognition

**Business Context:** Extract product codes and SKUs from product descriptions

**Task:** Recognize different product code formats

**Instructions:**
1. Create sample product descriptions with embedded codes
2. Define patterns for different code formats:
   - ABC123 (letters followed by numbers)
   - ABC-123 (letters, dash, numbers)
   - A1B2C3 (alternating letters and numbers)
   - 12345-ABC (numbers, dash, letters)
3. Extract all product codes
4. Categorize by format type
5. Generate frequency analysis

**Use:** re.compile(), re.finditer() for detailed matching
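
**Example sketch:** One way the four formats above could be expressed as compiled patterns and scanned with `re.finditer()`. The dictionary keys and the function name `find_codes` are assumptions made for this illustration.

```python
import re
from collections import Counter

CODE_PATTERNS = {
    'letters_digits': re.compile(r'\b[A-Z]{2,}\d+\b'),        # ABC123
    'letters_dash_digits': re.compile(r'\b[A-Z]{2,}-\d+\b'),  # ABC-123
    'alternating': re.compile(r'\b(?:[A-Z]\d){2,}\b'),        # A1B2C3
    'digits_dash_letters': re.compile(r'\b\d+-[A-Z]+\b'),     # 12345-ABC
}

def find_codes(text):
    """Return (code, format_name) pairs in order of appearance."""
    if not isinstance(text, str):
        return []
    hits = []
    for name, pattern in CODE_PATTERNS.items():
        for match in pattern.finditer(text):
            # match.start() lets us restore the original order afterwards
            hits.append((match.start(), match.group(), name))
    return [(code, name) for _, code, name in sorted(hits)]

sample = "Multiple codes: ABC123, DEF-456, and G1H2I3 in same product"
codes = find_codes(sample)
print(codes)
print(Counter(name for _, name in codes))
```

Several model numbers in the sample descriptions (for example `UN55TU7000`) fit none of the four formats, so a complete solution needs either extra patterns or an "unclassified" bucket.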

In [None]:
def exercise_1_3_product_code_extraction():
    """
    Exercise 1.3: Product Code Pattern Recognition
    """
    
    # Sample product descriptions with various code formats
    product_descriptions = [
        "Smart TV Samsung 55' modelo UN55TU7000 com tecnologia HDR",
        "Smartphone iPhone 12 Pro Max A2342 cor azul pacífico",
        "Notebook Dell Inspiron I15-3501-A70S processador Intel Core i7",
        "Produto código ABC123 - Headset Gamer com microfone",
        "Mouse sem fio modelo M705-BLK conexão Bluetooth",
        "Teclado mecânico código GMK-87 switches Cherry MX",
        "Monitor LG 24' código 24MK430H-B resolução Full HD",
        "Produto sem código específico apenas descrição geral",
        None,
        "Multiple codes: ABC123, DEF-456, and G1H2I3 in same product",
        "Câmera Canon EOS 2000D kit com lente EF-S18-55 código CAN2000D"
    ]

    # YOUR CODE HERE:
    # 1. Define regex patterns for different code formats
    # 2. Extract codes from descriptions
    # 3. Categorize by pattern type
    # 4. Count frequency of each format
    # 5. Identify products without codes
    
    pass

# Test your function
# result_1_3 = exercise_1_3_product_code_extraction()
# print(result_1_3)

---
# SECTION 2: ADVANCED PATTERN MATCHING
Complex regex operations for business intelligence
---

### Exercise 2.1: Sentiment Analysis Using Regex Patterns

**Business Context:** Classify review sentiment using Portuguese keyword patterns

**Task:** Implement regex-based sentiment classification

**Instructions:**
1. Create comprehensive review dataset
2. Define regex patterns for sentiment indicators:
   - Positive: (ótimo|excelente|recomendo|perfeito|adorei)
   - Negative: (ruim|péssimo|defeito|problema|terrível)
   - Neutral: (normal|regular|ok|comum)
3. Count pattern matches per review
4. Create sentiment scoring algorithm
5. Validate against numeric ratings

**Use:** word boundaries \b for precise matching
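
**Example sketch:** The snippet below wraps the three keyword groups above in `\b...\b` and counts matches per review. The names `SENTIMENT_PATTERNS` and `count_sentiment` are assumptions, and the scoring rule itself is left to you.

```python
import re

SENTIMENT_PATTERNS = {
    'positive': re.compile(r'\b(ótimo|excelente|recomendo|perfeito|adorei)\b', re.IGNORECASE),
    'negative': re.compile(r'\b(ruim|péssimo|defeito|problema|terrível)\b', re.IGNORECASE),
    'neutral': re.compile(r'\b(normal|regular|ok|comum)\b', re.IGNORECASE),
}

def count_sentiment(text):
    """Return the number of keyword matches per sentiment category."""
    if not isinstance(text, str):
        return {name: 0 for name in SENTIMENT_PATTERNS}
    return {name: len(pattern.findall(text)) for name, pattern in SENTIMENT_PATTERNS.items()}

print(count_sentiment("Terrível! Produto péssimo, cheio de problemas"))
print(count_sentiment("Produto com defeito, qualidade ruim, não recomendo"))
```

Two effects of the word boundaries are worth noticing. Inflected forms such as `problemas` or `ótima` are not counted because the keyword must be a whole word, and `não recomendo` still counts as a positive hit because the pattern knows nothing about negation. Both are reasons the regex score may disagree with the numeric rating.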

In [None]:
def exercise_2_1_sentiment_pattern_analysis():
    """
    Exercise 2.1: Sentiment Analysis Using Regex Patterns
    """
    
    # Sample review data with mixed sentiments
    review_data = {
        'review_id': range(1, 16),
        'score': [5, 4, 5, 2, 3, 5, 1, 4, 5, 3, 2, 5, 4, 1, 5],
        'message': [
            "Produto excelente! Recomendo para todos, qualidade ótima!",
            "Bom produto, mas entrega normal, nada excepcional",
            "Adorei a compra! Produto perfeito, superou expectativas",
            "Produto com defeito, qualidade ruim, não recomendo",
            "Regular, produto ok mas nada demais, comum",
            "Excelente atendimento, produto ótimo, muito recomendo!",
            "Terrível! Produto péssimo, cheio de problemas",
            "Produto bom, entrega ok, experiência normal",
            "Perfeito! Adorei tudo, qualidade excelente, ótimo!",
            "Produto comum, qualidade regular, normal",
            "Ruim, muitos problemas, defeito logo no primeiro uso",
            "Recomendo fortemente! Produto ótimo, excelente qualidade",
            "Produto ok, regular, nada excepcional mas serve",
            "Péssimo produto, terrível qualidade, cheio de defeitos",
            "Adorei! Produto perfeito, excelente, super recomendo!"
        ]
    }

    # YOUR CODE HERE:
    # 1. Define regex patterns for each sentiment category
    # 2. Count matches in each review
    # 3. Create sentiment scoring function
    # 4. Calculate sentiment scores
    # 5. Compare with numeric ratings
    # 6. Generate accuracy report
    
    pass

# Test your function
# result_2_1 = exercise_2_1_sentiment_pattern_analysis()
# print(result_2_1)

### Exercise 2.2: Price and Currency Pattern Extraction

**Business Context:** Extract price mentions from review text

**Task:** Extract and analyze price patterns

**Instructions:**
1. Create sample reviews mentioning prices
2. Define regex patterns for Brazilian currency formats:
   - R$ 99,99 (standard format)
   - R$99.99 (international format)
   - 99,99 reais (text format)
   - R$ 1.299,99 (thousands separator)
3. Extract and standardize all price mentions
4. Analyze price ranges mentioned in reviews
5. Correlate with review scores

**Use:** capturing groups to extract numeric values
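
**Example sketch:** A possible approach to the formats above: capture the raw number with a group, then normalize it in Python. The names `PRICE_RE`, `to_float` and `extract_prices` are assumptions. So is the normalization rule, which treats a separator followed by exactly two final digits as the decimal mark and any other `.` or `,` as a thousands separator.

```python
import re

NUMBER = r'\d+(?:[.,]\d+)*'
# Group 1: number after "R$"; group 2: number before the word "reais".
PRICE_RE = re.compile(rf'R\$\s?({NUMBER})|({NUMBER})\s?reais\b')

def to_float(raw):
    """Convert '1.299,99', '199.99' or '500' to a float (see the assumed rule above)."""
    match = re.fullmatch(r'(.*?)(?:[.,](\d{2}))?', raw)
    whole = re.sub(r'[.,]', '', match.group(1))   # drop thousands separators
    cents = match.group(2) or '00'
    return float(f'{whole}.{cents}')

def extract_prices(text):
    """Return every price mentioned in text as a float."""
    if not isinstance(text, str):
        return []
    # findall returns one (group1, group2) tuple per match; one of them is ''
    return [to_float(prefixed or suffixed) for prefixed, suffixed in PRICE_RE.findall(text)]

print(extract_prices("Comparei preços: aqui R$150,00 vs concorrente R$180,99"))
print(extract_prices("Preço justo de R$ 1.299,99 para essa qualidade"))
```

A value such as `1.29` is ambiguous under this rule and is read as 1.29, so document whichever convention you choose.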

In [None]:
def exercise_2_2_price_extraction_analysis():
    """
    Exercise 2.2: Price and Currency Pattern Extraction
    """
    
    # Sample reviews with price mentions
    price_reviews = [
        "Produto barato, paguei apenas R$ 29,99 e vale muito!",
        "Caro demais! R$199.99 por esse produto é excessivo",
        "Preço justo de R$ 1.299,99 para essa qualidade",
        "Barato, apenas 45,90 reais, recomendo",
        "Produto custa R$89,90 mas qualidade é ótima",
        "Review sem menção de preços, apenas qualidade",
        "Paguei R$ 2.500,00 e não me arrependo",
        None,
        "Comparei preços: aqui R$150,00 vs concorrente R$180,99",
        "Produto gratuito em promoção, valor R$ 0,00",
        "Preço promocional R$99,90 (antes R$149,90)",
        "Caro! Mais de 500 reais por isso é absurdo",
        "Baratinho, só R$15,99 com frete grátis"
    ]

    scores = [5, 2, 4, 5, 4, 4, 5, 3, 4, 5, 5, 2, 5]

    # YOUR CODE HERE:
    # 1. Define regex patterns for price formats
    # 2. Extract all price mentions
    # 3. Standardize price formats
    # 4. Calculate price statistics
    # 5. Analyze correlation with review scores
    # 6. Identify price-sensitive reviews
    
    pass

# Test your function
# result_2_2 = exercise_2_2_price_extraction_analysis()
# print(result_2_2)

### Exercise 2.3: Date and Time Pattern Recognition

**Business Context:** Extract date/time mentions from customer reviews

**Task:** Extract and validate date patterns

**Instructions:**
1. Create reviews mentioning dates and times
2. Define patterns for Brazilian date formats:
   - dd/mm/yyyy or dd/mm/yy
   - dd-mm-yyyy or dd-mm-yy
   - dd de mês de yyyy (written format)
3. Extract delivery dates, purchase dates, problem dates
4. Validate date ranges and identify anomalies
5. Create timeline analysis

**Use:** named groups for day, month, year extraction
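
**Example sketch:** The snippet below illustrates named groups for the two numeric formats (`dd/mm/yyyy` and `dd-mm-yyyy`, with two- or four-digit years). The written format `dd de mês de yyyy` is not covered here. The names `NUMERIC_DATE_RE` and `parse_numeric_dates` are assumptions, as is the choice to map two-digit years to the 2000s.

```python
import re
from datetime import date

NUMERIC_DATE_RE = re.compile(
    r'\b(?P<day>\d{1,2})(?P<sep>[/-])(?P<month>\d{1,2})(?P=sep)(?P<year>\d{4}|\d{2})\b'
)

def parse_numeric_dates(text):
    """Return (matched_text, date_or_None) pairs; None marks an impossible date."""
    if not isinstance(text, str):
        return []
    results = []
    for m in NUMERIC_DATE_RE.finditer(text):
        day, month, year = int(m['day']), int(m['month']), int(m['year'])
        if year < 100:
            year += 2000            # assumption: '23' means 2023
        try:
            parsed = date(year, month, day)
        except ValueError:          # e.g. 32/13/2023
            parsed = None
        results.append((m.group(), parsed))
    return results

found = parse_numeric_dates("Comprei em 15/03/2023 e chegou no dia 20/03/2023")
purchase, delivery = found[0][1], found[1][1]
print((delivery - purchase).days)   # delivery time in days
```

The backreference `(?P=sep)` requires the same separator on both sides, so a mixed form such as `15/03-2023` is not matched. The regex only finds date-shaped text, while the `datetime.date` constructor does the actual calendar validation.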

In [None]:
def exercise_2_3_date_time_extraction():
    """
    Exercise 2.3: Date and Time Pattern Recognition
    """
    
    # Sample reviews with date/time mentions
    date_reviews = [
        "Comprei em 15/03/2023 e chegou no dia 20/03/2023",
        "Produto adquirido no dia 01-12-2022, excelente!",
        "Chegou em 25 de dezembro de 2023, presente perfeito",
        "Compra realizada 10/01/23, entrega foi 15/01/23",
        "Review sem datas, apenas feedback geral",
        "Problema começou dia 05/08/2023 após 2 meses de uso",
        "Entrega prometida 30/11/2023 mas chegou 05/12/2023",
        None,
        "Multiple dates: bought 15/06/2023, delivered 20/06/2023, problem 01/07/2023",
        "Produto de 2022, mas comprei apenas em janeiro de 2024",
        "Data errada no sistema: aparece 32/13/2023 (impossível)",
        "Comprei ontem (não mencionou data específica)",
        "Produto de 1999 ainda funciona perfeitamente!"
    ]

    # YOUR CODE HERE:
    # 1. Define regex patterns for different date formats
    # 2. Extract dates from reviews
    # 3. Parse and validate dates
    # 4. Identify anomalies (invalid dates)
    # 5. Create timeline analysis
    # 6. Calculate delivery time patterns
    
    pass

# Test your function
# result_2_3 = exercise_2_3_date_time_extraction()
# print(result_2_3)

---
# SECTION 3: DATA VALIDATION WITH REGEX
Implement production-ready validation systems
---

### Exercise 3.1: Comprehensive Customer Data Validation

**Business Context:** Create validation framework using regex patterns

**Task:** Implement comprehensive data validation

**Instructions:**
1. Create customer dataset with various data quality issues
2. Define validation patterns for:
   - Customer ID format (letters + numbers)
   - Brazilian ZIP codes (5 digits, 8 digits, or 8 digits with a dash, e.g. 12345-678)
   - State codes (2 uppercase letters)
   - Phone numbers (Brazilian format)
3. Score data quality for each field (0-100)
4. Generate validation report with specific issues
5. Create data cleaning recommendations

**Use:** re.match() with patterns anchored by ^ and $ (or re.fullmatch()) for exact format validation
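
**Example sketch:** A minimal illustration of field-level scoring, using two of the patterns supplied in this exercise's code cell. The function name `field_quality` and the scoring rule (percentage of values that pass) are assumptions. `pattern.fullmatch()` is used so the whole value must match.

```python
import re

# Two of the patterns supplied in the exercise cell.
PATTERNS = {
    'zip_code': re.compile(r'^(\d{5}|\d{5}-\d{3}|\d{8})$'),
    'state': re.compile(r'^[A-Z]{2}$'),
}

def field_quality(values, pattern):
    """Return (score_0_to_100, invalid_values) for one column."""
    invalid = [v for v in values if not (isinstance(v, str) and pattern.fullmatch(v))]
    score = round(100 * (len(values) - len(invalid)) / len(values)) if values else 0
    return score, invalid

states = ['SP', 'rj', 'MG', 'XX', '11', 'São Paulo', 'RS', None, 'sp', 'RJ']
score, problems = field_quality(states, PATTERNS['state'])
print(score, problems)
```

Note that `XX` passes the format check even though it is not a real state code. A format pattern validates shape only, so a lookup against a list of valid codes is still needed.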

In [None]:
def exercise_3_1_customer_data_validation():
    """
    Exercise 3.1: Comprehensive Customer Data Validation
    """
    
    # Sample customer data with quality issues
    customer_data = {
        'customer_id': [
            'CUST0001', 'customer_123', 'USER@456', '789', 'CUST-0002',
            'valid123', '', None, 'CUSTOMER_001', 'c001'
        ],
        'zip_code': [
            '01310100', '20040-020', '12345', '999999999', '00000',
            '12345-678', 'ABCDE', None, '01310', '20040-02'
        ],
        'state': [
            'SP', 'rj', 'MG', 'XX', '11', 'São Paulo', 'RS', None, 'sp', 'RJ'
        ],
        'phone': [
            '(11) 99999-8888', '21999887777', '+55 11 98888-7777', '123',
            '11 99999-8888', 'not-a-phone', None, '(11) 9999-8888', '11 9888-7777', '85 99999-8888'
        ]
    }

    # Validation patterns
    validation_patterns = {
        'customer_id': r'^[A-Z]{3,8}[0-9]{3,6}$',
        'zip_code': r'^(\d{5}|\d{5}-\d{3}|\d{8})$',
        'state': r'^[A-Z]{2}$',
        'phone': r'^(\(\d{2}\)\s?|\d{2}\s?)9?\d{4}-?\d{4}$'
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from customer data
    # 2. Apply validation patterns to each field
    # 3. Calculate quality scores
    # 4. Identify specific issues
    # 5. Generate cleaning recommendations
    # 6. Create comprehensive validation report
    
    pass

# Test your function
# result_3_1 = exercise_3_1_customer_data_validation()
# print(result_3_1)

### Exercise 3.2: Business Segment Standardization

**Business Context:** Standardize business segment names using regex

**Task:** Implement automated standardization

**Instructions:**
1. Create sample data with inconsistent segment names
2. Define patterns to identify and fix:
   - Mixed case variations
   - Special character inconsistencies
   - Space vs underscore variations
   - Common abbreviations
3. Create standardization mapping
4. Apply automated corrections
5. Generate before/after analysis

**Use:** re.sub() for text replacement and standardization
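
**Example sketch:** One possible normalization pass built on `re.sub()`. The function name `normalize_segment` and the choice of lower-case names joined by underscores as the target format are assumptions. The sketch only handles case, separators and the connector word "and".

```python
import re

def normalize_segment(value):
    """Lower-case a segment name and join its words with single underscores."""
    if not isinstance(value, str) or not value.strip():
        return None
    # Symbols, '_' and '/' become spaces (note: this also strips accented letters).
    text = re.sub(r'[^a-z0-9]+', ' ', value.lower())
    # Drop the connector word once separators are spaces, so \b works around it.
    text = re.sub(r'\band\b', ' ', text)
    return '_'.join(text.split()) or None

for raw in ['Health & Beauty', 'HEALTH_BEAUTY', 'Health/Beauty', 'HOME_AND_GARDEN']:
    print(raw, '->', normalize_segment(raw))
```

The order of the two substitutions matters. In `HOME_AND_GARDEN` the underscore is a word character, so `\band\b` would not match until the underscores have been replaced. Synonyms such as `food_beverage` versus `Food & Drink` cannot be reconciled by pattern rules alone and need an explicit mapping.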

In [None]:
def exercise_3_2_business_segment_standardization():
    """
    Exercise 3.2: Business Segment Standardization
    """
    
    # Sample business segments with inconsistencies
    segment_data = [
        'Health & Beauty', 'health_beauty', 'HEALTH_BEAUTY', 'Health/Beauty',
        'Tech & Accessories', 'tech_accessories', 'Technology Accessories',
        'Home & Garden', 'home_garden', 'HOME_AND_GARDEN', 'House & Garden',
        'Sports & Leisure', 'sports_leisure', 'Sport & Recreation',
        'Food & Drink', 'food_beverage', 'FOOD_AND_DRINKS',
        'Auto & Vehicles', 'automotive', 'Cars & Auto',
        'Books & Media', 'books_media', 'Books/Media',
        '', None, 'UNKNOWN', 'Other'
    ]

    # YOUR CODE HERE:
    # 1. Define standardization patterns and rules
    # 2. Create mapping for common variations
    # 3. Apply standardization to all segments
    # 4. Count improvements made
    # 5. Generate standardization report
    # 6. Identify remaining inconsistencies
    
    pass

# Test your function
# result_3_2 = exercise_3_2_business_segment_standardization()
# print(result_3_2)

### Exercise 3.3: Review Content Quality Validation

**Business Context:** Validate review content for quality and compliance

**Task:** Implement content quality scoring

**Instructions:**
1. Create sample reviews with various quality issues
2. Define validation rules using regex:
   - Minimum meaningful content (not just punctuation)
   - No excessive repetition of characters
   - No personal information (phone, email, address)
   - No excessive capitalization (shouting)
   - No spam patterns
3. Score review quality (1-10)
4. Flag problematic reviews
5. Generate content moderation report

**Use:** multiple regex patterns for comprehensive validation
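
**Example sketch:** The snippet below shows how several small patterns could be combined into a list of flags per review. Every pattern, the flag names, and the "more than 70% upper-case letters" shouting threshold are assumptions chosen for illustration. The 1-10 scoring is left to you.

```python
import re

QUALITY_CHECKS = {
    'no_content': re.compile(r'^\W*$'),                    # empty or punctuation only
    'repeated_chars': re.compile(r'([^\W\d_])\1{4,}'),     # same letter 5+ times in a row
    'phone': re.compile(r'\(?\d{2}\)?\s?9?\d{4}-?\d{4}'),
    'email': re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+'),
    'url': re.compile(r'(?:https?://|www\.)\S+', re.IGNORECASE),
    'extra_spaces': re.compile(r'\s{3,}'),
}

def flag_review(text):
    """Return the list of quality flags raised by a review."""
    if not isinstance(text, str):
        return ['missing']
    flags = [name for name, pattern in QUALITY_CHECKS.items() if pattern.search(text)]
    letters = [c for c in text if c.isalpha()]
    # Shouting is checked by ratio rather than by regex.
    if len(letters) >= 10 and sum(c.isupper() for c in letters) / len(letters) > 0.7:
        flags.append('shouting')
    return flags

print(flag_review("PRODUTO MUITO RUIMMMMMM!!!!!!"))
print(flag_review("Call me at (11) 99999-8888 for great deals!"))
```

The repetition check is restricted to letters on purpose. A plain `(.)\1{4,}` would also fire on the five nines inside a phone number.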

In [None]:
def exercise_3_3_review_content_validation():
    """
    Exercise 3.3: Review Content Quality Validation
    """
    
    # Sample reviews with quality issues
    sample_reviews = [
        "Produto excelente, recomendo para todos!",  # Good quality
        "PRODUTO MUITO RUIMMMMMM!!!!!!",  # Excessive caps and repetition
        "Call me at (11) 99999-8888 for great deals!",  # Personal info
        "....",  # No meaningful content
        "aaaaaaaaaaaaaaaaaaaaaaaaa",  # Spam pattern
        "Produto ok, nada excepcional mas serve",  # Normal quality
        "Buy here www.spam-site.com best prices!!!",  # Spam/promotional
        "",  # Empty
        None,  # Null
        "Produto muito bom mesmo, qualidade ótima e entrega rápida!",  # Good
        "COMPREM AQUI MELHOR PREÇO GARANTIDO!!!",  # Spam caps
        "My email is user@domain.com contact me",  # Personal info
        "Produto    com    espaços    excessivos",  # Formatting issues
        "Normal review with good feedback and details"  # Good quality
    ]

    # YOUR CODE HERE:
    # 1. Define validation patterns for each quality rule
    # 2. Score each review based on quality criteria
    # 3. Flag problematic content
    # 4. Categorize issues found
    # 5. Generate moderation recommendations
    # 6. Create quality improvement suggestions
    
    pass

# Test your function
# result_3_3 = exercise_3_3_review_content_validation()
# print(result_3_3)

---
# SECTION 4: BUSINESS INTELLIGENCE WITH REGEX
Advanced pattern analysis for strategic insights
---

### Exercise 4.1: Customer Journey Pattern Recognition

**Business Context:** Extract customer journey stages from review text

**Task:** Map customer journey patterns

**Instructions:**
1. Create reviews spanning different journey stages
2. Define patterns to identify:
   - Discovery stage (search, compare, consider)
   - Purchase stage (buy, order, payment)
   - Delivery stage (ship, arrive, receive)
   - Usage stage (use, test, experience)
   - Advocacy stage (recommend, share, review)
3. Map reviews to journey stages
4. Analyze sentiment by stage
5. Identify improvement opportunities

**Use:** regex with word boundaries and context
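
**Example sketch:** A minimal multi-label classifier for the stages above. The Portuguese keyword stems in `STAGE_PATTERNS` are assumptions picked to suit review text like the sample data; they are not an authoritative vocabulary, and `classify_stages` is a hypothetical helper name.

```python
import re

# Hypothetical keyword stems per stage; \w* lets one stem cover several verb forms.
STAGE_PATTERNS = {
    'discovery': r'\b(?:pesquis\w*|compar\w*|descobr\w*)\b',
    'purchase': r'\b(?:compr\w*|pagamento|pedido)\b',
    'delivery': r'\b(?:cheg\w*|entreg\w*|embala\w*)\b',
    'usage': r'\b(?:us(?:o|ei|ou|ando|ar)|test\w*|funcion\w*)\b',
    'advocacy': r'\b(?:recomend\w*|indic\w*|indiqu\w*)\b',
}
COMPILED_STAGES = {
    name: re.compile(pattern, re.IGNORECASE) for name, pattern in STAGE_PATTERNS.items()
}

def classify_stages(text):
    """Return every journey stage whose pattern appears in the review."""
    if not isinstance(text, str):
        return []
    return [name for name, pattern in COMPILED_STAGES.items() if pattern.search(text)]

print(classify_stages("Pesquisei muito antes de comprar, comparei preços"))
print(classify_stages("Produto chegou rápido, embalagem perfeita"))
```

A review can legitimately touch more than one stage, which is why the function returns a list. If you need a single label per review, decide on a tie-breaking rule and document it.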

In [None]:
def exercise_4_1_customer_journey_analysis():
    """
    Exercise 4.1: Customer Journey Pattern Recognition
    """
    
    # Sample customer journey reviews
    journey_reviews = [
        "Pesquisei muito antes de comprar, comparei preços",  # Discovery
        "Processo de compra foi fácil, pagamento seguro",  # Purchase
        "Produto chegou rápido, embalagem perfeita",  # Delivery
        "Usando há 2 meses, funciona perfeitamente",  # Usage
        "Recomendo para todos, já indiquei para amigos",  # Advocacy
        "Descobri este produto por acaso, pesquisei reviews",  # Discovery
        "Comprei na promoção, desconto excelente",  # Purchase
        "Entrega atrasou, mas chegou bem embalado",  # Delivery
        "Testei todas as funções, produto ótimo",  # Usage
        "Já comprei 3 vezes, sempre recomendo",  # Advocacy
        "Vi no anúncio, pesquisei sobre o produto",  # Discovery
        None,
        "Review geral sem estágio específico mencionado"
    ]

    scores = [4, 5, 5, 5, 5, 4, 5, 3, 5, 5, 4, 3, 4]

    # YOUR CODE HERE:
    # 1. Define regex patterns for each journey stage
    # 2. Classify reviews by journey stage
    # 3. Calculate sentiment scores by stage
    # 4. Identify stage-specific issues
    # 5. Generate journey optimization insights
    # 6. Create improvement recommendations
    
    pass

# Test your function
# result_4_1 = exercise_4_1_customer_journey_analysis()
# print(result_4_1)

### Exercise 4.2: Competitive Analysis from Review Text

**Business Context:** Extract competitive insights using pattern matching

**Task:** Generate competitive intelligence

**Instructions:**
1. Create reviews mentioning competitors and comparisons
2. Define patterns to identify:
   - Competitor brand mentions
   - Price comparisons
   - Feature comparisons
   - Service comparisons
   - Switching patterns
3. Extract competitive intelligence
4. Analyze competitive positioning
5. Generate strategic recommendations

**Use:** named capture groups for structured extraction
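
**Example sketch:** One way named groups can structure the extraction. The brand alternation is built from the competitor list in this exercise's code cell. The topic keywords and the helper name `analyze_mentions` are assumptions made for this illustration.

```python
import re

competitors = ['Amazon', 'Mercado Livre', 'Americanas', 'Magazine Luiza',
               'Submarino', 'Casas Bahia', 'Shopee', 'B2W']

# Longest names first so a multi-word brand is never cut short; re.escape guards
# against special characters in brand names.
names = '|'.join(re.escape(n) for n in sorted(competitors, key=len, reverse=True))
COMPETITOR_RE = re.compile(rf'\b(?P<brand>{names})\b', re.IGNORECASE)

# Hypothetical topic keywords; each alternative has its own named group.
TOPIC_RE = re.compile(
    r'\b(?:(?P<price>preço|barato|caro|menos)'
    r'|(?P<delivery>entrega|chegou|demora)'
    r'|(?P<service>atendimento))\b',
    re.IGNORECASE,
)

def analyze_mentions(text):
    """Return the brands mentioned and the comparison topics found in a review."""
    if not isinstance(text, str):
        return {'brands': [], 'topics': []}
    brands = [m.group('brand') for m in COMPETITOR_RE.finditer(text)]
    # m.lastgroup is the name of the group that matched in the alternation
    topics = sorted({m.lastgroup for m in TOPIC_RE.finditer(text)})
    return {'brands': brands, 'topics': topics}

print(analyze_mentions("Preço igual Casas Bahia mas entrega mais rápida"))
```

Exact matching against a fixed list will miss misspelled or unlisted brand names, so the report should also count reviews where a comparison topic appears without a recognized brand.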

In [None]:
def exercise_4_2_competitive_intelligence():
    """
    Exercise 4.2: Competitive Analysis from Review Text
    """
    
    # Sample competitive reviews
    competitive_reviews = [
        "Melhor que Amazon, preço mais barato aqui",
        "Mercado Livre demora mais, aqui chegou rápido",
        "Comparei com Americanas, este site é melhor",
        "Produto igual ao da Magazine Luiza por menos",
        "Vim do Submarino, atendimento aqui é superior",
        "Review sem menção de concorrentes",
        "Preço igual Casas Bahia mas entrega mais rápida",
        "Deixei Amazon para comprar aqui, sem arrependimento",
        None,
        "Shopee tem mais variedade mas qualidade aqui é melhor",
        "Testei B2W e aqui, prefiro este site",
        "Produto único, não encontrei em outros lugares",
        "Concorrente X é mais caro, aqui compensa"
    ]

    # Known competitors for pattern matching
    competitors = [
        'Amazon', 'Mercado Livre', 'Americanas', 'Magazine Luiza',
        'Submarino', 'Casas Bahia', 'Shopee', 'B2W'
    ]

    # YOUR CODE HERE:
    # 1. Define patterns for competitive mentions
    # 2. Extract competitor references and context
    # 3. Classify comparison types (price, service, quality)
    # 4. Analyze competitive advantages mentioned
    # 5. Generate competitive intelligence report
    # 6. Create strategic positioning insights
    
    pass

# Test your function
# result_4_2 = exercise_4_2_competitive_intelligence()
# print(result_4_2)

### Exercise 4.3: Market Trend Detection from Text Patterns

**Business Context:** Identify emerging trends and changing preferences

**Task:** Detect market trends through pattern analysis

**Instructions:**
1. Create review data spanning different time periods
2. Define patterns to identify:
   - Technology adoption mentions
   - Sustainability concerns
   - Health and wellness trends
   - Convenience preferences
   - Quality expectations
3. Track pattern frequency over time
4. Identify emerging vs declining trends
5. Generate trend analysis report

**Use:** temporal analysis with regex pattern tracking
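
**Example sketch:** A standard-library illustration of counting trend mentions per year, using two of the `trend_patterns` entries from this exercise's code cell. Wrapping each pattern in `\b(?:...)\b` and compiling with `re.IGNORECASE` are choices of this sketch, and `yearly_trend_counts` is a hypothetical helper name. The same idea carries over to a pandas `groupby`.

```python
import re
from collections import Counter

trend_patterns = {
    'sustainability': r'(sustentável|reciclável|eco-friendly|carbon neutral|organic)',
    'technology': r'(Alexa|Google Home|app|IA|AI|smart|digital)',
}

# Word boundaries stop short keywords such as "app" or "AI" matching inside longer words.
compiled = {
    name: re.compile(rf'\b(?:{pattern})\b', re.IGNORECASE)
    for name, pattern in trend_patterns.items()
}

def yearly_trend_counts(rows):
    """Count, per (year, trend), how many reviews mention the trend at least once."""
    counts = Counter()
    for review_date, text in rows:
        if not isinstance(text, str):
            continue
        year = int(review_date[:4])      # dates are ISO strings: 'YYYY-MM-DD'
        for name, pattern in compiled.items():
            if pattern.search(text):
                counts[(year, name)] += 1
    return counts

rows = [
    ('2022-01-15', "Produto sustentável, embalagem reciclável"),
    ('2022-06-20', "Funciona com Alexa, muito conveniente"),
    ('2023-07-18', "Carbon neutral shipping, empresa consciente"),
    ('2024-06-30', "IA-powered features, very innovative"),
]
print(yearly_trend_counts(rows))
```

Comparing the counts for consecutive years gives a first rough signal of growing versus declining trends. With only a dozen sample reviews the numbers are too small to support a forecast.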

In [None]:
def exercise_4_3_trend_analysis():
    """
    Exercise 4.3: Market Trend Detection from Text Patterns
    """
    
    # Sample trend-related reviews with timestamps
    trend_data = {
        'review_date': [
            '2022-01-15', '2022-06-20', '2022-12-10', '2023-03-05',
            '2023-07-18', '2023-11-22', '2024-02-14', '2024-06-30',
            '2024-09-15', '2023-01-10', '2023-05-25', '2024-01-08'
        ],
        'review_text': [
            "Produto sustentável, embalagem reciclável",  # Sustainability
            "Funciona com Alexa, muito conveniente",  # Tech integration
            "Ingredientes naturais, sem químicos",  # Health/wellness
            "App funciona perfeitamente, muito prático",  # Digital convenience
            "Carbon neutral shipping, empresa consciente",  # Sustainability
            "Compatible with Google Home, smart features",  # Tech integration
            "Organic materials, eco-friendly production",  # Sustainability + Health
            "IA-powered features, very innovative",  # AI trend
            "Minimalist design, very aesthetic",  # Design trend
            "Traditional quality, like old times",  # Nostalgia trend
            "Fast delivery, instant gratification",  # Speed/convenience
            "Wellness-focused, improves mental health"  # Wellness trend
        ]
    }

    # Trend categories and keywords
    trend_patterns = {
        'sustainability': r'(sustentável|reciclável|eco-friendly|carbon neutral|organic)',
        'technology': r'(Alexa|Google Home|app|IA|AI|smart|digital)',
        'health_wellness': r'(natural|organic|wellness|mental health|químicos)',
        'convenience': r'(convenient|prático|fast|instant|fácil)',
        'aesthetics': r'(minimalist|aesthetic|design|beautiful|elegante)'
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame with temporal review data
    # 2. Apply trend pattern matching
    # 3. Track pattern frequency over time
    # 4. Identify growing vs declining trends
    # 5. Calculate trend momentum
    # 6. Generate trend forecast insights
    
    pass

# Test your function
# result_4_3 = exercise_4_3_trend_analysis()
# print(result_4_3)

---
# BONUS CHALLENGES
Advanced regex applications for business solutions
---

### Bonus 1: Multi-language Pattern Detection

**Challenge:** Detect patterns across Portuguese, English, and Spanish text in international customer reviews

In [None]:
def bonus_1_multilingual_analysis():
    """
    Bonus 1: Multi-language Pattern Detection
    """
    
    multilingual_reviews = [
        "Produto excelente, muito bom!",  # Portuguese
        "Excellent product, highly recommended!",  # English
        "Producto excelente, muy recomendado!",  # Spanish
        "Great quality, ótima qualidade, calidad excelente"  # Mixed
    ]

    # YOUR CODE HERE:
    # Create language detection and cross-language sentiment analysis
    
    pass

# Test your function
# result_bonus_1 = bonus_1_multilingual_analysis()
# print(result_bonus_1)

### Bonus 2: Advanced Text Mining Pipeline

**Challenge:** Create a comprehensive text analysis pipeline using regex for automated business intelligence extraction

In [None]:
def bonus_2_advanced_text_mining():
    """
    Bonus 2: Advanced Text Mining Pipeline
    """
    
    # Sample complex business text
    business_text = """
    Q3 2024 Results: Revenue increased 25% to R$ 2.5M.
    Customer satisfaction improved from 4.2 to 4.7 stars.
    New product line launched in São Paulo and Rio markets.
    Competitor analysis shows 15% price advantage.
    Mobile app downloads increased 300% since launch.
    """

    # YOUR CODE HERE:
    # Create comprehensive extraction pipeline for business metrics
    
    pass

# Test your function
# result_bonus_2 = bonus_2_advanced_text_mining()
# print(result_bonus_2)

---
# SUBMISSION CHECKLIST
---

## ADVANCED CRITERIA FOR REGEX EXERCISES:
✓ Proper regex pattern syntax and efficiency
✓ Appropriate use of capture groups and flags
✓ Business logic correctly implemented with patterns
✓ Edge cases handled (None, empty strings, special chars)
✓ Performance considerations for large datasets
✓ Results provide actionable business insights

## GRADING RUBRIC:
- Technical correctness (35%)
- Regex pattern proficiency (25%)
- Business problem solving (25%)
- Code optimization and best practices (15%)

## SUBMISSION REQUIREMENTS:
- All functions must execute without errors
- Include sample outputs as comments
- Explain complex regex patterns used
- Document business assumptions made
- Provide performance considerations for production use

## TESTING YOUR REGEX:
- Use online regex testers (regex101.com) to validate patterns
- Test with edge cases and unexpected input
- Verify capture groups extract expected data
- Check performance with large text samples

**Save as:** your_name_python_regex_exercises.ipynb

In [None]:
# Test your functions here
print("Testing Exercise 1.1 - Email Validation...")
# result_1_1 = exercise_1_1_email_validation()
# print(result_1_1)

print("\nTesting Exercise 1.2 - Phone Number Extraction...")
# result_1_2 = exercise_1_2_phone_number_extraction()
# print(result_1_2)

# Add more tests as needed