# Week 6 Python Exercises: String Methods & Text Processing
## Wednesday Python Class - String Methods
### Business Context: Product Categorization & Data Cleaning

---

## BUSINESS SCENARIO:
You work as a Data Analyst for "NaijaMart", an emerging Nigerian e-commerce platform inspired by successful models like Olist in Brazil. Your task is to clean and standardize product data, customer information, and business segments to improve data quality for analytics and reporting.

## DATASETS USED:
- product_categories.csv (Portuguese/English product categories)
- customer_reviews.csv (Customer review data)
- business_deals.csv (Marketing and business segment data)
- customer_data.csv (Customer location information)

## LEARNING OBJECTIVES:
✓ Master Python string methods (strip(), lower(), upper(), title())
✓ Use string methods for pattern detection (startswith(), endswith(), find())
✓ Apply string manipulation for data standardization
✓ Handle missing values and empty strings in text data
✓ Combine strings for data formatting

---

## Setup Instructions

1. Import necessary libraries (pandas, numpy)
2. Load the provided CSV datasets
3. Complete each exercise following the instructions
4. Test your solutions with the provided sample data

In [None]:
# Required Imports
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Sample data loading (replace with actual file paths)
# product_categories = pd.read_csv('datasets/product_categories.csv')
# customer_reviews = pd.read_csv('datasets/customer_reviews.csv')
# business_deals = pd.read_csv('datasets/business_deals.csv')
# customer_data = pd.read_csv('datasets/customer_data.csv')

---
# SECTION 1: BASIC STRING METHODS
Practice fundamental string manipulation methods
---

### Exercise 1.1: City Name Standardization

**Business Context:** City names often appear with inconsistent capitalization and stray whitespace in customer databases (the sample data below uses Brazilian cities)

**Task:** Standardize customer city names to proper case (title case)

**Instructions:**
1. Create a sample DataFrame with Brazilian city names in various cases
2. Apply different string methods to demonstrate:
   - Original city name
   - UPPERCASE version
   - lowercase version
   - Title Case (First Letter Capitalized)
   - Length of city name
3. Handle missing values appropriately
4. Return a DataFrame with all transformations

**Expected output columns:** ['original_city', 'uppercase', 'lowercase', 'title_case', 'length']
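
As a warm-up, here is a minimal sketch of the case methods on plain Python strings, using a made-up city value rather than the exercise data. The helper name `describe_city` is hypothetical, and treating whitespace-only values as missing is an assumption. On a pandas column the same methods are available through the `.str` accessor.

```python
def describe_city(city):
    """Return case variants and length for one city value (hypothetical helper)."""
    # Assumption: None and whitespace-only strings both count as missing
    if city is None or not city.strip():
        return {'original_city': city, 'uppercase': None, 'lowercase': None,
                'title_case': None, 'length': 0}
    cleaned = city.strip()
    return {
        'original_city': city,
        'uppercase': cleaned.upper(),
        'lowercase': cleaned.lower(),
        'title_case': cleaned.title(),
        'length': len(cleaned),
    }

row = describe_city('  porto ALEGRE ')
print(row['title_case'], row['length'])
```

Note that `title()` capitalizes every word, so 'rio de janeiro' becomes 'Rio De Janeiro'; decide whether that is acceptable for your output.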

In [None]:
def exercise_1_1_city_standardization():
    """
    Exercise 1.1: City Name Standardization
    """
    
    # Sample data for testing
    sample_cities = [
        'são paulo', 'RIO DE JANEIRO', '  salvador  ', 'BRASÍLIA',
        'fortaleza', 'Belo Horizonte', None, '', 'recife'
    ]

    # YOUR CODE HERE:
    # 1. Create DataFrame with city data
    # 2. Apply string transformations
    # 3. Handle missing values
    # 4. Calculate string lengths
    # 5. Return cleaned DataFrame
    
    pass

# Test your function
# result_1_1 = exercise_1_1_city_standardization()
# print(result_1_1)

### Exercise 1.2: Product Category Cleanup

**Business Context:** Clean Portuguese product category names by removing extra spaces

**Task:** Use the strip() method to remove leading and trailing whitespace from category names

**Instructions:**
1. Create sample data with product categories that have leading/trailing spaces
2. Use the strip() method to clean category names
3. Calculate the difference in length before and after cleaning
4. Identify categories where cleaning made a difference
5. Create a summary report

**Return:** DataFrame showing original, cleaned, and length differences
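
The before/after comparison can be sketched on a plain list first. The helper name `strip_report` and the toy values are hypothetical; the idea is just that the length difference tells you whether `strip()` changed anything.

```python
def strip_report(values):
    """Compare each string before and after strip() (hypothetical helper)."""
    report = []
    for original in values:
        cleaned = original.strip()
        report.append({
            'original': original,
            'cleaned': cleaned,
            # Number of whitespace characters removed from both ends
            'chars_removed': len(original) - len(cleaned),
        })
    return report

demo = strip_report(['  toys ', 'books', '   '])
# Keep only the rows where cleaning made a difference
changed = [r for r in demo if r['chars_removed'] > 0]
print(len(changed))
```

A whitespace-only value becomes an empty string after `strip()`, so you may want to flag it separately from ordinary categories.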

In [None]:
def exercise_1_2_product_category_cleanup():
    """
    Exercise 1.2: Product Category Cleanup
    """
    
    # Sample product categories with various spacing issues
    sample_categories = [
        '  beleza_saude  ', 'informatica_acessorios', '  moveis_decoracao',
        'esporte_lazer  ', '   utilidades_domesticas   ', 'telefonia',
        '', '  ', 'perfumaria'
    ]

    # YOUR CODE HERE:
    # 1. Create DataFrame with category data
    # 2. Apply strip() method
    # 3. Calculate length differences
    # 4. Filter categories where cleaning helped
    # 5. Return analysis results
    
    pass

# Test your function
# result_1_2 = exercise_1_2_product_category_cleanup()
# print(result_1_2)

### Exercise 1.3: Business Segment Analysis

**Business Context:** Analyze business segment naming patterns

**Task:** Summarize business segment names: unique count, uppercase form, average name length, and grouping by first character

**Instructions:**
1. Create sample business segment data
2. Count unique segments
3. Show each segment in uppercase
4. Calculate average length of segment names
5. Group results by first character
6. Create summary statistics

**Return:** Dictionary with analysis results
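
One possible shape for the returned dictionary, sketched on made-up names. The function name `summarize_names`, the dictionary keys, and the choice to average over unique names (rather than all rows) are assumptions, not requirements of the exercise.

```python
from collections import defaultdict

def summarize_names(names):
    """Build a small summary dictionary for a list of names (hypothetical helper)."""
    unique = sorted(set(names))
    groups = defaultdict(list)
    for name in unique:
        # Group the uppercase form under the name's first character
        groups[name[0]].append(name.upper())
    average_length = sum(len(n) for n in unique) / len(unique)
    return {
        'unique_count': len(unique),
        'average_length': average_length,
        'by_first_char': dict(groups),
    }

summary = summarize_names(['toys', 'books', 'tools', 'toys'])
print(summary['by_first_char'])
```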

In [None]:
def exercise_1_3_business_segment_analysis():
    """
    Exercise 1.3: Business Segment Analysis
    """
    
    # Sample business segments
    sample_segments = [
        'pet', 'home_appliances', 'health_beauty', 'household_utilities',
        'construction_tools_house_garden', 'sports_leisure', 'food_drink',
        'pet', 'home_appliances', 'tech_accessories'
    ]

    # YOUR CODE HERE:
    # 1. Create DataFrame or work with list
    # 2. Count unique segments
    # 3. Convert to uppercase
    # 4. Calculate average lengths
    # 5. Group by first character
    # 6. Return comprehensive analysis
    
    pass

# Test your function
# result_1_3 = exercise_1_3_business_segment_analysis()
# print(result_1_3)

---
# SECTION 2: PATTERN MATCHING WITH STRING METHODS
Learn to find patterns using Python string methods
---

### Exercise 2.1: Finding Health & Beauty Products

**Business Context:** Find all product categories related to health and beauty

**Task:** Identify health and beauty product categories

**Instructions:**
1. Create sample data with Portuguese and English category names
2. Find categories containing 'beleza', 'saude', 'health', or 'beauty'
3. Use case-insensitive matching
4. Categorize findings by type (Beauty/Health/Both)
5. Return structured results

**Use string methods:** lower(), find(), in operator
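
A minimal sketch of case-insensitive keyword matching with both the `in` operator and `find()`. The keywords come from the instructions above; the function name `classify` and the test strings are hypothetical.

```python
BEAUTY_WORDS = ('beleza', 'beauty')
HEALTH_WORDS = ('saude', 'health')

def classify(name):
    """Label a category name as 'Beauty', 'Health', 'Both', or None (hypothetical helper)."""
    text = name.lower()  # lowercase once so every comparison is case-insensitive
    has_beauty = any(word in text for word in BEAUTY_WORDS)
    # find() returns -1 when the substring is absent
    has_health = any(text.find(word) != -1 for word in HEALTH_WORDS)
    if has_beauty and has_health:
        return 'Both'
    if has_beauty:
        return 'Beauty'
    if has_health:
        return 'Health'
    return None

print(classify('Health_Beauty'))
```

The `in` operator is usually the more readable choice when you only need a yes/no answer; `find()` is useful when you also need the position.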

In [None]:
def exercise_2_1_health_beauty_products():
    """
    Exercise 2.1: Finding Health & Beauty Products
    """
    
    # Sample category data
    sample_data = {
        'portuguese_name': [
            'beleza_saude', 'informatica_acessorios', 'perfumaria',
            'cuidados_pessoais', 'saude_beleza', 'casa_construcao'
        ],
        'english_name': [
            'health_beauty', 'computers_accessories', 'perfumery',
            'personal_care', 'health_beauty', 'home_construction'
        ]
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from sample data
    # 2. Define health/beauty keywords
    # 3. Search in both languages (case-insensitive)
    # 4. Categorize matches
    # 5. Return filtered and categorized results
    
    pass

# Test your function
# result_2_1 = exercise_2_1_health_beauty_products()
# print(result_2_1)

### Exercise 2.2: Email Domain Analysis (Simulated)

**Business Context:** Analyze customer email patterns from simulated data

**Task:** Analyze email domain patterns

**Instructions:**
1. Create simulated email addresses from customer IDs
2. Extract domains using string methods
3. Find emails ending with specific domains
4. Identify emails with numbers in username
5. Find emails with long usernames (>20 characters)

**Use:** split(), endswith(), isdigit() (applied per character, e.g. inside any()), len()
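
A sketch of the per-email checks on a single made-up address. The helper name `analyze_email`, the `domain_suffix` parameter, and the `example.com` address are hypothetical. Calling `isdigit()` on the whole username would only be true if every character were a digit, so the check is applied character by character.

```python
def analyze_email(email, domain_suffix='.com'):
    """Split one email address and run simple pattern checks (hypothetical helper)."""
    # Split only on the first '@' so the domain part stays intact
    username, domain = email.split('@', 1)
    return {
        'username': username,
        'domain': domain,
        'matches_suffix': email.endswith(domain_suffix),
        'has_digit': any(ch.isdigit() for ch in username),
        'long_username': len(username) > 20,
    }

info = analyze_email('ada_1990@example.com')
print(info['domain'], info['has_digit'])
```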

In [None]:
def exercise_2_2_email_domain_analysis():
    """
    Exercise 2.2: Email Domain Analysis (Simulated)
    """
    
    # Sample customer IDs for email simulation
    customer_ids = [
        'customer123', 'user_abc_def', 'buyer789xyz', 'short',
        'very_long_customer_username_here', 'test456', 'admin',
        'customer_with_numbers_123_abc', 'simple_user'
    ]

    # YOUR CODE HERE:
    # 1. Generate simulated emails with different domains
    # 2. Extract username and domain parts
    # 3. Analyze domain distribution
    # 4. Find patterns in usernames
    # 5. Create comprehensive analysis
    
    pass

# Test your function
# result_2_2 = exercise_2_2_email_domain_analysis()
# print(result_2_2)

### Exercise 2.3: Review Content Filtering

**Business Context:** Find reviews containing specific Portuguese keywords

**Task:** Filter and analyze review content

**Instructions:**
1. Create sample review data
2. Search for keywords: 'recomendo', 'produto', 'entrega'
3. Find reviews with exclamation marks
4. Extract first 50 characters for preview
5. Analyze correlation with review scores

**Use:** lower(), find(), count(), slice notation
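
The three string operations named above can be tried on one made-up message before applying them to a whole column. The helper name `inspect_review` and its parameters are hypothetical; it handles `None` but not pandas `NaN`, which would need a separate `pd.isna()` check.

```python
def inspect_review(message, keyword, preview_len=50):
    """Run keyword, punctuation, and preview checks on one message (hypothetical helper)."""
    text = message or ''  # None becomes an empty string
    return {
        # find() returns -1 when the keyword is absent
        'has_keyword': text.lower().find(keyword.lower()) != -1,
        'exclamations': text.count('!'),
        # Slicing never raises, even if the text is shorter than preview_len
        'preview': text[:preview_len],
    }

r = inspect_review('Great value! Fast shipping!', 'SHIPPING', preview_len=11)
print(r)
```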

In [None]:
def exercise_2_3_review_content_filtering():
    """
    Exercise 2.3: Review Content Filtering
    """
    
    # Sample review data
    sample_reviews = {
        'review_id': range(1, 11),
        'score': [5, 4, 5, 3, 5, 4, 2, 5, 4, 5],
        'title': [
            'Recomendo!', 'Bom produto', None, 'Regular', 'Excelente!',
            'Produto ok', 'Ruim', 'Super recomendo', 'Boa entrega', 'Perfeito!'
        ],
        'message': [
            'Recomendo este produto para todos!',
            'Produto chegou bem, mas entrega demorou.',
            'Normal, nada excepcional.',
            'Produto com defeito, não recomendo.',
            'Excelente produto! Entrega rápida! Recomendo!',
            'Produto ok, entrega no prazo.',
            'Produto ruim, entrega atrasada.',
            'Super recomendo! Melhor produto que já comprei!',
            'Boa entrega, produto conforme descrito.',
            'Produto perfeito! Entrega rápida!'
        ]
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from sample data
    # 2. Search for each keyword category
    # 3. Count exclamation marks
    # 4. Create message previews
    # 5. Analyze patterns by score
    
    pass

# Test your function
# result_2_3 = exercise_2_3_review_content_filtering()
# print(result_2_3)

---
# SECTION 3: STRING SLICING AND EXTRACTION
Extract specific parts of text data
---

### Exercise 3.1: State Code Extraction

**Business Context:** Extract two-letter state codes from customer records and map them to full Brazilian state names

**Task:** Extract and analyze state codes

**Instructions:**
1. Create sample customer data with state codes
2. Extract first 2 characters for state abbreviation
3. Map to full state names
4. Count customers by state
5. Create state distribution analysis

**Use:** slice notation [:2], dictionary mapping
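
A small sketch of slicing plus dictionary lookup. The messy raw values and the helper name `state_code` are made up for illustration; `dict.get()` with a default avoids a `KeyError` for codes missing from the mapping.

```python
from collections import Counter

# Two entries borrowed from the exercise mapping; the raw values below are hypothetical
STATE_NAMES = {'SP': 'São Paulo', 'RJ': 'Rio de Janeiro'}

def state_code(raw):
    """Normalize a raw value and keep its first two characters (hypothetical helper)."""
    return raw.strip().upper()[:2]

raw_values = ['SP', 'rj', ' sp ', 'RJ-01', 'XX']
codes = [state_code(v) for v in raw_values]
names = [STATE_NAMES.get(code, 'Unknown') for code in codes]
counts = Counter(codes)
print(counts.most_common())
```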

In [None]:
def exercise_3_1_state_code_extraction():
    """
    Exercise 3.1: State Code Extraction
    """
    
    # Sample customer data
    sample_data = {
        'customer_id': range(1, 21),
        'customer_state': [
            'SP', 'RJ', 'MG', 'SP', 'RS', 'PR', 'SC', 'BA', 'GO', 'SP',
            'RJ', 'MG', 'RS', 'SP', 'PR', 'RJ', 'BA', 'SP', 'MG', 'GO'
        ],
        'customer_city': [
            'São Paulo', 'Rio de Janeiro', 'Belo Horizonte', 'Campinas',
            'Porto Alegre', 'Curitiba', 'Florianópolis', 'Salvador',
            'Goiânia', 'São Paulo', 'Niterói', 'Uberlândia', 'Caxias do Sul',
            'Ribeirão Preto', 'Londrina', 'Petrópolis', 'Feira de Santana',
            'Santo André', 'Juiz de Fora', 'Aparecida de Goiânia'
        ]
    }

    # State code to full name mapping
    state_mapping = {
        'SP': 'São Paulo', 'RJ': 'Rio de Janeiro', 'MG': 'Minas Gerais',
        'RS': 'Rio Grande do Sul', 'PR': 'Paraná', 'SC': 'Santa Catarina',
        'BA': 'Bahia', 'GO': 'Goiás'
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from sample data
    # 2. Extract state codes
    # 3. Map to full names
    # 4. Count by state
    # 5. Create distribution analysis
    
    pass

# Test your function
# result_3_1 = exercise_3_1_state_code_extraction()
# print(result_3_1)

### Exercise 3.2: Product Category Prefix Analysis

**Business Context:** Analyze product category naming patterns

**Task:** Extract and analyze category prefixes and suffixes

**Instructions:**
1. Create sample category data with underscores
2. Extract first word (before first underscore)
3. Extract last word (after last underscore)
4. Count frequency of prefixes and suffixes
5. Identify most common patterns

**Use:** split(), indexing [0] and [-1]
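
A minimal sketch of prefix and suffix extraction on made-up names (the helper `first_and_last` is hypothetical). A name with no underscore yields a one-element list, so `[0]` and `[-1]` return the same word.

```python
from collections import Counter

def first_and_last(name):
    """Return the first and last underscore-separated words (hypothetical helper)."""
    parts = name.split('_')
    return parts[0], parts[-1]

names = ['garden_tools', 'garden_furniture', 'office_furniture', 'toys']
prefixes = Counter(first_and_last(n)[0] for n in names)
suffixes = Counter(first_and_last(n)[1] for n in names)
print(prefixes.most_common(1), suffixes.most_common(1))
```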

In [None]:
def exercise_3_2_category_prefix_analysis():
    """
    Exercise 3.2: Product Category Prefix Analysis
    """
    
    # Sample category names
    categories = {
        'portuguese': [
            'beleza_saude', 'casa_mesa_banho', 'moveis_decoracao',
            'esporte_lazer', 'informatica_acessorios', 'utilidades_domesticas',
            'perfumaria', 'telefonia', 'relogios_presentes',
            'casa_construcao', 'beleza_personal', 'esporte_fitness'
        ],
        'english': [
            'health_beauty', 'bed_bath_table', 'furniture_decor',
            'sports_leisure', 'computers_accessories', 'housewares',
            'perfumery', 'telephony', 'watches_gifts',
            'home_construction', 'beauty_personal', 'sports_fitness'
        ]
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from category data
    # 2. Split categories by underscore
    # 3. Extract first and last words
    # 4. Count frequency patterns
    # 5. Identify common prefixes/suffixes
    
    pass

# Test your function
# result_3_2 = exercise_3_2_category_prefix_analysis()
# print(result_3_2)

### Exercise 3.3: Review Message Length Analysis

**Business Context:** Categorize reviews by message length

**Task:** Analyze review length patterns

**Instructions:**
1. Create sample review data with varying lengths
2. Categorize by character count into: Short (1-50), Medium (51-200), Long (201+), Empty (None or zero-length string)
3. Calculate statistics for each category
4. Show sample messages for each category
5. Analyze correlation with review scores

**Use:** len(), conditional logic, string slicing for samples
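
The length ranges from the instructions can be expressed as a small classifier. The function name `length_category` is hypothetical, and it treats only `None` and zero-length strings as Empty; whether whitespace-only messages should also count as Empty is left to you.

```python
def length_category(message):
    """Map a message to 'Empty', 'Short', 'Medium', or 'Long' by character count."""
    if message is None or len(message) == 0:
        return 'Empty'
    n = len(message)
    if n <= 50:
        return 'Short'
    if n <= 200:
        return 'Medium'
    return 'Long'

for sample in [None, 'ok', 'x' * 120, 'x' * 300]:
    print(length_category(sample))
```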

In [None]:
def exercise_3_3_review_length_analysis():
    """
    Exercise 3.3: Review Message Length Analysis
    """
    
    # Sample review messages of varying lengths
    sample_messages = [
        "Ótimo!",  # Short
        "Produto muito bom, recomendo para todos que estão procurando qualidade.",  # Medium
        "Este produto superou todas as minhas expectativas. A qualidade é excepcional, o atendimento foi perfeito, e a entrega foi muito rápida. Recomendo fortemente para qualquer pessoa que esteja considerando esta compra. Definitivamente voltarei a comprar desta loja.",  # Long
        None,  # Empty
        "",  # Empty
        "Regular",  # Short
        "O produto é bom mas a entrega demorou mais do que o esperado. O atendimento ao cliente foi satisfatório.",  # Medium
        "Produto de excelente qualidade! Superou todas as expectativas. A embalagem veio perfeita, sem nenhum dano. O produto funciona exatamente como descrito na página. O preço é justo pela qualidade oferecida. A entrega foi rápida e eficiente. O atendimento ao cliente foi excepcional, sempre prontos para ajudar. Recomendo fortemente esta compra. Com certeza voltarei a comprar nesta loja. Muito obrigado pela experiência positiva!",  # Long
        "Bom produto",  # Short
        "Produto conforme descrição, entrega no prazo, sem problemas na compra."  # Medium
    ]

    scores = [5, 4, 5, 3, 3, 3, 4, 5, 4, 4]

    # YOUR CODE HERE:
    # 1. Create DataFrame with messages and scores
    # 2. Calculate message lengths
    # 3. Categorize by length ranges
    # 4. Calculate statistics by category
    # 5. Create correlation analysis
    
    pass

# Test your function
# result_3_3 = exercise_3_3_review_length_analysis()
# print(result_3_3)

---
# SECTION 4: STRING CONCATENATION AND FORMATTING
Combine text data for standardized output
---

### Exercise 4.1: Full Address Formatting

**Business Context:** Create standardized address strings

**Task:** Format addresses consistently

**Instructions:**
1. Create sample customer address data
2. Format as: "City, State ZIP"
3. Handle different capitalization cases
4. Ensure state codes are uppercase
5. Pad ZIP codes to 5 digits with leading zeros

**Use:** title(), upper(), str() followed by zfill() (the ZIP codes are stored as integers), f-strings or format()
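
A sketch of the target format for one record, with a made-up ZIP value. `zfill()` is a string method, so an integer ZIP code has to be converted with `str()` first; the function name `format_address` is hypothetical.

```python
def format_address(city, state, zip_code):
    """Build a 'City, ST 00000' string from raw fields (hypothetical helper)."""
    # Convert to text before padding: integers have no zfill() method
    zip_text = str(zip_code).zfill(5)
    return f"{city.strip().title()}, {state.strip().upper()} {zip_text}"

print(format_address('  campinas', 'sp', 1305))
```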

In [None]:
def exercise_4_1_address_formatting():
    """
    Exercise 4.1: Full Address Formatting
    """
    
    # Sample address data
    address_data = {
        'customer_id': range(1, 11),
        'city': [
            'são paulo', 'RIO DE JANEIRO', 'belo horizonte', 'PORTO ALEGRE',
            'curitiba', 'SALVADOR', 'brasília', 'fortaleza', 'RECIFE', 'manaus'
        ],
        'state': ['sp', 'rj', 'mg', 'rs', 'pr', 'ba', 'df', 'ce', 'pe', 'am'],
        'zip_code': [1310, 20040, 30112, 90040, 80060, 40070, 70040, 60160, 50070, 69005]
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from address data
    # 2. Standardize city names (title case)
    # 3. Ensure state codes are uppercase
    # 4. Format ZIP codes with leading zeros
    # 5. Create formatted address strings
    
    pass

# Test your function
# result_4_1 = exercise_4_1_address_formatting()
# print(result_4_1)

### Exercise 4.2: Business Profile Summary

**Business Context:** Create business profile descriptions

**Task:** Generate formatted business summaries

**Instructions:**
1. Create sample business data
2. Format as: "[SEGMENT] business operating as [TYPE] through [LEAD_TYPE] channels"
3. Convert all text to uppercase
4. Replace underscores with spaces
5. Handle missing values appropriately

**Use:** upper(), replace(), format(), conditional logic
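
One way to combine the steps, sketched with made-up inputs. The helper names `clean_field` and `profile` and the placeholder text 'unknown' for missing values are assumptions. This sketch uppercases only the inserted fields, which is one reading of the instructions; uppercasing the whole sentence is equally defensible.

```python
def clean_field(value, default='unknown'):
    """Replace underscores, uppercase, and substitute a default for None."""
    text = default if value is None else value
    return text.replace('_', ' ').upper()

def profile(segment, business_type, lead_type):
    """Fill the summary template with cleaned fields (hypothetical helper)."""
    return '{} business operating as {} through {} channels'.format(
        clean_field(segment), clean_field(business_type), clean_field(lead_type)
    )

print(profile('pet_supplies', 'reseller', None))
```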

In [None]:
def exercise_4_2_business_profile_summary():
    """
    Exercise 4.2: Business Profile Summary
    """
    
    # Sample business data
    business_data = {
        'business_segment': [
            'health_beauty', 'home_appliances', 'sports_leisure',
            'food_drink', 'tech_accessories', None
        ],
        'business_type': [
            'manufacturer', 'reseller', 'reseller',
            'manufacturer', 'reseller', 'distributor'
        ],
        'lead_type': [
            'online_medium', 'online_big', 'offline',
            'online_medium', 'industry', 'online_small'
        ]
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from business data
    # 2. Handle missing values
    # 3. Convert to uppercase
    # 4. Replace underscores with spaces
    # 5. Create formatted descriptions
    
    pass

# Test your function
# result_4_2 = exercise_4_2_business_profile_summary()
# print(result_4_2)

### Exercise 4.3: Review Summary Creation

**Business Context:** Create standardized review summaries

**Task:** Generate formatted review summaries

**Instructions:**
1. Create sample review data
2. Format as: "[SCORE]/5 stars - [TITLE]: [FIRST_50_CHARS]..."
3. Handle missing titles (use "No Title")
4. Handle missing messages (use "No Comment")
5. Truncate long messages with ellipsis

**Use:** f-strings, conditional expressions, string slicing
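
A sketch of the summary format for a single review. The function name `summarize` and the `limit` parameter are hypothetical, and appending the ellipsis only when the message is actually truncated is an interpretation of the instructions, not the only valid one.

```python
def summarize(score, title, message, limit=50):
    """Format one review as '[score]/5 stars - [title]: [text]' (hypothetical helper)."""
    # Conditional expressions cover both None and empty strings
    title_text = title if title else 'No Title'
    body = message if message else 'No Comment'
    if len(body) > limit:
        body = body[:limit] + '...'
    return f"{score}/5 stars - {title_text}: {body}"

print(summarize(4, None, ''))
print(summarize(5, 'Great', 'x' * 60))
```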

In [None]:
def exercise_4_3_review_summary_creation():
    """
    Exercise 4.3: Review Summary Creation
    """
    
    # Sample review data
    review_data = {
        'review_id': range(1, 8),
        'score': [5, 4, 5, 3, 5, 4, 2],
        'title': [
            'Excelente!', 'Bom produto', None, 'Regular',
            'Super recomendo', None, 'Não gostei'
        ],
        'message': [
            'Produto excepcional, recomendo para todos!',
            'Bom produto, entrega rápida.',
            None,
            'Produto ok, nada excepcional mas serve.',
            'Melhor compra que já fiz! Produto incrível com qualidade excepcional e entrega super rápida!',
            'Produto conforme descrito.',
            ''
        ]
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from review data
    # 2. Handle missing titles and messages
    # 3. Truncate long messages appropriately
    # 4. Create formatted summary strings
    # 5. Return DataFrame with summaries
    
    pass

# Test your function
# result_4_3 = exercise_4_3_review_summary_creation()
# print(result_4_3)

---
# SECTION 5: REAL-WORLD BUSINESS SCENARIOS
Apply string processing to solve business problems
---

### Exercise 5.1: Product Category Standardization Project

**Business Context:** Create a mapping table for category standardization

**Task:** Standardize inconsistent category names

**Instructions:**
1. Create sample data with inconsistent category names
2. Standardize names using these rules:
   - All lowercase
   - Replace spaces with underscores
   - Remove special characters
   - Combine similar categories
3. Create frequency analysis
4. Suggest consolidation opportunities

**Use:** lower(), replace()
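
The rules above can be sketched as a single cleaning function. `lower()` and `replace()` cover case and separators; stripping accents so that 'Saúde' and 'saude' match goes beyond those two methods, and this sketch uses the standard-library `unicodedata` module for it as an assumption about how you might approach it. The function name `standardize` and the test strings are made up.

```python
import unicodedata

def standardize(name):
    """Lowercase, strip accents, and normalize separators (hypothetical helper)."""
    text = name.strip().lower()
    # Decompose accented letters, then drop the combining marks (e.g. 'ç' -> 'c')
    text = unicodedata.normalize('NFKD', text)
    text = ''.join(ch for ch in text if not unicodedata.combining(ch))
    # Turn common separators into underscores
    for separator in (' & ', '/', ' '):
        text = text.replace(separator, '_')
    # Remove any remaining special characters
    return ''.join(ch for ch in text if ch.isalnum() or ch == '_')

print(standardize('Bebês & Crianças'))
```

Merging categories across languages (for example a Portuguese and an English name for the same thing) still needs an explicit mapping; string cleaning alone will not detect that.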

In [None]:
def exercise_5_1_category_standardization():
    """
    Exercise 5.1: Product Category Standardization Project
    """
    
    # Sample categories with inconsistencies
    sample_categories = [
        'Beleza & Saúde', 'beleza_saude', 'BELEZA_SAUDE', 'Beleza/Saúde',
        'Informática & Acessórios', 'informatica_acessorios', 'Tech Accessories',
        'Casa & Construção', 'casa_construcao', 'Home & Garden',
        'Esporte & Lazer', 'sports_leisure', 'ESPORTE_LAZER',
        'Móveis & Decoração', 'moveis_decoracao', 'Furniture',
        'Telefonia', 'TELEFONIA', 'Telephones'
    ]

    # YOUR CODE HERE:
    # 1. Create standardization function
    # 2. Apply to all categories
    # 3. Count frequencies
    # 4. Identify potential duplicates
    # 5. Create consolidation recommendations
    
    pass

# Test your function
# result_5_1 = exercise_5_1_category_standardization()
# print(result_5_1)

### Exercise 5.2: Customer Communication Personalization

**Business Context:** Generate personalized greeting messages

**Task:** Create personalized customer greetings

**Instructions:**
1. Create sample customer data with cities
2. Generate greetings based on city:
   - São Paulo: "Olá [Name], bem-vindo de São Paulo!"
   - Rio: "Oi [Name], saudações do Rio de Janeiro!"
   - Others: "Olá [Name], obrigado por escolher NaijaMart!"
3. Calculate character counts and distribution
4. Analyze greeting effectiveness (no response data is provided, so use greeting length and greeting-type distribution as a proxy)

**Use:** string formatting, conditional logic, len()
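
A sketch of the conditional greeting logic using the three templates from the instructions. The function name `greeting` and the customer in the example are made up, and comparing on a stripped, lowercased city is an assumption that makes the match tolerant of inconsistent casing.

```python
def greeting(name, city):
    """Pick a greeting template based on the customer's city (hypothetical helper)."""
    key = city.strip().lower()
    if key == 'são paulo':
        return f"Olá {name}, bem-vindo de São Paulo!"
    if key == 'rio de janeiro':
        return f"Oi {name}, saudações do Rio de Janeiro!"
    return f"Olá {name}, obrigado por escolher NaijaMart!"

message = greeting('Ada Obi', 'RIO DE JANEIRO')
print(message, len(message))
```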

In [None]:
def exercise_5_2_customer_personalization():
    """
    Exercise 5.2: Customer Communication Personalization
    """
    
    # Sample customer data
    customer_data = {
        'customer_id': [f'CUST_{i:04d}' for i in range(1, 21)],
        'customer_name': [
            'Ana Silva', 'João Santos', 'Maria Oliveira', 'Pedro Costa',
            'Lucia Ferreira', 'Carlos Souza', 'Fernanda Lima', 'Roberto Alves',
            'Patricia Rocha', 'Miguel Torres', 'Juliana Gomes', 'Rafael Martins',
            'Camila Barbosa', 'Diego Cardoso', 'Amanda Pereira', 'Bruno Dias',
            'Natalia Nascimento', 'Thiago Ribeiro', 'Isabela Campos', 'Leonardo Freitas'
        ],
        'city': [
            'São Paulo', 'Rio de Janeiro', 'Belo Horizonte', 'São Paulo',
            'Porto Alegre', 'São Paulo', 'Rio de Janeiro', 'Salvador',
            'Brasília', 'São Paulo', 'Rio de Janeiro', 'Curitiba',
            'Fortaleza', 'São Paulo', 'Rio de Janeiro', 'Recife',
            'Manaus', 'São Paulo', 'Goiânia', 'Belém'
        ]
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from customer data
    # 2. Generate personalized greetings
    # 3. Calculate greeting statistics
    # 4. Analyze distribution by greeting type
    # 5. Create effectiveness metrics
    
    pass

# Test your function
# result_5_2 = exercise_5_2_customer_personalization()
# print(result_5_2)

### Exercise 5.3: Marketing Segment Analysis

**Business Context:** Analyze business segment patterns for marketing insights

**Task:** Generate marketing intelligence from segment data

**Instructions:**
1. Create sample business segment data
2. Identify segments containing keywords:
   - Home/house related
   - Health/beauty related
   - Technology related
3. Calculate segment statistics
4. Generate marketing insights
5. Create targeting recommendations

**Use:** string methods, conditional logic, aggregations
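
A sketch of keyword-based grouping followed by a simple aggregation, on made-up rows. The technology keywords ('tech', 'computer'), the 'Other' fallback, and the helper names are assumptions; the first matching group wins.

```python
KEYWORD_GROUPS = {
    'Home': ('home', 'house'),
    'Health/Beauty': ('health', 'beauty'),
    'Technology': ('tech', 'computer'),  # assumed keywords
}

def categorize(segment):
    """Return the first keyword group whose keyword appears in the segment name."""
    text = segment.lower()
    for group, words in KEYWORD_GROUPS.items():
        if any(word in text for word in words):
            return group
    return 'Other'

def average_by_group(rows):
    """Average revenue per keyword group from (segment, revenue) pairs."""
    totals = {}
    for segment, revenue in rows:
        group = categorize(segment)
        total, count = totals.get(group, (0, 0))
        totals[group] = (total + revenue, count + 1)
    return {group: total / count for group, (total, count) in totals.items()}

rows = [('home_office', 100), ('tech_toys', 300), ('house_paint', 200), ('books', 50)]
averages = average_by_group(rows)
print(averages)
```

Substring matching is broad: 'house' also matches 'household', so check whether such hits belong in the group you intend.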

In [None]:
def exercise_5_3_marketing_intelligence():
    """
    Exercise 5.3: Marketing Segment Analysis
    """
    
    # Sample business data with segments
    business_data = {
        'segment': [
            'home_appliances', 'health_beauty', 'computer_accessories',
            'house_garden', 'beauty_personal', 'tech_gadgets',
            'home_furniture', 'health_supplements', 'mobile_phones',
            'household_utilities', 'beauty_cosmetics', 'computer_hardware',
            'home_decor', 'health_fitness', 'tech_accessories',
            'house_construction', 'beauty_tools', 'electronics'
        ],
        'business_type': [
            'reseller', 'manufacturer', 'reseller', 'manufacturer',
            'reseller', 'distributor', 'manufacturer', 'reseller',
            'reseller', 'manufacturer', 'reseller', 'distributor',
            'reseller', 'manufacturer', 'reseller', 'manufacturer',
            'reseller', 'distributor'
        ],
        'monthly_revenue': [
            15000, 25000, 8000, 35000, 12000, 18000,
            28000, 22000, 16000, 20000, 14000, 30000,
            19000, 26000, 11000, 40000, 13000, 24000
        ]
    }

    # YOUR CODE HERE:
    # 1. Create DataFrame from business data
    # 2. Categorize segments by keywords
    # 3. Calculate statistics by category
    # 4. Analyze revenue patterns
    # 5. Generate marketing insights
    
    pass

# Test your function
# result_5_3 = exercise_5_3_marketing_intelligence()
# print(result_5_3)

---
# BONUS CHALLENGES
Advanced string processing scenarios
---

### Bonus 1: Data Quality Score

**Challenge:** Create a data quality score for text fields based on:
- Completeness (not null/empty)
- Format consistency
- Character validity
- Length appropriateness

Score from 1-10 where 10 is highest quality
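
One possible scoring scheme, sketched under stated assumptions: the penalty weights, the length bounds, and the rule that valid names contain only lowercase letters and underscores are all arbitrary choices for illustration, and the function name `quality_score` is hypothetical.

```python
def quality_score(value, min_len=3, max_len=40):
    """Score a text value from 1 (worst) to 10 (best) using simple penalties."""
    # Completeness: missing or blank values get the lowest score
    if value is None or not value.strip():
        return 1
    score = 10
    if value != value.strip():
        score -= 2  # stray leading or trailing whitespace
    core = value.strip()
    if core != core.lower():
        score -= 2  # inconsistent casing
    if not all(ch.isalpha() or ch == '_' for ch in core):
        score -= 3  # digits or special characters
    if not (min_len <= len(core) <= max_len):
        score -= 2  # implausibly short or long
    return max(score, 1)

for sample in ['garden_tools', '  Garden_Tools ', 'a1', None]:
    print(repr(sample), quality_score(sample))
```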

In [None]:
def bonus_1_data_quality_score():
    """
    Bonus 1: Data Quality Score
    """
    
    # Sample data with quality issues
    sample_data = {
        'category_name': [
            'health_beauty', 'TECH@ACCESSORIES!', '', None,
            'home_garden', 'sports123leisure', 'valid_category',
            '  spaced_category  ', 'special#chars%', 'good_name'
        ]
    }

    # YOUR CODE HERE:
    # Create quality scoring function and apply to data
    
    pass

# Test your function
# result_bonus_1 = bonus_1_data_quality_score()
# print(result_bonus_1)

### Bonus 2: Simple Fuzzy Matching

**Challenge:** Implement basic similarity scoring between strings using:
- Character overlap
- Length similarity
- Common substring detection
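
A rough sketch that averages the three signals listed above with equal weight. The equal weighting, the use of character sets for overlap, and the function names are assumptions; this is not a standard metric such as edit distance.

```python
def longest_common_substring(a, b):
    """Length of the longest run of characters shared by a and b (brute force)."""
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best

def similarity(a, b):
    """Average of character overlap, length ratio, and common-substring ratio."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    set_a, set_b = set(a), set(b)
    overlap = len(set_a & set_b) / len(set_a | set_b)
    length_ratio = min(len(a), len(b)) / max(len(a), len(b))
    substring_ratio = longest_common_substring(a, b) / max(len(a), len(b))
    return (overlap + length_ratio + substring_ratio) / 3

print(similarity('garden', 'gardne'), similarity('garden', 'office'))
```

The brute-force substring search is fine for short category names but grows quickly with string length.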

In [None]:
def bonus_2_fuzzy_matching():
    """
    Bonus 2: Simple Fuzzy Matching
    """
    
    categories = ['health_beauty', 'helth_beauty', 'health_beuty', 'tech_accessories']

    # YOUR CODE HERE:
    # Create similarity function and find matches
    
    pass

# Test your function
# result_bonus_2 = bonus_2_fuzzy_matching()
# print(result_bonus_2)

---
# SUBMISSION CHECKLIST
---

Before submitting, ensure your code:
✓ Handles missing values appropriately (None, '', NaN)
✓ Uses appropriate string methods for each task
✓ Includes comments explaining complex logic
✓ Follows consistent code formatting
✓ Produces realistic results for business scenarios
✓ Includes proper error handling where needed

## GRADING CRITERIA:
- Code correctness and functionality (40%)
- Proper use of string methods (30%)
- Business context understanding (20%)
- Code quality and documentation (10%)

## SUBMISSION FORMAT:
- Save as: your_name_python_string_exercises.ipynb
- Include all function implementations
- Add your name and date at the top of the notebook
- Test all functions with the provided sample data

## TESTING YOUR CODE:
Run each function and verify outputs make sense.
Check that edge cases (None, empty strings) are handled.
Ensure results align with business requirements.

In [None]:
# Test your functions here
print("Testing Exercise 1.1...")
# result_1_1 = exercise_1_1_city_standardization()
# print(result_1_1)

print("\nTesting Exercise 1.2...")
# result_1_2 = exercise_1_2_product_category_cleanup()
# print(result_1_2)

# Add more tests as needed