# AI/ML Data Structure Applications
## 🔴 Advanced Level - WGU MSSWEAIE Focus

**Goal**: Apply data structures in AI/ML contexts for software engineering

**Time**: ~50 minutes

**Competencies**: Data handling, model management, pipeline design

---

## Scenario 1: ML Model Registry

**Context**: Manage multiple machine learning models in production

**Tasks**:
1. Create a model registry with metadata
2. Track model performance metrics
3. Find best performing models by category
4. Identify models needing retraining
5. Manage model versions and deployments
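
Task 4 depends on each model's `last_trained` date, which the starter cell below does not yet use. The following is a minimal sketch of one way to flag stale models. The reference date `AS_OF` and the threshold `MAX_AGE_DAYS` are assumptions chosen for illustration and are not part of the exercise.

```python
from datetime import date

# Trimmed copy of the registry entries used in this scenario
models = [
    {"name": "fraud_detector_v1", "last_trained": "2024-01-15"},
    {"name": "recommendation_engine_v2", "last_trained": "2024-02-10"},
    {"name": "price_predictor_v1", "last_trained": "2023-12-20"},
]

# Assumptions for illustration only: a fixed reference date and age threshold
AS_OF = date(2024, 3, 1)
MAX_AGE_DAYS = 30


def days_since_training(model, as_of=AS_OF):
    """Return the number of days between last_trained and the reference date."""
    trained = date.fromisoformat(model["last_trained"])
    return (as_of - trained).days


# Names of models whose last training run is older than the threshold
stale_models = [m["name"] for m in models if days_since_training(m) > MAX_AGE_DAYS]
print(stale_models)
```

A fixed `AS_OF` keeps the result reproducible; in a live registry, `date.today()` would be the more natural choice.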

In [None]:
# Scenario 1: ML Model Registry

# Model metadata
models_data = [
    {
        "name": "fraud_detector_v1",
        "type": "classification",
        "accuracy": 0.94,
        "precision": 0.89,
        "recall": 0.91,
        "last_trained": "2024-01-15",
        "features": ["transaction_amount", "merchant_type", "time_of_day", "user_history"],
        "status": "production"
    },
    {
        "name": "recommendation_engine_v2",
        "type": "recommendation",
        "accuracy": 0.87,
        "precision": 0.82,
        "recall": 0.85,
        "last_trained": "2024-02-10",
        "features": ["user_preferences", "item_features", "interaction_history"],
        "status": "production"
    },
    {
        "name": "price_predictor_v1",
        "type": "regression",
        "accuracy": 0.76,
        "precision": 0.74,
        "recall": 0.78,
        "last_trained": "2023-12-20",
        "features": ["market_conditions", "historical_prices", "volume"],
        "status": "deprecated"
    }
]

# TODO: Create model registry dictionary (name -> metadata)
model_registry = {}  # TODO: Build a dict mapping each model's name to its metadata

# TODO: Find best model by accuracy
best_model = None  # TODO: Use max() with key parameter on accuracy

# TODO: Group models by type
models_by_type = {}

# TODO: Find all unique features across models
all_features = set()  # TODO: Collect all features from all models into a set

# TODO: Find models needing attention (accuracy < 0.85 or deprecated)
models_need_attention = None  # TODO: Filter models with low accuracy or deprecated status

print(f"Best model: {best_model}")
print(f"Models by type: {list(models_by_type.keys())}")
print(f"Total unique features: {len(all_features)}")
print(f"Models needing attention: {models_need_attention}")
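
One possible way to complete the TODOs above is sketched here. It reuses a trimmed copy of `models_data` (only the fields the tasks need) and stores model names rather than full metadata in the grouped results, which is a design choice and not a requirement of the exercise.

```python
from collections import defaultdict

# Trimmed copy of models_data: only the fields used by the tasks
models_data = [
    {"name": "fraud_detector_v1", "type": "classification", "accuracy": 0.94,
     "features": ["transaction_amount", "merchant_type", "time_of_day", "user_history"],
     "status": "production"},
    {"name": "recommendation_engine_v2", "type": "recommendation", "accuracy": 0.87,
     "features": ["user_preferences", "item_features", "interaction_history"],
     "status": "production"},
    {"name": "price_predictor_v1", "type": "regression", "accuracy": 0.76,
     "features": ["market_conditions", "historical_prices", "volume"],
     "status": "deprecated"},
]

# Dictionary keyed by model name for O(1) lookup
model_registry = {m["name"]: m for m in models_data}

# max() with a key function selects the entry with the highest accuracy
best_model = max(models_data, key=lambda m: m["accuracy"])["name"]

# defaultdict(list) avoids checking whether a type key already exists
models_by_type = defaultdict(list)
for m in models_data:
    models_by_type[m["type"]].append(m["name"])

# A set comprehension flattens the feature lists and removes duplicates
all_features = {feature for m in models_data for feature in m["features"]}

# Low accuracy or deprecated status marks a model as needing attention
models_need_attention = [
    m["name"] for m in models_data
    if m["accuracy"] < 0.85 or m["status"] == "deprecated"
]

print(f"Best model: {best_model}")
print(f"Models by type: {list(models_by_type.keys())}")
print(f"Total unique features: {len(all_features)}")
print(f"Models needing attention: {models_need_attention}")
```

The dictionary gives direct lookup by name, the set deduplicates features, and the list preserves the original order of models that need attention.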

## Scenario 2: Data Pipeline Management

**Context**: Manage data preprocessing pipelines for ML training

**Tasks**:
1. Track data quality metrics
2. Manage feature engineering steps
3. Monitor pipeline health
4. Handle data validation errors
5. Optimize pipeline performance

In [None]:
# Scenario 2: Data Pipeline Management

# Pipeline execution logs
pipeline_logs = [
    {"pipeline": "customer_data_prep", "step": "data_ingestion", "status": "success", "records": 10000, "errors": 0},
    {"pipeline": "customer_data_prep", "step": "data_cleaning", "status": "success", "records": 9987, "errors": 13},
    {"pipeline": "customer_data_prep", "step": "feature_engineering", "status": "success", "records": 9987, "errors": 0},
    {"pipeline": "fraud_detection_prep", "step": "data_ingestion", "status": "failed", "records": 0, "errors": 1},
    {"pipeline": "fraud_detection_prep", "step": "data_cleaning", "status": "skipped", "records": 0, "errors": 0},
    {"pipeline": "product_analytics", "step": "data_ingestion", "status": "success", "records": 5000, "errors": 0},
    {"pipeline": "product_analytics", "step": "data_cleaning", "status": "warning", "records": 4950, "errors": 50}
]

# TODO: Group logs by pipeline
logs_by_pipeline = {}

# TODO: Calculate success rate by pipeline (e.g., fraction of steps with status "success")
pipeline_health = {}

# TODO: Find pipelines with errors
error_pipelines = None  # TODO: Filter pipelines with errors > 0

# TODO: Calculate total data processed
total_records = None  # TODO: Sum all records from logs
total_errors = None  # TODO: Sum all errors from logs

# TODO: Find unique pipeline steps
all_steps = None  # TODO: Collect all unique step names

print(f"Pipeline health: {pipeline_health}")
print(f"Pipelines with errors: {error_pipelines}")
print(f"Total records processed: {total_records}")
print(f"Total errors: {total_errors}")
print(f"Pipeline steps: {all_steps}")
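
A possible completion of the TODOs above is sketched below. It assumes that "success rate" means the fraction of a pipeline's logged steps whose status is exactly `"success"`, so `"warning"`, `"failed"`, and `"skipped"` all count against it. The exercise does not define the metric, so treat that as one reasonable interpretation.

```python
from collections import defaultdict

pipeline_logs = [
    {"pipeline": "customer_data_prep", "step": "data_ingestion", "status": "success", "records": 10000, "errors": 0},
    {"pipeline": "customer_data_prep", "step": "data_cleaning", "status": "success", "records": 9987, "errors": 13},
    {"pipeline": "customer_data_prep", "step": "feature_engineering", "status": "success", "records": 9987, "errors": 0},
    {"pipeline": "fraud_detection_prep", "step": "data_ingestion", "status": "failed", "records": 0, "errors": 1},
    {"pipeline": "fraud_detection_prep", "step": "data_cleaning", "status": "skipped", "records": 0, "errors": 0},
    {"pipeline": "product_analytics", "step": "data_ingestion", "status": "success", "records": 5000, "errors": 0},
    {"pipeline": "product_analytics", "step": "data_cleaning", "status": "warning", "records": 4950, "errors": 50},
]

# Group log entries by pipeline name
logs_by_pipeline = defaultdict(list)
for entry in pipeline_logs:
    logs_by_pipeline[entry["pipeline"]].append(entry)

# Assumed metric: share of steps with status == "success"
pipeline_health = {
    name: sum(e["status"] == "success" for e in entries) / len(entries)
    for name, entries in logs_by_pipeline.items()
}

# A set removes duplicate pipeline names; sorted() gives stable output
error_pipelines = sorted({e["pipeline"] for e in pipeline_logs if e["errors"] > 0})

# Totals are summed per step, so a record passing through several steps
# is counted once for each step
total_records = sum(e["records"] for e in pipeline_logs)
total_errors = sum(e["errors"] for e in pipeline_logs)

all_steps = sorted({e["step"] for e in pipeline_logs})

print(f"Pipeline health: {pipeline_health}")
print(f"Pipelines with errors: {error_pipelines}")
print(f"Total records processed: {total_records}")
print(f"Total errors: {total_errors}")
print(f"Pipeline steps: {all_steps}")
```

Note that `customer_data_prep` scores a full success rate yet still appears in `error_pipelines`, because a step can succeed while reporting row-level errors.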