# AutoGen Agent Deep Dive

This notebook explores the individual agents in the data analysis pipeline, their capabilities, and how they work together.

## Agent Architecture

Each agent inherits from `AutoGenAMPAgent`, which combines:
- **AutoGen ConversableAgent**: Conversational AI capabilities
- **AMP Protocol Integration**: Standardized communication
- **Specialized Capabilities**: Domain-specific analysis functions


In [None]:
# Setup and imports
import sys
import os
import asyncio
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Add project modules to path
notebook_dir = Path.cwd()
project_root = notebook_dir.parent
sys.path.append(str(project_root))
sys.path.append(str(project_root / 'agents'))
sys.path.append(str(project_root / '../shared-lib'))

print(f"Project root: {project_root}")

In [None]:
# Import agent classes
from agents.data_collector import DataCollectorAgent
from agents.data_cleaner import DataCleanerAgent
from agents.statistical_analyst import StatisticalAnalystAgent
from agents.ml_analyst import MLAnalystAgent
from agents.visualization_agent import VisualizationAgent
from agents.quality_assurance import QualityAssuranceAgent

from amp_client import AMPClientConfig
from amp_types import TransportType

print("Agent classes imported successfully")

## Agent Configuration

All agents share one LLM configuration, and a small factory function builds a separate AMP client configuration for each agent:

In [None]:
# Base LLM configuration
llm_config = {
    "config_list": [
        {
            "model": "gpt-4",
            "api_key": os.environ.get("OPENAI_API_KEY", "demo-key"),
            "api_type": "openai"
        }
    ]
}

# Base AMP configuration
def create_amp_config(agent_type: str) -> AMPClientConfig:
    return AMPClientConfig(
        agent_id=f"notebook_{agent_type}",
        # Replace underscores so "data_collector" displays as "Data Collector"
        agent_name=f"Notebook {agent_type.replace('_', ' ').title()}",
        framework="autogen",
        version="1.0.0",
        transport_type=TransportType.HTTP,
        endpoint="http://localhost:8000",
        auto_reconnect=True,
        log_level="INFO"
    )

print("Configuration functions created")

## 1. Data Collector Agent

Responsible for ingesting data from various sources:
- File data (CSV, JSON, Excel, Parquet)
- Database connections (SQLite, PostgreSQL, MySQL)
- API endpoints (REST APIs)
- Data source validation
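
The cell further down prints `data_collector.file_readers`, which suggests a mapping from file type to reader function. A minimal sketch of that dispatch pattern, using only the standard library, might look like the following. The names `FILE_READERS`, `validate_source`, and `load_text` are hypothetical and are not taken from `DataCollectorAgent`; only CSV and JSON are covered here, since Excel and Parquet need third-party readers.

```python
import csv
import io
import json
from pathlib import Path
from typing import Callable, Dict, List


def read_csv_text(text: str) -> List[dict]:
    """Parse CSV text into a list of row dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_text(text: str) -> List[dict]:
    """Parse JSON text; accept a single object or a list of objects."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Hypothetical registry: file extension -> reader function
FILE_READERS: Dict[str, Callable[[str], List[dict]]] = {
    ".csv": read_csv_text,
    ".json": read_json_text,
}


def validate_source(filename: str) -> str:
    """Return the normalized extension, or raise if no reader is registered."""
    ext = Path(filename).suffix.lower()
    if ext not in FILE_READERS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    return ext


def load_text(filename: str, text: str) -> List[dict]:
    """Validate the source, then dispatch to the matching reader."""
    return FILE_READERS[validate_source(filename)](text)


rows = load_text("sales.CSV", "region,units\nnorth,3\nsouth,5\n")
print(rows)
```

Keeping validation separate from reading means an unsupported source fails before any parsing is attempted.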

In [None]:
# Create Data Collector Agent
data_collector = DataCollectorAgent(
    amp_config=create_amp_config("data_collector"),
    llm_config=llm_config
)

print(f"Data Collector Agent: {data_collector.name}")
print(f"Capabilities: {list(data_collector.capabilities.keys())}")
print(f"Description: {data_collector.description}")
print(f"Tags: {data_collector.tags}")

# Show file readers
print(f"\nSupported file types: {list(data_collector.file_readers.keys())}")
print(f"Collection metrics: {data_collector.collection_metrics}")

In [None]:
# Explore Data Collector capabilities in detail
for cap_id, capability in data_collector.capabilities.items():
    print(f"\nCapability: {cap_id}")
    print(f"  Description: {capability.description}")
    print(f"  Category: {capability.category}")
    print(f"  Input Schema: {list(capability.input_schema['properties'].keys())}")
    print(f"  Response Time: {capability.constraints.response_time_ms}ms")

## 2. Data Cleaner Agent

Handles data preprocessing and quality improvement:
- Missing value detection and imputation
- Outlier detection and handling
- Data normalization and scaling
- Duplicate removal
- Data quality assessment
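
To make the first two items concrete, here is a small standard-library sketch of median imputation followed by z-score outlier flagging. It is an illustration of the general techniques, not the logic inside `DataCleanerAgent`; the function names and the threshold value are assumptions.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def zscore_outliers(values: List[float], threshold: float = 3.0) -> List[bool]:
    """Flag values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A constant column has no outliers by this definition
        return [False] * len(values)
    return [abs(v - mean) / stdev > threshold for v in values]


raw = [10.0, None, 11.0, 9.0, 10.5, None, 95.0]
filled = impute_median(raw)
flags = zscore_outliers(filled, threshold=2.0)

print(filled)
print(flags)
```

With only seven points, the largest possible population z-score is about 2.45, so a threshold of 3.0 would never fire on this sample. That is why the example uses 2.0; on small datasets the threshold needs to be chosen with the sample size in mind.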

In [None]:
# Create Data Cleaner Agent
data_cleaner = DataCleanerAgent(
    amp_config=create_amp_config("data_cleaner"),
    llm_config=llm_config
)

print(f"Data Cleaner Agent: {data_cleaner.name}")
print(f"Capabilities: {list(data_cleaner.capabilities.keys())}")
print(f"Description: {data_cleaner.description}")
print(f"\nCleaning configuration: {data_cleaner.cleaning_config}")
print(f"Cleaning metrics: {data_cleaner.cleaning_metrics}")

## 3. Statistical Analyst Agent

Performs statistical analysis and hypothesis testing:
- Descriptive statistics and distributions
- Hypothesis testing (t-tests, ANOVA, chi-square)
- Correlation and association analysis
- Regression analysis with diagnostics
- Distribution testing and normality checks
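
As a reference point for two of the analyses listed above, the sketch below computes a Pearson correlation coefficient and Welch's t statistic by hand. It assumes nothing about how `StatisticalAnalystAgent` implements them, and it stops at the test statistic (no p-value), since that would require a t-distribution from a library such as SciPy.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    # Sample variances use the n - 1 denominator
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
t = welch_t([5.1, 4.9, 5.3, 5.0], [5.8, 6.1, 5.9, 6.2])
print(f"r = {r:.4f}, t = {t:.3f}")
```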

In [None]:
# Create Statistical Analyst Agent
statistical_analyst = StatisticalAnalystAgent(
    amp_config=create_amp_config("statistical_analyst"),
    llm_config=llm_config
)

print(f"Statistical Analyst Agent: {statistical_analyst.name}")
print(f"Capabilities: {list(statistical_analyst.capabilities.keys())}")
print(f"Description: {statistical_analyst.description}")
print(f"\nAnalysis configuration: {statistical_analyst.analysis_config}")
print(f"Analysis metrics: {statistical_analyst.analysis_metrics}")

## 4. ML Analyst Agent

Handles machine learning and predictive modeling:
- Automated feature engineering and selection
- Model training with multiple algorithms
- Model evaluation and comparison
- Predictions with confidence estimates
- Feature importance analysis
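
Model evaluation and comparison usually rest on a hold-out split and k-fold cross-validation. The following is a dependency-free sketch of both index-splitting schemes. The parameter names echo the `default_test_size`, `cv_folds`, and `random_state` keys used in the customization cell later in this notebook, but the functions themselves are hypothetical and not part of `MLAnalystAgent`.

```python
import random
from typing import List, Tuple


def train_test_indices(
    n_rows: int, test_size: float = 0.2, random_state: int = 42
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices reproducibly and split them into train and test."""
    indices = list(range(n_rows))
    random.Random(random_state).shuffle(indices)
    n_test = max(1, round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_rows: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, validation) index lists for k contiguous folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Spread the remainder over the first folds so sizes differ by at most one
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        folds.append((train, validation))
        start += size
    return folds


train_idx, test_idx = train_test_indices(100, test_size=0.3, random_state=123)
folds = k_fold_indices(10, k=3)
print(len(train_idx), len(test_idx))
print([len(validation) for _, validation in folds])
```

Every row appears in exactly one validation fold, so each model is scored once on every observation.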

In [None]:
# Create ML Analyst Agent
ml_analyst = MLAnalystAgent(
    amp_config=create_amp_config("ml_analyst"),
    llm_config=llm_config
)

print(f"ML Analyst Agent: {ml_analyst.name}")
print(f"Capabilities: {list(ml_analyst.capabilities.keys())}")
print(f"Description: {ml_analyst.description}")
print(f"\nML configuration: {ml_analyst.ml_config}")
print(f"\nClassification algorithms: {list(ml_analyst.classification_algorithms.keys())}")
print(f"Regression algorithms: {list(ml_analyst.regression_algorithms.keys())}")
print(f"ML metrics: {ml_analyst.ml_metrics}")

## 5. Visualization Agent

Creates charts, dashboards, and reports:
- Statistical plots (histograms, boxplots, scatter plots)
- Model performance visualizations
- Correlation and relationship plots
- Interactive dashboards
- Comprehensive analysis reports
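
A histogram is equal-width binning followed by drawing. The sketch below shows only the binning step and renders the counts as text bars, so it runs without a plotting backend. `histogram_counts` is a hypothetical helper written for this illustration; the agent itself presumably delegates to a plotting library.

```python
from typing import List, Sequence


def histogram_counts(values: Sequence[float], bins: int) -> List[int]:
    """Count values in equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: put everything in the first bin
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the right edge, so clamp it into the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts = histogram_counts(values, bins=4)
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```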

In [None]:
# Create Visualization Agent
visualization_agent = VisualizationAgent(
    amp_config=create_amp_config("visualization"),
    llm_config=llm_config
)

print(f"Visualization Agent: {visualization_agent.name}")
print(f"Capabilities: {list(visualization_agent.capabilities.keys())}")
print(f"Description: {visualization_agent.description}")
print(f"\nVisualization configuration: {visualization_agent.viz_config}")
print(f"Visualization metrics: {visualization_agent.viz_metrics}")

## 6. Quality Assurance Agent

Validates results and ensures accuracy:
- Data quality validation and integrity checks
- Model performance validation and benchmarking
- Statistical test validation and assumption checking
- Pipeline audit and compliance checking
- Result consistency verification
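
A data quality check of the first kind can be as simple as comparing a few ratios against thresholds. In this sketch the `STANDARDS` dictionary and its keys are hypothetical; the real `qa_standards` attribute printed in the next cell may be organized quite differently.

```python
from typing import Dict, List

# Hypothetical thresholds, chosen only for illustration
STANDARDS = {"min_completeness": 0.95, "max_duplicate_ratio": 0.01}


def assess_quality(rows: List[dict], standards: Dict[str, float] = STANDARDS) -> dict:
    """Compute completeness and duplicate ratio, then compare to thresholds."""
    total_cells = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row.values() if value is None)
    completeness = 1 - missing / total_cells if total_cells else 0.0

    # Rows are duplicates when all of their key/value pairs match
    unique_rows = {tuple(sorted(row.items(), key=lambda kv: kv[0])) for row in rows}
    duplicate_ratio = 1 - len(unique_rows) / len(rows) if rows else 0.0

    return {
        "completeness": completeness,
        "duplicate_ratio": duplicate_ratio,
        "passed": (
            completeness >= standards["min_completeness"]
            and duplicate_ratio <= standards["max_duplicate_ratio"]
        ),
    }


rows = [
    {"id": 1, "x": 2.0},
    {"id": 2, "x": None},
    {"id": 1, "x": 2.0},  # exact duplicate of the first row
    {"id": 3, "x": 4.0},
]
report = assess_quality(rows)
print(report)
```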

In [None]:
# Create Quality Assurance Agent
qa_agent = QualityAssuranceAgent(
    amp_config=create_amp_config("qa"),
    llm_config=llm_config
)

print(f"QA Agent: {qa_agent.name}")
print(f"Capabilities: {list(qa_agent.capabilities.keys())}")
print(f"Description: {qa_agent.description}")
print(f"\nQA configuration: {qa_agent.qa_config}")
print(f"QA standards: {qa_agent.qa_standards}")
print(f"QA metrics: {qa_agent.qa_metrics}")

## Agent Communication Patterns

The agents communicate through:
1. **AMP Protocol Messages**: Structured capability invocations
2. **AutoGen Conversations**: Natural language discussions
3. **Shared Artifacts**: Data and results passed between agents
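
To give a feel for what a structured capability invocation could look like, here is a purely illustrative message envelope. The `CapabilityRequest` class, its field names, and the `clean-data` capability ID are all assumptions made for this sketch; they are not taken from the AMP protocol or from `amp_types`.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class CapabilityRequest:
    """Hypothetical envelope for asking another agent to run a capability."""
    sender: str
    recipient: str
    capability: str
    parameters: dict
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        """Serialize with sorted keys so the wire form is stable."""
        return json.dumps(asdict(self), sort_keys=True)


request = CapabilityRequest(
    sender="notebook_data_collector",
    recipient="notebook_data_cleaner",
    capability="clean-data",  # hypothetical capability ID
    parameters={"artifact": "sample_dataset"},
)
decoded = json.loads(request.to_json())
print(decoded["capability"], decoded["parameters"])
```

Passing an artifact name rather than the data itself keeps the message small, which is one plausible reason to combine structured messages with shared artifacts.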

In [None]:
# Example conversation flow
agents = {
    "Data Collector": data_collector,
    "Data Cleaner": data_cleaner,
    "Statistical Analyst": statistical_analyst,
    "ML Analyst": ml_analyst,
    "Visualization": visualization_agent,
    "Quality Assurance": qa_agent
}

print("Agent Communication Flow:")
print("1. Data Collector → Ingests data → Provides dataset")
print("2. Data Cleaner → Receives dataset → Cleans and validates → Provides clean dataset")
print("3. Statistical Analyst → Receives clean dataset → Performs analysis → Provides statistical insights")
print("4. ML Analyst → Receives clean dataset → Builds models → Provides ML results")
print("5. Visualization → Receives all results → Creates visualizations → Provides charts and reports")
print("6. Quality Assurance → Validates all outputs → Provides quality assessment")

print("\nAMP capabilities by agent:")
for agent_name, agent in agents.items():
    print(f"\n{agent_name}:")
    for cap_id in agent.capabilities.keys():
        print(f"  - {cap_id}")

## Conversation Examples

Each example below pairs a natural-language query with the shorter keyword message that is actually passed to the agent's `_process_conversation_message` handler:

In [None]:
# Simulate conversation responses
conversation_examples = {
    "Data Collector": {
        "query": "Can you collect data from a CSV file?",
        "response": data_collector._process_conversation_message(
            "collect data from file.csv", None, []
        )
    },
    "Data Cleaner": {
        "query": "How can you clean missing values?",
        "response": data_cleaner._process_conversation_message(
            "handle missing values", None, []
        )
    },
    "Statistical Analyst": {
        "query": "Can you perform correlation analysis?",
        "response": statistical_analyst._process_conversation_message(
            "correlation analysis", None, []
        )
    },
    "ML Analyst": {
        "query": "What machine learning models can you build?",
        "response": ml_analyst._process_conversation_message(
            "train model", None, []
        )
    },
    "Visualization": {
        "query": "Can you create a dashboard?",
        "response": visualization_agent._process_conversation_message(
            "dashboard", None, []
        )
    },
    "Quality Assurance": {
        "query": "How do you validate data quality?",
        "response": qa_agent._process_conversation_message(
            "validate data", None, []
        )
    }
}

for agent_name, example in conversation_examples.items():
    print(f"\n{agent_name.upper()}:")
    print(f"Query: {example['query']}")
    print(f"Response: {example['response'][:200]}...")
    print("-" * 50)

## Artifact Management

Agents store and share data artifacts throughout the pipeline:

In [None]:
# Demonstrate artifact storage (uses pd and np from the setup cell)

# Create sample data
sample_data = pd.DataFrame({
    'feature1': np.random.normal(0, 1, 100),
    'feature2': np.random.normal(2, 1.5, 100),
    'target': np.random.choice([0, 1], 100)
})

# Store artifact in data collector
data_collector.store_artifact(
    "sample_dataset", 
    sample_data, 
    {"source": "synthetic", "rows": 100, "columns": 3}
)

print(f"Artifacts stored in Data Collector: {data_collector.list_artifacts()}")

# Retrieve artifact
retrieved_data = data_collector.get_artifact("sample_dataset")
print(f"Retrieved data shape: {retrieved_data.shape}")
print(f"Retrieved data preview:\n{retrieved_data.head()}")
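
For readers curious about what sits behind `store_artifact`, `get_artifact`, and `list_artifacts`, a minimal in-memory version could look like the sketch below. The three method names come from the cell above; the `ArtifactStore` class, its internal layout, and the timestamp field are assumptions and may not match the real base class.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


class ArtifactStore:
    """Hypothetical in-memory store keyed by artifact ID."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Dict[str, Any]] = {}

    def store_artifact(
        self, artifact_id: str, data: Any, metadata: Optional[dict] = None
    ) -> None:
        """Save the data with a copy of its metadata and a UTC timestamp."""
        self._artifacts[artifact_id] = {
            "data": data,
            "metadata": dict(metadata or {}),
            "stored_at": datetime.now(timezone.utc),
        }

    def get_artifact(self, artifact_id: str) -> Any:
        """Return the stored data, or None if the ID is unknown."""
        entry = self._artifacts.get(artifact_id)
        return None if entry is None else entry["data"]

    def list_artifacts(self) -> List[str]:
        """Return artifact IDs in sorted order."""
        return sorted(self._artifacts)


store = ArtifactStore()
store.store_artifact("sample_dataset", [[1, 2], [3, 4]], {"source": "synthetic", "rows": 2})
print(store.list_artifacts())
print(store.get_artifact("sample_dataset"))
```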

## Agent Performance Metrics

Each agent tracks its performance and activity:

In [None]:
# Show agent metrics
# Each agent exposes its metrics under a different attribute name
METRIC_ATTRS = (
    'collection_metrics', 'cleaning_metrics', 'analysis_metrics',
    'ml_metrics', 'viz_metrics', 'qa_metrics',
)

agent_metrics = {}

for agent_name, agent in agents.items():
    # Use the first metrics attribute this agent defines
    for attr in METRIC_ATTRS:
        if hasattr(agent, attr):
            agent_metrics[agent_name] = getattr(agent, attr)
            break

# Display metrics
for agent_name, metrics in agent_metrics.items():
    print(f"\n{agent_name} Metrics:")
    for metric, value in metrics.items():
        print(f"  {metric}: {value}")

## Agent Health and Status

Without a live AMP connection, a basic status summary can still be assembled from each agent's local attributes:

In [None]:
# Simulate health check (without actual AMP connection)
for agent_name, agent in agents.items():
    print(f"\n{agent_name} Status:")
    print(f"  Agent ID: {agent.amp_config.agent_id}")
    print(f"  Framework: {agent.amp_config.framework}")
    print(f"  Capabilities: {len(agent.capabilities)}")
    print(f"  Artifacts stored: {len(agent.list_artifacts())}")
    print(f"  Context size: {len(agent._conversation_context)}")
    
    # Show system message preview
    system_msg = agent.system_message[:100] + "..." if len(agent.system_message) > 100 else agent.system_message
    print(f"  System message: {system_msg}")

## Agent Customization

Agents can be customized for specific use cases:

In [None]:
# Example: Custom ML Analyst with custom training parameters
custom_ml_config = {
    "default_test_size": 0.3,
    "cv_folds": 10,
    "random_state": 123,
    "max_features_auto": 20
}

custom_ml_analyst = MLAnalystAgent(
    amp_config=create_amp_config("custom_ml"),
    llm_config=llm_config,
    ml_config=custom_ml_config
)

print(f"Custom ML Analyst configuration: {custom_ml_analyst.ml_config}")

# Example: Custom Data Cleaner with specific strategies
custom_cleaning_config = {
    "default_missing_strategy": "knn",
    "outlier_threshold": 2.5,
    "duplicate_threshold": 0.99
}

custom_data_cleaner = DataCleanerAgent(
    amp_config=create_amp_config("custom_cleaner"),
    llm_config=llm_config,
    cleaning_config=custom_cleaning_config
)

print(f"Custom Data Cleaner configuration: {custom_data_cleaner.cleaning_config}")

## Next Steps

### Extending Agents

You can extend agents by:
1. **Adding new capabilities**: Implement new AMP capabilities
2. **Customizing system messages**: Adjust agent personalities and instructions
3. **Modifying configurations**: Change default parameters and thresholds
4. **Adding domain logic**: Include domain-specific analysis methods

### Creating New Agents

To create a new agent:
1. Inherit from `AutoGenAMPAgent`
2. Define capabilities and their handlers
3. Implement `_process_conversation_message`
4. Add agent-specific configuration and metrics
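
The four steps above can be sketched as follows. Because `AutoGenAMPAgent` is not importable outside this project, the example substitutes a stand-in base class, `AgentBaseStub`, with a hypothetical `register_capability` method; the real base class may register capabilities differently. The three-argument shape of `_process_conversation_message` follows the calls made earlier in this notebook, but the parameter names are assumptions.

```python
from typing import Any, Callable, Dict, List, Optional


class AgentBaseStub:
    """Stand-in for AutoGenAMPAgent so the sketch runs on its own."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.capabilities: Dict[str, Dict[str, Any]] = {}

    def register_capability(
        self, cap_id: str, description: str, handler: Callable[[dict], dict]
    ) -> None:
        self.capabilities[cap_id] = {"description": description, "handler": handler}


class TextSummaryAgent(AgentBaseStub):  # Step 1: inherit from the base class
    def __init__(self, name: str = "text_summary", summary_config: Optional[dict] = None):
        super().__init__(name)
        # Step 4: agent-specific configuration and metrics
        self.summary_config = {"max_words": 5, **(summary_config or {})}
        self.summary_metrics = {"summaries_created": 0}
        # Step 2: define a capability and bind its handler
        self.register_capability(
            "summarize-text", "Truncate text to a word budget", self._handle_summarize
        )

    def _handle_summarize(self, params: dict) -> dict:
        words = params["text"].split()
        self.summary_metrics["summaries_created"] += 1
        return {"summary": " ".join(words[: self.summary_config["max_words"]])}

    # Step 3: map natural-language messages to a reply
    def _process_conversation_message(
        self, message: str, sender: Any, history: List[dict]
    ) -> str:
        if "summar" in message.lower():
            return "I can summarize text via the 'summarize-text' capability."
        return "I only handle summarization requests."


agent = TextSummaryAgent()
result = agent.capabilities["summarize-text"]["handler"](
    {"text": "one two three four five six seven"}
)
reply = agent._process_conversation_message("Please summarize this", None, [])
print(result, reply)
```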

### Integration Patterns

Agents can be integrated in different patterns:
- **Sequential Pipeline**: Linear flow from collection to reporting
- **Parallel Processing**: Multiple agents working simultaneously
- **Iterative Refinement**: Agents providing feedback to improve results
- **Conditional Workflows**: Different paths based on data characteristics
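
The first two patterns combine naturally: collection and cleaning run in sequence, then the two analysis stages run concurrently on the cleaned data. The sketch below shows that shape with `asyncio`. Here `run_stage` is a hypothetical placeholder that only records which stage ran; in a real pipeline it would invoke an agent capability.

```python
import asyncio
from typing import Dict, List


async def run_stage(name: str, payload: List[str]) -> List[str]:
    """Placeholder stage: yield control once, then append the stage name."""
    await asyncio.sleep(0)
    # Return a new list so concurrent stages never share mutable state
    return payload + [name]


async def run_pipeline() -> Dict[str, List[str]]:
    # Sequential: each stage waits for the previous one
    data = await run_stage("collect", [])
    data = await run_stage("clean", data)
    # Parallel: both analyses start from the same cleaned data
    stats, ml = await asyncio.gather(
        run_stage("statistics", data),
        run_stage("ml", data),
    )
    return {"statistics": stats, "ml": ml}


results = asyncio.run(run_pipeline())
print(results)
```

Inside Jupyter an event loop is already running, so `asyncio.run` raises an error there; use `results = await run_pipeline()` in a notebook cell instead.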

Explore the other notebooks for more specific examples and use cases.