# CSRD Platform SDK - Interactive Tutorial

**Welcome to the CSRD Reporting Platform SDK Tutorial!**

This Jupyter notebook provides a hands-on introduction to the CSRD Platform Python SDK.

## What You'll Learn

1. **Quick Start** - Generate your first CSRD report in minutes
2. **Individual Agents** - Run the validation, materiality, and calculation agents separately
3. **Data Visualization** - Analyze ESG metrics with pandas and matplotlib
4. **Advanced Configuration** - Customize the pipeline for your needs
5. **Report Outputs** - Save reports and analyze compliance results
6. **Troubleshooting** - Common issues and solutions

## Prerequisites

- Python 3.11+
- All dependencies installed: `pip install -r requirements.txt`
- OpenAI API key (for materiality assessment): `export OPENAI_API_KEY='your-key'`
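You can check these prerequisites from inside the notebook before running anything. The following is a minimal sketch: `check_environment` is a hypothetical helper, not part of the SDK, and it only checks the Python version and the API key (it does not check installed packages).

```python
import os
import sys

def check_environment(min_version=(3, 11), env=None):
    """Hypothetical helper: return a list of problems (empty if the basics look fine)."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info[:2] < tuple(min_version):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required, found {found}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set; materiality assessment will be skipped")
    return problems

issues_found = check_environment()
for problem in issues_found:
    print("⚠", problem)
if not issues_found:
    print("✓ Environment looks ready")
```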

## About This Platform

The CSRD Platform transforms raw ESG data into submission-ready EU CSRD reports with:

- **6-Agent Pipeline**: Intake → Materiality → Calculate → Aggregate → Report → Audit
- **Zero Hallucination**: Metric calculations are deterministic formulas with no LLM involvement (the LLM is used only for materiality assessment)
- **<30 Minute Processing**: Complete CSRD report for datasets of around 10,000 data points
- **XBRL Digital Tagging**: ESEF-compliant submission packages
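Conceptually, the pipeline passes a shared context from one stage to the next, and some stages (such as materiality) can be skipped. The sketch below illustrates that flow with placeholder stages. `run_pipeline` and `make_stage` are hypothetical names, and this is not how the SDK is implemented internally.

```python
def make_stage(name):
    """Placeholder stage that only marks itself as done; a real agent would transform data."""
    def stage(ctx):
        return {**ctx, f"{name.lower()}_done": True}
    return stage

def run_pipeline(stages, context, skip=()):
    """Run (name, stage) pairs in order, passing a shared context dict between them."""
    context = {**context, "stages_run": []}
    for name, stage in stages:
        if name in skip:
            continue  # e.g. skip "Materiality" when no LLM API key is available
        context = stage(context)
        context["stages_run"].append(name)
    return context

STAGE_NAMES = ["Intake", "Materiality", "Calculate", "Aggregate", "Report", "Audit"]
pipeline = [(name, make_stage(name)) for name in STAGE_NAMES]

result = run_pipeline(pipeline, {"records": []}, skip={"Materiality"})
print(" → ".join(result["stages_run"]))
```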

Let's get started! 🚀

---
## Part 1: Setup and Imports

First, let's import the necessary libraries and configure our environment.

In [None]:
# Standard library imports
import os
import sys
import json
from pathlib import Path
from datetime import datetime

# Add parent directory to path
sys.path.insert(0, str(Path.cwd().parent))

# Data analysis imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# CSRD SDK imports
from sdk.csrd_sdk import (
    csrd_build_report,
    csrd_validate_data,
    csrd_assess_materiality,
    csrd_calculate_metrics,
    CSRDConfig,
    CSRDReport
)

# Configure visualization
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print("✓ All imports successful!")
print(f"Working directory: {Path.cwd()}")

In [None]:
# Set up paths to demo data
BASE_DIR = Path.cwd().parent
DEMO_ESG_DATA = BASE_DIR / "examples" / "demo_esg_data.csv"
DEMO_COMPANY_PROFILE = BASE_DIR / "examples" / "demo_company_profile.json"
OUTPUT_DIR = BASE_DIR / "output" / "notebook_demo"

# Create output directory
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print(f"✓ Demo data: {DEMO_ESG_DATA.name}")
print(f"✓ Company profile: {DEMO_COMPANY_PROFILE.name}")
print(f"✓ Output directory: {OUTPUT_DIR}")

In [None]:
# Check for API key (required for materiality assessment)
api_key = os.getenv("OPENAI_API_KEY")

if api_key:
    print("✓ OpenAI API key found")
    print(f"  Key starts with: {api_key[:8]}...")
else:
    print("⚠ OpenAI API key not found")
    print("  Set via: os.environ['OPENAI_API_KEY'] = 'your-key-here'")
    print("  Materiality assessment will be skipped")
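Notebook outputs are often saved and shared, so you may prefer not to print the start of your key. Here is a small hypothetical helper, `mask_secret`, that reveals only the last few characters:

```python
def mask_secret(secret, visible=4):
    """Hypothetical helper: show only the last `visible` characters of a secret."""
    if not secret:
        return "<not set>"
    if len(secret) <= visible * 2:
        return "*" * len(secret)  # too short to reveal any part safely
    return "*" * 8 + secret[-visible:]

print(mask_secret("sk-abcdefghijkl"))  # → ********ijkl
```

In the cell above, you could use it as `print(f"  Key: {mask_secret(api_key)}")`.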

In [None]:
# Helper function for pretty printing
def print_section(title):
    """Print a section header."""
    print("\n" + "=" * 80)
    print(title)
    print("=" * 80)

def print_dict(d, indent=0):
    """Pretty print a dictionary."""
    for key, value in d.items():
        if isinstance(value, dict):
            print("  " * indent + str(key) + ":")
            print_dict(value, indent + 1)
        else:
            print("  " * indent + f"{key}: {value}")

print("✓ Helper functions loaded")

---
## Part 2: Quick Start Example

Let's generate your first CSRD report using the one-function API!

In [None]:
# Load company profile to extract information
with open(DEMO_COMPANY_PROFILE, 'r') as f:
    company_data = json.load(f)

print_section("Company Profile")
print(f"Legal Name: {company_data.get('legal_name')}")
print(f"Country: {company_data.get('country')}")
print(f"Sector: {company_data.get('sector', {}).get('industry')}")
print(f"Employees: {company_data.get('company_size', {}).get('employee_count'):,}")
print(f"Revenue: €{company_data.get('company_size', {}).get('revenue_eur'):,.0f}")
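Chained `.get(..., {})` calls become hard to read for deeper paths. Also, if a field is missing, formatting `None` with `:,` raises a `TypeError`. A sketch of a hypothetical `get_nested` helper that follows a dotted path and falls back to a default:

```python
def get_nested(data, path, default=None):
    """Follow a dotted path such as 'company_size.employee_count' through nested dicts."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

profile = {"company_size": {"employee_count": 1200, "revenue_eur": None}}
employees = get_nested(profile, "company_size.employee_count", 0)
revenue = get_nested(profile, "company_size.revenue_eur") or 0  # explicit None -> 0
print(f"Employees: {employees:,}")
print(f"Revenue: €{revenue:,.0f}")
```

With the real profile, this becomes `get_nested(company_data, "company_size.employee_count", 0)`.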

In [None]:
# Create CSRD configuration
config = CSRDConfig(
    company_name=company_data.get('legal_name'),
    company_lei=company_data.get('lei_code'),
    reporting_year=company_data.get('reporting_period', {}).get('fiscal_year', 2024),
    sector=company_data.get('sector', {}).get('industry'),
    country=company_data.get('country'),
    employee_count=company_data.get('company_size', {}).get('employee_count'),
    revenue=company_data.get('company_size', {}).get('revenue_eur'),
    
    # LLM configuration
    llm_provider="openai",
    llm_model="gpt-4o",
    llm_api_key=os.getenv("OPENAI_API_KEY"),
    
    # Quality thresholds
    quality_threshold=0.80,
    impact_materiality_threshold=5.0,
    financial_materiality_threshold=5.0
)

print("✓ CSRD configuration created")
print(f"  Company: {config.company_name}")
print(f"  Reporting Year: {config.reporting_year}")
print(f"  Quality Threshold: {config.quality_threshold:.0%}")

In [None]:
# Generate complete CSRD report (one function call!)
print_section("Generating CSRD Report")
print("This will execute all 6 agents...")
print("Expected time: 2-5 minutes for demo data\n")

import time
start_time = time.time()

# Skip materiality if no API key
skip_mat = not bool(config.llm_api_key)
if skip_mat:
    print("⚠ Skipping materiality assessment (no API key)\n")

report = csrd_build_report(
    esg_data=str(DEMO_ESG_DATA),
    company_profile=str(DEMO_COMPANY_PROFILE),
    config=config,
    output_dir=str(OUTPUT_DIR),
    skip_materiality=skip_mat,
    skip_audit=False,
    verbose=True
)

elapsed = time.time() - start_time
print(f"\n✓ Report generated in {elapsed:.1f} seconds")
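`time.time()` follows the system clock, and that clock can be adjusted while a long run is in progress. `time.perf_counter()` is monotonic and intended for measuring durations. A small context manager (the name `timed` is hypothetical) makes it easy to time several steps:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Hypothetical helper: store the elapsed seconds of the `with` block in results[label]."""
    start = time.perf_counter()  # monotonic clock intended for measuring durations
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with timed("short_sleep", timings):
    time.sleep(0.01)
print(f"short_sleep took {timings['short_sleep']:.3f} s")
```

For example, you could wrap the report call as `with timed("build_report", timings): report = csrd_build_report(...)`.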

In [None]:
# Display report summary
print(report.summary())

In [None]:
# Access report properties
print_section("Report Properties")

print(f"Report ID: {report.report_id}")
print(f"Company: {report.company_info.get('legal_name')}")
print(f"Reporting Year: {report.reporting_period.get('year')}")
print(f"\nCompliance Status: {report.compliance_status.compliance_status}")
print(f"Is Compliant: {report.is_compliant}")
print(f"Is Audit Ready: {report.is_audit_ready}")
print(f"\nMaterial Topics: {report.materiality.material_topics_count}")
print(f"Material Standards: {', '.join(report.material_standards)}")
print(f"\nMetrics Calculated: {report.metrics.total_metrics_calculated}")
print(f"Processing Time: {report.processing_time_total_minutes:.1f} minutes")

---
## Part 3: Individual Agent Usage

Now let's explore using each agent individually. This gives you more control over the pipeline.

### Agent 1: Data Validation (IntakeAgent)

In [None]:
# Validate ESG data without generating a full report
validation_result = csrd_validate_data(
    esg_data=str(DEMO_ESG_DATA),
    company_profile=str(DEMO_COMPANY_PROFILE),
    config=config
)

print_section("Data Validation Results")
metadata = validation_result.get('metadata', {})
print(f"Total Records: {metadata.get('total_records')}")
print(f"Valid Records: {metadata.get('valid_records')}")
print(f"Invalid Records: {metadata.get('invalid_records')}")
print(f"Data Quality Score: {metadata.get('data_quality_score'):.1f}/100")
print(f"Quality Threshold Met: {metadata.get('quality_threshold_met')}")

In [None]:
# Inspect validation issues
issues = validation_result.get('validation_issues', [])
print(f"\nValidation Issues Found: {len(issues)}")

if issues:
    print("\nFirst 5 issues:")
    for i, issue in enumerate(issues[:5], 1):
        print(f"\n{i}. {issue.get('severity', 'unknown').upper()}")
        print(f"   Code: {issue.get('error_code')}")
        print(f"   Message: {issue.get('message')}")
        print(f"   Field: {issue.get('field')}")
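To get an overview before reading individual messages, you can count issues by severity with `collections.Counter`. This sketch relies only on the `severity` key used in the cell above. The sample issues and error codes are invented for illustration.

```python
from collections import Counter

def summarize_issues(issues):
    """Count issues by lower-cased severity, treating a missing severity as 'unknown'."""
    return Counter((issue.get("severity") or "unknown").lower() for issue in issues)

sample_issues = [  # invented sample data
    {"severity": "error", "error_code": "E001"},
    {"severity": "WARNING", "error_code": "W010"},
    {"severity": "error", "error_code": "E002"},
    {"error_code": "X999"},
]
for severity, count in summarize_issues(sample_issues).most_common():
    print(f"{severity}: {count}")
```

On real output, call `summarize_issues(issues)`.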

### Agent 2: Materiality Assessment (MaterialityAgent)

**Note**: This requires an LLM API key and may take 5-10 minutes.

In [None]:
# Run materiality assessment (if API key is available)
if config.llm_api_key:
    print("Running AI-powered double materiality assessment...")
    print("This may take 5-10 minutes...\n")
    
    materiality_result = csrd_assess_materiality(
        esg_data=validation_result,
        company_context=str(DEMO_COMPANY_PROFILE),
        config=config
    )
    
    print_section("Materiality Assessment Results")
    stats = materiality_result.get('summary_statistics', {})
    print(f"Total Topics Assessed: {stats.get('total_topics_assessed')}")
    print(f"Material Topics: {stats.get('material_topics_count')}")
    print(f"Impact Material: {stats.get('material_from_impact')}")
    print(f"Financial Material: {stats.get('material_from_financial')}")
    print(f"Double Material: {stats.get('double_material_count')}")
    
    print(f"\nESRS Standards Triggered:")
    for std in stats.get('esrs_standards_triggered', []):
        print(f"  • {std}")
else:
    print("⚠ Skipping materiality assessment (no API key)")
    print("Set OPENAI_API_KEY to enable this feature")
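Under ESRS double materiality, a sustainability topic is material if it is material from the impact perspective, the financial perspective, or both. The sketch below shows one way such a classification could work. It assumes topic scores on a 0-10 scale, compared against thresholds like `impact_materiality_threshold` and `financial_materiality_threshold` from `CSRDConfig`. The `TopicScore` class, the `>=` comparison, and the sample scores are all assumptions, not the MaterialityAgent's actual logic.

```python
from dataclasses import dataclass

@dataclass
class TopicScore:
    topic: str
    impact_score: float     # assumed 0-10 scale
    financial_score: float  # assumed 0-10 scale

def classify(score, impact_threshold=5.0, financial_threshold=5.0):
    """Material if either perspective meets its threshold; 'double' if both do."""
    impact = score.impact_score >= impact_threshold
    financial = score.financial_score >= financial_threshold
    if impact and financial:
        return "double material"
    if impact:
        return "impact material"
    if financial:
        return "financial material"
    return "not material"

topics = [  # invented sample scores
    TopicScore("Climate change", 8.2, 7.5),
    TopicScore("Water and marine resources", 6.0, 3.0),
    TopicScore("Business conduct", 2.0, 5.5),
    TopicScore("Biodiversity", 3.0, 2.5),
]
for t in topics:
    print(f"{t.topic}: {classify(t)}")
```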

### Agent 3: Metrics Calculation (CalculatorAgent)

**Zero Hallucination Guarantee**: All calculations are 100% deterministic.
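In practice, "deterministic" means that the same inputs always produce the same value, and that the inputs are recorded so the number can be traced. The sketch below sums scope emissions with `Decimal` arithmetic and attaches a provenance record. Its field names mirror those printed in the metric inspection cell (`value`, `unit`, `formula_used`, `calculation_provenance`). The function, the formula, and the SHA-256 input fingerprint are illustrative assumptions, not the CalculatorAgent's implementation.

```python
import hashlib
import json
from decimal import Decimal

def total_ghg(scope_1, scope_2, scope_3):
    """Sum scope emissions exactly and return the value with a provenance record."""
    inputs = {"scope_1": str(scope_1), "scope_2": str(scope_2), "scope_3": str(scope_3)}
    value = sum((Decimal(v) for v in inputs.values()), Decimal("0"))
    # Hash the canonicalized inputs so the same inputs always give the same fingerprint
    fingerprint = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
    return {
        "value": value,
        "unit": "tCO2e",
        "formula_used": "scope_1 + scope_2 + scope_3",
        "calculation_provenance": {
            "method": "deterministic_sum",
            "inputs": inputs,
            "input_sha256": fingerprint,
        },
    }

result = total_ghg("1250.5", "830.25", "10400")
print(result["value"], result["unit"])  # → 12480.75 tCO2e
```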

In [None]:
# Calculate ESRS metrics
calc_result = csrd_calculate_metrics(
    validated_data=validation_result,
    metrics_to_calculate=["E1-1", "E1-2", "E1-3", "E1-4", "S1-1", "G1-1"],
    config=config
)

print_section("Metrics Calculation Results")
metadata = calc_result.get('metadata', {})
print(f"Metrics Requested: {metadata.get('total_metrics_requested')}")
print(f"Metrics Calculated: {metadata.get('metrics_calculated')}")
print(f"Calculation Errors: {metadata.get('calculation_errors')}")
print(f"Processing Time: {metadata.get('processing_time_seconds'):.3f} seconds")
print(f"Time per Metric: {metadata.get('ms_per_metric'):.2f} ms")
print(f"\nZero Hallucination: {metadata.get('zero_hallucination_guarantee')}")

In [None]:
# Inspect calculated metrics
metrics = calc_result.get('calculated_metrics', [])
print(f"\nCalculated Metrics ({len(metrics)} total):\n")

for metric in metrics:
    print(f"{metric.get('metric_code')}: {metric.get('metric_name')}")
    print(f"  Value: {metric.get('value')} {metric.get('unit')}")
    print(f"  Formula: {metric.get('formula_used')}")
    print(f"  Provenance: {metric.get('calculation_provenance', {}).get('method')}")
    print()
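For anything beyond a quick look, a table is easier to scan than printed blocks. `pandas.json_normalize` flattens the nested `calculation_provenance` dictionary into dotted column names. Here is a sketch with made-up sample metrics (the `DEMO-*` codes are placeholders):

```python
import pandas as pd

def metrics_to_frame(metrics):
    """Flatten a list of calculated-metric dicts into one row per metric."""
    frame = pd.json_normalize(metrics)  # nested keys become 'parent.child' columns
    columns = ["metric_code", "metric_name", "value", "unit", "calculation_provenance.method"]
    return frame.reindex(columns=columns)  # fields missing from a metric appear as NaN

sample_metrics = [
    {"metric_code": "DEMO-1", "metric_name": "Sample metric A", "value": 1250.5,
     "unit": "tCO2e", "calculation_provenance": {"method": "sum"}},
    {"metric_code": "DEMO-2", "metric_name": "Sample metric B", "value": 1200, "unit": "count"},
]
print(metrics_to_frame(sample_metrics))
```

With real results, use `metrics_to_frame(calc_result.get('calculated_metrics', []))`.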

---
## Part 4: Data Visualization

Let's visualize some of the ESG metrics using pandas and matplotlib.

In [None]:
# Convert report data to DataFrame
df = report.to_dataframe()

print(f"ESG Data Shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print(f"\nFirst 5 rows:")
df.head()

In [None]:
# Summary statistics
print_section("Data Summary")
print(df.describe())

In [None]:
# Visualize GHG emissions breakdown (skipped when no scope emissions are available)
emissions_data = {
    'Scope 1': report.metrics.scope_1_emissions_tco2e or 0,
    'Scope 2': report.metrics.scope_2_emissions_tco2e or 0,
    'Scope 3': report.metrics.scope_3_emissions_tco2e or 0
}

if any(emissions_data.values()):
    
    fig, ax = plt.subplots(figsize=(10, 6))
    colors = ['#FF6B6B', '#FFA07A', '#FFD700']
    bars = ax.bar(emissions_data.keys(), emissions_data.values(), color=colors)
    
    ax.set_ylabel('Emissions (tCO2e)', fontsize=12)
    ax.set_title('GHG Emissions Breakdown by Scope', fontsize=14, fontweight='bold')
    ax.grid(axis='y', alpha=0.3)
    
    # Add value labels
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:,.0f}',
                ha='center', va='bottom', fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    print(f"Total GHG Emissions: {report.metrics.total_ghg_emissions_tco2e:,.2f} tCO2e")
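To compare scopes on a relative basis, you can convert the values to percentage shares. The hypothetical helper below treats missing values as zero and avoids dividing by zero when nothing was reported:

```python
def scope_shares(emissions):
    """Return each scope's percentage of the total, treating None as zero."""
    cleaned = {scope: (value or 0) for scope, value in emissions.items()}
    total = sum(cleaned.values())
    if total == 0:
        return {scope: 0.0 for scope in cleaned}
    return {scope: 100 * value / total for scope, value in cleaned.items()}

sample = {"Scope 1": 1250.5, "Scope 2": 830.25, "Scope 3": None}  # invented values
for scope, share in scope_shares(sample).items():
    print(f"{scope}: {share:.1f}%")
```

After the chart cell has run, `scope_shares(emissions_data)` gives the shares for the demo report.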

In [None]:
# Visualize data quality scores
if 'data_quality' in df.columns:
    quality_counts = df['data_quality'].value_counts()
    
    fig, ax = plt.subplots(figsize=(8, 8))
    colors = ['#4CAF50', '#FFC107', '#F44336']
    wedges, texts, autotexts = ax.pie(
        quality_counts.values,
        labels=quality_counts.index,
        autopct='%1.1f%%',
        colors=colors,
        startangle=90
    )
    
    ax.set_title('Data Quality Distribution', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
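In the cell above, colors are assigned by position. Because `value_counts()` sorts categories by frequency, green will not necessarily land on the best-quality category. Mapping colors by label avoids this. The `high`/`medium`/`low` labels below are assumptions about the `data_quality` column; adjust them to the values your data actually contains.

```python
import pandas as pd

QUALITY_COLORS = {"high": "#4CAF50", "medium": "#FFC107", "low": "#F44336"}  # assumed labels

def colors_for(counts, palette=QUALITY_COLORS, fallback="#9E9E9E"):
    """Return one color per category in `counts.index`, in the same order."""
    return [palette.get(str(label).lower(), fallback) for label in counts.index]

counts = pd.Series(["low", "high", "high", "medium", "high", "low"]).value_counts()
print(list(counts.index), colors_for(counts))
```

Pass `colors=colors_for(quality_counts)` to `ax.pie(...)` instead of the fixed list.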

In [None]:
# Material standards distribution
if report.material_standards:
    fig, ax = plt.subplots(figsize=(10, 6))
    
    standards = report.material_standards
    metrics_by_std = report.metrics.metrics_by_standard
    
    x = range(len(standards))
    values = [metrics_by_std.get(std, 0) for std in standards]
    
    bars = ax.bar(x, values, color='#2196F3')
    ax.set_xticks(x)
    ax.set_xticklabels(standards)
    ax.set_ylabel('Number of Metrics', fontsize=12)
    ax.set_title('Metrics by Material ESRS Standard', fontsize=14, fontweight='bold')
    ax.grid(axis='y', alpha=0.3)
    
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{int(height)}',
                ha='center', va='bottom', fontweight='bold')
    
    plt.tight_layout()
    plt.show()

---
## Part 5: Advanced Configuration

Customize the pipeline for specific use cases.

### Loading Configuration from YAML

In [None]:
# Load configuration from YAML file
config_file = BASE_DIR / "config" / "csrd_config.yaml"

if config_file.exists():
    custom_config = CSRDConfig.from_yaml(str(config_file))
    print("✓ Configuration loaded from YAML")
    print(f"  Company: {custom_config.company_name}")
    print(f"  LLM Provider: {custom_config.llm_provider}")
    print(f"  LLM Model: {custom_config.llm_model}")
    print(f"  Quality Threshold: {custom_config.quality_threshold}")
else:
    print("⚠ Config file not found; continuing with the config created in Part 2")

### Custom Materiality Thresholds

In [None]:
# Create config with custom thresholds
strict_config = CSRDConfig(
    company_name="Example Corp",
    company_lei="549300EXAMPLE123456",
    reporting_year=2024,
    sector="Manufacturing",
    
    # Stricter thresholds
    quality_threshold=0.95,  # 95% data quality required
    impact_materiality_threshold=7.0,  # Higher bar for impact
    financial_materiality_threshold=7.0,  # Higher bar for financial
)

print("Custom Configuration Created:")
print(f"  Quality Threshold: {strict_config.quality_threshold:.0%}")
print(f"  Impact Threshold: {strict_config.impact_materiality_threshold}/10")
print(f"  Financial Threshold: {strict_config.financial_materiality_threshold}/10")
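A common slip is writing `95` instead of `0.95` for the quality threshold. The hypothetical pre-flight check below assumes the ranges this notebook suggests: quality as a fraction in [0, 1] and materiality thresholds on a 0-10 scale. These are not documented SDK limits.

```python
def threshold_problems(quality_threshold, impact_threshold, financial_threshold):
    """Return a list of problems; the ranges are assumptions (quality 0-1, materiality 0-10)."""
    problems = []
    if not 0.0 <= quality_threshold <= 1.0:
        problems.append(
            f"quality_threshold should be a fraction in [0, 1] (e.g. 0.95), got {quality_threshold}"
        )
    for name, value in (("impact", impact_threshold), ("financial", financial_threshold)):
        if not 0.0 <= value <= 10.0:
            problems.append(f"{name} materiality threshold should be in [0, 10], got {value}")
    return problems

print(threshold_problems(0.95, 7.0, 7.0))  # → []
for problem in threshold_problems(95, 7.0, 12.0):
    print("⚠", problem)
```

To check the config above, call `threshold_problems(strict_config.quality_threshold, strict_config.impact_materiality_threshold, strict_config.financial_materiality_threshold)`.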

### Using Alternative LLM Providers

In [None]:
# Configure for Anthropic Claude
anthropic_config = CSRDConfig(
    company_name="Example Corp",
    company_lei="549300EXAMPLE123456",
    reporting_year=2024,
    sector="Manufacturing",
    
    # Anthropic configuration
    llm_provider="anthropic",
    llm_model="claude-3-5-sonnet-20241022",
    llm_api_key=os.getenv("ANTHROPIC_API_KEY"),
)

print("Alternative LLM Configuration:")
print(f"  Provider: {anthropic_config.llm_provider}")
print(f"  Model: {anthropic_config.llm_model}")
print(f"  API Key Set: {bool(anthropic_config.llm_api_key)}")
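If you switch between providers, you can choose one based on which API key is set. This is a sketch: `pick_llm_provider` and the `PROVIDERS` table are hypothetical, and the model IDs are simply the ones used in this notebook. The returned keys match the `CSRDConfig` arguments shown above.

```python
import os

# provider -> (environment variable holding the key, model used in this notebook)
PROVIDERS = {
    "openai": ("OPENAI_API_KEY", "gpt-4o"),
    "anthropic": ("ANTHROPIC_API_KEY", "claude-3-5-sonnet-20241022"),
}

def pick_llm_provider(preferred=("openai", "anthropic"), env=None):
    """Return LLM settings for the first preferred provider whose key is set, else None."""
    env = os.environ if env is None else env
    for provider in preferred:
        env_var, model = PROVIDERS[provider]
        key = env.get(env_var)
        if key:
            return {"llm_provider": provider, "llm_model": model, "llm_api_key": key}
    return None

settings = pick_llm_provider()
print(settings["llm_provider"] if settings else "No LLM API key found")
```

If a key is found, `CSRDConfig(company_name=..., **settings)` passes the provider, model, and key in one go.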

---
## Part 6: Working with Report Outputs

### Saving Reports in Different Formats

In [None]:
# Save as JSON
json_path = OUTPUT_DIR / "report.json"
report.save_json(str(json_path))
print(f"✓ Saved to JSON: {json_path}")

# Save summary as Markdown
summary_path = OUTPUT_DIR / "summary.md"
report.save_summary(str(summary_path))
print(f"✓ Saved summary: {summary_path}")

# Get JSON string
json_str = report.to_json(indent=2)
print(f"\nJSON size: {len(json_str):,} characters")
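Before sharing a saved file, you may want to confirm that it reloads identically. The sketch below uses a hypothetical `save_and_verify` helper and invented sample data. It does not describe how `report.save_json` works internally.

```python
import json
import tempfile
from pathlib import Path

def save_and_verify(data, path):
    """Write `data` as UTF-8 JSON, reload it, and raise if the round trip changed anything."""
    path = Path(path)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    reloaded = json.loads(path.read_text(encoding="utf-8"))
    if reloaded != data:
        # e.g. tuples come back as lists, and non-string dict keys come back as strings
        raise ValueError(f"Round-trip mismatch for {path}")
    return reloaded

sample = {"report_id": "demo-001", "company": "Example Corp", "metrics": {"total_tco2e": 12480.75}}
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_verify(sample, Path(tmp) / "report.json")
print(reloaded["metrics"])
```

For the demo report, you could try `save_and_verify(report.to_dict(), OUTPUT_DIR / "report_roundtrip.json")`, assuming `to_dict()` returns only JSON-serializable values.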

### Accessing Raw Report Data

In [None]:
# Get raw report dictionary
raw_report = report.to_dict()

print("Raw Report Structure:")
for key in raw_report.keys():
    print(f"  • {key}")

# Access specific sections
print(f"\nReport Metadata:")
print_dict(raw_report.get('report_metadata', {}))
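`print_dict` is useful for reading, but a flat mapping of dotted keys is easier to search, compare, or load into pandas. Here is a hypothetical `flatten` helper, shown on invented sample data:

```python
def flatten(d, parent=""):
    """Flatten nested dicts into {'a.b.c': value} pairs; lists and empty dicts stay as leaves."""
    flat = {}
    for key, value in d.items():
        name = f"{parent}.{key}" if parent else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

sample = {"report_metadata": {"report_id": "demo-001", "generator": {"version": "1.0"}}, "empty": {}}
for key, value in flatten(sample).items():
    print(f"{key}: {value}")
```

For example, `pd.Series(flatten(raw_report.get('report_metadata', {})))` gives a one-column view of the metadata.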

### Analyzing Compliance Results

In [None]:
# Detailed compliance analysis
compliance = report.compliance_status

print_section("Compliance Analysis")
print(f"Status: {compliance.compliance_status}")
print(f"\nRule Execution:")
total_rules = compliance.total_rules_checked
pass_rate = f"{compliance.rules_passed / total_rules * 100:.1f}%" if total_rules else "n/a"
print(f"  Total Rules: {total_rules}")
print(f"  Passed: {compliance.rules_passed} ({pass_rate})")
print(f"  Failed: {compliance.rules_failed}")
print(f"  Warnings: {compliance.rules_warning}")
print(f"\nFailure Breakdown:")
print(f"  Critical: {compliance.critical_failures}")
print(f"  Major: {compliance.major_failures}")
print(f"  Minor: {compliance.minor_failures}")
print(f"\nAudit Readiness: {compliance.audit_ready}")

In [None]:
# Visualize compliance results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Overall compliance pie chart
labels = ['Passed', 'Failed', 'Warnings']
sizes = [compliance.rules_passed, compliance.rules_failed, compliance.rules_warning]
colors = ['#4CAF50', '#F44336', '#FFC107']
explode = (0.1, 0, 0)

ax1.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', startangle=90)
ax1.set_title('Overall Compliance Status', fontweight='bold')

# Failure severity breakdown
failure_labels = ['Critical', 'Major', 'Minor']
failure_sizes = [compliance.critical_failures, compliance.major_failures, compliance.minor_failures]
failure_colors = ['#D32F2F', '#F57C00', '#FDD835']

ax2.bar(failure_labels, failure_sizes, color=failure_colors)
ax2.set_ylabel('Count', fontsize=12)
ax2.set_title('Failure Severity Breakdown', fontweight='bold')
ax2.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

---
## Part 7: Troubleshooting Tips

### Common Issues and Solutions

#### 1. Data Quality Issues

In [None]:
# Check data quality before processing
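def check_data_quality(validation_result, default_threshold=config.quality_threshold * 100):
    """Check whether the data quality score (0-100 scale) meets the required threshold."""
    metadata = validation_result.get('metadata', {})
    score = metadata.get('data_quality_score') or 0
    # Fall back to the threshold in `config` instead of a hardcoded value
    threshold = metadata.get('quality_threshold', default_threshold)
    if threshold <= 1:  # normalize a fractional threshold (e.g. 0.80) to the 0-100 scale
        threshold *= 100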
    
    print(f"Data Quality Score: {score:.1f}/100")
    print(f"Required Threshold: {threshold}/100")
    
    if score < threshold:
        print("\n⚠ Data quality below threshold!")
        print("\nRecommendations:")
        print("1. Review validation issues")
        print("2. Fix data completeness gaps")
        print("3. Verify metric codes against ESRS catalog")
        print("4. Check unit consistency")
        return False
    else:
        print("\n✓ Data quality meets requirements")
        return True

check_data_quality(validation_result)

#### 2. API Key Issues

In [None]:
# Verify API key configuration
def check_api_keys():
    """Check API key availability."""
    openai_key = os.getenv("OPENAI_API_KEY")
    anthropic_key = os.getenv("ANTHROPIC_API_KEY")
    
    print("API Key Status:")
    print(f"  OpenAI: {'✓ Set' if openai_key else '✗ Not set'}")
    print(f"  Anthropic: {'✓ Set' if anthropic_key else '✗ Not set'}")
    
    if not openai_key and not anthropic_key:
        print("\n⚠ No LLM API keys found")
        print("\nTo enable materiality assessment:")
        print("  export OPENAI_API_KEY='your-key'")
        print("  # or")
        print("  export ANTHROPIC_API_KEY='your-key'")
        return False
    return True

check_api_keys()

#### 3. Performance Optimization

In [None]:
# Performance tips
print("Performance Optimization Tips:\n")
print("1. Skip materiality for faster processing (testing only):")
print("   skip_materiality=True")
print("\n2. Process data in batches for large datasets")
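print("\n3. Validate first and fix data issues before a full run:")
print("   result = csrd_validate_data(...)")
print("\n4. Reuse intermediate results: pass the validation result to")
print("   csrd_calculate_metrics(validated_data=result, ...) instead of re-validating")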
print("\n5. Expected performance:")
print("   - 1,000 data points: ~5 minutes")
print("   - 10,000 data points: ~15 minutes")
print("   - 50,000 data points: ~45 minutes")
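For tip 2, one way to keep memory bounded on large inputs is to read the CSV in fixed-size batches with pandas' `chunksize` option. In this sketch, `iter_batches` is a hypothetical helper. This tutorial does not cover whether the SDK can process batches independently and merge their results, so treat this as a way to pre-check or split files.

```python
import io
import pandas as pd

def iter_batches(csv_source, batch_size=5000):
    """Yield DataFrames of at most `batch_size` rows from a CSV path or file-like object."""
    # With chunksize, read_csv returns an iterator of DataFrames instead of one large frame
    with pd.read_csv(csv_source, chunksize=batch_size) as reader:
        yield from reader

csv_text = "metric_code,value\n" + "".join(f"DEMO-{i},{i * 10}\n" for i in range(12))
sizes = [len(batch) for batch in iter_batches(io.StringIO(csv_text), batch_size=5)]
print(sizes)  # → [5, 5, 2]
```

Each batch could then be written out with `batch.to_csv(OUTPUT_DIR / f"esg_batch_{i}.csv", index=False)` and checked with `csrd_validate_data`.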

---
## Part 8: Next Steps

### Further Learning

In [None]:
print("Congratulations! You've completed the CSRD Platform SDK tutorial.\n")
print("Next Steps:\n")
print("1. Read the User Guide:")
print("   docs/USER_GUIDE.md")
print("\n2. Explore the API Reference:")
print("   docs/API_REFERENCE.md")
print("\n3. Review deployment options:")
print("   docs/DEPLOYMENT_GUIDE.md")
print("\n4. Try with your own data:")
print("   • Prepare ESG data in CSV format")
print("   • Create company profile JSON")
print("   • Run the pipeline")
print("\n5. Join the community:")
print("   • GitHub Issues: Report bugs or request features")
print("   • Email: csrd@greenlang.io")
print("\nHappy reporting! 🚀")

---
## Appendix: Quick Reference

In [None]:
# Quick reference for common operations
print("CSRD Platform SDK - Quick Reference\n")
print("=" * 80)
print("\n1. GENERATE COMPLETE REPORT\n")
print("from sdk.csrd_sdk import csrd_build_report, CSRDConfig")
print("")
print("config = CSRDConfig(company_name='...', ...)")
print("report = csrd_build_report(")
print("    esg_data='data.csv',")
print("    company_profile='company.json',")
print("    config=config,")
print("    output_dir='output'")
print(")")
print("\n" + "=" * 80)
print("\n2. VALIDATE DATA ONLY\n")
print("from sdk.csrd_sdk import csrd_validate_data")
print("")
print("result = csrd_validate_data(")
print("    esg_data='data.csv',")
print("    config=config")
print(")")
print("\n" + "=" * 80)
print("\n3. ACCESS REPORT DATA\n")
print("report.summary()  # Text summary")
print("report.to_dataframe()  # Convert to pandas DataFrame")
print("report.save_json('report.json')  # Save to file")
print("report.is_compliant  # Check compliance status")
print("\n" + "=" * 80)