# Introduction
# Exercise 4: LCA Data Validation Framework

## Overview
This exercise focuses on implementing and using AI-assisted validation techniques for Life Cycle Assessment (LCA) data. You'll learn to create robust validation pipelines that combine rule-based checks with LLM-powered analysis to ensure data quality and reliability.

## Learning Objectives
After completing this exercise, you will be able to:
- Implement comprehensive data validation frameworks for LCA studies
- Use LLMs to validate complex environmental impact calculations
- Design effective validation prompts for specific LCA use cases
- Apply best practices for result verification and quality assurance

## Prerequisites
- Completed Exercises 1-3
- Understanding of LCA methodology and impact categories
- Familiarity with basic prompt engineering concepts

## Exercise Structure (25 minutes)

# Part 1: Using LLMs for LCA Data Validation
**Time: 5 minutes**

## Overview
Learn how to leverage Large Language Models (LLMs) to validate Life Cycle Assessment (LCA) data by crafting effective validation prompts.

## Key Validation Tasks for LLMs

### 1. Data Completeness Validation
```python
# Example prompt template for checking data completeness.
# Literal braces in the JSON schema are doubled ({{ }}) so that str.format()
# only substitutes {epd_data}.
COMPLETENESS_PROMPT = """You are an expert in Life Cycle Assessment (LCA) according to ISO 14040/14044 standards.
Analyze this EPD data for completeness:

{epd_data}

For each impact category:
1. Check if all required values are present
2. Verify units are provided
3. Identify any missing metadata

Respond in JSON format:
{{
  "completeness_status": boolean,
  "missing_elements": [...],
  "recommendations": [...]
}}
"""

# Usage example (run inside an async function; `claude` stands for your LLM client wrapper)
response = await claude.validate(COMPLETENESS_PROMPT.format(
    epd_data=your_epd_json
))
```

### 2. Methodology Validation
```python
# Example prompt for validating LCA methodology
METHODOLOGY_PROMPT = """As an LCA expert, validate this methodology description:

{methodology_description}

Check:
1. ISO 14040/14044 compliance
2. System boundary completeness
3. Allocation procedures
4. Cut-off criteria

Respond in JSON with:
- Compliance status
- Issues found
- Required corrections
"""
```

### 3. Unit Consistency Check
```python
# Example prompt for unit validation
UNIT_CHECK_PROMPT = """Review these LCA values for unit consistency:

{impact_values}

Tasks:
1. Verify units follow ISO standards
2. Check conversion accuracy
3. Flag any inconsistencies
4. Suggest standardized units

Respond with identified issues and corrections needed.
"""
```
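
Unit mismatches are deterministic, so a rule-based check can run before (or alongside) the LLM prompt above. The following is a minimal sketch, assuming an EPD dict with an `impact_categories` mapping whose entries carry `value` and `unit`. `EXPECTED_UNITS` and `find_unit_issues` are hypothetical names, not part of the provided template:

```python
# Hypothetical reference table; adjust to the units your EPD programme requires.
EXPECTED_UNITS = {
    "global_warming_potential": "kg CO2 eq.",
    "acidification_potential": "kg SO2 eq.",
}


def find_unit_issues(epd: dict, expected_units: dict = EXPECTED_UNITS) -> list[dict]:
    """Return one issue record per impact category whose unit is missing or unexpected."""
    issues = []
    for category, entry in epd.get("impact_categories", {}).items():
        expected = expected_units.get(category)
        if expected is None:
            continue  # No reference unit known; leave this category to the LLM check.
        found = entry.get("unit")
        if found != expected:
            issues.append({"category": category, "found": found, "expected": expected})
    return issues


sample = {
    "impact_categories": {
        "global_warming_potential": {"value": 12.5, "unit": "kg CO2 eq."},
        "acidification_potential": {"value": 0.08, "unit": "g SO2 eq."},
    }
}
print(find_unit_issues(sample))
```

Anything this check flags can be passed to the LLM as context, so the model explains or converts the unit rather than having to discover the mismatch itself.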

### 4. Uncertainty Analysis
```python
UNCERTAINTY_PROMPT = """Analyze uncertainty in these LCA results:

{lca_results}

Consider:
1. Data quality indicators
2. Uncertainty ranges
3. Confidence levels
4. Critical assumptions

Provide structured feedback on uncertainty assessment.
"""
```

## Example Implementation

```python
from datetime import datetime


class LLMValidator:
    """Uses LLMs to validate LCA data."""
    
    def __init__(self, model="claude-3-5-sonnet"):
        self.model = model
        self.load_prompts()
    
    async def validate_epd(self, epd_data: dict) -> dict:
        """Complete EPD validation using LLM."""
        
        # 1. Check completeness
        completeness = await self.check_completeness(epd_data)
        
        # 2. Validate methodology (stays None when the EPD has no methodology section)
        methodology = None
        if "methodology" in epd_data:
            methodology = await self.validate_methodology(
                epd_data["methodology"]
            )
        
        # 3. Check units
        units = await self.validate_units(epd_data)
        
        # 4. Analyze uncertainty
        uncertainty = await self.assess_uncertainty(epd_data)
        
        # Compile results
        return {
            "completeness": completeness,
            "methodology": methodology,
            "units": units,
            "uncertainty": uncertainty,
            "timestamp": datetime.now().isoformat()
        }
```

## Key Principles for LLM Validation

1. **Structured Prompts**
   - Clear task definition
   - Specific validation criteria
   - Requested response format

2. **Response Parsing**
   - Parse JSON responses
   - Handle validation errors
   - Extract actionable feedback

3. **Quality Control**
   - Cross-validate critical checks
   - Handle LLM uncertainty
   - Log validation decisions
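
The response-parsing principle can be sketched as a small helper that pulls a JSON object out of a reply and checks for required keys. It assumes the model may surround its JSON with prose. `parse_llm_json` and the key names are illustrative, not part of the provided template:

```python
import json


def parse_llm_json(reply: str, required_keys: set[str]) -> dict:
    """Extract a JSON object from an LLM reply and verify its required keys.

    Returns {"ok": True, "data": ...} on success, or {"ok": False, "error": ...}.
    """
    # Take the span from the first "{" to the last "}" so surrounding prose is ignored.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return {"ok": False, "error": "no JSON object found"}
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    missing = required_keys - set(data)
    if missing:
        return {"ok": False, "error": f"missing keys: {sorted(missing)}"}
    return {"ok": True, "data": data}


reply = 'Assessment: {"completeness_status": false, "missing_elements": []} Done.'
print(parse_llm_json(reply, {"completeness_status", "missing_elements"}))
```

Returning an explicit `ok` flag instead of raising keeps the caller's control flow simple and makes every parsing failure easy to log.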

## Practice Exercise (3 minutes)

Write a prompt to validate this EPD data. Note that the acidification unit (`g SO2 eq.`) is non-standard:
```json
{
  "product": "Mineral Wool Insulation",
  "impact_categories": {
    "global_warming_potential": {
      "value": 12.5,
      "unit": "kg CO2 eq."
    },
    "acidification_potential": {
      "value": 0.08,
      "unit": "g SO2 eq."
    }
  },
  "system_boundary": "cradle-to-gate"
}
```

Your prompt should:
1. Check impact category completeness
2. Validate unit consistency
3. Verify system boundary adequacy

## Next Steps
In Part 2, you'll implement these validation approaches using the provided `LLMValidator` class, integrating LLM calls for complex validation tasks.

# Part 2: Implementing LLM-Based Validation Checks
**Time: 10 minutes**

## Overview
Now that you understand how to craft validation prompts, you'll implement the LLMValidator class to automate EPD validation using Claude. We'll focus on creating robust, reusable validation components.

## Implementation Tasks

### 2.1 Set Up Validation Config (2 minutes)
```python
from team_template.src.validation import LLMValidator
import json

# Create your validation configuration
VALIDATION_CONFIG = {
    "model": "claude-3-5-sonnet",  # Model identifier; substitute the model available to you
    "required_categories": [
        "global_warming_potential",
        "acidification_potential",
        "eutrophication_potential"
    ],
    "standard_units": {
        "global_warming_potential": "kg CO2 eq.",
        "acidification_potential": "kg SO2 eq.",
        "eutrophication_potential": "kg PO4 eq."
    }
}

# Initialize validator
validator = LLMValidator(config=VALIDATION_CONFIG)
```
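
The `required_categories` list also supports a deterministic pre-check, so the LLM is only asked about gaps that a simple lookup cannot settle. This is a minimal sketch: `find_missing_categories` is a hypothetical helper, and the EPD layout (`impact_categories` entries with a `value`) is an assumption:

```python
REQUIRED_CATEGORIES = [
    "global_warming_potential",
    "acidification_potential",
    "eutrophication_potential",
]


def find_missing_categories(epd: dict, required: list[str]) -> list[str]:
    """List required impact categories that are absent or carry no value."""
    categories = epd.get("impact_categories", {})
    return [
        name for name in required
        if categories.get(name, {}).get("value") is None
    ]


epd = {
    "impact_categories": {
        "global_warming_potential": {"value": 12.5, "unit": "kg CO2 eq."},
        "acidification_potential": {"value": 0.08, "unit": "g SO2 eq."},
    }
}
print(find_missing_categories(epd, REQUIRED_CATEGORIES))
```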

### 2.2 Create Validation Prompts (3 minutes)
Your task is to complete these prompt templates:

```python
# Literal braces in the JSON schema are doubled ({{ }}) so that str.format()
# only substitutes the named placeholders.
VALIDATION_PROMPTS = {
    "completeness": """As an LCA expert, analyze this EPD data:
{epd_data}

Required categories:
{required_categories}

Tasks:
1. Check all required categories exist with values
2. Verify units match standards: {standard_units}
3. Identify any missing metadata

Respond in JSON format:
{{
    "status": boolean,
    "missing_items": [...],
    "unit_issues": [...],
    "confidence": float
}}""",

    # TODO: Complete these prompt templates
    "methodology": "...",  # Add methodology validation prompt
    "uncertainty": "...",  # Add uncertainty analysis prompt
    "system_boundary": "..."  # Add system boundary validation prompt
}
```
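
One way to fill such a template is `str.format` with JSON-serialised fields. The sketch below assumes the literal braces of the response schema are escaped as `{{` and `}}`, since `str.format` otherwise treats them as placeholders. `TEMPLATE` and `render_prompt` are illustrative names:

```python
import json

# Shortened stand-in for the "completeness" template; note the doubled braces
# around the literal JSON schema.
TEMPLATE = """Analyze this EPD data:
{epd_data}

Required categories:
{required_categories}

Respond in JSON format:
{{
    "status": boolean,
    "missing_items": [...]
}}"""


def render_prompt(template: str, **fields) -> str:
    """Serialise non-string fields to JSON, then substitute them into the template."""
    rendered = {
        key: value if isinstance(value, str) else json.dumps(value, indent=2)
        for key, value in fields.items()
    }
    return template.format(**rendered)


prompt = render_prompt(
    TEMPLATE,
    epd_data={"product": "Mineral Wool Insulation"},
    required_categories=["global_warming_potential"],
)
print(prompt)
```

Braces inside the substituted values are safe: `str.format` does not re-scan text it has already inserted.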

### 2.3 Implement Core Validation Methods (5 minutes)

Complete the `LLMValidator` implementation:

```python
from datetime import datetime


class LLMValidator:
    def __init__(self, config: dict):
        self.config = config
        self.prompts = VALIDATION_PROMPTS
        
    async def validate_epd(self, epd_data: dict) -> dict:
        """Complete EPD validation using LLM."""
        try:
            # TODO: Implement validation workflow
            # 1. Check completeness
            # 2. Validate methodology
            # 3. Analyze uncertainty
            # 4. Combine results
            pass
            
        except Exception as e:
            return {
                "status": "error",
                "message": str(e),
                "timestamp": datetime.now().isoformat()
            }
    
    async def _check_completeness(self, data: dict) -> dict:
        """Validate data completeness using LLM."""
        # TODO: Implement completeness check
        pass
    
    async def _validate_methodology(self, data: dict) -> dict:
        """Validate methodology using LLM."""
        # TODO: Implement methodology validation
        pass
    
    async def _analyze_uncertainty(self, data: dict) -> dict:
        """Analyze result uncertainty using LLM."""
        # TODO: Implement uncertainty analysis
        pass
```

## Example Solution

Here's a partially completed implementation to guide you:

```python
class LLMValidator:
    async def validate_epd(self, epd_data: dict) -> dict:
        """Complete EPD validation using LLM."""
        try:
            # Step 1: Completeness check
            completeness_result = await self._check_completeness(epd_data)
            
            # Step 2: If complete, validate methodology
            methodology_result = None
            if completeness_result["status"]:
                methodology_result = await self._validate_methodology(epd_data)
            
            # Step 3: Uncertainty analysis
            uncertainty_result = await self._analyze_uncertainty(epd_data)
            
            # Combine results
            return {
                "status": "success",
                "results": {
                    "completeness": completeness_result,
                    "methodology": methodology_result,
                    "uncertainty": uncertainty_result
                },
                "timestamp": datetime.now().isoformat(),
                "overall_valid": all([
                    completeness_result["status"],
                    methodology_result["status"] if methodology_result else False,
                    uncertainty_result["confidence"] >= 0.8
                ])
            }
            
        except Exception as e:
            return {
                "status": "error",
                "message": str(e),
                "timestamp": datetime.now().isoformat()
            }
```

## Testing Your Implementation

Use this sample EPD data to test your validator:

```python
test_epd = {
    "product_info": {
        "name": "Mineral Wool Insulation",
        "manufacturer": "InsulCo",
        "declaration_number": "EPD-123"
    },
    "impact_categories": {
        "global_warming_potential": {
            "value": 12.5,
            "unit": "kg CO2 eq.",
            "uncertainty": "±10%"
        },
        "acidification_potential": {
            "value": 0.08,
            "unit": "g SO2 eq.",  # Incorrect unit!
            "uncertainty": "±15%"
        }
    },
    "methodology": {
        "system_boundary": "cradle-to-gate",
        "allocation_method": "mass-based",
        "data_quality": "verified EPD data"
    }
}

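# Test validation (asyncio.run drives the coroutine from synchronous code)
import asyncio

async def test_validation():
    validator = LLMValidator(VALIDATION_CONFIG)
    results = await validator.validate_epd(test_epd)
    print(json.dumps(results, indent=2))

asyncio.run(test_validation())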
```

## Common Implementation Challenges

1. **Response Parsing**
   - Always validate JSON responses from LLM
   - Handle malformed responses gracefully
   ```python
   try:
       result = json.loads(llm_response)
   except json.JSONDecodeError:
       result = {"status": False, "error": "Invalid LLM response format"}
   ```

2. **Error Handling**
   - Handle API rate limits
   - Manage timeout errors
   - Log validation failures

3. **Confidence Scoring**
   - Combine multiple validation aspects
   - Weight different validation components
   - Set appropriate thresholds
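
The confidence-scoring points can be combined into a single weighted score. This is a minimal sketch, assuming each validation component reports a confidence between 0 and 1. The weights and the 0.8 threshold are placeholder values to tune, not recommendations:

```python
def combined_confidence(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-component confidence scores (each in [0, 1])."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical weights: completeness counts most, uncertainty least.
WEIGHTS = {"completeness": 0.5, "methodology": 0.3, "uncertainty": 0.2}
THRESHOLD = 0.8

scores = {"completeness": 1.0, "methodology": 0.5, "uncertainty": 0.5}
score = combined_confidence(scores, WEIGHTS)
print(f"{score:.2f}", "pass" if score >= THRESHOLD else "needs review")
```

Normalising by the weights of the components actually present means a skipped check (for example, methodology on an incomplete EPD) does not silently drag the score down.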

## Success Criteria

Your implementation should:
- [ ] Successfully validate complete EPDs
- [ ] Identify missing required categories
- [ ] Flag incorrect units
- [ ] Assess methodology compliance
- [ ] Handle errors gracefully
- [ ] Provide clear validation results

## Next Steps
In Part 3, you'll:
- Integrate this validator with the main LCA workflow
- Add caching for common validations
- Implement batch validation capabilities
- Create validation reports

# Part 3: Advanced LLM Validation and Workflow Integration
**Time: 10 minutes**

## Overview
Build on your validator implementation by adding advanced features, error recovery, and workflow integration. Learn to handle complex validation scenarios and create robust validation pipelines.

## 3.1 Advanced Validation Features (3 minutes)

### Chain-of-Thought Validation
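Use multi-step LLM reasoning for complex validations. The example assumes the base class provides a `validate(prompt)` coroutine that sends a prompt to the model and returns its reply: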

```python
class AdvancedLLMValidator(LLMValidator):
    async def validate_complex_methodology(self, data: dict) -> dict:
        # Step 1: Analyze system boundaries
        boundary_prompt = f"""As an LCA expert, analyze this system boundary definition:
{data.get('methodology', {}).get('system_boundary', '')}

Think through:
1. Is it clearly defined?
2. Are all life cycle stages properly categorized?
3. Are exclusions justified?

Explain your reasoning step-by-step, then provide a structured assessment."""

        boundary_result = await self.validate(boundary_prompt)

        # Step 2: Validate allocation methods
        allocation_prompt = f"""Given this allocation methodology:
{data.get('methodology', {}).get('allocation_method', '')}

Consider:
1. Is it appropriate for this product system?
2. Is it consistently applied?
3. Is it justified according to ISO 14044?

Walk through your analysis step-by-step."""

        allocation_result = await self.validate(allocation_prompt)

        # Step 3: Synthesize results
        synthesis_prompt = f"""Given these analyses:

System Boundary Analysis:
{boundary_result}

Allocation Analysis:
{allocation_result}

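Provide a final assessment of methodology validity.
Respond only with a JSON object containing a boolean "compliance_status"
and a list "issues"."""

        # Parse the reply so callers can read result["compliance_status"]
        # (assumes self.validate returns the raw response text)
        return json.loads(await self.validate(synthesis_prompt))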
```

### Self-Reflection and Confidence Scoring
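Add validation quality checks. The function below is written as a method and belongs on `AdvancedLLMValidator`, since `RobustLLMValidator` calls it via `self`: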

```python
async def validate_with_confidence(self, data: dict) -> dict:
    # Initial validation
    result = await self.validate_epd(data)
    
    # Self-reflection prompt
    reflection_prompt = f"""Review your validation of this EPD data:

Validation Result:
{json.dumps(result, indent=2)}

Original Data:
{json.dumps(data, indent=2)}

Critically assess:
1. Did you check all required aspects?
2. Are your conclusions well-justified?
3. What is your confidence level for each assessment?
4. What additional checks might be needed?

Provide a confidence score and any recommendations."""

    reflection = await self.validate(reflection_prompt)
    
    # Combine results
    return {
        "validation": result,
        "confidence_assessment": reflection,
        "timestamp": datetime.now().isoformat()
    }
```

## 3.2 Error Recovery and Edge Cases (3 minutes)

### Implementing Robust Error Handling

```python
import asyncio
import json
from datetime import datetime


class RobustLLMValidator(AdvancedLLMValidator):
    async def safe_validate(self, data: dict) -> dict:
        retries = 3
        backoff = 1  # seconds
        
        for attempt in range(retries):
            try:
                result = await self.validate_with_confidence(data)
                
                # Validate response structure
                if not self._is_valid_response(result):
                    raise ValueError("Invalid response structure")
                    
                return result
                
            except json.JSONDecodeError:
                # Handle malformed JSON responses
                if attempt == retries - 1:
                    return self._format_error("JSON parsing failed")
                await asyncio.sleep(backoff)
                backoff *= 2  # Exponential backoff

            except Exception as e:
                if attempt == retries - 1:
                    return self._format_error(str(e))
                await asyncio.sleep(backoff)
                backoff *= 2  # Exponential backoff
    
    def _is_valid_response(self, response: dict) -> bool:
        """Validate response structure."""
        required_fields = {
            'validation', 'confidence_assessment', 'timestamp'
        }
        return all(field in response for field in required_fields)
    
    def _format_error(self, message: str) -> dict:
        return {
            "status": "error",
            "message": message,
            "timestamp": datetime.now().isoformat()
        }
```
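
Timeouts can be handled in the same kind of retry loop. The sketch below is a generic wrapper around any zero-argument coroutine factory. `call_with_retry`, its parameters, and `flaky_call` are illustrative, and `flaky_call` stands in for a real LLM request:

```python
import asyncio


async def call_with_retry(make_call, retries: int = 3, timeout: float = 30.0,
                          base_delay: float = 1.0):
    """Await make_call() with a per-attempt timeout and exponential backoff."""
    delay = base_delay
    for attempt in range(1, retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == retries:
                raise  # Out of attempts: let the caller decide how to report it.
            await asyncio.sleep(delay)
            delay *= 2  # Double the wait before the next attempt.


async def demo():
    attempts = 0

    async def flaky_call():
        # Stand-in for an LLM request that fails twice before succeeding.
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionError("simulated transient failure")
        return {"status": True}

    result = await call_with_retry(flaky_call, base_delay=0.01)
    return result, attempts


print(asyncio.run(demo()))
```

Only transient error types are retried here. Errors such as a malformed request would fail the same way on every attempt, so they propagate immediately.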

## 3.3 Workflow Integration (4 minutes)

### Creating a Complete Validation Pipeline

```python
class LCAValidationPipeline:
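    def __init__(self, config: dict | None = None):
        # LLMValidator requires a config; fall back to VALIDATION_CONFIG from Part 2
        self.validator = RobustLLMValidator(config or VALIDATION_CONFIG)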
        self.cache = {}
        
    async def process_epd(self, epd_data: dict) -> dict:
        """Complete EPD validation workflow."""
        try:
            # 1. Check cache
            cache_key = self._generate_cache_key(epd_data)
            if cache_key in self.cache:
                return self.cache[cache_key]
            
            # 2. Basic validation
            basic_result = await self.validator.safe_validate(epd_data)

            # safe_validate returns an error dict without a "validation" key on
            # failure, and validate_epd reports pass/fail in "overall_valid"
            # ("status" is the string "success" or "error", which is always truthy)
            basic_valid = basic_result.get('validation', {}).get('overall_valid', False)
            if not basic_valid:
                return basic_result
            
            # 3. Advanced validation if basic passes
            advanced_result = await self.validator.validate_complex_methodology(epd_data)
            
            # 4. Compile results
            final_result = {
                "basic_validation": basic_result,
                "advanced_validation": advanced_result,
                "overall_status": (
                    basic_valid and 
                    advanced_result['compliance_status']
                ),
                "timestamp": datetime.now().isoformat()
            }
            
            # 5. Cache result
            self.cache[cache_key] = final_result
            
            return final_result
            
        except Exception as e:
            return {
                "status": "error",
                "message": f"Pipeline error: {str(e)}",
                "timestamp": datetime.now().isoformat()
            }
    
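    def _generate_cache_key(self, data: dict) -> str:
        """Generate cache key from EPD data."""
        # One simple strategy: a canonical JSON dump, so equal EPDs map to the
        # same key. Swap in a hash or the declaration number if you prefer.
        # (Returning None here would make every EPD share one cache entry.)
        return json.dumps(data, sort_keys=True, default=str)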
```

### Example Usage

```python
async def validate_epd_collection():
    pipeline = LCAValidationPipeline()
    
    # Example EPDs with different challenges
    epds = [
        {
            "product_info": {"name": "Product A"},
            "impact_categories": {...},
            "methodology": {...}
        },
        {
            "product_info": {"name": "Product B"},
            "impact_categories": {...},  # Missing categories
            "methodology": {...}
        },
        {
            "product_info": {"name": "Product C"},
            "impact_categories": {...},
            "methodology": {...}  # Complex allocation
        }
    ]
    
    results = []
    for epd in epds:
        result = await pipeline.process_epd(epd)
        results.append(result)
    
    return results
```
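
The loop above validates one EPD at a time. Where the API's rate limits allow, the batch can run concurrently. This is a minimal sketch using `asyncio.gather` with a semaphore. `validate_batch`, `max_concurrent`, and `fake_process` are illustrative names, and `fake_process` stands in for `pipeline.process_epd`:

```python
import asyncio


async def validate_batch(epds: list[dict], process, max_concurrent: int = 3) -> list:
    """Run process(epd) for every EPD, with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(epd: dict):
        async with semaphore:
            return await process(epd)

    # gather preserves input order; return_exceptions=True returns failures
    # as values instead of raising the first one.
    return await asyncio.gather(*(guarded(e) for e in epds), return_exceptions=True)


async def fake_process(epd: dict) -> dict:
    await asyncio.sleep(0.01)  # Simulated API latency.
    return {"product": epd["product_info"]["name"], "status": "success"}


epds = [{"product_info": {"name": f"Product {c}"}} for c in "ABC"]
results = asyncio.run(validate_batch(epds, fake_process))
print(results)
```

Keeping `max_concurrent` small is the simplest way to stay under a provider's rate limit. Each result lines up with its EPD by position, so failures can be traced back to the input that caused them.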

## Success Criteria

Your implementation should:
- [ ] Handle complex validation scenarios
- [ ] Recover from API errors gracefully
- [ ] Provide confidence scores
- [ ] Cache validation results
- [ ] Generate detailed validation reports

## Common Advanced Challenges

1. **Rate Limiting**
   - Implement exponential backoff
   - Track API usage
   - Cache frequent validations

2. **Response Quality**
   - Validate LLM response structure
   - Handle incomplete responses
   - Cross-validate critical assessments

3. **Performance**
   - Optimize prompt length
   - Implement efficient caching
   - Use batch processing where possible

## Next Steps
- Add more sophisticated validation rules
- Implement validation result analytics
- Create custom validation reports
- Integrate with other LCA tools