# Validate Analysis Results

This notebook compares analysis outputs between **production runs** (baseline) and **test runs** to validate that tests produced expected results.

**Modes:**
- **Single validation**: Compare one production/test pair for one perspective (Section 3A)
- **Lookup test IDs**: Populate test analysis IDs by name + EDM lookup (Section 3B)
- **Batch validation**: Compare multiple pairs from CSV/XLSX, all perspectives (Section 3C)

**Perspectives validated (batch mode):** GR (Gross), GU (Ground-Up), RL (Reinsurance Layer)

**Endpoints compared:**
- Settings (analysis configuration)
- Statistics (`/stats`)
- EP Metrics (`/ep`)
- Event Loss Table (`/elt`)
- Period Loss Table (`/plt`) - HD analyses only

**Fields compared per endpoint:**

| Endpoint | Fields |
|----------|--------|
| Settings | engineType, engineVersion, analysisType, analysisMode, analysisFramework, modelProfile, outputProfile, eventRateSchemeNames, peril, perilCode, subPeril, region, regionCode, lossAmplification, insuranceType, vulnerabilityCurve, engineSubType, isMultiEvent |
| Statistics | epType, purePremium, totalStdDev, cv, netPurePremium, activation, exhaustion, totalLossRatio, limit, premium, netStdDev, exhaustAllReinstatements |
| EP Metrics | epType, value (contains returnPeriods and positionValues arrays) |
| ELT | eventId, sourceId, positionValue, stdDevI, stdDevC, expValue, rate, peril, region, oepWUC |
| PLT | periodId, eventId, weight, eventDate, lossDate, positionValue, peril, region |
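
The table above amounts to a per-endpoint allow-list: only the named fields are compared, and anything else in a record is ignored. A minimal sketch of that idea follows. `diff_fields` is a hypothetical function for illustration, not part of `helpers.analysis_results_validator`, and the record values are made up.

```python
def diff_fields(production, test, fields):
    """Return {field: (production_value, test_value)} for listed fields that differ."""
    differences = {}
    for field in fields:
        prod_value = production.get(field)
        test_value = test.get(field)
        if prod_value != test_value:
            differences[field] = (prod_value, test_value)
    return differences


# A subset of the Settings fields from the table above
SETTINGS_FIELDS = ["engineType", "engineVersion", "peril", "region"]

# Illustrative values only, not real analysis output
production = {"engineType": "HD", "engineVersion": "23.0", "peril": "WS", "region": "NA", "jobId": 1}
test = {"engineType": "HD", "engineVersion": "24.0", "peril": "WS", "region": "NA", "jobId": 2}

# jobId differs but is not in the field list, so it is not reported
print(diff_fields(production, test, SETTINGS_FIELDS))
```

Exact equality like this only suits categorical fields; numeric fields such as `purePremium` would more plausibly go through the tolerance settings in Section 2.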

## 1. Setup & Imports

In [None]:
%load_ext autoreload
%autoreload 2

from helpers.analysis_results_validator import (
    AnalysisResultsValidator,
    # Single validation output
    print_validation_summary,
    print_validation_details,
    # Batch validation output
    batch_results_to_dataframe,
    print_batch_summary,
    export_batch_failures_to_json,
    export_batch_summary_to_csv,
)

validator = AnalysisResultsValidator()
print("Setup complete.")

## 2. Configuration

**Common settings** apply to both single and batch validation.
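
To give a feel for the tolerance settings defined in the next cell, here is a rough sketch of a relative comparison and a rounded-difference comparison. How `AnalysisResultsValidator` actually combines `relative_tolerance`, `decimal_places`, and `max_diff` is not shown in this notebook, so the two hypothetical functions below are kept separate and are not the helper's implementation.

```python
import math


def relative_match(a, b, rel_tol=1e-4):
    """True if a and b differ by at most rel_tol relative to the larger magnitude."""
    return math.isclose(a, b, rel_tol=rel_tol)


def rounded_diff(a, b, decimal_places=0):
    """Absolute difference between the two values after rounding each one."""
    return abs(round(a, decimal_places) - round(b, decimal_places))


# A difference of 50 on a value near 1,000,000 is within 0.01%
print(relative_match(1_000_000.0, 1_000_050.0))

# A difference of 1 on a value near 1,000 is 0.1%, outside the tolerance
print(relative_match(1000.0, 1001.0))

# The rounded difference could then be checked against a cap such as MAX_DIFF
print(rounded_diff(1000.4, 1001.0))
```

A purely relative check never passes when one value is exactly zero and the other is not, which may be one reason for also having an absolute cap on the rounded difference.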

In [None]:
# === Common Settings ===

# Comparison settings
RELATIVE_TOLERANCE = 1e-4  # For floating-point comparison (0.01% relative difference)
DECIMAL_PLACES = 0         # Values must match when rounded to this many decimal places
MAX_DIFF = 100             # Maximum allowed difference between rounded values
MAX_DIFFERENCES_TO_SHOW = 10  # Limit output for large datasets

# Note: PLT comparison is automatically included for HD analyses only (based on engineType)

## 3A. Single Validation (One Pair, One Perspective)

Use this section to validate a single production/test pair for a specific perspective.

Enter the **appAnalysisId** values - these are the IDs shown in the Moody's RiskModeler UI (e.g., 35810).

In [None]:
# Single validation configuration
PRODUCTION_APP_ANALYSIS_ID = 1660  # Replace with your production analysis ID
TEST_APP_ANALYSIS_ID = 4987        # Replace with your test analysis ID
PERSPECTIVE_CODE = 'RL'            # 'GR' (Gross), 'GU' (Ground-Up), 'RL' (Reinsurance Layer)

# Run single validation
result = validator.validate(
    production_app_analysis_id=PRODUCTION_APP_ANALYSIS_ID,
    test_app_analysis_id=TEST_APP_ANALYSIS_ID,
    perspective_code=PERSPECTIVE_CODE,
    relative_tolerance=RELATIVE_TOLERANCE,
    decimal_places=DECIMAL_PLACES,
    max_diff=MAX_DIFF,
)

print_validation_summary(result)

### Single Validation Details (if failed)

In [None]:
print_validation_details(result, max_differences=MAX_DIFFERENCES_TO_SHOW)

## 3B. Lookup Test Analysis IDs (Optional)

Populate `test_app_analysis_id` by looking up analyses using `test_analysis_name` + `test_edm_name` columns.
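
With `SAVE_OUTPUT = True`, the cell below writes the result back to the same file it read, so it may be worth keeping a copy first. The sketch below is one way to do that using only the standard library. `backup_file` is a hypothetical function, not part of the notebook's helpers.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_file(path):
    """Copy `path` to a timestamped sibling file and return the backup path."""
    source = Path(path)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # e.g. results.xlsx -> results.<timestamp>.bak.xlsx (keeps the original extension)
    backup = source.with_name(f"{source.stem}.{stamp}.bak{source.suffix}")
    shutil.copy2(source, backup)
    return backup


# Possible usage before the save step (assumes the file exists):
# backup_path = backup_file(LOOKUP_FILE_PATH)
# print(f"Backup written to: {backup_path}")
```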

In [None]:
from helpers.irp_integration import IRPClient
import pandas as pd

# === Configuration ===
LOOKUP_FILE_PATH = 'analysis_results/big_config_ALL_Inc_USIF_results.xlsx'
SAVE_OUTPUT = True  # If True, overwrite LOOKUP_FILE_PATH with the populated IDs

# Load and process
df = pd.read_excel(LOOKUP_FILE_PATH)
if 'test_app_analysis_id' not in df.columns:
    df['test_app_analysis_id'] = pd.NA

client = IRPClient()
found, not_found, errors = 0, 0, 0

for idx, row in df.iterrows():
    analysis_name = row.get('test_analysis_name')
    edm_name = row.get('test_edm_name')
    # Skip rows without lookup keys, and rows that already have an ID
    if pd.isna(analysis_name) or pd.isna(edm_name) or pd.notna(row.get('test_app_analysis_id')):
        continue
    try:
        analysis = client.analysis.get_analysis_by_name(str(analysis_name).strip(), str(edm_name).strip())
        if app_id := analysis.get('appAnalysisId'):
            df.at[idx, 'test_app_analysis_id'] = int(app_id)
            found += 1
            print(f"[OK] {analysis_name} -> {app_id}")
        else:
            # The lookup returned no appAnalysisId; report it instead of skipping silently
            not_found += 1
            print(f"[NOT FOUND] {analysis_name} (EDM: {edm_name})")
    except Exception as e:
        errors += 1
        print(f"[ERROR] {analysis_name}: {e}")

print(f"\nFound: {found}, Not found: {not_found}, Errors: {errors}")

if SAVE_OUTPUT:
    df.to_excel(LOOKUP_FILE_PATH, index=False)
    print(f"Saved to: {LOOKUP_FILE_PATH}")

## 3C. Batch Validation (Multiple Pairs, All Perspectives)

Use this section to validate multiple production/test pairs from a CSV or XLSX file.
All three perspectives (GR, GU, RL) are validated automatically.
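
As a rough picture of what "all three perspectives" means for run size, each input row expands into three comparisons. The sketch below shows that expansion and prints progress in the same `[current/total] name - perspective` format as the progress callback in this section. `iter_work_items` is a hypothetical function, and whether the helper counts `total` this way is an assumption.

```python
from itertools import product

PERSPECTIVES = ("GR", "GU", "RL")


def iter_work_items(pairs):
    """Yield (current, total, pair, perspective) for every pair/perspective combination."""
    items = list(product(pairs, PERSPECTIVES))
    total = len(items)
    for current, (pair, perspective) in enumerate(items, start=1):
        yield current, total, pair, perspective


# Two rows taken from the example file format in this section
pairs = [
    {"production_app_analysis_id": 1575, "test_app_analysis_id": 4342, "name": "USFL_Other"},
    {"production_app_analysis_id": 1576, "test_app_analysis_id": 4343, "name": "USEQ_Primary"},
]

for current, total, pair, perspective in iter_work_items(pairs):
    print(f"[{current}/{total}] {pair['name']} - {perspective}")
```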

**File format (CSV or XLSX):**
```
production_app_analysis_id,test_app_analysis_id,name
1575,4342,USFL_Other
1576,4343,USEQ_Primary
1577,4344,USHU_Full
```

Columns:
- `production_app_analysis_id`: App analysis ID for production (baseline)
- `test_app_analysis_id`: App analysis ID for test run
- `name` (optional): Label for this analysis pair
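
Before starting a long batch run, it can help to confirm that the file has the two ID columns. The sketch below checks a CSV header using only the standard library. `missing_columns` is a hypothetical function, and the batch helper may well perform its own validation.

```python
import csv
import io

REQUIRED_COLUMNS = {"production_app_analysis_id", "test_app_analysis_id"}


def missing_columns(header):
    """Return the required column names absent from `header`, sorted alphabetically."""
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS - present)


# In-memory stand-in for a CSV file, using the first row of the example above
sample = io.StringIO(
    "production_app_analysis_id,test_app_analysis_id,name\n"
    "1575,4342,USFL_Other\n"
)
reader = csv.DictReader(sample)

missing = missing_columns(reader.fieldnames)
if missing:
    print(f"Missing required columns: {missing}")
else:
    print("All required columns present.")
```

For an XLSX file, the same check could be applied to `pd.read_excel(...).columns`.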

In [None]:
# Path to CSV or XLSX file with analysis pairs
FILE_PATH = 'analysis_results/big_config_ALL_Inc_USIF_results.xlsx'  # Replace with your file path (.csv or .xlsx)

# Progress callback (optional)
def show_progress(current, total, name, perspective):
    print(f"[{current}/{total}] {name} - {perspective}")

# Run batch validation (validates all perspectives: GR, GU, RL)
# PLT is automatically included for HD analyses only
batch_result = validator.validate_batch_from_file(
    file_path=FILE_PATH,
    relative_tolerance=RELATIVE_TOLERANCE,
    decimal_places=DECIMAL_PLACES,
    max_diff=MAX_DIFF,
    progress_callback=show_progress,
)

print()
print_batch_summary(batch_result)

### Batch Results Summary Table

In [None]:
# Display results as a DataFrame
# (named results_df so it does not overwrite the lookup `df` from Section 3B)
results_df = batch_results_to_dataframe(batch_result)
results_df

### Export Results (Optional)

Export the summary to CSV and detailed failures to JSON.

In [None]:
# Export summary to CSV (saved to validation_outputs folder)
summary_path = export_batch_summary_to_csv(batch_result)
print(f"Summary exported to: {summary_path}")

# Export detailed failures to JSON (only if there are failures)
if batch_result.failed_count > 0:
    failures_path = export_batch_failures_to_json(
        batch_result,
        max_differences=MAX_DIFFERENCES_TO_SHOW
    )
    print(f"Failure details exported to: {failures_path}")
else:
    print("No failures to export.")