# 02 - Generic Transformations & Business Logic Testing

Tests for **core transformation functions** that power the pipeline.

## Modules Tested
1. **`utils/transformations/base/generic_transforms.py`**
   - `apply_conditional_transform()` - Build a column from config-driven when/otherwise rules
   - `apply_business_filters()` - Filter DataFrame using config rules
   - `apply_transformations()` - Apply a series of transformations

2. **`utils/transformations/operations/business_logic.py`**
   - `extract_capitals()` - Extract SMP/LCI from label/amount fields
   - `calculate_movements()` - Calculate AFN, RES, RPT, RPC, NBPTF
   - `calculate_exposures()` - Calculate expo_ytd and expo_gli

3. **`utils/transformations/base/column_operations.py`**
   - `lowercase_all_columns()` - Lowercase all column names
   - `apply_column_config()` - Apply passthrough/rename/computed/init column rules

---

## Setup

In [None]:
import sys
from pathlib import Path

# Make the project root importable (assumes this notebook sits one directory below it)
project_root = Path().absolute().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")

In [None]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType

# Local single-core session; one shuffle partition keeps these small tests fast
spark = SparkSession.builder \
    .appName("UtilsTransformTesting") \
    .master("local[1]") \
    .config("spark.sql.shuffle.partitions", "1") \
    .getOrCreate()

print(f"✓ Spark {spark.version} session created")

In [None]:
from utils.transformations.base.generic_transforms import (
    apply_conditional_transform,
    apply_business_filters,
    apply_transformations
)
from utils.transformations.base.column_operations import (
    lowercase_all_columns,
    apply_column_config
)
from utils.transformations.operations.business_logic import (
    extract_capitals,
    calculate_movements,
    calculate_exposures
)
from utils.helpers import compute_date_ranges

print("✓ Transformation functions imported successfully")

---
## 1. Column Operations Testing

In [None]:
# Test lowercase_all_columns
print("Testing lowercase_all_columns:")
print("-" * 60)

# Create test DataFrame with mixed case columns
df_mixed = spark.createDataFrame([
    ("A001", "Policy1", 1000),
    ("A002", "Policy2", 2000)
], ["NOPOL", "PolicyName", "Amount"])

print("Original columns:", df_mixed.columns)

# Apply lowercase
df_lower = lowercase_all_columns(df_mixed)
print("After lowercase:", df_lower.columns)
print("✓ All columns lowercase" if all(c.islower() for c in df_lower.columns) else "✗ Some columns not lowercase")

df_lower.show(3, truncate=False)
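
Lowercasing can map two distinct source names (for example `NOPOL` and `nopol`) onto the same name, leaving duplicate columns that Spark cannot resolve unambiguously. Whether `lowercase_all_columns` guards against this is not shown here. The sketch below is a hypothetical, Spark-free pre-check (`find_case_collisions` is not part of `utils`) that could be run on `df_mixed.columns` before lowercasing.

```python
from collections import defaultdict


def find_case_collisions(columns):
    """Hypothetical pre-check: group column names that become identical once lowercased."""
    groups = defaultdict(list)
    for name in columns:
        groups[name.lower()].append(name)
    # Keep only the lowercased names shared by two or more source columns
    return {lower: names for lower, names in groups.items() if len(names) > 1}


print(find_case_collisions(["NOPOL", "PolicyName", "Amount"]))  # {}
print(find_case_collisions(["NOPOL", "nopol", "Amount"]))       # {'nopol': ['NOPOL', 'nopol']}
```

For the test DataFrame above, `find_case_collisions(df_mixed.columns)` should return an empty dict.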

In [None]:
# Test apply_column_config
print("\nTesting apply_column_config:")
print("-" * 60)

# Create test DataFrame
df_test = spark.createDataFrame([
    ("P001", 100, None, 50),
    ("P002", 200, 10, 75)
], ["nopol", "mtprprto", "txcede", "prcdci"])

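# Config sections: passthrough keeps columns as-is, rename maps old -> new names,
# computed derives new columns, init adds constant columns as (value, type)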
config = {
    'passthrough': ['nopol', 'mtprprto'],
    'rename': {'prcdci': 'prcie'},
    'computed': {
        'tx': {
            'type': 'coalesce_default',
            'source_col': 'txcede',
            'default': 0
        }
    },
    'init': {
        'dircom': ('AZ', StringType())
    }
}

# The positional args after config appear to be the period (YYYYMM), its year and its month
df_configured = apply_column_config(df_test, config, '202509', 2025, 9)

print("Columns after config:")
print(f"  {df_configured.columns}")
print("\nData:")
df_configured.show(truncate=False)
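
This cell only prints the result. To turn it into a check, the expected column list can be derived from the config itself. The sketch below is a hypothetical helper (`expected_columns` is not part of `utils`) and assumes `apply_column_config` outputs exactly the passthrough columns, the rename targets, the computed columns and the init columns. If the real function also keeps unlisted source columns (such as `txcede`), a subset test is the safer comparison.

```python
def expected_columns(config):
    """Hypothetical model of the columns apply_column_config should output."""
    cols = list(config.get("passthrough", []))
    cols += list(config.get("rename", {}).values())  # new names replace the source names
    cols += list(config.get("computed", {}).keys())
    cols += list(config.get("init", {}).keys())
    return cols


demo_config = {
    "passthrough": ["nopol", "mtprprto"],
    "rename": {"prcdci": "prcie"},
    "computed": {"tx": {"type": "coalesce_default", "source_col": "txcede", "default": 0}},
    # The notebook pairs 'AZ' with StringType(); None keeps this sketch free of Spark imports
    "init": {"dircom": ("AZ", None)},
}

print(sorted(expected_columns(demo_config)))  # ['dircom', 'mtprprto', 'nopol', 'prcie', 'tx']
```

In the notebook the check would read `set(df_configured.columns) == set(expected_columns(config))`, or `set(expected_columns(config)) <= set(df_configured.columns)` if extra columns are kept.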

---
## 2. Conditional Transform Testing

In [None]:
# Test apply_conditional_transform - coassurance logic
print("Testing apply_conditional_transform - Coassurance classification:")
print("-" * 60)

# Create test data
df_coass = spark.createDataFrame([
    ("P001", "1", "3"),  # APERITION
    ("P002", "1", "4"),  # COASS. ACCEPTEE
    ("P003", "1", "8"),  # ACCEPTATION INTERNATIONALE
    ("P004", "1", "9"),  # AUTRES (code not in 3/4/5/6/8)
    ("P005", "0", None),  # SANS COASSURANCE (cdpolgp1 != '1')
], ["nopol", "cdpolgp1", "cdcoas"])

# Define coassurance config: rules are tried in order; rows matching none get 'default'
coass_config = {
    'conditions': [
        {'col': 'cdpolgp1', 'op': '==', 'value': '1', 'and_col': 'cdcoas', 'and_in': ['3', '6'], 'result': 'APERITION'},
        {'col': 'cdpolgp1', 'op': '==', 'value': '1', 'and_col': 'cdcoas', 'and_in': ['4', '5'], 'result': 'COASS. ACCEPTEE'},
        {'col': 'cdpolgp1', 'op': '==', 'value': '1', 'and_col': 'cdcoas', 'and_in': ['8'], 'result': 'ACCEPTATION INTERNATIONALE'},
        {'col': 'cdpolgp1', 'op': '==', 'value': '1', 'and_col': 'cdcoas', 'and_not_in': ['3', '4', '5', '6', '8'], 'result': 'AUTRES'}
    ],
    'default': 'SANS COASSURANCE'
}

df_result = apply_conditional_transform(df_coass, 'coass', coass_config)

print("Coassurance classification results:")
df_result.select('nopol', 'cdpolgp1', 'cdcoas', 'coass').show(truncate=False)

# Verify each case (keyed by nopol so the check does not depend on row order)
expected = {
    'P001': 'APERITION',
    'P002': 'COASS. ACCEPTEE',
    'P003': 'ACCEPTATION INTERNATIONALE',
    'P004': 'AUTRES',
    'P005': 'SANS COASSURANCE',
}
actual = {row['nopol']: row['coass'] for row in df_result.select('nopol', 'coass').collect()}
print("\nVerifications:")
for nopol, label in expected.items():
    got = actual.get(nopol)
    print(f"  ✓ {nopol}: {label}" if got == label else f"  ✗ {nopol}: {got} (expected {label})")
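
A small pure-Python model of the rules gives an independent oracle for the expected labels. It assumes the config compiles to a `when(...).otherwise(...)` chain, so conditions are tried in order and the first match wins, and it mirrors SQL semantics where `IN`/`NOT IN` on a NULL value is never true. Only the `==` operator used in this notebook is modelled; `classify` is a hypothetical name, not part of `utils`.

```python
def classify(row, config):
    """Hypothetical model of a when/otherwise chain: the first matching condition wins."""
    for cond in config["conditions"]:
        if cond.get("op", "==") != "==":
            raise NotImplementedError(f"operator {cond['op']!r} is not modelled")
        if row.get(cond["col"]) != cond["value"]:
            continue
        if "and_col" in cond:
            other = row.get(cond["and_col"])
            if other is None:
                continue  # SQL: IN / NOT IN on NULL is never true, so the rule does not fire
            if "and_in" in cond and other not in cond["and_in"]:
                continue
            if "and_not_in" in cond and other in cond["and_not_in"]:
                continue
        return cond["result"]
    return config["default"]


COASS_CONFIG = {
    "conditions": [
        {"col": "cdpolgp1", "op": "==", "value": "1", "and_col": "cdcoas", "and_in": ["3", "6"], "result": "APERITION"},
        {"col": "cdpolgp1", "op": "==", "value": "1", "and_col": "cdcoas", "and_in": ["4", "5"], "result": "COASS. ACCEPTEE"},
        {"col": "cdpolgp1", "op": "==", "value": "1", "and_col": "cdcoas", "and_in": ["8"], "result": "ACCEPTATION INTERNATIONALE"},
        {"col": "cdpolgp1", "op": "==", "value": "1", "and_col": "cdcoas", "and_not_in": ["3", "4", "5", "6", "8"], "result": "AUTRES"},
    ],
    "default": "SANS COASSURANCE",
}

rows = [("P001", "1", "3"), ("P002", "1", "4"), ("P003", "1", "8"), ("P004", "1", "9"), ("P005", "0", None)]
model_labels = {nopol: classify({"cdpolgp1": g, "cdcoas": c}, COASS_CONFIG) for nopol, g, c in rows}
for nopol, label in model_labels.items():
    print(nopol, label)
```

Comparing `model_labels` with `{r['nopol']: r['coass'] for r in df_result.collect()}` flags any row where the Spark result diverges from the model.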

---
## 3. Business Filters Testing

In [None]:
# Test apply_business_filters
print("Testing apply_business_filters - Construction market filters:")
print("-" * 60)

# Create test data
df_filters = spark.createDataFrame([
    ("P001", "6", "2", "R", "01", "H90061"),  # Should be EXCLUDED (bad intermed)
    ("P002", "6", "2", "R", "01", "123456"),  # Should PASS
    ("P003", "5", "2", "R", "01", "123456"),  # Should be EXCLUDED (wrong market)
    ("P004", "6", "1", "R", "01", "123456"),  # Should be EXCLUDED (wrong segment)
    ("P005", "6", "2", "X", "01", "123456"),  # Should be EXCLUDED (bad nature)
    ("P006", "6", "2", "R", "4", "123456"),   # Should be EXCLUDED (bad status)
], ["nopol", "cmarch", "cseg", "cdnatp", "cdsitp", "noint"])

print(f"Original row count: {df_filters.count()}")

# Define filter config
filter_config = {
    'description': 'Construction market filters',
    'filters': [
        {'type': 'equals', 'column': 'cmarch', 'value': '6', 'description': 'Construction market only'},
        {'type': 'equals', 'column': 'cseg', 'value': '2', 'description': 'Segment 2 only'},
        {'type': 'in', 'column': 'cdnatp', 'values': ['R', 'O', 'T', 'C'], 'description': 'Valid natures'},
        {'type': 'not_in', 'column': 'cdsitp', 'values': ['4', '5'], 'description': 'Exclude status 4,5'},
        {'type': 'not_in', 'column': 'noint', 'values': ['H90061'], 'description': 'Exclude bad intermed'}
    ]
}

df_filtered = apply_business_filters(df_filters, filter_config)

print(f"Filtered row count: {df_filtered.count()}")
print("\nRemaining policies:")
df_filtered.select('nopol', 'cmarch', 'cseg', 'cdnatp', 'cdsitp', 'noint').show(truncate=False)

# Verify that only P002 passes all filters
remaining = [row['nopol'] for row in df_filtered.collect()]
print("\nVerification:")
print(f"  ✓ Only P002 remains" if remaining == ['P002'] else f"  ✗ Wrong policies: {remaining}")
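
When a filter test fails, it helps to know which rule removed each row. The sketch below is a hypothetical pure-Python model of the filter config (`first_failing_filter` is not part of `utils`). It assumes the filters are ANDed and each becomes a plain Spark predicate, so `not_in` on a NULL value drops the row just as `~col.isin(...)` does.

```python
def first_failing_filter(row, filters):
    """Hypothetical model: return the description of the first filter a row fails, or None."""
    for f in filters:
        value = row.get(f["column"])
        if f["type"] == "equals":
            ok = value == f["value"]
        elif f["type"] == "in":
            ok = value in f["values"]
        elif f["type"] == "not_in":
            # Like ~col.isin(...) in Spark, NOT IN on a NULL is not true, so the row is dropped
            ok = value is not None and value not in f["values"]
        else:
            raise ValueError(f"unsupported filter type: {f['type']!r}")
        if not ok:
            return f.get("description", f["type"])
    return None


FILTERS = [
    {"type": "equals", "column": "cmarch", "value": "6", "description": "Construction market only"},
    {"type": "equals", "column": "cseg", "value": "2", "description": "Segment 2 only"},
    {"type": "in", "column": "cdnatp", "values": ["R", "O", "T", "C"], "description": "Valid natures"},
    {"type": "not_in", "column": "cdsitp", "values": ["4", "5"], "description": "Exclude status 4,5"},
    {"type": "not_in", "column": "noint", "values": ["H90061"], "description": "Exclude bad intermed"},
]
columns = ["nopol", "cmarch", "cseg", "cdnatp", "cdsitp", "noint"]
data = [
    ("P001", "6", "2", "R", "01", "H90061"),
    ("P002", "6", "2", "R", "01", "123456"),
    ("P003", "5", "2", "R", "01", "123456"),
    ("P004", "6", "1", "R", "01", "123456"),
    ("P005", "6", "2", "X", "01", "123456"),
    ("P006", "6", "2", "R", "4", "123456"),
]
reasons = {r[0]: first_failing_filter(dict(zip(columns, r)), FILTERS) for r in data}
for nopol, reason in reasons.items():
    print(nopol, "kept" if reason is None else f"dropped: {reason}")
```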

---
## 4. Capital Extraction Testing

In [None]:
# Test extract_capitals
print("Testing extract_capitals - SMP and LCI extraction:")
print("-" * 60)

# Create test data with capital labels and amounts
df_capitals = spark.createDataFrame([
    ("P001", "SMP GLOBAL DU CONTRAT", 1000000.0, "LCI GLOBAL DU CONTRAT", 500000.0),
    ("P002", "Capital quelconque", 0.0, "LCI GLOBALE", 750000.0),
    ("P003", "SMP RETENU", 2000000.0, "Autre capital", 0.0),
], ["nopol", "lbcapi1", "mtcapi1", "lbcapi2", "mtcapi2"])

# Define capital extraction config
capital_config = {
    'smp_100': {
        'keywords': ['SMP GLOBAL DU CONTRAT', 'SMP RETENU'],
        'exclude_keywords': [],
        'label_prefix': 'lbcapi',
        'amount_prefix': 'mtcapi',
        'num_indices': 2
    },
    'lci_100': {
        'keywords': ['LCI GLOBAL DU CONTRAT', 'LCI GLOBALE'],
        'exclude_keywords': [],
        'label_prefix': 'lbcapi',
        'amount_prefix': 'mtcapi',
        'num_indices': 2
    }
}

df_extracted = extract_capitals(df_capitals, capital_config)

print("Extracted capitals:")
df_extracted.select('nopol', 'smp_100', 'lci_100').show(truncate=False)

# Verify results (keyed by nopol so the check does not depend on row order)
expected = {
    ('P001', 'smp_100'): 1000000.0,
    ('P001', 'lci_100'): 500000.0,
    ('P002', 'smp_100'): 0.0,  # no SMP label matches
    ('P002', 'lci_100'): 750000.0,
    ('P003', 'smp_100'): 2000000.0,
}
actual = {row['nopol']: row for row in df_extracted.select('nopol', 'smp_100', 'lci_100').collect()}
print("\nVerifications:")
for (nopol, field), value in expected.items():
    got = actual[nopol][field]
    print(f"  ✓ {nopol} {field}={value:.0f}" if got == value else f"  ✗ {nopol} {field}={got} (expected {value:.0f})")
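
A pure-Python model of the extraction makes the expected values explicit. Two behaviours are assumptions rather than facts about `extract_capitals`: labels are matched case-insensitively by substring, and when several indices match, the largest amount wins. The 0.0 fallback mirrors the notebook's expectation for P002's SMP. `extract_capital` is a hypothetical name.

```python
def extract_capital(row, spec):
    """Hypothetical model of one capital extraction (e.g. smp_100 or lci_100)."""
    best = 0.0  # fallback when no label matches
    for i in range(1, spec["num_indices"] + 1):
        label = (row.get(f"{spec['label_prefix']}{i}") or "").upper()
        amount = row.get(f"{spec['amount_prefix']}{i}") or 0.0
        hit = any(k.upper() in label for k in spec["keywords"])
        excluded = any(k.upper() in label for k in spec.get("exclude_keywords", []))
        if hit and not excluded:
            best = max(best, amount)  # assumption: the largest matching amount wins
    return best


COMMON = {"exclude_keywords": [], "label_prefix": "lbcapi", "amount_prefix": "mtcapi", "num_indices": 2}
SPECS = {
    "smp_100": {"keywords": ["SMP GLOBAL DU CONTRAT", "SMP RETENU"], **COMMON},
    "lci_100": {"keywords": ["LCI GLOBAL DU CONTRAT", "LCI GLOBALE"], **COMMON},
}
columns = ["nopol", "lbcapi1", "mtcapi1", "lbcapi2", "mtcapi2"]
data = [
    ("P001", "SMP GLOBAL DU CONTRAT", 1000000.0, "LCI GLOBAL DU CONTRAT", 500000.0),
    ("P002", "Capital quelconque", 0.0, "LCI GLOBALE", 750000.0),
    ("P003", "SMP RETENU", 2000000.0, "Autre capital", 0.0),
]
model_caps = {
    r[0]: {name: extract_capital(dict(zip(columns, r)), spec) for name, spec in SPECS.items()}
    for r in data
}
for nopol, caps in model_caps.items():
    print(nopol, caps)
```

Under this model P003's `lci_100` is 0.0, a value the notebook does not currently check.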

---
## 5. Movement Calculations Testing

In [None]:
# Test calculate_movements
print("Testing calculate_movements - AFN, RES, NBPTF:")
print("-" * 60)

# Get date ranges for 202509
dates = compute_date_ranges("202509")

# Create test data
df_movements = spark.createDataFrame([
    ("P001", "2025-01-15", "2025-01-15", None, None, None, "R", "1", 1000.0),  # AFN
    ("P002", "2024-01-01", "2024-01-01", "2025-05-20", None, None, "R", "1", 2000.0),  # RES
    ("P003", "2020-01-01", "2020-01-01", None, None, None, "R", "1", 3000.0),  # NBPTF
], ["nopol", "dtcrepol", "dteffan", "dtresilp", "dttraan", "dttraar", "cdnatp", "cdsitp", "primeto"])

# Column mapping
column_mapping = {
    'creation_date': 'dtcrepol',
    'effective_date': 'dteffan',
    'termination_date': 'dtresilp',
    'transfer_start': 'dttraan',
    'transfer_end': 'dttraar',
    'type_col1': None,
    'type_col2': None,
    'type_col3': None
}

# calculate_movements also expects a `cssseg` column (used for the Portugal case); set a constant value
df_movements = df_movements.withColumn('cssseg', lit('1'))

df_mvts = calculate_movements(df_movements, dates, 2025, column_mapping)

print("Movement indicators:")
df_mvts.select('nopol', 'nbafn', 'nbres', 'nbptf', 'primeto').show(truncate=False)

# Verify (keyed by nopol so the check does not depend on row order)
checks = [('P001', 'nbafn', 'AFN'), ('P002', 'nbres', 'RES'), ('P003', 'nbptf', 'NBPTF')]
actual = {row['nopol']: row for row in df_mvts.select('nopol', 'nbafn', 'nbres', 'nbptf').collect()}
print("\nVerifications:")
for nopol, flag, name in checks:
    got = actual[nopol][flag]
    print(f"  ✓ {nopol} is {name} ({flag}=1)" if got == 1 else f"  ✗ {nopol} {flag}={got}")
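
The three rows each exercise one indicator. The sketch below is a deliberately simplified, hypothetical model of those indicators using dates alone, with the 202509 window assumed to run from 1 January to 30 September 2025. The real `calculate_movements` may apply further conditions (nature, status, transfer dates, `cssseg`) that are not modelled here, and `movement_flags` is not part of `utils`. Under this model P001 is both a new business and in the portfolio, so checking a single flag per row does not show that the other flags are zero.

```python
from datetime import date


def movement_flags(created, terminated, start, end):
    """Simplified, hypothetical date-only model of the AFN / RES / NBPTF indicators."""
    nbafn = int(start <= created <= end)                                    # created in the window
    nbres = int(terminated is not None and start <= terminated <= end)      # terminated in the window
    nbptf = int(created <= end and (terminated is None or terminated > end))  # live at window end
    return {"nbafn": nbafn, "nbres": nbres, "nbptf": nbptf}


START, END = date(2025, 1, 1), date(2025, 9, 30)  # assumed year-to-date window for 202509
cases = {
    "P001": (date(2025, 1, 15), None),
    "P002": (date(2024, 1, 1), date(2025, 5, 20)),
    "P003": (date(2020, 1, 1), None),
}
flags = {p: movement_flags(c, t, START, END) for p, (c, t) in cases.items()}
for nopol, f in flags.items():
    print(nopol, f)
```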

---
## 6. Exposure Calculations Testing

In [None]:
# Test calculate_exposures
print("Testing calculate_exposures - expo_ytd and expo_gli:")
print("-" * 60)

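# Create test data: three contracts with different exposure windows
# (reuses `dates` computed in the movements cell above)
df_exposure = spark.createDataFrame([
    ("P001", "2025-01-01", None, "R", "1"),  # Active since 1 January (full year to date)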
    ("P002", "2025-05-01", None, "R", "1"),  # Partial year (from May)
    ("P003", "2024-01-01", "2025-03-31", "R", "3"),  # Terminated in March
], ["nopol", "dtcrepol", "dtresilp", "cdnatp", "cdsitp"])

# Column mapping
exposure_mapping = {
    'creation_date': 'dtcrepol',
    'termination_date': 'dtresilp'
}

df_expo = calculate_exposures(df_exposure, dates, 2025, exposure_mapping)

print("Exposure calculations:")
df_expo.select('nopol', 'dtcrepol', 'dtresilp', 'expo_ytd', 'expo_gli').show(truncate=False)

results = df_expo.select('nopol', 'expo_ytd', 'expo_gli').collect()
print("\nVerifications (approximate):")
print(f"  P001 expo_ytd: {results[0]['expo_ytd']:.4f} (expect ~0.75 for 9 months)")
print(f"  P001 expo_gli: {results[0]['expo_gli']:.4f} (expect ~1.0 for full month)")
print(f"  P002 expo_ytd: {results[1]['expo_ytd']:.4f} (expect ~0.42 for 5 months)")
print(f"  P003 expo_ytd: {results[2]['expo_ytd']:.4f} (expect ~0.25 for Q1 only)")
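
The targets printed above are approximate. One way to pin them down is to compute them from an explicit day-count convention. The sketch assumes `expo_ytd` is the number of active days between 1 January and the end of the closing month divided by the days in the year, and `expo_gli` is the number of active days within the closing month divided by the days in that month, both counted inclusively. These conventions are assumptions consistent with the notebook's stated targets, not taken from `calculate_exposures`; `overlap_days` and `exposures` are hypothetical names.

```python
import calendar
from datetime import date


def overlap_days(start, end, win_start, win_end):
    """Inclusive number of days of [start, end] inside [win_start, win_end]; end=None means still active."""
    lo = max(start, win_start)
    hi = min(end or win_end, win_end)
    return max((hi - lo).days + 1, 0)


def exposures(created, terminated, year, month):
    """Hypothetical model: (expo_ytd, expo_gli) for a closing month, rounded to 4 decimals."""
    month_start = date(year, month, 1)
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    days_in_year = 366 if calendar.isleap(year) else 365
    ytd = overlap_days(created, terminated, date(year, 1, 1), month_end) / days_in_year
    gli = overlap_days(created, terminated, month_start, month_end) / month_end.day
    return round(ytd, 4), round(gli, 4)


cases = {
    "P001": (date(2025, 1, 1), None),
    "P002": (date(2025, 5, 1), None),
    "P003": (date(2024, 1, 1), date(2025, 3, 31)),
}
model_expo = {p: exposures(c, t, 2025, 9) for p, (c, t) in cases.items()}
for nopol, (ytd, gli) in model_expo.items():
    print(f"{nopol}: expo_ytd={ytd:.4f} expo_gli={gli:.4f}")
```

Under these conventions the `expo_ytd` targets become 0.7479 (273 days), 0.4192 (153 days) and 0.2466 (90 days). A different convention in `calculate_exposures`, such as an exclusive end date, would shift them slightly, so a comparison tolerance such as `1e-3` is prudent.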

---
## Summary

This notebook tested:
- ✅ Column operations (lowercase, config application)
- ✅ Conditional transforms (coassurance classification)
- ✅ Business filters (construction market filtering)
- ✅ Capital extraction (SMP/LCI from label fields)
- ✅ Movement calculations (AFN, RES, NBPTF)
- ✅ Exposure calculations (expo_ytd, expo_gli)

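Each test prints its results next to ✓/✗ checks against the expected outcomes; the exposure cell only prints approximate targets. Failed checks do not raise, so the results still rely on **visual inspection**.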
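
Because every check only prints ✓ or ✗, a failing cell still runs to completion. A small collector like the hypothetical `CheckLog` below (not part of `utils`) keeps the same style of printed output but raises at the end of a cell when any check failed, which would let the notebook double as an automated smoke test.

```python
class CheckLog:
    """Hypothetical collector: prints ✓/✗ per check and can fail loudly at the end of a cell."""

    def __init__(self):
        self.failures = []

    def check(self, label, actual, expected):
        ok = actual == expected
        mark = "✓" if ok else "✗"
        suffix = "" if ok else f" (expected {expected!r})"
        print(f"  {mark} {label}: {actual!r}{suffix}")
        if not ok:
            self.failures.append(label)
        return ok

    def raise_if_failed(self):
        if self.failures:
            raise AssertionError(f"{len(self.failures)} check(s) failed: {', '.join(self.failures)}")


log = CheckLog()
log.check("P001 coass", "APERITION", "APERITION")
log.check("P002 nbres", 1, 1)
log.raise_if_failed()  # does not raise: both checks succeeded
```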