# Capitaux 03: Full Pipeline Testing

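**Purpose**: Test the complete Capitaux pipeline (AZ + AZEC → Silver). Each processor is exercised through `read()` and `transform()`; the Silver write is optional (section 5).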

**Tests**:
1. Run AZCapitauxProcessor
2. Run AZECCapitauxProcessor
3. Verify output datasets

---

In [1]:
import sys
from pathlib import Path

project_root = Path.cwd().parent.parent
sys.path.insert(0, str(project_root))
print(f"Project root: {project_root}")

Project root: /workspace/new_python


In [2]:
from pyspark.sql import SparkSession
# from azfr_fsspec_utils import fspath
# import azfr_fsspec_abfs

# azfr_fsspec_abfs.use()

spark = SparkSession.builder \
    .appName("Capitaux_Pipeline_Testing") \
    .getOrCreate()

print(f"✓ Spark {spark.version}")

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/12/17 20:59:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


✓ Spark 3.4.4


## 1. Initialize Processors

In [3]:
from utils.loaders.config_loader import ConfigLoader
from utils.logger import PipelineLogger
from src.processors.capitaux_processors.az_capitaux_processor import AZCapitauxProcessor
from src.processors.capitaux_processors.azec_capitaux_processor import AZECCapitauxProcessor

config = ConfigLoader(str(project_root / "config" / "config.yml"))
logger = PipelineLogger("capitaux_test")

VISION = "202509"
print(f"Testing pipeline for vision: {VISION}")

Testing pipeline for vision: 202509
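
The vision value is passed to every processor call, so a malformed string would only surface once a read fails. Below is a minimal sketch of an up-front check. It assumes the vision follows a `YYYYMM` layout, which is inferred from the value `202509` and not confirmed by the project. `parse_vision` is a hypothetical helper, not part of the codebase.

```python
from datetime import datetime


def parse_vision(vision: str) -> tuple[int, int]:
    """Return (year, month) for a vision string, assuming a YYYYMM layout."""
    # Reject anything that is not exactly six ASCII digits before parsing.
    if len(vision) != 6 or not (vision.isascii() and vision.isdigit()):
        raise ValueError(f"Vision must be six digits (YYYYMM), got {vision!r}")
    # strptime raises ValueError for an impossible month such as 13.
    parsed = datetime.strptime(vision, "%Y%m")
    return parsed.year, parsed.month


year, month = parse_vision("202509")
print(f"Vision year={year}, month={month}")
```

Calling this right after `VISION` is defined would fail fast on a typo, before any Spark job starts.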


## 2. Run AZ Capitaux Processor

In [4]:
try:
    az_processor = AZCapitauxProcessor(spark, config, logger)
    
    # Call read() and transform() separately; run() would also write the output
    print("Step 1: Reading AZ bronze data...")
    df_az = az_processor.read(VISION)
    print(f"✓ Read: {df_az.count():,} rows")
    
    print("\nStep 2: Transforming AZ data...")
    df_az_transformed = az_processor.transform(df_az, VISION)
    print(f"✓ AZ Capitaux: {df_az_transformed.count():,} rows")
    
    # Show sample
    df_az_transformed.select('nopol', 'smp_100_ind', 'lci_100_ind').show(5)
    
except Exception as e:
    print(f"✗ AZ Processor error: {e}")
    import traceback
    traceback.print_exc()
    df_az_transformed = None

2025-12-17 20:59:35 - capitaux_test - INFO - AZ Capitaux Processor initialized
Step 1: Reading AZ bronze data...
2025-12-17 20:59:35 - capitaux_test - INFO - Reading AZ capital data for vision 202509
2025-12-17 20:59:40 - capitaux_test - INFO - ✓ SUCCESS: Read 30,000 records from bronze (AZ)
✓ Read: 30,000 rows

Step 2: Transforming AZ data...
2025-12-17 20:59:41 - capitaux_test - INFO - Starting AZ capital transformations
2025-12-17 20:59:41 - capitaux_test - INFO - STEP 1: Applying business filters
2025-12-17 20:59:41 - capitaux_test - INFO - Business filters applied successfully (7 filters)
2025-12-17 20:59:42 - capitaux_test - INFO - After filters: 30,000 records
2025-12-17 20:59:42 - capitaux_test - INFO - STEP 2: Applying column configuration
2025-12-17 20:59:42 - capitaux_test - INFO - STEP 3: Extracting capitals WITH indexation
2025-12-17 20:59:42 - capitaux_test - INFO - Starting capital indexation for 14 columns
2025-12-17 20:59:47 - capitaux_test - INFO - ✓ SUCCESS: Capital 

25/12/17 20:59:56 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.


+-----------+------------------+------------------+
|      nopol|       smp_100_ind|       lci_100_ind|
+-----------+------------------+------------------+
|POL00000001| 371783.2128600712| 416142.4045579633|
|POL00000002|421077.06401883403|               0.0|
|POL00000003|25808.016231812424|465914.22257479606|
|POL00000004|486766.65877089376|  424164.860278312|
|POL00000005| 426367.5303140222| 263618.7169654887|
+-----------+------------------+------------------+
only showing top 5 rows
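
The sample shows one policy with `lci_100_ind` equal to `0.0`. Eyeballing five rows does not scale, so a small summary can help. The sketch below runs on the five sample rows copied from the output above and counts zero and negative values per capital column. `summarize_capitals` is a hypothetical helper. Whether a zero or negative capital is actually a data problem is an assumption to confirm against the business rules.

```python
# Sample rows copied from the show(5) output above.
sample_rows = [
    {"nopol": "POL00000001", "smp_100_ind": 371783.2128600712, "lci_100_ind": 416142.4045579633},
    {"nopol": "POL00000002", "smp_100_ind": 421077.06401883403, "lci_100_ind": 0.0},
    {"nopol": "POL00000003", "smp_100_ind": 25808.016231812424, "lci_100_ind": 465914.22257479606},
    {"nopol": "POL00000004", "smp_100_ind": 486766.65877089376, "lci_100_ind": 424164.860278312},
    {"nopol": "POL00000005", "smp_100_ind": 426367.5303140222, "lci_100_ind": 263618.7169654887},
]


def summarize_capitals(rows, columns):
    """Count zero and negative values and report the maximum for each column."""
    summary = {}
    for col in columns:
        values = [row[col] for row in rows]
        summary[col] = {
            "zeros": sum(1 for v in values if v == 0),
            "negatives": sum(1 for v in values if v < 0),
            "max": max(values),
        }
    return summary


summary = summarize_capitals(sample_rows, ["smp_100_ind", "lci_100_ind"])
for col, stats in summary.items():
    print(col, stats)
```

On the full dataset the same counts would be computed with Spark aggregations instead of a Python loop; this version only illustrates the checks.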



## 3. Run AZEC Capitaux Processor

In [5]:
try:
    azec_processor = AZECCapitauxProcessor(spark, config, logger)
    
    # Call read() and transform() separately; run() would also write the output
    print("Step 1: Reading AZEC bronze data...")
    df_azec = azec_processor.read(VISION)
    print(f"✓ Read CAPITXCU: {df_azec.count():,} rows")
    
    print("\nStep 2: Transforming AZEC data...")
    df_azec_transformed = azec_processor.transform(df_azec, VISION)
    print(f"✓ AZEC Capitaux: {df_azec_transformed.count():,} rows")
    
    # Show sample
    df_azec_transformed.select('nopol', 'cdprod', 'smp_100_ind', 'lci_100_ind').show(5)
    
except Exception as e:
    print(f"⚠ AZEC Processor (expected if CAPITXCU missing): {e}")
    df_azec_transformed = None

2025-12-17 20:59:58 - capitaux_test - INFO - AZEC Capitaux Processor initialized
Step 1: Reading AZEC bronze data...
2025-12-17 20:59:58 - capitaux_test - INFO - Reading AZEC capital data for vision 202509
2025-12-17 20:59:59 - capitaux_test - INFO - ✓ SUCCESS: Read 1,600 records from CAPITXCU
✓ Read CAPITXCU: 1,600 rows

Step 2: Transforming AZEC data...
2025-12-17 20:59:59 - capitaux_test - INFO - Starting AZEC capital transformations
2025-12-17 20:59:59 - capitaux_test - INFO - STEP 1: Processing CAPITXCU (SMP/LCI by branch)
2025-12-17 20:59:59 - capitaux_test - INFO - STEP 2: Reading and aggregating INCENDCU (PE/RD)
2025-12-17 20:59:59 - capitaux_test - INFO - STEP 3: Joining PE/RD data
2025-12-17 20:59:59 - capitaux_test - INFO - STEP 4: Enriching with segmentation
2025-12-17 20:59:59 - capitaux_test - INFO - Segmentation enrichment successful
2025-12-17 20:59:59 - capitaux_test - INFO - STEP 5: Filtering construction market (CMARCH=6)
2025-12-17 21:00:00 - capitaux_test - INFO - 

## 4. Verify Output Schemas

In [6]:
if df_az_transformed is not None:
    print("AZ Schema:")
    print(f"  Columns: {len(df_az_transformed.columns)}")
    print(f"  Capital columns: {[c for c in df_az_transformed.columns if '100' in c][:5]}")
    
if df_azec_transformed is not None:
    print("\nAZEC Schema:")
    print(f"  Columns: {len(df_azec_transformed.columns)}")
    print(f"  Capital columns: {[c for c in df_azec_transformed.columns if '100' in c][:5]}")

AZ Schema:
  Columns: 191
  Capital columns: ['smp_100_ind', 'lci_100_ind', 'perte_exp_100_ind', 'risque_direct_100_ind', 'limite_rc_100_par_sin_ind']

AZEC Schema:
  Columns: 20
  Capital columns: ['smp_pe_100', 'smp_dd_100', 'lci_pe_100', 'lci_dd_100', 'smp_100_ind']
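
The two schemas differ a lot in size (191 vs 20 columns), so it can be useful to see which capital columns they share. The sketch below reuses the notebook's `'100' in c` filter and compares two column lists with set operations. The input lists are the truncated five-column samples printed above, so the result describes only those samples. `capital_columns` and `compare_columns` are hypothetical helpers.

```python
def capital_columns(columns, marker="100"):
    """Keep column names containing the marker (same filter as the cell above)."""
    return [c for c in columns if marker in c]


def compare_columns(left, right):
    """Split two column lists into shared, left-only and right-only names."""
    left_set, right_set = set(left), set(right)
    return {
        "shared": sorted(left_set & right_set),
        "left_only": sorted(left_set - right_set),
        "right_only": sorted(right_set - left_set),
    }


# First five capital columns of each schema, copied from the output above.
az_cols = capital_columns([
    "smp_100_ind", "lci_100_ind", "perte_exp_100_ind",
    "risque_direct_100_ind", "limite_rc_100_par_sin_ind",
])
azec_cols = capital_columns([
    "smp_pe_100", "smp_dd_100", "lci_pe_100", "lci_dd_100", "smp_100_ind",
])

report = compare_columns(az_cols, azec_cols)
for key, names in report.items():
    print(f"{key}: {names}")
```

In the notebook, passing `df_az_transformed.columns` and `df_azec_transformed.columns` would compare the full schemas instead of the `[:5]` samples.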


## 5. Optional: Write to Silver (Manual)

In [None]:
# Uncomment to write outputs manually
# if df_az_transformed is not None:
#     az_processor.write(df_az_transformed, VISION)
#     print("✓ AZ data written to silver")
# 
# if df_azec_transformed is not None:
#     azec_processor.write(df_azec_transformed, VISION)
#     print("✓ AZEC data written to silver")
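
Commenting and uncommenting code makes it easy to write by accident. One alternative is an explicit flag. The sketch below assumes only that processors expose `write(df, vision)`, as the commented cell suggests. `WRITE_TO_SILVER`, `write_if_enabled` and `_RecordingProcessor` are hypothetical names, and the stand-in processor exists solely so the example runs without Spark.

```python
WRITE_TO_SILVER = False  # hypothetical flag; set to True to persist outputs


def write_if_enabled(processor, df, vision, enabled, label):
    """Call processor.write(df, vision) only when enabled and df is available."""
    if df is None:
        print(f"⚠ {label}: nothing to write (transform did not produce data)")
        return False
    if not enabled:
        print(f"- {label}: write skipped (flag is off)")
        return False
    processor.write(df, vision)
    print(f"✓ {label} data written to silver")
    return True


class _RecordingProcessor:
    """Stand-in for a real processor; it records write calls instead of writing."""

    def __init__(self):
        self.calls = []

    def write(self, df, vision):
        self.calls.append((df, vision))


demo = _RecordingProcessor()
write_if_enabled(demo, df="fake_df", vision="202509", enabled=WRITE_TO_SILVER, label="AZ")
```

With the real objects the call would look like `write_if_enabled(az_processor, df_az_transformed, VISION, WRITE_TO_SILVER, "AZ")`.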

## Summary

In [7]:
print("="*60)
print("CAPITAUX PIPELINE TESTING COMPLETE")
print("="*60)
print(f"\nVision: {VISION}")
print(f"AZ Capitaux:   {'✓' if df_az_transformed is not None else '✗'}")
print(f"AZEC Capitaux: {'✓' if df_azec_transformed is not None else '⚠ (optional)'}")

print("\nKey learnings:")
print("  1. Use read() + transform() for testing (run() writes directly)")
print("  2. AZ: ipf file_group (combines IPFE16 + IPFE36)")
print("  3. AZEC: capitxcu_azec + incendcu_azec")
print("  4. Both create indexed (_ind) and non-indexed capitals")
print("\n→ Run production: python main.py --vision 202509 --component capitaux")

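============================================================
CAPITAUX PIPELINE TESTING COMPLETE
============================================================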

Vision: 202509
AZ Capitaux:   ✓
AZEC Capitaux: ✓

Key learnings:
  1. Use read() + transform() for testing (run() writes directly)
  2. AZ: ipf file_group (combines IPFE16 + IPFE36)
  3. AZEC: capitxcu_azec + incendcu_azec
  4. Both create indexed (_ind) and non-indexed capitals

→ Run production: python main.py --vision 202509 --component capitaux
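
The first learning says `run()` writes directly, which is why the test cells call `read()` and `transform()` themselves. The sketch below illustrates the difference with a stand-in class. It assumes `run()` simply chains read, transform and write, which the notebook does not show. `SketchProcessor` and its placeholder row are hypothetical.

```python
class SketchProcessor:
    """Hypothetical processor mirroring the read/transform/write/run method names."""

    def __init__(self):
        self.written = []

    def read(self, vision):
        # Stand-in for loading bronze data; the row is a placeholder value.
        return [{"nopol": "POL00000001", "smp_100": 100.0}]

    def transform(self, rows, vision):
        # Stand-in for the business logic; tags each row with the vision.
        return [{**row, "vision": vision} for row in rows]

    def write(self, rows, vision):
        # Records the write instead of touching storage.
        self.written.append((vision, rows))

    def run(self, vision):
        # Assumed composition: run() chains all three steps, including the write.
        rows = self.transform(self.read(vision), vision)
        self.write(rows, vision)
        return rows


testing = SketchProcessor()
rows = testing.transform(testing.read("202509"), "202509")
print(f"read + transform: {len(rows)} row(s), writes so far: {len(testing.written)}")

production = SketchProcessor()
production.run("202509")
print(f"run(): writes so far: {len(production.written)}")
```

Under that assumption, the read-plus-transform path leaves storage untouched while `run()` always produces a write, so the former is the safer choice for exploratory testing.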
