# Capitaux 01: AZ Capitaux Processor Testing

**Purpose**: Test AZ capital extraction from IPF_AZ data

**Tests**:
1. Read IPF_AZ bronze data
2. Apply business filters
3. Extract 9 capital types (SMP, LCI, PE, RD, RC limits...)
4. Calculate capital normalization (100% basis)
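
Step 4 refers to expressing capitals on a 100% basis. The notebook does not show the formula, so the sketch below assumes one common convention purely for illustration: an amount held at the company's share is divided by that share. Both `normalize_to_100` and `share_pct` are hypothetical names, not part of the project code.

```python
def normalize_to_100(amount, share_pct):
    """Scale an amount held at `share_pct` percent up to a 100% basis.

    Assumption: `share_pct` is a percentage in the interval (0, 100].
    """
    if share_pct is None or not 0 < share_pct <= 100:
        raise ValueError(f"share_pct must be in (0, 100], got {share_pct!r}")
    return amount * 100.0 / share_pct


# A capital of 50,000 held at a 25% share corresponds to 200,000 at 100%.
normalized = normalize_to_100(50_000.0, 25.0)
print(normalized)  # 200000.0
```

Raising on a missing or non-positive share is one possible choice; a real pipeline might instead default such rows to 100%.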

---

In [1]:
import sys
from pathlib import Path

project_root = Path.cwd().parent.parent
sys.path.insert(0, str(project_root))
print(f"Project root: {project_root}")

Project root: /workspace/new_python


In [2]:
from pyspark.sql import SparkSession
# from azfr_fsspec_utils import fspath
# import azfr_fsspec_abfs

# azfr_fsspec_abfs.use()

spark = (
    SparkSession.builder
    .appName("Capitaux_AZ_Testing")
    .getOrCreate()
)

print(f"✓ Spark {spark.version}")

✓ Spark 3.4.4


## 1. Load Configuration

In [3]:
from utils.loaders.config_loader import ConfigLoader
from utils.loaders.transformation_loader import TransformationLoader
import json

config = ConfigLoader(str(project_root / "config" / "config.yml"))
loader = TransformationLoader(str(project_root / "config" / "transformations"))

# Load capitaux extraction config
cap_config_path = project_root / "config" / "transformations" / "capitaux_extraction_config.json"
with open(cap_config_path, 'r') as f:
    capital_config = json.load(f)

# Drop documentation entries (keys prefixed with an underscore)
capital_types = {k: v for k, v in capital_config.items() if not k.startswith('_')}

print(f"Capital types to extract: {list(capital_types.keys())}")

Capital types to extract: ['smp_100', 'lci_100', 'perte_exp_100', 'risque_direct_100', 'limite_rc_100_par_sin', 'limite_rc_100_par_sin_tous_dom', 'limite_rc_100_par_an', 'smp_pe_100', 'smp_rd_100']
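
The underscore filter above can be tried in isolation. This is a minimal sketch: the JSON content is invented for the example (the real `capitaux_extraction_config.json` is not shown here), and `strip_comment_keys` is a hypothetical helper name.

```python
import json

# Hypothetical config content, for illustration only.
raw_config = """
{
  "_comment": "Keys starting with an underscore are documentation only",
  "smp_100": {"keywords": ["SMP"]},
  "lci_100": {"keywords": ["LCI"]}
}
"""


def strip_comment_keys(config):
    """Return a copy of `config` without keys that start with an underscore."""
    return {k: v for k, v in config.items() if not k.startswith("_")}


capital_types_demo = strip_comment_keys(json.loads(raw_config))
print(list(capital_types_demo))  # ['smp_100', 'lci_100']
```

Only top-level keys are filtered; underscore-prefixed keys nested inside a capital definition would be kept.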


## 2. Read Bronze Data

In [4]:
from src.reader import BronzeReader

VISION = "202509"

bronze_reader = BronzeReader(
    spark, config,
    str(project_root / "config" / "reading_config.json")
)

df = bronze_reader.read_file_group('ipf_az', VISION)
print(f"✓ Read {df.count():,} rows")

✓ Read 30,000 rows


## 3. Apply Business Filters

In [5]:
# apply_business_filters is imported from utils.transformations
from utils.transformations import apply_business_filters

business_rules = loader.get_business_rules()
az_filters = business_rules['business_filters']['az']

df_filtered = apply_business_filters(df, az_filters)
print(f"✓ After filters: {df_filtered.count():,} rows")

✓ After filters: 30,000 rows
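
The structure of `az_filters` is not shown in this notebook. As an illustration only, the sketch below assumes rules are dicts with hypothetical keys `column`, `operator`, and `value` or `values`, and turns them into a single SQL predicate string. The column names are placeholders, and this is not the implementation of `apply_business_filters`.

```python
COMPARISONS = {"==": "=", "!=": "!=", ">": ">", ">=": ">=", "<": "<", "<=": "<="}


def sql_literal(value):
    """Render a Python value as a SQL literal (numbers and simple strings only)."""
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("Quotes inside values are not handled in this sketch")
        return f"'{value}'"
    return str(value)


def build_filter_expr(rules):
    """Combine rule dicts into one AND-ed SQL predicate string."""
    clauses = []
    for rule in rules:
        column, op = rule["column"], rule["operator"]
        if op in ("in", "not_in"):
            values = ", ".join(sql_literal(v) for v in rule["values"])
            keyword = "IN" if op == "in" else "NOT IN"
            clauses.append(f"{column} {keyword} ({values})")
        elif op in COMPARISONS:
            clauses.append(f"{column} {COMPARISONS[op]} {sql_literal(rule['value'])}")
        else:
            raise ValueError(f"Unsupported operator: {op}")
    # An empty rule set keeps every row.
    return " AND ".join(clauses) if clauses else "1 = 1"


# Placeholder rules; real column names and values come from the business rules file.
example_rules = [
    {"column": "market_code", "operator": "==", "value": "6"},
    {"column": "status", "operator": "not_in", "values": ["X", "Y"]},
]
filter_expr = build_filter_expr(example_rules)
print(filter_expr)  # market_code = '6' AND status NOT IN ('X', 'Y')
```

A string built this way could be passed to `DataFrame.filter`, which accepts SQL expression strings. Comparing `df.count()` before and after, as the cell above does, then shows how many rows the rules removed; here the count stays at 30,000, so no rows were dropped.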


## 4. Extract Capitals

In [6]:
from utils.transformations import extract_capitals

df_capitals = extract_capitals(df_filtered, capital_types)

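# Show extracted capitals (columns selected by substring match).
# Note: 'smp' also matches the source column 'mtsmpr', so it appears in the list.
capital_cols = [c for c in df_capitals.columns if any(
    x in c for x in ['smp', 'lci', 'perte', 'risque', 'limite_rc']
)]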

print(f"✓ Capital columns created: {capital_cols}")
df_capitals.select('nopol', *capital_cols[:4]).show(5)

✓ Capital columns created: ['mtsmpr', 'smp_100', 'lci_100', 'perte_exp_100', 'risque_direct_100', 'limite_rc_100_par_sin', 'limite_rc_100_par_sin_tous_dom', 'limite_rc_100_par_an', 'smp_pe_100', 'smp_rd_100']


+-----------+-------+---------+---------+-------------+
|      nopol| mtsmpr|  smp_100|  lci_100|perte_exp_100|
+-----------+-------+---------+---------+-------------+
|POL00000001| 684.33|344605.86|192627.18|    197686.05|
|POL00000002|1131.03|360173.35|      0.0|          0.0|
|POL00000003| 481.65| 24260.07|248216.67|     16022.81|
|POL00000004| 434.63|148891.11|197332.76|    494283.78|
|POL00000005| 225.72|235841.25| 254365.7|          0.0|
+-----------+-------+---------+---------+-------------+
only showing top 5 rows
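
The column list above includes `mtsmpr`, a source column rather than an extracted capital, because the substring `'smp'` occurs inside it. The sketch below contrasts substring matching with selecting columns by the configured capital names. It uses plain lists in place of the DataFrame, and the names `by_substring` and `by_config` are introduced only for this example.

```python
# A subset of the columns shown in the output above.
df_columns = ["nopol", "mtsmpr", "smp_100", "lci_100", "perte_exp_100"]

# Names as they appear in the capital extraction config (subset).
configured_capitals = ["smp_100", "lci_100", "perte_exp_100"]

patterns = ["smp", "lci", "perte", "risque", "limite_rc"]

# Substring matching: 'smp' is contained in 'mtsmpr', so that column is picked up too.
by_substring = [c for c in df_columns if any(p in c for p in patterns)]

# Matching on configured names: only columns created by the extraction are kept.
by_config = [c for c in configured_capitals if c in df_columns]

print(by_substring)  # ['mtsmpr', 'smp_100', 'lci_100', 'perte_exp_100']
print(by_config)     # ['smp_100', 'lci_100', 'perte_exp_100']
```

In the notebook, the equivalent of `by_config` would be built from `capital_types` and `df_capitals.columns`.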



## 5. Capital Statistics

In [7]:
from pyspark.sql.functions import sum as spark_sum, count, when, col

# 'non_zero' counts strictly positive values; nulls and negatives are excluded
for cap_col in capital_cols[:4]:
    if cap_col in df_capitals.columns:
        stats = df_capitals.agg(
            count(when(col(cap_col) > 0, True)).alias('non_zero'),
            spark_sum(cap_col).alias('total')
        ).collect()[0]
        print(f"{cap_col}: {stats['non_zero']:,} non-zero, total={stats['total']:,.0f}")

mtsmpr: 30,000 non-zero, total=29,861,042
smp_100: 27,904 non-zero, total=7,088,350,434
lci_100: 21,545 non-zero, total=5,469,309,310
perte_exp_100: 20,386 non-zero, total=5,244,533,111
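
The non-zero counts are easier to compare as a share of the 30,000 filtered rows. A small sketch using the figures printed above (the `coverage` name is introduced here for illustration):

```python
TOTAL_ROWS = 30_000  # row count after business filters

# Non-zero counts copied from the statistics output.
non_zero_counts = {
    "mtsmpr": 30_000,
    "smp_100": 27_904,
    "lci_100": 21_545,
    "perte_exp_100": 20_386,
}

# Share of rows with a strictly positive value, per column.
coverage = {name: n / TOTAL_ROWS for name, n in non_zero_counts.items()}

for name, share in coverage.items():
    print(f"{name}: {share:.1%} of rows populated")
```

This gives roughly 93.0% for `smp_100`, 71.8% for `lci_100`, and 68.0% for `perte_exp_100`, while `mtsmpr` is populated on every row.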


## Summary

In [8]:
print("="*60)
print("AZ CAPITAUX TESTING COMPLETE")
print("="*60)
print(f"Capital types: {list(capital_types.keys())}")
print("\n→ Next: Notebook 02 - AZEC Capitaux")

AZ CAPITAUX TESTING COMPLETE
Capital types: ['smp_100', 'lci_100', 'perte_exp_100', 'risque_direct_100', 'limite_rc_100_par_sin', 'limite_rc_100_par_sin_tous_dom', 'limite_rc_100_par_an', 'smp_pe_100', 'smp_rd_100']

→ Next: Notebook 02 - AZEC Capitaux
