# PTF_MVT 04: Consolidation Testing

**Purpose**: Test consolidation (AZ + AZEC → Gold)

**Tests**: Union, IRD enrichment, ISIC codification, client data

---

In [1]:
import sys
from pathlib import Path

project_root = Path.cwd().parent.parent
sys.path.insert(0, str(project_root))
print(f"Project root: {project_root}")

Project root: /workspace/new_python
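
`Path.cwd().parent.parent` only works when the notebook is launched from a directory exactly two levels below the project root. A more tolerant sketch walks upward until it finds a marker file. It assumes `config/config.yml` (the file loaded in section 1) marks the root, and `find_project_root` is a hypothetical helper, not part of the project:

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "config/config.yml") -> Optional[Path]:
    """Return the first directory at or above `start` that contains `marker`.

    Hypothetical helper: the marker path is an assumption about the repo layout.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None  # No marker found anywhere up the tree


if __name__ == "__main__":
    root = find_project_root(Path.cwd())
    print(f"Project root: {root}")
```

Returning `None` instead of raising lets the caller decide whether to fall back to the original two-levels-up rule.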


In [2]:
from pyspark.sql import SparkSession
# from azfr_fsspec_utils import fspath
# import azfr_fsspec_abfs

# azfr_fsspec_abfs.use()

spark = (
    SparkSession.builder
    .appName("PTF_MVT_Consolidation")
    .getOrCreate()
)

print(f"✓ Spark {spark.version}")

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/12/17 11:22:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


✓ Spark 3.4.4


## 1. Initialize

In [3]:
from utils.loaders.config_loader import ConfigLoader
from utils.logger import PipelineLogger

config = ConfigLoader(str(project_root / "config" / "config.yml"))
logger = PipelineLogger("consolidation_test")

VISION = "202509"
print(f"Vision: {VISION}")

Vision: 202509
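
The logs further down show silver files written under `silver/2025/09/` with a `_202509` suffix, which suggests the vision string is a `YYYYMM` period. Below is a minimal sketch of that mapping. It assumes the `YYYYMM` format, and `parse_vision` and `silver_path` are hypothetical helpers; the real path logic lives in the project's processors and may differ:

```python
from pathlib import PurePosixPath


def parse_vision(vision: str) -> tuple[int, int]:
    """Split a YYYYMM vision string into (year, month), validating both parts."""
    if len(vision) != 6 or not vision.isdigit():
        raise ValueError(f"Vision must be 6 digits (YYYYMM), got {vision!r}")
    year, month = int(vision[:4]), int(vision[4:])
    if not 1 <= month <= 12:
        raise ValueError(f"Invalid month in vision {vision!r}")
    return year, month


def silver_path(root: str, dataset: str, vision: str) -> PurePosixPath:
    """Hypothetical helper: <root>/silver/YYYY/MM/<dataset>_<vision>.parquet."""
    year, month = parse_vision(vision)
    return (
        PurePosixPath(root)
        / "silver"
        / f"{year:04d}"
        / f"{month:02d}"
        / f"{dataset}_{vision}.parquet"
    )


if __name__ == "__main__":
    print(silver_path("/workspace/datalake", "mvt_const_ptf", "202509"))
```

Validating the vision up front makes a typo such as `202513` fail immediately instead of surfacing later as a missing-file error.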


In [4]:
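# %run leaves each notebook's variables in this kernel's namespace, so names
# such as `logger` are rebound (later log lines are tagged `azec_test`).
%run 02_az_processor_testing.ipynb
%run 03_azec_processor_testing.ipynb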

Project root: /workspace/new_python
✓ Spark 3.4.4
Vision: 202509
2025-12-17 11:22:15 - az_test - INFO -   Running AZProcessor
2025-12-17 11:22:15 - az_test - INFO - Starting: AZProcessor.read()
2025-12-17 11:22:15 - az_test - INFO - Reading ipf_az files (PTF16 + PTF36)


25/12/17 11:22:15 WARN SparkSession: Using an existing Spark session; only runtime SQL configurations will take effect.


2025-12-17 11:22:19 - az_test - INFO - Completed: AZProcessor.read() (Duration: 3.73s)
2025-12-17 11:22:19 - az_test - INFO - Read 30000 rows
2025-12-17 11:22:19 - az_test - INFO - Starting: AZProcessor.transform()
2025-12-17 11:22:19 - az_test - INFO - STEP 0: Applying business filters (construction market)
2025-12-17 11:22:19 - az_test - INFO - Business filters applied successfully (7 filters)
2025-12-17 11:22:19 - az_test - INFO - STEP 1: Renaming columns
2025-12-17 11:22:19 - az_test - INFO - STEP 2: Initializing columns to 0
2025-12-17 11:22:20 - az_test - INFO - STEP 3: Adding computed columns (tx, top_coass, coass, partcie)
2025-12-17 11:22:20 - az_test - INFO - STEP 4: Adding metadata columns (vision, dircom, cdpole)
2025-12-17 11:22:20 - az_test - INFO - STEP 5: Joining IPFM99 for product 01099
2025-12-17 11:22:21 - az_test - INFO - Successfully joined reference data: ipfm99_az
2025-12-17 11:22:21 - az_test - INFO - STEP 6: Extracting capitals (SMP, LCI, PERTE_EXP, RISQUE_DIRE


2025-12-17 11:22:25 - az_test - INFO - Transformed to 15000 rows
2025-12-17 11:22:25 - az_test - INFO - Starting: AZProcessor.write()
2025-12-17 11:22:25 - az_test - INFO - Writing silver data to: /workspace/datalake/silver/2025/09/mvt_const_ptf_202509.parquet
2025-12-17 11:22:25 - az_test - INFO - Format: parquet, Compression: snappy, Mode: overwrite


25/12/17 11:22:26 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
25/12/17 11:22:36 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 95.00% for 8 writers
25/12/17 11:22:36 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 84.44% for 9 writers
25/12/17 11:22:36 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 76.00% for 10 writers
25/12/17 11:22:36 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 69.09% for 11 writers
25/12/17 11:22:36 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 63.33% for 12 writers


2025-12-17 11:22:38 - az_test - INFO - ✓ SUCCESS: Silver data written successfully
2025-12-17 11:22:38 - az_test - INFO - Completed: AZProcessor.write() (Duration: 12.55s)
2025-12-17 11:22:38 - az_test - INFO - ✓ SUCCESS: AZProcessor completed successfully


25/12/17 11:22:38 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 69.09% for 11 writers
25/12/17 11:22:38 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 76.00% for 10 writers
25/12/17 11:22:38 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 84.44% for 9 writers
25/12/17 11:22:38 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 95.00% for 8 writers

✓ AZ Processor: 15,000 rows
  Columns: 172

+-----------+---------+---------+---------+-------------+
|      nopol|  smp_100|  lci_100|perte_exp|risque_direct|
+-----------+---------+---------+---------+-------------+
|POL00000086|469716.95|401396.38| 92286.73|    287580.67|
|POL00000598|      0.0|317592.83|      0.0|     31053.36|
|POL00000613|166158.91| 34526.25|257950.16|    260002.36|
+-----------+---------+---------+---------+-------------+
only showing top 3 rows

+-----------+---------+----------------+-------+
|      nopol|top_coass|           coass|partcie|
+-----------+---------+----------------+-------+
|POL00000001|        1| COASS. ACCEPTEE| 0.9361|
|POL00000002|        0|SANS COASSURANCE|    1.0|
|POL00000003|        0|SANS COASSURANCE|    1.0|
+-----------+---------+----------------+-------+
only showing top 3 rows

+-----------+-----+-----+-----+-----+-----+
|      nopol|nbafn|nbres|nbrpt|nbrpc|nbptf|
+-----------+-----+-----+-----+-----+-----+
|POL00000086|    0|    0|    0|    0|    1|
|POL00000598|    0|    0|

25/12/17 11:22:45 WARN SparkSession: Using an existing Spark session; only runtime SQL configurations will take effect.


2025-12-17 11:22:45 - azec_test - INFO - Completed: AZECProcessor.read() (Duration: 0.29s)
2025-12-17 11:22:45 - azec_test - INFO - Read 800 rows
2025-12-17 11:22:45 - azec_test - INFO - Starting: AZECProcessor.transform()
2025-12-17 11:22:45 - azec_test - INFO - STEP 1: Applying column configuration
2025-12-17 11:22:45 - azec_test - INFO - STEP 2: Applying business filters
2025-12-17 11:22:45 - azec_test - INFO - Business filters applied successfully (5 filters)
2025-12-17 11:22:45 - azec_test - INFO - STEP 3: Handling AZEC migration
2025-12-17 11:22:45 - azec_test - INFO - Migration table joined - NBPTF_NON_MIGRES_AZEC calculated
2025-12-17 11:22:45 - azec_test - INFO - STEP 4: Updating dates and policy states
2025-12-17 11:22:46 - azec_test - INFO - STEP 5: Calculating movements
2025-12-17 11:22:46 - azec_test - INFO - STEP 6: Calculating suspension periods (nbj_susp_ytd)
2025-12-17 11:22:46 - azec_test - INFO - STEP 7: Calculating exposures
2025-12-17 11:22:46 - azec_test - INFO - 


2025-12-17 11:22:50 - azec_test - INFO - Transformed to 3 rows
2025-12-17 11:22:50 - azec_test - INFO - Starting: AZECProcessor.write()
2025-12-17 11:22:50 - azec_test - INFO - Writing silver data to: /workspace/datalake/silver/2025/09/azec_ptf_202509.parquet
2025-12-17 11:22:50 - azec_test - INFO - Format: parquet, Compression: snappy, Mode: overwrite

2025-12-17 11:22:51 - azec_test - INFO - ✓ SUCCESS: Silver data written successfully
2025-12-17 11:22:51 - azec_test - INFO - Completed: AZECProcessor.write() (Duration: 1.58s)
2025-12-17 11:22:51 - azec_test - INFO - ✓ SUCCESS: AZECProcessor completed successfully
✓ AZEC Processor: 3 rows
  Columns: 87
+---------+-----+-----+-----+
|   police|nbafn|nbres|nbptf|
+---------+-----+-----+-----+
|AZ0000658|    0|    0|    1|
|AZ0000659|    0|    0|    1|
|AZ0000738|    1|    0|    0|
+---------+-----+-----+-----+

+---------+------------+
|   police|nbj_susp_ytd|
+---------+------------+
|AZ0000658|           0|
|AZ0000659|           0|
|AZ0000738|           0|
+---------+------------+

AZEC PROCESSOR TESTING COMPLETE
Result: ✓ Success
→ Next: Notebook 04 - Consolidation


## 2. Run Consolidation Processor

In [5]:
from src.processors.ptf_mvt_processors.consolidation_processor import ConsolidationProcessor

try:
    consolidation = ConsolidationProcessor(spark, config, logger)
    df_gold = consolidation.run(VISION)
    
    if df_gold is not None:
        print(f"✓ Consolidation: {df_gold.count():,} rows")
        print(f"  Columns: {len(df_gold.columns)}")
    else:
        print("⚠ Consolidation returned None")
except Exception as e:
    print(f"✗ Error: {e}")
    df_gold = None

2025-12-17 11:23:29 - azec_test - INFO -   Running ConsolidationProcessor
2025-12-17 11:23:29 - azec_test - INFO - Starting: ConsolidationProcessor.read()
2025-12-17 11:23:29 - azec_test - INFO - Reading AZ silver data (mvt_const_ptf)
2025-12-17 11:23:29 - azec_test - INFO - Completed: ConsolidationProcessor.read() (Duration: 0.28s)
2025-12-17 11:23:29 - azec_test - INFO - Read 15000 rows
2025-12-17 11:23:29 - azec_test - INFO - Starting: ConsolidationProcessor.transform()
2025-12-17 11:23:29 - azec_test - INFO - STEP 1: Reading AZEC silver data
2025-12-17 11:23:29 - azec_test - INFO - STEP 2: Harmonizing AZ schema
2025-12-17 11:23:29 - azec_test - INFO - STEP 3: Harmonizing AZEC schema
2025-12-17 11:23:30 - azec_test - INFO - STEP 3.5: Applying AZEC-specific transformations
2025-12-17 11:23:30 - azec_test - INFO - STEP 4: Consolidating AZ + AZEC
2025-12-17 11:23:30 - azec_test - INFO - STEP 4.5: Applying common transformations
2025-12-17 11:23:30 - azec_test - INFO - STEP 4.6: Enrichi


2025-12-17 11:24:35 - azec_test - INFO - ✓ SUCCESS: Gold data written successfully
2025-12-17 11:24:35 - azec_test - INFO - Completed: ConsolidationProcessor.write() (Duration: 55.16s)
2025-12-17 11:24:35 - azec_test - INFO - ✓ SUCCESS: ConsolidationProcessor completed successfully

✓ Consolidation: 306,425 rows
  Columns: 280
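
A plain union of the two silver inputs would give 15,000 + 3 = 15,003 rows, yet the gold output reports 306,425. That gap is worth flagging, since an enrichment join on a non-unique key is one common way for rows to multiply. Below is a small sketch of a reconciliation check using the counts from the logs above. The function name and the `max_fanout` threshold are illustrative assumptions, not part of the pipeline:

```python
def reconcile_row_counts(
    input_counts: dict[str, int],
    output_count: int,
    max_fanout: float = 1.0,
) -> dict:
    """Compare the output row count with the sum of the input row counts.

    `max_fanout` is an assumed tolerance: 1.0 means the output may not
    contain more rows than the inputs combined.
    """
    expected = sum(input_counts.values())
    if expected == 0:
        raise ValueError("Input counts sum to zero; nothing to reconcile")
    fanout = output_count / expected
    return {
        "expected": expected,
        "actual": output_count,
        "extra_rows": output_count - expected,
        "fanout": fanout,
        "ok": fanout <= max_fanout,
    }


if __name__ == "__main__":
    # Counts taken from the AZ, AZEC and consolidation outputs in this notebook
    report = reconcile_row_counts({"az": 15_000, "azec": 3}, 306_425)
    status = "✓" if report["ok"] else "⚠"
    print(f"{status} expected {report['expected']:,}, got {report['actual']:,} "
          f"(fan-out x{report['fanout']:.1f})")
```

With this run's numbers the check fails with a fan-out of roughly 20, so the "✓ Consolidation" message alone does not show that the gold table is sound.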


## 3. Verify Gold Output

In [6]:
if df_gold is not None:
    # DIRCOM distribution (AZ vs AZEC)
    print("DIRCOM distribution:")
    df_gold.groupBy('dircom').count().show()
    
    # Sample output
    df_gold.select('nopol', 'dircom', 'cdprod', 'primeto').show(5)

DIRCOM distribution:
+------+------+
|dircom| count|
+------+------+
|    az|306422|
|  azec|     3|
+------+------+

+-----------+------+------+------------+
|      nopol|dircom|cdprod|     primeto|
+-----------+------+------+------------+
|POL00000011|    az| 01012|20937.840558|
|POL00000011|    az| 01012|20937.840558|
|POL00000011|    az| 01012|20937.840558|
|POL00000011|    az| 01012|20937.840558|
|POL00000011|    az| 01012|20937.840558|
+-----------+------+------+------------+
only showing top 5 rows
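
The sample above shows five identical rows for `POL00000011`, and the DIRCOM counts show 306,422 `az` rows against 15,000 rows in the AZ silver input. Counting rows per key is a quick way to confirm duplication. The sketch below works on plain tuples (for example, a small `collect()` of the key columns) so it runs without a Spark session. Treating `(nopol, cdprod)` as the key is an assumption. On the DataFrame itself, an equivalent check would be `df_gold.groupBy("nopol", "cdprod").count().filter("count > 1")`.

```python
from collections import Counter
from typing import Hashable, Iterable


def find_duplicate_keys(keys: Iterable[Hashable], min_count: int = 2) -> dict:
    """Return {key: occurrences} for every key seen at least `min_count` times."""
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # First key mirrors the five rows shown above; the second key is made up
    # purely for contrast and does not come from the data.
    sample = [("POL00000011", "01012")] * 5 + [("POL_EXAMPLE", "00000")]
    for key, n in find_duplicate_keys(sample).items():
        print(f"{key} appears {n} times")
```

If the real business key is wider than `(nopol, cdprod)`, some of these repeats may be legitimate, so the key choice should be confirmed before treating them as defects.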



## Summary

In [7]:
print("="*50)
print("CONSOLIDATION TESTING COMPLETE")
print("="*50)
print(f"Result: {'✓ Success' if df_gold is not None else '✗ Failed'}")
print(f"\n→ Run: python main.py --vision {VISION} --component ptf_mvt")

# Stop Spark last, once all reporting is done
spark.stop()

CONSOLIDATION TESTING COMPLETE
Result: ✓ Success

→ Run: python main.py --vision 202509 --component ptf_mvt
