# PTF_MVT 04: Consolidation Testing

**Purpose**: Test consolidation (AZ + AZEC → Gold)

**Tests**: Union, IRD enrichment, ISIC codification, client data

---

In [1]:
import sys
from pathlib import Path

project_root = Path.cwd().parent.parent
sys.path.insert(0, str(project_root))
print(f"Project root: {project_root}")

Project root: /workspace/new_python


In [2]:
from pyspark.sql import SparkSession
# from azfr_fsspec_utils import fspath
# import azfr_fsspec_abfs

# azfr_fsspec_abfs.use()

spark = (
    SparkSession.builder
    .appName("PTF_MVT_Consolidation")
    .getOrCreate()
)

print(f"✓ Spark {spark.version}")

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/12/17 11:22:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


✓ Spark 3.4.4


## 1. Initialize

In [3]:
from utils.loaders.config_loader import ConfigLoader
from utils.logger import PipelineLogger

config = ConfigLoader(str(project_root / "config" / "config.yml"))
logger = PipelineLogger("consolidation_test")

VISION = "202509"
print(f"Vision: {VISION}")

Vision: 202509
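
The vision string is passed unchanged to the processor, so a malformed value would only surface once the run is under way. Below is a minimal sketch of an up-front check. It assumes `VISION` encodes a year and month as `YYYYMM`, which is inferred from the value `202509` rather than stated anywhere in this notebook. `parse_vision` is a hypothetical helper, not part of the project.

```python
import re
from typing import Tuple

def parse_vision(vision: str) -> Tuple[int, int]:
    """Split a YYYYMM vision string into (year, month); raise ValueError if malformed."""
    # Assumption: a vision is exactly six digits, year first, then month.
    if not re.fullmatch(r"\d{6}", vision):
        raise ValueError(f"Vision must be 6 digits (YYYYMM), got {vision!r}")
    year, month = int(vision[:4]), int(vision[4:])
    if not 1 <= month <= 12:
        raise ValueError(f"Vision month must be 01-12, got {month:02d}")
    return year, month

year, month = parse_vision("202509")
print(f"Vision year={year}, month={month}")
```

Calling it right after `VISION` is assigned would make a typo such as `"202513"` fail immediately instead of partway through the consolidation.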


## 2. Run Consolidation Processor

In [5]:
from src.processors.ptf_mvt_processors.consolidation_processor import ConsolidationProcessor

try:
    consolidation = ConsolidationProcessor(spark, config, logger)
    df_gold = consolidation.run(VISION)

    if df_gold is not None:
        # count() triggers a Spark action on the gold DataFrame
        print(f"✓ Consolidation: {df_gold.count():,} rows")
        print(f"  Columns: {len(df_gold.columns)}")
    else:
        print("⚠ Consolidation returned None")
except Exception as e:
    # Keep the notebook running; later cells check whether df_gold is None
    print(f"✗ Error: {e}")
    df_gold = None

2025-12-17 11:23:29 - azec_test - INFO -   Running ConsolidationProcessor
2025-12-17 11:23:29 - azec_test - INFO - Starting: ConsolidationProcessor.read()
2025-12-17 11:23:29 - azec_test - INFO - Reading AZ silver data (mvt_const_ptf)
2025-12-17 11:23:29 - azec_test - INFO - Completed: ConsolidationProcessor.read() (Duration: 0.28s)
2025-12-17 11:23:29 - azec_test - INFO - Read 15000 rows
2025-12-17 11:23:29 - azec_test - INFO - Starting: ConsolidationProcessor.transform()
2025-12-17 11:23:29 - azec_test - INFO - STEP 1: Reading AZEC silver data
2025-12-17 11:23:29 - azec_test - INFO - STEP 2: Harmonizing AZ schema
2025-12-17 11:23:29 - azec_test - INFO - STEP 3: Harmonizing AZEC schema
2025-12-17 11:23:30 - azec_test - INFO - STEP 3.5: Applying AZEC-specific transformations
2025-12-17 11:23:30 - azec_test - INFO - STEP 4: Consolidating AZ + AZEC
2025-12-17 11:23:30 - azec_test - INFO - STEP 4.5: Applying common transformations
2025-12-17 11:23:30 - azec_test - INFO - STEP 4.6: Enrichi

2025-12-17 11:24:35 - azec_test - INFO - ✓ SUCCESS: Gold data written successfully
2025-12-17 11:24:35 - azec_test - INFO - Completed: ConsolidationProcessor.write() (Duration: 55.16s)
2025-12-17 11:24:35 - azec_test - INFO - ✓ SUCCESS: ConsolidationProcessor completed successfully


                                                                                

✓ Consolidation: 306,425 rows
  Columns: 280
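
With 280 columns, a row count alone says little about whether the schema is the expected one. The sketch below checks for the columns that section 3 selects (`nopol`, `dircom`, `cdprod`, `primeto`). `missing_columns` is a hypothetical helper, and the case-insensitive comparison is an assumption about how the gold columns are named.

```python
from typing import Iterable, List

def missing_columns(actual: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names absent from `actual` (case-insensitive)."""
    present = {name.lower() for name in actual}
    return [name for name in required if name.lower() not in present]

REQUIRED = ["nopol", "dircom", "cdprod", "primeto"]

# In the notebook this would be called as: missing_columns(df_gold.columns, REQUIRED)
example_columns = ["NOPOL", "dircom", "cdprod"]  # illustrative list, not the real schema
print(missing_columns(example_columns, REQUIRED))
```

An empty list means every required column is present. Anything else names the columns to investigate before reading the sample output.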


## 3. Verify Gold Output

In [6]:
if df_gold is not None:
    # DIRCOM distribution (AZ vs AZEC)
    print("DIRCOM distribution:")
    df_gold.groupBy('dircom').count().show()
    
    # Sample output
    df_gold.select('nopol', 'dircom', 'cdprod', 'primeto').show(5)

DIRCOM distribution:
+------+------+
|dircom| count|
+------+------+
|    az|306422|
|  azec|     3|
+------+------+

+-----------+------+------+------------+
|      nopol|dircom|cdprod|     primeto|
+-----------+------+------+------------+
|POL00000011|    az| 01012|20937.840558|
|POL00000011|    az| 01012|20937.840558|
|POL00000011|    az| 01012|20937.840558|
|POL00000011|    az| 01012|20937.840558|
|POL00000011|    az| 01012|20937.840558|
+-----------+------+------+------------+
only showing top 5 rows
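
The output above raises two questions that the cell does not check. First, `azec` contributes only 3 of 306,425 rows. Second, the five sampled rows are identical on the four selected columns. Neither is necessarily a defect, since the rows may differ in the other 276 columns, but both are worth flagging. The sketch below works on plain Python values so it can run without a Spark session. `low_share_sources` and `repeated_rows` are hypothetical helpers, and the 1% threshold is an arbitrary assumption.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

def low_share_sources(counts: Dict[str, int], min_share: float) -> List[str]:
    """Return the sources whose share of all rows is below `min_share`."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [src for src, n in counts.items() if n / total < min_share]

def repeated_rows(rows: Iterable[Tuple[Hashable, ...]]) -> Dict[Tuple[Hashable, ...], int]:
    """Return each row tuple that occurs more than once, with its count."""
    return {row: n for row, n in Counter(rows).items() if n > 1}

# Counts copied from the DIRCOM distribution printed above.
# In the notebook they could be collected with:
#   {r["dircom"]: r["count"] for r in df_gold.groupBy("dircom").count().collect()}
dircom_counts = {"az": 306422, "azec": 3}
print(low_share_sources(dircom_counts, min_share=0.01))

# The five sampled rows, as (nopol, dircom, cdprod, primeto) tuples.
sample = [("POL00000011", "az", "01012", "20937.840558")] * 5
print(repeated_rows(sample))
```

On the full DataFrame, the equivalent duplicate check would be `df_gold.groupBy(*key_cols).count().filter("count > 1")`, where `key_cols` is whichever set of columns is meant to identify a row. The source does not state that key.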



## Summary

In [7]:
print("="*50)
print("CONSOLIDATION TESTING COMPLETE")
print("="*50)
print(f"Result: {'✓ Success' if df_gold is not None else '✗ Failed'}")
print(f"\n→ Run: python main.py --vision {VISION} --component ptf_mvt")
# Stop the session last, once nothing else needs Spark
spark.stop()

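==================================================
CONSOLIDATION TESTING COMPLETE
==================================================
Result: ✓ Success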

→ Run: python main.py --vision 202509 --component ptf_mvt
