# Phase 06: Packaging / System Composition

This notebook implements Phase 06 of the **MLOps4OFP** pipeline **in full**.

👉 It is **self-contained** and **1:1 equivalent** to the script `06_packaging.py`.
👉 It can be run even if the script does not exist.

## Role of F06
- Package a **self-contained system**
- Resolve upstream lineages
- Materialize knowledge (objectives and events)
- Seal artifacts and metadata

**Does NOT run inference.**
**Does NOT compute metrics.**



## Input and output contract

### Input
- An existing F06 variant (`params.yaml` already created)
- Parent F05 variants
- Upstream artifacts (F02–F05)

### Output
In `executions/06_packaging/<vNNN>/`:
- `models/`
- `replay/replay_events.parquet`
- `objectives.json`
- `events_catalog.json`
- `06_packaging_metadata.json`
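
The output contract above can be checked mechanically once the notebook has run. The following is a minimal sketch, not part of the pipeline: `EXPECTED_OUTPUTS` and `missing_outputs` are hypothetical names introduced here, and the check only tests for existence, not content.

```python
from pathlib import Path

# Expected outputs, copied from the list above (relative to the variant directory).
EXPECTED_OUTPUTS = [
    "models",
    "replay/replay_events.parquet",
    "objectives.json",
    "events_catalog.json",
    "06_packaging_metadata.json",
]


def missing_outputs(variant_dir: Path) -> list[str]:
    """Return the expected outputs that are absent from variant_dir (hypothetical helper)."""
    return [rel for rel in EXPECTED_OUTPUTS if not (variant_dir / rel).exists()]
```

For example, `missing_outputs(Path("executions/06_packaging/v001"))` returns an empty list when every artifact is present.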



## Execution parameters


In [None]:
VARIANT = "v001"   # ← adjust to the F06 variant to package
PHASE = "06_packaging"


## Bootstrap and project detection


In [None]:
import json
import shutil
import sys
from datetime import datetime, timezone
from pathlib import Path
from time import perf_counter

import pyarrow.parquet as pq
import yaml

# Walk up from the current directory (checking at most 10 levels) until a
# directory containing `mlops4ofp` is found; that directory is the project root.
execution_dir = Path.cwd().resolve()
project_root = execution_dir
for _ in range(10):
    if (project_root / 'mlops4ofp').exists():
        break
    project_root = project_root.parent
else:
    raise RuntimeError('Could not locate the project root')

sys.path.insert(0, str(project_root))

from mlops4ofp.tools.params_manager import ParamsManager
from mlops4ofp.tools.traceability import write_metadata
from mlops4ofp.tools.artifacts import get_git_hash
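
The root-detection loop in the cell above can also be written as a function, which makes it easier to test in isolation. This is a sketch under assumptions: `find_project_root` is a hypothetical helper, not something provided by `mlops4ofp`, and the early exit at the filesystem root is an addition that the cell does not have.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "mlops4ofp", max_levels: int = 10) -> Path:
    """Return the first directory at or above `start` that contains `marker`.

    Hypothetical helper mirroring the bootstrap loop: it checks `start`
    and its ancestors, examining at most `max_levels` directories.
    """
    candidate = start.resolve()
    for _ in range(max_levels):
        if (candidate / marker).exists():
            return candidate
        if candidate.parent == candidate:
            # Reached the filesystem root; there is nothing left to check.
            break
        candidate = candidate.parent
    raise RuntimeError(f"Could not locate a directory containing {marker!r}")
```

Calling `find_project_root(Path.cwd())` gives the same result as the loop when the marker directory is within ten levels of the working directory.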


## Loading F06 parameters


In [None]:
t_start = perf_counter()
pm = ParamsManager(PHASE, project_root)
pm.set_current(VARIANT)
variant_root = pm.current_variant_dir()

with open(variant_root / 'params.yaml', 'r', encoding='utf-8') as f:
    params = yaml.safe_load(f)

parent_variants_f05 = params['parent_variants_f05']
temporal = params['temporal']
models_cfg = params['models']
replay_cfg = params['replay']

print(json.dumps(params, indent=2))
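
The cell above indexes `params` directly, so a missing key surfaces as a bare `KeyError`. A small validation step could report every problem at once. The sketch below assumes only what this notebook reads: the four top-level keys, and the `source_f05`, `model_id` and `f04_id` fields of each `models` entry. `validate_f06_params` is a hypothetical helper, and the internal structure of `temporal` and `replay` is deliberately left unchecked because it is not shown here.

```python
# Keys read by this notebook; anything else in params.yaml is ignored by the check.
REQUIRED_KEYS = ("parent_variants_f05", "temporal", "models", "replay")
MODEL_KEYS = ("source_f05", "model_id", "f04_id")


def validate_f06_params(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the keys are present."""
    errors = []
    for key in REQUIRED_KEYS:
        if key not in params:
            errors.append(f"missing key: {key}")
    for i, model in enumerate(params.get("models") or []):
        for key in MODEL_KEYS:
            if key not in model:
                errors.append(f"models[{i}] missing key: {key}")
    return errors
```

A caller might raise a `ValueError` listing all returned messages before continuing with lineage resolution.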


## Full lineage resolution


In [None]:
# Walk the parent chain F05 -> F04 -> F03 -> F02 by reading each variant's params.yaml.
lineage = {'f05': set(parent_variants_f05), 'f04': set(), 'f03': set(), 'f02': set()}

for v05 in parent_variants_f05:
    f05p = yaml.safe_load((project_root / 'executions/05_modeling' / v05 / 'params.yaml').read_text(encoding='utf-8'))
    lineage['f04'].add(f05p['parent_variant'])

for v04 in lineage['f04']:
    f04p = yaml.safe_load((project_root / 'executions/04_targetengineering' / v04 / 'params.yaml').read_text(encoding='utf-8'))
    lineage['f03'].add(f04p['parent_variant'])

for v03 in lineage['f03']:
    f03p = yaml.safe_load((project_root / 'executions/03_preparewindowsds' / v03 / 'params.yaml').read_text(encoding='utf-8'))
    lineage['f02'].add(f03p['parent_variant'])

# All branches must converge on one F02 variant, which later provides
# the event catalog and the replay dataset.
assert len(lineage['f02']) == 1, 'F06 requires a single common F02 variant'
parent_f02 = next(iter(lineage['f02']))

print({k: sorted(v) for k, v in lineage.items()})
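
The traversal above follows one `parent_variant` link per phase. The same logic can be sketched over in-memory mappings, which shows how several F05 variants can converge on a single F02 variant. `resolve_lineage` and the variant ids in the usage note are hypothetical, and the mappings stand in for the `params.yaml` files read by the cell.

```python
def resolve_lineage(f05_variants, parent_of):
    """Follow parent links from F05 up to F02.

    `parent_of` maps a phase name to {variant: parent_variant}; in the
    notebook this information comes from each variant's params.yaml.
    """
    order = ["f05", "f04", "f03", "f02"]
    lineage = {"f05": set(f05_variants)}
    for child, parent in zip(order, order[1:]):
        # Each child variant contributes exactly one parent variant.
        lineage[parent] = {parent_of[child][variant] for variant in lineage[child]}
    return lineage
```

With two F05 variants whose F04 parents share one F03 variant, the returned `lineage["f02"]` holds a single id, which is the condition the assertion in the cell enforces.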


## Materializing objectives (F04)


In [None]:
objectives = {}
for v04 in sorted(lineage['f04']):  # sorted so the JSON key order is deterministic
    f04p = yaml.safe_load((project_root / 'executions/04_targetengineering' / v04 / 'params.yaml').read_text(encoding='utf-8'))
    objectives[v04] = f04p.get('target_expression')

(variant_root / 'objectives.json').write_text(json.dumps(objectives, indent=2), encoding='utf-8')
print('[OK] objectives.json written')


## Materializing the event catalog (F02)


In [None]:
f02p = yaml.safe_load((project_root / 'executions/02_prepareeventsds' / parent_f02 / 'params.yaml').read_text())
events_catalog = f02p.get('event_catalog')
(variant_root / 'events_catalog.json').write_text(json.dumps(events_catalog, indent=2))
print('[OK] events_catalog.json written')


## Copying the selected models


In [None]:
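models_dir = variant_root / 'models'
models_dir.mkdir(exist_ok=True)
selected_models = []

for m in models_cfg:
    # Source: the model directory produced by the F05 variant named in the entry.
    src = project_root / 'executions/05_modeling' / m['source_f05'] / 'models' / m['model_id']
    # Destination name combines the entry's f04_id and model_id.
    dst = models_dir / f"{m['f04_id']}__{m['model_id']}"
    # Remove any copy left by a previous run so copytree does not fail.
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    selected_models.append(m)

print(f'[OK] {len(selected_models)} models copied')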
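
Because each destination is named `<f04_id>__<model_id>` and an existing destination is removed before copying, two entries that produce the same name would silently overwrite one another. A pre-check along these lines could catch that before any files are touched. It is a sketch only: `duplicate_destinations` is a hypothetical helper, and the ids in the usage note are made up.

```python
from collections import Counter


def duplicate_destinations(models_cfg):
    """Return destination folder names produced by more than one entry."""
    names = [f"{m['f04_id']}__{m['model_id']}" for m in models_cfg]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

For instance, two entries with `f04_id` `"v001"` and `model_id` `"rf"` taken from different `source_f05` variants would both map to `v001__rf`, and the function would report that name.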


## Materializing the replay


In [None]:
replay_dir = variant_root / 'replay'
replay_dir.mkdir(exist_ok=True)
replay_src = project_root / 'executions/02_prepareeventsds' / parent_f02 / '02_prepareeventsds_dataset.parquet'
# Read the F02 events dataset and write it back out as the replay file.
pq.write_table(pq.read_table(replay_src), replay_dir / 'replay_events.parquet')
print('[OK] replay materialized')


## Writing metadata and traceability


In [None]:
metadata = {
    'phase': PHASE,
    'variant': VARIANT,
    'created_at': datetime.now(timezone.utc).isoformat(),
    'git_commit': get_git_hash(),
    'temporal': temporal,
    'lineage': {k: sorted(v) for k, v in lineage.items()},
    'models': selected_models,
}

meta_path = variant_root / f'{PHASE}_metadata.json'
meta_path.write_text(json.dumps(metadata, indent=2))

write_metadata(
    stage=PHASE,
    variant=VARIANT,
    parent_variant=None,
    parent_variants=parent_variants_f05,
    inputs=[str(replay_src)],
    outputs=[str(models_dir), str(replay_dir)],
    params=params,
    metadata_path=meta_path,
)

print(f'[DONE] F06 completed in {perf_counter() - t_start:.1f}s')
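
The role of F06 includes sealing artifacts, but the cells above record only lineage and model entries, not content digests. One possible way to seal the package is to record a SHA-256 digest per file. This is an assumption about what sealing could mean here, not the pipeline's documented behavior, and `sha256_file` and `seal_directory` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large Parquet or model files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seal_directory(root: Path) -> dict[str, str]:
    """Map each file under root (as a POSIX-style relative path) to its SHA-256 digest."""
    files = {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}
    return {rel: sha256_file(files[rel]) for rel in sorted(files)}
```

The resulting mapping could be stored under an extra key in the metadata dictionary before it is written. To keep the digests stable, the metadata file itself would need to be excluded or hashed before it is created.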
