# Test Pipeline - Demo Notebooks

This notebook runs all the demo notebooks in order and reports the execution status of each one.

**Execution order:**
1. `00_Setup.ipynb` - Catalog and schema setup
2. `01_EDA_and_Validation.ipynb` - Generating data with defects
3. `02_Data_Splitting.ipynb` - Train/test split
4. `03_Data_Imputing.ipynb` - Imputing missing values
5. `04_Feature_Transformation.ipynb` - Feature transformations
6. `05_Feature_Engineering.ipynb` - Feature engineering
7. `06_ML_Pipelines.ipynb` - ML pipelines
8. `07_Feature_Store_MLflow.ipynb` - Feature Store and MLflow

In [0]:
%skip

%sql
use catalog data_ml_preparation;
drop schema data_ml_preparation.ml_dp_trainer cascade

In [0]:
import time
from datetime import datetime

DEMO_NOTEBOOKS = [
    ("00_Setup", "../demo/00_Setup"),
    ("01_EDA_and_Validation", "../demo/01_EDA_and_Validation"),
    ("02_Data_Splitting", "../demo/02_Data_Splitting"),
    ("03_Data_Imputing", "../demo/03_Data_Imputing"),
    ("04_Feature_Transformation", "../demo/04_Feature_Transformation"),
    ("05_Feature_Engineering", "../demo/05_Feature_Engineering"),
    ("06_ML_Pipelines", "../demo/06_ML_Pipelines"),
    ("07_Feature_Store_MLflow", "../demo/07_Feature_Store_MLflow"),
]

NOTEBOOK_TIMEOUT = 600  # per-notebook timeout, in seconds
print(f"Scheduled {len(DEMO_NOTEBOOKS)} notebooks for execution")
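Because the notebooks depend on each other, the order of `DEMO_NOTEBOOKS` matters. A minimal sketch of a sanity check on that order, assuming every name starts with a two-digit numeric prefix as in the list above; `check_notebook_order` and `sample` are hypothetical names introduced for illustration and are not part of the original notebook:

```python
def check_notebook_order(notebooks):
    """Return the numeric prefixes; raise ValueError if they are not strictly increasing."""
    # Assumption: each name looks like "<number>_<description>"
    prefixes = [int(name.split("_", 1)[0]) for name, _ in notebooks]
    for prev, cur in zip(prefixes, prefixes[1:]):
        if cur <= prev:
            raise ValueError(f"Notebook prefix {cur:02d} does not follow {prev:02d}")
    return prefixes


# Hypothetical sample with the same (name, path) structure as DEMO_NOTEBOOKS
sample = [
    ("00_Setup", "../demo/00_Setup"),
    ("01_EDA_and_Validation", "../demo/01_EDA_and_Validation"),
    ("02_Data_Splitting", "../demo/02_Data_Splitting"),
]
print(check_notebook_order(sample))
```

Calling `check_notebook_order(DEMO_NOTEBOOKS)` before the run loop would catch an accidentally reordered or duplicated entry early.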

In [0]:
results = []
pipeline_start = time.time()

for name, path in DEMO_NOTEBOOKS:
    print(f"\n▶️ Running: {name}")
    start_time = time.time()

    try:
        # Run the child notebook; an exception is raised if it fails or exceeds the timeout
        result = dbutils.notebook.run(path, NOTEBOOK_TIMEOUT)
        elapsed = time.time() - start_time
        results.append({"notebook": name, "status": "SUCCESS", "time_sec": round(elapsed, 2), "result": result or "OK"})
        print(f"   SUCCESS ({elapsed:.2f}s)")
    except Exception as e:
        # Record the failure and continue with the remaining notebooks
        elapsed = time.time() - start_time
        results.append({"notebook": name, "status": "FAILED", "time_sec": round(elapsed, 2), "result": str(e)[:200]})
        print(f"   FAILED ({elapsed:.2f}s): {str(e)[:100]}")

print(f"\nDONE: {time.time() - pipeline_start:.1f}s")
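The loop above can also be wrapped in a function that takes the runner as a parameter, which makes the collection logic testable outside Databricks. This is only a sketch under stated assumptions: `run_notebooks` and `fake_run` are hypothetical names, and `fake_run` is a local stand-in that merely imitates the `(path, timeout)` call shape of `dbutils.notebook.run`:

```python
import time


def run_notebooks(notebooks, run_fn, timeout_sec=600):
    """Run each (name, path) pair with run_fn and collect one result dict per notebook."""
    results = []
    for name, path in notebooks:
        start = time.time()
        try:
            output = run_fn(path, timeout_sec)
            status, detail = "SUCCESS", output or "OK"
        except Exception as exc:
            # Keep going after a failure so every notebook gets a status
            status, detail = "FAILED", str(exc)[:200]
        results.append({
            "notebook": name,
            "status": status,
            "time_sec": round(time.time() - start, 2),
            "result": detail,
        })
    return results


def fake_run(path, timeout_sec):
    """Hypothetical stand-in for dbutils.notebook.run, for local testing only."""
    if path.endswith("02_Data_Splitting"):
        raise RuntimeError("simulated failure")
    return ""


demo = [("00_Setup", "../demo/00_Setup"), ("02_Data_Splitting", "../demo/02_Data_Splitting")]
demo_results = run_notebooks(demo, fake_run)
for r in demo_results:
    print(r["notebook"], r["status"], r["result"])
```

Inside Databricks the same function could be called as `run_notebooks(DEMO_NOTEBOOKS, dbutils.notebook.run, NOTEBOOK_TIMEOUT)`.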

In [0]:
import pandas as pd

df_results = pd.DataFrame(results)
success = sum('SUCCESS' in r['status'] for r in results)
failed = sum('FAILED' in r['status'] for r in results)

print(f"\nSucceeded: {success}/{len(results)}")
print(f"Failed: {failed}/{len(results)}")
display(df_results)

if failed == 0:
    print("\nALL NOTEBOOKS COMPLETED SUCCESSFULLY!")
else:
    print(f"\n{failed} NOTEBOOK(S) FAILED!")
    for r in results:
        if 'FAILED' in r['status']:
            print(f"  {r['notebook']}: {r['result']}")
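The summary above only prints failures, so the test notebook itself still finishes normally. If it is meant to run as a scheduled job or CI step, it may be useful to raise at the end so that the run is reported as failed. A minimal sketch, assuming result dicts shaped like the ones collected above; `assert_all_succeeded` and `sample_results` are hypothetical names introduced for illustration:

```python
def assert_all_succeeded(results):
    """Raise RuntimeError naming every failed notebook; otherwise return the number of results."""
    failed_names = [r["notebook"] for r in results if "FAILED" in r["status"]]
    if failed_names:
        raise RuntimeError(
            f"{len(failed_names)} notebook(s) failed: {', '.join(failed_names)}"
        )
    return len(results)


# Hypothetical results, for illustration only
sample_results = [
    {"notebook": "00_Setup", "status": "SUCCESS", "result": "OK"},
    {"notebook": "01_EDA_and_Validation", "status": "FAILED", "result": "simulated error"},
]

try:
    assert_all_succeeded(sample_results)
except RuntimeError as err:
    message = str(err)
    print(message)
```

In the notebook, a final `assert_all_succeeded(results)` without the `try` block would let the exception propagate.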