# Notebook 04_infer_single_sample.ipynb

This notebook performs local inference on one or more real estate samples using a trained regression model. It includes functionality for value prediction, uncertainty estimation, anomaly and drift detection, batch inference, schema validation, logging to JSONL, API consistency checks, and reproducible hashing of the pipeline.

## System Architecture Summary

This notebook performs robust and traceable property value inference using a pre-trained ML pipeline. It extends beyond single prediction by providing batch support, uncertainty quantification, logging, validation, and deployment checks.

**Model Usage**
- Loads and applies trained LightGBM regressor
- Supports both single and batch inference

**Validation**
- Schema and range checks for inputs
- Output validated against strict schema

**Monitoring**
- Logs predictions and system metrics with timestamps
- Tracks latency, confidence bounds, and model version

**Robustness Tools**
- Sensitivity analysis
- Consistency checks with deployed APIs
- Model artifact hashing for audit integrity

This notebook is suitable for production-grade inference, model auditing, and API testing in real estate asset tokenization pipelines.

## 01. Imports & Paths

### Technical Overview
Initializes the environment by importing all required libraries and defining file paths for model, metadata, logs, and input samples.

### Implementation Details
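- Imports: `os`, `json`, `hashlib`, `joblib`, `warnings`, `requests`, `numpy`, `pandas`, `scipy.stats`, `datetime`, `pathlib`, `importlib`, `logging`, `jsonschema`
- Paths are set using `Path()` for:
 - Model: `value_regressor_{MODEL_VERSION}.joblib` (here `MODEL_VERSION = "v2"`)
 - Metadata: `value_regressor_{MODEL_VERSION}_meta.json`
 - Input: sample property and batch samples are defined inline in later cells
 - Output logs: `predictions_log.jsonl` (the monitoring log path is set in section 08)
- Also sets the local API base URL (`API_BASE`) and the `COMPARE_WITH_API` flag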

### Purpose
Prepares the environment and directory structure for performing inference and tracking outputs.

### Output
Prints a confirmation once the model and metadata have been loaded.

In [25]:
import os
import json
import hashlib
import joblib
import warnings
import requests
import numpy as np
import pandas as pd
import scipy.stats as st
from datetime import datetime
from pathlib import Path
import importlib
import logging
from jsonschema import validate, ValidationError

importlib.reload(json)

# === Setup ===
ASSET_TYPE = "property"
MODEL_VERSION = "v2"
MODEL_DIR = Path(f"../models/{ASSET_TYPE}")
PIPELINE_PATH = MODEL_DIR / f"value_regressor_{MODEL_VERSION}.joblib"
META_PATH = MODEL_DIR / f"value_regressor_{MODEL_VERSION}_meta.json"
LOG_PATH = Path("../data/predictions_log.jsonl")

# === API configuration ===
API_BASE = "http://127.0.0.1:8000"
COMPARE_WITH_API = True

# === Loading ===
assert PIPELINE_PATH.exists(), f"❌ Missing pipeline file: {PIPELINE_PATH}"
assert META_PATH.exists(), f"❌ Missing metadata file: {META_PATH}"

pipeline = joblib.load(PIPELINE_PATH)
with META_PATH.open("r", encoding="utf-8") as f:
    model_meta = json.load(f)

categorical_expected = model_meta["features_categorical"]
numeric_expected = model_meta["features_numeric"]

ALL_EXPECTED = categorical_expected + numeric_expected
print("✅ Loaded model + metadata")

✅ Loaded model + metadata


## 02. Validation Utilities

### Technical Overview
Defines utility functions for validating input schema and acceptable feature ranges.

### Implementation Details
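- `autofill_derived()`: Fills in derived features (`age_years`, `price_per_sqm`, `luxury_score`, `env_score`, `efficiency_score`) when they are missing
- `validate_input_record()`: Checks that the record includes all expected features and, in strict mode, rejects unexpected extras
- `detect_anomalies()`: Flags records whose `size_m2` is outside 20 to 500 or whose `year_built` is before 1800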

### Purpose
Guarantees the input conforms to the model's expectations before inference.

### Output
Raises `ValueError` if validation fails; otherwise returns the validated (and autofilled) record.

In [26]:
def autofill_derived(record: dict) -> dict:
    """Derive missing fields where possible."""

    if "age_years" not in record and "year_built" in record:
        record["age_years"] = datetime.utcnow().year - int(record["year_built"])

    if "price_per_sqm" not in record:
        if "valuation_k" in record and "size_m2" in record:
            record["price_per_sqm"] = record["valuation_k"] * 1000 / record["size_m2"]
        elif "size_m2" in record:
            # Default average if valuation_k is not provided
            record["price_per_sqm"] = 2500.0

    if "luxury_score" not in record:
        garden = record.get("has_garden", 0)
        balcony = record.get("has_balcony", 0)
        garage = record.get("garage", 0)
        record["luxury_score"] = (garden + balcony + garage) / 3

    if "env_score" not in record:
        record["env_score"] = np.clip(
            (record.get("air_quality_index", 0) / 100) * (1 - record.get("noise_level", 0) / 100),
            0, 1
        )

    if "efficiency_score" not in record:
        v = record.get("valuation_k", 0)
        sqm = record.get("size_m2", 1)
        lux = record.get("luxury_score", 0)
        record["efficiency_score"] = (v / sqm) * (1 + lux)

    return record


def validate_input_record(record: dict, strict=True):
    """Validate the required features and optionally reject extras."""
    record = autofill_derived(record)

    derived = {"price_per_sqm", "luxury_score", "efficiency_score", "age_years", "env_score"}

    # Missing check: ignore derived features if they are absent
    missing = [f for f in ALL_EXPECTED if f not in record and f not in derived]

    # Extras check: accept derived features
    extras = [f for f in record if f not in ALL_EXPECTED and f not in derived]

    if missing:
        raise ValueError(f"❌ Missing required features: {missing}")
    if strict and extras:
        raise ValueError(f"❌ Unexpected extra features: {extras}")

    return record


def detect_anomalies(record: dict) -> bool:
    """Detect simple anomalies based on hardcoded thresholds."""
    s = record.get("size_m2", 0)
    y = record.get("year_built", 0)
    return not (20 <= s <= 500 and y >= 1800)
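
The overview mentions validating acceptable feature ranges, but the cell above only checks feature presence plus two hardcoded anomaly thresholds. A minimal sketch of what a per-feature range check could look like follows. The function name `check_feature_ranges` and every bound in `FEATURE_RANGES` are assumptions chosen for illustration; only the `size_m2` bounds and the `year_built` lower bound mirror `detect_anomalies`.

```python
# Hypothetical per-feature bounds; illustrative values, not taken from the trained model.
FEATURE_RANGES = {
    "size_m2": (20, 500),           # mirrors detect_anomalies
    "year_built": (1800, 2100),     # lower bound mirrors detect_anomalies; upper bound assumed
    "noise_level": (0, 100),        # assumed scale
    "air_quality_index": (0, 100),  # assumed scale
}


def check_feature_ranges(record: dict, ranges: dict = FEATURE_RANGES) -> list:
    """Return a list of human-readable violations; an empty list means all checks passed."""
    violations = []
    for feature, (low, high) in ranges.items():
        if feature not in record:
            continue  # presence is the schema check's job, not the range check's
        value = record[feature]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            violations.append(f"{feature}: expected a number, got {type(value).__name__}")
        elif not (low <= value <= high):
            violations.append(f"{feature}: {value} outside [{low}, {high}]")
    return violations


ok_record = {"size_m2": 95, "year_built": 1999, "noise_level": 40, "air_quality_index": 70}
bad_record = {"size_m2": 900, "year_built": 1750}

print(check_feature_ranges(ok_record))   # no violations expected
print(check_feature_ranges(bad_record))  # two violations expected
```

Returning a list instead of raising on the first problem lets the caller report every violation at once, which tends to be more useful when validating a batch.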

## 03. Sample Single Property

### Technical Overview
Defines a sample input property and validates it for inference.

### Implementation Details
- Defines `sample_property` inline as a Python dict
- Validates it with `validate_input_record(..., strict=True)`, which also autofills the derived features
- Conversion to a DataFrame happens later, at prediction time

### Purpose
Prepares a single property input for prediction.

### Output
No direct output; the validated record is stored in `sample_property`.

In [27]:
sample_property = {
    "location": "Milan",
    "size_m2": 95,
    "rooms": 4,
    "bathrooms": 2,
    "year_built": 1999,
    "floor": 2,
    "building_floors": 6,
    "has_elevator": 1,
    "has_garden": 0,
    "has_balcony": 1,
    "garage": 1,
    "energy_class": "B",
    "humidity_level": 50.0,
    "temperature_avg": 20.5,
    "noise_level": 40,
    "air_quality_index": 70,
    "owner_occupied": 1,
    "public_transport_nearby": 1,
    "distance_to_center_km": 2.5,
}

sample_property = validate_input_record(sample_property, strict=True)
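
The derived features that `autofill_derived` adds to this sample can be checked by hand. The sketch below recomputes three of them with the standard library only, using the same formulas as the validation cell; the `sample` dict is a trimmed copy of the record above.

```python
from datetime import datetime, timezone

# Subset of sample_property needed for the derived features
sample = {
    "year_built": 1999,
    "has_garden": 0,
    "has_balcony": 1,
    "garage": 1,
    "noise_level": 40,
    "air_quality_index": 70,
}

# luxury_score: mean of the three amenity flags
luxury_score = (sample["has_garden"] + sample["has_balcony"] + sample["garage"]) / 3

# env_score: air quality scaled down by noise, clamped to [0, 1]
raw_env = (sample["air_quality_index"] / 100) * (1 - sample["noise_level"] / 100)
env_score = min(max(raw_env, 0.0), 1.0)  # same effect as np.clip(raw_env, 0, 1)

# age_years depends on the current year, so it changes over time
age_years = datetime.now(timezone.utc).year - sample["year_built"]

print(f"luxury_score={luxury_score:.3f}, env_score={env_score:.2f}, age_years={age_years}")
```

Because `valuation_k` is not part of the input record, `price_per_sqm` falls back to the 2500.0 default and `efficiency_score` evaluates to 0 for this sample.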

## 04. Load Pipeline & Metadata

### Technical Overview
Loads the pre-trained model and associated metadata file for consistent and versioned inference.

### Implementation Details
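- Uses `joblib.load()` through `load_model_with_fallback()`, which falls back to `v1` if the `v2` file is missing
- Parses the metadata JSON that matches the version actually loaded
- Extracts the categorical and numeric feature lists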

### Purpose
Ensures the correct pipeline is used for consistent predictions and auditing.

### Output
No direct output; the load status is reported through `logging`.

In [28]:
def load_model_with_fallback(asset_type, version="v2", fallback_version="v1"):
    primary_path = Path(f"../models/{asset_type}/value_regressor_{version}.joblib")
    fallback_path = Path(f"../models/{asset_type}/value_regressor_{fallback_version}.joblib")

    if primary_path.exists():
        logging.info(f"✅ Loaded primary model: {primary_path}")
        return joblib.load(primary_path), version
    elif fallback_path.exists():
        logging.warning(f"⚠️ Primary model not found. Fallback to version {fallback_version}")
        return joblib.load(fallback_path), fallback_version
    else:
        raise FileNotFoundError("❌ No available model found in specified versions")

# === Reload pipeline + metadata with fallback ===
pipeline, loaded_version = load_model_with_fallback(ASSET_TYPE, version=MODEL_VERSION)

meta_path = Path(f"../models/{ASSET_TYPE}/value_regressor_{loaded_version}_meta.json")
with meta_path.open("r", encoding="utf-8") as f:
    model_meta = json.load(f)

# === Expected features ===
categorical_expected = model_meta["features_categorical"]
numeric_expected = model_meta["features_numeric"]
ALL_EXPECTED = categorical_expected + numeric_expected

## 05. Local Prediction

### Technical Overview
Applies the model to predict property value, estimates uncertainty, and records inference time.

### Implementation Details
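- `predict()` is used for model inference
- Repeats the prediction `n_simulations` times and builds a t-distribution interval around the mean (`scipy.stats.t.ppf`)
- Measures latency in milliseconds
- Because the input is not resampled and the fitted pipeline is deterministic, the spread is zero and the interval collapses to the point prediction; this is not a true bootstrap or residual-based estimate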

### Purpose
Provides a robust and explainable single prediction with uncertainty and latency profiling.

### Output
Displays:
- Predicted value (k€)
- Confidence interval
- Prediction latency
- Uncertainty estimate

In [29]:
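def predict_with_confidence(
    record: dict, n_simulations: int = 100, confidence: float = 0.95, verbose=False
):
    """
    Run a prediction with a t-distribution confidence interval over repeated predictions.

    The input is not resampled, so a deterministic pipeline yields zero spread.
    """
    df = pd.DataFrame([record])[ALL_EXPECTED]

    # Repeated predictions on the same input (no resampling)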
    preds = np.array([pipeline.predict(df)[0] for _ in range(n_simulations)])
    mean_pred = preds.mean()
    std_pred = preds.std()

    # t-distribution margin
    t_score = st.t.ppf((1 + confidence) / 2, df=n_simulations - 1)
    ci_margin = t_score * (std_pred / np.sqrt(n_simulations))
    lower_bound = mean_pred - ci_margin
    upper_bound = mean_pred + ci_margin

    if verbose:
        logging.info(f"📈 Prediction: {mean_pred:.2f} k€")
        logging.info(f"🔁 Simulated Std: {std_pred:.2f}")
        logging.info(f"📊 CI {int(confidence*100)}%: ({lower_bound:.2f}, {upper_bound:.2f})")

    return {
        "prediction": round(mean_pred, 2),
        "uncertainty": round(std_pred, 2),
        "confidence_interval": (
            round(lower_bound, 2),
            round(upper_bound, 2)
        ),
        "ci_margin": round(ci_margin, 2),
    }

In [30]:
import time

# Suppress the feature-name warning before any prediction runs
warnings.filterwarnings("ignore", message="X does not have valid feature names")

# === Inference with confidence interval ===
# The timed span covers all n_simulations repeated predictions, not a single call
start = time.perf_counter()

confidence_output = predict_with_confidence(
    sample_property,
    n_simulations=100,
    confidence=0.95,
    verbose=True  # set to False for cleaner output
)

end = time.perf_counter()
latency_ms = round((end - start) * 1000, 2)

# === Extract key values ===
pred_value = confidence_output["prediction"]
conf_interval = confidence_output["confidence_interval"]
uncertainty = confidence_output["uncertainty"]

# === Final output ===
print(
    f"[LOCAL] Predicted valuation_k: {pred_value:.2f} k€ ± {uncertainty:.2f} "
    f"(CI: {conf_interval[0]:.2f} – {conf_interval[1]:.2f}) in {latency_ms} ms"
)



[LOCAL] Predicted valuation_k: 234.81 k€ ± 0.00 (CI: 234.81 – 234.81) in 1204.17 ms
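
Repeating `predict()` on the same input gives zero spread, which is why the output above shows `± 0.00`. One alternative is a residual-based interval that takes its width from the model's error on evaluation data. The sketch below assumes the residuals are roughly normal with a standard deviation equal to the `rmse_k` value reported in the model metadata; `residual_interval` is a hypothetical helper, not part of the notebook.

```python
from statistics import NormalDist


def residual_interval(prediction_k: float, rmse_k: float, confidence: float = 0.95):
    """Approximate prediction interval: prediction +/- z * RMSE (assumes normal residuals)."""
    if rmse_k < 0:
        raise ValueError("rmse_k must be non-negative")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    margin = z * rmse_k
    return prediction_k - margin, prediction_k + margin


# Values taken from this notebook's outputs: prediction 234.81 k€, rmse_k 3.4086
low, high = residual_interval(234.81, 3.4086)
print(f"95% interval: {low:.2f} to {high:.2f} k€")
```

If the reported RMSE was measured on training data rather than a held-out set, an interval built this way will likely be too narrow.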




In [33]:
# === Anomaly Detection ===
anomaly_detected = detect_anomalies(sample_property)

if anomaly_detected:
    print("⚠️ Anomaly detected in input property!")
else:
    print("✅ No anomalies detected.")

logging.info(f"Anomaly check: {anomaly_detected}")

✅ No anomalies detected.


In [41]:
def check_feature_drift(record: dict, baseline_stats: dict):
    """
    Check whether a feature has drifted from the baseline (z-score > 3).
    """
    for feature, value in record.items():
        if feature in baseline_stats:
            mean, std = baseline_stats[feature]
            if std == 0:
                continue
            z_score = abs((value - mean) / std)
            if z_score > 3:
                message = f"⚠️ Feature drift detected on '{feature}': z={z_score:.2f}"
                logging.warning(message)
                return True, message
    return False, None

In [42]:
baseline_stats = {
    k: (v["mean"], (v["max"] - v["min"]) / 4)  # Approx std if std not available
    for k, v in model_meta.get("engineered_feature_stats", {}).items()
}

drift_flag, drift_msg = check_feature_drift(sample_property, baseline_stats)
if drift_flag:
    print(drift_msg)
else:
    print("✅ No significant feature drift detected.")

✅ No significant feature drift detected.
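
`check_feature_drift` returns at the first feature whose z-score exceeds 3, and it would raise a `TypeError` if a baseline feature held a non-numeric value. A variant that skips such values and reports every drifting feature is sketched below. The `drift_report` name and the numbers in `baseline` are invented for illustration.

```python
def drift_report(record: dict, baseline_stats: dict, threshold: float = 3.0) -> dict:
    """Map each drifting feature to its absolute z-score."""
    drifted = {}
    for feature, (mean, std) in baseline_stats.items():
        value = record.get(feature)
        # Skip missing or non-numeric values and degenerate baselines
        if isinstance(value, bool) or not isinstance(value, (int, float)) or std == 0:
            continue
        z = abs((value - mean) / std)
        if z > threshold:
            drifted[feature] = round(z, 2)
    return drifted


# Illustrative baseline: (mean, std) per feature; these numbers are invented
baseline = {"size_m2": (110.0, 30.0), "noise_level": (45.0, 10.0), "floor": (2.0, 0.0)}

in_range = drift_report({"size_m2": 95, "noise_level": 40, "floor": 2}, baseline)
shifted = drift_report({"size_m2": 260, "noise_level": 90, "floor": 9}, baseline)
print(in_range)
print(shifted)
```

Returning the full mapping makes it possible to log which features drifted and by how much, rather than only the first one encountered.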


## 06. Output Schema Builder

### Technical Overview
Builds the output dictionary using a consistent schema for logging and API matching.

### Implementation Details
- Output includes: predicted value, confidence bounds, latency, uncertainty, flags for anomaly/drift
- Ensures consistent keys across notebooks and APIs

### Purpose
Standardizes result formatting for downstream processing and logging.

### Output
Returns dict with structured prediction results.

In [79]:
def build_output_schema(
    asset_id: str,
    asset_type: str,
    valuation_k: float,
    model_meta: dict,
    condition_score: float = None,
    risk_score: float = None,
    anomaly: bool = False,
    needs_review: bool = False,
    extra_metrics: dict = None,
):
    if PIPELINE_PATH.exists():
        model_health = {
            "status": "ok",
            "model_path": str(PIPELINE_PATH),
            "size_mb": round(os.path.getsize(PIPELINE_PATH) / (1024 * 1024), 2),
            "last_modified": datetime.utcfromtimestamp(PIPELINE_PATH.stat().st_mtime).isoformat() + "Z",
            "metadata_valid": True,
            "metrics": model_meta.get("metrics", {}),
        }
    else:
        model_health = {
            "status": "missing",
            "model_path": str(PIPELINE_PATH),
            "size_mb": 0.0,
            "last_modified": None,
            "metadata_valid": False,
            "metrics": {},
        }

    out = {
        "schema_version": "v1",
        "asset_id": asset_id,
        "asset_type": asset_type,
        "timestamp": datetime.utcnow().isoformat(timespec="seconds") + "Z",
        "metrics": {
            "valuation_base_k": round(float(valuation_k), 3)
        },
        "flags": {
            "anomaly": anomaly,
            "needs_review": needs_review,
            "drift_detected": needs_review,
        },
        "model_meta": {
            "value_model_version": model_meta.get("model_version"),
            "value_model_name": model_meta.get("model_class"),
        },
        "offchain_refs": {
            "detail_report_hash": None,
            "sensor_batch_hash": None,
        },
        "model_health": model_health,
        "cache_hit": False,
        "schema_validation_error": "",
        "blockchain_txid": "",
        "asa_id": "",
        "publish": {"status": "not_attempted"},
    }

    if condition_score is not None:
        out["metrics"]["condition_score"] = round(float(condition_score), 3)

    if risk_score is not None:
        out["metrics"]["risk_score"] = round(float(risk_score), 3)

    if extra_metrics:
        for k, v in extra_metrics.items():
            try:
                out["metrics"][k] = round(float(v), 3)
            except (ValueError, TypeError):
                logging.warning(f"⚠️ Skipping metric {k} with non-numeric value: {v}")

    return out

# === Drift check on sample_property ===
# Std is approximated as range / 6 here (the earlier drift cell used range / 4)
baseline_stats = {
    feat: (
        model_meta["engineered_feature_stats"][feat]["mean"],
        (model_meta["engineered_feature_stats"][feat]["max"] - model_meta["engineered_feature_stats"][feat]["min"]) / 6
    )
    for feat in model_meta.get("engineered_feature_stats", {})
}

drift_detected, drift_reason = check_feature_drift(sample_property, baseline_stats)
anomaly_detected = detect_anomalies(sample_property)

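# === Compute prediction and uncertainty ===
df = pd.DataFrame([sample_property])
pred_value = float(pipeline.predict(df)[0])

# Repeated predictions on the same input: the spread is zero for a deterministic pipeline
y_std = np.std([float(pipeline.predict(df)[0]) for _ in range(10)])
conf_interval = st.norm.interval(0.95, loc=pred_value, scale=y_std)
uncertainty = round(y_std, 3)
# NOTE: simulated placeholder drawn from a normal distribution, not a measured latency
latency_ms = round(np.random.normal(500, 25), 2)

# === Build the complete output ===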
single_output = build_output_schema(
    asset_id="asset_manual_0001",
    asset_type=ASSET_TYPE,
    valuation_k=pred_value,
    model_meta=model_meta,
    anomaly=anomaly_detected,
    needs_review=drift_detected,
    extra_metrics={
        "uncertainty": uncertainty,
        "confidence_low_k": conf_interval[0],
        "confidence_high_k": conf_interval[1],
        "latency_ms": latency_ms,
    },
)



## 07. Batch Inference

### Technical Overview
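Builds a batch of samples and runs inference on all of them with the same pipeline.

### Implementation Details
- Builds `batch_samples` inline as variations of `sample_property` (location, size, energy class)
- Validates each record, then predicts the whole batch in a single `predict()` call
- Runs drift and anomaly checks per record and appends each result to `batch_outputs`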

### Purpose
Scales inference to batch settings, useful for large-scale evaluations or testing.

### Output
Displays predictions for each row.

In [71]:
batch_samples = [
    sample_property,
    {**sample_property, "location": "Rome", "size_m2": 120, "energy_class": "C"},
    {
        **sample_property,
        "location": "Florence",
        "size_m2": 70,
        "has_garden": 1,
        "energy_class": "A",
    },
    {**sample_property, "location": "Turin", "size_m2": 150, "energy_class": "D"},
]

validated_batch = [validate_input_record(r, strict=True) for r in batch_samples]
df_batch = pd.DataFrame(validated_batch)
batch_preds = pipeline.predict(df_batch)

batch_outputs = []
for i, (val, record) in enumerate(zip(batch_preds, validated_batch), start=1):
    drift, _ = check_feature_drift(record, baseline_stats)
    anomaly = detect_anomalies(record)

    output = build_output_schema(
        asset_id=f"asset_batch_{i:03}",
        asset_type=ASSET_TYPE,
        valuation_k=float(val),
        model_meta=model_meta,
        anomaly=anomaly,
        needs_review=drift,
        extra_metrics={
            # NOTE: simulated placeholder, not a measured latency
            "latency_ms": round(np.random.normal(500, 20), 2),
        }
    )
    batch_outputs.append(output)

warnings.filterwarnings("ignore", message="X does not have valid feature names")
pd.DataFrame(
    [
        {"asset_id": o["asset_id"], "valuation_k": o["metrics"]["valuation_base_k"]}
        for o in batch_outputs
    ]
)



| | asset_id | valuation_k |
|---|---|---|
| 0 | asset_batch_001 | 234.808 |
| 1 | asset_batch_002 | 296.156 |
| 2 | asset_batch_003 | 171.132 |
| 3 | asset_batch_004 | 372.072 |


## 08. Logging JSON

### Technical Overview
Logs predictions and system metadata to JSONL files for auditing and monitoring.

### Implementation Details
- Writes each prediction to `predictions_log.jsonl`
- Records model version, latency, uncertainty, anomaly/drift flags to `monitoring_log.jsonl`
- Adds `_logged_at` timestamp

### Purpose
Maintains traceable and time-stamped logs for model monitoring and analysis.

### Output
Confirmation prints showing successful logging.

In [72]:
def append_jsonl(record: dict, path: Path):
    record = {**record, "_logged_at": datetime.utcnow().isoformat() + "Z"}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


# Predictions log
append_jsonl(single_output, LOG_PATH)
for o in batch_outputs:
    append_jsonl(o, LOG_PATH)
print(f"Appended {1 + len(batch_outputs)} predictions to {LOG_PATH}")

# Monitoring log
monitoring_entry = {
    "asset_id": "asset_manual_0001",
    "latency_ms": latency_ms,
    "valuation_k": pred_value,
    "uncertainty": uncertainty,
    "confidence_low_k": conf_interval[0],
    "confidence_high_k": conf_interval[1],
    "anomaly": anomaly_detected,
    "drift_detected": drift_detected,
    "model_version": model_meta.get("model_version"),
    "model_class": model_meta.get("model_class"),
}

append_jsonl(monitoring_entry, Path("../data/monitoring_log.jsonl"))
print(
    f"Appended monitoring log: "
    f"asset_id={monitoring_entry['asset_id']} | "
    f"latency={monitoring_entry['latency_ms']} ms | "
    f"valuation={monitoring_entry['valuation_k']}k ±{monitoring_entry['uncertainty']}k"
)

Appended 5 predictions to ..\data\predictions_log.jsonl
Appended monitoring log: asset_id=asset_manual_0001 | latency=1204.17 ms | valuation=234.81k ±0.0k
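
To show how the JSONL logs can be consumed later, here is a minimal round-trip sketch. It writes to a temporary file rather than the notebook's log paths, and the `read_jsonl` helper and the two demo entries are assumptions for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_jsonl(record: dict, path: Path) -> None:
    """Append one record as a single JSON line with a UTC `_logged_at` timestamp."""
    stamped = {**record, "_logged_at": datetime.now(timezone.utc).isoformat()}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(stamped) + "\n")


def read_jsonl(path: Path) -> list:
    """Load every non-empty line of a JSONL file into a list of dicts."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "monitoring_log.jsonl"
    append_jsonl({"asset_id": "asset_demo_1", "latency_ms": 12.5}, log_path)
    append_jsonl({"asset_id": "asset_demo_2", "latency_ms": 17.5}, log_path)

    entries = read_jsonl(log_path)
    mean_latency = sum(e["latency_ms"] for e in entries) / len(entries)
    print(f"{len(entries)} entries, mean latency {mean_latency} ms")
```

This sketch uses the timezone-aware `datetime.now(timezone.utc)` because `datetime.utcnow()` is deprecated as of Python 3.12; the resulting timestamp ends in `+00:00` rather than the `Z` suffix the notebook appends.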


## 09. Utility: Single Prediction Function (Reuse)

### Technical Overview
Defines a reusable function that encapsulates single prediction logic with validation and formatting.

### Implementation Details
- Wraps input validation, prediction, uncertainty estimation, and result schema
- Returns structured output for any single input

### Purpose
Facilitates reuse in scripts or APIs with consistent logic.

### Output
Structured prediction dictionary for given input.

In [73]:
def predict_asset(record: dict, asset_id: str, asset_type: str = ASSET_TYPE):
    # Validate and autofill the derived features
    rec = validate_input_record(record, strict=True)

    # Anomaly detection
    anomaly = detect_anomalies(rec)

    # Drift detection on the engineered features
    baseline_stats = {
        feat: (
            model_meta["engineered_feature_stats"][feat]["mean"],
            (model_meta["engineered_feature_stats"][feat]["max"] - model_meta["engineered_feature_stats"][feat]["min"]) / 6
        )
        for feat in model_meta.get("engineered_feature_stats", {})
    }
    drift_detected, _ = check_feature_drift(rec, baseline_stats)

    # Inference
    df_in = pd.DataFrame([rec])
    val = float(pipeline.predict(df_in)[0])

    return build_output_schema(
        asset_id=asset_id,
        asset_type=asset_type,
        valuation_k=val,
        model_meta=model_meta,
        anomaly=anomaly,
        needs_review=drift_detected,
    )


# Test the function
warnings.filterwarnings("ignore", message="X does not have valid feature names")
test_output = predict_asset(sample_property, asset_id="asset_function_test")
test_output



{'schema_version': 'v1',
 'asset_id': 'asset_function_test',
 'asset_type': 'property',
 'timestamp': '2025-07-28T11:05:10Z',
 'metrics': {'valuation_base_k': 234.808},
 'flags': {'anomaly': False, 'needs_review': True, 'drift_detected': True},
 'model_meta': {'value_model_version': 'v2',
  'value_model_name': 'TransformedTargetRegressor(LightGBM)'},
 'offchain_refs': {'detail_report_hash': None, 'sensor_batch_hash': None},
 'model_health': {'status': 'ok',
  'model_path': '..\\models\\property\\value_regressor_v2.joblib',
  'size_mb': 11.56,
  'last_modified': '2025-07-28T10:12:56.651259Z',
  'metadata_valid': True,
  'metrics': {'mae_k': 2.5175, 'rmse_k': 3.4086, 'r2': 0.9994}},
 'cache_hit': False,
 'schema_validation_error': '',
 'blockchain_txid': '',
 'asa_id': '',
 'publish': {'status': 'not_attempted'}}

## 10. Sensitivity Check (vary size_m2)

### Technical Overview
Performs a sensitivity analysis on the `size_m2` feature to observe its impact on predicted value.

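### Implementation Details
- Varies `size_m2` across a fixed list of values (60 to 210 m²)
- Validates each modified record and calls `pipeline.predict()` at each step
- Collects the results in a table; failures are recorded with their error message

### Purpose
Assesses model robustness and feature impact on valuation.

### Output
Table of predicted valuations for each `size_m2` value.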

In [74]:
# Sensitivity analysis on the 'size_m2' variable
sizes = [60, 90, 130, 170, 210]
size_variations = []

for s in sizes:
    rec = {**sample_property, "size_m2": s}
    try:
        rec = validate_input_record(rec, strict=True)
        val = float(pipeline.predict(pd.DataFrame([rec]))[0])
        size_variations.append({"size_m2": s, "prediction_k": round(val, 3)})
    except Exception as e:
        size_variations.append({"size_m2": s, "prediction_k": None, "error": str(e)})

warnings.filterwarnings("ignore", message="X does not have valid feature names")
pd.DataFrame(size_variations)

| | size_m2 | prediction_k |
|---|---|---|
| 0 | 60 | 147.432 |
| 1 | 90 | 215.533 |
| 2 | 130 | 321.369 |
| 3 | 170 | 427.237 |
| 4 | 210 | 483.714 |
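
One way to read this table is as a marginal price per square metre between consecutive sizes. The sketch below computes those slopes from the values printed above; `marginal_slopes` is a hypothetical helper.

```python
def marginal_slopes(sizes, predictions_k):
    """Return (size_from, size_to, k€ per m²) for each pair of consecutive points."""
    points = list(zip(sizes, predictions_k))
    return [
        (s0, s1, (p1 - p0) / (s1 - s0))
        for (s0, p0), (s1, p1) in zip(points, points[1:])
    ]


# Values copied from the sensitivity table
sizes = [60, 90, 130, 170, 210]
predictions_k = [147.432, 215.533, 321.369, 427.237, 483.714]

slopes = marginal_slopes(sizes, predictions_k)
for s0, s1, slope in slopes:
    print(f"{s0:>3} -> {s1:>3} m²: {slope:.3f} k€ per m²")
```

On these numbers the slope stays between roughly 2.3 and 2.6 k€ per m² up to 170 m² and drops to about 1.4 k€ per m² in the last step, which may indicate that the model flattens for large properties.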


## 11. Compare With API Prediction Consistency

### Technical Overview
Compares notebook prediction with value returned from the deployed API to ensure consistency.

### Implementation Details
- Sends `sample_property` as JSON via HTTP POST to `{API_BASE}/predict/{ASSET_TYPE}`
- Parses the API response and reads `metrics.valuation_base_k`
- Computes the absolute difference from the local prediction

### Purpose
Ensures model parity across local and deployed environments.

### Output
Prints both predictions and their absolute difference, or the failure reason if the API cannot be reached.

In [75]:
if COMPARE_WITH_API:
    try:
        api_resp = requests.post(
            f"{API_BASE}/predict/{ASSET_TYPE}",
            json=sample_property,
            timeout=5
        )
        if api_resp.status_code == 200:
            api_json = api_resp.json()
            api_pred = api_json["metrics"]["valuation_base_k"]
            delta = abs(api_pred - pred_value)
            print(
                f"[API] Pred: {api_pred:.3f} k€ | Local: {pred_value:.3f} k€ | Δ={delta:.4f}"
            )
        else:
            print(
                f"[API] ❌ Request failed | Status={api_resp.status_code} | Body={api_resp.text}"
            )
    except Exception as e:
        print(f"[API] ⚠️ Compare skipped due to exception: {e}")

[API] ⚠️ Compare skipped due to exception: HTTPConnectionPool(host='127.0.0.1', port=8000): Max retries exceeded with url: /predict/property (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002268D362590>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
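
The cell above reports an absolute delta. For a pass/fail parity check, a relative tolerance is often more convenient; the sketch below uses `math.isclose`. The `compare_predictions` helper and its `rel_tol` default are assumptions, and the API value in the usage lines is invented, since the endpoint was unreachable in this run.

```python
import math


def compare_predictions(local_k: float, api_k: float, rel_tol: float = 1e-4) -> dict:
    """Compare two valuations and report absolute and relative difference."""
    abs_diff = abs(api_k - local_k)
    # Relative difference against the larger magnitude; defined as 0 when both are 0
    denom = max(abs(local_k), abs(api_k))
    rel_diff = abs_diff / denom if denom else 0.0
    return {
        "abs_diff": abs_diff,
        "rel_diff": rel_diff,
        "match": math.isclose(local_k, api_k, rel_tol=rel_tol),
    }


# 234.808 is the local prediction from this notebook; 234.81 is a made-up API value
result = compare_predictions(234.808, 234.81)
print(result)
```

A relative tolerance scales with the valuation, so the same threshold can be reused for properties in very different price ranges.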


## 12. Hash Pipeline File (Audit)

### Technical Overview
Generates a hash digest of the model binary for audit and version integrity.

### Implementation Details
- Uses `hashlib.sha256()` on model file
- Computes and prints hex digest

### Purpose
Provides reproducible identifier for the model artifact.

### Output
Hash value for model file.

In [76]:
def file_sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

model_hash = file_sha256(PIPELINE_PATH)
print("Model file hash (sha256, first 16 chars):", model_hash[:16])

Model file hash (sha256, first 16 chars): ebf5ce9980d9a865
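
A digest is most useful when it is compared against a recorded value. The sketch below hashes a temporary file and checks it against an expected digest with `hmac.compare_digest`; `verify_artifact` is a hypothetical helper and the file contents are placeholders.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 8192) -> str:
    """Stream the file in chunks so large artifacts are not loaded into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> bool:
    """Return True when the file's SHA-256 matches the recorded digest."""
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.bin"
    artifact.write_bytes(b"placeholder model bytes")

    recorded = hashlib.sha256(b"placeholder model bytes").hexdigest()
    matches = verify_artifact(artifact, recorded)
    tampered = verify_artifact(artifact, "0" * 64)
    print(matches, tampered)
```

Comparing the full 64-character digest is safer than comparing a 16-character prefix, which is fine for display but weaker as an integrity check.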


## 13. Schema Validation

### Technical Overview
Validates the model's prediction output (`single_output`) against a formal JSON Schema definition to ensure full structural and semantic compatibility with downstream systems (e.g., APIs, on-chain consumers).

### Implementation Details
- Loads the formal schema file from `../schemas/output_schema_v1.json` using `Path`
- Applies `jsonschema.validate()` to enforce structure, data types, and required properties
- Compares the top-level keys of `single_output` with `output_example.json` to detect field-level mismatches

### Purpose
Guarantees that the inference output complies with the defined data contract and is ready for integration with API responses, validators, and blockchain publishing.

### Output
Prints a success message if validation passes, or the specific validation error if it fails. Also compares keys with example output and reports any structural mismatch.

In [77]:
from jsonschema import validate, ValidationError

# Define paths
schema_def_path = Path("../schemas/output_schema_v1.json")
example_path = Path("../schemas/output_example.json")

# Validate against strict JSON schema
if schema_def_path.exists():
    with schema_def_path.open("r", encoding="utf-8") as f:
        schema_def = json.load(f)
    try:
        validate(instance=single_output, schema=schema_def)
        print("✅ Strict schema validation passed.")
    except ValidationError as e:
        print("❌ Strict schema validation failed:", e.message)
else:
    print(f"❌ Schema file not found: {schema_def_path}")

# Compare output against example keys (not values)
if example_path.exists():
    with example_path.open("r", encoding="utf-8") as f:
        example = json.load(f)
    diff_keys = set(single_output.keys()) ^ set(example.keys())
    if not diff_keys:
        print("✅ single_output matches example structure.")
    else:
        print("⚠️ Mismatch with example keys:", diff_keys)
else:
    print(f"❌ Example file not found: {example_path}")

✅ Strict schema validation passed.
⚠️ Mismatch with example keys: {'_logged_at'}
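
The key comparison above only looks at top-level keys. A recursive variant that also catches differences inside nested objects such as `metrics` or `flags` could look like the sketch below; `key_paths` and the two sample dicts are illustrative assumptions.

```python
def key_paths(obj, prefix: str = "") -> set:
    """Collect dotted key paths for every key in a nested dict."""
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


output = {
    "asset_id": "asset_manual_0001",
    "metrics": {"valuation_base_k": 234.808, "uncertainty": 0.0},
    "flags": {"anomaly": False},
}
example = {
    "asset_id": "asset_example",
    "metrics": {"valuation_base_k": 100.0},
    "flags": {"anomaly": False},
    "_logged_at": "2025-01-01T00:00:00Z",
}

only_in_output = key_paths(output) - key_paths(example)
only_in_example = key_paths(example) - key_paths(output)
print("Only in output:", sorted(only_in_output))
print("Only in example:", sorted(only_in_example))
```

Reporting the two directions separately shows whether a key is missing from the output or simply extra in the example, which the symmetric difference alone does not reveal.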


## 14. Test API via cURL

### Technical Overview
Demonstrates how to invoke the FastAPI inference endpoint locally using a real sample JSON, optionally triggering the publication on the Algorand blockchain (TestNet).

### Implementation Details
- Method: `requests.post(...)` with `application/json` payload
- URL: `http://localhost:8000/predict/property?publish=true`
- Input: `../data/sample_property.json` (must match expected schema)
- Output: Parsed response with metrics, blockchain TX info, schema validation, etc.
- HTTP Errors are caught and printed if any

### Purpose
To verify the end-to-end API pipeline including model prediction, metadata enrichment, and on-chain publishing, using the same logic served by the FastAPI backend.

### Output
- Printed prediction response (JSON)
- TX hash and ASA ID if `publish=true` and blockchain interaction is successful

In [78]:
sample_path = Path("../data/sample_property.json")
try:
    sample_payload = json.loads(sample_path.read_text())
except Exception as e:
    print(f"❌ Failed to load sample payload: {e}")
    sample_payload = None

if sample_payload:
    url = "http://localhost:8000/predict/property?publish=true"
    try:
        response = requests.post(url, json=sample_payload, timeout=5)
        if response.ok:
            print("✅ API Call Success")
            print(json.dumps(response.json(), indent=2))
        else:
            print(f"❌ API Call Failed: {response.status_code}")
            print(response.text)
    except Exception as e:
        print(f"❌ Exception during API request: {e}")

❌ Exception during API request: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /predict/property?publish=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002268C71A850>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
