<h1 style="text-align: center; font-size: 50px;"> Fine-Tuned Model Registration Service </h1>

This notebook demonstrates how to register a fine-tuned LLM comparison service that allows switching between base and fine-tuned models through a single MLflow endpoint. This follows the same pattern used across all AI-Blueprints for consistent model deployment and serving.

# Notebook Overview

- Start Execution
- Install and Import Libraries
- Configure Settings
- Verify and Prepare Model Assets
- Model Service Registration to MLflow

# Start Execution

In [1]:
import logging
import time

# Configure logger
logger: logging.Logger = logging.getLogger("register_model_logger")
logger.setLevel(logging.INFO)
logger.propagate = False  # Prevent duplicate logs from parent loggers

# Set formatter
formatter: logging.Formatter = logging.Formatter(
    fmt="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)

# Configure and attach stream handler
stream_handler: logging.StreamHandler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)

In [2]:
start_time = time.time()

logger.info("Model registration notebook execution started.")

2025-08-21 13:48:42 - INFO - Model registration notebook execution started.


# Install and Import Libraries

In [3]:
%%time

%pip install -r ../requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.
CPU times: user 57.3 ms, sys: 42.2 ms, total: 99.4 ms
Wall time: 2.56 s


In [4]:
import os
import sys
import yaml
from pathlib import Path
import warnings
import mlflow
from typing import Dict, Any, Optional, Union, List, Tuple


# Add the core directory to the path to import utils
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))

# ===============================
# 🧠 Model Selection & Loading
# ===============================
from core.selection.model_selection import ModelSelector

# ===============================
# 🚀 Deployment & Registration
# ===============================
from core.deploy.deploy_fine_tuning import register_llm_comparison_model

# ===============================
# ⚙️ Utility Functions
# ===============================
from src.utils import (
    load_configuration,
    load_secrets,
    load_secrets_to_env,
    configure_proxy,
    login_huggingface,
    get_configs_dir,
    get_fine_tuned_models_dir,
    get_models_dir,
    format_model_path
)

2025-08-21 13:48:57.235874: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-08-21 13:48:57.248843: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1755784137.265235    1561 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1755784137.270416    1561 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1755784137.283734    1561 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking 

# Configure Settings

In [5]:
# Suppress Python warnings
warnings.filterwarnings("ignore")

In [6]:
# Configuration paths and parameters
DEMO_FOLDER = Path("../demo")
CONFIG_PATH = str(get_configs_dir() / "config.yaml")
SECRETS_PATH = str(get_configs_dir() / "secrets.yaml")
MLFLOW_EXPERIMENT_NAME = "AIStudio-Fine-Tuning-Experiment"
MODEL_SERVICE_RUN_NAME = "AIStudio-Fine-Tuning-Service-Run"
MODEL_SERVICE_NAME = "AIStudio-Fine-Tuning-Model"

# Model configuration - update these based on your training
BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # Update to match your base model
FINE_TUNED_MODEL_NAME = "Orpo-TinyLlama-1.1B-Chat-v1.0-FT"  # Update to match your fine-tuned model

logger.info("📋 Model Registration Configuration:")
logger.info(f"   • Base model (HF): {BASE_MODEL}")
logger.info(f"   • Fine-tuned model: {FINE_TUNED_MODEL_NAME}")
logger.info(f"   • MLflow experiment: {MLFLOW_EXPERIMENT_NAME}")
logger.info(f"   • Service name: {MODEL_SERVICE_NAME}")

2025-08-21 13:49:01 - INFO - 📋 Model Registration Configuration:
2025-08-21 13:49:01 - INFO -    • Base model (HF): TinyLlama/TinyLlama-1.1B-Chat-v1.0
2025-08-21 13:49:01 - INFO -    • Fine-tuned model: Orpo-TinyLlama-1.1B-Chat-v1.0-FT
2025-08-21 13:49:01 - INFO -    • MLflow experiment: AIStudio-Fine-Tuning-Experiment
2025-08-21 13:49:01 - INFO -    • Service name: AIStudio-Fine-Tuning-Model


### Configuration and Secrets Loading

In this section, we load configuration parameters and API keys from separate YAML files. This separation helps maintain security by keeping sensitive information (API keys) separate from configuration settings.

- **config.yaml**: Contains non-sensitive configuration parameters like model sources and URLs
- **secrets.yaml**: Contains sensitive API keys for services like HuggingFace
- *(Optional for Premium users)* Secrets such as API keys for services like HuggingFace can be stored as environment variables for the project and loaded into the notebook (see the project's README file for steps on how to save secrets in Secrets Manager).
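
As a rough illustration of the secrets-to-environment step described above, the sketch below copies a mapping of secrets into `os.environ` without overwriting values that are already set. It is not the implementation of `load_secrets_to_env` from `src.utils`; the function name `export_secrets_to_env` and the `DEMO_*` keys are hypothetical, and in practice the mapping would come from parsing `secrets.yaml`.

```python
import os
from typing import Mapping, Optional


def export_secrets_to_env(secrets: Mapping[str, Optional[str]], overwrite: bool = False) -> int:
    """Copy non-empty secrets into os.environ and return how many were set.

    Hypothetical helper for illustration only; it does not reproduce
    the project's own load_secrets_to_env.
    """
    exported = 0
    for key, value in secrets.items():
        if value is None or str(value).strip() == "":
            continue  # skip empty placeholders
        if key in os.environ and not overwrite:
            continue  # keep values already provided by the environment
        os.environ[key] = str(value)
        exported += 1
    return exported


# Placeholder values only; never hard-code real tokens in a notebook.
demo_secrets = {"DEMO_SERVICE_API_KEY": "placeholder-token", "DEMO_EMPTY_KEY": ""}
count = export_secrets_to_env(demo_secrets)
print(f"Exported {count} secret(s) to the environment")
```

Leaving existing environment variables untouched by default means values injected by a secrets manager take precedence over the YAML file.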

In [7]:
# Load secrets from secrets.yaml file (if it exists) into environment
if Path(SECRETS_PATH).exists():
    load_secrets_to_env(SECRETS_PATH)
else:
    print(f"No secrets file found at {SECRETS_PATH}; relying on preexisting environment")

# Retrieve secrets from environment
try:
    secrets = load_secrets()
except ValueError:
    secrets = {}

# Load configuration
config = load_configuration(CONFIG_PATH)

print("✅ Configuration loaded successfully")
if secrets:
    print("✅ Secrets loaded successfully")
else:
    print("⚠️ No secrets loaded; gated models may not be accessible")

✅ Loaded 2 secrets into environment variables.
✅ Configuration loaded successfully
✅ Secrets loaded successfully


In [8]:
# Configure proxy if needed
configure_proxy(config)

### 🔐 Login to Hugging Face

To access gated models (e.g., LLaMA, Mistral, or Gemma), you must authenticate using your Hugging Face token.

Make sure your `secrets.yaml` file (or AIS Secrets Manager for Premium Users) contains the following key:

```yaml
AIS_HUGGINGFACE_API_KEY: your_huggingface_token
```

**Note**: Please refer to this project's README for detailed instructions on how to configure secrets.

In [9]:
# Login to Hugging Face (required for downloading gated models)
try:
    login_huggingface(secrets)
    logger.info("✅ Hugging Face authentication successful")
except Exception as e:
    logger.warning(f"⚠️ Hugging Face authentication failed: {e}")
    logger.info("Some models may not be accessible if they require authentication")

2025-08-21 13:49:01 - INFO - ✅ Hugging Face authentication successful


✅ Logged into Hugging Face successfully.


# Verify and Prepare Model Assets

Before registering the models, let's verify that both the base model and fine-tuned model are accessible. If the base model hasn't been downloaded locally yet, we'll download it using the same approach as the training workflow.

In [10]:
def verify_and_prepare_model_assets():
    """Verify and prepare both base and fine-tuned model assets."""
    
    # Check fine-tuned model directory
    fine_tuned_dir = get_fine_tuned_models_dir()
    fine_tuned_path = fine_tuned_dir / FINE_TUNED_MODEL_NAME
    
    if fine_tuned_path.exists():
        logger.info(f"✅ Fine-tuned model found: {fine_tuned_path}")
        fine_tuned_available = True
    else:
        logger.warning(f"⚠️ Fine-tuned model not found: {fine_tuned_path}")
        logger.info("Please run the run-workflow.ipynb notebook first to create the fine-tuned model")
        fine_tuned_available = False
    
    # Handle base model - download locally if needed using ModelSelector
    logger.info(f"🔍 Checking base model: {BASE_MODEL}")
    
    try:
        # Use ModelSelector to handle model downloading and verification
        selector = ModelSelector()
        selector.select_model(BASE_MODEL)
        
        # Get the local model path
        base_model_local_path = selector.format_model_path(BASE_MODEL)
        
        logger.info(f"✅ Base model prepared locally: {base_model_local_path}")
        
        return fine_tuned_available, base_model_local_path
        
    except Exception as e:
        logger.error(f"❌ Failed to prepare base model: {str(e)}")
        # Report the fine-tuned model status accurately even if the base model failed
        return fine_tuned_available, None

# Verify and prepare assets
assets_verified, base_model_path = verify_and_prepare_model_assets()

2025-08-21 13:49:01 - INFO - ✅ Fine-tuned model found: /home/jovyan/AI-Blueprints/generative-ai/fine-tuning-with-orpo/output/fine_tuned_models/Orpo-TinyLlama-1.1B-Chat-v1.0-FT
2025-08-21 13:49:01 - INFO - 🔍 Checking base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
2025-08-21 13:49:01,778 — INFO — [ModelSelector] Selected model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
2025-08-21 13:49:01,781 — INFO — [ModelSelector] Downloading model snapshot to: /home/jovyan/AI-Blueprints/generative-ai/fine-tuning-with-orpo/models/TinyLlama__TinyLlama-1.1B-Chat-v1.0


Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]

2025-08-21 13:49:01,979 — INFO — [ModelSelector] ✅ Model downloaded successfully to: /home/jovyan/AI-Blueprints/generative-ai/fine-tuning-with-orpo/models/TinyLlama__TinyLlama-1.1B-Chat-v1.0
2025-08-21 13:49:01,980 — INFO — [ModelSelector] Loading model and tokenizer from: /home/jovyan/AI-Blueprints/generative-ai/fine-tuning-with-orpo/models/TinyLlama__TinyLlama-1.1B-Chat-v1.0
2025-08-21 13:49:41,961 — INFO — [ModelSelector] Checking model for ORPO compatibility...
2025-08-21 13:49:41,963 — INFO — [ModelSelector] ✅ Model 'TinyLlama/TinyLlama-1.1B-Chat-v1.0' is ORPO-compatible.
2025-08-21 13:49:41 - INFO - ✅ Base model prepared locally: /home/jovyan/AI-Blueprints/generative-ai/fine-tuning-with-orpo/models/TinyLlama__TinyLlama-1.1B-Chat-v1.0
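
The log output above suggests that a Hub-style model ID is stored locally under a directory name in which `/` is replaced by `__`. The sketch below illustrates that mapping together with a simple directory check. Both helper names are hypothetical, the mapping is inferred from the logged paths rather than taken from `ModelSelector`, and treating `config.json` as the only required file is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def local_model_dir_name(model_id: str) -> str:
    """Map a Hub-style ID such as 'org/name' to a flat directory name.

    Assumption: inferred from the logged paths, not from ModelSelector itself.
    """
    return model_id.replace("/", "__")


def model_dir_looks_usable(model_dir: Path, required: Sequence[str] = ("config.json",)) -> bool:
    """Return True if the directory exists and contains every required file."""
    return model_dir.is_dir() and all((model_dir / name).is_file() for name in required)


# Demonstrate the check against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / local_model_dir_name("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    print(model_dir.name, model_dir_looks_usable(model_dir))  # directory not created yet
    model_dir.mkdir()
    (model_dir / "config.json").write_text("{}")
    print(model_dir.name, model_dir_looks_usable(model_dir))  # required file now present
```

A check like this only confirms that files are present; it does not validate that the weights load correctly.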


## Adaptive Model Registration Service

This section demonstrates how to register the **adaptive** LLM comparison model that automatically adjusts to different hardware and memory constraints. The model provides a single API endpoint that works efficiently across various deployment environments.

### Key Adaptive Features:
- **Automatic Device Selection**: Intelligently chooses between CPU and GPU based on availability
- **Dynamic Memory Management**: Adapts memory usage patterns based on available resources  
- **Smart Device Mapping**: Uses transformers' auto device mapping for optimal model distribution
- **Precision Optimization**: Automatically selects FP16 on GPU, FP32 on CPU for best performance
- **Robust Error Handling**: Graceful fallbacks when advanced features aren't available
- **Universal Compatibility**: Works in both memory-constrained and resource-rich environments
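
The device selection and precision rules listed above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the logic inside `register_llm_comparison_model`: the function names are hypothetical, and PyTorch is only imported if it is installed.

```python
import importlib.util
from typing import Tuple


def pick_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, dtype name): FP16 on GPU, FP32 on CPU."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


def detect_cuda() -> bool:
    """Report whether PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch not installed: fall back to CPU settings
    import torch

    return torch.cuda.is_available()


device, dtype_name = pick_device_and_dtype(detect_cuda())
print(f"device={device}, dtype={dtype_name}")
```

With Hugging Face `transformers`, choices like these would typically be passed to `from_pretrained` through the `torch_dtype` and `device_map` arguments; whether the registered service does exactly that is not confirmed by this notebook.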

The service provides:

- **Base Model Inference**: Access to the original pre-trained model
- **Fine-Tuned Model Inference**: Access to the ORPO fine-tuned model  
- **Comparison Mode**: Switch between models using the `use_finetuning` parameter
- **Adaptive Performance**: Automatically optimizes for the deployment environment
- **Flexible Input**: Support for custom prompts and generation parameters
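
To make the comparison mode concrete, the sketch below routes each input row to one of two generator callables based on `use_finetuning`. It is an illustration only: `ComparisonRouter` and the stand-in lambda generators are hypothetical and do not reproduce the registered MLflow model, which loads real model weights.

```python
from typing import Any, Callable, Dict, List

# A generator takes (prompt, max_tokens) and returns generated text.
Generator = Callable[[str, int], str]


class ComparisonRouter:
    """Dispatch each row to the base or fine-tuned generator (hypothetical sketch)."""

    def __init__(self, base: Generator, fine_tuned: Generator, default_max_tokens: int = 128):
        self.base = base
        self.fine_tuned = fine_tuned
        self.default_max_tokens = default_max_tokens

    def predict(self, rows: List[Dict[str, Any]]) -> List[Dict[str, str]]:
        results = []
        for row in rows:
            # A missing flag defaults to the base model in this sketch.
            generate = self.fine_tuned if bool(row.get("use_finetuning", False)) else self.base
            max_tokens = int(row.get("max_tokens", self.default_max_tokens))
            results.append({"response": generate(row["prompt"], max_tokens)})
        return results


# Stand-in generators that only echo their inputs.
router = ComparisonRouter(
    base=lambda prompt, n: f"[base:{n}] {prompt}",
    fine_tuned=lambda prompt, n: f"[fine-tuned:{n}] {prompt}",
)
rows = [
    {"prompt": "Hello", "use_finetuning": False},
    {"prompt": "Hello", "use_finetuning": True, "max_tokens": 32},
]
for output in router.predict(rows):
    print(output["response"])
```

Keeping both generators behind one `predict` method is what lets a single endpoint serve both models.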

In [11]:
# Set MLflow tracking URI and experiment
mlflow.set_tracking_uri('/phoenix/mlflow')
mlflow.set_experiment(MLFLOW_EXPERIMENT_NAME)

if assets_verified and base_model_path:
    try:
        logger.info("📝 Registering comparison model with:")
        logger.info(f"   • Base model path: {base_model_path}")
        logger.info(f"   • Fine-tuned model: {FINE_TUNED_MODEL_NAME}")
        
        # Register the adaptive LLM comparison model
        register_llm_comparison_model(
            model_base_path=base_model_path,
            model_finetuned_path=FINE_TUNED_MODEL_NAME,
            experiment=MLFLOW_EXPERIMENT_NAME,
            run_name=MODEL_SERVICE_RUN_NAME,
            registry_name=MODEL_SERVICE_NAME,
            config_path=CONFIG_PATH,
            demo_folder=DEMO_FOLDER
        )
        
        logger.info("✅ Adaptive LLM comparison model registered successfully!")
        logger.info(f"Model name: {MODEL_SERVICE_NAME}")
        logger.info(f"Experiment: {MLFLOW_EXPERIMENT_NAME}")
        logger.info("This model automatically adapts to memory constraints and available hardware.")
        
    except Exception as e:
        logger.error(f"❌ Failed to register comparison model: {str(e)}")
        logger.info("Please check the error details above and ensure all dependencies are installed")
        
else:
    logger.error("❌ Cannot register model - required assets not found or not prepared")
    if not assets_verified:
        logger.info("Please run the run-workflow.ipynb notebook first to create the fine-tuned model")
    if not base_model_path:
        logger.info("Base model could not be downloaded or prepared locally")

2025-08-21 13:49:42 - INFO - 📝 Registering comparison model with:
2025-08-21 13:49:42 - INFO -    • Base model path: /home/jovyan/AI-Blueprints/generative-ai/fine-tuning-with-orpo/models/TinyLlama__TinyLlama-1.1B-Chat-v1.0
2025-08-21 13:49:42 - INFO -    • Fine-tuned model: Orpo-TinyLlama-1.1B-Chat-v1.0-FT
2025-08-21 13:49:42,085 — INFO — Resolved base model path: /home/jovyan/AI-Blueprints/generative-ai/fine-tuning-with-orpo/models/TinyLlama__TinyLlama-1.1B-Chat-v1.0
2025-08-21 13:49:42,086 — INFO — Resolved fine-tuned model path: /home/jovyan/AI-Blueprints/generative-ai/fine-tuning-with-orpo/output/fine_tuned_models/Orpo-TinyLlama-1.1B-Chat-v1.0-FT


Downloading artifacts:   0%|          | 0/31 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/7 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/6 [00:00<?, ?it/s]

Successfully registered model 'AIStudio-Fine-Tuning-Model'.
Created version '1' of model 'AIStudio-Fine-Tuning-Model'.
2025-08-21 13:52:12,895 — INFO — ✅ Adaptive LLM comparison model registered as `AIStudio-Fine-Tuning-Model` (run b5871a06b8c546419197166f3e5f634b)
2025-08-21 13:52:12 - INFO - ✅ Adaptive LLM comparison model registered successfully!
2025-08-21 13:52:12 - INFO - Model name: AIStudio-Fine-Tuning-Model
2025-08-21 13:52:12 - INFO - Experiment: AIStudio-Fine-Tuning-Experiment
2025-08-21 13:52:12 - INFO - This model automatically adapts to memory constraints and available hardware.


## Usage Instructions

Once the adaptive model is registered, you can use it through the MLflow model serving interface. The adaptive version automatically optimizes performance and memory usage based on your environment.

### Input Format
The model expects a pandas DataFrame with the following columns:
- `prompt` (string): The text prompt to generate from
- `use_finetuning` (boolean): Whether to use the fine-tuned model (True) or base model (False)
- `max_tokens` (integer, optional): Maximum number of tokens to generate (default: 128)
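
A small helper can make the input contract above explicit before anything is sent to the model. The sketch below is hypothetical (`build_input_rows` is not part of this project) and uses plain dictionaries so that it runs without extra dependencies. The default of 128 tokens follows the column description above; the validation rules are assumptions.

```python
from typing import Any, Dict, List, Optional

DEFAULT_MAX_TOKENS = 128  # default stated in the column description


def build_input_rows(
    prompts: List[str],
    use_finetuning: bool,
    max_tokens: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Build one row per prompt with the three documented columns."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    tokens = DEFAULT_MAX_TOKENS if max_tokens is None else int(max_tokens)
    if tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    rows = []
    for prompt in prompts:
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError("each prompt must be a non-empty string")
        rows.append(
            {"prompt": prompt, "use_finetuning": bool(use_finetuning), "max_tokens": tokens}
        )
    return rows


rows = build_input_rows(["Explain crop rotation."], use_finetuning=True)
print(rows)
```

Passing the result to `pd.DataFrame(rows)` yields a DataFrame with the `prompt`, `use_finetuning`, and `max_tokens` columns.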

### Example Usage
```python
import pandas as pd
import mlflow

# Load the registered adaptive model
model = mlflow.pyfunc.load_model(f"models:/{MODEL_SERVICE_NAME}/latest")

# Create input data
input_data = pd.DataFrame({
    "prompt": ["Explain the importance of sustainable agriculture."],
    "use_finetuning": [True],  # Use fine-tuned model
    "max_tokens": [200]
})

# Generate response (automatically optimized)
response = model.predict(input_data)
print(response["response"].iloc[0])
```
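
If the registered model is served over HTTP (for example with `mlflow models serve`), the same input can be sent as JSON to the scoring endpoint. The sketch below builds a `dataframe_records` payload, one of the JSON input formats accepted by the MLflow 2.x scoring server. The helper names and the URL in the commented-out call are assumptions, and the shape of the response depends on how the model was logged.

```python
import json
import urllib.request
from typing import Any, Dict, List


def to_invocations_body(rows: List[Dict[str, Any]]) -> bytes:
    """Serialize rows as a `dataframe_records` JSON payload."""
    return json.dumps({"dataframe_records": rows}).encode("utf-8")


def post_rows(url: str, rows: List[Dict[str, Any]]) -> Any:
    """POST the rows to a scoring endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        url,
        data=to_invocations_body(rows),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))


rows = [
    {
        "prompt": "Explain the importance of sustainable agriculture.",
        "use_finetuning": True,
        "max_tokens": 200,
    }
]
print(to_invocations_body(rows).decode("utf-8"))

# Hypothetical local endpoint; uncomment once a server is actually running.
# print(post_rows("http://127.0.0.1:5000/invocations", rows))
```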

### Adaptive Comparison Mode
To compare the two models, send the same prompt twice and toggle `use_finetuning`; the service switches models internally:

```python
# Compare base vs fine-tuned (adaptive optimization)
prompts = ["Your test prompt here"]

for use_ft in [False, True]:
    input_data = pd.DataFrame({
        "prompt": prompts,
        "use_finetuning": [use_ft],
        "max_tokens": [150]
    })
    response = model.predict(input_data)
    model_type = "Fine-tuned" if use_ft else "Base"
    print(f"{model_type} Model: {response['response'].iloc[0]}")
    # Model automatically handles device placement and memory management
```

### Adaptive Benefits
- **Environment Detection**: Automatically detects available hardware and memory
- **Performance Optimization**: Uses best settings for each deployment environment
- **Memory Safety**: Reduces the risk of out-of-memory (OOM) errors by adapting memory usage to the available resources
- **Hardware Efficiency**: Leverages GPU acceleration when available, graceful CPU fallback
- **Robust Operation**: Handles various deployment scenarios without configuration changes

In [12]:
end_time: float = time.time()
elapsed_time: float = end_time - start_time
elapsed_minutes: int = int(elapsed_time // 60)
elapsed_seconds: float = elapsed_time % 60

logger.info(f"⏱️ Total execution time: {elapsed_minutes}m {elapsed_seconds:.2f}s")
logger.info("✅ Model registration notebook execution completed successfully.")

2025-08-21 13:52:12 - INFO - ⏱️ Total execution time: 3m 30.36s
2025-08-21 13:52:12 - INFO - ✅ Model registration notebook execution completed successfully.


Built with ❤️ using [**HP AI Studio**](https://hp.com/ai-studio).