# Hugging Face PT Models
This notebook registers Hugging Face models in Unity Catalog and deploys them via Model Serving.

## Load Model from HuggingFace
Our serving journey starts with loading the model from Hugging Face. We use the 'Auto' classes (AutoModelForCausalLM and AutoTokenizer) from the Hugging Face transformers package because the objects they return can be logged with MLflow and registered in Unity Catalog.

In [0]:
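# Load model and tokenizer from Hugging Face
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face Hub ID, shared by the model and tokenizer loads
model_id = 'braindao/Qwen2.5-14B'

# device_map='auto' places the weights across available devices
# (requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto'
)

tokenizer = AutoTokenizer.from_pretrained(model_id)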

In [0]:
# Log model to MLflow and register in Unity Catalog
import mlflow
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run():
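    # Define an input example in the completions request format
    input_example = {"prompt": "What is machine learning?"}

    # Log the model and register it under a three-level Unity Catalog name
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        # The task must match the input example's schema;
        # "llm/v1/completions" needs a recent MLflow version
        task="llm/v1/completions",
        input_example=input_example,
        registered_model_name="prod.ml_team.llama7b_chat"
    )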
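
With the registry URI set to "databricks-uc", registered model names take the three-level form catalog.schema.model. A minimal sketch of a check that could run before logging, so a malformed name fails early rather than after the model has been serialized. The helper `parse_uc_model_name` and the `UCModelName` type are hypothetical names introduced here for illustration, not part of MLflow:

```python
from typing import NamedTuple


class UCModelName(NamedTuple):
    """Hypothetical container for the three parts of a Unity Catalog model name."""
    catalog: str
    schema: str
    model: str


def parse_uc_model_name(full_name: str) -> UCModelName:
    """Split 'catalog.schema.model' into its parts, rejecting any other shape."""
    parts = full_name.split(".")
    # Require exactly three non-empty components
    if len(parts) != 3 or any(not part.strip() for part in parts):
        raise ValueError(
            f"expected a name of the form catalog.schema.model, got {full_name!r}"
        )
    return UCModelName(*parts)


name = parse_uc_model_name("prod.ml_team.llama7b_chat")
print(name.catalog, name.schema, name.model)
```

This sketch only checks the shape of the name. It does not verify that the catalog or schema exist in the workspace.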

In [0]:
# Deploy to Model Serving using MLflow Deployments SDK
from mlflow.deployments import get_deploy_client
client = get_deploy_client("databricks")

endpoint_name = "llama7b-chat-endpoint"
client.create_endpoint(
    name=endpoint_name,
    config={
        "served_entities": [{
            "entity_name": "prod.ml_team.llama7b_chat",
            "entity_version": "1",
            "workload_size": "Medium",
            "scale_to_zero_enabled": False
        }]
    }
)
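
Endpoint creation returns before the endpoint can serve traffic, so a notebook that queries it right away may fail. Below is a minimal sketch of a readiness poll. The helper `wait_until_ready` is a hypothetical name, and the response shape it reads (a "state" dict whose "ready" field becomes "READY") is an assumption about the serving endpoint payload. Verify it against your workspace before relying on it. The fetch function is passed in, so the loop itself has no MLflow dependency.

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(
    fetch_endpoint: Callable[[], Dict[str, Any]],
    timeout_s: float = 1800.0,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_endpoint() until it reports ready; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        endpoint = fetch_endpoint()
        # Assumed payload shape: {"state": {"ready": "READY" or "NOT_READY", ...}}
        if endpoint.get("state", {}).get("ready") == "READY":
            return endpoint
        if time.monotonic() >= deadline:
            raise TimeoutError("endpoint did not become ready before the timeout")
        sleep(poll_s)


# Usage with the client from the cell above (requires a Databricks workspace):
# wait_until_ready(lambda: client.get_endpoint(endpoint=endpoint_name))
# response = client.predict(
#     endpoint=endpoint_name,
#     inputs={"prompt": "What is machine learning?"},
# )
```

The default timeout and polling interval are placeholders. Large models can take a long time to provision, so adjust them to the deployment at hand.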