
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>



# LAB: Batch Inference Using SLM

In this lab, you will learn how to implement a batch inference pipeline using a Small Language Model (SLM) in a production environment. The objective is to follow a structured approach to develop, test, and deploy a language model-based pipeline using tools such as MLflow and Unity Catalog. The process covers model management and operational strategies: running batch inference on Spark DataFrames and managing the model lifecycle through model registration and querying.


**Lab Outline:**

*In this lab, you will need to complete the following tasks:*

1. **Task 1:** Create a Hugging Face question-answering pipeline and test it.
2. **Task 2:** Track and register the model using MLflow and Unity Catalog.
3. **Task 3:** Manage the registered model's state.
4. **Task 4:** Perform single-node and multi-node batch inference.
5. **Task 5:** Perform batch inference using SQL `ai_query`.

## REQUIRED - SELECT CLASSIC COMPUTE
Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:
1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.
   
   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## Requirements

Please review the following requirements before starting the lesson:

* To run this notebook, you need to use one of the following Databricks runtime(s): **15.4.x-cpu-ml-scala2.12**


## Classroom Setup

Install required libraries.

In [0]:
%pip install -qq -U huggingface-hub
dbutils.library.restartPython()

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


Before starting the lab, run the provided classroom setup script. This script defines the configuration variables the lab needs. Execute the following cell:

In [0]:
%run ../Includes/Classroom-Setup-01




The examples and models presented in this course are intended solely for demonstration and educational purposes. Please note that the models and prompt examples may sometimes contain offensive, inaccurate, biased, or harmful content.


**Other Conventions:**

Throughout this lab, we'll refer to the object `DA`. This object, provided by Databricks Academy, contains variables such as your username, catalog name, schema name, working directory, and dataset locations. Run the code block below to view these details:

In [0]:
print(f"Username:          {DA.username}")
print(f"Catalog Name:      {DA.catalog_name}")
print(f"Schema Name:       {DA.schema_name}")
print(f"Working Directory: {DA.paths.working_dir}")
print(f"Dataset Location:  {DA.paths.datasets}")

Username:          labuser11195156_1755057414@vocareum.com
Catalog Name:      dbacademy
Schema Name:       labuser11195156_1755057414
Working Directory: /Volumes/dbacademy/ops/labuser11195156_1755057414@vocareum_com
Dataset Location:  NestedNamespace (arxiv='/Volumes/dbacademy_arxiv/v01')


## Dataset Overview

In this lab, you will use the SQuAD dataset hosted on Hugging Face. SQuAD is a reading comprehension dataset: each record pairs a question with a context passage and one or more reference answers drawn from that passage. Let's load the dataset and save its validation split as a table.

In [0]:
from datasets import load_dataset
from delta.tables import DeltaTable

prod_data_table_name = f"{DA.catalog_name}.{DA.schema_name}.m4_1_lab_prod_data"
squad_dataset = load_dataset("squad")
test_spark_df = spark.createDataFrame(squad_dataset["validation"].to_pandas())
test_spark_df.write.mode("overwrite").saveAsTable(prod_data_table_name)




## Task 1: Develop an LLM Pipeline

Create a language model pipeline that answers questions by leveraging a pre-trained model.

### 1.1: Create a Hugging Face Q&A Pipeline
Initialize a QA pipeline using a model tailored for question answering. This step involves selecting a model that has been optimized for the `question-answering` task.

In [0]:
##
## Import the pipeline function from the transformers library
from transformers import pipeline  
## Define variables for the model name, device mapping, and cache directory
hf_model_name = "distilbert-base-cased-distilled-squad"  
device_map = "auto"  ## Automatically use the best available device (CPU or GPU)
cache_dir = "/hf_cache"  ## Path for caching data

## Initialize a question-answering pipeline with the specified model
qa_pipeline = pipeline(
    task="question-answering",  ## Specify the task type as 'question-answering'
    model=hf_model_name,  ## Model to be loaded
    model_kwargs={"cache_dir": cache_dir},  
)

2025-08-13 04:14:57.035030: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-08-13 04:14:57.039721: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-08-13 04:14:57.090679: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.



### 1.2: Test Question-Answering Pipeline
Validate the pipeline's functionality by running a predefined question and context to observe how the model interprets and responds.

In [0]:
##
## Define the context string where the model will search for answers
context = """Marie Curie was a Polish and naturalized-French physicist and chemist who conducted pioneering research on radioactivity. She was the first woman to win a Nobel Prize and the first person and only woman to win the Nobel prize twice in different scientific fields."""

## Define the question to be answered based on the given context
question = "Why is Marie Curie famous?"

## Use the question-answering pipeline to find an answer to the question from the context
answer = qa_pipeline(question=question, context=context, token_type_ids=None)

## Print the question and answer
print(f"Question: {question}")

print(f"Answer: {answer['answer']}")
print("===============================================")

## Print the context to show the content the model used to find the answer
print(f"Context: {context}")

Question: Why is Marie Curie famous?
Answer: conducted pioneering research on radioactivity
Context: Marie Curie was a Polish and naturalized-French physicist and chemist who conducted pioneering research on radioactivity. She was the first woman to win a Nobel Prize and the first person and only woman to win the Nobel prize twice in different scientific fields.
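The answer above is a span copied out of the context, which is how extractive question-answering models like this one work. The sketch below illustrates that relationship with a hand-built result dictionary. The keys (`score`, `start`, `end`, `answer`) follow the shape a Hugging Face question-answering pipeline returns, but the `score` value here is a placeholder and no model is called.

```python
# Hypothetical result dict, shaped like the output of a Hugging Face
# question-answering pipeline (keys: score, start, end, answer).
context = (
    "Marie Curie was a Polish and naturalized-French physicist and chemist "
    "who conducted pioneering research on radioactivity."
)
answer_text = "conducted pioneering research on radioactivity"

start = context.find(answer_text)
result = {
    "score": 0.5,  # placeholder confidence, not a real model score
    "start": start,
    "end": start + len(answer_text),
    "answer": answer_text,
}

# For an extractive model, the answer is the context slice [start:end].
extracted = context[result["start"]:result["end"]]
print(extracted == result["answer"])
```

Because the answer is always a slice of the context, the model can only return text that appears in the passage it is given.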


## Task 2: Model Development and Registration
Track the developed model using MLflow and register it in Unity Catalog for lifecycle management.

### 2.1: Track LLM Development with MLflow

Log the model's parameters, configuration, and outputs to MLflow for tracking experiments, versioning, and reproducibility.

In [0]:
##

## Import necessary MLflow and related library modules for model tracking
import mlflow
from mlflow.models import infer_signature
from mlflow.transformers import generate_signature_output

## Generate a model output using the QA pipeline for a given input to use in the model signature
output = generate_signature_output(qa_pipeline, {"question": question, "context": context})

## Infer a model signature that defines the input and output schema of the model
signature = infer_signature({"question": question, "context": context}, output)

## Set the name of the experiment in MLflow
experiment_name = f"/Users/{DA.username}/GenAI-As-04-Batch-Demo"
mlflow.set_experiment(experiment_name)

## Define a path within the MLflow Artifacts repository to store the model
model_artifact_path = "qa_pipeline"

## Start an MLflow run to log parameters, artifacts, and models
with mlflow.start_run():
    ## Log parameters used in the model; here, the model name
    mlflow.log_params({
        "hf_model_name": hf_model_name,
    })

    ## Define inference configuration for logging purposes, could include other configurations
    inference_config = {
        "hf_model_name": hf_model_name,
    }

    ## Log the model along with its configuration, signature, and an example for use
    model_info = mlflow.transformers.log_model(
        transformers_model=qa_pipeline,
        artifact_path=model_artifact_path,
        task="question-answering",  ## Type of task for the model
        inference_config=inference_config,  ## Configuration used for inference
        signature=signature,  ## Signature that defines model input and output
        input_example={"question": "Why is Marie Curie famous?", "context": context},  ## Example of input
    )

[2025-08-13 04:15:02,632] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)





🏃 View run thoughtful-vole-186 at: https://dbc-ef7c3468-ef98.cloud.databricks.com/ml/experiments/198782775599319/runs/856cd2bf78134468bacf6ca078862cb7
🧪 View experiment at: https://dbc-ef7c3468-ef98.cloud.databricks.com/ml/experiments/198782775599319


### 2.2: Query the MLflow Tracking Server
Query the MLflow tracking server for the runs in your experiment, pick the most recent one, and build the model URI that points to the model logged in that run.

In [0]:
##
## Retrieve the experiment ID using the experiment name
experiment_id = mlflow.get_experiment_by_name(experiment_name).experiment_id
## Search for all runs in the experiment using the experiment ID
runs = mlflow.search_runs([experiment_id])
## Sort the runs by their start time in descending order and get the run ID of the latest run
last_run_id = runs.sort_values("start_time", ascending=False).iloc[0].run_id
## Construct the model URI using the last run ID and the specified artifact path
model_uri = f"runs:/{last_run_id}/{model_artifact_path}"
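The selection logic in the cell above can be sketched without a tracking server. This is a minimal illustration that assumes each run is a plain dictionary with `run_id` and `start_time` keys. The run IDs and timestamps are made up, and `latest_run_model_uri` is a hypothetical helper, not an MLflow function.

```python
# Hypothetical run records standing in for rows of mlflow.search_runs().
runs = [
    {"run_id": "aaa111", "start_time": 1755058000},
    {"run_id": "bbb222", "start_time": 1755058500},
    {"run_id": "ccc333", "start_time": 1755058200},
]
model_artifact_path = "qa_pipeline"


def latest_run_model_uri(run_records, artifact_path):
    """Return the runs:/ URI for the most recently started run."""
    if not run_records:
        raise ValueError("no runs found in the experiment")
    latest = max(run_records, key=lambda r: r["start_time"])
    return f"runs:/{latest['run_id']}/{artifact_path}"


example_uri = latest_run_model_uri(runs, model_artifact_path)
print(example_uri)  # runs:/bbb222/qa_pipeline
```

Guarding against an empty run list is worth keeping in real code: indexing the first row of an empty result would otherwise fail with a less helpful error.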

### 2.3: Load Model Back as a Pipeline
Load the logged model back from MLflow using its run URI and run a test prediction to verify that it behaves like the original pipeline before you register it.


In [0]:
##
loaded_qa_pipeline = mlflow.pyfunc.load_model(model_uri=model_uri)
loaded_qa_pipeline.predict({"question": question, "context": context})

['conducted pioneering research on radioactivity']

### 2.4: Register the Model to Unity Catalog
Register the model in Unity Catalog for version control and to facilitate deployment.

In [0]:
##
from mlflow import MlflowClient
## Define the model name
model_name = f"{DA.catalog_name}.{DA.schema_name}.qa_pipeline"
## Set the MLflow registry URI
mlflow.set_registry_uri("databricks-uc")
## Register the model in the MLflow model registry under the specified name and model URI
mlflow.register_model(model_uri=model_uri, name=model_name)

Successfully registered model 'dbacademy.labuser11195156_1755057414.qa_pipeline'.



Created version '1' of model 'dbacademy.labuser11195156_1755057414.qa_pipeline'.


<ModelVersion: aliases=[], creation_timestamp=1755058530346, current_stage=None, description='', last_updated_timestamp=1755058533145, name='dbacademy.labuser11195156_1755057414.qa_pipeline', run_id='856cd2bf78134468bacf6ca078862cb7', run_link=None, source='dbfs:/databricks/mlflow-tracking/198782775599319/856cd2bf78134468bacf6ca078862cb7/artifacts/qa_pipeline', status='READY', status_message='', tags={}, user_id='labuser11195156_1755057414@vocareum.com', version='1'>
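The registered name in the output above has three dot-separated parts: catalog, schema, and model. The sketch below is a small sanity check for that shape. `split_uc_model_name` is a hypothetical helper and `my_schema` is a made-up schema name. The check assumes plain identifiers without dots or backtick quoting.

```python
def split_uc_model_name(full_name):
    """Split 'catalog.schema.model' into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.model, got {full_name!r}")
    return tuple(parts)


catalog, schema, model = split_uc_model_name("dbacademy.my_schema.qa_pipeline")
print(catalog, schema, model)  # dbacademy my_schema qa_pipeline
```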

## Task 3: LLM Model State Management
In this task, you'll manage your model's lifecycle using MLflow and Unity Catalog. Models registered in Unity Catalog are managed with aliases (such as `champion`) rather than stages, so you will look up the latest model version and assign an alias to it. This supports tracking, version control, and deployment.

### 3.1: Search and Inspect Registered Model
Identify and inspect the latest version of your registered model to ensure you are managing the most current and relevant iteration. This step is crucial as it determines the baseline for setting model stages or aliases.

- Retrieve the Latest Model Version
- Set Model Alias

In [0]:
##
def get_latest_model_version(model_name_in):
    ## Initialize the MLflow Client to interact with the MLflow server
    client = MlflowClient()
    
    ## Search for all versions of the specified model in the Model Registry
    model_version_infos = client.search_model_versions("name = '%s'" % model_name_in)
    
    ## Versions may be returned as strings, so convert to int before comparing
    ## (as strings, "9" would sort above "10")
    return max(int(model_version_info.version) for model_version_info in model_version_infos)

## Initialize the MLflow Client for further operations
client = mlflow.tracking.MlflowClient()

## Get the latest version number of the specified model
current_model_version = get_latest_model_version(model_name)

## Set an alias 'champion' for the latest version of the model
client.set_registered_model_alias(
    name=model_name, 
    alias="champion", 
    version=current_model_version
)
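Two details of this step are easy to get wrong, so here is a minimal sketch of both. First, the registration output earlier shows the version as the string `'1'`, and comparing version strings sorts them lexicographically. Second, a registered model can be addressed either by version (`models:/<name>/<version>`) or by alias (`models:/<name>@<alias>`). `latest_version` and `model_uri_for` are hypothetical helpers, and the model name is illustrative.

```python
def latest_version(version_strings):
    """Return the highest version as an int, comparing numerically."""
    return max(int(v) for v in version_strings)


def model_uri_for(model_name, version=None, alias=None):
    """Build a models:/ URI from either a version number or an alias."""
    if (version is None) == (alias is None):
        raise ValueError("pass exactly one of version or alias")
    if alias is not None:
        return f"models:/{model_name}@{alias}"
    return f"models:/{model_name}/{version}"


versions = ["1", "2", "9", "10"]
print(max(versions))             # 9  (string comparison picks the wrong one)
print(latest_version(versions))  # 10

name = "dbacademy.my_schema.qa_pipeline"  # illustrative three-level name
print(model_uri_for(name, version=latest_version(versions)))
print(model_uri_for(name, alias="champion"))
```

Loading by alias keeps downstream code unchanged when the `champion` alias is later moved to a newer version.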

## Task 4: Batch Inference
Perform inference using the registered model on new data, both in single-node and multi-node environments.

### 4.1: Load the Model for Batch Inference
Prepare the environment by reading a sample of the production data from Unity Catalog. The model itself is loaded in the next step.

In [0]:
prod_data_table = f"{DA.catalog_name}.{DA.schema_name}.m4_1_lab_prod_data"
## Read data from the specified Spark table and limit the results to the first 100 rows
prod_data_df = spark.read.table(prod_data_table).limit(100)
## Display the DataFrame to visualize the top 100 rows of the dataset
display(prod_data_df)

id,title,context,question,answers
56be4db0acb8001400a502ec,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",Which NFL team represented the AFC at Super Bowl 50?,"List(List(177, 177, 177), List(Denver Broncos, Denver Broncos, Denver Broncos))"
56be4db0acb8001400a502ed,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",Which NFL team represented the NFC at Super Bowl 50?,"List(List(249, 249, 249), List(Carolina Panthers, Carolina Panthers, Carolina Panthers))"
56be4db0acb8001400a502ee,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",Where did Super Bowl 50 take place?,"List(List(403, 355, 355), List(Santa Clara, California, Levi's Stadium, Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.))"
56be4db0acb8001400a502ef,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",Which NFL team won Super Bowl 50?,"List(List(177, 177, 177), List(Denver Broncos, Denver Broncos, Denver Broncos))"
56be4db0acb8001400a502f0,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What color was used to emphasize the 50th anniversary of the Super Bowl?,"List(List(488, 488, 521), List(gold, gold, gold))"
56be8e613aeaaa14008c90d1,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What was the theme of Super Bowl 50?,"List(List(487, 521, 487), List(""golden anniversary"", gold-themed, ""golden anniversary))"
56be8e613aeaaa14008c90d2,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What day was the game played on?,"List(List(334, 334, 334), List(February 7, 2016, February 7, February 7, 2016))"
56be8e613aeaaa14008c90d3,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What is the AFC short for?,"List(List(133, 133, 133), List(American Football Conference, American Football Conference, American Football Conference))"
56bea9923aeaaa14008c91b9,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What was the theme of Super Bowl 50?,"List(List(487, 521, 521), List(""golden anniversary"", gold-themed, gold))"
56bea9923aeaaa14008c91ba,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What does AFC stand for?,"List(List(133, 133, 133), List(American Football Conference, American Football Conference, American Football Conference))"
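In the `answers` column above, the first inner list holds character offsets into `context` and the second holds the answer texts. In the SQuAD schema these fields are named `answer_start` and `text`. The sketch below checks that the offsets line up with the texts, using a shortened copy of the first row's context. `answers_align` is a hypothetical helper.

```python
# Shortened copy of the first row's context (truncated after the NFC champion).
context = (
    "Super Bowl 50 was an American football game to determine the champion "
    "of the National Football League (NFL) for the 2015 season. The American "
    "Football Conference (AFC) champion Denver Broncos defeated the National "
    "Football Conference (NFC) champion Carolina Panthers"
)

# Values taken from the first row of the table above.
answers = {
    "answer_start": [177, 177, 177],
    "text": ["Denver Broncos", "Denver Broncos", "Denver Broncos"],
}


def answers_align(context, answers):
    """Check that each answer text appears at its recorded character offset."""
    return all(
        context[start:start + len(text)] == text
        for start, text in zip(answers["answer_start"], answers["text"])
    )


print(answers_align(context, answers))  # True
```

A check like this is a cheap way to confirm that the text survived the pandas-to-Spark conversion without shifted offsets.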


### 4.2: Single-node Batch Inference
Conduct inference tests on a limited dataset to validate the model's response accuracy and speed in a single-node setup.


In [0]:
display(prod_data_df)



In [0]:
##
## Load the latest version of the model from MLflow using the provided model URI
latest_model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{current_model_version}")

## Convert the first two rows of the DataFrame to a Pandas DataFrame for easier manipulation
prod_data_sample_pdf = prod_data_df.limit(2).toPandas()

## Define a list of questions to be answered by the model
questions = ["Which NFL team represented the AFC at Super Bowl 50?", "What is the AFC short for?"]

## Generate answers for each question by applying the loaded model on the context provided in the DataFrame
qa_results = [latest_model.predict({"question": q, "context": doc}) for q, doc in zip(questions, prod_data_sample_pdf["context"])]

## Import the pprint function for formatted display of objects
from pprint import pprint

## Print the results in a formatted manner using pprint for better readability
pprint(qa_results)


[['Denver Broncos'], ['American Football Conference']]
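
The list comprehension in the cell above relies on `zip` to pair each question with the context in the same position, which is why two questions are matched against the two rows returned by `limit(2)`. A minimal sketch of that pairing follows. It uses a hypothetical `fake_predict` stand-in rather than the registered MLflow model, and the sample strings are illustrative only:

```python
# Hypothetical stand-in for latest_model.predict; it only reports what it received.
def fake_predict(payload):
    return f"{payload['question']} -> context of {len(payload['context'])} chars"

questions = ["Q1?", "Q2?"]
contexts = ["first context", "second context", "third context"]  # one extra row on purpose

# zip stops at the shorter sequence, so the third context is never used.
results = [fake_predict({"question": q, "context": c}) for q, c in zip(questions, contexts)]

for line in results:
    print(line)
```

Because `zip` silently truncates to the shorter input, a mismatch between the number of questions and the number of sampled rows drops items without raising an error.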


### 4.3: Multi-node Batch Inference
Scale inference across the cluster by wrapping the registered model in a Spark UDF and applying it to every row of the DataFrame, as you would for large production datasets.


In [0]:
##
from pyspark.sql.functions import col

## Ensure that the input DataFrame contains 'question' and 'context' columns
prod_data_df = prod_data_df.withColumn("question", col("question"))
prod_data_df = prod_data_df.withColumn("context", col("context"))

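## Wrap the registered model aliased "champion" as a Spark UDF that returns one string per row
prod_model_udf = mlflow.pyfunc.spark_udf(
    spark,
    model_uri=f"models:/{model_name}@champion",
    env_manager="local",
    result_type="string",
)
## Apply the UDF to the 'question' and 'context' columns and store the output in 'generated_answer'
batch_inference_results_df = prod_data_df.withColumn("generated_answer", prod_model_udf("question", "context"))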
## Display the DataFrame containing the results of the batch inference with generated answers
display(batch_inference_results_df)

2025/08/13 04:15:45 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'


id,title,context,question,answers,generated_answer
56be4db0acb8001400a502ec,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",Which NFL team represented the AFC at Super Bowl 50?,"List(List(177, 177, 177), List(Denver Broncos, Denver Broncos, Denver Broncos))",Denver Broncos
56be4db0acb8001400a502ed,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",Which NFL team represented the NFC at Super Bowl 50?,"List(List(249, 249, 249), List(Carolina Panthers, Carolina Panthers, Carolina Panthers))",Carolina Panthers
56be4db0acb8001400a502ee,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",Where did Super Bowl 50 take place?,"List(List(403, 355, 355), List(Santa Clara, California, Levi's Stadium, Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.))","Levi's Stadium in the San Francisco Bay Area at Santa Clara, California"
56be4db0acb8001400a502ef,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",Which NFL team won Super Bowl 50?,"List(List(177, 177, 177), List(Denver Broncos, Denver Broncos, Denver Broncos))",Carolina Panthers
56be4db0acb8001400a502f0,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What color was used to emphasize the 50th anniversary of the Super Bowl?,"List(List(488, 488, 521), List(gold, gold, gold))",gold
56be8e613aeaaa14008c90d1,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What was the theme of Super Bowl 50?,"List(List(487, 521, 487), List(""golden anniversary"", gold-themed, ""golden anniversary))",golden anniversary
56be8e613aeaaa14008c90d2,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What day was the game played on?,"List(List(334, 334, 334), List(February 7, 2016, February 7, February 7, 2016))","February 7, 2016"
56be8e613aeaaa14008c90d3,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What is the AFC short for?,"List(List(133, 133, 133), List(American Football Conference, American Football Conference, American Football Conference))",American Football Conference
56bea9923aeaaa14008c91b9,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What was the theme of Super Bowl 50?,"List(List(487, 521, 521), List(""golden anniversary"", gold-themed, gold))",golden anniversary
56bea9923aeaaa14008c91ba,Super_Bowl_50,"Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ""golden anniversary"" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ""Super Bowl L""), so that the logo could prominently feature the Arabic numerals 50.",What does AFC stand for?,"List(List(133, 133, 133), List(American Football Conference, American Football Conference, American Football Conference))",American Football Conference
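
The output above carries both the reference `answers` and the `generated_answer` for each row, so a quick spot check is possible. The following is a minimal sketch, not part of the lab, that flags rows where the generated answer does not match any reference answer. The `normalize` and `matches_any` helpers are hypothetical names, and the two rows are copied by hand from the table above:

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count as mismatches."""
    return " ".join(text.lower().split())

def matches_any(generated, references):
    """Return True if the generated answer equals any reference answer after normalization."""
    return normalize(generated) in {normalize(r) for r in references}

# Two rows copied from the output above: (question, reference answers, generated answer).
rows = [
    ("Which NFL team represented the AFC at Super Bowl 50?", ["Denver Broncos"], "Denver Broncos"),
    ("Which NFL team won Super Bowl 50?", ["Denver Broncos"], "Carolina Panthers"),
]

# Collect the questions whose generated answer matches none of the references.
mismatches = [q for q, refs, gen in rows if not matches_any(gen, refs)]
print(mismatches)
```

Exact matching is strict: a correct but longer answer would also be flagged, so treat the result as a prompt for manual review rather than a quality metric.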


### 4.4: Write Inference Results to Delta Table
Store the inference results in a Delta table to ensure data integrity and enable further analysis.



In [0]:
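## Build the fully qualified table name from the lab's catalog and schema
prod_data_summaries_table_name = f"{DA.catalog_name}.{DA.schema_name}.m4_1_lab_batch_inference"
## Append the batch inference results to the table (created on first write)
batch_inference_results_df.write.mode("append").saveAsTable(prod_data_summaries_table_name)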

## Task 5: Batch Inference Using `ai_query()`

Run batch inference directly in SQL with the `ai_query()` function, which makes the workflow available to anyone who can write a query.

### 5.1: Run SQL Batch Inference

Write a SQL query that runs model inference inline. The `ai_query()` function sends a prompt built from each row of the dataset to a model serving endpoint and returns the response as a column.


In [0]:
%sql
CREATE OR REPLACE TABLE ai_query_inference AS (
  SELECT
    id,
    ai_query(
      "databricks-meta-llama-3-3-70b-instruct",
      CONCAT("Asking question: ", question, " Answer: ", CAST(answers AS STRING))
    ) as generated_answer
  FROM m4_1_lab_prod_data LIMIT 100
);

num_affected_rows,num_inserted_rows
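
To see what each row sends to the endpoint, the `CONCAT` expression in the cell above can be mirrored in plain Python. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the `answers_text` argument is a placeholder because the exact string produced by `CAST(answers AS STRING)` is not shown in this lab.

```python
def build_prompt(question, answers_text):
    """Mirror the CONCAT expression: fixed labels around the question and the stringified answers."""
    return "Asking question: " + question + " Answer: " + answers_text

# answers_text is a placeholder; the real value comes from CAST(answers AS STRING).
prompt = build_prompt("What does AFC stand for?", "<answers column as text>")
print(prompt)
```

Note that this prompt includes the reference answers, so the endpoint sees the expected answer alongside the question rather than the `context` passage.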


### 5.2: Query Inference Results
Query the generated table to view the inference results.

In [0]:
%sql
-- Retrieve all records from the 'ai_query_inference' table to view the results
SELECT * FROM ai_query_inference;


id,generated_answer
56be4db0acb8001400a502ec,The Denver Broncos represented the AFC at Super Bowl 50.
56be4db0acb8001400a502ed,The Carolina Panthers represented the NFC at Super Bowl 50.
56be4db0acb8001400a502ee,"Super Bowl 50 took place at Levi's Stadium in Santa Clara, California, in the San Francisco Bay Area."
56be4db0acb8001400a502ef,The Denver Broncos won Super Bowl 50.
56be4db0acb8001400a502f0,The color used to emphasize the 50th anniversary of the Super Bowl was gold.
56be8e613aeaaa14008c90d1,"The theme of Super Bowl 50 was a ""golden anniversary"" theme, which was reflected in the gold-themed branding and decor used throughout the event."
56be8e613aeaaa14008c90d2,"The game was played on February 7, 2016."
56be8e613aeaaa14008c90d3,The AFC is short for American Football Conference.
56bea9923aeaaa14008c91b9,"The theme of Super Bowl 50 was a ""golden anniversary"" theme, which incorporated gold-themed elements to commemorate the 50th edition of the Super Bowl."
56bea9923aeaaa14008c91ba,AFC stands for American Football Conference.


## Conclusion

In this lab, you implemented a batch inference workflow using a small language model. You created a question-answering pipeline, tracked and registered the model using MLflow, managed model versions and aliases in Unity Catalog, and performed both single-node and multi-node batch inference. Finally, you explored an alternative method for batch inference using the `ai_query()` SQL function.


&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="blank">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy" target="blank">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use" target="blank">Terms of Use</a> | 
<a href="https://help.databricks.com/" target="blank">Support</a>