
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# LAB - Online Monitoring

In this lab, you will create an online monitor for a sample inference table using Databricks Lakehouse Monitoring. A sample inference table, extracted from a deployed Model Serving Endpoint, has been imported for you to use for monitoring.

**Lab Outline:**

*In this lab, you will need to complete the following tasks:*

* **Task 1:** Define Evaluation Metrics
* **Task 2:** Unpack the Request Payload
* **Task 3:** Compute Metrics
* **Task 4:** Save the Processed Inference Table
* **Task 5:** Create a Monitor on the Inference Table
* **Task 6:** Review the Monitor Details
* **Task 7:** View the Monitor Dashboard

## REQUIRED - SELECT CLASSIC COMPUTE
Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:
1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.
   
   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## Requirements

Please review the following requirements before starting the lab:

* To run this notebook, use the following Databricks Runtime: **15.4.x-cpu-ml-scala2.12**

## Classroom Setup

Install required libraries and load classroom configuration.

In [0]:
%pip install -U -qq databricks-sdk textstat tiktoken evaluate

dbutils.library.restartPython()

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
%run ../Includes/Classroom-Setup-03




The examples and models presented in this course are intended solely for demonstration and educational purposes. Please note that the models and prompt examples may sometimes contain offensive, inaccurate, biased, or harmful content.


## Inference Table

You are going to use the same inference table that we used for the demo. The inference table is pre-loaded and ready to be used.

In [0]:
inference_table_name = f"{DA.catalog_name}.{DA.schema_name}.rag_app_realtime_payload"
display(spark.sql(f"SELECT * FROM {inference_table_name}"))

client_request_id,databricks_request_id,date,timestamp_ms,status_code,execution_time_ms,request,response,sampling_fraction,request_metadata
,5dca1319-2a0d-4d82-bbac-a10bde2c2e84,2025-08-03,1754174434710,200,3066.0,"{""inputs"": [{""query"": ""Can you discuss the main challenges and opportunities for future research of very large models in the big data era, as mentioned in the text?""}]}","{""predictions"": [""Sure, I'd be happy to discuss the main challenges and opportunities for future research of very large models in the big data era.\n\nChallenges:\n\n1. Limited context size: Even though the size of language models has been increasing, most models still use a relatively small context size compared to the potentially large context needed to correctly perform language modeling. This can lead to issues when the necessary knowledge is not present in the context.\n2. Storage of knowledge: Massive scale is required to store knowledge that is not present in the context but necessary to perform the task at hand. This can be a significant challenge in terms of both storage and computation.\n3. Non-parametric models: By equipping language models with modules that retrieve information from external databases, it is possible to match certain capabilities of some of the largest language models while having fewer parameters. However, this results in non-parametric models, which can introduce new challenges in terms of data management and retrieval.\n\nOpportunities:\n\n1. Improved context relevance: By increasing the relevance of the context provided to language models, it is possible to improve their performance on certain tasks. This can be achieved through strategies such as adding information extracted from relevant external documents or allowing language models to leverage external tools.\n2. Reasoning strategies: Language models can improve their context via reasoning strategies, which can lead to more relevant contexts and improved performance on certain tasks.\n3. 
External tools: Allowing language models to leverage external tools can augment the current context with important missing information that""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,b66fd6af-c9a4-4672-ba39-9853a8ade8d4,2025-08-03,1754176878826,200,3341.0,"{""inputs"": [{""query"": ""How does the study's exposure measure reflect the technical capacity to make human labor more efficient, and what factors imply that technical feasibility does not guarantee labor productivity or automation outcomes?""}]}","{""predictions"": [""The study's exposure measure reflects the technical capacity to make human labor more efficient by quantifying the exposure of different occupations to large language models (LLMs) like GPT-4. This measure is based on the assumption that as LLMs improve, they will be able to perform an increasing number of tasks currently done by humans, thereby making human labor more efficient.\n\nHowever, there are several factors that imply technical feasibility does not guarantee labor productivity or automation outcomes. First, the study finds that the AI Occupational Exposure Score is not correlated with the study's exposure measures, suggesting that the potential impact of LLMs on labor productivity and automation may not be fully captured by existing measures of AI exposure.\n\nSecond, the study finds that manual work is not exposed to LLMs or even LLMs with additional systems integration for the time being. This suggests that the technical feasibility of automating manual tasks with LLMs may not be sufficient to guarantee labor productivity or automation outcomes, as there may be other factors such as the cost of implementation or the availability of complementary technologies that also need to be considered.\n\nFinally, the study notes that the potential impact of LLMs expands significantly when complementary technologies are taken into account. 
This suggests that the technical feasibility of automating tasks with LLMs may depend on the development of other technologies, and that the relationship between technical feasibility and labor productivity or automation outcomes may be complex and multifaceted.""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,b4bb72de-bd46-4f73-91f1-4d5774b56c08,2025-08-03,1754176880893,200,1968.0,"{""inputs"": [{""query"": ""How do human assessments compare to GPT-4 classifications in the study's primary exposure dataset, and what is the main finding regarding the potential impact of LLMs like GPT-4 when considering both current model capabilities and anticipated tools built upon them?""}]}","{""predictions"": [""Human assessments and GPT-4 classifications show a high degree of agreement in the primary exposure dataset. The main finding regarding the potential impact of LLMs like GPT-4 suggests that while current model capabilities are significant, the impact on occupations may be even greater when considering anticipated tools built upon them. This is because LLMs have the potential to automate a wide range of tasks, including those that are currently considered non-automatable. However, the study also notes some limitations, such as the validity of the task-based framework and the lack of expertise and task interpretation by human annotators.""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,199e33d2-6ead-4a70-88af-2c0c29cf5a7e,2025-08-03,1754176884462,200,3408.0,"{""inputs"": [{""query"": ""What are the key findings regarding the relationship between exposure to LLMs and various factors such as wages, skillsets, and job preparation difficulties, and how do these findings compare to previous efforts documenting the distribution of automation exposure in the economy?""}]}","{""predictions"": [""1. Relationship between LLM exposure and wages: The study finds that occupational exposure to LLMs weakly increases with the difficulty of job preparation, but the relationship between LLM exposure and wages is not explicitly mentioned. It suggests that workers facing higher barriers to entry in their jobs tend to experience more exposure to LLMs.\n\n2. Skillsets and LLM exposure: The study does not provide a direct comparison between skillsets and LLM exposure. However, it mentions that LLMs like GPT-4 can assess tasks for exposure and automation potential, which could be used to evaluate skillsets in the future.\n\n3. Comparison to previous automation exposure efforts: The study's measurements are broadly consistent with previous efforts documenting the distribution of automation exposure in the economy. Most other technology exposure measures examined in the study are statistically significantly correlated with the preferred exposure measure, while measures of manual routineness and robotics exposure show negative correlations.\n\n4. Industry-specific LLM exposure: Information processing industries exhibit high exposure to LLMs, while manufacturing, agriculture, and mining demonstrate lower exposure.\n\n5. Productivity growth and LLM exposure: The connection between productivity growth in the past decade and overall LLM exposure appears weak, suggesting a potential optimistic case that future productivity gains from LLMs may not exacerbate possible cost disease effects.\n\n6. 
Pervasive impacts of LLMs: The study indicates that the impacts of LLMs are likely to be pervasive, with the potential impact expanding significantly when""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,ee3ba5f7-376d-436d-a816-4077a63690c8,2025-08-03,1754175288565,200,3723.0,"{""inputs"": [{""query"": ""Can you discuss the history and evolution of PFMs in the NLP, CV, and Graph Learning (GL) domains?""}]}","{""predictions"": [""Sure, I'd be happy to discuss the history and evolution of Pretrained Foundation Models (PFMs) in the Natural Language Processing (NLP), Computer Vision (CV), and Graph Learning (GL) domains.\n\nIn the NLP domain, PFMs have evolved significantly over the years. Traditional models focused on static feature learning, but the introduction of the Transformer model revolutionized the field. The Transformer, which relies solely on attention mechanisms and doesn't use recurrent or convolutional architectures, allowed for the transfer of weighted representation knowledge between various neural units. This led to the development of models like GPT-3, a generative model based on the Transformer.\n\nIn the CV domain, the Vision Transformer (ViT) was proposed to represent an image as a series of image patches, similar to a series of word embeddings. This was a significant evolution from traditional CV models that relied on convolutional neural networks.\n\nIn the GL domain, Graph Transformer Networks (GTN) were employed to learn new graph structures and powerful node representations without domain knowledge. This was a significant advancement from traditional GL models that relied on graph convolutional networks.\n\nIn summary, the evolution of PFMs in these three domains has been marked by a shift from static feature learning to dynamic feature learning, and the Transformer model has played a crucial role in this evolution. The Transformer's ability to facilitate the transfer of weighted representation knowledge between various neural units has made it a key component in PFMs in NLP, CV, and GL.""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,d0aa9544-4183-45ae-954e-4cff1b5c05e8,2025-08-03,1754174131304,200,3996.0,"{""inputs"": [{""query"": ""What are the major advantages of PFMs, and how have they inspired research in model efficiency, security, and compression?""}]}","{""predictions"": [""1. Major Advantages of PFMs:\nPFMs (Pretrained Foundation Models) have several major advantages:\n a. Versatility: PFMs can be adapted to a wide range of tasks and domains, including natural language processing (NLP), computer vision (CV), and graph learning (GL).\n b. Improved Performance: PFMs have shown significant improvements in performance across various tasks compared to models trained from scratch.\n c. Transfer Learning: PFMs enable transfer learning, where a model pretrained on a large dataset can be fine-tuned for a specific task with less data and computation.\n\n2. Inspiration in Model Efficiency, Security, and Compression:\nThe success of PFMs has inspired research in model efficiency, security, and compression:\n a. Model Efficiency: Researchers are exploring ways to make PFMs more efficient, such as developing more efficient model architectures, optimizing training algorithms, and reducing the computational requirements for fine-tuning.\n b. Security: PFMs have raised concerns about security and privacy, leading to research on secure and private model training, such as federated learning and differential privacy.\n c. Compression: To make PFMs more practical for real-world applications, researchers are working on model compression techniques, such as pruning, quantization, and knowledge distillation, to reduce the size and computational requirements of PFMs.\n\nIn summary, PFMs have shown significant potential in various domains, and their success has inspired research in model efficiency, security, and compression to make them more""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,fafeadc1-1fd3-4fe5-a2f7-b446486ec53a,2025-08-03,1754173582302,200,3636.0,"{""inputs"": [{""query"": ""How does the pretraining technique used in PFMs differ between the NLP, CV, and GL domains?""}]}","{""predictions"": [""In the NLP domain, pretraining techniques used in PFMs involve training a general model on large amounts of unlabeled text data, which can then be fine-tuned for various downstream tasks. This approach allows well-trained language models to capture rich knowledge, such as long-term dependencies and hierarchical relationships, that can be beneficial for downstream tasks. Early pretraining techniques in NLP were static, but dynamic pretraining techniques, such as BERT and XLNet, have since been proposed to better adapt to different semantic environments.\n\nIn the CV domain, pretraining techniques involve training a model on a large dataset of images, such as ImageNet, and then fine-tuning it for specific tasks, such as image classification or object detection. Pretrained models in CV can capture features such as shapes, textures, and colors, which can be useful for various computer vision tasks.\n\nIn the GL domain, pretraining techniques involve training a model on a large graph dataset, such as citation networks or social networks, and then fine-tuning it for specific tasks, such as node classification, link prediction, or graph classification. Pretrained models in GL can capture structural and relational information in graphs, which can be useful for various graph-based tasks.\n\nOverall, while the specific pretraining techniques used in PFMs may differ between the NLP, CV, and GL domains, they all involve training a general model on large amounts of data and then fine-tuning it for specific tasks. The main difference lies in the type of""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,a8b2637e-93fe-429f-908a-adf2e7da50ce,2025-08-03,1754176635195,200,2319.0,"{""inputs"": [{""query"": ""How does the proposed assessing LLM capabilities and their potential effects on jobs complement predictions of technology's impacts on work, and what specific criteria does it include for measuring exposure to LLMs?""}]}","{""predictions"": [""The proposed rubric for assessing LLM capabilities and their potential effects on jobs complements predictions of technology's impacts on work by providing a framework to understand the evolving landscape of language models and their associated technologies. It measures the overall exposure of tasks to LLMs, which serves as a proxy for potential economic impact without distinguishing between labor-augmenting or labor-displacing effects.\n\nThe specific criteria included in the rubric for measuring exposure to LLMs are:\n\n1. The extent to which the task can be performed using text-based inputs and outputs.\n2. The level of creativity, critical thinking, and complex communication required for the task.\n3. The degree to which the task can be broken down into smaller, repeatable steps.\n4. The availability and quality of data required to train LLMs for the task.\n5. The potential for integrating LLMs with complementary technologies to enhance their capabilities.\n\nThese criteria help determine the level of exposure a task has to LLMs, considering both the current capabilities of LLMs and anticipated LLM-powered software.""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,c2339dd8-edfe-4013-a33c-a047a7a124f5,2025-08-03,1754176326963,200,1488.0,"{""inputs"": [{""query"": ""How does CaP compare against baselines in simulated language-instructed manipulation tasks?""}]}","{""predictions"": [""In the study, CaP (Language-Model-Assisted Policy Learning) was compared against several baselines in simulated language-instructed manipulation tasks. The results demonstrated that CaP outperformed the baselines in terms of success rate and sample efficiency. CaP was able to interpret natural language instructions, process perception outputs, and parameterize low-dimensional inputs to control primitives, which contributed to its superior performance. However, CaP is still limited by the scope of available perception APIs and control primitives, and it may struggle with significantly longer or more complex commands.""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"
,b1d0fc49-143f-4836-9cf0-a29272a04427,2025-08-03,1754176332023,200,2901.0,"{""inputs"": [{""query"": ""What are the limitations of CaP in the context of the experiments?""}]}","{""predictions"": [""1. Limited Generalization to Unseen Tasks: The performance of CaP degrades when it encounters tasks with unseen attributes or instructions, as observed in the table-top manipulation domain. This is because end-to-end systems like CLIPort struggle to generalize to new tasks that require precise numerical spatial-geometric reasoning.\n\n2. Dependence on Perception APIs: CaP relies on object detection APIs for perception, which may limit its performance if the APIs are not accurate or if they fail to detect certain objects.\n\n3. Limited Ability to Perform Precise Numerical Computations: While CaP outperforms natural-language planners in tasks that require precise numerical spatial-geometric reasoning, it still has limitations in its ability to perform precise numerical computations compared to direct coding.\n\n4. Limited Ability to Parse Complex Language Commands: CaP may struggle to parse complex language commands that require multiple steps or that involve ambiguous or unclear instructions.\n\n5. Limited Ability to Maintain a Session: CaP relies on a session-based system to maintain context between language commands, but this system may have limitations in its ability to maintain context over long periods of time or in complex scenarios.\n\n6. Limited Ability to Generate New Functions: While CaP can generate new functions, this ability is limited by the complexity of the functions and the availability of APIs to perform the required actions.\n\n7. Limited Ability to Draw Entirely New Shapes: While CaP can draw entirely new shapes from language commands, this ability is limited by the complexity of the shapes""]}",1.0,"Map(model_name -> amine_elhelou_ud8n_da.default.rag_app, endpoint_name -> amine_elhelou_ud8n_da_gaisd_endpoint, model_version -> 1)"


## Task 1: Define Evaluation Metrics
In this task, you will define the evaluation metrics (token count, toxicity, and perplexity) that will be used to analyze the inference table data.

- Define the evaluation metric functions using `pandas_udf`.

In [0]:
##

## Import necessary libraries
import tiktoken, textstat, evaluate
import pandas as pd
from pyspark.sql.functions import pandas_udf

## Define a pandas UDF to compute the number of tokens in the text
@pandas_udf("int")
def compute_num_tokens(texts: pd.Series) -> pd.Series:
  encoding = tiktoken.get_encoding("cl100k_base")
  return pd.Series(map(len, encoding.encode_batch(texts)))

## Define a pandas UDF to compute the toxicity of the text
@pandas_udf("double")
def compute_toxicity(texts: pd.Series) -> pd.Series:
  ## Omit entries with null input from evaluation
  toxicity = evaluate.load("toxicity", module_type="measurement", cache_dir="/tmp/hf_cache/")
  return pd.Series(toxicity.compute(predictions=texts.fillna(""))["toxicity"]).where(texts.notna(), None)

## Define a pandas UDF to compute the perplexity of the text
@pandas_udf("double")
def compute_perplexity(texts: pd.Series) -> pd.Series:
  ## Omit entries with null input from evaluation
  perplexity = evaluate.load("perplexity", module_type="measurement", cache_dir="/tmp/hf_cache/")
  return pd.Series(perplexity.compute(data=texts.fillna(""), model_id="gpt2")["perplexities"]).where(texts.notna(), None)
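
As background for the perplexity metric, the sketch below shows the standard definition: the exponential of the average negative log-likelihood per token. It is not the `evaluate` implementation, which obtains token probabilities from GPT-2; the `perplexity` helper and the probability values here are hypothetical and exist only to illustrate the formula.

```python
import math

def perplexity(token_probs):
    """Return the perplexity of a sequence given each token's predicted probability."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    # Average negative log-likelihood per token
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of the average negative log-likelihood
    return math.exp(avg_nll)

# Hypothetical per-token probabilities, for illustration only
confident = [0.9, 0.8, 0.95, 0.85]   # the model predicted each token easily
uncertain = [0.2, 0.1, 0.25, 0.15]   # the model was surprised by each token

# Lower perplexity means the text was easier for the model to predict
print(perplexity(confident) < perplexity(uncertain))
```

When monitoring, a sustained rise in perplexity for inputs or outputs can be a sign that the text is drifting away from what the scoring model considers typical.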



## Task 2: Unpack the Request Payload
In this task, you will unpack the request payload from the inference table and prepare it for processing.

**Steps:**

- Unpack the requests as a stream.
- Drop unnecessary columns for monitoring jobs.
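
To make the unpacking step concrete, here is a minimal plain-Python sketch of what it does to a single row. The `unpack_requests` helper is provided by the classroom setup and its implementation is not shown in this notebook, so this is only an assumption about its effect: it reads `inputs[*].query` from the `request` JSON and `predictions` from the `response` JSON, and exposes them as `input` and `output`. The row below is abridged and hypothetical, and `unpack_row` is an illustrative name.

```python
import json

# Abridged, hypothetical row modeled on the request and response columns of the inference table
row = {
    "request": '{"inputs": [{"query": "What are the limitations of CaP?"}]}',
    "response": '{"predictions": ["CaP depends on the available perception APIs."]}',
}

def unpack_row(row):
    """Pair each query in the request with the prediction at the same position."""
    # Equivalent of the JSON path inputs[*].query
    queries = [item["query"] for item in json.loads(row["request"])["inputs"]]
    # Equivalent of the JSON path predictions
    predictions = json.loads(row["response"])["predictions"]
    return [{"input": q, "output": p} for q, p in zip(queries, predictions)]

unpacked = unpack_row(row)
print(unpacked)
```

The streaming version in the next cell applies the same idea to every row of the inference table, using Spark rather than Python loops.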

In [0]:
## 
import os

## Reset checkpoint [for demo purposes ONLY]
checkpoint_location = os.path.join(DA.paths.working_dir, "checkpoint")
dbutils.fs.rm(checkpoint_location, True)

## Define the JSON path and type for the input requests
INPUT_REQUEST_JSON_PATH = "inputs[*].query"
INPUT_JSON_PATH_TYPE = "array<string>"
KEEP_LAST_QUESTION_ONLY = False

## Define the JSON path and type for the output responses
OUTPUT_REQUEST_JSON_PATH = "predictions"
OUTPUT_JSON_PATH_TYPE = "array<string>"

## Unpack the requests as a stream.
requests_raw_df = spark.readStream.table(inference_table_name)
requests_processed_df = unpack_requests(
    requests_raw_df,
    INPUT_REQUEST_JSON_PATH,
    INPUT_JSON_PATH_TYPE,
    OUTPUT_REQUEST_JSON_PATH,
    OUTPUT_JSON_PATH_TYPE,
    KEEP_LAST_QUESTION_ONLY
)

## Drop unnecessary columns for monitoring jobs
requests_processed_df = requests_processed_df.drop("date", "status_code", "sampling_fraction", "client_request_id", "databricks_request_id")

## Task 3: Compute Metrics

In this task, you will compute the defined evaluation metrics for the unpacked request payloads.

- Compute the toxicity, perplexity, and token count for the input and output columns.

In [0]:
from pyspark.sql.functions import col

## Define the columns to measure
column_to_measure = ["input", "output"]

## Start from the unpacked requests so that metrics for every column accumulate
requests_df_with_metrics = requests_processed_df

## Iterate over each column to measure
for column_name in column_to_measure:
    ## Compute the metrics and add them as new columns to the DataFrame
    requests_df_with_metrics = (
      requests_df_with_metrics
                 .withColumn(f"toxicity({column_name})", compute_toxicity(col(column_name)))
                 .withColumn(f"perplexity({column_name})", compute_perplexity(col(column_name)))
                 .withColumn(f"token_count({column_name})", compute_num_tokens(col(column_name)))
    )

## Task 4: Save the Processed Inference Table

In this task, you will save the processed inference table with the computed metrics to a Delta table.

**Steps:**

- Create (or replace) the processed inference table.
- Append the new unpacked payloads and metrics to the processed table.

In [0]:
##

from delta.tables import DeltaTable
## Define the name of the processed table
processed_table_name = f"{DA.catalog_name}.{DA.schema_name}.rag_app_processed_inferences_lab"

## Create the table, replacing any existing version so that reruns start clean
(DeltaTable.createOrReplace(spark)
        .tableName(processed_table_name)
        .addColumns(requests_df_with_metrics.schema)
        .property("delta.enableChangeDataFeed", "true")
        .property("delta.columnMapping.mode", "name")
        .execute())
## Write the requests_df_with_metrics DataFrame to the processed table as a stream
(requests_df_with_metrics.writeStream
                      .trigger(availableNow=True)
                      .format("delta")
                      .outputMode("append")
                      .option("checkpointLocation", checkpoint_location)
                      .toTable(processed_table_name).awaitTermination())

## Task 5: Create a Monitor on the Inference Table

In this task, you will create a monitor on the processed inference table using Databricks Lakehouse Monitoring.

- Create a monitor using the `databricks-sdk`.

In [0]:
##

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import MonitorTimeSeries
## Initialize the workspace client
w = WorkspaceClient()

try:
  ## Create a monitor using the workspace client's quality_monitors service
  lhm_monitor = w.quality_monitors.create(
    table_name=processed_table_name,
    time_series = MonitorTimeSeries(
      timestamp_col = "timestamp",
      granularities = ["5 minutes"],
    ),
    assets_dir = os.getcwd(),
    slicing_exprs = ["model_id"],
    output_schema_name=f"{DA.catalog_name}.{DA.schema_name}"
  )

## Handle any exceptions that occur during monitor creation
except Exception as lhm_exception:
  print(lhm_exception)

In [0]:
##

from databricks.sdk.service.catalog import MonitorInfoStatus

## Get the monitor information for the processed table
monitor_info = w.quality_monitors.get(processed_table_name)
print(monitor_info.status)

## Check if the monitor status is pending
if monitor_info.status == MonitorInfoStatus.MONITOR_STATUS_PENDING:
    print("Wait until monitor creation is completed...")

MonitorInfoStatus.MONITOR_STATUS_PENDING
Wait until monitor creation is completed...
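
Rather than re-running the cell above by hand, you could poll until the monitor leaves the pending state. The sketch below is one hedged way to do that: `wait_for_status` is a hypothetical helper (not part of the `databricks-sdk`), and it is demonstrated with a stand-in status source so that it runs outside a workspace.

```python
import time

def wait_for_status(get_status, done_statuses, failed_statuses=(),
                    timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll get_status() until it returns a done status; raise on failure or timeout."""
    waited = 0
    while True:
        status = get_status()
        if status in done_statuses:
            return status
        if status in failed_statuses:
            raise RuntimeError(f"Monitor entered status {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"Still {status} after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s

# Stand-in status source so the sketch runs anywhere (hypothetical values)
fake_statuses = iter(["PENDING", "PENDING", "ACTIVE"])
final_status = wait_for_status(
    lambda: next(fake_statuses),
    done_statuses={"ACTIVE"},
    failed_statuses={"FAILED", "ERROR"},
    interval_s=0,
)
print(final_status)
```

In this notebook, `get_status` could be `lambda: w.quality_monitors.get(processed_table_name).status`, with `done_statuses={MonitorInfoStatus.MONITOR_STATUS_ACTIVE}` and `failed_statuses={MonitorInfoStatus.MONITOR_STATUS_FAILED, MonitorInfoStatus.MONITOR_STATUS_ERROR}`, assuming your SDK version exposes those enum members.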


## Task 6: Review the Monitor Details

In this task, you will review the details of the monitor created in the previous step. This will involve checking the **Quality** tab for the monitor details and reviewing the metrics tables generated by the monitor.

**Steps:**


Complete the following steps:


1. **Review Monitor Details in Quality Tab**
   - Go to the **[Catalog](explore/data)** and find the table you monitored.
   - Click on the **Quality** tab to view the monitor details.

2. **Review Metrics Tables**
   - Examine the metrics tables (`*_profile_metrics` and `*_drift_metrics`).


**🚨Note:** Ensure that the refresh process is completed and the metrics tables are ready before reviewing the details.
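
If you prefer to locate the metrics tables from code, the sketch below builds their expected names. It assumes the default Lakehouse Monitoring naming, in which the monitored table's short name gets a `_profile_metrics` or `_drift_metrics` suffix inside the output schema; `metric_table_names` is a hypothetical helper, and the catalog and schema names are placeholders for `DA.catalog_name` and `DA.schema_name`.

```python
def metric_table_names(table_name, output_schema_name):
    """Build the expected profile and drift metric table names for a monitored table."""
    short_name = table_name.split(".")[-1]  # drop the catalog and schema parts
    prefix = f"{output_schema_name}.{short_name}"
    return f"{prefix}_profile_metrics", f"{prefix}_drift_metrics"

# Hypothetical catalog and schema, standing in for DA.catalog_name and DA.schema_name
profile_table, drift_table = metric_table_names(
    "my_catalog.my_schema.rag_app_processed_inferences_lab",
    "my_catalog.my_schema",
)
print(profile_table)
print(drift_table)
```

In the notebook you could then run `display(spark.table(profile_table))`. The `monitor_info` object returned by `w.quality_monitors.get(...)` also carries `profile_metrics_table_name` and `drift_metrics_table_name`, which are the authoritative values if they differ from this assumed pattern.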


## Task 7: View the Monitor Dashboard

In this task, you will view the Databricks SQL dashboard generated by Lakehouse Monitoring to review the data and metrics of your monitoring solution.

**Steps:**

Complete the following steps:

1. **View the SQL Dashboard**
   - Click on **View Dashboard** to open the SQL dashboard from the **Quality** tab.

2. **Inspect Overall Summary Statistics**
   - Examine the overall summary statistics presented in the dashboard.

3. **Review the Created Metrics**
   - Review the metrics that you defined in Task 1 (toxicity, perplexity, and token count) to understand data quality and model performance over time.


**🚨Note:** Make sure that a SQL warehouse is running and accessible so that the dashboard can be created.

## Conclusion

In this lab, you created an online monitor using Databricks Lakehouse Monitoring. First, you defined evaluation metrics and computed these metrics for the inference table. Then, you created a monitor on the inference table. Lastly, you reviewed the monitor details and the auto-created Databricks SQL dashboard. After successfully completing this lab, you should be able to create online monitoring for an inference table that captures the inference requests of deployed AI models.


&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="blank">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy" target="blank">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use" target="blank">Terms of Use</a> | 
<a href="https://help.databricks.com/" target="blank">Support</a>