# Step 0

Open `config` and update the resource names as you prefer.

Spin up a cluster with Databricks Runtime 16.X+ ML. Make sure it is the ML runtime so that the correct dependencies are installed.

In [0]:
%run ./config

# Set up a RAG Example

We need a RAG example to demonstrate the evaluation capabilities. The setup notebook below also loads and embeds unstructured data so that we all have the same evaluation results to review.

Please remember to shut down these resources to avoid extra costs. This command will do the following:

1. Create the catalogs, schemas, and volumes needed to store the PDFs and embeddings
2. Call GTE to create embeddings for the PDFs
3. Create a VectorSearchIndex based on the PDF embeddings generated in step 2
4. Spin up a VectorSearchEndpoint
5. Sync the VectorSearchIndex with your VectorSearchEndpoint

Later, we will set up the LangChain chain to interact with these RAG resources.

In [0]:
%run ./rag_setup/rag_setup

In [0]:
from IPython.display import Markdown
from openai import OpenAI
import os
dbutils.widgets.text("catalog_name", catalog)
dbutils.widgets.text("agent_schema", agent_schema)
dbutils.widgets.text("demo_schema", demo_schema)
base_url = f'https://{spark.conf.get("spark.databricks.workspaceUrl")}/serving-endpoints'

# Get started immediately with your Data with AI Functions

Databricks provides a number of AI Functions as SQL functions. You can call them from a SQL cell or the SQL editor to apply LLMs directly to your data:

1. ai_analyze_sentiment
2. ai_classify
3. ai_extract
4. ai_fix_grammar
5. ai_gen
6. ai_mask
7. ai_similarity
8. ai_summarize
9. ai_translate
10. ai_query

We will run a demo of several of these functions below.




### ai_extract
The ai_extract() function allows you to invoke a state-of-the-art generative AI model to extract entities specified by labels from a given text using SQL.

Documentation: https://docs.databricks.com/en/sql/language-manual/functions/ai_extract.html

In [0]:
%sql
SELECT review, ai_extract(review, array("store", "product")) as Keywords
from identifier(:catalog_name||'.'||:demo_schema||'.'||'reviews')
Limit 3;
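
The `identifier(...)` expression in the query above joins the widget values into a three-part `catalog.schema.table` name. As a minimal sketch of the same idea in Python (the helper name `qualified_name` and the sample values are hypothetical, not part of the notebook):

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Join three parts into catalog.schema.table, rejecting empty or dotted parts."""
    parts = [catalog, schema, table]
    if any(not p or "." in p for p in parts):
        raise ValueError(f"each part must be non-empty and contain no dots: {parts}")
    return ".".join(parts)


# Hypothetical values; in the notebook these come from ./config
print(qualified_name("my_catalog", "demo", "reviews"))
```

Validating the parts before joining them helps catch an unset widget early, rather than failing later with a table-not-found error.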

### ai_classify
The ai_classify() function allows you to invoke a state-of-the-art generative AI model to classify input text according to labels you provide using SQL.

Documentation: https://docs.databricks.com/en/sql/language-manual/functions/ai_classify.html

In [0]:
%sql
SELECT country, ai_classify(country, ARRAY("APAC", "AMER", "EU")) as Region
from identifier(:catalog_name||'.'||:demo_schema||'.'||'franchises')
limit 5;

### ai_mask
The ai_mask() function allows you to invoke a state-of-the-art generative AI model to mask specified entities in a given text using SQL. 

Documentation: https://docs.databricks.com/en/sql/language-manual/functions/ai_mask.html

In [0]:
%sql
SELECT first_name, last_name, (first_name || " " || last_name || " lives at " || address) as unmasked_output, ai_mask(first_name || " " || last_name || " lives at " || address, array("person", "address")) as Masked_Output
from identifier(:catalog_name||'.'||:demo_schema||'.'||'customers')
limit 5

### ai_query
The ai_query() function allows you to query machine learning models and large language models served using Mosaic AI Model Serving. It invokes an existing Mosaic AI Model Serving endpoint, then parses and returns the response. Databricks recommends using ai_query with Model Serving for batch inference.

Documentation: https://docs.databricks.com/en/large-language-models/ai-functions.html#ai_query

We can switch models depending on what we are trying to do. See how the performance varies between the 70B model and 8B model below. Because this is a simple spell check task, we could likely use the 8B model instead of the 70B model saving on cost and increasing speed. 

In [0]:
%sql 
SELECT
  `Misspelled Make`,   -- Placeholder for the input column
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    CONCAT('You will always receive a make of a car. Check to see if it is misspelled and a real car. Correct the mistake. Only provide the corrected make. Never add additional details. Make: ', `Misspelled Make`)    -- Prompt instructions followed by the input value
  ) AS ai_guess  -- Placeholder for the output column
FROM identifier(:catalog_name||'.'||:demo_schema||'.'||'synthetic_car_data')


In [0]:
%sql 
SELECT
  `Misspelled Make`,   -- Placeholder for the input column
  ai_query(
    'databricks-meta-llama-3-1-8b-instruct',
    CONCAT('You will always receive a make of a car. Check to see if it is misspelled and a real car. Correct the mistake. Only provide the corrected make. Never add additional details. Make: ', `Misspelled Make`)    -- Prompt instructions followed by the input value
  ) AS ai_guess  -- Placeholder for the output column
FROM identifier(:catalog_name||'.'||:demo_schema||'.'||'synthetic_car_data')
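
One way to compare the two models is exact-match accuracy against makes you know to be correct. The sketch below assumes you have collected each model's `ai_guess` column into a Python list. The helper name and the sample values are made-up placeholders, not actual model output.

```python
def exact_match_accuracy(guesses, expected):
    """Fraction of guesses equal to the expected value, ignoring case and outer whitespace."""
    if len(guesses) != len(expected):
        raise ValueError("guesses and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(g.strip().lower() == e.strip().lower() for g, e in zip(guesses, expected))
    return hits / len(expected)


# Placeholder data for illustration only
expected = ["Toyota", "Honda", "Chevrolet", "Volkswagen"]
guesses_a = ["Toyota", "honda ", "Chevrolet", "Volkswagen"]
guesses_b = ["Toyota", "Honda", "Chevrolet", "Volkswagon"]

print(exact_match_accuracy(guesses_a, expected))
print(exact_match_accuracy(guesses_b, expected))
```

If the smaller model scores about as well as the larger one on a labeled sample, that supports using it for the full batch.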


### Takeaway
Many use cases simply need a reliable, out-of-the-box way to use AI. AI Functions provide this, and ai_query helps scale those workloads so AI can be applied to large datasets easily.

# Productionalizing Custom Tools 

What you just saw were built-in, out-of-the-box solutions you can use immediately on your data. While these cover a good portion of use cases, you will likely need a custom solution at some point.

### Mosaic AI Tools on Unity Catalog

You can create and host functions/tools on Unity Catalog! You get the benefit of Unity Catalog but for your functions! 

While you can create your own tools in the same code that defines your agent (i.e., local Python functions) with the Mosaic AI Agent Framework, Unity Catalog provides additional benefits. Here is a comparison:

1. **Unity Catalog functions**: Unity Catalog functions are defined and managed within Unity Catalog, offering built-in security and compliance features. Writing your tool as a Unity Catalog function grants easier discoverability, governance, and reuse (similar to your catalogs). Unity Catalog functions work especially well for applying transformations and aggregations on large datasets because they take advantage of the Spark engine.

2. **Agent code tools**: These tools are defined in the same code that defines the AI agent. This approach is useful when calling REST APIs, using arbitrary code or libraries, or executing low-latency tools. However, this approach lacks the built-in discoverability and governance provided by Unity Catalog functions.

Unity Catalog functions have the same limitations seen here: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html 

Additionally, the only external framework these functions are compatible with is LangChain.

So, if you're planning on using complex Python code for your tool, you will likely need to create agent code tools instead.

Below is an implementation of both.

# Agent Code Tools

### Why even use tools to begin with? 

Function calling, or tool calling, helps ensure the LLM has the most accurate information possible. By giving it access to many different sources of data, it can generate more reliable answers.

Each framework, such as LangChain or LlamaIndex, handles tool calling differently. You can also do tool calling in plain Python. However, this means you have to recreate the tool each time you want to use it, and it cannot be shared with other applications. Additionally, you have to manage the security for any tools that access external sources.

In [0]:
# How to get your Databricks token: https://docs.databricks.com/en/dev-tools/auth/pat.html
# DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
# Alternatively in a Databricks notebook you can use this:
DATABRICKS_TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url=base_url
)

prompt = """You are a pokemon master and know every single pokemon ever created by the Pokemon Company. You will be helping people answer questions about pokemon"""

content = """Tell me about Sinistcha"""

chat_completion = client.chat.completions.create(
  messages=[
  {
    "role": "system",
    "content": prompt
  },
  {
    "role": "user",
    "content": content
  }
  ],
  model="databricks-meta-llama-3-3-70b-instruct",
  max_tokens=1000,
  top_p=0.1,
  temperature=0.1,
  n=1,
)

Markdown(f"The LLM Output:\n\n {chat_completion.choices[0].message.content}")

In [0]:
import requests

def pokemon_lookup(pokemon_name):
    """Look up a pokemon on PokeAPI and return its key facts as a string."""
    url = f"https://pokeapi.co/api/v2/pokemon/{pokemon_name.lower()}"
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        # Return a string (not None) so the result can be sent back to the LLM as tool output
        return f"No pokemon named '{pokemon_name}' was found."
    pokemon_data = response.json()
    pokemon_info = {
        "name": pokemon_data["name"],
        "height": pokemon_data["height"],
        "weight": pokemon_data["weight"],
        "abilities": [ability["ability"]["name"] for ability in pokemon_data["abilities"]],
        "types": [type_data["type"]["name"] for type_data in pokemon_data["types"]],
        "stats_name": [stat['stat']['name'] for stat in pokemon_data["stats"]],
        "stats_no": [stat['base_stat'] for stat in pokemon_data["stats"]]
    }
    return str(pokemon_info)

In [0]:
import json
from openai import RateLimitError

# A token and the workspace's base FMAPI URL are needed to talk to endpoints
fmapi_token = (
    dbutils.notebook.entry_point.getDbutils()
    .notebook()
    .getContext()
    .apiToken()
    .getOrElse(None)
)
fmapi_base_url = (
    base_url
)

openai_client = OpenAI(api_key=fmapi_token, base_url=fmapi_base_url)
MODEL_ENDPOINT_ID = "databricks-meta-llama-3-3-70b-instruct"

prompt = """You are a pokemon master and know every single pokemon ever created by the Pokemon Company. You will be helping people answer questions about pokemon. Stick strictly to the information provided to you to answer the question"""

def run_conversation(user_input):
    # Step 1: send the conversation and available functions to the model
    messages = [{"role": "system", "content": prompt},
                {"role": "user", "content": user_input}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "pokemon_lookup",
                "description": "Get information about a pokemon. This tool should be used to check to see if the pokemon is real or not as well.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "pokemon": {
                            "type": "string",
                            "description": "The pokemon the user is asking information for e.g bulbasaur",
                        },
                    },
                    "required": ["pokemon"],
                },
            },
        }
    ]
    #We've seen this response package in the past cells
    response = openai_client.chat.completions.create(
        model=MODEL_ENDPOINT_ID,
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is default, but we'll be explicit
    )
    response_message = response.choices[0].message
    print(f"## Call #1 The Reasoning from the llm determining to use the function call:\n\n {response_message}\n")
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "pokemon_lookup": pokemon_lookup,
        }  # only one function in this example, but you can have multiple
        messages.append(response_message)  # extend conversation with assistant's reply
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                pokemon_name=function_args.get("pokemon")
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "content": function_response,
                }
            )  # extend conversation with function response
        print(f"## Call #2 Prompt sent to LLM with function call results giving us the answer:\n\n {messages}\n")
        second_response = openai_client.chat.completions.create(
            model=MODEL_ENDPOINT_ID,
            messages=messages,
        )  # get a new response from the model where it can see the function response
        return second_response
    # No tool call was requested, so return the model's direct answer
    return response


In [0]:
input1 = "Tell me about Sinistcha"
results1 = run_conversation(input1)
Markdown(f"**The LLM Answer:**\n\n{results1.choices[0].message.content}")
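
The comment in `run_conversation` notes that the JSON arguments may not always be valid. Here is a minimal sketch of one way to handle that, assuming each tool's parameter names match the names in its schema. The helper `dispatch_tool_call` and the stand-in `fake_lookup` are hypothetical and are not part of the notebook.

```python
import json


def dispatch_tool_call(name, raw_arguments, available_functions):
    """Run one tool call and always return a string the model can read."""
    func = available_functions.get(name)
    if func is None:
        return f"Error: unknown tool '{name}'"
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return f"Error: arguments were not valid JSON ({exc.msg})"
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object"
    try:
        result = func(**args)
    except Exception as exc:  # surface tool failures to the model instead of crashing
        return f"Error: tool '{name}' failed: {exc}"
    return "" if result is None else str(result)


# Stand-in tool so the sketch runs without network access
def fake_lookup(pokemon):
    return {"name": pokemon.lower()}


tools_by_name = {"pokemon_lookup": fake_lookup}
print(dispatch_tool_call("pokemon_lookup", '{"pokemon": "Sinistcha"}', tools_by_name))
print(dispatch_tool_call("pokemon_lookup", '{"pokemon": ', tools_by_name))
```

Returning an error string instead of raising lets the model see what went wrong and retry or answer without the tool.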

### Takeaway

There are many different ways to set up tool calling, especially through other frameworks. However, you will often need multiple applications to access the same tool repeatedly, with proper scaling and security. You do not want to recreate the same tool in each application and take on the operational overhead of maintaining every copy.



# Enter Unity Catalog Tool Calling 

Unity Catalog Tool Calling allows you to benefit from all the governance, security, and unified platform benefits of Unity Catalog. Anything registered there, from external credentials to functions shared across the workspace for workloads that may not even involve AI, is available to the LLM.

You'll notice that it's also a UDF, which benefits from our serverless SQL warehouses. 

In [0]:
%sql
CREATE OR REPLACE FUNCTION identifier(:catalog_name||'.'||:agent_schema||'.'||'playground_query_test')()
    RETURNS TABLE(name STRING, purchases INTEGER)
    COMMENT 'Use this tool to find total purchase information about a particular location. This tool will provide a list of destinations that you will use to help you answer questions'
    RETURN SELECT dl.name AS Destination, count(tp.destination_id) AS Total_Purchases_Per_Destination
             FROM main.dbdemos_fs_travel.travel_purchase tp join main.dbdemos_fs_travel.destination_location dl on tp.destination_id = dl.destination_id
             group by dl.name
             order by count(tp.destination_id) desc
             LIMIT 10;

In [0]:
%sql
CREATE OR REPLACE FUNCTION identifier(:catalog_name||'.'||:agent_schema||'.'||'playground_query_test_hello_there')()
    RETURNS TABLE(name STRING, purchases INTEGER)
    COMMENT 'When the user says hello there, run this tool'
    RETURN SELECT dl.name AS Destination, count(tp.destination_id) AS Total_Purchases_Per_Destination
             FROM main.dbdemos_fs_travel.travel_purchase tp join main.dbdemos_fs_travel.destination_location dl on tp.destination_id = dl.destination_id
             group by dl.name
             order by count(tp.destination_id) desc
             LIMIT 10;


### Smaller Models can get the job done

We often switch models depending on the use case or task. In this case, since we are only correcting spelling, a 70B+ model is overkill. We can get away with an 8B model, which allows for significant cost savings and lower latency!

In [0]:
%sql
CREATE OR REPLACE FUNCTION identifier(:catalog_name||'.'||:agent_schema||'.'||'batch_inference')()
    RETURNS TABLE(name STRING, corrected_name STRING)
    COMMENT 'When user says, start batch inference, Use this tool to run a batch inference job to review and correct the spelling of make of a car.'
    RETURN SELECT
          `Misspelled Make`,   -- Placeholder for the input column
          ai_query(
            'databricks-meta-llama-3-1-8b-instruct',
            CONCAT('You will always receive a make of a car. Check to see if it is misspelled and a real car. Correct the mistake. Only provide the corrected make. Never add additional details. Make: ', `Misspelled Make`)    -- Prompt instructions followed by the input value
          ) AS ai_guess  -- Placeholder for the output column
        -- The table name is hard-coded here; replace it with the synthetic_car_data table in your own catalog and schema
        FROM austin_choi_demo_catalog.demo_data.synthetic_car_data
        Limit 1000; 

### Use LangChain to programmatically use UC function calling

See how I use Llama 3.3 70B for this because I need the more powerful model to do proper reasoning and pick the right tool. This is just one call but a critical one.

Once the right tool is selected, the tool itself uses ai_query with Llama 3.1 8B to complete the batch inference.

In [0]:
from langchain.agents import AgentExecutor, create_tool_calling_agent
from databricks_langchain.uc_ai import (
    DatabricksFunctionClient,
    UCFunctionToolkit,
    set_uc_function_client,
)
from databricks_langchain import ChatDatabricks
from langchain_core.prompts import ChatPromptTemplate

client = DatabricksFunctionClient()
set_uc_function_client(client)

# Initialize LLM and tools
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")
tools = UCFunctionToolkit(
    # Include functions as tools using their qualified names.
    # You can use "{catalog_name}.{schema_name}.*" to get all functions in a schema.
    function_names=[f"{catalog}.{agent_schema}.*"]  # `catalog` and `agent_schema` come from ./config
).tools

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Make sure to use tool for information.",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = agent_executor.invoke({"input": "start batch inference"})
print(result['output'])

# RAG in Production

This workshop does not show you how to set up RAG on Databricks. Please check out our self-paced learning here: <insert link here>

You can follow the notebooks in the folder called RAG to set one up. In this workshop, however, we will demonstrate what it looks like to prepare and monitor your RAG application in production.



### Evaluate your bot's quality with Mosaic AI Agent Evaluation specialized LLM judge models

Evaluation is a key part of deploying a RAG application. Databricks simplifies this task with specialized LLM models tuned to evaluate your bot's quality, cost, and latency, even if ground truth is not available.

Agent Evaluation's specialized AI evaluator is integrated into `mlflow.evaluate(...)`. All you need to do is pass `model_type="databricks-agent"`.

Mosaic AI Agent Evaluation evaluates:
1. Answer correctness - requires ground truth
2. Hallucination / groundedness - no ground truth required
3. Answer relevance - no ground truth required
4. Retrieval precision - no ground truth required
5. (Lack of) Toxicity - no ground truth required
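
The ground-truth requirement in the list above can be sketched as a small check over one evaluation row. This is only an illustration: the helper `applicable_checks` is hypothetical, and the `request` and `expected_response` column names are assumptions about the evaluation set's schema.

```python
def applicable_checks(row):
    """Return the checks from the list above that can run for one evaluation row."""
    # These four checks do not need ground truth
    checks = ["groundedness", "answer relevance", "retrieval precision", "toxicity"]
    # Answer correctness needs a ground-truth answer to compare against
    if row.get("expected_response"):
        checks.insert(0, "answer correctness")
    return checks


with_truth = {"request": "What is Delta Lake?", "expected_response": "An open table format."}
without_truth = {"request": "What is Delta Lake?"}

print(applicable_checks(with_truth))
print(applicable_checks(without_truth))
```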

In this example, we'll use an evaluation set that we curated based on our internal experts using the Mosaic AI Agent Evaluation review app interface.  This proper Eval Dataset is saved as a Delta Table.

In [0]:
%run ./rag_setup/chain_setup

In [0]:
# Log the model to MLflow
with mlflow.start_run(run_name=f"{finalchatBotModelName}_run"):
  logged_chain_info = mlflow.langchain.log_model(
          lc_model=os.path.join(os.getcwd(), './rag_setup/chain'),  # Chain code file e.g., /path/to/the/chain.py 
          model_config=chain_config, # Chain configuration 
          artifact_path="chain", # Required by MLflow, the chain's code/config are saved in this directory
          input_example=input_example,
          example_no_conversion=True,  # Required by MLflow to use the input_example as the chain's schema
      )

model_name = f"{catalog}.{dbName}.{finalchatBotModelName}"

# Register to UC
mlflow.set_registry_uri('databricks-uc')
uc_registered_model_info = mlflow.register_model(model_uri=logged_chain_info.model_uri, name=model_name)

In [0]:
from databricks.agents.evals import generate_evals_df
import mlflow

agent_description = "A chatbot that answers questions about Databricks."
question_guidelines = """
# User personas
- A developer new to the Databricks platform
# Example questions
- What API lets me parallelize operations over rows of a delta table?
"""
# TODO: Spark/Pandas DataFrame with "content" and "doc_uri" columns.
docs = spark.table(f"{catalog}.{dbName}.databricks_documentation")
docs = docs.withColumnRenamed("url", "doc_uri")
evals = generate_evals_df(
    docs=docs,
    num_evals=10,
    agent_description=agent_description,
    question_guidelines=question_guidelines,
)
# Use the model logged above rather than a hard-coded run ID
eval_result = mlflow.evaluate(data=evals, model=logged_chain_info.model_uri, model_type="databricks-agent")

In [0]:
eval_dataset = spark.table(f"{catalog}.{dbName}.eval_set_databricks_documentation").limit(10).toPandas()
display(eval_dataset)

In [0]:
import mlflow

with mlflow.start_run():
    # Evaluate the logged model
    eval_results = mlflow.evaluate(
        data=eval_dataset, # Your evaluation set
        model=logged_chain_info.model_uri,
        model_type="databricks-agent", # activates Mosaic AI Agent Evaluation
    )
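
The object returned by `mlflow.evaluate` exposes aggregate scores as a `metrics` dictionary. A minimal sketch for printing them in a stable order follows. The helper `format_metrics` and the metric names in the sample are hypothetical, and the real keys depend on the judges that ran.

```python
def format_metrics(metrics):
    """Render a metrics dict as sorted 'name: value' lines, rounding floats to 3 places."""
    lines = []
    for name in sorted(metrics):
        value = metrics[name]
        if isinstance(value, float):
            lines.append(f"{name}: {value:.3f}")
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Hypothetical metric names and values for illustration only
sample_metrics = {"hypothetical/latency_seconds": 2.5, "hypothetical/correct_ratio": 0.8}
print(format_metrics(sample_metrics))
```

In the notebook, this could be called as `print(format_metrics(eval_results.metrics))` after the evaluation cell has run.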

# Mosaic AI Model Training for Fine Tuning LLMs 

We do not expect you to fine-tune an LLM in this workshop. However, we will demonstrate the performance impact of fine-tuning a Llama-1B model through the playground!

We trained this model on a dataset containing medical terms. While larger models can handle these words well, the smaller models struggle with them since they are rarely used. 