# Using Generative AI in your projects

## Overview

1. Query the foundation model APIs
    - With the OpenAI client
    - With the Databricks GenAI Inference SDK
    - With AI SQL functions
2. Getting Started with Databricks Vector Search
3. RAG: Vector Search + LLMs
4. Next steps: LLM Chatbot With Retrieval Augmented Generation (RAG) and DBRX Demo

# 1. Query the Foundation Model APIs
- AI Playground is useful for comparing models, but integrating GenAI into our projects requires programmatic access.
- The Foundation Model APIs provide an easy way to use state-of-the-art open models without the need to provision the computational resources yourself.

## Using the OpenAI Python SDK
- The Foundation Model APIs are compatible with OpenAI's Python client
- Easy to switch OpenAI-based projects over to using the Databricks Foundation Model APIs

### Setup
- Set `OPENAI_BASE_URL` to `your_databricks_workspace_url`/`serving-endpoints`
- Set `OPENAI_API_KEY` to a Databricks personal access token, which you can obtain from User Settings > Developer > Access Tokens > Generate New Token.
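
A trailing slash or an endpoint-specific path in the base URL is an easy mistake to make. The sketch below shows one way to normalize a workspace URL into the form described above. The helper name `serving_endpoints_base_url` and the example host are hypothetical and not part of any Databricks or OpenAI library.

```python
def serving_endpoints_base_url(workspace_url: str) -> str:
    """Return '<workspace_url>/serving-endpoints' without duplicate slashes.

    Hypothetical helper for illustration; not part of any SDK.
    """
    url = workspace_url.strip().rstrip("/")
    # Assume HTTPS when no scheme is given.
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    suffix = "/serving-endpoints"
    if not url.endswith(suffix):
        url += suffix
    return url


# Made-up workspace host, used only to show the normalization.
print(serving_endpoints_base_url("my-workspace.cloud.databricks.com/"))
```

The result can be assigned to `OPENAI_BASE_URL` or passed as `base_url` when constructing the client.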

In [0]:
# setup
%pip install --upgrade openai


In [0]:
dbutils.library.restartPython()

In [0]:
import os

# os.environ["OPENAI_BASE_URL"] = "<your_workspace_url>/serving-endpoints/"
# os.environ["OPENAI_API_KEY"] = "<your_personal_access_token>"

# The base URL must stop at /serving-endpoints; the client appends the route itself.
os.environ["OPENAI_BASE_URL"] = "https://adb-3989190133890802.2.azuredatabricks.net/serving-endpoints"
# Placeholder: never commit a real token or API key to a notebook.
os.environ["OPENAI_API_KEY"] = "<your_personal_access_token>"

In [0]:
from openai import OpenAI

# Placeholder: supply your own token; never hard-code a real one in a notebook.
os.environ["DATABRICKS_TOKEN"] = "<your_personal_access_token>"

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
# Alternatively in a Databricks notebook you can use this:
# DATABRICKS_TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()

# client = OpenAI() 

client = OpenAI(api_key=DATABRICKS_TOKEN,  base_url="https://adb-3989190133890802.2.azuredatabricks.net/serving-endpoints")

chat_completion = client.chat.completions.create(
    model="databricks-dbrx-instruct",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant",
        },
        {
            "role": "user",
            "content": "What is the relationship between Delta Lake and Parquet?",
        },
    ],
)

print(chat_completion.choices[0].message.content)

In [0]:
from openai import OpenAI

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://adb-3989190133890802.2.azuredatabricks.net/serving-endpoints"
)

chat_completion = client.chat.completions.create(
  messages=[
  {
    "role": "system",
    "content": "You are an AI assistant"
  },
  {
    "role": "user",
    "content": "Tell me about Large Language Models"
  }
  ],
  model="databricks-dbrx-instruct",
  max_tokens=256
)

print(chat_completion.choices[0].message.content)

In [0]:
from openai import OpenAI, APIError

# This cell calls OpenAI's hosted API directly rather than a Databricks endpoint.
try:
    client = OpenAI(
        api_key="<your_openai_api_key>",  # placeholder: never hard-code a real key
        base_url="https://api.openai.com/v1",  # the OpenAI API is served under /v1
    )

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant",
            },
            {
                "role": "user",
                "content": "What is the relationship between Delta Lake and Parquet?",
            },
        ],
        temperature=1,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    print(response)

except APIError as e:
    print(f"An error occurred: {e}")

## Using the Databricks GenAI SDK
- Similar functionality to the OpenAI SDK
- Designed for use in Databricks notebooks
- Handles authentication automatically
- The `ChatSession` interface makes it easy to manage multi-turn chat exchanges.
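
To make "managing multi-turn exchanges" concrete, here is a minimal sketch of the bookkeeping involved: each turn appends the user message and the model's answer to a growing message list, which is sent in full on the next turn. `MiniChatSession` and `echo_model` are hypothetical stand-ins written for this illustration. They are not how `databricks_genai_inference.ChatSession` is implemented, and no model is called.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class MiniChatSession:
    """Toy stand-in that shows how multi-turn history accumulates."""

    def __init__(self, system_message: str, send: Callable[[List[Message]], str]):
        self.history: List[Message] = [{"role": "system", "content": system_message}]
        self._send = send  # function that maps the full history to a reply

    def reply(self, user_message: str) -> None:
        # Append the user turn, send the whole history, then store the answer.
        self.history.append({"role": "user", "content": user_message})
        answer = self._send(self.history)
        self.history.append({"role": "assistant", "content": answer})

    @property
    def last(self) -> str:
        return self.history[-1]["content"]


def echo_model(messages: List[Message]) -> str:
    """Placeholder model: reports how many user turns it has seen."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"reply to user turn {user_turns}"


chat = MiniChatSession("You are a helpful assistant.", send=echo_model)
chat.reply("What is Delta Lake?")
chat.reply("How does it relate to Parquet?")
print(chat.last)
print(len(chat.history))
```

Because the whole history is resent on every turn, long conversations consume more of the model's context window, which is one reason to clear the history between unrelated questions.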

In [0]:
%pip install databricks-genai-inference
dbutils.library.restartPython()

Set up the chat session:

In [0]:
from databricks_genai_inference import ChatSession

chat = ChatSession(
    model="databricks-dbrx-instruct", system_message="You are a helpful assistant."
)

Send a User message to the model:

In [0]:
chat.reply("What is the relationship between Delta Lake and Parquet?")

Get the response:

In [0]:
print(chat.last)

You can keep repeating this procedure to carry on a chat exchange with the model. You can obtain the whole history as follows:

In [0]:
chat.history

There are several more methods to query the foundation model APIs; you can learn more about them [here](https://docs.databricks.com/en/machine-learning/model-serving/score-foundation-models.html). These include:
- REST API
- MLflow Deployments SDK
- [SQL Functions](https://docs.databricks.com/en/large-language-models/ai-functions.html): these functions use the Foundation Model APIs to accomplish specific tasks such as AI summarization, extraction, and grammar correction. Depending on your project, AI SQL functions might be the best and simplest way for you to use AI!
    - ai_analyze_sentiment
    - ai_classify
    - ai_extract
    - ai_fix_grammar
    - ai_gen
    - ai_mask
    - ai_similarity
    - ai_summarize
    - ai_translate

In [0]:
%sql
SELECT ai_extract(
    'John Doe lives in New York and works for Acme Corp.',
    array('person', 'location', 'organization')
  );

# 2. Using Databricks Vector Search
- Examples so far: We send a prompt to the model, and the model generates a response based only on its training data.
- Retrieval-Augmented Generation (RAG): *Augment* prompts with information *retrieved* from an outside data source, enabling the model to *generate* responses informed by the retrieved information. Why?
    - Access to proprietary or domain-specific information.
    - Access to timely information.
    - Fine-grained permissions control.

## Vector Search
- A special type of language model called an *embedding model* translates each of the texts we want to search into numeric representations called embeddings, which capture the meaning of the text.
- We can efficiently search for similar embeddings, which lets us use natural-language queries to find relevant texts in a vector database.
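
As a rough illustration of what "searching for similar embeddings" means, the sketch below ranks a few hand-written three-dimensional vectors by cosine similarity to a query vector. The vectors and document names are invented for the example; real embeddings come from an embedding model and have hundreds or thousands of dimensions, and a vector database uses approximate indexes rather than this brute-force scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], corpus: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented toy embeddings, for illustration only.
toy_corpus = {
    "delta_lake": [0.9, 0.1, 0.0],
    "vector_search": [0.1, 0.9, 0.2],
    "cooking": [0.0, 0.1, 0.9],
}
toy_query = [0.2, 0.8, 0.1]

results = top_k(toy_query, toy_corpus, k=2)
print([doc_id for doc_id, _ in results])
```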

### Databricks Vector Search
- Databricks Vector Search is a vector database integrated into the Databricks platform for storing and retrieving embeddings.
    - A vector search endpoint is a scalable compute resource that handles API requests to query and update the search index.
    - A vector search index is a data structure optimized for efficient similarity search, created from a Delta table. The Delta table provides a reliable and scalable storage format for the original data. The index is a separate data structure that enables fast retrieval of similar vectors.
    - The index can be kept in sync with the Delta table automatically or manually.

## Vector Search Demo
- Scrape documentation from a website
- Split it into chunks
- Save it to the Delta table
- Synchronize the Delta table with the Vector Search Index
- Query the Index
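
The "split it into chunks" step can be sketched with a simple word-based splitter that keeps some overlap between neighbouring chunks, so that a sentence cut at a boundary still appears whole in one of them. This is a simplified stand-in, not the splitter used in the demo: `chunk_words` and its parameters are hypothetical, and the demo relies on LlamaIndex's `SentenceSplitter` instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of up to chunk_size words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
pieces = chunk_words(sample, chunk_size=5, overlap=2)
for piece in pieces:
    print(piece)
```

Smaller chunks give more precise retrieval but less context per hit; the right size depends on the documents and the embedding model.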

### Setup

In [0]:
%pip install --upgrade databricks-vectorsearch databricks-genai-inference llama-index llama-index-readers-web
dbutils.library.restartPython()

**Initialize the vector search client:**

In [0]:
from databricks.vector_search.client import VectorSearchClient
vsc = VectorSearchClient()

**Set up a vector search endpoint**
- Can use the UI, Python SDK, or REST API
- We set up the example endpoint in advance of this demo as it takes a few minutes to provision the endpoint.

**Create a source Delta table**
- Simple example: columns for unique ID and text
- Databricks Vector Search also supports searching and filtering on metadata, so you can add more columns, e.g. document title or date.

In [0]:
# Set up schema/volume/table
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, FloatType

spark.sql("CREATE SCHEMA IF NOT EXISTS genai_exploration.hackathon_demo")
spark.sql(
    """
    CREATE TABLE IF NOT EXISTS genai_exploration.hackathon_demo.source_table (
        id STRING,
        text STRING
    )
    USING delta 
    TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
    """
)

In [0]:
# Create the Vector Search endpoint using the client (`vsc`) initialized above.
# Skip this cell if the endpoint already exists; ours was provisioned in advance.
vsc.create_endpoint(
    name="rag_demo_endpoint",
    endpoint_type="STANDARD"
)

**Create the vector search index**
- To save time, we created the index in advance.

In [0]:
index = vsc.create_delta_sync_index(
    endpoint_name="rag_demo_endpoint",
    index_name="genai_exploration.hackathon_demo.vs_index",
    source_table_name="genai_exploration.hackathon_demo.source_table",
    pipeline_type="CONTINUOUS", # or TRIGGERED
    primary_key="id",
    embedding_source_column="text",
    embedding_model_endpoint_name="databricks-bge-large-en" # or provide an embedding_vector_column
)

**Retrieve the index**

In [0]:
index = vsc.get_index(
    endpoint_name="rag_demo_endpoint",
    index_name="genai_exploration.hackathon_demo.vs_index",
    )

**Obtain and process the data:**
- Databricks GenAI tools work well with OSS libraries such as LlamaIndex and LangChain.
- Here, we use LlamaIndex to obtain, parse, and split the webpage data.
- Once we load them to the source Delta table, they will be synchronized with the Vector Search index automatically.

In [0]:
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core.node_parser import SentenceSplitter

# the URLs we're getting data from
urls = ["https://docs.databricks.com/en/generative-ai/vector-search.html",
        "https://docs.databricks.com/en/generative-ai/create-query-vector-search.html",
        "https://docs.databricks.com/en/generative-ai/retrieval-augmented-generation.html"]

# llamaindex tools for getting and splitting the text
reader = SimpleWebPageReader(html_to_text=True)
parser = SentenceSplitter.from_defaults()

# text chunks for indexing
chunks = parser.get_nodes_from_documents(reader.load_data(urls))

# the schema of our source delta table
schema = StructType(
    [
        StructField("id", StringType(), True),
        StructField("text", StringType(), True),
    ]
)

# Collect (id, text) pairs for every chunk, then build the DataFrame once.
# This avoids creating and unioning a one-row DataFrame per chunk.
rows = [(chunk.node_id, chunk.get_content()) for chunk in chunks]
df = spark.createDataFrame(rows, schema)

# Append the chunks to the source table that backs the vector search index
df.write.format("delta").mode("append").saveAsTable("genai_exploration.hackathon_demo.source_table")

And now we can query the index!

In [0]:
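# Query the index. The "Represent this sentence..." prefix is the retrieval
# instruction recommended for queries to BGE embedding models.
x = index.similarity_search(
    columns=["text"],
    query_text="Represent this sentence for searching relevant passages: What are the different methods for creating a vector search endpoint?",
    num_results=3,
)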

In [0]:
print(x["result"]["data_array"][0][0])
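
Indexing straight into `data_array` fails when a query returns no rows. The sketch below unpacks a response of the shape used above into `(text, score)` pairs and tolerates an empty result. The `extract_hits` helper and the mock response are hypothetical. It assumes that the first column is the requested `text` column and that the similarity score is the last column of each row, which is worth checking against an actual response.

```python
from typing import Any, Dict, List, Tuple


def extract_hits(response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return (text, score) pairs from a similarity-search style response."""
    rows = response.get("result", {}).get("data_array") or []
    hits = []
    for row in rows:
        # Assumption: requested column first, similarity score last.
        hits.append((row[0], float(row[-1])))
    return hits


# Made-up response for illustration; not real search output.
mock_response = {
    "result": {
        "data_array": [
            ["Endpoints can be created in the UI.", 0.71],
            ["Indexes sync from a Delta table.", 0.64],
        ],
    }
}

hits = extract_hits(mock_response)
for text, score in hits:
    print(f"{score:.2f}  {text}")
```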


# 3. Putting them together: Vector Search + Foundation Model API
- Raw results from the vector search are not very useful on their own: they do not directly answer the user's query, and the user may still have to read through the retrieved passages to find the answer.
- This is where the "Generation" part of RAG comes in. We combine Vector Search's dynamic information retrieval capabilities with LLMs' language aptitude to generate clear and cohesive answers informed by the retrieved information.
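
The "augmentation" step is plain string assembly: the retrieved passages are joined and placed ahead of the user's question. The sketch below shows one possible layout. `build_rag_prompt` is a hypothetical helper, and the exact wording and separator are choices rather than requirements.

```python
from typing import List


def build_rag_prompt(question: str, passages: List[str], separator: str = "\n---\n") -> str:
    """Combine retrieved passages and the user's question into one prompt."""
    if not passages:
        # Nothing was retrieved: fall back to the bare question.
        return question
    context = separator.join(p.strip() for p in passages)
    return (
        "Answer the question based on the provided context. Context:\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Invented passages, standing in for vector search results.
prompt = build_rag_prompt(
    "How can I create a vector search endpoint?",
    ["Endpoints can be created in the UI.", "The Python SDK can also create endpoints."],
)
print(prompt)
```

Keeping the context clearly delimited from the question makes it easier for the model to tell retrieved material apart from the user's request.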

In [0]:
from databricks_genai_inference import ChatSession, Embedding
from databricks.vector_search.client import VectorSearchClient

class RAG:
    def __init__(
        self,
        model="databricks-dbrx-instruct",
        system_message="You are a helpful assistant. Answer the user's question. If context is provided, you must answer based on the context.",
        max_tokens=2048,
        index_name="genai_exploration.hackathon_demo.vs_index",
        endpoint="rag_demo_endpoint",
    ):
        self.chat_session = ChatSession(
            model=model, system_message=system_message, max_tokens=max_tokens,
        )
        self.vsc = VectorSearchClient(disable_notice=True)
        self.endpoint = endpoint
        self.index_name = index_name

    def query_index(self, query, num_results=3):
        """
        Queries the vector search index to retrieve relevant passages based on the given query.
        Returns the concatenated text of the retrieved passages.
        """
        index = self.vsc.get_index(
            endpoint_name=self.endpoint,
            index_name=self.index_name,
        )

        query_resp = index.similarity_search(
            query_text= "Represent this sentence for searching relevant passages: " + query,
            columns=["text"],
            num_results=num_results,
        )

        results = query_resp["result"]["data_array"]
        concatenated_text = ""

        for i, result in enumerate(results):
            concatenated_text += result[0]  
            if i < len(results) - 1:
                concatenated_text += "\n---\n" 

        return concatenated_text

    def query_rag(self, user_input):
        """
        Performs Retrieval-Augmented Generation (RAG) using the provided user input.
        Retrieves relevant context from the index and generates a response using the chat model.
        """
        ctx_chunks = self.query_index(user_input)
        ctx = (
            "Answer the question based on the provided context. Context:\n\n"
            + ctx_chunks
            + "\n\n"
        )
        self.chat_session.reply(ctx + '\n\n' + user_input)
        bot_response = self.chat_session.last.strip()
        return bot_response

    def clear_chat(self):
        """Clears the chat session history."""
        self.chat_session.history.clear()

In [0]:
rag = RAG()

In [0]:
print(rag.query_rag("What are the different ways I can create a vector search endpoint?"))

In [0]:
print(rag.query_rag("What is the difference between a delta sync index and a direct vector access index?"))

In [0]:
rag.clear_chat()

In [0]:
print(rag.query_rag("Give me python code to create a new continuous sync vector index called MY_INDEX on delta table MY_TABLE and endpoint MY_ENDPOINT using the python sdk. Use the databricks-bge-large-en embeddings model."))

# Learn More

https://notebooks.databricks.com/demos/llm-rag-chatbot/index.html

In [0]:
%pip install dbdemos

In [0]:
import dbdemos
dbdemos.install('llm-rag-chatbot')

# Utils

In [0]:
%sql
truncate table genai_exploration.hackathon_demo.source_table;