![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Deployable Function Notebook for RAG-Grounded Chat
This notebook contains the steps and code to test, promote, and deploy a Python function
that implements the retrieval-augmented generation (RAG) pattern for grounded chat. It introduces Python API commands
for authenticating with an API key and for prompt inferencing with the WML API.

**Note:** Notebook code generated using Prompt Lab will execute successfully.
If code is modified or reordered, there is no guarantee it will successfully execute.
For details, see: <a href="/docs/content/wsj/analyze-data/fm-prompt-save.html?context=wx" target="_blank">Saving your work in Prompt Lab as a notebook.</a>

Some familiarity with Python is helpful. This notebook uses Python 3.10.

## Contents
This notebook contains the following parts:

1. Set up the environment
2. Create a deployable interactive chain function
3. Store and deploy the function
4. Test the deployed function

## 1. Set up the environment

Before you can run this notebook, you must perform the following setup tasks:

### Connection to WML
This cell defines the credentials required to work with the watsonx API, both for execution in the project
and for deploying and running the function.

**Action:** Provide the IBM Cloud personal API key. For details, see
<a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank">documentation</a>.


In [None]:
import os
import getpass
import requests

def get_credentials():
    return {
        "url" : "https://us-south.ml.cloud.ibm.com",
        "apikey" : getpass.getpass("Please enter your api key (hit enter): ")
    }

def get_bearer_token():
    # Exchange the IBM Cloud API key for a short-lived IAM access token
    url = "https://iam.cloud.ibm.com/identity/token"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    data = {
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": credentials["apikey"],
    }

    response = requests.post(url, headers=headers, data=data)
    response.raise_for_status()  # fail loudly on an invalid key instead of returning None
    return response.json().get("access_token")

credentials = get_credentials()
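
`get_bearer_token()` requests a fresh IAM token on every call. IAM access tokens are short-lived, so a long-running session may prefer to reuse one token until shortly before it expires. The following is a minimal sketch, not part of the generated notebook: `TokenCache` is a hypothetical helper, and it assumes the IAM JSON response carries `access_token` and `expires_in` (in seconds); the 3600-second fallback is also an assumption.

```python
import time
from typing import Callable, Optional

class TokenCache:
    """Cache an IAM access token and refresh it shortly before it expires.

    `fetch` is any zero-argument callable that returns the parsed IAM JSON
    response; it is assumed to contain "access_token" and "expires_in" (seconds).
    """

    def __init__(self, fetch: Callable[[], dict], margin_s: float = 60.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._margin_s = margin_s      # refresh this many seconds before expiry
        self._clock = clock            # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Fetch a new token if none is cached or the cached one is about to expire
        if self._token is None or self._clock() >= self._expires_at - self._margin_s:
            body = self._fetch()
            self._token = body["access_token"]
            # Assumed fallback lifetime when "expires_in" is missing
            self._expires_at = self._clock() + float(body.get("expires_in", 3600))
        return self._token
```

To use it, pass a callable that performs the same `requests.post(...)` as `get_bearer_token()` but returns the whole `response.json()`, then call `cache.get()` wherever a token is needed.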

In [None]:
from ibm_watsonx_ai import APIClient

client = APIClient(credentials)

### Connecting to a space
A deployment space is used to host the promoted function.

In [None]:
space_id = "a06ae04e-5905-41af-b9cd-f097e02e1f63"
client.set.default_space(space_id)

### Promote asset(s) to space
Promote the assets that the deployed function needs into the space, so that the function can access their data at runtime.

In [None]:
promoted_assets = {}
# Arguments: asset ID in the project, source project ID, target space ID
promoted_assets["vector_index_id"] = client.spaces.promote("61fe620d-db36-4a6a-89c6-4c093a1b3e47", "b43a5e28-8c49-4cc1-94e5-ce92969f6217", space_id)
print(promoted_assets)
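
Only the vector index is promoted here. If the function later needs more assets, the same call can be looped over a mapping of names to project asset IDs. A sketch, assuming `promote` behaves like `client.spaces.promote(asset_id, source_project_id, target_space_id)` as used in the cell above and returns the new space asset ID; `promote_assets` is a hypothetical helper name.

```python
from typing import Callable, Dict

def promote_assets(promote: Callable[[str, str, str], str],
                   asset_ids: Dict[str, str],
                   project_id: str,
                   target_space_id: str) -> Dict[str, str]:
    """Promote each project asset into the space; map each key to its new space asset ID."""
    promoted: Dict[str, str] = {}
    for key, asset_id in asset_ids.items():
        # Same argument order as client.spaces.promote in the cell above
        promoted[key] = promote(asset_id, project_id, target_space_id)
    return promoted
```

For example: `promoted_assets = promote_assets(client.spaces.promote, {"vector_index_id": "61fe620d-db36-4a6a-89c6-4c093a1b3e47"}, "b43a5e28-8c49-4cc1-94e5-ce92969f6217", space_id)`.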

## 2. Create a deployable interactive chain function
We first need to define the chain function.
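
The chain function returned by `my_deployable_chain_function()` expects a scoring payload whose first `input_data` entry holds the chat messages in `values[0]` and the IAM token in `values[1][0]`, and it treats the last message as the question to search for. A small builder can catch malformed payloads before they reach the function. This is a sketch: `build_chain_payload` is a hypothetical name, and it assumes only `user` and `assistant` roles are used.

```python
from typing import Any, Dict, List

def build_chain_payload(messages: List[Dict[str, str]], access_token: str) -> Dict[str, Any]:
    """Build the request body that the chain's execute() function reads."""
    if not messages or messages[-1].get("role") != "user":
        raise ValueError("the last message must be the user's question")
    for m in messages:
        # Each message must be a {"role": ..., "content": str} dict
        if m.get("role") not in ("user", "assistant") or not isinstance(m.get("content"), str):
            raise ValueError(f"malformed message: {m!r}")
    return {"input_data": [{"fields": ["Search", "access_token"],
                            "values": [messages, [access_token]]}]}
```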

### 2.1 Define the function

In [None]:
deploy_credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "token": get_bearer_token()
}

ai_params = {"credentials": credentials, "space_id": space_id, "promoted_assets": promoted_assets}

def my_deployable_chain_function( params=ai_params ):
    import subprocess
    from ibm_watsonx_ai.foundation_models import Model
    from ibm_watsonx_ai import APIClient

    client = APIClient(params["credentials"])
    space_id = params["space_id"]
    client.set.default_space(space_id)

    import gzip
    import json
    import chromadb
    import random
    import string
    from ibm_watsonx_ai.foundation_models.embeddings.sentence_transformer_embeddings import SentenceTransformerEmbeddings

    emb = SentenceTransformerEmbeddings('sentence-transformers/all-MiniLM-L6-v2')

    vector_index_id = params["promoted_assets"]["vector_index_id"]
    vector_index_details = client.data_assets.get_details(vector_index_id)
    vector_index_properties = vector_index_details["entity"]["vector_index"]

    def hydrate_chromadb():
        data = client.data_assets.get_content(vector_index_id)
        content = gzip.decompress(data)
        stringified_vectors = str(content, "utf-8")
        vectors = json.loads(stringified_vectors)
        
        chroma_client = chromadb.Client()
        
        # Make sure the collection is empty if it already exists
        collection_name = "my_collection"
        try:
            chroma_client.delete_collection(name=collection_name)
        except Exception:
            print("Collection didn't exist - nothing to do.")
        collection = chroma_client.create_collection(name=collection_name)
        
        vector_embeddings = []
        vector_documents = []
        vector_metadatas = []
        vector_ids = []
        
        for vector in vectors:
            vector_embeddings.append(vector["embedding"])
            vector_documents.append(vector["content"])
            metadata = vector["metadata"]
            lines = metadata["loc"]["lines"]
            clean_metadata = {}
            clean_metadata["asset_id"] = metadata["asset_id"]
            clean_metadata["asset_name"] = metadata["asset_name"]
            clean_metadata["url"] = metadata["url"]
            clean_metadata["from"] = lines["from"]
            clean_metadata["to"] = lines["to"]
            vector_metadatas.append(clean_metadata)
            asset_id = metadata["asset_id"]
            # Random suffix keeps IDs unique even when chunks share an asset and line range
            random_string = ''.join(random.choices(string.ascii_uppercase + string.digits, k=10))
            vector_id = "{}:{}-{}-{}".format(asset_id, lines["from"], lines["to"], random_string)
            vector_ids.append(vector_id)

        collection.add(
            embeddings=vector_embeddings,
            documents=vector_documents,
            metadatas=vector_metadatas,
            ids=vector_ids
        )
        return collection
    
    chroma_collection = hydrate_chromadb()

    def proximity_search( question ):
        query_vectors = emb.embed_query(question)
        query_result = chroma_collection.query(
            query_embeddings=query_vectors,
            n_results=vector_index_properties["settings"]["top_k"],
            include=["documents", "metadatas", "distances"]
        )
        
        documents = list(reversed(query_result["documents"][0]))
        metadatas = reversed(query_result["metadatas"][0])
        distances = reversed(query_result["distances"][0])
        results = []
        for metadata, distance in zip(metadatas, distances):
            results.append({
                "metadata": metadata,
                "score": distance
            })

        return {
            "results": results,
            "documents": documents
        }

    def format_input(messages, documents):
        grounding = "\n".join(documents)
        system = f"""<|start_header_id|>system<|end_header_id|>

You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is correct given the context and user query, and that it is grounded in the context. Furthermore, make sure that the response is supported by the given document or context. When the question cannot be answered using the context or document, output the following response: "I cannot answer that question based on the provided document." Always make sure that your response is relevant to the question. If an explanation is needed, first provide the explanation or reasoning, and then give the final answer.


Whenever you provide an answer, make sure that you provide the response in the following format:

     *** Issue title ***:  Subject
     *** Conclusions ***: What are the key takeaways or conclusions from the discussion in the document, provide via bullet point?
     *** Next Actions ***: What are the next steps that need to be taken in the document, provide via bullet point?
     *** Key Stakeholders ***: Who are the main stakeholders and individuals involved in the document, provide personal name of stakeholders as well with their responsibility?
     *** Related Risks ***: What risks or dependencies are associated with this issue in the document, provide via bullet point?
     *** ETA ***: What is the estimated time for completion of the tasks in the document, provide via bullet point?<|eot_id|><|start_header_id|>user<|end_header_id|>

{grounding}"""
        messages_section = []

        for index,value in enumerate(messages, start=0):
            content = value["content"]
            user_template = f"""<|eot_id|><|start_header_id|>user<|end_header_id|>

{content}"""
            assistant_template = f"""<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{content}"""
            grounded_user_template = f"""<|eot_id|><|start_header_id|>user<|end_header_id|>

{content}"""

            formatted_entry = user_template if value["role"] == "user" else assistant_template
            if (index == len(messages)-1):
                formatted_entry = grounded_user_template
            
            messages_section.append(formatted_entry)

        messages_section = "".join(messages_section)
        prompt = f"""<|begin_of_text|>{system}{messages_section}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
        return prompt
    
    def inference_model( messages, documents, access_token ):
        prompt = format_input(messages, documents)
        model_id = "meta-llama/llama-3-70b-instruct"
        parameters = {
            "decoding_method": "greedy",
            "max_new_tokens": 900,
            "repetition_penalty": 1
        }
        inference_credentials = {
            "url": params["credentials"].get("url"),
            "token": access_token
        }
        model = Model(
            model_id = model_id,
            params = parameters,
            credentials = inference_credentials,
            space_id = params["space_id"]
        )
        # Generate grounded response
        generated_response = model.generate_text(prompt=prompt, guardrails=False)
        return generated_response

    def execute( payload ):
        messages = payload.get("input_data")[0].get("values")[0]
        access_token = payload.get("input_data")[0].get("values")[1][0]
 
        # Proximity search
        search_result = proximity_search(messages[-1].get("content"))
        
        # Grounded inferencing
        generated_response = inference_model(messages, search_result["documents"], access_token)
        
        execute_response = {
            "predictions": [{"fields": ["Proximity search result", "Generated response"],
                             "values": [search_result["results"], generated_response]
                             }]
        }
        return execute_response

    return execute
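
Inside `hydrate_chromadb()`, each record of the gzip-compressed JSON vector index is split into the four parallel lists that `collection.add()` takes, and the nested `loc.lines` metadata is flattened because Chroma metadata values must be scalars. The same transformation can be pulled out as a pure function and tested without Chroma or a network connection. This is a sketch: `records_to_chroma_columns` is a hypothetical name, and the record layout (`embedding`, `content`, `metadata.asset_id`, `metadata.asset_name`, `metadata.url`, `metadata.loc.lines.from/to`) is inferred from the function above.

```python
import gzip
import json
import random
import string
from typing import Any, Dict, List, Optional, Tuple

def records_to_chroma_columns(
    raw: bytes, suffix_len: int = 10, rng: Optional[random.Random] = None
) -> Tuple[List[List[float]], List[str], List[Dict[str, Any]], List[str]]:
    """Decode a gzip-compressed JSON vector index into Chroma's parallel lists."""
    rng = rng or random.Random()
    vectors = json.loads(gzip.decompress(raw).decode("utf-8"))
    embeddings, documents, metadatas, ids = [], [], [], []
    alphabet = string.ascii_uppercase + string.digits
    for vector in vectors:
        meta = vector["metadata"]
        lines = meta["loc"]["lines"]
        embeddings.append(vector["embedding"])
        documents.append(vector["content"])
        # Flatten the nested "loc" structure into scalar metadata fields
        metadatas.append({
            "asset_id": meta["asset_id"],
            "asset_name": meta["asset_name"],
            "url": meta["url"],
            "from": lines["from"],
            "to": lines["to"],
        })
        # ID format asset:from-to-SUFFIX, mirroring the deployed function
        suffix = "".join(rng.choices(alphabet, k=suffix_len))
        ids.append(f"{meta['asset_id']}:{lines['from']}-{lines['to']}-{suffix}")
    return embeddings, documents, metadatas, ids
```

Passing a seeded `random.Random` makes the ID suffixes reproducible in tests; the deployed function itself uses unseeded `random.choices`.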


### 2.2 Test locally

In [None]:
# Initialize deployable chain function locally

local_function = my_deployable_chain_function()
messages = []
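
The next cell resets or extends `messages` by hand. A multi-turn conversation can also be wrapped in a small helper that appends the question, calls the chain function with the same payload shape, and records the answer as an `assistant` turn. A sketch: `ask` is a hypothetical helper, and `chain` can be `local_function` or any callable with the same input and output shape.

```python
from typing import Callable, Dict, List

def ask(chain: Callable[[dict], dict], history: List[Dict[str, str]],
        question: str, access_token: str, retain_history: bool = True) -> str:
    """Send one question to the chain and update `history` in place."""
    # Start from the previous turns only when history should be retained
    messages = list(history) if retain_history else []
    messages.append({"role": "user", "content": question})
    payload = {"input_data": [{"fields": ["Search", "access_token"],
                               "values": [messages, [access_token]]}]}
    result = chain(payload)
    # values[0] holds the proximity search results, values[1] the generated response
    answer = result["predictions"][0]["values"][1]
    messages.append({"role": "assistant", "content": answer})
    history[:] = messages
    return answer
```

For example: `answer = ask(local_function, messages, "Change this question to test your function", get_bearer_token())`.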

In [None]:
# Choose whether to retain the chat history between questions
retain_history = False
if not retain_history:
    messages = []

local_question = "Change this question to test your function"

messages.append({ "role" : "user", "content": local_question })

# Note the comma between the field names; without it Python would join them into one string
func_result = local_function({"input_data": [{"fields": ["Search", "access_token"],
                                              "values": [messages, [get_bearer_token()]]}]})

response = func_result["predictions"][0]["values"][1]
messages.append({"role": "assistant", "content": response })
print(func_result)
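
Each proximity result pairs a chunk's metadata with a `score`, which is the distance Chroma returned, so lower values mean closer matches. Assuming the collection uses Chroma's default `l2` space (squared Euclidean distance), the ranking resembles what the brute-force sketch below computes; Chroma itself searches an approximate HNSW index, so this is an illustration rather than its implementation. `top_k_squared_l2` and `order_for_prompt` are hypothetical names. The second helper mirrors `proximity_search()`, which reverses the top-k order so the best match is placed last in the grounding text.

```python
from typing import List, Sequence, Tuple

def top_k_squared_l2(query: Sequence[float],
                     embeddings: Sequence[Sequence[float]],
                     k: int) -> List[Tuple[int, float]]:
    """Return (index, squared L2 distance) pairs for the k nearest embeddings."""
    scored = [
        (i, sum((q - e) ** 2 for q, e in zip(query, emb)))
        for i, emb in enumerate(embeddings)
    ]
    # Smaller distance means more similar, so sort ascending
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

def order_for_prompt(documents: List[str], ranked: List[Tuple[int, float]]) -> List[str]:
    """Reverse the ranking so the best match comes last, as proximity_search() does."""
    return [documents[i] for i, _ in reversed(ranked)]
```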

## 3. Store and deploy the function
Before you can deploy the function, you must store the function in your watsonx.ai repository.

In [None]:
# Look up the software specification for the deployable function;
# if it is not found, promote it from the project into the space
software_spec_uid = client.software_specifications.get_id_by_name("vector-index-genai-flow-v2-software-specification-memory")

if software_spec_uid == "Not Found":
    software_spec_uid = client.spaces.promote("dc64d3c6-0bf5-4477-9b56-14504753f82c", "b43a5e28-8c49-4cc1-94e5-ce92969f6217", space_id)

In [None]:
# Store the function in the repository

function_metadata = {
    client.repository.FunctionMetaNames.NAME: "test notebook",
    client.repository.FunctionMetaNames.DESCRIPTION: "",
    client.repository.FunctionMetaNames.SOFTWARE_SPEC_UID: software_spec_uid
}

function_details = client.repository.store_function(meta_props=function_metadata, function=my_deployable_chain_function)

In [None]:
# Get published function ID

function_id = client.repository.get_function_id(function_details)

In [None]:
# Deploy the stored function

deployment_metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "test notebook",
    client.deployments.ConfigurationMetaNames.DESCRIPTION: "",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: { "name": "S" }
}

function_deployment_details = client.deployments.create(function_id, meta_props=deployment_metadata)

## 4. Test deployed function

In [None]:
# Get the endpoint URL of the function deployment just created

function_deployment_id = client.deployments.get_id(function_deployment_details)
function_deployment_endpoint_url = client.deployments.get_scoring_href(function_deployment_details)
print(function_deployment_id)
print(function_deployment_endpoint_url)

In [None]:
messages = []
remote_question = "Change this question to test your function"
messages.append({ "role" : "user", "content": remote_question })
payload = {"input_data": [{"fields": [ "Search", "access_token" ],
                            "values": [messages, [get_bearer_token()]]
                            }
                        ]
}

In [None]:
result = client.deployments.score(function_deployment_id, payload)
if "error" in result:
    print(result["error"])
else:
    print(result)
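
The printed result nests the answer and its sources inside `predictions[0]`, keyed by the `fields` names that `execute()` sets. The sketch below pairs fields with values by name and formats each search hit as a short citation. `split_chain_result` is a hypothetical helper, and it assumes the remote response has the same shape as the local one.

```python
from typing import Any, Dict, List, Tuple

def split_chain_result(result: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return the generated answer and one citation string per search hit."""
    prediction = result["predictions"][0]
    # Look values up by field name instead of by position
    by_name = dict(zip(prediction["fields"], prediction["values"]))
    answer = by_name["Generated response"]
    citations = []
    for hit in by_name["Proximity search result"]:
        meta = hit["metadata"]
        citations.append(
            f'{meta["asset_name"]} (lines {meta["from"]}-{meta["to"]}, distance {hit["score"]:.3f})'
        )
    return answer, citations
```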

## Next steps
You successfully deployed and tested the RAG chain function! You can now view
your deployment and test it as a REST API endpoint.
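
Outside the notebook, the deployment can be called directly over HTTPS with the same payload. A minimal sketch using only the standard library, assuming the scoring URL printed earlier (from `get_scoring_href`) and a bearer token from `get_bearer_token()`. watsonx.ai v4 REST endpoints generally expect a `version` date query parameter; the date shown is a placeholder assumption. `build_scoring_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

def build_scoring_request(scoring_url: str, access_token: str,
                          payload: Dict[str, Any],
                          version: str = "2021-05-01") -> urllib.request.Request:
    """Prepare (but do not send) a POST request to a deployment's scoring endpoint."""
    # Append the version parameter, respecting any query string already present
    sep = "&" if urllib.parse.urlparse(scoring_url).query else "?"
    url = f"{scoring_url}{sep}{urllib.parse.urlencode({'version': version})}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

To send it: `req = build_scoring_request(function_deployment_endpoint_url, get_bearer_token(), payload)`, then `with urllib.request.urlopen(req) as resp: result = json.load(resp)`.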

<a id="copyrights"></a>
### Copyrights

Licensed Materials - Copyright © 2024 IBM. This notebook and its source code are released under the terms of the ILAN License.
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

**Note:** The auto-generated notebooks are subject to the International License Agreement for Non-Warranted Programs (or equivalent) and License Information document for watsonx.ai Auto-generated Notebook (License Terms), such agreements located in the link below. Specifically, the Source Components and Sample Materials clause included in the License Information document for Watson Studio Auto-generated Notebook applies to the auto-generated notebooks.  

By downloading, copying, accessing, or otherwise using the materials, you agree to the <a href="https://www14.software.ibm.com/cgi-bin/weblap/lap.pl?li_formnum=L-AMCU-BYC7LF" target="_blank">License Terms</a>  