### Retrieval Augmented Generation (RAG) PoC with REST APIs and SAP HANA Vector DB


#### Prerequisites:

Use the secrets folder to store your service key credentials. Credentials are required for:  
* Access to SAP GenAI XL
* Access to the HANA DB

This guide does not show how to generate embeddings with the AI Core proxy embedding model. That part is covered in the other notebooks:

* Generate-and-store-embeddings_with-HanaDB-AICore-RestAPI.ipynb
* Generate-and-store-embeddings_with-HanaDB-AICore-PythonSDK.ipynb

#### Introduction: 
In this guide we use a dataset that already includes embeddings in the 'VECTOR_STR' column, generated from the 'TEXT' column with the text-embedding-ada-002 model. 
We store these embeddings as REAL_VECTOR values in the SAP HANA DB and use its vector search functionality to build a Retrieval Augmented Generation (RAG) use case with AI Core proxy LLMs. 

#### Step-by-step guide:
* Load your data from CSV
* Connect to the HANA database
* Create a new HANA table and push the data into it
* Add a new column of data type REAL_VECTOR to the table
* Use the TO_REAL_VECTOR function to convert the embedding strings to real vectors and update the table. (This is necessary for the HANA DB to interpret the embeddings as vectors.)
* Connect to the AI Core proxy LLMs through the REST API
* Leverage the similarity search functions that the HANA DB offers to retrieve relevant context for a query
* Use the retrieved context to formulate a prompt, which is fed to an AI Core proxy chat LLM

In [3]:
# !pip install pandas
# !pip install hana_ml
# !pip install "sap-llm-commons[all]"  # see https://github.tools.sap/AI-Playground-Projects/llm-commons
# !pip install langchain
# add config.json according to https://github.tools.sap/AI-Playground-Projects/llm-commons/tree/main/docs/proxy

#### 1. Loading your data

In [2]:
# import some vector data from csv
import pandas as pd
import hana_ml
df = pd.read_csv('./data/GRAPH_DOCU_QRC3.csv', low_memory=False)
df.head(3)

ModuleNotFoundError: No module named 'shapely'


Unnamed: 0,ID,L1,L2,L3,FILENAME,HEADER1,HEADER2,TEXT,VECTOR_STR
0,273,90,40,0,090-040-000-Appendix_C_-_GraphScript_Cheat_She...,Appendix C - GraphScript Cheat Sheet,Weighted Path Functions,<!--! subsection -->\n### WEIGHT \n```graphsc...,"[0.015699435,0.020284351,0.0003677337,-0.00413..."
1,52,60,20,30,060-020-030-Basic_Vertex_Operations.md,Basic Vertex Operations,DEGREE,Returns the number of incoming and outgoing ed...,"[0.018821003,0.012627394,-0.007940338,-0.00959..."
2,44,60,20,20,060-020-020-Basic_Graph_Operations.md,Basic Graph Operations,EDGES,Returns all edges in a graph. \n- EDGES(GRAPH...,"[-0.013607875,0.009249507,-0.03403819,-0.03394..."
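Before uploading, it can be worth checking that every `VECTOR_STR` value parses to a vector of the expected length (1536 for text-embedding-ada-002, matching the `REAL_VECTOR(1536)` column created in step 4). The following is a minimal sketch, assuming the strings are JSON-style arrays as in the preview above. `parse_vector_str` and `check_dimensions` are hypothetical helper names, not part of the original notebook.

```python
import json

EXPECTED_DIM = 1536  # dimension of text-embedding-ada-002 vectors

def parse_vector_str(vector_str):
    """Parse a string such as "[0.1,0.2]" into a list of floats."""
    return [float(x) for x in json.loads(vector_str)]

def check_dimensions(vector_strings, expected_dim=EXPECTED_DIM):
    """Return the positions of entries whose length differs from expected_dim."""
    return [i for i, s in enumerate(vector_strings)
            if len(parse_vector_str(s)) != expected_dim]

# Usage with the notebook's DataFrame (assumes df was loaded as above):
# bad_rows = check_dimensions(df['VECTOR_STR'])
```

An empty result means every row has the expected dimension.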


#### 2. Connection to HANA Database

In [3]:
import json
with open('data/secrets/ies-hana-vectordb-schema-poc-sk.json', 'r') as f:
    hana_service_key = json.load(f)

In [4]:
from hana_ml import ConnectionContext

# cc = ConnectionContext(userkey='VDB_BETA', encrypt=True)
cc = ConnectionContext(
    address=hana_service_key['host'],
    port=hana_service_key['port'],
    user=hana_service_key['user'],
    password=hana_service_key['password'],
    currentSchema=hana_service_key['schema'],
    encrypt=True
    )
print(cc.hana_version())
print(cc.get_current_schema())

4.00.000.00.1710842063 (CE2024.10)
USR_5SKS2ZNTSKBBRAFPULSZIT6NR
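The connection cell reads five fields from the service key. A small check, sketched here under the assumption that the key is a flat JSON object with exactly those field names, reports missing entries before the connection attempt fails with a less readable error. `missing_key_fields` is a hypothetical helper.

```python
REQUIRED_HANA_FIELDS = ("host", "port", "user", "password", "schema")

def missing_key_fields(service_key, required=REQUIRED_HANA_FIELDS):
    """Return the required field names that are absent or empty in service_key."""
    return [name for name in required if not service_key.get(name)]

# Example with a dummy key (placeholder values, not real credentials)
example_key = {"host": "example.invalid", "port": 443, "user": "U", "password": ""}
print(missing_key_fields(example_key))
```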


#### 3. Create a new Hana table and push data into it

In [5]:
# Create a table
cursor = cc.connection.cursor()
sql_command = '''CREATE TABLE GRAPH_DOCU_QRC3(ID BIGINT, L1 NVARCHAR(3), L2 NVARCHAR(3), L3 NVARCHAR(3), FILENAME NVARCHAR(100), HEADER1 NVARCHAR(5000), HEADER2 NVARCHAR(5000), TEXT NCLOB, VECTOR_STR NCLOB);'''
cursor.execute(sql_command)
cursor.close()

In [6]:
# Upload data into the hana table
from hana_ml.dataframe import create_dataframe_from_pandas
v_hdf = create_dataframe_from_pandas(
    connection_context=cc,
    pandas_df=df,
    table_name="GRAPH_DOCU_QRC3",
    allow_bigint=True,
    append=True
    )

100%|██████████| 1/1 [00:00<00:00,  2.22it/s]


#### 4. Add a new column of data type REAL_VECTOR to your data table (let us call this column VECTOR)

In [7]:
# Add REAL_VECTOR column
cursor = cc.connection.cursor()
sql_command = '''ALTER TABLE GRAPH_DOCU_QRC3 ADD (VECTOR REAL_VECTOR(1536));'''
cursor.execute(sql_command)
cursor.close()

#### 5. Use the TO_REAL_VECTOR function to convert the embeddings to real vectors and update the data table

In [8]:
# Create vectors from embedding strings
cursor = cc.connection.cursor()
sql_command = '''UPDATE GRAPH_DOCU_QRC3 SET VECTOR = TO_REAL_VECTOR(VECTOR_STR);'''
cursor.execute(sql_command)
cursor.close()
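`TO_REAL_VECTOR` is applied here to strings of the form `[0.0156,0.0202,...]`, as stored in the `VECTOR_STR` column. When an embedding is available as a Python list instead, it has to be serialized to a textual form before it can be placed in a SQL statement. A minimal sketch follows; `to_vector_literal` is a hypothetical helper name, and the finiteness check is an added assumption about what counts as a valid embedding.

```python
import math

def to_vector_literal(values):
    """Serialize a sequence of numbers into the '[v1,v2,...]' textual form."""
    floats = [float(v) for v in values]
    if not floats or not all(math.isfinite(v) for v in floats):
        raise ValueError("embedding must be non-empty and contain only finite numbers")
    return "[" + ",".join(repr(v) for v in floats) + "]"

print(to_vector_literal([0.015699435, -0.5, 1]))
```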

#### 6. Connection with AI Core proxy LLMs through REST API 

In [9]:
# Read ML deployed model
with open('data/secrets/genai-xl-test-instance.json') as f:
    sk = json.load(f)

In [10]:
import requests
import os

response = requests.post(
    sk['url']+"/oauth/token",
    data={"grant_type": "client_credentials"},
    auth=(sk['clientid'], sk['clientsecret']),
    timeout=8000,
)
auth_token = response.json()["access_token"]
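The token request assumes success. A slightly more defensive variant, sketched below, raises on HTTP errors and on a missing `access_token` field. `fetch_token` is a hypothetical name; the `post` parameter is an assumption added so the function can be exercised without network access (it defaults to `requests.post`), and the 30-second timeout is an illustrative choice (`requests` interprets `timeout` in seconds).

```python
def fetch_token(service_key, post=None, timeout=30):
    """Request an OAuth client-credentials token using the service key fields
    'url', 'clientid' and 'clientsecret' (the fields read in the cell above)."""
    if post is None:
        import requests  # imported lazily so the sketch can be tested with a stub
        post = requests.post
    response = post(
        service_key["url"] + "/oauth/token",
        data={"grant_type": "client_credentials"},
        auth=(service_key["clientid"], service_key["clientsecret"]),
        timeout=timeout,
    )
    response.raise_for_status()  # surface 4xx/5xx here instead of a later KeyError
    payload = response.json()
    if "access_token" not in payload:
        raise RuntimeError("token endpoint returned no access_token")
    return payload["access_token"]

# Usage in the notebook (assumes sk was loaded as above):
# auth_token = fetch_token(sk)
```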

In [11]:
# Define get_embedding: call the embedding model deployment and return the embedding for the input text

def get_embedding(input: str, embedder_deployment_id: str, model="text-embedding-ada-002") -> list:
    # Prepare the request payload for inference
    test_input = {
        "model" : model,
        "input" : input
    }

    deployment_url_prefix = 'https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/'
    deployment_url = deployment_url_prefix + embedder_deployment_id
    endpoint = f"{deployment_url}/embeddings?api-version=2023-05-15" # endpoint implemented in serving engine
    print(endpoint)
    # NOTE: the 'ai-resource-group' header normally carries the AI Core resource group name;
    # this notebook passes the deployment ID here. Adjust it to your resource group if needed.
    headers = {"Authorization": f"Bearer {auth_token}",
               'ai-resource-group': embedder_deployment_id,
               "Content-Type": "application/json"}
    response = requests.post(endpoint, headers=headers, json=test_input)
    response_json = response.json()
    print(response_json)
    # The embedding is returned as a list of floats
    return response_json['data'][0]['embedding']

In [12]:
# Defining the get_response function for Chat model to retrieve chat response

def get_response(input: str, chat_deployment_id: str, model="gpt-35-turbo") -> str:
    # Preparing the input for inference
    test_chat_input = {
        "model" : model,
        "messages" : [{ "content": input, 
                       "role": "user"}],
        "temperature": 0.,
        "max_tokens": 250,
    }

    deployment_url_prefix = 'https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/'
    deployment_url2 = deployment_url_prefix + chat_deployment_id
    endpoint = f"{deployment_url2}/chat/completions?api-version=2023-05-15" # endpoint implemented in serving engine
    print(endpoint)
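    # NOTE: 'ai-resource-group' normally carries the AI Core resource group name;
    # this notebook passes the deployment ID here.
    headers = {"Authorization": f"Bearer {auth_token}",
               'ai-resource-group': chat_deployment_id,
               "Content-Type": "application/json"}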

    chat_response = requests.post(endpoint, headers=headers, json=test_chat_input)
    chat_response_json = chat_response.json()
    print(chat_response_json)
    return chat_response_json['choices'][0]['message']['content']
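Both helper functions index directly into the JSON response, so an error payload from the service surfaces as a bare `KeyError`. The sketch below, assuming the response shape used above (`choices[0].message.content`), turns that into a more descriptive exception. `extract_chat_content` is a hypothetical helper.

```python
def extract_chat_content(response_json):
    """Return choices[0].message.content or raise a descriptive error."""
    try:
        return response_json["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"unexpected chat response: {response_json!r}") from exc

# Example with a hand-written payload in the expected shape
sample = {"choices": [{"message": {"role": "assistant", "content": "Hello"}}]}
print(extract_chat_content(sample))
```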

#### 7. Leverage the similarity search functions that HANA DB offers to retrieve relevant context for a query

The SAP HANA Vector DB provides two functions for similarity search: L2DISTANCE() and COSINE_SIMILARITY(). L2DISTANCE() returns a distance (smaller means more similar), whereas COSINE_SIMILARITY() returns a similarity (larger means more similar).
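To see why the two metrics need opposite sort orders in the search function that follows, here is a small pure-Python illustration of both measures (not HANA's implementation): cosine similarity grows as vectors align, while L2 distance shrinks as they get closer. The vectors are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def l2_distance(a, b):
    """Euclidean distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q_vec = [1.0, 0.0]
near = [0.9, 0.1]
far = [0.0, 1.0]

# The better match has the HIGHER cosine similarity but the LOWER L2 distance
print(cosine_similarity(q_vec, near) > cosine_similarity(q_vec, far))
print(l2_distance(q_vec, near) < l2_distance(q_vec, far))
```

This is why results are ordered `DESC` for COSINE_SIMILARITY and `ASC` for L2DISTANCE.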

In [13]:
# Wrapping HANA vector search in a function: Here we are using the Cosine Similarity as the Similarity Search function. 
def run_vector_search(query: str, embedding_deployment_id: str, metric="COSINE_SIMILARITY", k=4):
    if metric == 'L2DISTANCE':
        sort = 'ASC'
    else:
        sort = 'DESC'
    query_vector = get_embedding(input=query, embedder_deployment_id=embedding_deployment_id)
    sql = '''SELECT TOP {k} "ID", "HEADER1", "HEADER2", "TEXT"
        FROM "GRAPH_DOCU_QRC3"
        ORDER BY "{metric}"("VECTOR", TO_REAL_VECTOR('{qv}')) {sort}'''.format(k=k, metric=metric, qv=query_vector, sort=sort)
    hdf = cc.sql(sql)
    df_context = hdf.head(k).collect()
    # context = ' '.join(df_context['TEXT'].astype('string'))
    return df_context
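Because the metric name and `k` are interpolated directly into the SQL string, it can help to validate them first. The sketch below separates query construction from execution so the statement can be inspected without a database connection. `build_vector_search_sql` and `SORT_ORDER` are hypothetical names; the table and column names are the ones used in this guide.

```python
SORT_ORDER = {"COSINE_SIMILARITY": "DESC", "L2DISTANCE": "ASC"}

def build_vector_search_sql(query_vector, metric="COSINE_SIMILARITY", k=4):
    """Build the TOP-k similarity query for GRAPH_DOCU_QRC3 after validating inputs."""
    if metric not in SORT_ORDER:
        raise ValueError(f"unsupported metric: {metric!r}")
    if not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    # Serialize the embedding as '[v1,v2,...]' for TO_REAL_VECTOR
    vector_literal = "[" + ",".join(str(float(v)) for v in query_vector) + "]"
    return (
        'SELECT TOP {k} "ID", "HEADER1", "HEADER2", "TEXT" '
        'FROM "GRAPH_DOCU_QRC3" '
        'ORDER BY "{metric}"("VECTOR", TO_REAL_VECTOR(\'{qv}\')) {sort}'
    ).format(k=k, metric=metric, qv=vector_literal, sort=SORT_ORDER[metric])

print(build_vector_search_sql([0.1, -0.2], metric="L2DISTANCE", k=2))
```

The resulting string could then be passed to `cc.sql(...)` as in the function above.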

In [14]:
# Test the vector search
query = "How can I run a shortest path algorithm?"
embedding_deployment_id = 'da6a26f83dc8e241'
df_context = run_vector_search(query=query, embedding_deployment_id=embedding_deployment_id, metric="COSINE_SIMILARITY",k=4)
df_context

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/da6a26f83dc8e241/embeddings?api-version=2023-05-15
{'data': [{'embedding': [-0.008959918, 0.014933197, -0.014513645, -0.0027146419, -0.004817734, 0.02847263, -0.012430109, -0.02269846, -0.017706506, -0.018645164, 0.020024706, 0.014108316, 0.015004308, 0.0025617545, 0.015374082, 0.011064788, 0.01447809, -0.00036044116, 0.018090501, 0.025173103, -0.024134891, 0.0135963205, 0.023964226, -0.025713544, -0.00095910236, 0.0115270065, 0.026893977, -0.0136532085, 0.022300242, -0.018161612, 0.04838356, -0.016028298, -0.02861485, -0.018574053, -0.0054968386, 0.023452232, 0.02392156, -0.006154611, 0.020892255, -0.012444331, 0.006837271, 0.0035750784, -0.0050239544, -0.006979492, 0.00069821585, 0.013830985, 0.0115270065, -0.01352521, -0.021361584, 0.008440812, -0.01053146, 0.028017523, -0.03071972, -0.009393692, -0.011320786, 0.0068479376, 0.012010558, -0.0053901733, -0.00062754983, 0.017692283, 0.01807628, -0.001038

Unnamed: 0,ID,HEADER1,HEADER2,TEXT
0,211,Complex GraphScript Examples,GraphScript Procedure Example,The following example depicts a more complex e...
1,90,Graph Traversal Statements,Dijkstra's Algorithm (DIJKSTRA),DIJKSTRA searches for shortest paths in a weig...
2,83,Built-In Graph Algorithms,Shortest Path,```bnf\n<sssp_function> ::= SHORTEST_PATH '(' ...
3,65,Basic Weighted Path Operations,(Constructors),WEIGHTEDPATH objects can’t be constructed dire...


#### 8. Using langchain framework for prompt templates

In [15]:
# Prompt. Do also use your knowledge from outside the given context.
promptTemplate_fstring = """
You are an SAP HANA Cloud expert.
You are provided multiple context items that are related to the prompt you have to answer.
Use the following pieces of context to answer the question at the end.

Context:
{context}

Question:
{query}
"""

In [16]:
from langchain.prompts import PromptTemplate
promptTemplate = PromptTemplate.from_template(promptTemplate_fstring)

#### 9. The context can then be used to formulate a prompt which is fed to an AI Core proxy Chat LLM.

In [17]:
# The ask_llm function takes the user query and first converts it to an embedding. A vector search is then performed with the chosen metric to retrieve the context.
# The context is used to build a prompt with LangChain's PromptTemplate class. The prompt is then fed to a chat completion LLM, which returns the response.

def ask_llm(query: str, embedding_deployment_id: str, chat_deployment_id: str, retrieval_augmented_generation: bool, metric='COSINE_SIMILARITY', k=4) -> str:

    class color:
        RED = '\033[91m'
        BLUE = '\033[94m'
        BOLD = '\033[1m'
        END = '\033[0m'
    context = ''
    if retrieval_augmented_generation:
        print(color.RED + 'Running retrieval augmented generation.' + color.END)
        print(color.RED + '\nEmbedding the query string and running HANA vector search.' + color.END)
        # Retrieve the k best-matching rows for this query (not the global df_context)
        df_retrieved = run_vector_search(query, embedding_deployment_id, metric, k)
        print(df_retrieved)
        # Join the TEXT column of the retrieved rows into one context string
        context = ' '.join(df_retrieved['TEXT'].astype('string'))
        print(color.RED + '\nHANA vector search returned {k} best matching documents.'.format(k=k) + color.END)
        print(color.RED + '\nGenerating LLM prompt using the context information.' + color.END)
    else:
        print(color.RED + 'Generating LLM prompt WITHOUT context information.' + color.END)
    # Without RAG, context stays empty, so the prompt contains only the question
    prompt = promptTemplate.format(query=query, context=context)
    print(color.RED + '\nAsking LLM...' + color.END)

    response = get_response(input=prompt, chat_deployment_id=chat_deployment_id, model="gpt-35-turbo")
    
    print(color.RED + '...completed.' + color.END)
    print(color.RED + '\nQuery: ' + color.END, query)
    print(color.BLUE + '\nResponse:' + color.END)
    print(response)
    # Return the answer so callers can use it (the signature declares -> str)
    return response

In [18]:
embedding_deployment_id = 'da6a26f83dc8e241'
chat_deployment_id = 'deac9533e2d3dc51'

In [19]:
# query = "Can you define a HANA graph workspace on a JSON document store collection?"
#query = "How can I define a HANA graph workspace on a JSON document store collection?"
#query = "How do you run a shortest path algorithm in SAP HANA Graph engine?"
# query = "How can I run community detection Louvain in SAP HANA Graph?"
# query = "How can I run a BFS traversal in HANA Cloud"
query = "I want to calculate a shortest path. How do I do that?"

response = ask_llm(query=query, 
                   embedding_deployment_id=embedding_deployment_id, 
                   chat_deployment_id=chat_deployment_id, 
                   retrieval_augmented_generation=True
                   )

Running retrieval augmented generation.

Embedding the query string and running HANA vector search.
https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/da6a26f83dc8e241/embeddings?api-version=2023-05-15
{'data': [{'embedding': [0.002383484, -0.0016379196, -0.00037755084, -0.004939468, -0.001483665, 0.010403071, -0.0080809565, -0.02541056, -0.006750717, -0.01759499, 0.01601595, 0.019731333, -0.005131872, -0.0015160086, 0.018218642, 0.008087591, 0.012453165, -0.006110477, 0.011451338, 0.02373864, -0.034367286, 0.00955384, 0.027016137, -0.018218642, -0.0033936035, 0.013534606, 0.026113829, -0.0053342273, 0.02234537, -0.019253641, 0.031129595, -0.014012299, -0.02489306, -0.03330575, -0.016347682, 0.017462296, 0.017223451, -0.007696149, 0.0058185537, -0.019664988, 0.008565283, 0.004528122, -0.006564948, -0.0020550708, 0.001199206, 0.005012449, -0.0046077375, -0.024494985, -0.020779602, 0.01201528, -0.0030054788, 0.019532295, -0.0406569, -0.01