# Implement the Retrieval for a Retrieval Augmented Generation (RAG) Use Case

Now that you have all your context information stored in the SAP HANA Cloud Vector Store, you can start asking the LLM questions about the orchestration service of Generative AI Hub. This time, the model will not answer only from what it learned during training. Instead, the retriever searches your vector database for relevant context information and passes the matching text chunks to the LLM, which uses them to formulate its response.
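The retrieval step described above can be illustrated without any external service. The following is a minimal sketch, not the SAP HANA Cloud or LangChain implementation: `embed`, `cosine` and `retrieve` are hypothetical helpers, the "embedding" is a toy bag-of-words vector, and the chunks are made-up example texts. In the exercise itself, an embedding model and the vector engine's similarity search take over these roles.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    # Rank all chunks by similarity to the query and keep the k best.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Made-up example chunks, for illustration only.
chunks = [
    "Data masking anonymizes personal data before it is sent to the model.",
    "Content filtering blocks harmful input and output.",
    "Templating fills placeholders in a prompt.",
]
top = retrieve("What is data masking?", chunks, k=1)
print(top[0])
```

Only the chunks returned by `retrieve` would be handed to the LLM, which is why the quality of the stored chunks matters as much as the model itself.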

👉 Change the `LLM_DEPLOYMENT_ID` in [variables.py](variables.py) to your deployment ID from exercise [01-explore-genai-hub](01-explore-genai-hub.md). To find it, open the **SAP AI Launchpad** application and navigate to **ML Operations** > **Deployments**.

☝️ The `LLM_DEPLOYMENT_ID` is the deployment ID of the chat model you want to use, for example **gpt-4o-mini**.

In [1]:
# import init_env
# import variables

#init_env.set_environment_variables()

from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain_community.vectorstores.hanavector import HanaDB
from hdbcli import dbapi

You are again connecting to our shared SAP HANA Cloud Vector Engine.

In [2]:
# # connect to HANA instance
# connection = init_env.connect_to_hana_db()
# connection.isconnected()

In [3]:
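import os

def get_hana_connection():
    # Read the password from an environment variable instead of hard-coding it.
    # The variable name HANA_DB_PASSWORD is an assumption; set it before running.
    conn = dbapi.connect(
        address='ec41b786-96de-467b-9ff5-db725945f89c.hna0.prod-us10.hanacloud.ondemand.com',
        port=443,
        user='DBADMIN',
        password=os.environ['HANA_DB_PASSWORD'],
        encrypt=True
    )
    return conn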

In [4]:
import json
import os
from ai_core_sdk.ai_core_v2_client import AICoreV2Client
# Load the service key credentials from creds.json
with open('creds.json') as f:
    credCF = json.load(f)
 
# Set environment variables
def set_environment_vars(credCF):
    env_vars = {
        'AICORE_AUTH_URL': credCF['url'] + '/oauth/token',
        'AICORE_CLIENT_ID': credCF['clientid'],
        'AICORE_CLIENT_SECRET': credCF['clientsecret'],
        'AICORE_BASE_URL': credCF["serviceurls"]["AI_API_URL"] + "/v2",
        'AICORE_RESOURCE_GROUP': "grounding"
    }
 
    for key, value in env_vars.items():
        os.environ[key] = value
        # Print only the variable name so that secrets never end up in the notebook output
        print(f"{key} set")

# Create AI Core client instance
def create_ai_core_client(credCF):
    set_environment_vars(credCF)  # Ensure environment variables are set
    return AICoreV2Client(
        base_url=os.environ['AICORE_BASE_URL'],
        auth_url=os.environ['AICORE_AUTH_URL'],
        client_id=os.environ['AICORE_CLIENT_ID'],
        client_secret=os.environ['AICORE_CLIENT_SECRET'],
        resource_group=os.environ['AICORE_RESOURCE_GROUP']
    )

ai_core_client = create_ai_core_client(credCF)

AICORE_AUTH_URL set
AICORE_CLIENT_ID set
AICORE_CLIENT_SECRET set
AICORE_BASE_URL set
AICORE_RESOURCE_GROUP set


In [5]:
EMBEDDING_DEPLOYMENT_ID = "d1ae8a45e3d60115"

In [6]:
# Reference which embedding model, DB connection and table to use
embeddings = OpenAIEmbeddings(deployment_id=EMBEDDING_DEPLOYMENT_ID)
db = HanaDB(
    embedding=embeddings, connection=get_hana_connection(), table_name="EMBEDDINGS_CODEJAM_"+"Testing"
)

In this step you define which LLM to use during retrieval and create a retriever for the vector store that the information is fetched from.

In [7]:
LLM_DEPLOYMENT_ID = "d6dd6e483c92f88c"

In [8]:
# Need to fix with the name mentioned in the deployment ID
# Define which model to use
chat_llm = ChatOpenAI(deployment_id=LLM_DEPLOYMENT_ID)

# Create a retriever instance of the vector store
retriever = db.as_retriever(search_kwargs={"k": 2})

👉 Instead of sending the query directly to the LLM, you will now create a `RetrievalQA` instance and pass it both the LLM and the retriever of the vector store. Once set up, you send your query to the `RetrievalQA` chain, which fetches the relevant chunks and hands them to the LLM together with your question.
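To make the idea of handing retrieved chunks to the LLM more concrete, here is a minimal sketch of how a question and its context can be combined into a single prompt. `build_prompt` and the wording of the template are assumptions made for illustration; this is not the prompt that `RetrievalQA` uses internally.

```python
def build_prompt(question, chunks):
    # Join the retrieved chunks into one context section.
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder chunks; in the exercise they come from the retriever.
prompt = build_prompt(
    "What is data masking in the orchestration service?",
    ["<first retrieved chunk>", "<second retrieved chunk>"],
)
print(prompt)
```

With an instruction like this, a model that receives unhelpful chunks is expected to admit that it does not know rather than guess.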

👉 Try out different queries. Feel free to ask anything you'd like to know about the models that are available in Generative AI Hub.

In [9]:
# Create the QA instance to query llm based on custom documents
qa = RetrievalQA.from_llm(llm=chat_llm, retriever=retriever, return_source_documents=True)

# Send query
query = "What is data masking in the orchestration service?"

answer = qa.invoke(query)
display(answer)

{'query': 'What is data masking in the orchestration service?',
 'result': "I don't know.",
 'source_documents': [Document(metadata={'producer': 'Skia/PDF m131', 'creator': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36', 'creationdate': '2025-02-02T21:04:28+00:00', 'title': 'Orchestration Service | Generative AI hub SDK 4.1.1 documentation', 'moddate': '2025-02-02T21:04:28+00:00', 'source': 'documents\\Orchestration_Service_Generative_AI_hub_SDK.pdf', 'total_pages': 14, 'page': 0, 'page_label': '1'}, page_content='/Examples/Orchestration Service\nOrchestration Service\nThis notebook demonstrates how to use the SDK to interact with the Orchestration\nService, enabling the creation of AI-driven workflows by seamlessly integrating various\nmodules, such as templating, large language models (LLMs), data masking and content\nfiltering. By leveraging these modules, you can build complex, automated workflows that\nenhanc

In [10]:
for document in answer['source_documents']:
    display(document.metadata)   
    print(document.page_content)

{'producer': 'Skia/PDF m131',
 'creator': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
 'creationdate': '2025-02-02T21:04:28+00:00',
 'title': 'Orchestration Service | Generative AI hub SDK 4.1.1 documentation',
 'moddate': '2025-02-02T21:04:28+00:00',
 'source': 'documents\\Orchestration_Service_Generative_AI_hub_SDK.pdf',
 'total_pages': 14,
 'page': 0,
 'page_label': '1'}

/Examples/Orchestration Service
Orchestration Service
This notebook demonstrates how to use the SDK to interact with the Orchestration
Service, enabling the creation of AI-driven workflows by seamlessly integrating various
modules, such as templating, large language models (LLMs), data masking and content
filtering. By leveraging these modules, you can build complex, automated workflows that
enhance the capabilities of your AI solutions. For more details on configuring and using


{'producer': 'Skia/PDF m131',
 'creator': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
 'creationdate': '2025-02-02T21:04:28+00:00',
 'title': 'Orchestration Service | Generative AI hub SDK 4.1.1 documentation',
 'moddate': '2025-02-02T21:04:28+00:00',
 'source': 'documents\\Orchestration_Service_Generative_AI_hub_SDK.pdf',
 'total_pages': 14,
 'page': 13,
 'page_label': '14'}

https://help.sap.com/doc/generative-ai-hub-sdk/CLOUD/en-US/_reference/orchestration-service.html 14/14
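In the output above, the second retrieved chunk consists only of a page footer, which may be one reason the model answered "I don't know." The following minimal sketch shows one way to spot such chunks. `summarize_sources` and the `min_chars` threshold of 200 are assumptions for illustration; with LangChain documents you could pass `(d.metadata, d.page_content)` pairs built from `answer['source_documents']`.

```python
def summarize_sources(documents, min_chars=200):
    # documents: list of (metadata, page_content) pairs, mirroring the fields shown above.
    summary = []
    for metadata, content in documents:
        summary.append({
            "source": metadata.get("source"),
            "page_label": metadata.get("page_label"),
            "chars": len(content),
            # Very short chunks (for example a page footer) rarely carry an answer.
            "too_short": len(content) < min_chars,
        })
    return summary

# Stand-in data with the same shape as the retrieved documents.
docs = [
    ({"source": "a.pdf", "page_label": "1"}, "x" * 450),
    ({"source": "a.pdf", "page_label": "14"}, "https://example.com/page 14/14"),
]
rows = summarize_sources(docs)
for row in rows:
    print(row)
```

If many retrieved chunks are flagged as too short, adjusting the chunking in the previous exercise is a likely next step.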


👉 Go back to [05-store-embeddings-hana](05-store-embeddings-hana.ipynb) and try out different chunk sizes and/or different values for overlap. Store these chunks in a different table by adding a new variable to [variables.py](variables.py), then run this notebook again using the newly created table.
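To get a feeling for how chunk size and overlap interact before re-running the previous exercise, the following minimal sketch may help. It is a simplified character-based sliding window with a hypothetical `chunk_text` helper, not the text splitter used in the previous exercise.

```python
def chunk_text(text, chunk_size=100, chunk_overlap=20):
    # Slide a fixed-size window over the text; consecutive windows share chunk_overlap characters.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
for size, overlap in [(10, 0), (10, 4)]:
    print(size, overlap, chunk_text(text, size, overlap))
```

A larger overlap produces more chunks that repeat neighbouring text, which can keep a sentence together with its context at the cost of storing more embeddings.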

[Next exercise](07-deploy-orchestration-service.md)