# Implement the Retrieval for a Retrieval Augmented Generation (RAG) Use Case

Now that all your context information is stored in the SAP HANA Cloud Vector Store, you can start asking the LLM questions about SAP AI Services. This time, the model does not answer from its knowledge base (what it learned during training). Instead, the retriever searches your vector database for relevant context and sends the matching text chunks to the LLM, which uses them to formulate its answer.
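
Conceptually, retrieval changes what the LLM receives. Instead of the question alone, it gets the retrieved chunks plus the question. The minimal sketch below assumes a simple prompt template. The helper `build_grounded_prompt` and its wording are hypothetical and are not the prompt that LangChain's `RetrievalQA` uses internally.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved text chunks and the user question into one prompt.

    Hypothetical helper for illustration; the wording is an assumption,
    not the template used by LangChain's RetrievalQA.
    """
    # Number each chunk so the individual pieces of context stay distinguishable
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Placeholder chunks standing in for results from the vector store
prompt = build_grounded_prompt(
    "Example question?",
    ["Example chunk one.", "Example chunk two."],
)
print(prompt)
```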

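```python
# Local helper modules of this exercise (environment setup and settings)
import init_env
import variables

# Set up the environment variables (see init_env.py)
init_env.set_environment_variables()

from gen_ai_hub.proxy.langchain.openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain_community.vectorstores.hanavector import HanaDB

# display() is built into Jupyter; importing it makes the dependency explicit
from IPython.display import display
```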

👉 In [variables.py](variables.py), set `EMBEDDING_TABLE` to `"EMBEDDINGS_CODEJAM_>add your name here<"`, as in the previous exercise.
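
Because the notebook reads the table name through `variables.EMBEDDING_TABLE`, the setting lives in variables.py. A minimal sketch of that entry, assuming the placeholder name `JANE_DOE` (replace it with your own), plus an optional, hypothetical guard that fails fast if the placeholder text from the instructions was left in:

```python
# variables.py (excerpt): JANE_DOE is a placeholder, use your own name
EMBEDDING_TABLE = "EMBEDDINGS_CODEJAM_JANE_DOE"

# Optional guard (hypothetical, not part of the original exercise):
# stop early if the ">add your name here<" placeholder was not replaced
if "<" in EMBEDDING_TABLE or ">" in EMBEDDING_TABLE:
    raise ValueError(f"Replace the placeholder in EMBEDDING_TABLE: {EMBEDDING_TABLE!r}")
```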

You are again connecting to our shared SAP HANA Cloud Vector Engine.

```python
# Connect to the HANA instance
connection = init_env.connect_to_hana_db()
connection.isconnected()
```

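```python
# Embedding model used to embed your queries; it must be the same model
# that was used to store the document embeddings
embeddings = OpenAIEmbeddings(deployment_id=variables.EMBEDDING_DEPLOYMENT_ID)

# Point LangChain's HanaDB vector store at your existing embeddings table
db = HanaDB(
    embedding=embeddings, connection=connection, table_name=variables.EMBEDDING_TABLE
)
```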

In this step, you define which LLM generates the answer and turn the vector store into a retriever. `search_kwargs={"k": 2}` tells the retriever to return the two chunks most similar to each query.
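
To illustrate what `k` controls, the sketch below ranks stored chunks by cosine similarity to a query vector and keeps the top `k`. It assumes cosine similarity as the distance measure (HanaDB's default unless you configure another strategy) and uses tiny made-up vectors. It is a pure-Python illustration, not how SAP HANA Cloud executes the search; the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda text: cosine_similarity(query_vec, store[text]), reverse=True)
    return ranked[:k]


# Made-up 3-dimensional "embeddings"; real embedding models return many more dimensions
store = {
    "chunk about rate limits": [0.9, 0.1, 0.0],
    "chunk about pricing": [0.7, 0.3, 0.1],
    "chunk about model list": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], store, k=2))
# ['chunk about rate limits', 'chunk about pricing']
```

Raising `k` gives the LLM more context per question, but also more chance of including loosely related chunks.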

```python
# Chat model that writes the final answer
chat_llm = ChatOpenAI(deployment_id=variables.LLM_DEPLOYMENT_ID)

# Retriever over the vector store that returns the 2 most similar chunks per query
retriever = db.as_retriever(search_kwargs={"k": 2})
```

👉 Instead of sending the query directly to the LLM, you now create a `RetrievalQA` instance and pass it both the LLM and the retriever. Once it is set up, you send your query to the `RetrievalQA` chain, which first retrieves the matching chunks and then passes them to the LLM together with your question.

👉 Try out different queries. Feel free to ask anything you'd like to know about the models available in Generative AI Hub.

```python
# Combine retriever and LLM; also return the chunks that were used as context
qa = RetrievalQA.from_llm(llm=chat_llm, retriever=retriever, return_source_documents=True)

# Send query
query = "What is the rate limit for the gpt-4o-mini model via Generative AI Hub on SAP AI Core?"

answer = qa.invoke(query)
display(answer)
```

```python
# Show which chunks the answer was based on
for document in answer["source_documents"]:
    display(document.metadata)
    print(document.page_content)
```
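
To compare several questions in one go, you can loop over them and keep only the answer text and the source metadata. The sketch assumes the output shape of `RetrievalQA` with `return_source_documents=True`: a dict with `"result"` and `"source_documents"` keys, where each document has `page_content` and `metadata`. The helper `ask_many` and the `FakeDocument`/`fake_invoke` stand-ins are hypothetical; they replace the real `qa.invoke` so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FakeDocument:
    """Stand-in with the same two attributes as a LangChain Document."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def ask_many(invoke: Callable[[str], dict], queries: list[str]) -> list[dict]:
    """Run each query through a RetrievalQA-style invoke function.

    Returns one summary dict per query with the answer and the source metadata.
    """
    summaries = []
    for query in queries:
        answer = invoke(query)
        summaries.append({
            "query": query,
            "result": answer["result"],
            "sources": [doc.metadata for doc in answer["source_documents"]],
        })
    return summaries


# Fake invoke for demonstration; in the notebook you would pass qa.invoke instead
def fake_invoke(query: str) -> dict:
    docs = [FakeDocument("some chunk", {"source": "example.txt"})]
    return {"query": query, "result": f"answer to: {query}", "source_documents": docs}


for summary in ask_many(fake_invoke, ["First question?", "Second question?"]):
    print(summary["query"], "->", summary["result"], summary["sources"])
```

In the notebook, `ask_many(qa.invoke, [...])` would call the real chain once per query.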

👉 Go back to [05-store-embeddings-hana](05-store-embeddings-hana.ipynb) and try out different chunk sizes and/or chunk overlap values. Store the resulting chunks in a different table by adding a new table-name variable to [variables.py](variables.py), then run this notebook again against the new table and compare the answers.
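
To get a feel for what chunk size and overlap change, the sketch below splits text into fixed-width character windows. It is a simplified, hypothetical stand-in (`split_fixed`), not necessarily the splitter used in the previous exercise; LangChain's text splitters, for example, also try to break at separators such as paragraphs, so their chunks will differ.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so text near a chunk
    border appears in both neighbouring chunks.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "abcdefghij"
print(split_fixed(text, chunk_size=4, chunk_overlap=0))  # ['abcd', 'efgh', 'ij']
print(split_fixed(text, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With an overlap of 2, each chunk repeats the last two characters of the previous one, so the same text yields more chunks. A larger overlap keeps more context around chunk borders at the cost of more rows in the embeddings table.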

[Next exercise](07-use-multimodal-models.ipynb)