# Implement the Retrieval for a Retrieval Augmented Generation (RAG) Use Case

Now that your context information is stored in the SAP HANA Cloud Vector Store, you can start asking the LLM questions about SAP AI Services. This time, the model does not answer only from what it learned during training. Instead, the retriever searches your vector database for relevant context and sends the best-matching text chunks to the LLM, which reads them before responding.
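
The retrieval step described above can be illustrated with a toy example. This is only a sketch under simplifying assumptions: the `embed` function is a bag-of-words stand-in for a real embedding model, and `retrieve` ranks in-memory strings instead of querying SAP HANA Cloud. The names `embed`, `cosine`, and `retrieve` are hypothetical and not part of any library used in this exercise.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "the regression template uses a neural network",
    "invoices are processed by document information extraction",
    "the classification template supports multi-class tasks",
]
top = retrieve("which network does the regression template use", chunks, k=2)
print(top)
```

A real vector store compares dense embedding vectors rather than word counts, but the ranking idea is the same: score every stored chunk against the query and keep the top `k`.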

In [54]:
import os
import json

with open('/home/user/projects/generative-ai-codejam/.aicore-config.json', 'r') as config_file:
    config_data = json.load(config_file)

os.environ["AICORE_AUTH_URL"] = config_data["url"] + "/oauth/token"
os.environ["AICORE_CLIENT_ID"] = config_data["clientid"]
os.environ["AICORE_CLIENT_SECRET"] = config_data["clientsecret"]
os.environ["AICORE_BASE_URL"] = config_data["serviceurls"]["AI_API_URL"]

# Change the value of the resource group to yours
os.environ["AICORE_RESOURCE_GROUP"] = "team-sap"

from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

from langchain.chains import RetrievalQA

from langchain_community.vectorstores.hanavector import HanaDB
from hdbcli import dbapi

import configparser
import variables
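
If the service key file is incomplete, the setup cell above fails with a bare `KeyError`. As an optional sketch, the mapping from service key fields to environment variables could be built by a small helper that reports missing keys up front. The helper name `aicore_env_from_config` and the sample values are hypothetical; only the key names and environment variable names come from the cell above.

```python
def aicore_env_from_config(config_data, resource_group):
    """Map service key fields to the AICORE_* environment variables."""
    required = ["url", "clientid", "clientsecret", "serviceurls"]
    missing = [key for key in required if key not in config_data]
    if missing:
        raise KeyError(f"Missing keys in service key file: {missing}")
    return {
        "AICORE_AUTH_URL": config_data["url"] + "/oauth/token",
        "AICORE_CLIENT_ID": config_data["clientid"],
        "AICORE_CLIENT_SECRET": config_data["clientsecret"],
        "AICORE_BASE_URL": config_data["serviceurls"]["AI_API_URL"],
        "AICORE_RESOURCE_GROUP": resource_group,
    }

# Placeholder values for illustration only, not real credentials.
sample_key = {
    "url": "https://auth.example.invalid",
    "clientid": "example-client",
    "clientsecret": "example-secret",
    "serviceurls": {"AI_API_URL": "https://api.example.invalid"},
}
env = aicore_env_from_config(sample_key, "team-sap")
print(env["AICORE_AUTH_URL"])
```

The returned dictionary can then be applied in one step with `os.environ.update(env)`.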

👉 Set `EMBEDDING_TABLE` in [variables.py](variables.py) to `"EMBEDDINGS_SAP_AI_SERVICES_>add your name here<"`, as in the previous exercise.

You are again connecting to our shared SAP HANA Cloud Vector Engine.

In [55]:
config = configparser.ConfigParser()
config.read('/home/user/projects/generative-ai-codejam/.user.ini')
connection = dbapi.connect(
    address=config.get('hana', 'url'),
    port=config.getint('hana', 'port'),
    user=config.get('hana', 'user'),
    password=config.get('hana', 'passwd'),
    autocommit=True,
    sslValidateCertificate=False
)
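
The connection cell above expects a `[hana]` section with the keys `url`, `port`, `user`, and `passwd` in `.user.ini`. The following sketch shows that layout with placeholder values and parses it from a string, so it runs without the real file. All values are made up for illustration, and `sample_config` is a hypothetical name chosen so it does not overwrite the notebook's `config`.

```python
import configparser

# Placeholder values only; the real file holds your own credentials.
SAMPLE_INI = """
[hana]
url = example-host.invalid
port = 443
user = EXAMPLE_USER
passwd = example-password
"""

sample_config = configparser.ConfigParser()
sample_config.read_string(SAMPLE_INI)

hana_settings = {
    "address": sample_config.get("hana", "url"),
    # get() always returns a string; getint() converts the port to an int.
    "port": sample_config.getint("hana", "port"),
    "user": sample_config.get("hana", "user"),
    "password": sample_config.get("hana", "passwd"),
}
print(hana_settings["address"], hana_settings["port"])
```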

In [56]:
# Create the embedding model client and open the existing vector table
embeddings = OpenAIEmbeddings(deployment_id=variables.EMBEDDING_DEPLOYMENT_ID)
db = HanaDB(
    embedding=embeddings, connection=connection, table_name=variables.EMBEDDING_TABLE
)

In this step you define which LLM generates the answer and create a retriever on top of the vector store. The `k` value in `search_kwargs` sets how many text chunks the retriever returns per query, here 2.

In [57]:
# Define which model to use
chat_llm = ChatOpenAI(deployment_id=variables.LLM_DEPLOYMENT_ID)

# Create a retriever instance of the vector store
retriever = db.as_retriever(search_kwargs={"k": 2})

👉 Instead of sending the query directly to the LLM, you will now create a `RetrievalQA` instance and pass it both the LLM and the retriever. Once set up, you send your query to the `RetrievalQA` chain, which retrieves the context and calls the LLM for you.

👉 Try out different queries. Feel free to ask anything you'd like to know about the SAP AI Service – Data Attribute Recommendation.

In [58]:
# Create the QA instance to query llm based on custom documents
qa = RetrievalQA.from_llm(llm=chat_llm, retriever=retriever, return_source_documents=True)

# Send query
query = "What is the machine learning model behind the regression model template of Data Attribute Recommendation?"

answer = qa.invoke(query)
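
Conceptually, the chain retrieves the chunks, places them into a single prompt together with the question, and asks the LLM. The sketch below mimics that flow with stand-in functions so it runs without a deployment. The prompt wording, `answer_with_context`, `fake_retrieve`, and `fake_llm` are all hypothetical; the actual prompt used by `RetrievalQA` is different. Only the shape of the returned dictionary (`query`, `result`, `source_documents`) mirrors what the chain returns here.

```python
def answer_with_context(query, retrieve, llm):
    """Retrieve chunks, put them into one prompt, and ask the LLM."""
    docs = retrieve(query)
    context = "\n\n".join(docs)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return {"query": query, "result": llm(prompt), "source_documents": docs}

# Hypothetical stand-ins for the retriever and the LLM.
def fake_retrieve(query):
    return ["Chunk A about model templates.", "Chunk B about regression."]

def fake_llm(prompt):
    return f"(stub answer based on {prompt.count('Chunk')} chunks)"

response = answer_with_context("Which model is used?", fake_retrieve, fake_llm)
print(response["result"])
```

Because the LLM only sees the retrieved chunks, the quality of the answer depends directly on what the retriever returns.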

In [59]:
display(answer)

{'query': 'What is the machine learning model behind the regression model template of Data Attribute Recommendation?',
 'result': 'The machine learning model behind the regression model template of Data Attribute Recommendation is a generic neural network that seeks to minimize the mean squared error (MSE). However, the specific architecture and preprocessing rules of the neural network are not detailed in the provided context.',
 'source_documents': [Document(metadata={'source': 'documents/SAP-Help-Data-Attribute-Recommendation.pdf', 'page': 59}, page_content='10.3.1\xa0\xa0Model Templates\nEach model template consists of a machine learning pipeline, where specific  data preprocessing rules and\nmodel architectures are defined.  The model templates for Data Attribute Recommendation are generic\ntechnical components that can be used in a variety of use cases, covering classification  and regression\nscenarios. See below the available classification  and regression model templates.\nNam

In [60]:
for document in answer['source_documents']:
    display(document.metadata)
    print(document.page_content)

{'source': 'documents/SAP-Help-Data-Attribute-Recommendation.pdf', 'page': 59}

10.3.1  Model Templates
Each model template consists of a machine learning pipeline, where specific  data preprocessing rules and
model architectures are defined.  The model templates for Data Attribute Recommendation are generic
technical components that can be used in a variety of use cases, covering classification  and regression
scenarios. See below the available classification  and regression model templates.
Name ID Description
AutoML 188df8b2-795a-48c1-8297
-37f37b25ea00A set of generic and traditional machine learning models for sin-
gle-label, multi-class classification  tasks. This template automati-
cally starts several experiments and searches for the best data
preparation and machine learning algorithms for a given dataset
within the defined  algorithm space. Use this model template for
small to medium sized classification  tasks.
See also the blog post How does AutoML work in Data Attribute
Recommendation
 .
Generic 223abe0f-3b52-446f-927
3-f3ca39619d2cGeneric neural netw

{'source': 'documents/SAP-Help-Data-Attribute-Recommendation.pdf', 'page': 3}

1 What Is Data Attribute Recommendation?
Apply machine learning to predict and classify data records.
Data Attribute Recommendation uses free text, numbers and categories as input to classify entities such as
products, stores and users into multiple classes and also to predict the value of missing numerical attributes
in your data records. Y ou can use Data Attribute Recommendation, for example, to classify incoming product
information and predict the price of commodities based on their description.
With Data Attribute Recommendation you can:
•Automate and speed up data management processes
•Reduce errors and manual efforts  in data maintenance
•Increase data consistency and accuracy
Features
Manage training dataPerform tasks related to the dataset that will be used to train the machine
learning model.
Manage machine learning
modelPerform tasks related to the machine learning model that will be used to classify
entities and predict the missing attributes in your data records.
Classify 
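
The raw chunks printed above are long and contain PDF extraction whitespace such as `\xa0`. As an optional sketch, a small helper could print one compact line per source instead. The `Doc` dataclass is a hypothetical stand-in that only mimics the `page_content` and `metadata` attributes seen in the output above, and `summarize_sources` is an illustrative name, not a LangChain function.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in with the two attributes used in this notebook."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def summarize_sources(docs, width=60):
    """Return one line per document: source, page, and a text preview."""
    lines = []
    for doc in docs:
        # split() without arguments also splits on non-breaking spaces.
        text = " ".join(doc.page_content.split())
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        lines.append(f"{source} (page {page}): {text[:width]}")
    return lines

docs = [
    Doc(
        "10.3.1\xa0\xa0Model Templates\nEach model template",
        {"source": "documents/SAP-Help-Data-Attribute-Recommendation.pdf", "page": 59},
    )
]
summary = summarize_sources(docs)
print(summary[0])
```

In the notebook, calling `summarize_sources(answer['source_documents'])` should work the same way, assuming the returned documents expose `page_content` and `metadata` as shown in the output above.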

👉 Go back to [06-store-embeddings-hana](06-store-embeddings-hana.ipynb) and try different chunk sizes or overlap values. Store the new chunks in a separate table by adding a new variable to [variables.py](variables.py), then run this notebook again against the newly created table.
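
To get a feel for how chunk size and overlap interact before you rerun the previous notebook, here is a minimal character-based sketch. It is not the splitter used in the previous exercise; `chunk_text` is a hypothetical helper that only illustrates the sliding-window idea.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size characters that share
    `overlap` characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

with_overlap = chunk_text("abcdefghij", chunk_size=4, overlap=2)
without_overlap = chunk_text("abcdefghij", chunk_size=4, overlap=0)
print(with_overlap)
print(without_overlap)
```

A larger overlap produces more chunks and repeats text across them, which can help when an answer spans a chunk boundary but also increases the number of rows stored in the table.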

[Next exercise](08-semantic-chunking.ipynb)