# Store Embeddings for a Retrieval Augmented Generation (RAG) Use Case
RAG is especially useful for question-answering use cases that draw on many unstructured documents containing important information. Let's implement a RAG use case so that the next time we ask about an **SAP AI Service**, we get the correct response. To do that, we first need to vectorize our context documents. You can find the documents to vectorize and store as embeddings in **SAP HANA Cloud Vector Engine** in the **documents** directory.

## LangChain
The **Generative AI Hub Python SDK** is compatible with the [**LangChain**](https://python.langchain.com/v0.1/docs/get_started/introduction) **library**. **LangChain** is a framework for building applications on top of large language models, such as the GPT models. It helps you manage and link different models, tools and data, which makes it easier to create complex AI workflows.

In [None]:
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader

from langchain_community.vectorstores.hanavector import HanaDB
from hdbcli import dbapi

import configparser

👉 Change the **EMBEDDING_DEPLOYMENT_ID** to your deployment ID from exercise [01-deploy-model](01-deploy-model.md).

👉 Set the **EMBEDDING_TABLE** to **"EMBEDDINGS_SAP_AI_SERVICES_>add your name here<"**

In [None]:

EMBEDDING_DEPLOYMENT_ID = "d7b8e46fc3d5c25f"
EMBEDDING_TABLE = ""
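
If you prefer to derive the table name from your own name in code, something like the following could work. This is only a sketch: `build_table_name` is a hypothetical helper, not part of the SDK or LangChain, and keeping only letters, digits and underscores is an assumption chosen to produce a name that is safe to use as a plain table name.

```python
import re


def build_table_name(user_name: str, prefix: str = "EMBEDDINGS_SAP_AI_SERVICES_") -> str:
    """Hypothetical helper: append an upper-cased, sanitized name to the prefix."""
    # Assumption: every character that is not a letter, digit or underscore becomes "_"
    suffix = re.sub(r"[^A-Za-z0-9_]", "_", user_name.strip()).upper()
    if not suffix:
        raise ValueError("user_name must contain at least one character")
    return prefix + suffix


# "jane doe" is a made-up example name
EXAMPLE_TABLE = build_table_name("jane doe")
print(EXAMPLE_TABLE)
```

You could then assign the result to **EMBEDDING_TABLE** instead of typing the name by hand.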

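👉 Create a **.user.ini** file with the HANA login information. The next cell reads the `url`, `port`, `user` and `passwd` keys from a `[hana]` section of that file.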

In [None]:
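# Read the HANA connection details from the [hana] section of .user.ini
config = configparser.ConfigParser()
config.read('.user.ini')
connection = dbapi.connect(
    address=config.get('hana', 'url'),
    port=config.getint('hana', 'port'),  # pass the port as an integer
    user=config.get('hana', 'user'),
    password=config.get('hana', 'passwd'),
    autocommit=True,
    sslValidateCertificate=False  # certificate validation is disabled for this exercise
)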
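
To make the expected file layout concrete, here is a small sketch that parses an in-memory example with the same section and key names the connection cell looks up. All values are placeholders (the host name, port `443`, user and password are assumptions for illustration, not real credentials), and `EXAMPLE_INI` and `example_config` are names introduced only for this example.

```python
import configparser

# Placeholder values only; use the credentials provided by your instructor
EXAMPLE_INI = """
[hana]
url = your-hana-host.example.com
port = 443
user = YOUR_USER
passwd = YOUR_PASSWORD
"""

example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_INI)

# Lookups similar to those in the connection cell
host = example_config.get("hana", "url")
port = example_config.getint("hana", "port")
print(host, port)
```

Saving the same four keys under a `[hana]` header in **.user.ini** should be enough for the connection cell to find them.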

## Chunking of the documents
Before we can create embeddings for our documents, we need to split them into smaller pieces of text, so-called **chunks**. We are using the **Recursive Character Text Splitter** from **LangChain**, which splits the documents into chunks of at most 500 characters. It tries the default separators in order, from coarsest to finest: ["\n\n", "\n", " ", ""].

In [None]:

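# Load custom documents
# TextLoader expects plain text, so the PDF is read with PyPDFLoader (requires the pypdf package)
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('documents/SAP-Help-Data-Attribute-Recommendation.pdf')
documents = loader.load()  # one Document per PDF page

# Split the pages into chunks of at most 500 characters, without overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(f"Number of document chunks: {len(texts)}")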
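
To illustrate the idea behind recursive splitting, here is a simplified sketch in plain Python. It is not LangChain's implementation: `recursive_split` is a hypothetical function written for this example, and it ignores features such as chunk overlap and keeping separators. It only shows the core idea of splitting on the coarsest separator first and falling back to finer ones for pieces that are still too long.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified illustration of recursive character splitting (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut the text every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        # This separator does not occur, so try the next finer one
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for part in text.split(sep):
        if not part:
            continue
        if len(part) > chunk_size:
            # Flush what we have, then split the oversized part more finely
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(part, chunk_size, finer))
        elif not current:
            current = part
        elif len(current) + len(sep) + len(part) <= chunk_size:
            # Merge small neighbouring parts while they still fit
            current += sep + part
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First paragraph about SAP AI services.\n\n"
    "Second paragraph, which is deliberately a bit longer than the limit we set."
)
sample_chunks = recursive_split(sample, chunk_size=40)
for chunk in sample_chunks:
    print(len(chunk), repr(chunk))
```

In this example the first paragraph fits into one chunk, while the second one is broken up at spaces because it exceeds the 40-character limit.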


Now we can connect to our **SAP HANA Cloud Vector Engine** and store the embeddings for our text chunks.

In [None]:
# Create embeddings for custom documents
embeddings = OpenAIEmbeddings(deployment_id=EMBEDDING_DEPLOYMENT_ID)
db = HanaDB(
    embedding=embeddings, connection=connection, table_name=EMBEDDING_TABLE
)

# Delete already existing documents from the table
db.delete(filter={})

# Embed the loaded document chunks and store them in the table
db.add_documents(texts)
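
Why store vectors at all? At retrieval time, the embedding of a question is compared with the stored embeddings to find the most similar chunks. The sketch below illustrates that comparison in plain Python, without HANA or an embedding model. The three-dimensional vectors and chunk labels are made up, and cosine similarity is assumed here as one common similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors; real embeddings have many more dimensions
toy_store = {
    "chunk about data attributes": [0.9, 0.1, 0.0],
    "chunk about pricing": [0.1, 0.9, 0.2],
}
query_vector = [0.8, 0.2, 0.1]

# Pick the stored chunk whose vector is most similar to the query vector
best_chunk = max(
    toy_store,
    key=lambda text: cosine_similarity(query_vector, toy_store[text]),
)
print(best_chunk)
```

The vector store performs this kind of nearest-neighbour lookup inside the database, so the chunks do not have to be compared one by one in Python.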

## Check the embeddings in SAP HANA Cloud Vector Engine

👉 Open the [HANA Cloud Database Explorer](https://central-hana-cloud-instance-us-77belmsm.hana-tooling.ingress.orchestration.prod-us10.hanacloud.ondemand.com/hrtt/sap/hana/cst/catalog/cockpit-index.html?target=sqlconsole&databaseid=C2366).

👉 Log in with the **Default Identity Provider** and the **HANA user and password** provided by the instructor.

👉 Run the following query, replacing `EMBEDDINGS_SHAWKING` with the table name you set in **EMBEDDING_TABLE**:

```sql
SELECT VEC_TEXT, VEC_META, TO_NVARCHAR(VEC_VECTOR) FROM EMBEDDINGS_SHAWKING
```
![HANA Vector Engine](images/select_embedding_table.png)
