## Semantic chunker
Now repeat the previous steps using the `Semantic Chunker`. You will store the embeddings in a different table by adding `+"_SEMANTIC"` to the table name. LangChain's implementation of the `Semantic Chunker` is based on [Greg Kamradt's work](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb), where you can also find more information on the `Semantic Chunker` and other chunking techniques.

👉 The semantic chunker requires the [langchain-experimental](https://pypi.org/project/langchain-experimental/) package, which you already installed in exercise 03.
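
To build some intuition before running the real thing, here is a minimal sketch of the idea behind semantic chunking: embed each sentence, measure the distance between neighbouring sentences, and start a new chunk wherever that distance is large. This is not LangChain's implementation. The `toy_embed` function (a bag-of-words count), the fixed `threshold`, and the sample sentences are assumptions made for illustration; the library works with a real embedding model and, as far as we understand, derives its breakpoints from the distribution of distances rather than from a fixed value.

```python
import math
import re


def toy_embed(sentence, vocab):
    # Hypothetical stand-in for a real embedding model: word counts per vocab term
    words = re.findall(r"[a-z]+", sentence.lower())
    return [words.count(term) for term in vocab]


def cosine_distance(a, b):
    # 0.0 means same direction, 1.0 means nothing in common
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def semantic_chunks(sentences, embed, threshold):
    # Start a new chunk whenever two neighbouring sentences are far apart
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    for prev, curr, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_distance(prev, curr) > threshold:
            chunks.append([sentence])
        else:
            chunks[-1].append(sentence)
    return [" ".join(chunk) for chunk in chunks]


# Sample sentences invented for this sketch: two topics, two sentences each
sentences = [
    "HANA stores vectors in tables.",
    "HANA tables hold vectors and metadata.",
    "Bananas are yellow fruit.",
    "Yellow bananas are sweet fruit.",
]
vocab = sorted({w for s in sentences for w in re.findall(r"[a-z]+", s.lower())})
chunks = semantic_chunks(sentences, lambda s: toy_embed(s, vocab), threshold=0.8)

for chunk in chunks:
    print(chunk)
```

The two database sentences share several words and stay together, while the jump to the banana sentences shares none and triggers a split. With a real embedding model, the same logic groups sentences by meaning rather than by word overlap.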


In [1]:
import init_env
import variables

init_env.set_environment_variables()

from langchain_community.document_loaders import PyPDFDirectoryLoader
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
from langchain_experimental.text_splitter import SemanticChunker

from langchain_community.vectorstores.hanavector import HanaDB

from hdbcli import dbapi
import configparser

In [2]:
# Connect to HANA
connection = init_env.connect_to_hana_db()

In [3]:
# Load custom documents
loader = PyPDFDirectoryLoader('documents/')
documents = loader.load()

In [4]:
# embedding instance is used during semantic chunking
embeddings = OpenAIEmbeddings(deployment_id=variables.EMBEDDING_DEPLOYMENT_ID)

In [None]:
# create semantic text chunks
text_splitter = SemanticChunker(embeddings)
texts = text_splitter.split_documents(documents)
print(f"Number of document chunks: {len(texts)}")

In [None]:

db = HanaDB(
    embedding=embeddings, connection=connection, table_name=variables.SEMANTIC_EMBEDDING_TABLE
)

# Delete already existing documents from the table
db.delete(filter={})

# add the loaded document chunks
db.add_documents(texts)

## Check the embeddings in SAP HANA Cloud Vector Engine

👉 Check the chunks that were created with the semantic chunker and compare them to the previously created chunks from exercise 6.

In [None]:
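cursor = connection.cursor()
# Read chunk text, metadata, and the vector (converted to text) from the semantic table.
# The result of execute() is not stored, so the `embeddings` model instance is not overwritten.
cursor.execute(f'SELECT VEC_TEXT, VEC_META, TO_NVARCHAR(VEC_VECTOR) FROM "{db.table_name}"')
for row in cursor:
    print(row)
cursor.close()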
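
Reading raw rows makes it hard to see how the two chunking strategies differ. One way to make the comparison more concrete is to summarize the chunk lengths of each strategy side by side. The sketch below is only a suggestion: `summarize_chunks` is a hypothetical helper, and the two sample lists are placeholder strings, not output from this exercise.

```python
from statistics import mean, median


def summarize_chunks(name, chunks):
    # Collect simple length statistics (in characters) for a list of chunk texts
    lengths = [len(chunk) for chunk in chunks]
    return {
        "name": name,
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths), 1),
        "median": median(lengths),
    }


# Placeholder data (assumption): replace with the real chunk texts
fixed_size_chunks = ["a" * 100, "b" * 100, "c" * 40]
semantic_chunk_texts = ["a" * 180, "b" * 60]

fixed_summary = summarize_chunks("fixed-size", fixed_size_chunks)
semantic_summary = summarize_chunks("semantic", semantic_chunk_texts)

print(fixed_summary)
print(semantic_summary)
```

In the notebook you could pass `[doc.page_content for doc in texts]` as the semantic chunk list. Semantic chunks usually vary much more in length than fixed-size chunks, because the split points follow the content instead of a character count.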

[Next exercise](09-use-multimodal-models.ipynb)