# Semantic Chunking

### Imports and configs

In [1]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.vector_stores.faiss import FaissVectorStore
from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
import faiss
import os
import sys
from dotenv import load_dotenv

# Add the parent directory to the import path before importing the local utils module
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))
from utils import TextCleaner

EMBED_DIMENSION = 512
CHUNK_SIZE = 250
CHUNK_OVERLAP = 25

load_dotenv()

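# Fail early with a clear message if the key was not loaded from the environment or .env
if not os.getenv('OPENAI_API_KEY'):
    raise ValueError("OPENAI_API_KEY is not set")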

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small", dimensions=EMBED_DIMENSION)

path = "../data/"
# Load every .txt and .pdf file in the data directory as documents
reader = SimpleDirectoryReader(input_dir=path, required_exts=['.txt', '.pdf'])
documents = reader.load_data()




### Semantic Chunking

In [9]:

print("Creating new vector store...")
faiss_index = faiss.IndexFlatL2(EMBED_DIMENSION)
vector_store = FaissVectorStore(faiss_index=faiss_index)

# Baseline: fixed-size sentence splitter, used for comparison with the semantic splitter
base_text_splitter = SentenceSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)

pipeline = IngestionPipeline(
    transformations=[
        TextCleaner(),
        base_text_splitter,
    ],
    vector_store=vector_store,
)

nodes = pipeline.run(documents=documents)
vector_store_index = VectorStoreIndex(nodes)

retriever = vector_store_index.as_retriever(similarity_top_k=1)

Creating new vector store...


In [10]:
test_query = "What is the SNP's position on the EU?"
results = retriever.retrieve(str_or_query_bundle=test_query)
print(f"Results: {results[0].text}")

Results: DEFENDING DEMOCRACY  AND HUMAN RIGHTS
The SNP stands on a strong record of defending Scotland’s democratic functions and institutions, and we will always stand up to promote and protect Scotland’s democracy and make sure that the people of Scotland’s voices are heard. SNP MPs will demand the UK Government:
Give the people of Scotland a say on their future. Demand the permanent transfer of legal power to the Scottish Parliament to determine  how Scotland is governed, including the  transfer of power to enable it to legislate for  a referendum.
End Westminster’s power grab by demanding the UK government repeal the reprehensible Internal Market Act. We are clear that UK ministers must not be able to act unilaterally across policy areas that are within devolved competencies, and will push for the Sewel Convention to be put on a proper statutory footing.
Support abolition of the undemocratic House of Lords.
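
The baseline result above comes from fixed-size splitting, which cuts text by length rather than by topic. The sketch below illustrates the sliding-window idea behind `chunk_size` and `chunk_overlap`. It is a simplified illustration, not how `SentenceSplitter` is implemented: the real splitter counts tokens and tries to respect sentence boundaries, whereas the hypothetical `chunk_with_overlap` helper here just slices a list of toy tokens.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Hypothetical toy input: ten placeholder tokens, not real manifesto text
toy_tokens = [f"w{i}" for i in range(10)]
toy_chunks = chunk_with_overlap(toy_tokens, chunk_size=4, chunk_overlap=1)
for chunk in toy_chunks:
    print(chunk)
```

Each chunk shares its first token with the end of the previous one, so a boundary can fall in the middle of a topic, which is one plausible reason a length-based chunk may match a query only loosely.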


In [7]:
semantic_splitter = SemanticSplitterNodeParser(
    # Number of sentences to group together when evaluating semantic similarity
    buffer_size=1,
    # Percentile of cosine dissimilarity that must be exceeded between a group of
    # sentences and the next to form a node. The smaller this number is, the more
    # nodes will be generated.
    breakpoint_percentile_threshold=95,
    embed_model=Settings.embed_model,
)
pipeline = IngestionPipeline(
    transformations=[
        TextCleaner(),
        semantic_splitter,
    ],
    vector_store=vector_store,
)

nodes = pipeline.run(documents=documents)
vector_store_index = VectorStoreIndex(nodes)

retriever = vector_store_index.as_retriever(similarity_top_k=1)
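
To make the role of `breakpoint_percentile_threshold` more concrete, here is a minimal sketch of percentile-based breakpoint selection in plain Python. It is an illustration under stated assumptions, not the library's code: the helper names (`cosine_distance`, `percentile`, `split_at_breakpoints`) are hypothetical, the 2-D "embeddings" are hand-written toy vectors, and the sentence grouping controlled by `buffer_size` is left out for brevity.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, pct):
    """Linearly interpolated percentile, with pct in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_at_breakpoints(sentences, embeddings, pct_threshold):
    """Start a new chunk wherever the distance between neighbouring sentences
    exceeds the chosen percentile of all neighbouring distances."""
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    threshold = percentile(distances, pct_threshold)
    chunks, current = [], [sentences[0]]
    for i, distance in enumerate(distances):
        if distance > threshold:
            chunks.append(current)
            current = []
        current.append(sentences[i + 1])
    chunks.append(current)
    return chunks


# Hypothetical toy data: hand-written vectors, not real model output
toy_sentences = ["EU one", "EU two", "Health one", "Health two", "Tax one"]
toy_embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.7, 0.7)]

high = split_at_breakpoints(toy_sentences, toy_embeddings, 95)
low = split_at_breakpoints(toy_sentences, toy_embeddings, 50)
print(len(high), "chunks at the 95th percentile")
print(len(low), "chunks at the 50th percentile")
```

On this toy data the lower threshold yields more chunks than the higher one, which matches the comment on `breakpoint_percentile_threshold` above: the smaller the percentile, the more breakpoints exceed it.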

In [8]:
test_query = "What is the SNP's position on the EU?"
results = retriever.retrieve(str_or_query_bundle=test_query)
print(f"Results: {results[0].text}")

Results:  DECISIONS MADE IN SCOTLAND, FOR SCOTLAND.      27SNP General Election Manifesto  2024
SCOTLAND’S PLACE IN THE WORLD
We want to see an independent Scotland take its place in the international community; alongside the 193 other United Nations member states, able to join the European Union, with the powers necessary to protect our citizens and prosper in the global economy. We are determined that Scotland plays a positive and progressive role in international affairs through action and leadership. SNP MPs will call on the UK Government to:
Demand an immediate ceasefire in Gaza, release of hostages and end arms sales to Israel. We will continue to call on the UK Government to follow the lead of Ireland, Norway and Spain by immediately recognising Palestine as a state. We believe that recognising Palestine as a state in its own right is the only way to move towards a just and durable long-term peace, in the interests of both Palestinians and Israelis.
Stand by Ukraine and continue