# Chunking Optimizations 📃

For our chunking strategy, the recursive text splitter will suffice. It is the splitter LangChain recommends for generic text. It is parameterized by a list of separators and tries to split on them in order until the chunks are small enough. The default list is `["\n\n", "\n", " ", ""]`. This keeps paragraphs together for as long as possible, then lines, then words, since those tend to be the most strongly related pieces of text semantically.
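To make the "try each separator in order" idea concrete, here is a minimal pure-Python sketch. It is an illustration only, not LangChain's implementation: the names `recursive_split` and `_merge` are hypothetical, and it ignores chunk overlap and the other options the real splitter supports.

```python
def _merge(parts, sep, chunk_size):
    """Greedily join adjacent parts with sep while the result fits in chunk_size."""
    merged, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                merged.append(current)
            current = part
    if current:
        merged.append(current)
    return merged


def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters (hypothetical helper)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Last resort: no separators left (or the empty one), so cut at fixed offsets.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, small = [], []
    for part in text.split(sep):
        if not part:
            continue
        if len(part) <= chunk_size:
            small.append(part)  # fits: hold it so it can be merged with neighbours
        else:
            # Flush the pieces collected so far, then split the oversized
            # part again using the next, finer separator.
            chunks.extend(_merge(small, sep, chunk_size))
            small = []
            chunks.extend(recursive_split(part, chunk_size, finer))
    chunks.extend(_merge(small, sep, chunk_size))
    return chunks


sample = "Para one is short.\n\nPara two is a bit longer than the limit."
chunks = recursive_split(sample, chunk_size=20)
for chunk in chunks:
    print(repr(chunk))
```

The first paragraph fits within the limit and stays whole. The second is too long, so the sketch falls through to the space separator and packs as many words as fit into each chunk.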

You can read more about the recursive text splitter [here](https://python.langchain.com/v0.2/docs/how_to/recursive_text_splitter/) in the LangChain docs.

## Import Libraries 🧑‍💻

In [None]:
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import AzureOpenAIEmbeddings
load_dotenv()

## Connect to existing Azure OpenAI 🤖 & Azure Search Instances 🔎

In [None]:
embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    azure_deployment="embeddings",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY")
)

index_name: str = "products-optimized"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=os.getenv("AZURE_SEARCH_ENDPOINT"),
    azure_search_key=os.getenv("AZURE_SEARCH_KEY"),
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

## Chunk, Vectorize, and Upsert our product documents

In [None]:
# Build the path with os.path.join so it works on any operating system
docs_dir = os.path.join("..", "sample-docs")

# Create the splitter once and reuse it for every file
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=300)

for filename in os.listdir(docs_dir):
    if filename.endswith(".txt"):  # Adjust the file extension as needed
        file_path = os.path.join(docs_dir, filename)

        # Load the document
        loader = TextLoader(file_path, encoding="utf-8")
        document = loader.load()

        # Split the document into overlapping chunks, then embed and upload them
        docs = text_splitter.split_documents(document)
        vector_store.add_documents(documents=docs)
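
To build some intuition for `chunk_size=1000` and `chunk_overlap=300`, here is a simplified sliding-window sketch. It assumes fixed character offsets, whereas the recursive splitter cuts on separators, so real chunk boundaries will differ. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    # Stop early enough that the final window is not fully contained in the previous one.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


# A 2,400-character stand-in document made of repeating digits
text = "".join(str(i % 10) for i in range(2400))
windows = sliding_chunks(text, chunk_size=1000, chunk_overlap=300)

print(len(windows))                            # number of windows
print(windows[0][-300:] == windows[1][:300])   # neighbours share 300 characters
```

With these settings each window advances 700 characters, so text that straddles a chunk boundary still appears intact in at least one chunk.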

## Question #1 ❓

In [None]:
docs = vector_store.similarity_search(
    query="what smart phones do you sell?",
    k=3,
    search_type="similarity",
)
print(docs[0].page_content)

## Question #2 ❓

In [None]:
docs = vector_store.similarity_search(
    query="how much is the NexTech phone?",
    k=3,
    search_type="similarity",
)
print(docs[0].page_content)

## Question #3 ❓

In [None]:
docs = vector_store.similarity_search(
    query="what laptops do you have?",
    k=3,
    search_type="similarity",
)
print(docs[0].page_content)

## Question #4 ❓

Recall that in the previous module, when we created our initial RAG application, the bot could not answer this question.

In [None]:
docs = vector_store.similarity_search(
    query="How much is the home theater system?",
    k=3,
    search_type="similarity",
)
print(docs[0].page_content)