# Comparing splitters for RAG

Let's compare three LlamaIndex splitters on the same document: SentenceSplitter, SemanticSplitterNodeParser, and MarkdownNodeParser.

## But first, let's learn how to learn LlamaIndex

The LlamaIndex documentation lives at https://docs.llamaindex.ai/en/stable/ and has several important tabs:
- Component Guides - explains how to use the various components: loaders, splitters, indexers, etc.
- API Reference - explains each component in detail
- Examples - many examples showcasing different components
- Search bar at the upper right - pretty good way to find more information (like examples) about a specific component

In addition to the documentation, you can find the source code here:
- https://github.com/run-llama/llama_index
- Search bar at the top - great way to find the source for a specific component

### Let's use the documentation to learn more about Splitters

- Start with the Component Guide to learn about various kinds of splitters
- Then go to the API reference to learn about a specific splitter (SentenceSplitter)
- Go to GitHub to read the full source file
- Go back to the docs and search for examples
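
The API-reference step can also be approximated without leaving the notebook. The sketch below uses the standard library's `inspect` module to pull a class's constructor signature and the first line of its docstring. `describe` is a hypothetical helper, and it is demonstrated on a standard-library class so that it runs without LlamaIndex installed; passing `SentenceSplitter` instead should work the same way, though that is an assumption rather than something verified here.

```python
import inspect
import textwrap


def describe(cls):
    """Return (constructor signature, first docstring line) for a class."""
    signature = str(inspect.signature(cls))
    doc = inspect.getdoc(cls) or ""
    first_line = doc.splitlines()[0] if doc else ""
    return signature, first_line


# Demonstrated on a standard-library class; swap in any class you want to explore.
signature, summary = describe(textwrap.TextWrapper)
print(signature)
print(summary)
```

This is a quick check only. The API reference and the source on GitHub remain the places to read parameter descriptions in full.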

In [1]:
%load_ext autoreload
%autoreload 2
%load_ext dotenv
%dotenv

In [2]:
from llama_index.core import Document
from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser,
    MarkdownNodeParser,
)
from llama_index.embeddings.openai import OpenAIEmbedding

In [3]:
# Markdown file to load from the data/ directory
filename = 'sleeping_gods.md'

In [4]:
def print_nodes(nodes):
    """Print each node's index, character length, and text."""
    for ix, node in enumerate(nodes):
        print(f"\n\n>>> Node {ix} length={len(node.text)}\n")
        print(node.text)

## Load document

In [5]:
with open(f'data/{filename}', 'r', encoding='utf-8') as file:
    document = Document(
        text=file.read(),
        metadata={"filename": filename},
    )
print(len(document.text))

77034


In [6]:
cutoff = 8000
print(document.text[:cutoff])

## Overview (page 1)

1-4 players, ages 13+, 1-20 hours

"This is the Wandering Sea. The gods have brought you here, and you must wake them if you wish to return home."

In Sleeping Gods, you and up to three friends become Captain Sofi Odessa and her crew, lost in a strange world in 1929 on your steamship, the Manticore. You must work together to survive, exploring exotic islands, meeting new characters, and seeking out the totems of the gods so that you can return home.

Sleeping Gods is a campaign game. Each session can last as long as you want. When you are ready to take a break, you mark your progress on a journey log sheet, making it easy to return to the same place in the game the next time you play. You can play alone or with friends throughout your campaign— it's easy to swap players in and out at will. Your goal is to find as many totems as possible, which are hidden throughout the world. Like reading a book, you'll complete this journey one or two hours at a time, discovering

In [7]:
# let's just focus on the first part of the document to make this manageable
document.text = document.text[:cutoff]
print(len(document.text))

8000


## SentenceSplitter
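
SentenceSplitter packs whole sentences into chunks of up to `chunk_size`, carrying about `chunk_overlap` from the end of one chunk into the start of the next. Both values are counted in tokens, not characters, which is why a 300-token chunk can run to well over 1,000 characters. The sketch below is a simplified stand-in for that idea, not the library's implementation: it counts whitespace-separated words instead of tokens, finds sentence boundaries with a naive regex, and `split_sentences` and `pack_sentences` are hypothetical names.

```python
import re


def split_sentences(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pack_sentences(sentences, chunk_size, chunk_overlap):
    """Greedily pack sentences into chunks, measured in words (an assumption)."""
    chunks = []
    current = []      # sentences in the chunk being built
    current_len = 0   # word count of `current`
    for sentence in sentences:
        n = len(sentence.split())
        if current and current_len + n > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward while they fit the overlap budget.
            overlap, overlap_len = [], 0
            for prev in reversed(current):
                m = len(prev.split())
                if overlap_len + m > chunk_overlap:
                    break
                overlap.insert(0, prev)
                overlap_len += m
            current, current_len = overlap, overlap_len
        # Note: a single long sentence can still exceed chunk_size in this sketch.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Shortened sample sentences adapted from the document text.
text = (
    "Sleeping Gods is a campaign game. Each session can last as long as you want. "
    "You can play alone or with friends. Your goal is to find as many totems as possible."
)
chunks = pack_sentences(split_sentences(text), chunk_size=18, chunk_overlap=9)
for ix, chunk in enumerate(chunks):
    print(ix, chunk)
```

With these toy settings, each chunk begins with the last sentence of the previous one, which is the effect `chunk_overlap` is meant to have.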

In [11]:
chunk_size = 300
chunk_overlap = 50
splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

In [12]:
nodes = splitter.get_nodes_from_documents([document], show_progress=False)
print(len(nodes))

8


In [13]:
print_nodes(nodes[:10])



>>> Node 0 length=1168

## Overview (page 1)

1-4 players, ages 13+, 1-20 hours

"This is the Wandering Sea. The gods have brought you here, and you must wake them if you wish to return home."

In Sleeping Gods, you and up to three friends become Captain Sofi Odessa and her crew, lost in a strange world in 1929 on your steamship, the Manticore. You must work together to survive, exploring exotic islands, meeting new characters, and seeking out the totems of the gods so that you can return home.

Sleeping Gods is a campaign game. Each session can last as long as you want. When you are ready to take a break, you mark your progress on a journey log sheet, making it easy to return to the same place in the game the next time you play. You can play alone or with friends throughout your campaign— it's easy to swap players in and out at will. Your goal is to find as many totems as possible, which are hidden throughout the world. Like reading a book, you'll complete this journey one or two ho

## SemanticSplitterNodeParser
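
SemanticSplitterNodeParser embeds the text sentence by sentence and starts a new chunk wherever the embedding distance between neighbours is unusually large; `breakpoint_percentile_threshold` controls how large counts as unusual. The sketch below illustrates the thresholding idea with hand-written toy vectors in place of real embeddings. It is an approximation under stated assumptions, not the library's code: `cosine_distance`, `percentile`, and `semantic_chunks` are hypothetical helpers, the sentences are invented, and the percentile uses linear interpolation.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values, pct):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def semantic_chunks(sentences, embeddings, threshold_pct):
    """Split where the distance to the next sentence exceeds the percentile cutoff."""
    if len(sentences) < 2:
        return list(sentences)
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > cutoff:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy sentences and 2-D "embeddings" (invented for illustration only).
sentences = [
    "Cats purr.",
    "Cats nap in the sun.",
    "Ships sail.",
    "Ships need a crew.",
    "The crew steers the ship.",
]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]

chunks = semantic_chunks(sentences, embeddings, threshold_pct=80)
for ix, chunk in enumerate(chunks):
    print(ix, chunk)
```

A lower threshold lets more distances exceed the cutoff and so produces more, smaller chunks; a higher threshold produces fewer, larger ones.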

In [18]:
breakpoint_percentile_threshold = 80
embed_model = OpenAIEmbedding(model='text-embedding-3-small')
splitter = SemanticSplitterNodeParser(
    breakpoint_percentile_threshold=breakpoint_percentile_threshold,
    embed_model=embed_model,
)

In [19]:
nodes = splitter.get_nodes_from_documents([document], show_progress=False)
print(len(nodes))

26


In [20]:
print_nodes(nodes[:10])



>>> Node 0 length=554

## Overview (page 1)

1-4 players, ages 13+, 1-20 hours

"This is the Wandering Sea. The gods have brought you here, and you must wake them if you wish to return home."

In Sleeping Gods, you and up to three friends become Captain Sofi Odessa and her crew, lost in a strange world in 1929 on your steamship, the Manticore. You must work together to survive, exploring exotic islands, meeting new characters, and seeking out the totems of the gods so that you can return home.

Sleeping Gods is a campaign game. Each session can last as long as you want. 


>>> Node 1 length=357

When you are ready to take a break, you mark your progress on a journey log sheet, making it easy to return to the same place in the game the next time you play. You can play alone or with friends throughout your campaign— it's easy to swap players in and out at will. Your goal is to find as many totems as possible, which are hidden throughout the world. 


>>> Node 2 length=423

Like reading

## MarkdownNodeParser
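
MarkdownNodeParser splits on Markdown headers, so each node covers one section of the document. The sketch below shows that idea with the standard library only. It is a simplified assumption of the behaviour, not the library's implementation: `split_by_headers` is a hypothetical helper, it keeps each header line as written, it does not treat fenced code blocks specially, and it records no metadata. Apart from the first header and line, the sample text is invented.

```python
import re

# An ATX header: one to six '#' characters, whitespace, then some text.
HEADER = re.compile(r"^#{1,6}\s+\S")


def split_by_headers(markdown_text):
    """Start a new section at every header line."""
    sections, current = [], []
    for line in markdown_text.splitlines():
        if HEADER.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop sections that contain only whitespace.
    return [s for s in sections if s]


sample = (
    "## Overview (page 1)\n\n"
    "1-4 players, ages 13+, 1-20 hours\n\n"
    "## Toy section\n\n"
    "First toy paragraph.\n\n"
    "### Toy subsection\n\n"
    "Second toy paragraph."
)
sections = split_by_headers(sample)
for ix, section in enumerate(sections):
    print(f">>> Section {ix} length={len(section)}")
    print(section)
```

Because the split follows the document's own structure, section lengths vary widely, unlike the size-bounded chunks a sentence-based splitter produces.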

In [21]:
splitter = MarkdownNodeParser()

In [22]:
nodes = splitter.get_nodes_from_documents([document], show_progress=False)
print(len(nodes))

11


In [23]:
print_nodes(nodes[:10])



>>> Node 0 length=1575

Overview (page 1)

1-4 players, ages 13+, 1-20 hours

"This is the Wandering Sea. The gods have brought you here, and you must wake them if you wish to return home."

In Sleeping Gods, you and up to three friends become Captain Sofi Odessa and her crew, lost in a strange world in 1929 on your steamship, the Manticore. You must work together to survive, exploring exotic islands, meeting new characters, and seeking out the totems of the gods so that you can return home.

Sleeping Gods is a campaign game. Each session can last as long as you want. When you are ready to take a break, you mark your progress on a journey log sheet, making it easy to return to the same place in the game the next time you play. You can play alone or with friends throughout your campaign— it's easy to swap players in and out at will. Your goal is to find as many totems as possible, which are hidden throughout the world. Like reading a book, you'll complete this journey one or two hours