# Text Splitters and Their Types  
**Text splitters are tools that divide large text documents into smaller, more manageable chunks.**

## 1. CharacterTextSplitter  
**Description:** Splits text on a single separator (by default `"\n\n"`) and merges the resulting pieces into chunks of at most `chunk_size` characters.  
**Use Case:** Ideal for general-purpose splitting where tokenization or semantic structure is not crucial.  
**Parameters:**  
**chunk_size:** Maximum size of each chunk, measured in characters by default.  
**chunk_overlap:** Number of overlapping characters between consecutive chunks.
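
To make the two parameters concrete, here is a minimal pure-Python sketch of fixed-size windows that share a few characters. It is an illustration only, not LangChain's implementation (LangChain splits on a separator first and then merges pieces); the helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


demo = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(demo)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks repeat the last 4 characters of the chunk before them.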

In [11]:
# Suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

In [6]:
from langchain.text_splitter import CharacterTextSplitter

# Define the chunk size and overlap, both in characters
# (a chunk is a smaller, manageable segment of a larger piece of text)
chunks_size = 20
chunks_overlap = 4

c_splitter = CharacterTextSplitter(
    chunk_size=chunks_size,
    # separator="\n\n",  # default separator
    chunk_overlap=chunks_overlap,
)


In [7]:
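# A second example: split a string directly with split_text.
# CharacterTextSplitter only cuts at its separator (default "\n\n").
# The text below contains no "\n\n", so it is returned as a single chunk
# even though chunk_size is 8.

text = "LangChain is a framework for building LLM applications. It supports memory, retrieval, and more."
texts = "LangChain is a framework for building applications powered by LLMs. It simplifies the process of integrating memory, tools, and retrieval into intelligent workflows."

splitter = CharacterTextSplitter(chunk_size=8, chunk_overlap=2)
chunks = splitter.split_text(texts)
print(chunks)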

['LangChain is a framework for building applications powered by LLMs. It simplifies the process of integrating memory, tools, and retrieval into intelligent workflows.']
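
The single-chunk result above follows from how separator-based splitting works. The sketch below is a simplified pure-Python stand-in, not LangChain's code: the helper `split_and_merge` is hypothetical and it leaves out overlap for brevity. It splits on a separator and then greedily re-joins pieces up to `chunk_size`.

```python
def split_and_merge(text: str, separator: str, chunk_size: int) -> list[str]:
    """Split on separator, then greedily re-join pieces up to chunk_size characters."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        # A piece longer than chunk_size is kept whole: there is nowhere to cut it.
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "LangChain is a framework for building LLM applications."

# No "\n\n" in the text, so there is only one piece and therefore one chunk.
one_chunk = split_and_merge(sample, separator="\n\n", chunk_size=8)
print(one_chunk)

# Splitting on spaces gives the merger something to work with.
by_words = split_and_merge(sample, separator=" ", chunk_size=20)
print(by_words)
```

In LangChain itself, the analogous change is to pass a separator that actually occurs in the text, for example `separator=" "`, when constructing `CharacterTextSplitter`.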


## 2. RecursiveCharacterTextSplitter  
**Description:** Tries a list of separators in order (paragraph breaks, line breaks, spaces, then single characters), so chunks break at natural boundaries such as paragraphs or sentences wherever possible.  
**Use Case:** Useful for maintaining coherence in split text chunks.

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunks_size,
    # separators=["\n\n", "\n", " ", ""],  # default separators, tried in this order
    chunk_overlap=chunks_overlap,
)

In [9]:
text = """LangChain helps build intelligent applications. 
It integrates retrieval, memory, and external tools."""
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
chunks = splitter.split_text(text)
print(chunks)


['LangChain helps build intelligent applications.', 'It integrates retrieval, memory, and external', 'external tools.']
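
The recursive idea can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LangChain's implementation: the function name `recursive_split` is hypothetical, overlap is omitted, and the separator list is expected to end with `""`.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text with the coarsest separator that works, recursing on oversized pieces."""
    if len(text) <= chunk_size:
        return [text.strip()] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits, keep merging
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


doc = """LangChain helps build intelligent applications. 
It integrates retrieval, memory, and external tools."""
result = recursive_split(doc, chunk_size=50)
print(result)
```

Because this sketch has no overlap, its last chunk is `'tools.'`, whereas LangChain's output above repeats `external` from the previous chunk (`chunk_overlap=10`).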


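## 3. SpacyTextSplitter  
**Description:** Splits text into sentences using the spaCy library, then merges sentences into chunks of at most `chunk_size` characters.  
**Use Case:** Suitable when chunks should follow sentence boundaries detected by a language model rather than fixed separators.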

In [12]:
from langchain.text_splitter import SpacyTextSplitter

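# Requires spaCy and an English pipeline (en_core_web_sm is the default).
# `text` is the two-sentence string defined in the previous example.
splitter = SpacyTextSplitter(chunk_size=50, chunk_overlap=10)
chunk = splitter.split_text(text)
print(chunk)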

['LangChain helps build intelligent applications.', 'It integrates retrieval, memory, and external tools.']


## 4. NLTKTextSplitter  
**Description**: Splits text into sentences using the Natural Language Toolkit (NLTK) library.  
**Use Case:** Suitable for sentence-level splitting, especially for summarization or question answering tasks.

In [13]:
from langchain.text_splitter import NLTKTextSplitter

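# Requires the nltk package and its sentence tokenizer data.
text = "LangChain is great. It simplifies working with LLMs. Try it out!"
# With the default chunk_size (4000), all three sentences fit in one chunk,
# so they are returned as a single string joined by "\n\n".
splitter = NLTKTextSplitter()
chunks = splitter.split_text(text)
print(chunks)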


['LangChain is great.\n\nIt simplifies working with LLMs.\n\nTry it out!']
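
The output above is a single string because the detected sentences are merged back together until `chunk_size` is reached. The sketch below imitates that behaviour in plain Python. It is an assumption-laden illustration: `naive_sentences` is a crude regex stand-in for NLTK's tokenizer, and `merge_sentences` is a hypothetical helper without overlap.

```python
import re


def naive_sentences(text: str) -> list[str]:
    """Crude sentence split after ., ! or ? (a stand-in for a real tokenizer)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def merge_sentences(sentences: list[str], chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Join consecutive sentences with separator while the chunk stays within chunk_size."""
    chunks, current = [], ""
    for sentence in sentences:
        candidate = sentence if not current else current + separator + sentence
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = "LangChain is great. It simplifies working with LLMs. Try it out!"
sentences = naive_sentences(sample)

merged = merge_sentences(sentences, chunk_size=4000)   # large limit: one chunk
separate = merge_sentences(sentences, chunk_size=35)   # small limit: one sentence per chunk
print(merged)
print(separate)
```

To get one sentence per chunk from `NLTKTextSplitter` itself, pass a `chunk_size` small enough that two sentences no longer fit together.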
