## Character Text Splitter

### How to split by character
This is the simplest splitting method. The text is split on a given separator string, which defaults to "\n\n", and chunk length is measured in characters.

1. How the text is split: by a single separator string (one or more characters).
2. How the chunk size is measured: by number of characters.
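
The two points above can be illustrated without LangChain at all. The following is a minimal sketch using plain Python string operations; the sample text and variable names are made up for illustration and are not part of the notebook's data.

```python
# Hypothetical sample text with three paragraphs separated by blank lines
text = "First paragraph.\n\nSecond paragraph, a bit longer.\n\nThird."
separator = "\n\n"

# 1. The text is split on the separator string
pieces = text.split(separator)

# 2. Size is measured in characters, i.e. with len()
sizes = [len(p) for p in pieces]

print(pieces)
print(sizes)
```

The splitter then decides how to group these pieces based on their character counts.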


In [None]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("speech.txt")
docs = loader.load()
docs

In [None]:
docs[0].page_content

### What's happening internally?
LangChain’s CharacterTextSplitter logic works like this:

1. The input text is split on the separator ("\n\n"), which breaks the document into paragraphs.
2. Consecutive paragraphs are then merged into one chunk for as long as the combined character count stays within chunk_size.
3. Chunks therefore end up as close to chunk_size as possible, but a paragraph is never broken in the middle.
4. If a single paragraph is larger than chunk_size, it is kept as-is, and a warning like the following is logged:

Created a chunk of size 470, which is longer than the specified 100
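
The merge behaviour described above can be sketched in plain Python. This is a simplified approximation, not LangChain's actual implementation: the function name `split_by_separator` is hypothetical, and the sketch ignores chunk_overlap and whitespace stripping.

```python
def split_by_separator(text, separator="\n\n", chunk_size=100):
    """Greedy merge sketch (hypothetical helper, ignores chunk_overlap)."""
    # Step 1: split on the separator, dropping empty pieces
    pieces = [p for p in text.split(separator) if p.strip()]

    chunks, current, total = [], [], 0
    for piece in pieces:
        # Cost of adding this piece, including a separator if needed
        extra = len(piece) + (len(separator) if current else 0)
        if current and total + extra > chunk_size:
            # Adding the piece would overflow: close the current chunk
            chunks.append(separator.join(current))
            current, total = [], 0
            extra = len(piece)
        # An oversized piece still becomes its own chunk, unbroken
        current.append(piece)
        total += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


# Hypothetical input: two short paragraphs, one oversized, one short
sample = "aaaa\n\nbbbb\n\n" + "c" * 30 + "\n\ndddd"
chunks = split_by_separator(sample, chunk_size=12)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

In this run the two short paragraphs are merged, while the 30-character paragraph exceeds chunk_size and is emitted whole, which is the situation that triggers the warning shown above.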

In [None]:
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n\n",       # Split on double newlines (typically paragraph breaks)
    chunk_size=100,         # Target maximum chunk size, in characters
    chunk_overlap=20        # Overlap between chunks to preserve context
)

text_splitter.split_documents(docs)


In [None]:
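# Read the raw text directly instead of going through a loader
with open("speech.txt") as f:
    speech = f.read()

# No separator given, so the default "\n\n" is used
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20)
# create_documents takes a list of strings and returns Document objects
text = text_splitter.create_documents([speech])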
print(text[0])
print(text[1])