# Handling Long Text in LLMs
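This notebook demonstrates practical techniques for handling text that is too long for an LLM's context window: chunking, sliding windows, hierarchical map-reduce summarization, retrieval (a mini RAG), and conversation memory. To keep every cell runnable without a model, the LLM calls are replaced by simple stand-ins such as truncation, keyword matching, and fixed strings.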

## Chunking

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split text into chunks of `chunk_size` words; consecutive chunks share `overlap` words.

    Sizes are counted in whitespace-separated words, not model tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each chunk's start advances
    chunks = []

    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        # Once a chunk reaches the end of the text, any later chunk would be a redundant subset.
        if i + chunk_size >= len(words):
            break

    return chunks


# Example long text
long_text = """
Large Language Models have a limited context window.
When documents exceed this limit, they must be split into chunks.
Chunking helps process long documents like PDFs, logs, and books.
Each chunk is processed independently by the LLM.
"""

chunks = chunk_text(long_text, chunk_size=20, overlap=5)

for i, c in enumerate(chunks, 1):
    print(f"\nChunk {i}:\n{c}")
```

```
Chunk 1:
Large Language Models have a limited context window. When documents exceed this limit, they must be split into chunks. Chunking

Chunk 2:
be split into chunks. Chunking helps process long documents like PDFs, logs, and books. Each chunk is processed independently by

Chunk 3:
chunk is processed independently by the LLM.
```
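
Word-based chunks can cut sentences in half: chunk 1 above ends on "Chunking", the first word of the next sentence. A common alternative packs whole sentences into each chunk up to a word budget. The sketch below is one way to do that, without overlap; it assumes a sentence ends at `.`, `!`, or `?` followed by whitespace, and `chunk_by_sentences` and `max_words` are illustrative names, not part of any library.

```python
import re


def chunk_by_sentences(text, max_words=20):
    """Group whole sentences into chunks of at most `max_words` words (no overlap).

    Assumes a sentence ends at '.', '!' or '?' followed by whitespace. A single
    sentence longer than `max_words` is kept whole as its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_len = [], [], 0

    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would push it over the budget.
        if current and current_len + n > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n

    if current:
        chunks.append(" ".join(current))
    return chunks


sample_text = """
Large Language Models have a limited context window.
When documents exceed this limit, they must be split into chunks.
Chunking helps process long documents like PDFs, logs, and books.
Each chunk is processed independently by the LLM.
"""

for i, c in enumerate(chunk_by_sentences(sample_text, max_words=20), 1):
    print(f"Chunk {i}: {c}")
```

With a 20-word budget this yields two chunks (19 and 18 words), each ending on a sentence boundary. The trade-off is that chunk sizes become less uniform than with fixed word counts.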


## Sliding Window

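```python
def sliding_window(text, window_size=15, stride=5):
    """Return windows of `window_size` words whose starts are `stride` words apart."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")

    words = text.split()
    if len(words) <= window_size:
        # The whole text fits in one window.
        return [" ".join(words)] if words else []

    windows = []
    for i in range(0, len(words) - window_size + 1, stride):
        windows.append(" ".join(words[i:i + window_size]))

    # If the stride does not land exactly on the end, the last few words would be
    # dropped; add one final window aligned to the end of the text.
    last_start = len(words) - window_size
    if last_start % stride != 0:
        windows.append(" ".join(words[last_start:]))

    return windows


windows = sliding_window(long_text)

for i, w in enumerate(windows, 1):
    print(f"\nWindow {i}:\n{w}")
```

```
Window 1:
Large Language Models have a limited context window. When documents exceed this limit, they must

Window 2:
limited context window. When documents exceed this limit, they must be split into chunks. Chunking

Window 3:
exceed this limit, they must be split into chunks. Chunking helps process long documents like

Window 4:
be split into chunks. Chunking helps process long documents like PDFs, logs, and books. Each

Window 5:
helps process long documents like PDFs, logs, and books. Each chunk is processed independently by

Window 6:
long documents like PDFs, logs, and books. Each chunk is processed independently by the LLM.
```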
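
A quick way to check that a windowing scheme does not silently drop text is to work with word index ranges and verify that every index is covered. The standalone sketch below uses hypothetical helper names (`window_spans`, `covers_all`); it returns exclusive `(start, end)` pairs, adds an end-aligned window when the stride would skip the tail, and checks coverage for the 37-word example text with `window_size=15` and `stride=5`.

```python
def window_spans(n_words, window_size, stride):
    """Return (start, end) word-index pairs for sliding windows over n_words words.

    `end` is exclusive. A final window aligned to the end is added when the
    stride would otherwise skip the last few words.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if n_words <= window_size:
        return [(0, n_words)] if n_words else []

    spans = [(i, i + window_size) for i in range(0, n_words - window_size + 1, stride)]
    if spans[-1][1] < n_words:
        spans.append((n_words - window_size, n_words))
    return spans


def covers_all(spans, n_words):
    """True if every word index 0..n_words-1 falls inside at least one span."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered == set(range(n_words))


spans = window_spans(37, window_size=15, stride=5)
print(spans)                      # [(0, 15), (5, 20), (10, 25), (15, 30), (20, 35), (22, 37)]
print(covers_all(spans, 37))      # True
print(covers_all(spans[:-1], 37)) # False: without the end-aligned window, words 35 and 36 are lost
```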


## Hierarchical Map-Reduce Summarization

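```python
def summarize(text, max_chars=60):
    # Stand-in for an LLM summarization call: truncation only, so the cell runs offline.
    return f"Summary: {text[:max_chars]}..."


# Map: summarize each chunk independently.
chunk_summaries = [summarize(chunk) for chunk in chunks]

# Reduce: combine the partial summaries. With a real LLM, this combined text
# would itself be summarized (again in chunks if it is still too long).
final_summary = " ".join(chunk_summaries)

print("Final Summary:\n", final_summary)
```

```
Final Summary:
 Summary: Large Language Models have a limited context window. When do... Summary: be split into chunks. Chunking helps process long documents ... Summary: chunk is processed independently by the LLM....
```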
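
The cell above performs a single map step followed by a plain join. The hierarchical part of map-reduce summarization applies the reduce step repeatedly: summaries are grouped, each group is summarized again, and this repeats until one summary remains. The sketch below shows only that control flow; `truncate_summary` is a hypothetical stand-in for an LLM call (it keeps the first few words of its input), and `group_size` is an illustrative parameter.

```python
def truncate_summary(text, max_words=8):
    # Hypothetical stand-in for an LLM summarization call: keep the first few words.
    return " ".join(text.split()[:max_words])


def hierarchical_summarize(chunks, group_size=2, summarize=truncate_summary):
    """Map: summarize every chunk. Reduce: summarize groups of summaries until one remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    if not chunks:
        return ""

    level = [summarize(chunk) for chunk in chunks]  # map step
    depth = 0
    while len(level) > 1:  # reduce step, repeated level by level
        groups = [level[i:i + group_size] for i in range(0, len(level), group_size)]
        level = [summarize(" ".join(group)) for group in groups]
        depth += 1
        print(f"reduce level {depth}: {len(level)} remaining")
    return level[0]


sample_chunks = [f"Chunk {n} talks about topic {n} in some detail." for n in range(1, 6)]
print(hierarchical_summarize(sample_chunks))
```

With five chunks and `group_size=2`, the reduce step runs three times (5, then 3, then 2, then 1 summaries). Because the stand-in only truncates, the final text says little; the point is the shape of the recursion. As long as each summary is short, every call sees at most `group_size` summaries' worth of text.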


## Retrieval-Based Long Text Handling (Mini RAG)

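```python
import re


def tokenize(text):
    # Lowercase word tokens with punctuation stripped, e.g. "window." -> "window".
    return set(re.findall(r"\w+", text.lower()))


def retrieve_chunks(query, chunks):
    """Return chunks that share at least one whole word with the query, best matches first."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(chunk)), chunk) for chunk in chunks]
    # sorted() is stable, so chunks with equal scores keep their document order.
    ranked = sorted((pair for pair in scored if pair[0] > 0),
                    key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked]


query = "context window"
results = retrieve_chunks(query, chunks)

print("Retrieved Chunks:")
for r in results:
    print("-", r)
```

```
Retrieved Chunks:
- Large Language Models have a limited context window. When documents exceed this limit, they must be split into chunks. Chunking
```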
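
Retrieval only helps if the selected chunks, plus the question, still fit in the model's context window. A simple approach is to add ranked chunks (best first) until a budget is reached and then build the prompt. The sketch below counts the budget in words as a rough proxy for tokens (an assumption; real limits are measured in model tokens), and the prompt template and the name `build_prompt` are illustrative, not any specific library's API.

```python
def build_prompt(question, ranked_chunks, max_words=60):
    """Pack ranked chunks (best first) into a prompt without exceeding max_words of context.

    Word counts are a rough stand-in for model tokens.
    """
    selected, used = [], 0
    for chunk in ranked_chunks:
        n = len(chunk.split())
        if used + n > max_words:
            continue  # skip chunks that do not fit; a smaller one later might
        selected.append(chunk)
        used += n

    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(selected, 1))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


example_ranked = [
    "Large Language Models have a limited context window.",
    "When documents exceed this limit, they must be split into chunks.",
    "Chunking helps process long documents like PDFs, logs, and books.",
]
print(build_prompt("What is a context window?", example_ranked, max_words=20))
```

With a 20-word budget, the first two chunks (8 + 11 words) fit and the third is left out. Numbering the chunks lets the model, or a later step, cite which piece of context an answer came from.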


## Long Conversation Memory Handling

```python
conversation = [
    "User: Explain LLMs",
    "Assistant: LLMs are large neural networks",
    "User: What is a context window?",
    "Assistant: It limits how much text the model can read at once",
]


def summarize_conversation(messages):
    # Placeholder: a real implementation would send `messages` to an LLM and ask
    # for a concise summary. The string below is hard-coded for this example.
    return "Conversation summary: User asked about LLMs and the context window."


summary = summarize_conversation(conversation)

print(summary)
```

```
Conversation summary: User asked about LLMs and the context window.
```
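
The `summarize_conversation` function above returns a fixed string in place of a real summary of the whole conversation. A common pattern for long chats keeps the most recent turns verbatim and folds older turns into a summary, so the prompt stays bounded as the conversation grows. The sketch below assumes a hypothetical summarizer, `list_user_questions` (a stand-in that just lists the user's earlier questions), and an illustrative `keep_last` parameter.

```python
def list_user_questions(messages):
    # Hypothetical stand-in for an LLM summary: list what the user asked earlier.
    prefix = "User: "
    questions = [m[len(prefix):] for m in messages if m.startswith(prefix)]
    return "Earlier, the user asked: " + "; ".join(questions)


def build_memory(messages, keep_last=2, summarize=list_user_questions):
    """Return a summary of older turns followed by the last `keep_last` turns verbatim."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    split = max(len(messages) - keep_last, 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        return list(recent)
    return [f"Summary: {summarize(older)}"] + list(recent)


chat = [
    "User: Explain LLMs",
    "Assistant: LLMs are large neural networks",
    "User: What is a context window?",
    "Assistant: It limits how much text the model can read at once",
    "User: How do I handle long documents?",
]

for line in build_memory(chat, keep_last=2):
    print(line)
```

Here the first three turns collapse into one summary line, and the last two turns are passed through unchanged, which preserves the exact wording of the question currently being answered.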


## Combined Real-World Pipeline

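```python
# With the default chunk_size=100, this short example fits in a single chunk.
chunks = chunk_text(long_text)
retrieved = retrieve_chunks("documents", chunks)

# The retrieved text is returned directly here; a real pipeline would pass it to
# an LLM together with the question to generate the answer.
final_answer = " ".join(retrieved)
print("Final Answer:\n", final_answer)
```

```
Final Answer:
 Large Language Models have a limited context window. When documents exceed this limit, they must be split into chunks. Chunking helps process long documents like PDFs, logs, and books. Each chunk is processed independently by the LLM.
```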
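
With the default `chunk_size=100`, the whole example text lands in one chunk, so the pipeline above returns everything. The sketch below shows one way the pieces could be combined for larger inputs: send the text as-is when it fits a budget; otherwise chunk it, keep the chunks that share words with the question, and add them best-first until the budget is used. It is self-contained and hypothetical: word counts stand in for tokens, keyword overlap stands in for a real retriever, and no model is called.

```python
import re


def words_in(text):
    # Lowercase word tokens with punctuation stripped.
    return set(re.findall(r"\w+", text.lower()))


def split_words(text, size=20, overlap=5):
    """Word chunks of `size` words sharing `overlap` words (assumes 0 <= overlap < size)."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[i:i + size]))
        if i + size >= len(words):
            break
    return chunks


def select_context(text, question, budget=40, size=20, overlap=5):
    """Return the text itself if it fits in `budget` words, else the relevant chunks that fit."""
    if len(text.split()) <= budget:
        return text.strip()

    q = words_in(question)
    scored = [(len(q & words_in(c)), i, c) for i, c in enumerate(split_words(text, size, overlap))]
    # Best keyword overlap first; ties broken by position in the document.
    relevant = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))

    picked, used = [], 0
    for _, i, chunk in relevant:
        n = len(chunk.split())
        if used + n <= budget:
            picked.append((i, chunk))
            used += n
    # Put the chosen chunks back in document order before sending them to a model.
    return "\n".join(chunk for _, chunk in sorted(picked))


sample_text = """
Large Language Models have a limited context window.
When documents exceed this limit, they must be split into chunks.
Chunking helps process long documents like PDFs, logs, and books.
Each chunk is processed independently by the LLM.
"""
question = "How are PDFs and logs processed?"

print(select_context(sample_text, question, budget=40))  # 37 words fit: the whole text
print("---")
print(select_context(sample_text, question, budget=30))  # too long: relevant chunks only
```

With `budget=30`, the first chunk shares no words with the question and is skipped, while the second and third chunks (20 + 7 words) fit. Because neighboring chunks overlap, the selected context can repeat a few words; removing that duplication is left out for brevity.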
