#Text Chunking Strategies with Gemini API

✅ Fixed-Length Chunking

✅ Sliding Window Chunking

✅ Semantic Chunking (using Gemini)

#✅ Objective:
Demonstrate fixed-length, sliding window, and semantic chunking of text using Python, NLTK, and the Gemini LLM.

✅ Step-by-Step Code

In [1]:
# Install dependencies if they are not already installed
!pip install google-generativeai nltk pandas




#🔹 Step 1: Install & Import Libraries

In [5]:
# Import libraries
import nltk
import google.generativeai as genai
from nltk.tokenize import sent_tokenize, word_tokenize

# Download the Punkt tokenizer data used by word_tokenize and sent_tokenize
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


#🔹 Step 2: Configure Gemini API

In [7]:
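# Configure the Gemini API. Read the key from an environment variable
# instead of hardcoding it, so the secret never ends up in a shared notebook.
# (The variable name GEMINI_API_KEY is this notebook's choice; set it before running.)
import os
genai.configure(api_key=os.environ["GEMINI_API_KEY"])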

# Load Gemini model
model = genai.GenerativeModel("gemini-1.5-flash-latest")


#🔹 Step 3: Input Synthetic Text

In [8]:
# Sample synthetic paragraph
text = """Artificial Intelligence is changing the way we live. From self-driving cars to smart assistants, the technology is evolving rapidly.
It helps in automation, improves decision making, and enhances productivity across many industries."""


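#🔸 STRATEGY 1: Fixed-Length Chunking
Split the text into consecutive chunks of N tokens (e.g., 10-token chunks). Because word_tokenize treats punctuation marks as separate tokens, each "." or "," counts toward the chunk size, and the last chunk may be shorter than N.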

In [9]:
def fixed_length_chunking(text, chunk_size=10):
    """Split text into consecutive chunks of at most chunk_size tokens."""
    words = word_tokenize(text)  # punctuation marks become separate tokens
    # Step through the token list chunk_size tokens at a time
    return [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

fixed_chunks = fixed_length_chunking(text, chunk_size=10)

print("\n🔹 Fixed-Length Chunks:")
for i, chunk in enumerate(fixed_chunks, start=1):
    print(f"Chunk {i}: {chunk}")



🔹 Fixed-Length Chunks:
Chunk 1: Artificial Intelligence is changing the way we live . From
Chunk 2: self-driving cars to smart assistants , the technology is evolving
Chunk 3: rapidly . It helps in automation , improves decision making
Chunk 4: , and enhances productivity across many industries .
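
The chunk count above follows directly from the token count: 38 tokens in groups of 10 give ceil(38 / 10) = 4 chunks, the last one holding the remaining 8. The sketch below checks that arithmetic without NLTK. It is only an illustration: `chunk_tokens` is a hypothetical helper, and the token list is copied from the printed output above rather than produced by `word_tokenize`.

```python
import math

def chunk_tokens(tokens, chunk_size=10):
    """Group a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# Tokens copied from the printed chunks above (punctuation included).
tokens = (
    "Artificial Intelligence is changing the way we live . From "
    "self-driving cars to smart assistants , the technology is evolving "
    "rapidly . It helps in automation , improves decision making "
    ", and enhances productivity across many industries ."
).split()

chunks = chunk_tokens(tokens, chunk_size=10)
expected_count = math.ceil(len(tokens) / 10)

print(len(tokens))                   # 38
print(len(chunks), expected_count)   # 4 4
print([len(c) for c in chunks])      # [10, 10, 10, 8]
```

Note that the boundaries ignore sentence structure: the first chunk ends with "From", the first word of the next sentence.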


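#🔸 STRATEGY 2: Sliding Window Chunking
Create overlapping chunks by moving a fixed-size window across the tokens. With window_size=10 and step_size=5, each chunk shares 5 tokens with the one before it, so context that straddles a boundary appears in both.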

In [10]:
def sliding_window_chunking(text, window_size=10, step_size=5):
    """Return overlapping chunks of up to window_size tokens, advancing step_size tokens each time."""
    words = word_tokenize(text)
    chunks = []
    for i in range(0, len(words), step_size):
        chunks.append(' '.join(words[i:i + window_size]))
        # Stop once a window reaches the end of the text, so the tail is kept
        # in a final (possibly shorter) chunk instead of being dropped
        if i + window_size >= len(words):
            break
    return chunks

sliding_chunks = sliding_window_chunking(text, window_size=10, step_size=5)

print("\n🔹 Sliding Window Chunks:")
for i, chunk in enumerate(sliding_chunks, start=1):
    print(f"Chunk {i}: {chunk}")



🔹 Sliding Window Chunks:
Chunk 1: Artificial Intelligence is changing the way we live . From
Chunk 2: way we live . From self-driving cars to smart assistants
Chunk 3: self-driving cars to smart assistants , the technology is evolving
Chunk 4: , the technology is evolving rapidly . It helps in
Chunk 5: rapidly . It helps in automation , improves decision making
Chunk 6: automation , improves decision making , and enhances productivity across
Chunk 7: , and enhances productivity across many industries .
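
To see where each window begins, it can help to compute the start offsets on their own. The sketch below does that for a given token count. `window_starts` and its `cover_tail` flag are hypothetical names introduced here for illustration: with `cover_tail=False` only full-size windows are produced, so trailing tokens that do not fill a window are left out; with `cover_tail=True` a final, possibly shorter, window is added so every token is covered.

```python
def window_starts(n_tokens, window_size=10, step_size=5, cover_tail=True):
    """Return the start index of each sliding window over n_tokens tokens."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive integers")
    if not cover_tail:
        # Only windows that fit entirely inside the token list
        return list(range(0, n_tokens - window_size + 1, step_size))
    starts = []
    for start in range(0, n_tokens, step_size):
        starts.append(start)
        # Once a window reaches the end, every token has been covered
        if start + window_size >= n_tokens:
            break
    return starts

n_tokens = 38  # token count of the sample paragraph, punctuation included
overlap = max(10 - 5, 0)  # tokens shared by consecutive windows

print(overlap)                                   # 5
print(window_starts(n_tokens, cover_tail=False)) # [0, 5, 10, 15, 20, 25]
print(window_starts(n_tokens, cover_tail=True))  # [0, 5, 10, 15, 20, 25, 30]
```

The last full-size window starts at token 25 and ends at token 35, so without tail coverage the final three tokens ("many industries .") fall outside every chunk.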


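#🔸 STRATEGY 3: Semantic Chunking (Using Gemini)
Ask Gemini to split the paragraph into semantically meaningful chunks, grouping sentences by topic or idea rather than by token count. The reply is free-form text, so its wording and chunk boundaries can vary between runs.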

In [11]:
semantic_prompt = f"""
Split the following paragraph into meaningful chunks based on sentence meaning or topic (semantic chunking).
Provide each chunk as a separate bullet point:

Paragraph:
{text}
"""

response = model.generate_content(semantic_prompt)

print("\n🔹 Semantic Chunking with Gemini:")
print(response.text)



🔹 Semantic Chunking with Gemini:
* Artificial Intelligence is changing the way we live.
* Examples include self-driving cars and smart assistants, showcasing the technology's rapid evolution.
* AI contributes to automation, improved decision-making, and increased productivity in various sectors.
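
Unlike the first two strategies, the semantic result arrives as one bulleted string rather than a Python list. A minimal sketch of turning such a reply into a list of chunks follows. `parse_bullet_chunks` is a hypothetical helper, not part of the Gemini SDK, and it assumes the model honored the prompt's request for one bullet per chunk; the sample string is the reply printed above.

```python
import re

# Matches lines that begin with "*", "-", "•", or a number such as "1." or "1)"
BULLET_PATTERN = re.compile(r"^\s*(?:[*\-•]|\d+[.)])\s+(.*\S)\s*$")

def parse_bullet_chunks(response_text):
    """Extract the text of each bullet line; lines without a bullet are ignored."""
    chunks = []
    for line in response_text.splitlines():
        match = BULLET_PATTERN.match(line)
        if match:
            chunks.append(match.group(1))
    return chunks

sample_reply = """* Artificial Intelligence is changing the way we live.
* Examples include self-driving cars and smart assistants, showcasing the technology's rapid evolution.
* AI contributes to automation, improved decision-making, and increased productivity in various sectors.
"""

semantic_chunks = parse_bullet_chunks(sample_reply)
print(len(semantic_chunks))   # 3
print(semantic_chunks[0])     # Artificial Intelligence is changing the way we live.
```

In the reply shown above, the second and third bullets paraphrase the input instead of quoting it, so chunks parsed this way should not be treated as verbatim spans of the source text. If exact spans matter, the prompt would need to ask for them explicitly.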

