# Method 2: Recursive Character Splitting

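This notebook demonstrates **Recursive Character Splitting**, an improvement over fixed-size chunking that tries to place chunk boundaries at structural breaks, such as blank lines or function and class definitions, rather than at arbitrary character offsets.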

## Concept
This method splits text using an ordered list of separators, from the coarsest to the finest. For code, the list can be made language-aware by putting separators such as class and function definitions first. The splitter tries the highest-priority separator, and any piece that is still larger than the target chunk size is split again with the next separator in the list, until every chunk fits.
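To make the recursion concrete, here is a minimal pure-Python sketch of the idea. It is not LangChain's implementation: the function name `recursive_split`, the separator list, and the greedy merging of small pieces are assumptions chosen for illustration, and chunk overlap is left out.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    trying each separator in priority order (illustrative sketch)."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard fixed-size cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, finer, chunk_size)

    # Split on the separator and keep it at the start of each later piece.
    parts = text.split(sep)
    pieces = [parts[0]] + [sep + part for part in parts[1:]]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(current) + len(piece) <= chunk_size:
            current += piece  # greedily merge small neighbouring pieces
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) > chunk_size:
            # Still too large: recurse with the finer separators only.
            chunks.extend(recursive_split(piece, finer, chunk_size))
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks


demo = "import os\n\ndef a():\n    return 1\n\ndef b():\n    return 2\n"
chunks = recursive_split(demo, ["\nclass ", "\ndef ", "\n\n", "\n", " "], chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

With a 30-character limit the demo text is cut at each `\ndef ` boundary, so every function ends up in its own chunk.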

### Pros:
- **Better Semantic Boundaries:** Chunks tend to line up with functions and classes instead of being cut at arbitrary character offsets as in the fixed-size version, especially when the splitter is configured with language-specific separators.
- **Good Fallback:** A reasonable strategy when a full AST parser isn't available or fails, for example on source code that contains syntax errors.
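The fallback role can be sketched as follows, assuming Python source and the standard-library `ast` module as the "full parser". The helper name `chunk_python` and the blank-line fallback are hypothetical simplifications; a real pipeline would fall back to a full recursive splitter.

```python
import ast


def chunk_python(source: str) -> list[str]:
    """Return one chunk per top-level statement when the source parses,
    otherwise fall back to plain separator-based splitting (sketch)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Fallback: the parser failed, so split on blank lines instead.
        return [block for block in source.split("\n\n") if block.strip()]
    segments = [ast.get_source_segment(source, node) for node in tree.body]
    return [segment for segment in segments if segment]


valid = "x = 1\n\ndef f():\n    return x\n"
broken = "def f(:\n    pass\n\nprint('hi')\n"

print(chunk_python(valid))   # AST path
print(chunk_python(broken))  # separator fallback path
```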

### Cons:
- **Can Be Brittle:** Separators are matched as plain text, so unusually formatted code can be split in the wrong place. Unlike an AST, the splitter has no actual understanding of the syntax.
- **Requires Customisation:** The separator hierarchy needs to be defined for each programming language.
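A small illustration of the brittleness, using only a single `\ndef ` separator rather than a full splitter (the sample source and variable names are made up for this example):

```python
import re

source = (
    "@cache\n"
    "def fast():\n"
    "    return 1\n"
    "\n"
    "async def fetch():\n"
    "    return 2\n"
)

# Split before every line that begins with "def ", which is what a
# text-level "\ndef " separator effectively does.
pieces = re.split(r"\n(?=def )", source)

for piece in pieces:
    print(repr(piece))
```

The decorator `@cache` is cut off from the function it decorates, and `async def fetch()` is not recognised as a boundary at all because the line does not start with `def `. An AST-based chunker would handle both cases, which is why the separator list usually needs tuning per language and per code style.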

In [1]:
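# We'll use LangChain's implementation for simplicity.
# langchain-text-splitters is listed explicitly because it provides the splitter classes.
%pip install langchain langchain-text-splitters tiktoken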

## Sample Code for Demonstration
We will use the following block of Python code as the input.

In [2]:
sample_code = """
# Utility functions for string manipulation

def say_hello(name: str):
    \"\"\"A simple function to greet someone.\"\"\"
    print(f"Hello, {name}!")

def say_goodbye(name: str):
    \"\"\"A simple function to say goodbye.\"\"\"
    print(f"Goodbye, {name}!")

class StringHelper:
    \"\"\"A class with advanced string operations.\"\"\"
    def __init__(self, text: str):
        self.text = text

    def reverse(self) -> str:
        # Reverse the text using slicing
        return self.text[::-1]

    def count_vowels(self) -> int:
        vowels = "aeiouAEIOU"
        return sum(1 for char in self.text if char in vowels)

# Standalone script execution
if __name__ == "__main__":
    helper = StringHelper("hello world")
    reversed_text = helper.reverse()
    vowel_count = helper.count_vowels()

    say_hello("Alice")
    print(f"Original: 'hello world', Reversed: '{reversed_text}', Vowels: {vowel_count}")
    say_goodbye("Alice")
"""

## Implementation

In [3]:
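# The splitter classes live in the langchain-text-splitters package
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Create a splitter specifically for Python using LangChain's presets.
# chunk_size and chunk_overlap are measured in characters by default.
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=200, chunk_overlap=30
)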

print("--- Method 2: Recursive Character Splitting ---")
recursive_chunks = python_splitter.split_text(sample_code)

for i, chunk in enumerate(recursive_chunks):
    print(f"--- Chunk {i+1} ---")
    print(chunk)

--- Method 2: Recursive Character Splitting ---
--- Chunk 1 ---
# Utility functions for string manipulation

def say_hello(name: str):
    """A simple function to greet someone."""
    print(f"Hello, {name}!")
--- Chunk 2 ---
def say_goodbye(name: str):
    """A simple function to say goodbye."""
    print(f"Goodbye, {name}!")
--- Chunk 3 ---
class StringHelper:
    """A class with advanced string operations."""
    def __init__(self, text: str):
        self.text = text
--- Chunk 4 ---
def reverse(self) -> str:
        # Reverse the text using slicing
        return self.text[::-1]
--- Chunk 5 ---
def count_vowels(self) -> int:
        vowels = "aeiouAEIOU"
        return sum(1 for char in self.text if char in vowels)
--- Chunk 6 ---
# Standalone script execution
if __name__ == "__main__":
    helper = StringHelper("hello world")
    reversed_text = helper.reverse()
    vowel_count = helper.count_vowels()
--- Chunk 7 ---
say_hello("Alice")
    print(f"Original: 'hello world', Reversed