# Method 1: Fixed-Size Chunking

This notebook demonstrates the most basic chunking strategy: **Fixed-Size Chunking**.

## Concept
The code is split into chunks containing a fixed number of characters or tokens; this notebook counts characters. With an optional overlap, each chunk repeats the last few characters of the previous one, which preserves some of the context that would otherwise be lost at chunk boundaries.
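To make the sliding window concrete, the sketch below lists the character ranges that each chunk would cover. The helper name `chunk_spans` is hypothetical, and the values 500, 200, and 30 are purely illustrative.

```python
def chunk_spans(text_length: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for each fixed-size chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    # Each chunk starts `stride` characters after the previous one
    stride = chunk_size - overlap
    return [(start, min(start + chunk_size, text_length))
            for start in range(0, text_length, stride)]

print(chunk_spans(500, chunk_size=200, overlap=30))
# [(0, 200), (170, 370), (340, 500)]
```

Consecutive ranges share 30 characters (for example 170 to 200), and the last chunk is shorter because the text runs out.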

### Pros:
- **Simple to Implement:** Requires no understanding of the code's structure.
- **Language Agnostic:** Works on any programming language without special parsers.

### Cons:
- **Arbitrary Boundaries:** Chunk boundaries ignore syntax and semantics, so chunks routinely start or end mid-statement or even mid-identifier. Such fragments can lead to low-quality embeddings.
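One rough way to observe this drawback is to check whether each chunk still parses as Python. The sketch below uses the standard library's `ast.parse`; the helper `is_valid_python` and the short `snippet` are illustrative assumptions, and parseability is only a proxy for chunk quality.

```python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if the text parses as a complete Python module."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # also covers IndentationError
        return False

# Hypothetical input, kept short so the effect is easy to see
snippet = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"

chunk_size = 25  # characters per chunk, no overlap
chunks = [snippet[i:i + chunk_size] for i in range(0, len(snippet), chunk_size)]

print("whole snippet parses:", is_valid_python(snippet))
for n, chunk in enumerate(chunks, start=1):
    print(n, is_valid_python(chunk), repr(chunk))
```

The full snippet parses, but a chunk that begins partway through an indented line does not, even though no characters were lost.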

In [1]:
# No special libraries needed for this basic method
print("Setup complete.")

Setup complete.


## Sample Code for Demonstration
We will use the following block of Python code as the input.

In [2]:
sample_code = """
# Utility functions for string manipulation

def say_hello(name: str):
    \"\"\"A simple function to greet someone.\"\"\"
    print(f"Hello, {name}!")

def say_goodbye(name: str):
    \"\"\"A simple function to say goodbye.\"\"\"
    print(f"Goodbye, {name}!")

class StringHelper:
    \"\"\"A class with advanced string operations.\"\"\"
    def __init__(self, text: str):
        self.text = text

    def reverse(self) -> str:
        # Reverse the text using slicing
        return self.text[::-1]

    def count_vowels(self) -> int:
        vowels = "aeiouAEIOU"
        return sum(1 for char in self.text if char in vowels)

# Standalone script execution
if __name__ == "__main__":
    helper = StringHelper("hello world")
    reversed_text = helper.reverse()
    vowel_count = helper.count_vowels()

    say_hello("Alice")
    print(f"Original: 'hello world', Reversed: '{reversed_text}', Vowels: {vowel_count}")
    say_goodbye("Alice")
"""

## Implementation

In [3]:
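def chunk_fixed_size(code: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split code into chunk_size-character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        # A stride of zero or less would make range() fail or yield nothing
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(code), stride):
        # The final chunk may be shorter than chunk_size
        chunks.append(code[start:start + chunk_size])
    return chunks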

print("--- Method 1: Fixed-Size Chunking ---")
fixed_size_chunks = chunk_fixed_size(sample_code, chunk_size=200, overlap=30)

for i, chunk in enumerate(fixed_size_chunks):
    print(f"--- Chunk {i+1} ---")
    print(chunk)

--- Method 1: Fixed-Size Chunking ---
--- Chunk 1 ---

# Utility functions for string manipulation

def say_hello(name: str):
    """A simple function to greet someone."""
    print(f"Hello, {name}!")

def say_goodbye(name: str):
    """A simple function
--- Chunk 2 ---
str):
    """A simple function to say goodbye."""
    print(f"Goodbye, {name}!")

class StringHelper:
    """A class with advanced string operations."""
    def __init__(self, text: str):
        self
--- Chunk 3 ---
self, text: str):
        self.text = text

    def reverse(self) -> str:
        # Reverse the text using slicing
        return self.text[::-1]

    def count_vowels(self) -> int:
        vowels = "
--- Chunk 4 ---
lf) -> int:
        vowels = "aeiouAEIOU"
        return sum(1 for char in self.text if char in vowels)

# Standalone script execution
if __name__ == "__main__":
    helper = StringHelper("hello world
--- Chunk 5 ---
er = StringHelper("hello world")
    reversed_text = helper.reverse()
    vowe