
Python: Fix inaccurate split flag in TextChunker to prevent redundant re-splitting #12623



Open: wants to merge 3 commits into main

mohiuddin-khan-shiam

Description

`semantic_kernel.text.text_chunker._split_str` always returned `input_was_split=False`, even when it had split its input. As a result, higher-level routines kept trying further separators and re-split text that had already been chunked.
The function now sets `input_was_split=True` as soon as it performs the initial split and ORs in the flags returned by its recursive calls. This avoids the redundant splitting passes and preserves the intended chunk boundaries.
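To make the flag semantics concrete, here is a minimal, self-contained sketch of a recursive midpoint splitter that behaves as described above. It is not the Semantic Kernel source: `split_str`, `count_tokens` (a whitespace word count standing in for a real tokenizer) and the single-character separator string are hypothetical simplifications chosen for illustration.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words stand in for tokens."""
    return len(text.split())


def split_str(
    text: str,
    max_tokens: int,
    separators: str,
    token_counter: Callable[[str], int] = count_tokens,
) -> Tuple[List[str], bool]:
    """Recursively split `text` at the separator character nearest its midpoint.

    Hypothetical sketch, not the library code. Returns (pieces, input_was_split).
    No trimming is done, to keep the sketch small.
    """
    if not text:
        return [], False
    if token_counter(text) <= max_tokens:
        return [text], False  # fits already: this call did not split anything

    # Find the separator character closest to the middle of the text.
    half = len(text) // 2
    best = -1
    for index, char in enumerate(text):
        if char in separators and (best == -1 or abs(half - index) < abs(half - best)):
            best = index
    cutpoint = best + 1  # keep the separator with the left-hand piece
    if best == -1 or cutpoint >= len(text):
        return [text], False  # no usable separator: a caller may try another set

    input_was_split = True  # the fix: report the split this call just made
    pieces: List[str] = []
    for part in (text[:cutpoint], text[cutpoint:]):
        sub_pieces, sub_was_split = split_str(part, max_tokens, separators, token_counter)
        pieces.extend(sub_pieces)
        input_was_split = input_was_split or sub_was_split  # propagate deeper flags
    return pieces, input_was_split


if __name__ == "__main__":
    pieces, was_split = split_str("One two three. Four five six. Seven eight nine.", 4, ".")
    print(pieces, was_split)
    # ['One two three.', ' Four five six.', ' Seven eight nine.'] True
```

A caller that walks separator sets from coarse (for example sentence endings) to fine (for example spaces) can use the returned flag to stop at the first set that actually split the text. With a flag that is always `False`, such a caller would keep re-running finer separators over text that was already chunked.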

odiomarcelino and others added 2 commits June 29, 2025 18:46
…tting

Co-Authored-By: S. M. Mohiuddin Khan Shiam <147746955+mohiuddin-khan-shiam@users.noreply.github.com>
…tting
@mohiuddin-khan-shiam mohiuddin-khan-shiam requested a review from a team as a code owner June 29, 2025 12:49
@markwallace-microsoft markwallace-microsoft added the python Pull requests for the Python Semantic Kernel label Jun 29, 2025
@github-actions github-actions bot changed the title from "Fix inaccurate split flag in TextChunker to prevent redundant re-splitting" to "Python: Fix inaccurate split flag in TextChunker to prevent redundant re-splitting" Jun 29, 2025
@moonbox3
Contributor

Hi @mohiuddin-khan-shiam thanks for the contribution. Can you please have a look at the failing unit tests?

Labels
python Pull requests for the Python Semantic Kernel
4 participants