Skip to content

fix: "\%" early stop#160

Merged
4ndrelim merged 3 commits intomainfrom
fix/gpt-early-stop
Apr 24, 2026
Merged

fix: "\%" early stop#160
4ndrelim merged 3 commits intomainfrom
fix/gpt-early-stop

Conversation

@wjiayis
Copy link
Copy Markdown
Member

@wjiayis wjiayis commented Apr 19, 2026

Previously:

  • % is used to remove comments, which interfered with literal \% in LaTeX documents.

Now:

  • Handle % properly. Tested for %, \% and \\%.

@wjiayis wjiayis changed the title fix: "/%" early stop fix: "\%" early stop Apr 19, 2026
@wjiayis wjiayis requested review from 4ndrelim and Junyi-99 April 19, 2026 17:12
4ndrelim
4ndrelim previously approved these changes Apr 22, 2026
Copy link
Copy Markdown
Member

@4ndrelim 4ndrelim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments and suggestions for your consideration, but overall lgtm.

// non-backslash character so that \% (an escaped percent) is preserved. Pairs
// of backslashes (\\) before % are treated as a line-break followed by a real
// comment, matching LaTeX semantics.
var commentRegex = regexp.MustCompile(`(^|[^\\])((?:\\\\)*)%.*$`)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should cover most cases. Think its fair not to expect multiple chained \ since it hardly serves practical purpose beyond 2 consecutive backslashes which represents newline. 3 consecutive followed by a % would denote newline followed by % which is hardly the intent.

But a more general solution that guards against malicious input is to count the number of backslashes preceding % and only treating the remaining as comment if the count is even. This logic is actually implemented in the xtramcp backend in LatexParser since it is not uncommon to encounter cases e.g.\\\{. https://github.com/PaperDebugger/xtramcp/blob/main/common/latex_parser.py#L780

That said, I personally feel it is reasonable enough and am also ok with this solution.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on a separate note, do you know why are we removing comments before passing to the LLM? I don't know the specifics, but assume this is the case. In general, even if it is a user comment not meant to be displayed, it might serve as good contextual cues or auxiliary information. The output may be informed with user's thoughts in the form of comments.

Might it be because some comments might be too long / redundant? So this is an attempt to save tokens?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@4ndrelim Sure, I can generalise this solution to count odd and even backslashes.

@Junyi-99 Was there more context as to why comments were removed previously?

Copy link
Copy Markdown
Member Author

@wjiayis wjiayis Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@4ndrelim Wait actually this already handles 3 or more backslashes before the % as well. I've improved the test cases to cover 3 or more backslashes regardless.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, i did not realise it already covers. nice.

@4ndrelim 4ndrelim merged commit 9ecf372 into main Apr 24, 2026
1 check passed
@4ndrelim 4ndrelim deleted the fix/gpt-early-stop branch April 24, 2026 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants