Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: out of bounds panic when diffing large number of words #695

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

JonathanxD
Copy link

@JonathanxD JonathanxD commented Apr 5, 2024

I dropped all the old text because it does not make sense anymore.

The problem is basically that Rust's str::lines() will return the same number of lines regardless of the last character being a new line or not, and the assumptions made on inline and side-by-side display implementations depends on having a clear distinction of those cases.

Those changes introduces a function that will chain an additional element when the string ends with a newline. I'm not aware of any function from stdlib that does what we need (and I couldn't find one by looking at the documentation), the closest one was split, but it has obvious implications (\r\n, for example).

This does solves #688, #694, #682 and #681, I tested all the provided files and they're all working without any panics.

Update 2024-07-08:

I reworked on this patch and now it does not add a new line to the end of the string, therefore, all tests are passing, including the regression ones (which was failing before). In this new revision it fixes #682 again, which involves --display=inline and was not working anymore when I initially merged the changes from master into my branch.

I also tested another case that I found out and it fixes #656.

The new approach is more hacky: in one case it just removes the additional line range when the line is empty, and in another case it just checks if the index is within range before trying to access the index. However, unlike the previous solution, this one does not fail the regression tests.

@JonathanxD JonathanxD force-pushed the fix/out-of-bounds-panic branch 5 times, most recently from f4161f2 to b686f58 Compare April 6, 2024 03:25
@JonathanxD
Copy link
Author

I ran a small script to compare the output of db281c6 with my branch to find the cause of regression, it's all the trailing newlines that were not being printed before.

@Wilfred
Copy link
Owner

Wilfred commented Apr 7, 2024

Thanks for the pull request, and the clear explanation! This sounds like a regression from 8c004be.

@JonathanxD
Copy link
Author

JonathanxD commented Apr 13, 2024

Thanks for the pull request, and the clear explanation!

You're welcome.

This sounds like a regression from 8c004be.

I'm not sure if this is the cause of the regression because unless I'm missing something, the changes in this commit are just inlining the content of split_on_newlines on the call-site, which was already using lines(). I thought it could be different but after trying on playground, both versions yielded the same results.

@Wilfred Do you have any plans to merge my changes in the near future? If so, I can update the regression file so the pipeline succeeds. If you don't want to merge, I can use the examples provided in the reported issues to bisect which commit caused the regression and bring it to the discussion.

@fschoenm
Copy link

What's the status of this PR? I'm running into this issue too.

@JonathanxD
Copy link
Author

What's the status of this PR? I'm running into this issue too.

Idk, everything is fine from my side. It depends on how @Wilfred wants to go from there.

I've been personally just building difftastic with my patches on top since without them difft would crash 90% of the time for my use-case.

This approach removes the additional empty line that would cause
the display logic to disagree in line count.

The major difference is that this new approach does not change the
output of the display implementation, therefore, no test files need to
be updated, and no additional line is printed to the end-user.

As for the cause of the index out-of-bounds, this is simply a
disagreement between line sources, such as the `line-numbers`
crate and Tree Sitter, which all reports the correct number of lines,
with the `str::lines` iterator which does not iterate over the last
new line character if the last line is empty.
This is caused by the same problem as before, but the disagreement
seems to be with the Tree itself and happens inside
`syntax::change_positions`, there may be another way to work around
this by removing the correct entry from `rhs_positions` when
the conditions are met.
@JonathanxD
Copy link
Author

I noticed that #731 does the same thing as I did (but slightly differently), which is to ignore the last empty line, and it will fix most of the out-of-bounds panics except for the yaml one, which is caused by the same problem, but in a different location (mine "fixes" this one by skipping out of bounds indexes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic diffing text file with --display inline (apparently interpreted as sql) Crash on gauche commit
3 participants