Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

selection, undo and collaboration produce range errors #892

Closed
johanneswilm opened this Issue Feb 15, 2019 · 13 comments

Comments

Projects
None yet
2 participants
@johanneswilm
Copy link
Contributor

johanneswilm commented Feb 15, 2019

Issue details

We have an automatic test in which two users are collaborating on a document. User 1 is typing. User 2 waits until the document has a size of at least 34. Then user 2 selects from characters 25 to 30, hits backspace and then undo. User 1 continues typing the entire time. Sometimes this leads to a range error for user 2:

Uncaught RangeError: Position 36 out of range, 55, RangeError: Position 36 out of range

Traceback

It is very difficult to isolate this issue, as I have been unable to trigger it on my own laptop. Only on travis I can see it happening, and only about 25-30% of the times I am running the tests. I have added some debug statements to a version of prosemirror-history to be output in travis to get more of an idea of what is going on.

So far I have found this:

The error seems to be triggered by two things happening at the same time:

  • The selection here in most cases has an anchor of 25 and head 30. This is what one would expect as those correspond to the selection initially set for user 2. However, sometimes, the anchor and head are offset by around 5 - so they are 30 and 36 instead of 25 and 30.

  • Sometimes, the document size much longer than the selection head. If the selection head is 36 and the document size is 37, the range error occurs.

  • I can see that the selection anchor/head are already 30/36 here, before the mapping in the following line.

  • The numbers are fairly consistent - while the length of the document varies, the anchor is always either 25 (as it should be) or 30. And the head is either 30 (as it should be) or 36. Seems like something about the mapping of the selection is not working in case of incoming changes from another source sometimes. Notice that the difference of 5 between the correct anchor and actual one happens to be the length of the piece of content that was deleted.

Steps to reproduce

Very difficult. But one can fork the Fidus Writer repository, make some changes and connect it to travis and make the test run a few times. My prosemirror-history with debugging additions can be found here.

ProseMirror version

Latest

Affected platforms

  • Chrome
  • Firefox
  • Internet Explorer
  • Other
@johanneswilm

This comment has been minimized.

Copy link
Contributor Author

johanneswilm commented Feb 16, 2019

Case 1 (working correctly):

At the time when undo is executed by user 2, user 2 has a selection from anchor 25 to head 25. There are no sendable steps and the document size is 32. The last messages sent and received from the server are all by user 2:

  1. a selection change from 25 to 30.
  2. a replace step from 25 to 30.
  3. a confirmation from the server of the replace step having been accepted.

Inside the history plugin, pop.selection.anchor is 25, pop.selection.head is 30, and pop.transform.doc.nodeSize is 37.

Case 2 (working correctly):

At the time when undo is executed by user 2, user 2 has a selection from anchor 25 to head 25. There are no sendable steps and the document size is 33. The last messages sent and received from the server are a mix of changes from user 1 and 2:

  1. A replace step with slice at 31/31 by user 1.
  2. A selection change from 25 to 30 by user 2.
  3. A replace step from 25 to 30 by user 2.
  4. A confirmation to user 2 from the server of the replace step in 3 having been accepted.
  5. A replace step with slice at 27/27 by user 1.

Inside the history plugin, pop.selection.anchor is 25, pop.selection.head is 30, and pop.transform.doc.nodeSize is 38.

Case 3 (Failure):

At the time when undo is executed by user 2, user 2 has a selection from anchor 25 to head 25. There are no sendable steps and the document size is 32. The last messages sent and received from the server are a mix of changes from user 1 and 2:

  1. A replace step with slice at 30/30 by user 1.
  2. A selection change from 25 to 30 by user 2.
  3. A replace step with slice at 31/31 by user 1.
  4. A replace step from 25 to 30 by user 2.
  5. A confirmation to user 2 from the server of the replace step in 4 having been accepted.

Inside the history plugin, pop.selection.anchor is 30, pop.selection.head is 36, and pop.transform.doc.nodeSize is 37.

@marijnh

This comment has been minimized.

Copy link
Member

marijnh commented Feb 18, 2019

The processes inside of ProseMirror are entirely deterministic. As such, it should be possible to record a trace of what, exactly, happens in what order when this error occurs. If looking at that doesn't show the issue to be on your side, please use it to create a (deterministically failing) test case on top of core ProseMirror and I'll take a look.

@johanneswilm

This comment has been minimized.

Copy link
Contributor Author

johanneswilm commented Feb 18, 2019

Are there timeouts in there? That's the only explanation I can find for why it would fail on travis sometimes and never on my own laptop. Previously I thought it was because travis was running the tests remotely via saucelabs, but for the past half year I have been running tests on travis with headless chrome and the error keeps showing up.

@johanneswilm

This comment has been minimized.

Copy link
Contributor Author

johanneswilm commented Feb 18, 2019

Until yesterday, all our replace steps were accompanied by an addmark step that would cover the inserted content (for tracking changes). Yesterday I changed this to just a single replace step which now includes the mark by replacing the original tr rather than using appendTransaction to add the mark in a second step. This change has done that so far I have not been able to trigger this issue again. I cannot say it's gone for good, but at least for now with the new setup I am not likely able to collect enough data to create test cases, etc. . I will close this issue and reopen if I discover it shows up in production usage.

@marijnh

This comment has been minimized.

Copy link
Member

marijnh commented Feb 18, 2019

Are there timeouts in there?

Not in prosemirror-model, transform, or collab. Since you're testing your own integration with collab, which I guess involves some kind of communication with a server, I'd look for the problem there first.

@johanneswilm

This comment has been minimized.

Copy link
Contributor Author

johanneswilm commented Feb 18, 2019

I spent a few days looking at my own code first before contacting you about this last year. I spent another few days last week.

But in the end it comes down to the prosemirror undo function being triggered leading to a range error within prosemirror code. One part of prosemirror history wants to set the selection outside of where it can go and another part is creating a range error (see numbers in the answer here with the examples). Even with other bugs in other parts of my code, it seems like prosemirror's undo function should not under any circumstances return a doc and a selection that is outside of that doc.

If the error is in my code, it must be my code inputting incorrect values into prosemirror in some way -- calling functions with incorrect variables, etc. . It's not impossible, but at this stage, after looking through my own code for a while, the next place I would look first is in the history plugin and what causes it to determine this strange selection position. And I did look in there. It's just that a lot of that reads like assembler to me and I have a hard time getting my head around what the variables mean. I was just thinking that maybe you have more of an idea of what could be causing mapping issues with the selection in the undo function.

I tried recreating the same steps manually, but the same issue doesn't seem to happen then. I will reopen and try to find more details should I run into this again.

@marijnh

This comment has been minimized.

Copy link
Member

marijnh commented Feb 18, 2019

If the error is in my code, it must be my code inputting incorrect values into prosemirror in some way -- calling functions with incorrect variables, etc.

The way you just recently mentioned setting one transaction's meta property to that of another transaction seems like it might be worth investigating (as you noted, that is not safe).

@johanneswilm

This comment has been minimized.

Copy link
Contributor Author

johanneswilm commented Feb 18, 2019

The way you just recently mentioned setting one transaction's meta property to that of another transaction seems like it might be worth investigating (as you noted, that is not safe).

I know it's not safe which is why I am looking for alternatives. But this issue showed up before that, when I was feeding values into ProseMirror according to official recommendations (as far as I know). Since I have switched to the new solution (which involves setting the meta in this unsafe way) I have not been able to trigger this error again.

@johanneswilm

This comment has been minimized.

Copy link
Contributor Author

johanneswilm commented Feb 18, 2019

Maybe this is a better approach: Where in the history plugin could I add logging statements to see what mappings it's trying to map the selection through? Even if the bug is in my code, I think I'll need to log those in order to find it.

@marijnh

This comment has been minimized.

Copy link
Member

marijnh commented Feb 18, 2019

I'm sorry, the history implementation is pretty complicated, and I can't answer that question in a simple way. But I think the first point of order is to try and isolate and reduce the failure case, because debugging a non-reproduceable issue is going to be very painful. For that, I think you'll have to log a lot of information about the transactions as they are applied in your test, and maybe try to mechanically diff failing and succeeding ones.

@johanneswilm johanneswilm reopened this Feb 18, 2019

@johanneswilm

This comment has been minimized.

Copy link
Contributor Author

johanneswilm commented Feb 19, 2019

I think I have been able to reproduce a simple example: https://github.com/johanneswilm/pm-collab-test

marijnh added a commit to ProseMirror/prosemirror-history that referenced this issue Feb 19, 2019

Fix off-by-one error in history rebasing
FIX: Fix a bug that corrupted selection data in the history when
applying remote steps in some cases.

Issue ProseMirror/prosemirror#892
@marijnh

This comment has been minimized.

Copy link
Member

marijnh commented Feb 19, 2019

Ah, thanks, that made it possible to isolate the problem. Attached patch (released as 1.0.4) should fix it. Can you confirm?

@johanneswilm

This comment has been minimized.

Copy link
Contributor Author

johanneswilm commented Feb 19, 2019

Yep, that fixes it! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.