Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mark structs as merged and filter out at the end of bulk operations #542

Closed
wants to merge 3 commits into from

Conversation

NilSet
Copy link
Contributor

@NilSet NilSet commented Jun 16, 2023

instead of splicing out of the array immediately
Fixes #541

* @param {Array<AbstractStruct>} structs
* @param {number} mergedCount
*/
const filterMergedStructs = (structs, mergedCount) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably better off constructing a new array using array.filter but im not sure if there will be references to the old array dangling so gotta do this ugly mutation

src/utils/StructStore.js Outdated Show resolved Hide resolved
@NilSet NilSet force-pushed the splice-less branch 2 times, most recently from c3cde54 to 748c5f5 Compare June 21, 2023 02:09
@en-tcelvi
Copy link

en-tcelvi commented Jul 6, 2023

I'll admit, I don't fully understand what is going on in this PR, but would it be possible to do something like this:

**
 * @param {Array<AbstractStruct>} structs
 * @param {number} pos
 */
const tryToMergeWithLeft = (structs, pos) => {
  const left = structs[pos - 1]
  const right = structs[pos]
  if (left.deleted === right.deleted && left.constructor === right.constructor) {
    if (left.mergeWith(right)) {
      if (right instanceof Item && right.parentSub !== null && /** @type {AbstractType<any>} */ (right.parent)._map.get(right.parentSub) === right) {
        /** @type {AbstractType<any>} */ (right.parent)._map.set(right.parentSub, /** @type {Item} */ (left))
      }
      return true;
    }
  }
  return false;
}

const tryToMergeWithLeftProvider = () => {
  /**
   * @type {Array<number[]>}
   */
  const spliceRanges = []
  spliceRanges.push([0])
  return {
    /**
     * @param {Array<AbstractStruct>} structs
     * @param {number} pos
     */
    exec: (structs, pos) => {
      const needsSplice = tryToMergeWithLeft(structs, pos)
      const spliceRange = spliceRanges[spliceRanges.length - 1]
      if(needsSplice) {
        spliceRange[1] = pos
        spliceRange[0]++
      } else if (spliceRange.length === 2) {
        // @ts-ignore
        spliceRange.push([0]);
      }
    },
    /**
     * @param {Array<AbstractStruct>} structs
     */
    flush: (structs) => {
      for (const spliceRange of spliceRanges) {
        if (spliceRange.length === 2) {
          structs.splice(spliceRange[1], spliceRange[0])
        }
      }
    }
  };

Usage:

const tryMergeDeleteSet = (ds, store) => {
  // try to merge deleted / gc'd items
  // merge from right to left for better efficiecy and so we don't miss any merge targets
  ds.clients.forEach((deleteItems, client) => {
    const structs = /** @type {Array<GC|Item>} */ (store.clients.get(client))
    for (let di = deleteItems.length - 1; di >= 0; di--) {
      const deleteItem = deleteItems[di]
      // start with merging the item next to the last deleted item
      const mostRightIndexToCheck = math.min(structs.length - 1, 1 + findIndexSS(structs, deleteItem.clock + deleteItem.len - 1))
      const mergeLeftProvider = tryToMergeWithLeftProvider()
      for (
        let si = mostRightIndexToCheck, struct = structs[si];
        si > 0 && struct.id.clock >= deleteItem.clock;
        struct = structs[--si]
      ) {
        mergeLeftProvider.exec(structs, si)  // TRACK THE CHANGES HERE WITH THE WRAPPED FUNCTIONS
      }
     mergeLeftProvider.flush(structs, si) // APPLY THEM IN BULK HERE AFTER THE LOOP
    }
  })
}

The idea being we track the data in another data structure as opposed to adding values on the struct as we process them? (Then do some operation at the end, which I believe is similar to your solution here - Fully aware that might sample code does not pass tests - just trying contribute to the iteration of the solution somehow)

instead of splicing out of the array immediately
this avoids mutating the struct objects beyond the merge, no additional properties
@NilSet
Copy link
Contributor Author

NilSet commented Jul 6, 2023

Conceptually, while merging structs, a struct can become "wider" in the struct store, taking up multiple array indexes. The binary search accounts for this, and tries to return the left-most index of the wide struct.

We scan the array at the end to shrink the wide structs into a single position in the array.

@en-tcelvi
Copy link

@dmonad Can we bump the priority on this one please?

dmonad added a commit that referenced this pull request Jul 17, 2023
@dmonad
Copy link
Member

dmonad commented Jul 17, 2023

Thanks @NilSet! I spent the week and thought more about this problem.

This PR adds special behavior to how we interact with the data structure that stores the structs. In Yjs, we use a simple Array. In Yrs, we created a special data structure. Eventually, I want to do the same in Yjs. Hence, I feel we shouldn't add additional complexity specific to StructStore using Arrays & binary search. It should be possible to keep the data structure in a valid state without duplicates. A better data structure will eventually avoid splicing altogether.

I distilled the idea to a simpler fix #553 (admittedly covering fewer cases). It also avoids iterating again through the structs to filter out the duplicates. This should fix #541 as well.

@dmonad dmonad closed this Jul 17, 2023
dmonad added a commit that referenced this pull request Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow apply when garbage collecting large update
3 participants