Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix (runtime): arraysDiff() with duplicates #263

Merged
merged 4 commits into from
Jun 10, 2024

Conversation

angelsolaorbaiceta
Copy link
Owner

@angelsolaorbaiceta angelsolaorbaiceta commented Jun 9, 2024

arraysDiff() handles duplicate entries

Closes #260 .

As noted by @terrence-ou , the current implementation of the arraysDiff() function doesn't deal with duplicate values.
For example:

const oldArray = []
const newArray = [1, 1]

arraysDiff(oldArray, newArray)

Would incorrectly return:

{
  added: [1],
  removed: [],
}

When the correct solution would have been that two 1s were added:

{
  added: [1, 1],
  removed: [],
}

This error didn't surface earlier because, the only use of the arraysDiff() function is to find the CSS classes that were added/removed from an element.
The browser deduplicates the CSS classes added to an element, and thus, this error has no visible effect in the inner working of the framework.
It's nevertheless a very interesting problem.

The Problem

The problem is in how the added and removed items are searched for:

  • added: newArray.filter((newItem) => !oldArray.includes(newItem))
  • removed: oldArray.filter((oldItem) => !newArray.includes(oldItem))

That includes() usage stops at the first match, thus missing duplicates.

##聽Solution

Solving this problem requires to count how many times a given item appears in each array.
A count map is created for each array for this purpose:

export function arraysDiff(oldArray, newArray) {
  const oldsCount = makeCountMap(oldArray)
  const newsCount = makeCountMap(newArray)

   ...
}

The makeCountMap() function is straightforward to implement:

export function makeCountMap(array) {
  const map = new Map()

  for (const item of array) {
    map.set(item, (map.get(item) || 0) + 1)
  }

  return map
}

Then, we diff the keys of both maps:

const diff = mapsDiff(oldsCount, newsCount)

Note

Note that we need to use maps and not objects, because objects don't allow types other than string and Symbol to be the keys.
Maps don't impose a type to be used as keys.

The mapsDiff() function implementation is equivalent to the objectsDiff() one, but this time, the keys can be anything and not just strings or Symbols:

export function mapsDiff(oldMap, newMap) {
  const oldKeys = Array.from(oldMap.keys())
  const newKeys = Array.from(newMap.keys())

  return {
    added: newKeys.filter((key) => !oldMap.has(key)),
    removed: oldKeys.filter((key) => !newMap.has(key)),
    updated: newKeys.filter(
      (key) => oldMap.has(key) && oldMap.get(key) !== newMap.get(key)
    ),
  }
}

Then, in the arraysDiff() function, we have not only to see what items were added or removed, but count how many times they were added or removed:

export function arraysDiff(oldArray, newArray) {
  const oldsCount = makeCountMap(oldArray)
  const newsCount = makeCountMap(newArray)
  const diff = mapsDiff(oldsCount, newsCount)

  // Added items repeated as many times as they appear in the new array
  const added = diff.added.flatMap((key) =>
    Array(newsCount.get(key)).fill(key)
  )

  // Removed items repeated as many times as they appeared in the old array
  const removed = diff.removed.flatMap((key) =>
    Array(oldsCount.get(key)).fill(key)
  )

  ...
}

And lastly, for the keys that were updated (items that appear in both arrays, but in a different number) we want to calculate the difference between how many times it appears in the new versus the old array.
Based on the result, we include it as addition or removal:

  // Updated items have to check the difference in counts
  for (const key of diff.updated) {
    const oldCount = oldsCount.get(key)
    const newCount = newsCount.get(key)
    const diff = newCount - oldCount

    if (diff > 0) {
      added.push(...Array(diff).fill(key))
    } else {
      removed.push(...Array(-diff).fill(key))
    }
  }

And hence the complete implementation is:

export function arraysDiff(oldArray, newArray) {
  const oldsCount = makeCountMap(oldArray)
  const newsCount = makeCountMap(newArray)
  const diff = mapsDiff(oldsCount, newsCount)

  // Added items repeated as many times as they appear in the new array
  const added = diff.added.flatMap((key) =>
    Array(newsCount.get(key)).fill(key)
  )

  // Removed items repeated as many times as they appeared in the old array
  const removed = diff.removed.flatMap((key) =>
    Array(oldsCount.get(key)).fill(key)
  )

  // Updated items have to check the difference in counts
  for (const key of diff.updated) {
    const oldCount = oldsCount.get(key)
    const newCount = newsCount.get(key)
    const delta = newCount - oldCount

    if (delta > 0) {
      added.push(...Array(delta).fill(key))
    } else {
      removed.push(...Array(-delta).fill(key))
    }
  }

  return {
    added,
    removed,
  }
}

Thanks @terrence-ou for reporting this error! 馃

@terrence-ou
Copy link
Contributor

Hi @angelsolaorbaiceta

Thanks for the detailed explanation! Your code is an excellent example of practicing SOLID principles. I always learn a lot from your implementations and explanations.

This update perfectly solves the duplication problems. Since the sequence of items in the output arrays is not essential, I updated the testing method to only consider the count of items being added or removed:

  it("array items being removed", () => {
    const oldArr = [1, 1, 2, 2, 1, 4, 3, 4, 5, 1, 1, 5, 7, 8, 9];
    const newArr = [1, 2, 3, 4];
    const diff = arraysDiff(oldArr, newArr);
    expect(makeCountMap(diff.added)).toEqual(new Map([]));
    expect(makeCountMap(diff.removed)).toEqual(
      new Map([
        [1, 4],
        [2, 1],
        [4, 1],
        [5, 2],
        [7, 1],
        [8, 1],
        [9, 1],
      ]),
    );
  });

I have one additional thought regarding the for loop in the arraysDiff(oldArray, newArray) function:

for (const key of diff.updated) {
  const oldCount = oldsCount.get(key)
  const newCount = newsCount.get(key)
  const diff = newCount - oldCount

  if (diff > 0) {
    added.push(...Array(diff).fill(key))
  } else {
    removed.push(...Array(-diff).fill(key))
  }
}

The variable name diff is similar to the object diff we鈥檙e looping over. While it won鈥檛 break the program or tests, it could potentially cause some confusion. Do you think we could use a name like diffCount to avoid such confusion?

@angelsolaorbaiceta
Copy link
Owner Author

Hi @angelsolaorbaiceta

Thanks for the detailed explanation! Your code is an excellent example of practicing SOLID principles. I always learn a lot from your implementations and explanations.

This update perfectly solves the duplication problems. Since the sequence of items in the output arrays is not essential, I updated the testing method to only consider the count of items being added or removed:

  it("array items being removed", () => {
    const oldArr = [1, 1, 2, 2, 1, 4, 3, 4, 5, 1, 1, 5, 7, 8, 9];
    const newArr = [1, 2, 3, 4];
    const diff = arraysDiff(oldArr, newArr);
    expect(makeCountMap(diff.added)).toEqual(new Map([]));
    expect(makeCountMap(diff.removed)).toEqual(
      new Map([
        [1, 4],
        [2, 1],
        [4, 1],
        [5, 2],
        [7, 1],
        [8, 1],
        [9, 1],
      ]),
    );
  });

I have one additional thought regarding the for loop in the arraysDiff(oldArray, newArray) function:

for (const key of diff.updated) {
  const oldCount = oldsCount.get(key)
  const newCount = newsCount.get(key)
  const diff = newCount - oldCount

  if (diff > 0) {
    added.push(...Array(diff).fill(key))
  } else {
    removed.push(...Array(-diff).fill(key))
  }
}

The variable name diff is similar to the object diff we鈥檙e looping over. While it won鈥檛 break the program or tests, it could potentially cause some confusion. Do you think we could use a name like diffCount to avoid such confusion?

You are very right! I'll rename that variable to make it clearer.
Thanks for the feedback.

@angelsolaorbaiceta angelsolaorbaiceta merged commit c58d78c into main Jun 10, 2024
5 checks passed
@angelsolaorbaiceta angelsolaorbaiceta deleted the fix/arrays-diff-with-dups branch June 10, 2024 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The arraysDiff function might be failed in some special cases.
2 participants