Skip to content

refactor(utils): improve performance of copyEmptyArrayProps function#1816

Merged
danrleyt merged 4 commits into
masterfrom
PANGOLIN-2073-sync-actions-improve-performance
Nov 14, 2022
Merged

refactor(utils): improve performance of copyEmptyArrayProps function#1816
danrleyt merged 4 commits into
masterfrom
PANGOLIN-2073-sync-actions-improve-performance

Conversation

@qmateub
Copy link
Copy Markdown
Contributor

@qmateub qmateub commented Nov 9, 2022

Summary

Refactors a little bit the way copyEmptyArrayProps is implemented in order to seek a better performance specially for cases with a huge amount of items in an array.

Description

Context: We (pangolins) do a heavy use of this library in Audit Log. We were observing some customer comparisons that were taking a huge amount of time and when debugging that led us to spotting sync actions library as the one that was causing the delay.

After some investigation, we found that the library was not really performant for cases when customers with a huge amount of addresses where compared between versions. In our case it was a customer with almost 13k addresses in the old and new version.

In this PR we propose a solution by changing the implementation to remove the usage of spread operators inside a reduce in favour of attribute assignments or even the usage of a map instead of array methods which usually led to the O(n^2) problem More info

Todo

  • Tests
    • Unit
    • Integration
    • Acceptance
  • Documentation
  • Type label for the PR

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Nov 9, 2022

🦋 Changeset detected

Latest commit: 85daa2e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@commercetools/sync-actions Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

const merged = {
...newObj,
...nextObject,
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This combined object was not needed as it is exactly what the accumulator does in the reduce.

const hashMapValue = value.reduce((acc, val) => {
acc[val.id] = val
return acc
}, {})
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a small trick to avoid the usage of find in a situation of array of array (of arrays). The difference between both approaches was remarkable when debugging this locally.

With hashmap it was less than 100 ms
With .find around 6 seconds

Copy link
Copy Markdown

@angel-marin-ct angel-marin-ct Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a pattern I will name this trick a "dictionary"... you could even do this a little trickier

 const valueDict = Object.assign({}, ...value.map( x=>( { [x.id]: x}) ) )

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw the object assign method is a bit slower, probably because you are mapping through all of the values three times (once for the map, once for the spread, once for the assignment).

If it's a large array (100k elements) the reduce approach takes ~2ms whereas the object assign approach takes ~50ms.

// Create an array for testing.
const testArray = new Array(100_000).fill(null).map((_, index) => ({ id: index }));

// Store individual iteration times.
const reduceTimes = [];
const objectAssignTimes = [];

const testIterations = 1_000;
for (let i = 0; i < testIterations; i++) {
  // Run reduce method.
  const startTimeReduce = Date.now();
  testArray.reduce((acc, element) => {
    acc[element.id] = element;
    return acc;
  }, {});
  reduceTimes.push(Date.now() - startTimeReduce);

  // Run object assign method.
  const startTimeObjectAssign = Date.now();
  Object.assign({}, ...testArray.map((element) => ({ [element.id]: element })));
  objectAssignTimes.push(Date.now() - startTimeObjectAssign);
}

// Log results.
console.log({ totalTimeReduce: reduceTimes.reduce((acc, val) => acc + val) / testIterations });
console.log({
  totalTimeObjectAssign: objectAssignTimes.reduce((acc, val) => acc + val) / testIterations,
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... I just said trickier :-)

Thanks for the testing!

Another interesting approach will be if Array.prototype.group proposal gets approved. It won't return exactly a dictionary because the values will be an array...

But it will read nicely:

array.group( x=>x.id ) 

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both for the input and the testing around this. It was really a nice learning experience and I think this improves the library for those cases that we see big numbers.

Just for the record, the case that we were checking was a customer from cimpress that had almost 13k addresses defined in it and the size of the entity was almost 9MB 😨 💥

)
/* eslint-disable no-param-reassign */
newObj[key][i] = nestedObject
merged[key][i] = nestedObject
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means that we can just reassign this to the accumulator.

[key]: isNil(newObj[key]) ? [] : newObj[key],
}
merged[key] = isNil(newObj[key]) ? [] : newObj[key]
return merged
Copy link
Copy Markdown
Contributor Author

@qmateub qmateub Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using the approach of spread + assignment, we assign first and then return the merged object to avoid the spread operator inside the .reduce

[key]: nestedObject,
}
merged[key] = nestedObject
return merged
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here 👍

expect(old).toEqual(oldObj)
expect(fixedNewObj).toEqual(newObj)
expect(end - start).toBeLessThan(100)
})
Copy link
Copy Markdown
Contributor Author

@qmateub qmateub Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick test to verify this. It is using performance from node in order to verify the function.

You can run this test in main to validate that there were performance issues with this scenario 👍

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or checkout 97f3638 and run it there)

@qmateub qmateub requested a review from ajimae November 9, 2022 15:00
@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 9, 2022

Codecov Report

Merging #1816 (85daa2e) into master (b07dbae) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1816   +/-   ##
=======================================
  Coverage   94.64%   94.64%           
=======================================
  Files         141      141           
  Lines        4875     4877    +2     
  Branches     1332     1332           
=======================================
+ Hits         4614     4616    +2     
  Misses        258      258           
  Partials        3        3           
Impacted Files Coverage Δ
...s/sync-actions/src/utils/copy-empty-array-props.js 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@qmateub qmateub requested review from a team and removed request for emmenko and tdeekens November 9, 2022 15:26
@qmateub qmateub force-pushed the PANGOLIN-2073-sync-actions-improve-performance branch from f33aa97 to 582fbb5 Compare November 10, 2022 08:33
Copy link
Copy Markdown
Contributor

@danrleyt danrleyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the brilliant contribution, that will also help us immensely.

@danrleyt danrleyt merged commit cad54c4 into master Nov 14, 2022
@danrleyt danrleyt deleted the PANGOLIN-2073-sync-actions-improve-performance branch November 14, 2022 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants