Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant inefficiencies in stitchIntoRecord #12

Closed
far-blue opened this issue Aug 21, 2019 · 4 comments
Closed

Significant inefficiencies in stitchIntoRecord #12

far-blue opened this issue Aug 21, 2019 · 4 comments

Comments

@far-blue
Copy link
Contributor

Hi all,

I'm testing pushing real-world levels of data through a microservice I've written using Atlas and I've noticed some significant inefficiencies in the various stitchIntoRecord methods.

For instance, the OneToOne class method in combination with RegularRelationship's stitchIntoMethods() method basically loops through every nativeRecord in turn and then loops through every foreignRecord to match them up. However, it neither exits early when a match is found nor removed matched foreignRecords. So with 1000 nativeRecords and their associated 1000 foreignRecords you end up with 1,000,000 calls to recordsMatch().

I'd like to suggest that:

  • The stitchIntoRecord abstract be adjusted to accept the foreignRecords array by reference
  • The OneToOne method both removes a matched foreignRecord and also breaks the loop early
  • The ManyToOne method breaks early but doesn't remove the foreignRecord from the array
  • The OneToMany method removes the matched foreignRecord from the array but doesn't break the loop early.

I don't think there are further optimisations available for the other relationship types.

In the case of OneToOne the result is a change from O(n^2) to O(2n). In testing with 3000 records in my app this changed execution time from over 2 mins to 5 seconds.

Do these changes sound correct and reasonable or have I missed something? If there's no reason against the changes I'm happy to put together a PR.

One point with these changes will be that in the case of duplicate records the behaviour will change. If there are duplicates foreignRecord entries, where currently the last record that matches in the foreignRecords array will always be assigned with my changes the first will be assigned instead for OneToOne and ManyToOne. If there are duplicate nativeRecord entries, where currently all duplicates will receive the same foreignRecord they will now receive different ones and possibly no match will be found for duplicates in the case of OneToOne and OneToMany as the foreignRecord entries are removed as they are consumed.

Thoughts from anyone?

@pmjones
Copy link
Contributor

pmjones commented Aug 21, 2019

In the case of OneToOne the result is a change from O(n^2) to O(2n). In testing with 3000 records in my app this changed execution time from over 2 mins to 5 seconds.

You had me at O-notation.

If there's no reason against the changes I'm happy to put together a PR.

Please do!

@froschdesign
Copy link

My suggestion is to use benchmarking and define some scenarios for this case. PhpBench can help here.

@far-blue
Copy link
Contributor Author

The most significant improvements with my suggested changes are with 1-2-1 relationships where my own testing has shown massive improvements but I think a more general approach using common uses of Atlas and profiling for bottlenecks would be great :)

@pmjones
Copy link
Contributor

pmjones commented Sep 1, 2019

Fixed by #13

@pmjones pmjones closed this as completed Sep 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants