FlatMerge performance issue #287
> You call

I can have a look and maybe change something for version 7. Without knowing the code, I think we can either add an option to not call this, or do an equality check based on reflection - that should still be much faster than JSON serialization.
> I call the
This merge request seems to try to address this issue; sadly it is a bit bloated and looks stale. The question really is what constitutes equality for a component. Personally I would not go down the reflection road, as it does not win any performance prizes either. Since the model is static, it seems like a good job for a code generator.
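A code generator would essentially emit a field-by-field comparison per model type. A minimal sketch in Python (the library itself is C#; the class name and fields here are hypothetical stand-ins, not the actual CycloneDX model):

```python
class Component:
    """Hypothetical stand-in for a model class; fields are illustrative only."""

    def __init__(self, name, version, purl=None):
        self.name = name
        self.version = version
        self.purl = purl

    def __eq__(self, other):
        # Generated code compares each field directly -- no serialization,
        # no reflection, just plain member access.
        if not isinstance(other, Component):
            return NotImplemented
        return (self.name == other.name
                and self.version == other.version
                and self.purl == other.purl)

    def __hash__(self):
        # A matching hash lets components be deduplicated via sets/dicts.
        return hash((self.name, self.version, self.purl))
```

A generator would produce one such equality member per model class, so the code stays in sync with the static model without hand maintenance.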
I will have a look at the pull request. Generally, I obviously want a solution that is quick to implement but requires little maintenance when the model changes in the future. I will mark this as help wanted; maybe somebody wants to build a code generator, as it is a rather "cool" thing to build.
Isn't the problem here much more the way in which the merge is performed?

cyclonedx-dotnet-library/src/CycloneDX.Utils/Merge.cs, lines 144 to 147 in 1e3886b
Let's focus only on the components. This implies that we first merge components1 with components2, then with components3, and so on. Each of these merges is performed using ListMergeHelper, which merges 2 lists. If we merge two lists with N_1 and N_2 entries, respectively, which are all distinct, then because of

cyclonedx-dotnet-library/src/CycloneDX.Utils/Merge.cs, lines 35 to 37 in 1e3886b
we compute the serialization for each entry in list1 N_2 times, and for the first entry in list2 N_1 times, for the next N_1+1 times, etc., leading to

N_1*N_2 + N_1 + (N_1+1) + ... + (N_1+N_2-1) = 2*N_1*N_2 + N_2*(N_2-1)/2

serialization calls. The important bit is that this leads to a large number of serialization calls (quadratic complexity in terms of serialization calls), and you are doing this repeatedly for all lists to be merged. You can improve it in two steps:

Step 1:

Step 2:
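The fix described above can be sketched as follows: compute each entry's serialized form exactly once and deduplicate through a dictionary keyed on it, so the number of serializer invocations drops from quadratic to linear. This is an illustrative Python sketch, not the library's C# code; `json.dumps` stands in for the actual CycloneDX serializer, and the call counter just makes the complexity visible:

```python
import json

def merge_lists_quadratic(list1, list2):
    # Naive approach, mirroring the behavior discussed above: the serializer
    # runs inside the comparison, so every pairwise check re-serializes
    # both entries.
    calls = 0
    def key(item):
        nonlocal calls
        calls += 1
        return json.dumps(item, sort_keys=True)  # stand-in for the serializer
    result = list(list1)
    for candidate in list2:
        if not any(key(existing) == key(candidate) for existing in result):
            result.append(candidate)
    return result, calls

def merge_lists_linear(list1, list2):
    # Improved approach: serialize each entry exactly once and deduplicate
    # via a dictionary keyed on the serialized form -- one serializer call
    # per element overall instead of one per pairwise comparison.
    calls = 0
    def key(item):
        nonlocal calls
        calls += 1
        return json.dumps(item, sort_keys=True)
    seen = {}
    for item in list1 + list2:
        seen.setdefault(key(item), item)  # first occurrence wins
    return list(seen.values()), calls
```

For two lists of N_1 and N_2 mostly distinct entries, the first variant performs on the order of N_1*N_2 serializer calls, while the second performs exactly N_1+N_2.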
Based on this SBOM https://github.com/CycloneDX/bom-examples/blob/0979663521c4623792dc432d09f88bcb85862a62/SBOM/juice-shop/v11.1.2/bom.json
it reduces the time for the merge on my computer from roughly 35s to 1s.
Thanks for spending time on this, and sorry for the late response; I only managed to check on it now. For my current test case the improvement is roughly 7.5x (2709ms -> 360ms), so this was already a significant step in the right direction. 👍 A quick profiler run on your PR showed that 3/4 of the time is still spent in the JSON serializer, so I guess for the final solution there is no way around implementing equality without the JSON detour. Looks like you are already part of the merging discussion CycloneDX/specification#320, though I could not quite grasp whether the question of what constitutes equality between components has really been answered yet.
Looks like a good example of the 20:80 rule here. If Andreas fixes the warnings, I think we can merge it and build a patch version.
Completely on board with this. It's a terrible solution anyway, and it sometimes causes problems when testing too.
Yes, the outcome of this discussion is still open. In general, depending on the use case, there might be different notions of equivalent components. The current hash-based implementation can only handle "exact" equality. (However, as a starting point this is fine IMO.) In addition, I'm not surprised that the serialization is still the dominating part; I guess step 2 could give another speedup in the same order of magnitude.
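The "exact" equality mentioned above can be pictured as a content hash over a canonical serialized form, computed lazily and cached so each component is serialized at most once regardless of how many comparisons it takes part in. A hedged Python sketch (hypothetical class, not the library's implementation; `json.dumps` with sorted keys stands in for canonical serialization):

```python
import hashlib
import json

class HashedComponent:
    """Illustrative only: 'exact' equality via a cached content hash."""

    def __init__(self, data):
        self.data = data
        self._hash = None  # computed lazily, at most once per instance

    def content_hash(self):
        if self._hash is None:
            # Canonical form: any field difference changes the hash, which
            # is why this scheme can only express exact equality.
            canonical = json.dumps(self.data, sort_keys=True)
            self._hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        return self._hash

    def __eq__(self, other):
        return (isinstance(other, HashedComponent)
                and self.content_hash() == other.content_hash())

    def __hash__(self):
        return hash(self.content_hash())
```

Looser notions of equivalence (e.g. "same purl, ignore metadata") would need a different key function; the caching idea stays the same.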
edit: improve_merge_performance_2:
@jwfx Thanks. It surprises me a bit that the speed-up based on name equality is not larger, as the serialization should only be triggered in case of name collisions - i.e., I would expect the JSON serialization to show up much less in the profiling. Do your SBOMs only have a large number of components, or are there other large lists of entities?
55, 51, 87, 57, 82, 14, 40, 96, 60, 34, 66, 126, 140, 1, 47 - these are the component counts in the 15 SBOMs I'm merging. To make the consumed time a bit more significant, I merge that list with itself 5 times. Other than components and dependencies there is pretty much nothing in there.
Another test with production data: 16 SBOMs with 379, 215, 230, 218, 255, 214, 214, 222, 216, 253, 226, 218, 426, 252, 270, 50 components merged into 528 unique components.
#300 looks like a good candidate for merging. |
As of 6.0.0 it looks like JSON serialization is used to establish equality for components. This seems to greatly hurt merging performance. In my case about 70% of the time is spent on the serializer.

cyclonedx-dotnet-library/src/CycloneDX.Core/Models/Component.cs, lines 218 to 221 in 1e3886b
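The pattern described - establishing equality by serializing both sides and comparing the output - can be sketched like this (illustrative Python stand-in, not the referenced C# code; `json.dumps` plays the role of the JSON serializer):

```python
import json

def equals_via_serialization(a, b):
    # Equality pattern described in the issue: serialize both objects and
    # compare the resulting strings. Correct, but it runs a full serializer
    # on every single comparison, which is why it dominates merge time.
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)
```

Called once, this is harmless; called inside a pairwise merge loop, the serializer runs a quadratic number of times, matching the ~70% profiler share reported above.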