Fixes AB#1261123 (feedback ticket)
Checking metadata for matches took time proportional to (#fragments in item) * (#fragments in item to remove) * (#metadata). For many users, this was extremely slow.
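Put fragments into a custom MetadataSet object before checking for a match. This takes it from O(n * r * m) to O(m * (n + r)), where n is the number of fragments in the item, r is the number of fragments in the item to remove, and m is the number of metadata. Since m is normally small, this is a substantial improvement.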
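To make the complexity claim concrete, here is a minimal sketch in Python rather than the project's own code. It assumes items are plain dictionaries from metadata name to value and uses a built-in set of tuples as the lookup; `Item`, `signature`, `remove_naive`, and `remove_with_set` are hypothetical names for illustration, and the actual MetadataSet in this PR is structured differently.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for an item: metadata name -> metadata value.
Item = Dict[str, str]


def signature(item: Item, metadata_names: Sequence[str]) -> Tuple[str, ...]:
    """Collect the values of the matched metadata in a fixed order: O(m)."""
    return tuple(item.get(name, "") for name in metadata_names)


def remove_naive(
    items: List[Item], to_remove: List[Item], metadata_names: Sequence[str]
) -> List[Item]:
    """O(n * r * m): every item is compared against every removal candidate."""
    return [
        item
        for item in items
        if not any(
            all(item.get(k, "") == other.get(k, "") for k in metadata_names)
            for other in to_remove
        )
    ]


def remove_with_set(
    items: List[Item], to_remove: List[Item], metadata_names: Sequence[str]
) -> List[Item]:
    """O(m * (n + r)): build the lookup once (r * m), then probe once per item (n * m)."""
    seen = {signature(other, metadata_names) for other in to_remove}
    return [item for item in items if signature(item, metadata_names) not in seen]
```

Both functions return the same result; only the number of comparisons differs. The sketch treats missing metadata as an empty string and compares values case-sensitively, which are simplifying assumptions rather than statements about MSBuild's behavior.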
Time on a single operation went from ~180 seconds to ~150 ms for the penultimate version. It also passes @cdmihai's extensive unit tests.*
Is there any reason to keep the error for referencing multiple items? As a user, I would expect not-match-on-metadata to ignore metadata and match-on-metadata to ignore the item.
* Except one on paths. I'm not sure why it fails: it passes in VS but not from the command line. Will look into it more.
Also moves construction to the constructor.
BenVillalobos left a comment
I can theoretically see a use case for "Match on everything that has no metadata." Maybe we can use a special keyword/symbol here to represent that? Might be best to file that as an issue and see if that gets traction.
ladipro left a comment
Apologies for getting back to this after so many days.
Playing devil's advocate, imagine that instead of the Trie, we use a simple
Using your example from the code comment, the HashSet would contain something like:
so it wouldn't match the Forgind item with the "signature" of
You clearly get the advantage (disadvantage?
The biggest disadvantage in my mind is with reusing strings. Even if we concatenated them all in one step, we'd still make a new string for each item. If there are several metadata or the metadata are large (noting that these are metadata values, not metadata keys), that could end up consuming a lot of memory. This is a case in which StringTools would struggle: each metadata value, once concatenated with the others, could be unique, so we'd make a large number of unique large strings. Even when matching on just two metadata, the values can be large: "description" and "path" (with long paths enabled), say, could be hundreds or even thousands of characters. With a metadata trie, we don't make any new strings. With a single hash set, we would.
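The allocation difference can be sketched as follows. This is an illustration in Python, not the MetadataSet code from this PR: `MetadataTrie` and `concatenated_key` are hypothetical names, the trie is a plain nested dictionary with one level per matched metadata, and the separator used for concatenation is an assumption.

```python
from typing import Dict, Sequence


class MetadataTrie:
    """Hypothetical trie with one level per matched metadata name."""

    def __init__(self, metadata_names: Sequence[str]) -> None:
        self._names = list(metadata_names)
        self._root: dict = {}

    def add(self, item: Dict[str, str]) -> None:
        node = self._root
        for name in self._names:
            # Each key is a value string the item already holds,
            # so no new string is built while inserting.
            node = node.setdefault(item.get(name, ""), {})

    def contains(self, item: Dict[str, str]) -> bool:
        node = self._root
        for name in self._names:
            node = node.get(item.get(name, ""))
            if node is None:
                return False
        return True


def concatenated_key(
    item: Dict[str, str], metadata_names: Sequence[str], separator: str = "\x00"
) -> str:
    """Single-hash-set alternative: allocates one new string per item,
    whose length is the sum of all matched metadata values."""
    return separator.join(item.get(name, "") for name in metadata_names)
```

With the trie, two items that share a long "description" value share one key at that level. With concatenation, each distinct combination of values produces its own full-length string, which is the memory cost described above.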
Thank you for the detailed analysis!
Actually we do if