Remove incorrect utf8 conversion of ResultCache keys #16569

Merged on Jun 12, 2024 (14 commits)

kgyrtkirk
Member

  • Remove the incorrect call that read the byte[] as UTF-8.
  • I'm not sure where that call came from, but removing it pushes the contract onto Cache implementations: they must handle the byte[] in NamedKey as-is. It's already a byte[], so they shouldn't expect anything more than that.
  • Use Cache.NamedKey instead of passing a byte[] around and creating the key in multiple places.
  • Added tests to ensure that cache implementations can accept such keys.
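
To illustrate why reading an arbitrary byte[] key as UTF-8 is unsafe, here is a minimal standalone sketch. It does not use any Druid classes; the class and method names are hypothetical. It relies only on the documented behavior of `new String(byte[], Charset)`, which replaces malformed input with U+FFFD, so two different byte arrays can decode to the same String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Druid.
final class Utf8KeyCollisionDemo
{
  // Decodes raw key bytes as UTF-8; this is the kind of lossy conversion the PR removes.
  static String decodeAsUtf8(byte[] key)
  {
    return new String(key, StandardCharsets.UTF_8);
  }

  public static void main(String[] args)
  {
    // Two different keys; neither byte is valid UTF-8 on its own.
    byte[] keyA = {(byte) 0xFF};
    byte[] keyB = {(byte) 0xFE};

    // The raw bytes differ...
    System.out.println(Arrays.equals(keyA, keyB)); // false
    // ...but both decode to the replacement character U+FFFD, so the Strings collide.
    System.out.println(decodeAsUtf8(keyA).equals(decodeAsUtf8(keyB))); // true
  }
}
```

Keeping the key as a byte[] all the way to the cache implementation avoids this class of collision.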

Fixes #16552

@kgyrtkirk added the Bug label on Jun 6, 2024
@kgyrtkirk requested a review from gianm on Jun 6, 2024 17:19
Member

@clintropolis left a comment

lgtm 👍

@@ -34,6 +34,8 @@

public class CacheUtil
{
  private static final String RESULT_CACHE_NS = "ResultCacheNS";
Member

nit: maybe make a version of NamedKey for the result cache where nsBytes is precomputed, so we don't have to call StringUtils.toUtf8 on the constant every time. We could also probably use a shorter string: the segment-level cache uses segment identifiers for the namespace, so collisions are unlikely.
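
A rough sketch of what "nsBytes already computed" could look like. This is a hypothetical class written only for illustration, not Druid's Cache.NamedKey; the byte layout (length prefix, namespace bytes, key bytes) is an assumption, and the namespace string is copied from the diff hunk above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical key class for illustration; this is not Druid's Cache.NamedKey.
final class ResultCacheKeySketch
{
  private static final String NAMESPACE = "ResultCacheNS";
  // Encoded once when the class is loaded instead of on every toByteArray() call.
  private static final byte[] NAMESPACE_BYTES = NAMESPACE.getBytes(StandardCharsets.UTF_8);

  private final byte[] key;

  ResultCacheKeySketch(byte[] key)
  {
    // Defensive copy so the caller cannot mutate the key afterwards.
    this.key = key.clone();
  }

  // Assumed layout: 4-byte namespace length, namespace bytes, then the raw key bytes.
  byte[] toByteArray()
  {
    return ByteBuffer.allocate(Integer.BYTES + NAMESPACE_BYTES.length + key.length)
                     .putInt(NAMESPACE_BYTES.length)
                     .put(NAMESPACE_BYTES)
                     .put(key)
                     .array();
  }
}
```

Whether this saves anything measurable depends on how often the serialized form is actually requested by a given cache implementation.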

Member Author

I've changed the result cache namespace to RES.

The toUtf8 part is a bit more complicated:

  • I wanted to move the String/byte[] conversion out of NamedKey and change the namespace to a byte[];
    • however, MapCache directly accesses the namespace field of the NamedKey class
      • since it's used as a key in a Map, it must provide a valid equals; this could be addressed with a small refactor there to use ByteBuffer
    • the namespace also appears as an argument to Cache#close, so that would more or less have to change as well
      • this could force changes in the segment cache key generation code
    • ...so this path leads to refactor(s)
  • I could store the byte[] next to the String by also passing it in the constructor, but I would rather not do that unless there are measurable benefits
  • the status quo also has some pros: toByteArray is not used in all cache implementations, so toUtf8 might not even be called

I think for now it would be best to leave this alone; no longer passing the full cache key as the namespace has already reduced the key sizes to around half.
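
For context on the "valid equals" point above: a raw byte[] uses identity-based equals and hashCode, so it cannot serve directly as a Map key, whereas ByteBuffer compares by content. The following standalone sketch shows the difference. It is not Druid's MapCache; the class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo, not Druid's MapCache.
final class ByteBufferKeyDemo
{
  // Wraps a copy so later mutation of the caller's array cannot change the map key.
  static ByteBuffer asKey(byte[] bytes)
  {
    return ByteBuffer.wrap(bytes.clone());
  }

  public static void main(String[] args)
  {
    byte[] first = {1, 2, 3};
    byte[] second = {1, 2, 3};

    Map<byte[], String> byArray = new HashMap<>();
    byArray.put(first, "value");
    // Arrays inherit identity-based equals/hashCode from Object.
    System.out.println(byArray.containsKey(second)); // false

    Map<ByteBuffer, String> byBuffer = new HashMap<>();
    byBuffer.put(asKey(first), "value");
    // ByteBuffer equals/hashCode are based on the remaining content.
    System.out.println(byBuffer.containsKey(asKey(second))); // true
  }
}
```

One caveat with this approach: a ByteBuffer's hashCode depends on its current position and limit, so a buffer used as a key must not be read from or otherwise repositioned while it is in the map.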

Member

Cool. Re: the complicated part, I guess I was imagining just subclassing it or something so that parts could be constant, but if it's complicated then it seems fine to skip.

This reverts commit 9214137.
This reverts commit 3dfdd10.
This reverts commit c7ba47e.

Revert "Revert "Revert "d-old"""

This reverts commit 1f13b1d.
@clintropolis merged commit f8645de into apache:master on Jun 12, 2024
88 checks passed
Linked issue: Result level cache key collisions from utf8 encoding
2 participants