Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace use of std String.hashCode() with safer alternative #21

Closed
cowtowncoder opened this issue Jul 6, 2012 · 3 comments
Closed

Replace use of std String.hashCode() with safer alternative #21

cowtowncoder opened this issue Jul 6, 2012 · 3 comments

Comments

@cowtowncoder
Copy link
Member

There have been some attacks (DoS) that make use of collisions in String.hashCode() values (or at least their low-bits collisions).
The one place where this matters most is in handling of symbol table: although Jackson actually does not directly use String.hashCode(), internal calculation is along the same lines.

This should be changed by, for example:

  • Using a non-constant seed value for calculation starting point; this could use system time as base, and needs to be something that varies between different runs
  • Append or prepend String length
  • Perhaps even use different multiplier during different runs? (31 vs 37 vs 39)

Care needs to be taken as this is one of more performance critical paths.

@cowtowncoder
Copy link
Member Author

Ok, some learnings:

  • append/prepend of a per-map seed does not help with std JDK hashCode() (or its variations), since they fundamentally prone to cheap substring-concatenation style generation of collisions.
  • length() won't either, as substring-based strings can have same length as well.

Some remaining practical alternatives include:

  • Use same hash code algo as Perl, which uses shifting; downside is that this is rather slow (twice as slow) -- however, while this is true for stand-alone calculation, it might not be measurable in big picture.
  • Use Adler-32 variation: this would be much faster (in fact, even marginally faster than original hashCode()!); but I need to verify that it can benefit from seed

In both cases it is important to note that per-Map seed value should make it impractical to pre-calculate collisions.

@cowtowncoder
Copy link
Member Author

With some testing, found out that Adler-32 is not (alas!) a good alternative; number of collisions is surprisingly high.

So: with that, changes to make will be:

  • For byte-based variant, add shifting, make less linear; should pretty much fix the problem
  • For char-based variant, add post-shuffling, use better multiplier. Will NOT fix substring problem, just improves non-malevolent cases
  • For both, add checking so that if max-collisions-length exceeds, exception thrown: assumption being that collisions may be still calculatable.

NOTE: this does NOT fix potential issue with ObjectNode; that is covered by another Issue.

@cowtowncoder
Copy link
Member Author

On versions: fixes included in upcoming releases:

  • 1.9.9 for 1.x series (not backported as previous branches are closed)
  • 2.0.5 for 2.0; and 2.1.0 when 2.1 gets released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant