
Cache keys use randomized Python hash(), breaking cache sharing between workers #3450

@MaxGhenis

Description

Summary

make_cache_key builds its key with Python's built-in hash() on a string. String hashing is randomized per interpreter process (controlled by PYTHONHASHSEED), so each Gunicorn worker computes a different cache key for the same request and the Redis hit rate is near zero. Hash collisions can also serve a wrong cached response to a client.

Location

policyengine_api/utils/cache_utils.py:25

What goes wrong

cache_key = str(hash(flask.request.full_path + data))
  • hash("abc") returns a different integer in every Python process (since 3.3, string hashing is randomized unless PYTHONHASHSEED=0). Multiple Gunicorn workers never share cache keys, so cache coverage is effectively per-worker.
  • hash returns a signed 64-bit integer; collisions are far more likely than with a cryptographic digest. On a collision, the cache will happily return stored bytes that belong to a different request — a correctness bug for any endpoint using this key.
  • The key is also unstable across deploys, invalidating the cache on every restart.
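The per-process randomization in the first bullet is easy to reproduce. This is a small sketch (not from the issue) that runs hash('abc') in two fresh interpreter subprocesses, with PYTHONHASHSEED forced to "random" so the demonstration doesn't depend on the caller's environment:

```python
import os
import subprocess
import sys

# Run hash('abc') in two separate interpreter processes. With randomized
# string hashing, each process derives a different hash seed, so the
# "same" cache key never matches across Gunicorn workers.
cmd = [sys.executable, "-c", "print(hash('abc'))"]
env = {**os.environ, "PYTHONHASHSEED": "random"}  # force per-process randomization
h1 = int(subprocess.run(cmd, capture_output=True, text=True, env=env).stdout)
h2 = int(subprocess.run(cmd, capture_output=True, text=True, env=env).stdout)
print(h1, h2)  # two different integers for the same input string
```

Setting PYTHONHASHSEED=0 would make the two values agree, but pinning the seed only fixes cross-worker sharing; it does nothing for the collision and cross-deploy concerns below.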

Suggested fix

Use a stable, collision-resistant digest:

import hashlib

payload = (flask.request.full_path + data).encode("utf-8")
cache_key = hashlib.sha256(payload).hexdigest()

(Include request method too, to avoid collapsing GET/POST with identical bodies.)
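Folding the method in could look like the sketch below. The helper name and the "/calculate" path are illustrative assumptions, not PolicyEngine API code:

```python
import hashlib

def make_cache_key(method: str, full_path: str, data: str) -> str:
    # Hypothetical helper: stable, collision-resistant key that also
    # distinguishes GET/POST requests with identical paths and bodies.
    payload = f"{method}:{full_path}:{data}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

get_key = make_cache_key("GET", "/calculate?x=1", '{"a": 1}')
post_key = make_cache_key("POST", "/calculate?x=1", '{"a": 1}')
```

Unlike hash(), the digest is identical in every worker and across restarts, so the keys survive deploys and the cache is shared process-wide.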

Severity

Medium.


Labels

bug
