Cache get_dir_size() to avoid slow list rendering performance on /admin/core/snapshot/ #1035

@matthazinski

I noticed that listing snapshots is really slow. I found a few optimizations that seem to improve performance, but wanted to know whether there are any undesirable side effects:

  • get_dir_size() has to recurse through the snapshot directory of every snapshot on a page. It seems archive_size() caches the result, but only for the default Django cache TTL (300 seconds). Is there any reason we can't just set CACHES['default']['TIMEOUT'] = None so these keys don't expire by default?
  • ArchiveBox doesn't expose any options for choosing an external cache, which isn't great when running in ephemeral containers. I've had luck configuring django_redis in settings.py so the cache can be periodically written to disk. If there's interest, I can put up a PR that optionally enables this. (Alternatively, if there are no blockers to upgrading to Django>=4.0, we can use the built-in Redis cache backend.)
  • With a warm cache, Snapshot.from_json() requires a round trip to both the DB and the cache. Calling self.tags_str() with nocache=False seems to cut DB latency roughly in half, according to the Django Debug Toolbar.
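
For reference, here is a rough sketch of what the first two points could look like in settings.py. It is not ArchiveBox's actual configuration: the build_caches() helper and the CACHE_REDIS_URL environment variable are hypothetical names, and the local-memory fallback is only an assumption about what a default might be. The Redis backend path is the one built into Django>=4.0 (it needs the redis package installed); with django_redis the BACKEND would be "django_redis.cache.RedisCache" instead.

```python
import os


def build_caches(redis_url=None):
    """Return a Django CACHES dict whose keys never expire by default.

    Hypothetical helper for illustration only.
    """
    if redis_url:
        # Built-in Redis backend, available in Django >= 4.0.
        return {
            "default": {
                "BACKEND": "django.core.cache.backends.redis.RedisCache",
                "LOCATION": redis_url,
                "TIMEOUT": None,  # None means cached keys never expire
            }
        }
    # Assumed fallback: per-process in-memory cache, lost on restart.
    return {
        "default": {
            "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
            "TIMEOUT": None,
        }
    }


# CACHE_REDIS_URL is a hypothetical variable name, e.g. redis://localhost:6379/1
CACHES = build_caches(os.environ.get("CACHE_REDIS_URL"))
```

One thing to keep in mind with TIMEOUT = None: a cached directory size would stay in place until something explicitly deletes or overwrites the key, so a snapshot whose files change later might show a stale size unless the key is invalidated at that point.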
