Skip to content

fix(cache-downloader): Cleanup stale toolkit versions in toolkit cache downloader#72

Merged
m1so merged 4 commits intomainfrom
michal/blu-5747-cleanup-stale-toolkit-versions-in-toolkit-cache-downloader
Mar 5, 2026
Merged

fix(cache-downloader): Cleanup stale toolkit versions in toolkit cache downloader#72
m1so merged 4 commits intomainfrom
michal/blu-5747-cleanup-stale-toolkit-versions-in-toolkit-cache-downloader

Conversation

@mfranczel
Copy link
Contributor

@mfranczel mfranczel commented Mar 5, 2026

Introduces a new function cleanup_old_versions that removes outdated toolkit versions, retaining only the specified number of recent versions. This function is called in the main workflow to ensure that old versions do not accumulate, facilitating a cleaner download process for the current release.

Summary by CodeRabbit

  • New Features

    • Automatic cleanup of outdated version directories before downloads, retaining recent releases and protecting the current release.
  • Bug Fixes

    • Clear exit when the release identifier is missing; improved cleanup and download logging and resilience so failures are surfaced without crashing.
  • Tests

    • Added comprehensive unit tests for cleanup behavior, edge cases, exclusions, nested contents, and partial-failure scenarios.

…ld toolkit versions

Introduces a new function `cleanup_old_versions` that removes outdated toolkit versions, retaining only the specified number of recent versions. This function is called in the main workflow to ensure that old versions do not accumulate, facilitating a cleaner download process for the current release.
@linear
Copy link

linear bot commented Mar 5, 2026

@mfranczel mfranczel self-assigned this Mar 5, 2026
@github-actions
Copy link

github-actions bot commented Mar 5, 2026

📦 Python package built successfully!

  • Version: 2.1.2.dev12+eb1f98c
  • Wheel: deepnote_toolkit-2.1.2.dev12+eb1f98c-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.1.2.dev12%2Beb1f98c/deepnote_toolkit-2.1.2.dev12%2Beb1f98c-py3-none-any.whl"

@codecov
Copy link

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.71%. Comparing base (f0c341f) to head (97692b2).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #72      +/-   ##
==========================================
+ Coverage   73.50%   73.71%   +0.21%     
==========================================
  Files          93       93              
  Lines        5253     5261       +8     
  Branches      764      764              
==========================================
+ Hits         3861     3878      +17     
+ Misses       1147     1138       -9     
  Partials      245      245              
Flag Coverage Δ
combined 73.71% <ø> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Walkthrough

Adds a new function cleanup_old_versions(base_path: str, current_release_name: str, versions_to_keep: int = 2) to the cache-downloader script. The function lists subdirectories under base_path, excludes the current release, sorts directories by modification time (newest first), and removes older ones beyond the keep count using shutil.rmtree with error handling and logging. main() now validates RELEASE_NAME (exits non-zero if missing) and calls cleanup_old_versions before starting downloads. New unit tests exercise multiple edge cases including non-existent base path, mtime ordering, current-release exclusion, nested contents, and rmtree failures.

Sequence Diagram(s)

sequenceDiagram
    participant Main as "cache-downloader:main"
    participant Cleanup as "cleanup_old_versions"
    participant FS as "Filesystem (os/scandir/stat)"
    participant Shutil as "shutil.rmtree"
    participant Downloader as "Downloader logic"

    Main->>Cleanup: call cleanup_old_versions(base_path, release_name)
    Cleanup->>FS: list entries in base_path
    FS-->>Cleanup: return entries (files & dirs with mtimes)
    Cleanup->>Cleanup: filter directories, exclude current_release_name
    Cleanup->>Cleanup: sort by mtime (newest first) and select old ones
    loop for each old_dir
        Cleanup->>Shutil: rmtree(old_dir)
        alt success
            Shutil-->>Cleanup: removed
        else failure
            Shutil-->>Cleanup: raise OSError (caught & logged)
        end
    end
    Cleanup-->>Main: return
    Main->>Downloader: proceed with downloads
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Updates Docs ⚠️ Warning The PR introduces a new cleanup_old_versions feature for cache-downloader without updating user-facing documentation or README files. Add README.md to dockerfiles/cache-downloader/ documenting the cleanup functionality and parameters, or update main project README.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main change: adding cleanup functionality for stale toolkit versions in the cache downloader.
Docstring Coverage ✅ Passed Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dockerfiles/cache-downloader/cache-downloader.py`:
- Around line 110-117: The code uses versions_to_keep in a slice without
validation, so negative values cause unexpected deletion; before the existing
check (the line with "if len(version_dirs) <= versions_to_keep:"), validate and
normalize versions_to_keep (e.g., ensure it is an integer and clamp to a
non-negative value: versions_to_keep = max(0, int(versions_to_keep))) or raise a
clear ValueError for invalid input; reference the variables and loop in this
block (version_dirs, versions_to_keep, the sort lambda, and the for entry in
version_dirs[versions_to_keep:]) so the fix is applied immediately before the
slicing that removes old versions.
- Around line 86-123: Update the signature of cleanup_old_versions to include an
explicit return type (-> None) and switch filesystem ops to pathlib.Path APIs:
treat base_path as a Path (or wrap Path(base_path)), use Path.exists()/is_dir()
instead of os.path.isdir, iterate with Path.iterdir() and filter .is_dir() and
.name != current_release_name, and use .stat().st_mtime for sorting; keep the
existing behavior for skipping current_release_name, limiting with
versions_to_keep, logging, and shutil.rmtree removal handling in the same
places.
- Line 137: Add a check before calling cleanup_old_versions in the main function
or wherever cleanup_old_versions(BASE_PATH, release_name) is called to verify if
release_name is not None. If release_name is None (indicating RELEASE_NAME is
unset), exit early or raise an error instead of proceeding with the cleanup to
avoid deleting unintended version directories.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9b6cd832-7617-4552-9a1d-26ccfa86e843

📥 Commits

Reviewing files that changed from the base of the PR and between 9fbeb25 and a8f6042.

📒 Files selected for processing (2)
  • dockerfiles/cache-downloader/cache-downloader.py
  • tests/unit/test_cache_downloader.py

@deepnote-bot
Copy link

deepnote-bot commented Mar 5, 2026

🚀 Review App Deployment Started

📝 Description 🌐 Link / Info
🌍 Review application ra-72
🔑 Sign-in URL Click to sign-in
📊 Application logs View logs
🔄 Actions Click to redeploy
🚀 ArgoCD deployment View deployment
Last deployed 2026-03-05 14:30:25 (UTC)
📜 Deployed commit 2e5eae46282cce767d1e33ffb365ad6fd950e5a6
🛠️ Toolkit version eb1f98c

…E_NAME

Enhances the `cleanup_old_versions` function by adding a check to ensure that the `versions_to_keep` parameter is non-negative, raising a ValueError if it is not. Additionally, the main function now verifies that the `RELEASE_NAME` environment variable is set, exiting with an error message if it is missing.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/test_cache_downloader.py`:
- Around line 9-21: Replace os.path-based path construction in the test module
spec creation with pathlib.Path: build the path to
"dockerfiles/cache-downloader/cache-downloader.py" using Path(__file__).parent /
".." / ".." / "dockerfiles" / "cache-downloader" / "cache-downloader.py" (or
chained parents) when calling importlib.util.spec_from_file_location for _spec,
and then remove unused import os; keep the rest (_spec,
importlib.util.module_from_spec, _spec.loader.exec_module(_mod)) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 87a455a4-3a6d-4192-9361-dc4673b00ce2

📥 Commits

Reviewing files that changed from the base of the PR and between a8f6042 and 97692b2.

📒 Files selected for processing (2)
  • dockerfiles/cache-downloader/cache-downloader.py
  • tests/unit/test_cache_downloader.py

…h pathlib

Refactors the `cleanup_old_versions` function to utilize `pathlib.Path` for file path operations, enhancing readability and consistency. The function now checks if the base path is a directory using `Path.is_dir()` and iterates through version directories with `Path.iterdir()`. This change simplifies the code and adheres to project standards for file handling.
@mfranczel mfranczel marked this pull request as ready for review March 5, 2026 14:10
@mfranczel mfranczel requested a review from a team as a code owner March 5, 2026 14:10
@mfranczel mfranczel requested a review from m1so March 5, 2026 14:10
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dockerfiles/cache-downloader/cache-downloader.py`:
- Around line 147-150: The code currently calls cleanup_old_versions(BASE_PATH,
release_name) before verifying the TOOLKIT_INDEX_BUCKET_NAME env var; add a
guard that checks toolkit_index_bucket_name (the value from
os.getenv("TOOLKIT_INDEX_BUCKET_NAME")) and aborts (raise/exit/log error) if it
is empty/None before performing any destructive action, and only call
cleanup_old_versions(...) and submit_downloading(python_versions, release_name,
toolkit_index_bucket_name) after the env var is validated; reference the
toolkit_index_bucket_name variable, cleanup_old_versions, submit_downloading,
BASE_PATH, and release_name when implementing the check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4058afc5-b18e-4c85-b01b-068baef50154

📥 Commits

Reviewing files that changed from the base of the PR and between 97692b2 and a1088b1.

📒 Files selected for processing (1)
  • dockerfiles/cache-downloader/cache-downloader.py

@m1so m1so merged commit 2df9d55 into main Mar 5, 2026
30 checks passed
@m1so m1so deleted the michal/blu-5747-cleanup-stale-toolkit-versions-in-toolkit-cache-downloader branch March 5, 2026 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants