Skip to content

Conversation

@deependujha
Copy link
Collaborator

@deependujha deependujha commented Jun 14, 2025

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #626

# commands supported for now
litdata --help
litdata cache path
litdata clear cache
Screenshot 2025-06-28 at 5 17 26 PM

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@deependujha
Copy link
Collaborator Author

@robmarkcole can you try this pr, and let me know if it helps in clearing cache. Next I can focus on profiling.

One thing to note is: it can only list/delete default cache and not your specified cache directory for SD.

@codecov
Copy link

codecov bot commented Jun 14, 2025

Codecov Report

Attention: Patch coverage is 94.11765% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83%. Comparing base (36a6dc8) to head (245afbc).
Report is 1 commits behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #627   +/-   ##
===================================
  Coverage    83%    83%           
===================================
  Files        43     49    +6     
  Lines      6699   6750   +51     
===================================
+ Hits       5556   5604   +48     
- Misses     1143   1146    +3     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@robmarkcole
Copy link
Contributor

Checked with

⚡ ~ litdata cache path
Default cache directory: /cache/chunks
⚡ ~ ls /cache/chunks
⚡ ~ echo "This is a random string: $(openssl rand -hex 12)" > /cache/chunks/random_$(date +%s).txt
⚡ ~ ls /cache/chunks                                                                              
random_1750235188.txt
⚡ ~ litdata cache clear
Cache directory '/cache/chunks' cleared.

@deependujha deependujha changed the title [WIP]: add cli support to clear default cache feat: add cli support to clear default cache Jun 18, 2025
@deependujha deependujha requested review from Copilot June 18, 2025 11:05
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds CLI support to manage the cache by introducing commands to show the cache path and clear the default cache.

  • Adds tests for CLI commands (help, cache path, and clear cache) in tests/test_cli.py
  • Implements the CLI commands in src/litdata/cli.py
  • Registers a console script entry point in setup.py

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
tests/test_cli.py New tests for verifying CLI help message and cache commands.
src/litdata/cli.py CLI implementation with subcommands for cache management.
setup.py Entry point added for invoking the CLI.
Comments suppressed due to low confidence (1)

tests/test_cli.py:21

  • Consider using monkeypatch to override get_default_cache_dir to return tmp_path in test_cache_clear_command. This will ensure that the test does not inadvertently remove a production or important cache directory.
def test_cache_clear_command(tmp_path, monkeypatch):

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

This comment was marked as outdated.

Borda
Borda previously requested changes Jun 24, 2025
Copy link
Collaborator

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do CLI let's have consistent experience also with litServe: https://github.com/Lightning-AI/LitServe/blob/main/src/litserve/cli.py

Copy link
Collaborator

@bhimrazy bhimrazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, a CLI might not be necessary here just to clear the cache—unless there are additional plans for it that I'm missing.

@deependujha deependujha requested a review from Copilot June 28, 2025 11:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds command-line interface support for cache management (path display and clearing) and registers a console script entry point.

  • Implements argument parsing and subcommand registration for cache and optimize.
  • Provides clear_cache and show_cache_path actions that call into get_default_cache_dir.
  • Adds CLI tests and configures litdata as a console script in setup.py.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/test_cli.py Add tests for CLI help, cache path, and cache clear commands
src/litdata/cli/parser.py Set up the main litdata argument parser with a custom LitFormatter
src/litdata/cli/commands.py Register cache and optimize subcommands and bind handlers
src/litdata/cli/actions/cache.py Implement clear_cache and show_cache_path logic
src/litdata/main.py Define app() entry point to dispatch commands
setup.py Add litdata = litdata.__main__:app to console_scripts
Comments suppressed due to low confidence (1)

src/litdata/cli/commands.py:52

  • The new optimize subcommand is not covered by any CLI tests. Consider adding a test that runs litdata optimize --dataset <path> and asserts the expected output.
def register_optimize_subcommand(subparser: _SubParsersAction) -> None:

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@deependujha
Copy link
Collaborator Author

@Borda I've updated the CLI and removed the typer dependency. The structure now follows the same pattern as the litserve CLI.

One difference I noticed: LitServe makes use of lightning_sdk helpers to run certain commands. As far as I know, there aren't similar utility functions in lightning_sdk for litdata yet, but please correct me if I'm wrong.

Other than that, the CLI structure and style are now consistent across both projects.

@Borda
Copy link
Collaborator

Borda commented Jun 30, 2025

lets psl update the PR description with examples :)

@Borda Borda self-requested a review July 1, 2025 09:28
@Borda Borda dismissed their stale review July 1, 2025 09:29

will review it again :)

@Borda Borda enabled auto-merge (squash) July 1, 2025 17:28
@Borda Borda requested a review from bhimrazy July 1, 2025 17:48
@Borda Borda merged commit d11b1c9 into Lightning-AI:main Jul 1, 2025
35 checks passed
@deependujha deependujha deleted the feat/add-cli-support branch July 2, 2025 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Command to profile & clear the cache

5 participants