refactor: replace sqlite with rocksdb #109
Summary of Changes
Hello @ChenZiHong-Gavin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request refactors the project's caching infrastructure by replacing the
Highlights
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request refactors the caching mechanism by replacing diskcache (which uses SQLite) with rocksdict (a wrapper for RocksDB). This involves adding a new RocksDBCache class, updating ParallelFileScanner to use this new cache, and adjusting dependencies in requirements.txt. The changes are logical and well-implemented. I've provided a couple of suggestions to enhance the new RocksDBCache implementation for better resource management and to improve code consistency in ParallelFileScanner.
    def close(self):
        if hasattr(self, "db") and self.db is not None:
            self.db.close()
            self.db = None
For better resource management and to make RocksDBCache more robust and reusable, consider implementing the context manager protocol (__enter__ and __exit__). This ensures that the database connection is always closed, even if errors occur.
This would allow using RocksDBCache directly with a with statement, which is idiomatic Python for managing resources.
            self.db = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
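The suggestion above can be sketched as a small runnable example. This is only an illustration under stated assumptions: `FakeDB` is a hypothetical stand-in for the real database handle so the snippet runs without RocksDB installed, the handle is injected rather than opened by the class, and `demo` is a made-up helper. Only `close`, `__enter__`, and `__exit__` mirror the code in this PR.

```python
class FakeDB:
    """Hypothetical stand-in for the real database handle (not part of the PR)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class RocksDBCache:
    """Reduced sketch: only the lifecycle methods discussed in the review."""

    def __init__(self, db):
        # Assumption: the real class opens the store itself; here it is injected.
        self.db = db

    def close(self):
        # Safe to call more than once: the handle is dropped after closing.
        if hasattr(self, "db") and self.db is not None:
            self.db.close()
            self.db = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Runs on normal exit and when the body raises. Returning None
        # lets any exception propagate to the caller.
        self.close()


def demo():
    """Raise inside the with block and report whether the handle was closed."""
    handle = FakeDB()
    try:
        with RocksDBCache(handle) as cache:
            assert cache.db is handle
            raise RuntimeError("simulated failure inside the with block")
    except RuntimeError:
        pass
    return handle.closed
```

Because `__exit__` delegates to `close`, callers that still call `close()` explicitly keep working, and a second call is a no-op.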
         self, cache_dir: str, allowed_suffix, rescan: bool = False, max_workers: int = 4
     ):
-        self.cache = Cache(cache_dir)
+        self.cache = RocksDBCache(os.path.join(cache_dir, "file_paths_cache"))
Since pathlib is already used in this file, it would be more consistent to use it for path manipulation here as well, instead of os.path.join. This improves readability and maintains a consistent style.
To make this even cleaner, you could consider updating RocksDBCache.__init__ to accept pathlib.Path objects directly, which would remove the need for str() conversion here.
-        self.cache = RocksDBCache(os.path.join(cache_dir, "file_paths_cache"))
+        self.cache = RocksDBCache(str(Path(cache_dir) / "file_paths_cache"))
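To compare the two spellings side by side, here is a minimal sketch. The function names `cache_path_os` and `cache_path_pathlib` are hypothetical and exist only for this illustration; the subdirectory name `file_paths_cache` is taken from the diff.

```python
import os
from pathlib import Path


def cache_path_os(cache_dir):
    """Build the cache path the way the original line does."""
    return os.path.join(cache_dir, "file_paths_cache")


def cache_path_pathlib(cache_dir):
    """Build the same path with pathlib, converting back to str at the end."""
    return str(Path(cache_dir) / "file_paths_cache")
```

For a plain directory name such as `"cache"` both helpers return the same string. One difference worth knowing: `Path` drops redundant `.` components, so an input like `"./cache"` yields a slightly different (but equivalent) string from the pathlib version.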
Code Review
This pull request successfully refactors the caching layer from diskcache to a RocksDB-based solution using rocksdict. The new RocksDBCache class is a clean abstraction. My review includes suggestions to enhance its robustness by implementing the context manager protocol for safer resource management and improving the API for more flexible usage. These changes will make the new cache component more reliable and easier to use in other parts of the application.
@@ -0,0 +1,33 @@
+from pathlib import Path
+from typing import Any, Iterator, Optional
class RocksDBCache:
    def __init__(self, cache_dir: str):
For improved flexibility and to align with modern Python practices using pathlib, it's better to accept both string paths and Path objects for cache_dir. Path(cache_dir) already handles both types correctly.
-    def __init__(self, cache_dir: str):
+    def __init__(self, cache_dir: Union[str, Path]):
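A minimal sketch of the widened signature follows. `CacheLocation` and its `db_path` attribute are hypothetical names used only for this example, and the class does not open any database. Note that `Union` has to be imported from `typing`; the import line shown in the diff above lists only `Any`, `Iterator`, and `Optional`.

```python
from pathlib import Path
from typing import Union


class CacheLocation:
    """Hypothetical helper showing only the path handling, not the real cache."""

    def __init__(self, cache_dir: Union[str, Path]):
        # Path() accepts a str or an existing Path, so both inputs
        # normalize to the same object type.
        self.db_path = Path(cache_dir)
```

With this signature, a caller can pass `Path(cache_dir) / "file_paths_cache"` directly and skip the `str()` conversion mentioned in the earlier comment.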
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* refactor: replace sqlite with rocksdb
* Potential fix for pull request finding 'Empty except'
  Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* Update graphgen/models/storage/rocksdb_cache.py
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This pull request refactors the project's caching infrastructure by replacing the sqlite library with rocksdb, as sqlite may not work well on NFS (network file systems). Refer: