

@ChenZiHong-Gavin
Collaborator

ChenZiHong-Gavin commented Dec 3, 2025

This pull request refactors the project's caching infrastructure by replacing the SQLite-backed cache with RocksDB, because SQLite's file locking may not work reliably on NFS (network file systems).

References:

  1. https://stackoverflow.com/questions/9907429/locking-sqlite-file-on-nfs-filesystem-possible

@gemini-code-assist
Contributor

Summary of Changes

Hello @ChenZiHong-Gavin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the project's caching infrastructure by replacing the diskcache library with a new RocksDBCache implementation powered by rocksdict. This change primarily affects the ParallelFileScanner, which now utilizes RocksDB for its file path caching, aiming to enhance the performance and reliability of data storage within the application.

Highlights

  • Storage Backend Migration: The caching mechanism has been migrated from using the diskcache library to RocksDB for persistent key-value storage.
  • New RocksDBCache Implementation: A new RocksDBCache class has been introduced, providing an interface to rocksdict for efficient key-value operations like get, set, and delete.
  • Dependency Update: The project dependencies have been updated to remove diskcache and include rocksdict, reflecting the change in the underlying storage technology.
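The highlights describe a thin wrapper that exposes get, set and delete on top of rocksdict. A minimal sketch of such a wrapper is shown below; it is an illustration, not the PR's actual `graphgen/models/storage/rocksdb_cache.py`. The `opener` parameter is a hypothetical addition so the wrapper can be exercised with a plain `dict` when RocksDB is not installed; by default it opens `rocksdict.Rdict` (installed with `pip install rocksdict`).

```python
from pathlib import Path
from typing import Any, Callable, Iterator, Optional, Union


class RocksDBCache:
    """Sketch of a small key-value cache over RocksDB via rocksdict."""

    def __init__(
        self,
        cache_dir: Union[str, Path],
        opener: Optional[Callable[[str], Any]] = None,  # hypothetical hook
    ):
        if opener is None:
            # Imported lazily so defining the class does not require rocksdict.
            from rocksdict import Rdict

            opener = Rdict
        # Path() accepts str or Path; the backend receives a plain string.
        self.db = opener(str(Path(cache_dir)))

    def get(self, key: str, default: Any = None) -> Any:
        # Both Rdict and dict expose get(key, default).
        return self.db.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self.db[key] = value

    def delete(self, key: str) -> None:
        # The membership guard makes deleting a missing key a no-op.
        if key in self.db:
            del self.db[key]

    def keys(self) -> Iterator[str]:
        return iter(self.db.keys())

    def close(self) -> None:
        if getattr(self, "db", None) is not None:
            close = getattr(self.db, "close", None)  # dict has no close()
            if callable(close):
                close()
            self.db = None
```

In real use one would call `RocksDBCache("cache/file_paths_cache")` and let it open RocksDB; passing `opener=lambda path: {}` swaps in an in-memory dict for quick tests.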

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
@ChenZiHong-Gavin
Collaborator Author

/gemini review

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request refactors the caching mechanism by replacing diskcache (which uses SQLite) with rocksdict (a wrapper for RocksDB). This involves adding a new RocksDBCache class, updating ParallelFileScanner to use this new cache, and adjusting dependencies in requirements.txt. The changes are logical and well-implemented. I've provided a couple of suggestions to enhance the new RocksDBCache implementation for better resource management and to improve code consistency in ParallelFileScanner.

    def close(self):
        if hasattr(self, "db") and self.db is not None:
            self.db.close()
            self.db = None
Contributor


medium

For better resource management and to make RocksDBCache more robust and reusable, consider implementing the context manager protocol (__enter__ and __exit__). This ensures that the database connection is always closed, even if errors occur.

This would allow using RocksDBCache directly with a with statement, which is idiomatic Python for managing resources.

Suggested change
             self.db = None
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        self.close()
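With `__enter__` and `__exit__` added, the cache can be used in a `with` block, and `close()` runs even if the body raises. The sketch below trims `RocksDBCache` to the relevant methods and, as an assumption made so it runs without RocksDB, takes an already-open backend object instead of a directory; `_StubDB` is a hypothetical in-memory stand-in for `rocksdict.Rdict`.

```python
class _StubDB(dict):
    """Hypothetical stand-in for rocksdict.Rdict that records close()."""

    closed = False

    def close(self):
        self.closed = True


class RocksDBCache:
    # Only the parts relevant to the context-manager suggestion are shown.
    def __init__(self, db):
        self.db = db

    def close(self):
        if hasattr(self, "db") and self.db is not None:
            self.db.close()
            self.db = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()
        # Returning None (falsy) lets any exception propagate after cleanup.


backend = _StubDB()
try:
    with RocksDBCache(backend) as cache:
        cache.db["k"] = "v"
        raise RuntimeError("simulated failure mid-scan")
except RuntimeError:
    pass

print(backend.closed)  # True: close() ran even though the block raised
```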

         self, cache_dir: str, allowed_suffix, rescan: bool = False, max_workers: int = 4
     ):
-        self.cache = Cache(cache_dir)
+        self.cache = RocksDBCache(os.path.join(cache_dir, "file_paths_cache"))
Contributor


medium

Since pathlib is already used in this file, it would be more consistent to use it for path manipulation here as well, instead of os.path.join. This improves readability and maintains a consistent style.

To make this even cleaner, you could consider updating RocksDBCache.__init__ to accept pathlib.Path objects directly, which would remove the need for str() conversion here.

Suggested change
-        self.cache = RocksDBCache(os.path.join(cache_dir, "file_paths_cache"))
+        self.cache = RocksDBCache(str(Path(cache_dir) / "file_paths_cache"))
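A quick check that the suggested `pathlib` form builds the same path string as the original `os.path.join` call. The directory name `"cache"` is a hypothetical value chosen for illustration.

```python
import os
from pathlib import Path

cache_dir = "cache"  # hypothetical cache directory

joined_os = os.path.join(cache_dir, "file_paths_cache")
joined_path = Path(cache_dir) / "file_paths_cache"

# str() is needed only while RocksDBCache.__init__ is annotated to take a str.
print(joined_os == str(joined_path))  # True for this input on POSIX and Windows
```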

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request successfully refactors the caching layer from diskcache to a RocksDB-based solution using rocksdict. The new RocksDBCache class is a clean abstraction. My review includes suggestions to enhance its robustness by implementing the context manager protocol for safer resource management and improving the API for more flexible usage. These changes will make the new cache component more reliable and easier to use in other parts of the application.

@@ -0,0 +1,33 @@
from pathlib import Path
from typing import Any, Iterator, Optional
Contributor


medium

To support a more flexible type hint in __init__, please import Union from the typing module.

Suggested change
-from typing import Any, Iterator, Optional
+from typing import Any, Iterator, Optional, Union



class RocksDBCache:
    def __init__(self, cache_dir: str):
Contributor


medium

For improved flexibility and to align with modern Python practices using pathlib, it's better to accept both string paths and Path objects for cache_dir. Path(cache_dir) already handles both types correctly.

Suggested change
-    def __init__(self, cache_dir: str):
+    def __init__(self, cache_dir: Union[str, Path]):
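The `Union[str, Path]` annotation works because `Path()` accepts either a string or an existing `Path`. In the sketch below, `normalize_cache_dir` is a hypothetical helper that mirrors what `__init__` would do with its argument; on Python 3.10+ the same annotation can also be written `str | Path`.

```python
from pathlib import Path
from typing import Union


def normalize_cache_dir(cache_dir: Union[str, Path]) -> Path:
    # Path() accepts str or Path, so callers may pass either type.
    return Path(cache_dir)


a = normalize_cache_dir("cache/file_paths_cache")
b = normalize_cache_dir(Path("cache") / "file_paths_cache")
print(a == b)  # True
```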

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ChenZiHong-Gavin ChenZiHong-Gavin merged commit 761e64b into main Dec 3, 2025
4 checks passed
@ChenZiHong-Gavin ChenZiHong-Gavin deleted the refactor/replace-sqlite-with-rocksdb branch December 3, 2025 07:50
CHERRY-ui8 pushed a commit to CHERRY-ui8/GraphGen that referenced this pull request Dec 17, 2025
* refactor: replace sqlite with rocksdb

* Potential fix for pull request finding 'Empty except'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* Update graphgen/models/storage/rocksdb_cache.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>


2 participants