Robinhood can scan only a part of the namespace when a path is given as an argument to the --scan option.
This feature can be used to parallelize the namespace scan across multiple filesystem clients.
This can be useful if the limiting factor of scanning speed is the client filesystem call throughput.
- Make your usual robinhood config available to all nodes that will run the scan. Make sure the database host is designated by its hostname (i.e. not "localhost").
- Make sure the database is accessible from all those nodes. To check this, you can run:
rbh-config test_db <db_name> <password>
- Determine a balanced partitioning (in terms of entry count) of filesystem top-level directories to be distributed to the robinhood scanning commands. For example:
- Run robinhood commands accordingly:
robinhood --scan=/fs/dir1 --no-gc
robinhood --scan=/fs/dir2 --no-gc
robinhood --scan=/fs/dir3/A --no-gc
robinhood --scan=/fs/dir3/B --no-gc
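The partitioning step above can be approximated with a simple greedy heuristic: sort the directories by entry count, largest first, and give each one to the group that currently holds the fewest entries. The sketch below is only an illustration, not part of robinhood. The function name balance_dirs and the entry counts in the example are hypothetical, and the counts are assumed to be already known (for instance from an earlier scan).

```shell
#!/bin/sh
# balance_dirs N
# Reads "entry_count path" lines on stdin and prints "group path" lines,
# where group is a number from 1 to N.
# Greedy heuristic: largest directories first, each assigned to the
# least-loaded group so far. This is an approximation, not an optimal split.
balance_dirs() {
    sort -k1,1nr | awk -v n="$1" '
        BEGIN { for (g = 1; g <= n; g++) load[g] = 0 }
        {
            count = $1
            path = $0
            # strip the leading count, keep the full path (spaces included)
            sub(/^[ \t]*[0-9]+[ \t]+/, "", path)
            # pick the group with the smallest load so far
            best = 1
            for (g = 2; g <= n; g++)
                if (load[g] < load[best]) best = g
            load[best] += count
            print best, path
        }'
}

# Example with made-up entry counts, split into 2 groups:
balance_dirs 2 <<EOF
900 /fs/dir1
800 /fs/dir2
500 /fs/dir3/A
400 /fs/dir3/B
EOF
# prints:
# 1 /fs/dir1
# 2 /fs/dir2
# 2 /fs/dir3/A
# 1 /fs/dir3/B
```

With these hypothetical counts both groups end up with 1300 entries. If one top-level directory dominates the others, it can be split one level deeper, as with /fs/dir3/A and /fs/dir3/B above.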
--no-gc is very important for performance in this use case. If it is not specified, robinhood tries to clean entries that were previously located in this part of the namespace and that have not been seen during the scan. This cleaning is VERY expensive for partial scans because it requires building and matching the path of every entry in the DB. It is much more efficient for whole filesystem scans, so it is recommended to keep it enabled only for such whole scans.
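One possible way to start the partial scans on several clients at once is a small wrapper around ssh, sketched below. It assumes passwordless ssh to each client, that robinhood and its config are installed there, and that paths contain no spaces or shell metacharacters. The function name run_partial_scans, the DRY_RUN variable and the node names client1 and client2 are hypothetical.

```shell
#!/bin/sh
# run_partial_scans
# Reads "node path" lines on stdin and starts one partial scan per line,
# all in parallel, then waits for every scan to finish.
# Set DRY_RUN=1 to print the ssh commands instead of running them.
run_partial_scans() {
    while read -r node dir; do
        # skip empty lines
        [ -n "$node" ] || continue
        # ssh -n keeps ssh from consuming the list being read on stdin
        ${DRY_RUN:+echo} ssh -n "$node" "robinhood --scan=$dir --no-gc" &
    done
    # block until all background scans have exited
    wait
}

# Example (dry run, hypothetical node names):
#   DRY_RUN=1
#   printf 'client1 /fs/dir1\nclient2 /fs/dir2\n' | run_partial_scans
```

Running it in dry-run mode first makes it easy to check the node and directory pairing before any scan is launched. Since the commands run in the background, the printed lines may appear in any order.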