Add function to Store API to get the inner store when possible #410

Merged
allada merged 1 commit into main from add-inner_store_to_store_api
Dec 3, 2023

Conversation

allada
Member

@allada allada commented Nov 19, 2023

Adds an inner_store() function to all stores that recursively resolves inner stores to reach the underlying store. This is mostly for call sites that can take an optimized code path when the underlying store is of a specific type.

towards: #409


Member Author

@allada allada left a comment

+@aaronmondal

Reviewable status: 0 of 22 files reviewed, all discussions resolved (waiting on @aaronmondal)

Member

@aaronmondal aaronmondal left a comment

I'm not sure whether this is an API that should apply "globally" to all stores. I also find it somewhat confusing that e.g. the FastSlowStore's inner_store is itself and not e.g. a vector of its fast and slow stores or something like that.

Actually this is something that's been bothering me for a while: In my mind something like the RedisStore and the DedupStore are quite different. The former actually stores data while the latter is just a wrapper. Our current API doesn't reflect that and I think it could make things like an inner_store functionality more intuitive if we split up stores into e.g. "stores" and "store-wrappers". This way it'd be clear that a store can't have an inner store (i.e. it's always a leaf node), and a store-wrapper always has some inner store (i.e. it's never a leaf node).

This could also simplify the API for non-wrapper stores as those only need to care about how to read/update data.
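The split proposed above could look roughly like the following. This is only a sketch of the idea, not code from this PR: `LeafStore`, `StoreWrapper`, `Node`, `MemoryStore`, and `PassThrough` are hypothetical names, and the string-keyed `get`/`update` methods are a simplified stand-in for the real store API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical trait for stores that actually hold data (always a leaf node).
pub trait LeafStore: Send + Sync {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn update(&self, key: &str, data: Vec<u8>);
}

/// Hypothetical trait for wrappers: never a leaf, always has an inner node.
pub trait StoreWrapper: Send + Sync {
    fn inner(&self) -> &Node;
}

/// A node in the store tree is either a leaf or a wrapper around another node.
pub enum Node {
    Leaf(Arc<dyn LeafStore>),
    Wrapper(Arc<dyn StoreWrapper>),
}

impl Node {
    /// Walks through wrappers until it reaches the leaf that stores the data.
    pub fn resolve_leaf(&self) -> Arc<dyn LeafStore> {
        let mut node = self;
        loop {
            match node {
                Node::Leaf(store) => return Arc::clone(store),
                Node::Wrapper(wrapper) => node = wrapper.inner(),
            }
        }
    }
}

/// Illustrative leaf: keeps everything in a map guarded by a mutex.
#[derive(Default)]
pub struct MemoryStore {
    data: Mutex<HashMap<String, Vec<u8>>>,
}

impl LeafStore for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.data.lock().unwrap().get(key).cloned()
    }

    fn update(&self, key: &str, data: Vec<u8>) {
        self.data.lock().unwrap().insert(key.to_string(), data);
    }
}

/// Illustrative wrapper that adds no behavior of its own.
pub struct PassThrough {
    pub inner: Node,
}

impl StoreWrapper for PassThrough {
    fn inner(&self) -> &Node {
        &self.inner
    }
}
```

With this shape only leaf types implement read/update, and the type system rules out a leaf that has an inner store. A wrapper with several children, such as a fast/slow pair, would need a richer `inner` signature than this sketch offers.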

Reviewable status: 0 of 22 files reviewed, all discussions resolved

Member Author

@allada allada left a comment

I partially agree. In the case of DedupStore, there is no actual inner store; its inner store is itself. What I'm trying to do is make inner_store resolve through any stores that pass the data through without doing complex logic on it. This will allow us to use things like:

    "GRPC_CAS_STORE": {
      "size_partitioning": {
        "size": 10000,
        "lower": {
          "grpc": {
            // ...
          }
        },
        "upper": {
          "s3_store": {
            // ...
          }
        }
      }
    },
    "WORKER_FAST_SLOW_STORE": {
      "fast_slow": {
        "fast": {
          "filesystem": {
            // ...
          }
        },
        "slow": {
          "ref_store": {
            "ref": "CAS_STORE"
          }
        }
      }
    }

This config would let the GRPC optimizations we have lying around apply significantly more often.

The specific use case I am doing this for is #409. What I plan on doing is checking whether the store a worker uploads its result to is a filesystem store. If it is, it'll simply downcast it to the filesystem store and then move the file directly instead of copying the data. The reason I wanted to do this first is that ref_store is frequently used when the worker lives on the same machine as the CAS.
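The resolve-then-downcast flow described above might be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this PR: the `Store` trait here is a cut-down stand-in, `RefStore`, `FilesystemStore`, `TransformingStore`, and `resolve_filesystem_root` are hypothetical, and the real `inner_store` also takes an optional digest (see the `shard_store.rs` line quoted later in this review), which is omitted here.

```rust
use std::any::Any;
use std::sync::Arc;

/// Cut-down stand-in for the store trait; only what this sketch needs.
pub trait Store: Any + Send + Sync {
    /// Pass-through stores delegate to their inner store; stores that own or
    /// transform data return themselves.
    fn inner_store(self: Arc<Self>) -> Arc<dyn Store>;

    /// Exposes the concrete type so callers can attempt a downcast.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical leaf store backed by a directory on disk.
pub struct FilesystemStore {
    pub root: String,
}

impl Store for FilesystemStore {
    fn inner_store(self: Arc<Self>) -> Arc<dyn Store> {
        self
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical pass-through store that only forwards to another store.
pub struct RefStore {
    pub target: Arc<dyn Store>,
}

impl Store for RefStore {
    fn inner_store(self: Arc<Self>) -> Arc<dyn Store> {
        // Recurse so that chains of pass-through stores collapse to the leaf.
        Arc::clone(&self.target).inner_store()
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical store that changes the data, so it must not be bypassed.
pub struct TransformingStore {
    pub inner: Arc<dyn Store>,
}

impl Store for TransformingStore {
    fn inner_store(self: Arc<Self>) -> Arc<dyn Store> {
        self
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the filesystem root if `store` resolves to a `FilesystemStore`.
pub fn resolve_filesystem_root(store: Arc<dyn Store>) -> Option<String> {
    let resolved = store.inner_store();
    resolved
        .as_any()
        .downcast_ref::<FilesystemStore>()
        .map(|fs| fs.root.clone())
}
```

A caller would attempt the fast path (moving the file) only when the lookup returns `Some`, and fall back to the ordinary copy otherwise. A store that rewrites data returns itself from `inner_store`, so the fast path is never taken behind it.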

Reviewable status: 0 of 22 files reviewed, all discussions resolved (waiting on @aaronmondal)

Member

@aaronmondal aaronmondal left a comment

:lgtm:

Reviewed 22 of 22 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @allada)


cas/store/shard_store.rs line 178 at r1 (raw file):

        };
        let index = self.get_store_index(digest);
        self.weights_and_stores[index].1.clone().inner_store(Some(digest))

nit: Might be worth considering some sort of struct to get rid of the `.1` syntax here. Not too relevant though.
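One way the suggestion could look, as a hedged sketch only: `WeightedStore`, its field names, the `u32` weight type, `store_at`, and `NamedStore` are all assumptions for illustration, and the `Store` trait is reduced to a single method. The point is just that a named field replaces positional tuple access.

```rust
use std::sync::Arc;

/// Minimal stand-in for the store trait, just enough for the example.
pub trait Store: Send + Sync {
    fn name(&self) -> &str;
}

/// Hypothetical named replacement for a `(weight, store)` tuple.
pub struct WeightedStore {
    pub weight: u32,
    pub store: Arc<dyn Store>,
}

pub struct ShardStore {
    pub weights_and_stores: Vec<WeightedStore>,
}

impl ShardStore {
    /// Reads `.store` instead of `.1`, so the call site documents itself.
    pub fn store_at(&self, index: usize) -> Arc<dyn Store> {
        Arc::clone(&self.weights_and_stores[index].store)
    }
}

/// Illustrative store used only to exercise the sketch.
pub struct NamedStore(pub String);

impl Store for NamedStore {
    fn name(&self) -> &str {
        &self.0
    }
}
```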

Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

:lgtm:

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @allada)

@allada allada force-pushed the add-inner_store_to_store_api branch from c98baaf to 7197495 Compare December 2, 2023 04:45
Member Author

@allada allada left a comment

Reviewable status: 0 of 43 files reviewed, all discussions resolved (waiting on @aaronmondal)


cas/store/shard_store.rs line 178 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

nit: Might be worth considering some sort of struct to get rid of the `.1` syntax here. Not too relevant though.

Kinda agree, but not for this PR.

@allada allada merged commit a0788fa into main Dec 3, 2023
14 of 15 checks passed
@allada allada deleted the add-inner_store_to_store_api branch December 3, 2023 17:45