Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added existence cache #383

Merged
merged 1 commit into from
Nov 15, 2023

Conversation

blakehatch
Copy link
Member

@blakehatch blakehatch commented Nov 7, 2023

Description

Added an existence cache for caching calls to has and whether the action exists or not, does not store the content.

Fixes #357

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Tested whether calls to has or walking the tree populate the existence cache properly.

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

Copy link
Contributor

@steedmicro steedmicro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 3 unresolved discussions (waiting on @blakehatch)


cas/store/BUILD line 503 at r1 (raw file):

        "@crate_index//:tokio",
    ],
)

Need a line break here.


cas/store/verify_store.rs line 272 at r1 (raw file):

        );
    }
}

ditto


cas/store/tests/verify_store_test.rs line 490 at r1 (raw file):

        Ok(())
    }
}

ditto

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 7 unresolved discussions (waiting on @blakehatch and @steed924)


cas/store/verify_store.rs line 15 at r2 (raw file):

// limitations under the License.

use std::collections::HashSet;

nit: Use:
https://docs.rs/hashbrown/latest/hashbrown/hash_set/struct.HashSet.html


cas/store/verify_store.rs line 47 at r2 (raw file):

        VerifyStore {
            inner_store,
            existence_cache: Mutex::new(HashSet::new()),

nit: I don't think we want to do existence cache in this store, it should be it's own store.


cas/store/verify_store.rs line 55 at r2 (raw file):

    }

    pub async fn record_existence_directory_walk(

This is never called.


cas/store/verify_store.rs line 189 at r2 (raw file):

                    .has_with_results(&[digest], &mut results[i..i + 1])
                    .await;
                self.existence_cache.lock().unwrap().insert(digest);

if result has failed it will add the entry to the cache.

if the results (which is the result of the digest) fails, it also inserts saying it exists (which would be None).

@blakehatch blakehatch force-pushed the existence-cache-store branch 4 times, most recently from 2f3dcaf to 5f2eaac Compare November 8, 2023 07:23
Copy link
Member Author

@blakehatch blakehatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @allada and @steed924)


cas/store/BUILD line 503 at r1 (raw file):

Previously, steed924 (Steed) wrote…

Need a line break here.

Done.


cas/store/verify_store.rs line 272 at r1 (raw file):

Previously, steed924 (Steed) wrote…

ditto

Done.


cas/store/verify_store.rs line 15 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Use:
https://docs.rs/hashbrown/latest/hashbrown/hash_set/struct.HashSet.html

Done.


cas/store/verify_store.rs line 55 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

This is never called.

Removed as this functionality belongs in completeness_checking_store not existence_store.


cas/store/verify_store.rs line 189 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

if result has failed it will add the entry to the cache.

if the results (which is the result of the digest) fails, it also inserts saying it exists (which would be None).

Done.


cas/store/tests/verify_store_test.rs line 490 at r1 (raw file):

Previously, steed924 (Steed) wrote…

ditto

Done.

Copy link
Member Author

@blakehatch blakehatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @allada, @blakehatch, and @steed924)


cas/store/verify_store.rs line 47 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: I don't think we want to do existence cache in this store, it should be it's own store.

Moving to a new store wrapper.

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @blakehatch and @steed924)


cas/store/verify_store.rs line 55 at r2 (raw file):

Previously, blakehatch (Blake Hatch) wrote…

Removed as this functionality belongs in completeness_checking_store not existence_store.

Forget to move it?


cas/store/verify_store.rs line 187 at r4 (raw file):

                let result = self
                    .pin_inner()
                    .has_with_results(&[digest], &mut results[i..i + 1])

We are iterating the digests, then calling .has_with_results on each one individually. This function is specifically exposed for stores that can lookup multiple digests at once and return the results with low latency.

I think what makes more sense is to filter and map the digests then do 1 call to .has_with_results() then process the results after.

@blakehatch blakehatch force-pushed the existence-cache-store branch 2 times, most recently from b1c64d8 to 8f7bb51 Compare November 8, 2023 17:19
Copy link
Member Author

@blakehatch blakehatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @allada, @blakehatch, and @steed924)


cas/store/verify_store.rs line 47 at r2 (raw file):

Previously, blakehatch (Blake Hatch) wrote…

Moving to a new store wrapper.

Moved.


cas/store/verify_store.rs line 55 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Forget to move it?

Moved.


cas/store/verify_store.rs line 189 at r2 (raw file):

Previously, blakehatch (Blake Hatch) wrote…

Done.

Done.


cas/store/verify_store.rs line 187 at r4 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

We are iterating the digests, then calling .has_with_results on each one individually. This function is specifically exposed for stores that can lookup multiple digests at once and return the results with low latency.

I think what makes more sense is to filter and map the digests then do 1 call to .has_with_results() then process the results after.

Restructured has_with_results to be more efficient.

@blakehatch blakehatch marked this pull request as ready for review November 8, 2023 17:20
@blakehatch blakehatch force-pushed the existence-cache-store branch 8 times, most recently from 21ced10 to 218d9d8 Compare November 10, 2023 18:46
Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 11 files at r6, all commit messages.
Reviewable status: 11 of 13 files reviewed, 12 unresolved discussions (waiting on @allada, @blakehatch, and @steed924)


-- commits line 2 at r6:
nit: "Add existence cache"


cas/store/BUILD line 35 at r6 (raw file):

        ":shard_store",
        ":size_partitioning_store",
        "existence_store",

Suggestion:

:existence_store

cas/store/existence_store.rs line 29 at r6 (raw file):

pub struct ExistenceStore {
    inner_store: Arc<dyn StoreTrait>,
    existence_cache: Arc<Mutex<HashSet<DigestInfo>>>,

Note: It might be interesting to investigate the use of an RwLock in a future commit. Since HashSet implements Send and Sync this should be a drop-in replacement in theory. However, without a benchmark suite we shouldn't attempt these kinds of optimizatons since they could be pessimizations.


cas/store/existence_store.rs line 34 at r6 (raw file):

impl ExistenceStore {
    pub fn new(inner_store: Arc<dyn StoreTrait>) -> Self {
        ExistenceStore {

https://rust-lang.github.io/rust-clippy/master/index.html#/use_self

Suggestion:

Self

cas/store/existence_store.rs line 48 at r6 (raw file):

        let mut cache = self.existence_cache.lock().await;
        cache.insert(*digest)
    }

nit: Since digest_in_existence_cache and cache_existence are only used once it seems fine to remove this an inline their implementations in has_with_results.


cas/store/existence_store.rs line 70 at r6 (raw file):

        }

        let not_in_cache: Vec<DigestInfo> = not_in_cache.into_iter().cloned().collect();

https://rust-lang.github.io/rust-clippy/master/index.html#/cloned_instead_of_copied

Suggestion:

copied

cas/store/tests/existence_store_test.rs line 35 at r6 (raw file):

        let store_owned = ExistenceStore::new(inner_store.clone());
        let store = Pin::new(&store_owned);
        const VALUE: &str = "123";

nit: Put consts in the first line.

Copy link
Member Author

@blakehatch blakehatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 13 files reviewed, 9 unresolved discussions (waiting on @aaronmondal, @allada, and @steed924)


-- commits line 2 at r6:

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

nit: "Add existence cache"

Done.


cas/store/BUILD line 35 at r6 (raw file):

        ":shard_store",
        ":size_partitioning_store",
        "existence_store",

Done.


cas/store/existence_store.rs line 29 at r6 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Note: It might be interesting to investigate the use of an RwLock in a future commit. Since HashSet implements Send and Sync this should be a drop-in replacement in theory. However, without a benchmark suite we shouldn't attempt these kinds of optimizatons since they could be pessimizations.

That does seem like an interesting possible optimization compared to a mutex, added a comment to consider it at a later date once there is a benchmark.


cas/store/existence_store.rs line 34 at r6 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

https://rust-lang.github.io/rust-clippy/master/index.html#/use_self

Done.


cas/store/existence_store.rs line 48 at r6 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

nit: Since digest_in_existence_cache and cache_existence are only used once it seems fine to remove this an inline their implementations in has_with_results.

Done.


cas/store/existence_store.rs line 70 at r6 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

https://rust-lang.github.io/rust-clippy/master/index.html#/cloned_instead_of_copied

Done.


cas/store/tests/existence_store_test.rs line 35 at r6 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

nit: Put consts in the first line.

Done.

@blakehatch blakehatch force-pushed the existence-cache-store branch 3 times, most recently from 71e6b8a to dab3590 Compare November 14, 2023 01:48
Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed from the global store list? Will it be added later? ATM we have no docs for this.

Reviewed 7 of 7 files at r7, all commit messages.
Reviewable status: 11 of 13 files reviewed, 6 unresolved discussions (waiting on @allada and @steed924)

@blakehatch blakehatch force-pushed the existence-cache-store branch 2 times, most recently from 3d72531 to 91df89e Compare November 14, 2023 13:08
Copy link
Member Author

@blakehatch blakehatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back to global store list, was an accident from using one of my stashes from when existence caching was integrated in verify_store instead of it's own store.

Reviewable status: 9 of 13 files reviewed, 6 unresolved discussions (waiting on @aaronmondal, @allada, and @steed924)

Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: ❤️

Reviewed 2 of 2 files at r8, all commit messages.
Reviewable status: 11 of 13 files reviewed, 6 unresolved discussions (waiting on @allada and @steed924)

@blakehatch blakehatch merged commit e8e6701 into TraceMachina:main Nov 15, 2023
13 of 14 checks passed
Copy link
Member Author

@blakehatch blakehatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Reviewable status: 11 of 13 files reviewed, 6 unresolved discussions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement existance cache store
4 participants