Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lost file data in filesystem store (with test) #439

Merged

Conversation

adam-singer
Copy link
Contributor

@adam-singer adam-singer commented Dec 1, 2023

Description

  • Refactor filesystem_store::FilesystemStore to allow for controlling sleep function from outside of implementation.

  • Hoist BytesMut.split() to a position before the loop to prevent subsequent calls that result in an empty buffer. Original patch: Fix lost file data in filesystem store #426 / @Jirixek

Fixes # (issue)

#426

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

get_part_timeout_test unit test against non-hoisted BytesMut.split() will trigger the bug, once split is moved bug is no longer present.

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 4 unresolved discussions (waiting on @aaronmondal and @adam-singer)


native-link-store/src/filesystem_store.rs line 22 at r2 (raw file):

use std::time::SystemTime;

use core::time::Duration;

nit: use std instead of core.


native-link-store/src/filesystem_store.rs line 39 at r2 (raw file):

use tokio::task::spawn_blocking;
use tokio::time::{sleep, Sleep};
use tokio::time::timeout;

nit: combine this with the line above.


native-link-store/tests/filesystem_store_test.rs line 880 at r2 (raw file):

        let temp_path = make_temp_path("temp_path");

        fs::set_idle_file_descriptor_timeout(Duration::from_millis(100))

nit: This line is not needed for this test.


native-link-store/tests/filesystem_store_test.rs line 883 at r2 (raw file):

            .err_tip(|| "Error setting idle file descriptor timeout")?;

        assert_eq!(fs::idle_file_descriptor_timeout(), Duration::from_millis(100));

nit: ditto.

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I love a sleep less than 1 second.

native-link-store/tests/filesystem_store_test.rs Outdated Show resolved Hide resolved
@MarcusSorealheis
Copy link
Collaborator

Also, can you condense these into a single commit?

Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 5 unresolved discussions (waiting on @aaronmondal, @adam-singer, and @MarcusSorealheis)


native-link-store/tests/filesystem_store_test.rs line 876 at r2 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

nit (should not hold up PR, because it is a test): I know this is only a test, I'm curious why this variable name was chosen. It's your code so I shouldn't dictate, but my intuition would've been to name this variable many_x_digest or something similar to represent the long string of x's for a few reasons.

  1. digest is a moderately common function name in Rust.
  2. digest1 is slightly non-descript. Not totally, it is a digest, and the first one.

On the other hand, it's not that important because it doesn't really matter what the value is, as the important point is that it is a large string.

FYI: This PR was not flagged as ready for review, it was just so we could talk about it. It turns out we need a large digest anyway.

In other parts of the code it is usually digest1 then digest2 when there are two digests that are needed for the test.

Copy link
Contributor

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @adam-singer and @MarcusSorealheis)


-- commits line 5 at r2:
Remember to squash the commits.


native-link-store/src/filesystem_store.rs line 695 at r1 (raw file):

                        continue;
                    }
                    res = writer.send(buf.split().freeze()) => {

Looks like we're still using the split/freeze inside the loop. Maybe it makes sense to combine this PR with #426.

Edit: Ok it's no longer in the loop 😆 Let's make sure to add @Jirixek as co-author.

@MarcusSorealheis
Copy link
Collaborator

In other parts of the code it is usually digest1 then digest2 when there are two digests that are needed for the test.

This is true. In this case, digest1_clone could be digest_2 for consistency? I only saw 1 digest in the test but could have missed something.

Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally digest1_clone style name is used when you need to need to pass ownership of an item to another closure or similar.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @adam-singer and @Jirixek)

Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged as single commit and hopefully gave @Jirixek proper annotation for the fix.

done : digest1_clone && digest1 renamed

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @aaronmondal and @Jirixek)


native-link-store/src/filesystem_store.rs line 695 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Looks like we're still using the split/freeze inside the loop. Maybe it makes sense to combine this PR with #426.

Edit: Ok it's no longer in the loop 😆 Let's make sure to add @Jirixek as co-author.

done , ptal if it is correct in git log, not metadata I've often had to adjust


native-link-store/src/filesystem_store.rs line 22 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: use std instead of core.

done


native-link-store/src/filesystem_store.rs line 39 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: combine this with the line above.

done


native-link-store/tests/filesystem_store_test.rs line 876 at r2 (raw file):

Previously, adam-singer (Adam Singer) wrote…

The test case was graciously provided by @Jirixek, lifted as is, naming can be adjusted

done


native-link-store/tests/filesystem_store_test.rs line 880 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: This line is not needed for this test.

done


native-link-store/tests/filesystem_store_test.rs line 883 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: ditto.

done

Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @aaronmondal, @adam-singer, and @Jirixek)


native-link-store/src/filesystem_store.rs line 695 at r1 (raw file):

Previously, adam-singer (Adam Singer) wrote…

done , ptal if it is correct in git log, not metadata I've often had to adjust

We could also flag this test as a "do-not-run" then have @Jirixek just enable the test in his PR?

Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @aaronmondal and @Jirixek)


-- commits line 5 at r2:

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Remember to squash the commits.

Done.


native-link-store/src/filesystem_store.rs line 695 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

We could also flag this test as a "do-not-run" then have @Jirixek just enable the test in his PR?

sgtm, or can just take it has a patch locally, apply and update the PR, since we have already reviewed turn around time would be low.

@adam-singer adam-singer marked this pull request as ready for review December 1, 2023 23:51
Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 9 unresolved discussions (waiting on @aaronmondal, @adam-singer, and @Jirixek)


native-link-store/tests/filesystem_store_test.rs line 28 at r3 (raw file):

use std::time::SystemTime;

use core::time::Duration;

nit: use std instead of core


native-link-store/tests/filesystem_store_test.rs line 49 at r3 (raw file):

use tokio::sync::Barrier;
use tokio::time::sleep;
use tokio::time::Sleep;

nit: Combine these into the same line.


native-link-store/tests/filesystem_store_test.rs line 880 at r3 (raw file):

        let temp_path = make_temp_path("temp_path");

        fn sleep_fn(_: Duration) -> Sleep {

nit: Can you try and make this a lambda and inline it? I think rust does support it as long as it is a pure function (no-captures and not mutable).


native-link-store/tests/filesystem_store_test.rs line 888 at r3 (raw file):

                content_path: content_path.clone(),
                temp_path: temp_path.clone(),
                eviction_policy: Some(native_link_config::stores::EvictionPolicy {

nit: Just remove the eviction policy, we are not testing it.


native-link-store/tests/filesystem_store_test.rs line 900 at r3 (raw file):

        let store_pin = Pin::new(store.as_ref());
        // Insert data into store.

nit: This comment is obvious, so i'd remove it.


native-link-store/tests/filesystem_store_test.rs line 910 at r3 (raw file):

        // Check to ensure our first byte has been received. The future should be stalled here.
        let first_byte = DropCloserReadHalf::take(&mut reader, 1)

nit: I don't think we need to do this for this test.


native-link-store/tests/filesystem_store_test.rs line 915 at r3 (raw file):

        assert_eq!(first_byte[0], large_value.as_bytes()[0], "Expected first byte to match");

        sleep(Duration::from_millis(200)).await;

nit: We can't sleep in tests. You may need to sleep(0) though, so it triggers the kernel's epoll, and may also need to use tokio::task::yield_now().await

@adam-singer adam-singer force-pushed the adams/pr-426-refactor-for-test branch 2 times, most recently from d18f83b to 7039cd9 Compare December 2, 2023 00:28
Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @aaronmondal and @Jirixek)


native-link-store/tests/filesystem_store_test.rs line 28 at r3 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: use std instead of core

done.


native-link-store/tests/filesystem_store_test.rs line 49 at r3 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Combine these into the same line.

done.


native-link-store/tests/filesystem_store_test.rs line 880 at r3 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Can you try and make this a lambda and inline it? I think rust does support it as long as it is a pure function (no-captures and not mutable).

done.


native-link-store/tests/filesystem_store_test.rs line 888 at r3 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Just remove the eviction policy, we are not testing it.

done.


native-link-store/tests/filesystem_store_test.rs line 900 at r3 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: This comment is obvious, so i'd remove it.

done.


native-link-store/tests/filesystem_store_test.rs line 910 at r3 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: I don't think we need to do this for this test.

done.


native-link-store/tests/filesystem_store_test.rs line 915 at r3 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: We can't sleep in tests. You may need to sleep(0) though, so it triggers the kernel's epoll, and may also need to use tokio::task::yield_now().await

done.

Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4.
Reviewable status: all files reviewed (commit messages unreviewed), 3 unresolved discussions (waiting on @aaronmondal, @adam-singer, and @Jirixek)


native-link-store/tests/filesystem_store_test.rs line 880 at r3 (raw file):

Previously, adam-singer (Adam Singer) wrote…

done.

last minor nit on this, can you inline this?


native-link-store/tests/filesystem_store_test.rs line 901 at r4 (raw file):

        let digest_clone = digest;

        tokio::spawn(async move { Pin::new(store_clone.as_ref()).get(digest_clone, writer).await });

nit: Can we wrap this spawn in a JoinHandleDropGuard::new() and assign it to a dummy variable.

Doing this will ensure the spawn gets dropped when it goes out of scope.

Failing to do this might cause (depending on how tokio test runtime is implemented) the task to be stalled forever in the current runtime (which might be shared between tests). This wrapper simply calls cancel on the spawn if the join future gets dropped.

Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Thank you everyone :-)

Reviewable status: all files reviewed (commit messages unreviewed), 3 unresolved discussions (waiting on @aaronmondal, @adam-singer, and @Jirixek)

Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @aaronmondal, @adam-singer, and @Jirixek)

@Jirixek
Copy link

Jirixek commented Dec 3, 2023

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @aaronmondal, @adam-singer, and @Jirixek)

native-link-store/src/filesystem_store.rs line 695 at r1 (raw file):
Previously, adam-singer (Adam Singer) wrote…

We could also flag this test as a "do-not-run" then have @Jirixek just enable the test in his PR?

Don't bother, just merge this MR as is. You've done the most of the work anyway, I'm happy with the credit you've game me :)

Co-authored-by: Adam Singer <adam@tracemachina.com>
Co-authored-by: Jiří Szkandera @Jirixek
Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @aaronmondal, @allada, and @MarcusSorealheis)


native-link-store/src/filesystem_store.rs line 695 at r1 (raw file):

Previously, Jirixek (Jiří Szkandera) wrote…

Don't bother, just merge this MR as is. You've done the most of the work anyway, I'm happy with the credit you've game me :)

Done.


native-link-store/tests/filesystem_store_test.rs line 872 at r4 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

yes, this is better.

Done.


native-link-store/tests/filesystem_store_test.rs line 901 at r4 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Can we wrap this spawn in a JoinHandleDropGuard::new() and assign it to a dummy variable.

Doing this will ensure the spawn gets dropped when it goes out of scope.

Failing to do this might cause (depending on how tokio test runtime is implemented) the task to be stalled forever in the current runtime (which might be shared between tests). This wrapper simply calls cancel on the spawn if the join future gets dropped.

Done.

Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @aaronmondal, @allada, and @MarcusSorealheis)

Copy link
Contributor

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @MarcusSorealheis)

Copy link
Collaborator

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r5, all commit messages.
Dismissed @MarcusSorealheis from a discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @adam-singer)

@adam-singer adam-singer merged commit 5123ffc into TraceMachina:main Dec 5, 2023
13 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants