Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return file creation times on Linux (using statx) #248

Merged
merged 9 commits into from
Jun 4, 2022

Conversation

ongardie
Copy link
Contributor

@ongardie ongardie commented Jun 3, 2022

Rust's standard library returns file creation times on Linux (at least with glibc), and now cap-std will too.

File creation times are available on Linux via the statx syscall (not fstatat), introduced in kernel 4.11, which was released in April 2017. The Rust standard library already uses statx on Linux with glibc.

Making cap-std support file creation times on Linux required two changes:

  1. File::metadata() uses the standard library (which uses statx), but this wouldn't ever copy over the created field on Linux. Now, any time the created field is set in the std Metadata struct, it's also set in the cap-primitives Metadata, regardless of platform.

  2. stat_unchecked is used in several places, including fetching DirEntry metadata. Before, it called fstatat directly. Now, it calls statx on Linux when available, and it falls back to fstatat otherwise. Fortunately, Dan Gohman (@sunfishcode) had already added a method to convert statx results in commit d1fa735 (PR Fix stat when the last path component is a symlink to ... #105) in 2020.

This commit also adds a new test to make sure file creation times are set in cap-std Metadata if they are set in std Metadata.

Rust's standard library returns file creation times on Linux (at least with
glibc), and now rust-std will too.

File creation times are available on Linux via the `statx` syscall (not
`fstatat`), introduced in kernel 4.11, which was released in April 2017. The
Rust standard library already uses `statx` on Linux with glibc.

Making cap-std support file creation times on Linux required two changes:

1. `File::metadata()` uses the standard library (which uses `statx`), but this
   wouldn't ever copy over the created field on Linux. Now, any time the
   created field is set in the std Metadata struct, it's also set in the
   cap-primitives Metadata, regardless of platform.

2. `stat_unchecked` is used in several places, including fetching DirEntry
   metadata. Before, it called `fstatat` directly. Now, it calls `statx` on
   Linux when available, and it falls back to `fstatat` otherwise. Fortunately,
   Dan Gohman (@sunfishcode) had already added a method to convert `statx`
   results in commit d1fa735 (PR bytecodealliance#105) in 2020.

This commit also adds a new test to make sure file creation times are set in
cap-std Metadata if they are set in std Metadata.
Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for working on this!

cap-primitives/src/rustix/fs/stat_unchecked.rs Outdated Show resolved Hide resolved
cap-primitives/src/rustix/fs/stat_unchecked.rs Outdated Show resolved Hide resolved
tests/metadata-ext.rs Outdated Show resolved Hide resolved
let std_supports_created = matches!(
a.into_std().metadata(),
Ok(m) if m.created().is_ok(),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using the modified() timestamp for expected, could you use the created() timestamp for expected? And for the directories, could we always call std::fs::metadata so that we always get the created() timestamp from std? It seems like that should eliminate the need for a tolerance, because the creation time from cap-std should be exactly the same as the creation time from std.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes more sense than what I had done. Updated.

@@ -83,3 +83,55 @@ fn test_metadata_ext() {
);
}
}

#[test]
fn test_metadata_ext_created() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test in the right / a reasonable place? I wasn't sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good question. metadata-ext.rs is for MetadataExt APIs, but the created() field is a regular Metadata API. This can go in fs_additional.rs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

);
match statx_result {
Ok(statx) => return Ok(MetadataExt::from_rustix_statx(statx)),
Err(rustix::io::Error::NOSYS) => HAS_STATX.store(false, Ordering::Relaxed),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw a comment in open_impl.rs about seccomp on Android making ENOSYS a bad idea. I don't think that applies here yet, but it might later. Is there a better approach? Should I place a comment on this landmine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's own standard library calls statx without doing an explicit version check, so we're ok on Androidd here.

Looking into it though, I notice that there is a potential problem with statx returning EPERM on old versions of Docker; std does a special check to avoid this problem. If you want to implement that same trick here, that'd be great, though since rustix also needs to do this, it's also fine to leave this code as-is for now, and I'll handle this when I update rustix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. It seems like if we merge this PR now, it'd break metadata for not-very old versions of Docker, so that's probably bad.

Why/where does rustix need this logic? I thought it just wrapped statx and doesn't really care whether or not statx succeeds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rustix uses statx in its implementation of rustix::fs::stat. Rustix usually doesn't do that kind of emulation, but in this case, it's motivated by a desire to be y2038-ready on platforms which support that. On Linux, that means using statx instead of the native stat family of syscalls whenever possible. So if you want to fix this now, that'd be great, but I am also happy to fix this myself at the same time that I fix rustix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't see that earlier because I was searching through an older version, oops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just simulated the seccomp statx -> EPERM behavior on my system using this Python script (as root):

import errno
import os
import seccomp

f = seccomp.SyscallFilter(seccomp.ALLOW)
f.add_rule(seccomp.ERRNO(errno.EPERM), "statx")
f.load()

os.execl(
    "target/debug/deps/fs_additional-d7e3e72f4c824747",
    "fs_additional",
    "--nocapture",
    "--test-threads=1",
    "metadata_vs_std_fs",
)

It's not quite as accurate as getting an old Docker version, but I'd rather not do that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated stat_unchecked to handle EPERM. I couldn't do the same EFAULT trick as Rust std uses (I think it'd require unsafe), but I pushed the test off the critical path. Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering adding something in Rustix to allow or the null-pointer EFAULT trick, but for now the code you have here looks good.

@@ -3,6 +3,14 @@ use rustix::fs::{statat, AtFlags};
use std::path::Path;
use std::{fs, io};

// TODO: update all these to
// #[cfg(any(target_os = "android", target_os = "linux"))]
// once we're on restix >= v0.34.3.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea when the rustix version will be updated? I don't have a sense of how big that is, so I didn't want to add it to this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest thing for cap-std is that there are some semver incompatibilities between versions of io-lifetimes, which is tracking changes from I/O safety stabilization. Fortunately, the FCP looks to be almost done. Once it's done, I plan to do an io-lifetimes release and then a rustix 0.35 release, and then update cap-std and various other things.


let std_dir = check!(dir.into_std_file().metadata());
let std_file = check!(file.into_std().metadata());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is much nicer now, I think. However, it's a little funny as is. It does all this setup to extract metadata in various ways from cap_std and std, but then it only compares the created times of the metadata and nothing else. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one hand, creation times have an interesting enough implementation landscape that I feel it's justified to have a fair amount of test setup just for that one purpose, so this seems fine. On the other, since we have all this setup here, it'd be nice to expand this to test all of Metadata's accessors to make sure they all match. accessed() might need to be a >= check instead of equality because files can get spurious access-time updates.

I don't want to bog you down if you're working on a project which just needs the creation times fix. So I'll leave it up to you for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I expanded this out for all of the accessible Metadata (with only Unix extensions for now).

Comment on lines +932 to +934
// The standard library returns file format bits with `mode()`, whereas
// cap-std currently doesn't.
assert_eq!(std.permissions().mode() & 0o7777, cap.permissions().mode());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was unexpected and a bit surprising to me. Is it intentional? If so, is that documented somewhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cap-std's intention to match std's behavior at least, so it looks like a bug in cap-std. I filed #249 to track this.

Comment on lines +942 to +944
// The access times might be a little different due to either our own
// or concurrent accesses.
const ACCESS_TOLERANCE_SEC: u32 = 60;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't love this but didn't want to assume we made the cap or std access first, and I wanted to check the access times weren't wildly far into the future either. I guess it's pretty easy to find and update later if it causes trouble.

I had used `cfg!(unix)` instead of `#[cfg(unix)]` in 2eca023, which doesn't
conditionally compile out the Unix-specific code.
@sunfishcode sunfishcode merged commit 6e177ac into bytecodealliance:main Jun 4, 2022
@ongardie
Copy link
Contributor Author

ongardie commented Jun 6, 2022

Thanks for all your guidance with this, @sunfishcode, especially since there's a lot of relevant context that I wasn't aware of when I started. I think the code improved with each iteration, and it's nice that the expanded test caught another issue right away. Cheers.

@sunfishcode
Copy link
Member

This is now released in cap-std 0.25.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants