Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error reporting and logging #2705

Merged
merged 2 commits into from
Feb 29, 2024

Conversation

YJDoc2
Copy link
Collaborator

@YJDoc2 YJDoc2 commented Feb 23, 2024

ref : #2695

This converts a lot of .unwrap calls to propagated errors, and also improves general error reporting that is seen by users of youki.

I'll suggest to review both of the commits separately, as they focus on two different aspects of this :

  • 7f8422f deals with the issue that when init process fails to exec, it does not report back to main process, thus the error gets reported as channel broken which is not at all useful. This now sends the error back via the channel. This also changes the InvalidArg error that is given by validation of executable to ArgValdationError . I feel the latter is better suited, and does not collide with another error of the same name that we have. This might be a breaking change for runwasi If they are match-ing on the enum values.

  • 62431aa : This converts various .unwrap s we have into error propagation so that users of the libraries can decide how they want to deal with it. I have left some unwraps where we first check that it cannot fail, and then call unwrap, as in those places, converting it to more idiomatic ways is either not possible to too complex code. I have also left out unwraps in all the tests, as it does not make sense to convert them. While the error propagation behavior is changed, otherwise the behavior should not change.

I also have some other comments, which I'm leaving as review comments below.

Thanks :)

Signed-off-by: Yashodhan Joshi <yjdoc2@gmail.com>
@YJDoc2 YJDoc2 added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. breaking change labels Feb 23, 2024
@codecov-commenter
Copy link

codecov-commenter commented Feb 23, 2024

Codecov Report

Merging #2705 (62431aa) into main (919ff4a) will decrease coverage by 0.19%.
Report is 2 commits behind head on main.
The diff coverage is 21.12%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2705      +/-   ##
==========================================
- Coverage   65.40%   65.21%   -0.19%     
==========================================
  Files         133      133              
  Lines       16942    16981      +39     
==========================================
- Hits        11081    11075       -6     
- Misses       5861     5906      +45     

Signed-off-by: Yashodhan Joshi <yjdoc2@gmail.com>
Copy link
Collaborator Author

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm not sure is converting the is_true_root , is_in_new_userns and rootless_required this way is a good idea or not. The only error case here is failing to read /proc/self/uid_map , which I'm not sure how much possible it is, and it bloats the signatures and forces to use IO error everywhere it is called. I'd be happy to convert it back to original if someone else also feels the same way
  2. While doing this I have found that right now youki does not work with podman exec at all. I have found the rc, and have a fix in another branch ; but some unit tests are failing. If that works, then the failing podman tests goes down from 161 to 99 🎉 This PR will help me diagnose further errors, so opening this first and waiting for merge before that one.

Comment on lines +130 to +135
if let Some(weight) = wd.weight() {
common::write_cgroup_file(
root_path.join(CGROUP_BFQ_IO_WEIGHT),
format!("{}:{} {}", wd.major(), wd.minor(), weight),
)?;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed this based upon https://github.com/opencontainers/runc/blob/main/libcontainer/cgroups/fs/blkio.go#L29-L33 from runc ; and if the weight is None the existing code would just panic. However, I'm not sure if this is the correct approach, or the None wight should be defaulted to 0 or reported as error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this too much? For now we don't support checkpoint at all, so I feel this might be overkill. We can also potentially do these changes when we fix the support for checkpointing.

@@ -78,6 +78,8 @@ pub enum UserNamespaceError {
UnknownUnprivilegedUsernsClone(u8),
#[error(transparent)]
IDMapping(#[from] MappingError),
#[error(transparent)]
OtherIO(#[from] std::io::Error),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the OtherIO be some other higher level error, as this is duplicated in another error enum below.

@YJDoc2 YJDoc2 requested a review from a team February 23, 2024 11:48
Comment on lines 131 to 139
if let ContainerType::TenantContainer { exec_notify_fd } = args.container_type {
let buf = format!("{e}");
if let Err(err) = write(exec_notify_fd, buf.as_bytes()) {
tracing::error!(?err, "failed to write to exec notify fd");
return -1;
}
if let Err(err) = close(exec_notify_fd) {
tracing::error!(?err, "failed to close exec notify fd");
return -1;
}
}
Copy link
Collaborator Author

@YJDoc2 YJDoc2 Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the return -1 from individual arms, as we should still try to close the socket even if we fail to write it, and we are returning -1 at the block end

@utam0k
Copy link
Collaborator

utam0k commented Feb 23, 2024

Overall, it looks good to me, but I have one question. How about using unwraped_used in cargo clippy to detect using unwrap() in CI.
https://rust-lang.github.io/rust-clippy/master/index.html#/unwrap_used

@YJDoc2
Copy link
Collaborator Author

YJDoc2 commented Feb 23, 2024

How about using unwraped_used in cargo clippy to detect using unwrap() in CI.

I checked that out, and while it would be good, right now there are a lot of places where we will need to manually add [allow(unwrap_used)] . This is because at those places, using unwrap is fine due to some pre-check we have done, or because we are sure that that operation will not fail (eg converting from os string to string , for all our use cases should be ok, as those should always be utf-8). dbus/message.rs and dbus/serialize.rs would also need to allow this lint over all the file, as there we first check and the unwrap so it is ok.

Copy link
Collaborator

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome improvement!

@YJDoc2
Copy link
Collaborator Author

YJDoc2 commented Feb 26, 2024

I'll wait maybe a day or two more and see if anyone has any comments, if not will merge.

@YJDoc2
Copy link
Collaborator Author

YJDoc2 commented Feb 29, 2024

Going ahead and merging this, as no further comments were added . Thanks :)

@YJDoc2 YJDoc2 merged commit ed2c08d into youki-dev:main Feb 29, 2024
28 checks passed
@YJDoc2 YJDoc2 deleted the fix/improve_logging_errors branch February 29, 2024 14:40
@github-actions github-actions bot mentioned this pull request Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants