plugin/amdgpu: Don't print error for "No such process" during resume #2340

Merged

Conversation

fdavid-amd
Contributor

During the late stages of restore, each process being resumed gets an ioctl call to KFD_CRIU_OP_RESUME. If the process has no kfd process info, this call will fail with -ESRCH. This is normal behaviour, so we shouldn't print an error message for it.

Signed-off-by: David Francis <David.Francis@amd.com>
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (07a090b) 70.56% compared to head (931de85) 70.57%.

❗ Current head 931de85 differs from pull request most recent head 62f40b6. Consider uploading reports for the commit 62f40b6 to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##           criu-dev    #2340   +/-   ##
=========================================
  Coverage     70.56%   70.57%           
=========================================
  Files           132      132           
  Lines         33619    33619           
=========================================
+ Hits          23723    23726    +3     
+ Misses         9896     9893    -3     

@avagin avagin merged commit a9cbdad into checkpoint-restore:criu-dev Feb 1, 2024
39 checks passed
@@ -1999,7 +1999,10 @@ int amdgpu_plugin_resume_devices_late(int target_pid)
 	args.op = KFD_CRIU_OP_RESUME;
 	pr_info("Calling IOCTL to start notifiers and queues\n");
 	if (kmtIoctl(fd, AMDKFD_IOC_CRIU_OP, &args) == -1) {
-		pr_perror("restore late ioctl failed");
+		if (errno == ESRCH)
+			pr_info("Pid %d has no kfd process info\n", target_pid);
Member

We don't print an error here, but we still return -1. That seems inconsistent to me. Am I missing something?

Member

criu "ignores" this error code: https://github.com/checkpoint-restore/criu/blob/criu-dev/criu/cr-restore.c#L2487

I think @Snorch is right; we have to return 0 in this case.

Member

Ok, then let's do it:

#2343
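
For readers following the thread, here is a minimal sketch of the behaviour the reviewers converge on: treat ESRCH from the resume ioctl as "nothing to resume", log it at info level, and return 0, while any other failure is still reported and returns -1. It is an illustration only, not the code from #2343. `fake_resume_ioctl` and `resume_devices_late_sketch` are hypothetical names, the ioctl is simulated so the snippet compiles without the KFD headers, and `printf`/`perror` stand in for CRIU's `pr_info`/`pr_perror`.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for kmtIoctl(fd, AMDKFD_IOC_CRIU_OP, &args).
 * Returns 0 on success, or -1 with errno set to simulated_errno.
 */
static int fake_resume_ioctl(int simulated_errno)
{
	if (simulated_errno != 0) {
		errno = simulated_errno;
		return -1;
	}
	return 0;
}

/*
 * Sketch of the late-resume error handling discussed in the review.
 * ESRCH means the target process has no kfd process info, which is
 * expected for processes that never used the GPU, so it is not an error.
 */
static int resume_devices_late_sketch(int target_pid, int simulated_errno)
{
	if (fake_resume_ioctl(simulated_errno) == -1) {
		if (errno == ESRCH) {
			/* Stand-in for pr_info(): informational, not an error. */
			printf("Pid %d has no kfd process info\n", target_pid);
			return 0;
		}
		/* Stand-in for pr_perror(): a genuine failure. */
		perror("restore late ioctl failed");
		return -1;
	}
	return 0;
}
```

Calling `resume_devices_late_sketch(pid, ESRCH)` returns 0, whereas passing any other non-zero errno value (for example `EINVAL`) returns -1. This keeps the return code consistent with the log level, which is the inconsistency raised in the review.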
