thinktime_spin option can cause fio worker to sleep for too long #1588

Closed
kelleymh opened this issue Jun 30, 2023 · 4 comments

@kelleymh
Contributor

kelleymh commented Jun 30, 2023

Description of the bug:
When the thinktime_spin option specifies a value within a few milliseconds of the thinktime value, the duration of usec_spin() in handle_thinktime() can exceed the thinktime value in a VM environment. While usec_spin() is running, the vCPU can be de-scheduled, or the hypervisor can steal CPU time from it. When the guest vCPU is scheduled again, it may read the clock and find that more time has elapsed than intended. In that case, handle_thinktime() computes a negative value for 'left'. 'left' is then cast to unsigned long long for comparison with 'runtime_left', so the negative value appears as a very large number and 'left' is set to 'runtime_left'. Finally, usec_sleep() is called with 'left', so the worker sleeps until the end of the job when it should not have slept at all.
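To make the failure mode concrete, here is a minimal, self-contained sketch of the arithmetic described above. It is not the fio source: the function names, parameter names, and the use of long long for 'left' are assumptions made for illustration, and all values are in microseconds.

```c
#include <assert.h>

/*
 * Simplified model of the sleep-length computation described above.
 * This is NOT the fio source: the function name, parameter names, and
 * the use of long long for 'left' are assumptions for illustration.
 * All values are in microseconds.
 */
long long modeled_sleep_usec(unsigned long long thinktime,
                             unsigned long long spun,
                             unsigned long long runtime_left)
{
    /* Signed remainder: negative when the spin overshoots thinktime. */
    long long left = (long long)thinktime - (long long)spun;

    if (left) {
        /*
         * The cast turns a negative 'left' into a huge unsigned value,
         * so the comparison is true and 'left' becomes 'runtime_left'.
         */
        if (runtime_left < (unsigned long long)left)
            left = (long long)runtime_left;
    }
    return left;
}

/* Usage: a 10 ms thinktime with 60 s of runtime remaining. */
void check_modeled_sleep(void)
{
    /* Spin finished early (8 ms): sleep the remaining 2 ms. */
    assert(modeled_sleep_usec(10000, 8000, 60000000ULL) == 2000);
    /* Spin overshot (12 ms): the sleep jumps to the full 60 s. */
    assert(modeled_sleep_usec(10000, 12000, 60000000ULL) == 60000000);
}
```

In this model, a 2 ms overshoot of the spin is enough to turn an intended zero-length sleep into a sleep for the whole remaining runtime.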

The solution is to use code like this after the call to usec_spin():

    if (total < td->o.thinktime)
            left = td->o.thinktime - total;
    else
            left = 0;

I've tested this fix and it solves the problem I observe.
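As a rough sketch, the clamp proposed above can be isolated in a small helper and checked on its own. The helper name thinktime_left() and its plain integer parameters are hypothetical; in fio the values would come from td->o.thinktime and the return value of usec_spin().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of fio): microseconds still to sleep
 * after spinning for 'total' microseconds out of 'thinktime'.
 * The result is clamped at zero, so an overshoot never goes negative.
 */
uint64_t thinktime_left(uint64_t thinktime, uint64_t total)
{
    if (total < thinktime)
        return thinktime - total;
    return 0;
}

/* Usage: a 10 ms thinktime with different spin durations. */
void check_thinktime_left(void)
{
    assert(thinktime_left(10000, 8000) == 2000);  /* spin ended early */
    assert(thinktime_left(10000, 10000) == 0);    /* exact match */
    assert(thinktime_left(10000, 12000) == 0);    /* spin overshot */
}
```

Comparing before subtracting keeps the arithmetic unsigned throughout, so no later cast can turn an overshoot into a large sleep value.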

Environment: Ubuntu 20.04 running a 5.15 kernel as a guest VM in the Azure cloud. But the problem could happen in any VM environment where vCPUs are subject to getting de-scheduled or are sharing cycles with the hypervisor.

fio version: 3.35. The same problem happens with earlier versions such as 3.7 and 3.16.

Reproduction steps
See above.

@ankit-sam
Contributor

Hi @kelleymh, the changes look good to me. Can you please send a patch to the fio mailing list?

@axboe
Owner

axboe commented Jul 14, 2023

Or a PR here is fine too. I'm also fine making the edit myself; let us know what you prefer, @kelleymh.

@kelleymh
Contributor Author

I have a patch ready. I'll post it to the fio mailing list.

@vincentkfu
Collaborator

Thanks for reviewing @ankit-sam

@axboe closed this as completed in 14adf6e on Jul 14, 2023