Please acknowledge the following before creating a ticket
[x] I have read the GitHub issues section of REPORTING-BUGS.
Description of the bug:
When the thinktime_spin option specifies a value that is within a few milliseconds of the thinktime value, the duration of usec_spin() in handle_thinktime() can exceed the thinktime value in a VM environment. While usec_spin() is running, the vCPU could get de-scheduled or the hypervisor could steal CPU time from it. When the guest vCPU is scheduled again, it may read the clock and find that more time has elapsed than intended. In that case, handle_thinktime() calculates a negative value for 'left'. 'left' is then cast to unsigned long long for comparison with 'runtime_left'. The negative value becomes a very large unsigned value, so the comparison succeeds and 'left' is set to 'runtime_left'. Finally, usec_sleep() is called for 'left' amount of time, which lasts until the end of the job, when it should not have slept at all.
The solution is to use code like this after the call to usec_spin():
if (total < td->o.thinktime)
	left = td->o.thinktime - total;
else
	left = 0;
I've tested this fix and it solves the problem I observe.
Environment: Ubuntu 20.04 running a 5.15 kernel as a guest VM in the Azure cloud. But the problem could happen in any VM environment where vCPUs are subject to getting de-scheduled or are sharing cycles with the hypervisor.
fio version: 3.35. The same problem happens with earlier versions such as 3.7 and 3.16.
Reproduction steps
See above.