Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

virtio-devices: block: Fix latency counter for average read/write #5763

Merged

Conversation

likebreath
Copy link
Member

The cumulative average formula [1] requires to use signed integers for proper calculations, while calculated result (e.g. cumulative average) is always positive. This patch reflects the above requirements in our code.

[1] https://en.wikipedia.org/wiki/Moving_average#Cumulative_average

Fixes: #5745

The cumulative average formula [1] requires to use signed integers
for proper calculations, while calculated result (e.g. cumulative
average) is always positive. This patch reflects the above requirements
in our code.

[1] https://en.wikipedia.org/wiki/Moving_average#Cumulative_average

Fixes: cloud-hypervisor#5745

Signed-off-by: Bo Chen <chen.bo@intel.com>
@likebreath likebreath requested a review from a team as a code owner September 11, 2023 19:24
@likebreath
Copy link
Member Author

@rbradford Can you please take a look at it? I think it is a more appropriate fix to #5745. Please let me know your thoughts. Thanks.

+ ((latency * LATENCY_SCALE) as i64 - read_avg as i64)
/ (read_ops_last + read_ops.0) as i64)
.try_into()
.unwrap()
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised about the signedness issue in my updated version in my PR:

                            read_avg.saturating_add_signed(
                                ((latency as i64)
                                    .saturating_mul(LATENCY_SCALE as i64)
                                    .saturating_sub(read_avg as i64))
                                    / (read_ops_last + read_ops.0) as i64

Do you think there is still scope for overflow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I can think of. To overflow the calculation, you will need to have a latency lager than (u64::MAX-1)/LATENCY_SCALE ms (that is roughly 58494 years). If we really worry about silent overflow, I can replace all as i64 with try_into().unwrap(), so that any future overflows will be reported as a panic. I am fine either way.

+ ((latency * LATENCY_SCALE) - read_avg)
/ (read_ops_last + read_ops.0)
// Cumulative average is guaranteed to be
// positive if being calculated properly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this comment - could add an assert...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment was put to explain why the unwrap() was used here, e.g. programming error (not an error to be handled/reported).

@rbradford
Copy link
Member

test_vfio_user has been very flaky recently so this PR is ready to merge.

@rbradford rbradford merged commit b76d0e8 into cloud-hypervisor:main Sep 11, 2023
21 of 22 checks passed
@likebreath likebreath added the bug-fix Bug fix to include in release notes label Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix Bug fix to include in release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

thread '_disk0_q0' panicked at 'attempt to subtract with overflow',
2 participants