Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid inconsistent quantile computation between architectures #80

Merged
merged 1 commit into from
Jun 24, 2024

Conversation

gduranceau
Copy link
Contributor

What does this PR do?

For particular sketches and quantile values, GetValueAtQuantile can return very different results on different architectures, resulting from slightly different rounding on internal float values used in the computation. This PR exposes the issues in the tests and addresses the problem.

The problem was initially highlighted with a sketch containing 21 positive values, 14 of which being 0, and computing the 0.95 quantile of the sketch. In this case, we get:

rank = quantile * (count - 1) = 0.95 * (20 - 1) = 19

The number of zeroes in the sketch is then subtracted from this rank to find the key at rank 5 (19-14) in the positive-values store of the sketch.

0.95 does not have an exact representation as a float64 (0x1.e666666666666p-01). But when it is multiplied by 20, the rounding on the result lead to the value 19, which is represented in an exact way as float64 (0x1.3p+04). This is the case on both ARM64 and AMD64 systems on which I tested this. In GetValueAtQuantile, I could validate that rank for this particular sketch was computed consistently to this value 19 (0x1.3p+04).

I was surprised to see that when GetValueAtQuantile computes rank - s.zeroCount just after, the value was different on ARM64 and AMD64:

  • On AMD64, I got the value 5, represented exactly as a float64 (0x1.4p+02).
  • On ARM64, I got the value 5, but rounded down to 0x1.3ffffffffffffp+02.

This is enough to get a different bin for this rank, resulting in a very different 0.95 quantile for the sketch I was testing with.

Disassembling the code generated for GetValueAtQuantile showed that the rank variable was actually not reused by the compiler to compute rank - s.zeroCount. Instead, the fused multiply-add fnmsub instruction was used to compute quantile * (count - 1) - s.zeroCount directly. This instruction does not perform an intermediate rounding after the multiplication, which explains how the end result can be slightly different.

In GetValueAtQuantile, we want to make sure that the rank computed can be safely used as a reference point when later computing a rank to lookup in the negative-value or positive-value store. This can be achieved by an explicit floating point conversion:

rank := float64(quantile * (count - 1))

As per the go specification:

An implementation may combine multiple floating-point operations into a single fused operation, possibly across statements, and produce a result that differs from the value obtained by executing and rounding the instructions individually. An explicit floating-point type conversion rounds to the precision of the target type, preventing fusion that would discard that rounding.

This problem is not specific to the sketch I was testing with of course. It can happen whenever a sketch contains zero values, the computed quantile doesn't have an exact float64 representation (requiring rounding on the rank computation), and the rank to look for in the positive-value store falls exactly at the boundary between 2 bins.

The PR adds a test which exposes the issue on ARM64 when the explicit float64 conversion added by this PR is removed. It uses sketches built from values following a linear distribution, but with zeroes added once every 2 values. For the right test size (21) and quantile (0.95), the lower and upper quantiles computed on the Dataset are equal (because math.Floor(rank) and math.Ceil(rank) are equal), but the effective rank used by GetValueAtQuantile is rounded down, selecting the bin just below the expected one.

Copy link
Collaborator

@piochelepiotr piochelepiotr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting change, but I'm wondering why it was an issue?
The error on the value of a specific percentile was still kept within the bounds of relative accuracy right?

@piochelepiotr piochelepiotr merged commit f301b7a into DataDog:master Jun 24, 2024
1 check passed
@gduranceau
Copy link
Contributor Author

@piochelepiotr

The error on the value of a specific percentile was still kept within the bounds of relative accuracy right?

No. If you run the test I added without the float type conversion of rank, the test will fail, because the wrong bin is selected.

dmitryax pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Jun 25, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
|
[github.com/DataDog/sketches-go](https://togithub.com/DataDog/sketches-go)
| `v1.4.5` -> `v1.4.6` |
[![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fDataDog%2fsketches-go/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/github.com%2fDataDog%2fsketches-go/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/github.com%2fDataDog%2fsketches-go/v1.4.5/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fDataDog%2fsketches-go/v1.4.5/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>DataDog/sketches-go (github.com/DataDog/sketches-go)</summary>

###
[`v1.4.6`](https://togithub.com/DataDog/sketches-go/releases/tag/v1.4.6)

[Compare
Source](https://togithub.com/DataDog/sketches-go/compare/v1.4.5...v1.4.6)

#### What's Changed

- Avoid inconsistent quantile computation between architectures by
[@&#8203;gduranceau](https://togithub.com/gduranceau) in
[DataDog/sketches-go#80

#### New Contributors

- [@&#8203;gduranceau](https://togithub.com/gduranceau) made their first
contribution in
[DataDog/sketches-go#80

**Full Changelog**:
DataDog/sketches-go@v1.4.5...v1.4.6

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any
time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MTMuMiIsInVwZGF0ZWRJblZlciI6IjM3LjQxMy4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiLCJyZW5vdmF0ZWJvdCJdfQ==-->

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
tomasmota pushed a commit to SpringerPE/opentelemetry-collector-contrib that referenced this pull request Jul 1, 2024
…y#33763)

[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
|
[github.com/DataDog/sketches-go](https://togithub.com/DataDog/sketches-go)
| `v1.4.5` -> `v1.4.6` |
[![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fDataDog%2fsketches-go/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/github.com%2fDataDog%2fsketches-go/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/github.com%2fDataDog%2fsketches-go/v1.4.5/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fDataDog%2fsketches-go/v1.4.5/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>DataDog/sketches-go (github.com/DataDog/sketches-go)</summary>

###
[`v1.4.6`](https://togithub.com/DataDog/sketches-go/releases/tag/v1.4.6)

[Compare
Source](https://togithub.com/DataDog/sketches-go/compare/v1.4.5...v1.4.6)

#### What's Changed

- Avoid inconsistent quantile computation between architectures by
[@&open-telemetry#8203;gduranceau](https://togithub.com/gduranceau) in
[DataDog/sketches-go#80

#### New Contributors

- [@&open-telemetry#8203;gduranceau](https://togithub.com/gduranceau) made their first
contribution in
[DataDog/sketches-go#80

**Full Changelog**:
DataDog/sketches-go@v1.4.5...v1.4.6

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any
time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MTMuMiIsInVwZGF0ZWRJblZlciI6IjM3LjQxMy4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiLCJyZW5vdmF0ZWJvdCJdfQ==-->

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
lalith47 pushed a commit to lalith47/opentelemetry-collector-contrib that referenced this pull request Jul 1, 2024
…y#33763)

[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
|
[github.com/DataDog/sketches-go](https://togithub.com/DataDog/sketches-go)
| `v1.4.5` -> `v1.4.6` |
[![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fDataDog%2fsketches-go/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/go/github.com%2fDataDog%2fsketches-go/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/go/github.com%2fDataDog%2fsketches-go/v1.4.5/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fDataDog%2fsketches-go/v1.4.5/v1.4.6?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>DataDog/sketches-go (github.com/DataDog/sketches-go)</summary>

###
[`v1.4.6`](https://togithub.com/DataDog/sketches-go/releases/tag/v1.4.6)

[Compare
Source](https://togithub.com/DataDog/sketches-go/compare/v1.4.5...v1.4.6)

#### What's Changed

- Avoid inconsistent quantile computation between architectures by
[@&open-telemetry#8203;gduranceau](https://togithub.com/gduranceau) in
[DataDog/sketches-go#80

#### New Contributors

- [@&open-telemetry#8203;gduranceau](https://togithub.com/gduranceau) made their first
contribution in
[DataDog/sketches-go#80

**Full Changelog**:
DataDog/sketches-go@v1.4.5...v1.4.6

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any
time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MTMuMiIsInVwZGF0ZWRJblZlciI6IjM3LjQxMy4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJkZXBlbmRlbmNpZXMiLCJyZW5vdmF0ZWJvdCJdfQ==-->

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: opentelemetrybot <107717825+opentelemetrybot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants