[release/8.0] Fix OOM in BigInteger OuterLoop tests causing SIGKILL on Linux#126011
[release/8.0] Fix OOM in BigInteger OuterLoop tests causing SIGKILL on Linux#126011Copilot wants to merge 3 commits intorelease/8.0from
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-numerics |
…lue/10 Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/3875817a-ab5d-4e0e-a906-e8d6e2162043
a69e62a to
058392a
Compare
|
@tannergooding I believe if you are OK with us taking this we need you to add the "Servicing approved" label (per the test only change backport flow) |
|
This is in 8.0 but it continues to cause noise as we continue to run tests there. |
There was a problem hiding this comment.
Pull request overview
This PR adjusts System.Runtime.Numerics.Tests BigInteger OuterLoop tests on release/8.0 to avoid extremely large BigInteger left-shifts that were triggering Linux OOM/SIGKILL in constrained/parallel test environments.
Changes:
- Reduce the “very large” shift size in
LargeValueLogTestsfromint.MaxValue / 10to1 << 24and update the expected-log calculation accordingly. - Reduce the “very large” shift size in
DoubleExplicitCastFromLargeBigIntegerTestsfromint.MaxValue / 10to1 << 24, preserving the infinity assertions while avoiding huge allocations.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/libraries/System.Runtime.Numerics/tests/BigInteger/log.cs | Lowers per-iteration BigInteger growth in large Log tests to prevent OOM while keeping assertions meaningful. |
| src/libraries/System.Runtime.Numerics/tests/BigInteger/cast_from.cs | Lowers per-iteration BigInteger growth in large double cast tests to prevent OOM while still producing ±Infinity. |
| /// Test Log Method on Very Large BigInteger more than (1 << (1 << 24)) by base 2 | ||
| /// Tested BigInteger are: pow(2, startShift + smallLoopShift * [1..smallLoopLimit] + (1 << 24) * [1..bigLoopLimit]) |
There was a problem hiding this comment.
The XML doc now reads like a C# expression (1 << (1 << 24)), which is easy to misinterpret. Consider rephrasing to a clearer mathematical description (e.g., 2^(1<<24) or “1 shifted left by 16,777,216 bits”) so readers don’t confuse the shift-count with the shifted value.
| /// Test Log Method on Very Large BigInteger more than (1 << (1 << 24)) by base 2 | |
| /// Tested BigInteger are: pow(2, startShift + smallLoopShift * [1..smallLoopLimit] + (1 << 24) * [1..bigLoopLimit]) | |
| /// Test Log method on very large BigInteger values greater than 2^(1 << 24), that is, greater than 1 shifted left by 16,777,216 bits, using base 2. | |
| /// Tested BigInteger values are 2^(startShift + smallShift * [1..smallShiftLoopLimit] + (1 << 24) * [1..bigShiftLoopLimit]). |
| temp = temp << (1 << 24); | ||
| double expected = | ||
| (double)startShift + | ||
| smallShift * (double)(i + 1) + | ||
| (int.MaxValue / 10) * (double)(j + 1); | ||
| (1 << 24) * (double)(j + 1); |
There was a problem hiding this comment.
(1 << 24) is a non-obvious tuning constant used both for the shift and expected-value calculation. Consider introducing a named const int (and optionally a brief comment explaining the choice) to make future adjustments less error-prone.
| /// Test cast to Double on Very Large BigInteger more than (1 << (1 << 24)) | ||
| /// Tested BigInteger are: +/-pow(2, startShift + smallLoopShift * [1..smallLoopLimit] + (1 << 24) * [1..bigLoopLimit]) |
There was a problem hiding this comment.
The XML doc uses (1 << (1 << 24)), which reads like code and is easy to misinterpret. Consider rephrasing to a clearer mathematical description (e.g., 2^(1<<24) or “1 shifted left by 16,777,216 bits”).
| /// Test cast to Double on Very Large BigInteger more than (1 << (1 << 24)) | |
| /// Tested BigInteger are: +/-pow(2, startShift + smallLoopShift * [1..smallLoopLimit] + (1 << 24) * [1..bigLoopLimit]) | |
| /// Test cast to Double on very large BigInteger values with magnitude greater than 2^(1 << 24) | |
| /// Tested BigInteger values are: +/-2^(startShift + smallLoopShift * [1..smallLoopLimit] + (1 << 24) * [1..bigLoopLimit]) |
| temp = temp << (1 << 24); | ||
| VerifyDoubleExplicitCastFromBigInteger(double.PositiveInfinity, temp); |
There was a problem hiding this comment.
(1 << 24) is a non-obvious tuning constant that controls how large the BigInteger gets. Consider introducing a named const int (shared with the doc comment if appropriate) so the intent is clearer and the value can be adjusted in one place.
|
@copilot address feedback |
…or bit shift Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/763c2d49-b5a3-4483-ae1e-484fe34e0b4f
Done in cabb434. I've addressed all four review items:
|
|
@copilot revert back to the first commit. I commented on the wrong one -- for backports we don't want changes to the code. |
…ommit Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/d2473d7c-9ab4-431e-bd7d-948a40220e50
main PR
Description
System.Runtime.Numerics.Testsouterloop tests were crashing with exit code 137 (SIGKILL) on Linux becauseLargeValueLogTestsandDoubleExplicitCastFromLargeBigIntegerTestsshiftedBigIntegerbyint.MaxValue / 10bits (~214M bits) per iteration in nested loops (up to 4×3 = 12 times), creating values of ~107MB each. With 2 parallel test threads in a Docker container, this reliably triggered the OOM killer.On
main, PR #102874 addedBigInteger.MaxLengthwhich causes these shifts to throwOverflowExceptioninstead of allocating.release/8.0has no such cap, so allocations succeed and exhaust container memory.Changes:
log.cs/LargeValueLogTests: Replaceint.MaxValue / 10with1 << 24in both the shift operation and the expected log value calculation. Test correctness is preserved — the values are still well above theBigInteger.Logprecision threshold.cast_from.cs/DoubleExplicitCastFromLargeBigIntegerTests: Replaceint.MaxValue / 10with1 << 24.2^(1<<24)still far exceedsdouble.MaxValue ≈ 2^1024, so infinity assertions remain valid.Peak per-iteration allocation drops from ~107MB to ~8MB, eliminating the OOM in constrained container environments.
Customer Impact
Persistent 100% failure rate of
System.Runtime.Numerics.Testsouterloop on all four Linux legs (x64, arm64, musl-x64, musl-arm64) in every scheduledrelease/8.0build. Known issue hit count: 56 times in one month.Regression
No — this is a long-standing test design issue exposed by container memory limits, not a regression introduced in a recent release.
Testing
Test-only changes. The fix reduces allocation sizes while keeping the test assertions semantically correct. The library itself is not modified.
Risk
Very low. Changes are confined to test helper functions; no product code is touched. The assertions remain correct with smaller (but still large) values.
Package authoring signed off?
IMPORTANT: If this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.
This is a test-only change; no NuGet package authoring is required.
📍 Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.