[BUG] scalar leaks when running nds query51 #10509

Closed
jbrennan333 opened this issue Feb 28, 2024 · 1 comment · Fixed by #10510
Labels: bug (Something isn't working), reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)

@jbrennan333
Collaborator

Describe the bug
When running nds query51 on my desktop at scale 100, I am seeing some leaked scalars:

24/02/27 22:55:04 ERROR Scalar: A SCALAR WAS LEAKED(ID: 1514186 7f408402c1a0)

It looks like these are coming from SumBinaryFixer:

24/02/27 22:55:04 ERROR MemoryCleaner: Leaked scalar (ID: 1508241): 2024-02-27 22:54:59.0732 UTC: INC
java.lang.Thread.getStackTrace(Thread.java:1564)
ai.rapids.cudf.MemoryCleaner$RefCountDebugItem.<init>(MemoryCleaner.java:341)
ai.rapids.cudf.MemoryCleaner$Cleaner.addRef(MemoryCleaner.java:90)
ai.rapids.cudf.Scalar.incRefCount(Scalar.java:540)
ai.rapids.cudf.Scalar.<init>(Scalar.java:528)
ai.rapids.cudf.ColumnView.getScalarElement(ColumnView.java:4002)
com.nvidia.spark.rapids.window.SumBinaryFixer.$anonfun$updateState$11(GpuWindowExpression.scala:1412)
scala.Option.map(Option.scala:230)
com.nvidia.spark.rapids.window.SumBinaryFixer.updateState(GpuWindowExpression.scala:1412)
com.nvidia.spark.rapids.window.SumBinaryFixer.$anonfun$fixUpDecimal$24(GpuWindowExpression.scala:1588)
com.nvidia.spark.rapids.Arm$.closeOnExcept(Arm.scala:98)
com.nvidia.spark.rapids.window.SumBinaryFixer.$anonfun$fixUpDecimal$23(GpuWindowExpression.scala:1587)
com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
com.nvidia.spark.rapids.window.SumBinaryFixer.$anonfun$fixUpDecimal$20(GpuWindowExpression.scala:1586)
com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
com.nvidia.spark.rapids.window.SumBinaryFixer.fixUpDecimal(GpuWindowExpression.scala:1580)
com.nvidia.spark.rapids.window.SumBinaryFixer.fixUp(GpuWindowExpression.scala:1600)
com.nvidia.spark.rapids.window.GpuRunningWindowIterator.$anonfun$fixUpAll$2(GpuRunningWindowExec.scala:105)
com.nvidia.spark.rapids.window.GpuRunningWindowIterator.$anonfun$fixUpAll$2$adapted(GpuRunningWindowExec.scala:101)

It looks like previousOverflow is not being closed in SumBinaryFixer's close method.
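
To illustrate the suspected pattern, here is a minimal, self-contained sketch in plain Java. It is not the plugin's code: `FakeScalar`, `FixerSketch`, and the `closeOverflow` flag are hypothetical stand-ins for a ref-counted scalar and for a fixer that carries two pieces of state between batches. It only shows how a `close` method that releases one held resource but not the other leaves exactly one resource open.

```java
import java.util.Optional;

public class LeakSketch {
    /** Hypothetical stand-in for a native resource that must be closed explicitly. */
    static final class FakeScalar implements AutoCloseable {
        static int open = 0; // number of instances created but not yet closed
        private boolean closed = false;

        FakeScalar() { open++; }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                open--;
            }
        }
    }

    /** Hypothetical fixer that keeps two scalars as state between batches. */
    static final class FixerSketch implements AutoCloseable {
        private Optional<FakeScalar> previousValue = Optional.empty();
        private Optional<FakeScalar> previousOverflow = Optional.empty();
        private final boolean closeOverflow; // assumption: toggles the suspected fix

        FixerSketch(boolean closeOverflow) { this.closeOverflow = closeOverflow; }

        void updateState() {
            // Release the state from the previous batch before replacing it.
            previousValue.ifPresent(FakeScalar::close);
            previousOverflow.ifPresent(FakeScalar::close);
            previousValue = Optional.of(new FakeScalar());
            previousOverflow = Optional.of(new FakeScalar());
        }

        @Override
        public void close() {
            previousValue.ifPresent(FakeScalar::close);
            previousValue = Optional.empty();
            if (closeOverflow) {
                // Without this branch the last overflow scalar is never released.
                previousOverflow.ifPresent(FakeScalar::close);
                previousOverflow = Optional.empty();
            }
        }
    }

    /** Processes two batches, closes the fixer, and returns how many scalars stayed open. */
    public static int leakedAfterRun(boolean closeOverflow) {
        int before = FakeScalar.open;
        try (FixerSketch fixer = new FixerSketch(closeOverflow)) {
            fixer.updateState();
            fixer.updateState();
        }
        return FakeScalar.open - before;
    }

    public static void main(String[] args) {
        System.out.println("leaked without fix: " + leakedAfterRun(false));
        System.out.println("leaked with fix: " + leakedAfterRun(true));
    }
}
```

Running `main` prints `leaked without fix: 1` and `leaked with fix: 0`. In the sketch, intermediate scalars are released on each state update, so only the final one is left over.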

Steps/Code to reproduce bug
Run nds query51 and observe errors in output.

Expected behavior
We should not leak scalars.

jbrennan333 added the bug, ? - Needs Triage, and reliability labels Feb 28, 2024
jbrennan333 self-assigned this Feb 28, 2024
@jbrennan333
Collaborator Author

I am also seeing this in query93 when I run with ShuffleExchangeExec disabled (which I am doing to test other things).

mattahrens removed the ? - Needs Triage label Feb 28, 2024