Skip to content

ARROW-4086: [Java] Add apis to debug memory alloc failures#4369

Closed
pravindra wants to merge 1 commit intoapache:masterfrom
pravindra:memdebug
Closed

ARROW-4086: [Java] Add apis to debug memory alloc failures#4369
pravindra wants to merge 1 commit intoapache:masterfrom
pravindra:memdebug

Conversation

@pravindra
Copy link
Copy Markdown
Contributor

  • On failures, capture state for each allocator in chain and add the
    details to the exception.
  • add APIs to get parent/child allocators
  • Do not mask the original exception when allocating from vectors

@pravindra pravindra changed the title ARROW-4086 : [WIP][Java] Add apis to debug memory alloc failures ARROW-4086 :[Java] Add apis to debug memory alloc failures May 24, 2019
@pravindra pravindra requested a review from siddharthteotia May 27, 2019 02:57
@pravindra
Copy link
Copy Markdown
Contributor Author

@praveenbingo can you please review ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit, would it make sense to construct the OOM with reservation and outcome instead of string formatting here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is slightly different - the allocation is failing when allocating the reservation. The other failures happen when allocating a buffer. Also, we throw the same exception if the allocator's metrics satisfy the request, but netty doesn't have available memory (can happen due to fragmentation).

so, i think it's better to leave the string be constructed as per the context.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can avoid the second comparison (compiler might optimize it away anyways) if you return the new allocation inside the if block above and return SUCCESS_INSANCE here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what do you think about adding /foo=/true, /bar=/ false it would make this more readable outside and IDE

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made the change but it looks a bit ugly inside an IDE - intellij shows me the field name twice now.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: nested ternary operators are a little hard to read.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any standard policy for changing the public API in Java?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None so far, the changes i've seen so far are backward incompatible..they should not be though :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert something about the OOM message?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be clearer to call this something like failureAtParentLimit

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to failureAtParentLimitOutcomeDetails

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failureAtRootLimit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to failureAtRootLimitOutcomeDetails

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't clear necessary anymore?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because allocateNew is already doing the same. please see line 410..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like it might be cleaner to overload. Is there a reason not to so?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an overloaded fn - but i'm not sure that's what you meant. please take a look.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None so far, the changes i've seen so far are backward incompatible..they should not be though :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use optional instead ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there no concurrent identity hash map..might prefer that since it will prevent future bugs..

(or) might create a synchronized map that wraps the identity map..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - created a synchronized map wrapper over the identity map

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work consolidating this..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because allocateNew is already doing the same. please see line 410..

@kou kou changed the title ARROW-4086 :[Java] Add apis to debug memory alloc failures ARROW-4086: [Java] Add apis to debug memory alloc failures May 27, 2019
@codecov-io
Copy link
Copy Markdown

codecov-io commented May 28, 2019

Codecov Report

Merging #4369 into master will increase coverage by 1.06%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4369      +/-   ##
==========================================
+ Coverage   88.32%   89.38%   +1.06%     
==========================================
  Files         781      639     -142     
  Lines       98540    88864    -9676     
  Branches     1251        0    -1251     
==========================================
- Hits        87031    79434    -7597     
+ Misses      11273     9430    -1843     
+ Partials      236        0     -236
Impacted Files Coverage Δ
cpp/src/parquet/schema-test.cc 90.63% <0%> (-8.86%) ⬇️
cpp/src/arrow/flight/client.cc 92.27% <0%> (-2.03%) ⬇️
cpp/src/parquet/column_scanner.h 95.6% <0%> (-0.87%) ⬇️
cpp/src/arrow/flight/server.cc 90.06% <0%> (-0.78%) ⬇️
python/pyarrow/_parquet.pyx 90.85% <0%> (-0.35%) ⬇️
cpp/src/arrow/flight/internal.cc 91.3% <0%> (-0.23%) ⬇️
cpp/src/parquet/reader-test.cc 100% <0%> (ø) ⬆️
python/pyarrow/cuda.py 100% <0%> (ø) ⬆️
python/pyarrow/ipc.py 100% <0%> (ø) ⬆️
cpp/src/arrow/flight/test-util.h 100% <0%> (ø) ⬆️
... and 201 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c4d433...7d88ccb. Read the comment docs.

@praveenbingo
Copy link
Copy Markdown
Contributor

@emkornfield +1 LGTM..Please merge if you are ok with the changes.

Copy link
Copy Markdown
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor comments but otherwise looks good.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a javadoc or CI will fail.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

- On failures, capture state for each allocator in chain and add the
  details to the exception.
- add APIs to get parent/child allocators
- Do not mask the original exception when allocating from vectors
@pravindra pravindra closed this in 2ea7350 May 29, 2019
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
- On failures, capture state for each allocator in chain and add the
  details to the exception.
- add APIs to get parent/child allocators
- Do not mask the original exception when allocating from vectors

Author: Pindikura Ravindra <ravindra@dremio.com>

Closes apache#4369 from pravindra/memdebug and squashes the following commits:

92a269b <Pindikura Ravindra> ARROW-4086 :  Add apis to debug memory alloc failures
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants