[LIVY-667][WIP] Collect a part of partition to the driver by batch to avoid OOM #233

Closed
wants to merge 2 commits into from

runzhiwang
Contributor

What changes were proposed in this pull request?

Collect part of a partition to the driver in batches, rather than the whole partition at once, to avoid OOM.

Background:
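1. When livy.server.thrift.incrementalCollect is enabled, the thrift server uses "toLocalIterator" to load one partition at a time instead of the whole RDD, which avoids an OutOfMemory error in most cases. However, if the largest partition is itself too big to fit in driver memory, the OutOfMemory error still occurs.

2. This PR collects only part of a partition to the driver at a time, batch by batch, so that driver memory use is bounded by the batch size rather than by the size of the largest partition.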
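The batching idea in the background above can be illustrated with a minimal, Spark-free sketch. This is not the code from this PR: `BatchedFetch`, `BatchIterator`, and `batchSize` are hypothetical names, and a plain in-memory `Iterator` stands in for the rows of one partition. It only shows how a consumer can hold at most one batch of rows at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchedFetch {

    /**
     * Hypothetical helper: wraps a row iterator and returns at most
     * batchSize rows per call to next(), so the caller never has to
     * materialize the whole partition at once.
     */
    static final class BatchIterator<T> implements Iterator<List<T>> {
        private final Iterator<T> rows;
        private final int batchSize;

        BatchIterator(Iterator<T> rows, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.rows = rows;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            return rows.hasNext();
        }

        @Override
        public List<T> next() {
            if (!rows.hasNext()) {
                throw new NoSuchElementException();
            }
            List<T> batch = new ArrayList<>();
            // Pull rows until the batch is full or the source is exhausted.
            while (batch.size() < batchSize && rows.hasNext()) {
                batch.add(rows.next());
            }
            return batch;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the rows of a single large partition (assumption).
        Iterator<Integer> partition = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        BatchIterator<Integer> batches = new BatchIterator<>(partition, 3);
        while (batches.hasNext()) {
            System.out.println(batches.next());
        }
    }
}
```

In a real deployment the source iterator would be fed from the executors rather than from a local list, and the batch size would presumably come from configuration; both are outside the scope of this sketch.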

How was this patch tested?

Manually: created a large dataset in a single partition and queried all of it.

@runzhiwang runzhiwang force-pushed the batch-partition branch 3 times, most recently from 60eb5b9 to 5b919c9 Compare September 16, 2019 13:21
@runzhiwang runzhiwang changed the title [LIVY-667] Collect a part of partition to the driver by batch to avoid OOM [LIVY-667][WIP] Collect a part of partition to the driver by batch to avoid OOM Sep 17, 2019
@runzhiwang runzhiwang force-pushed the batch-partition branch 2 times, most recently from eacd9a6 to 966cd53 Compare September 17, 2019 03:33
@runzhiwang runzhiwang force-pushed the batch-partition branch 13 times, most recently from 2557aa0 to b3f13c8 Compare September 27, 2019 06:07
codecov-io commented Sep 27, 2019

Codecov Report

Merging #233 into master will decrease coverage by 0.03%.
The diff coverage is 75%.

@@             Coverage Diff              @@
##             master     #233      +/-   ##
============================================
- Coverage     68.37%   68.33%   -0.04%     
  Complexity      921      921              
============================================
  Files           100      100              
  Lines          5729     5732       +3     
  Branches        871      872       +1     
============================================
  Hits           3917     3917              
- Misses         1247     1249       +2     
- Partials        565      566       +1

| Impacted Files | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| ...rver/src/main/scala/org/apache/livy/LivyConf.scala | 95.85% <ø> (-0.03%) | 21 <0> (ø) |
| rsc/src/main/java/org/apache/livy/rsc/RSCConf.java | 86.23% <100%> (+0.25%) | 7 <0> (ø) ⬇️ |
| ...e/livy/server/interactive/InteractiveSession.scala | 69.25% <50%> (-0.12%) | 46 <0> (ø) |
| ...ain/scala/org/apache/livy/utils/SparkYarnApp.scala | 66.01% <0%> (-1.31%) | 40% <0%> (ø) |
| ...ain/java/org/apache/livy/rsc/driver/RSCDriver.java | 77.96% <0%> (ø) | 41% <0%> (ø) ⬇️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b8251eb...6df4ede.

@runzhiwang runzhiwang closed this Sep 29, 2019