CRIUSupport: GC Heap re-sizing on restore #18217

Closed
tajila opened this issue Sep 29, 2023 · 26 comments
Labels
comp:gc criu Used to track CRIU snapshot related work doc:externals

Comments

@tajila
Contributor

tajila commented Sep 29, 2023

Overview

The current documentation states the following:

The Java™ heap is configured at startup. The VM detects the available memory on the system and sizes the heap based 
on it. With the CRIU support, this means that the size of the Java heap 
([-Xms](https://www.ibm.com/docs/en/SSYKE2_8.0.0/openj9/xms/index.html), 
[-Xmx](https://www.ibm.com/docs/en/SSYKE2_8.0.0/openj9/xms/index.html)) and the respective heap regions, such 
as nursery and tenure at checkpoint will be same on restore. If a checkpoint is taken in a container with no memory 
limits and then restored in a container with memory limits, the restored VM instance does not detect the memory limits.

We would like to offer support for users who would like to create a single image and restore it on various nodes with different memory limits. There are several scenarios to consider.

| # | Scenario | Checkpoint 8g | Restore 4g | Restore 8g | Restore 16g |
|---|---|---|---|---|---|
| 1 | No Options (default mode) | Implicit Xmx6g [1] | Implicit softmx3g [2] | Implicit softmx6g (noop) | Implicit softmx12g (error message [8]) |
| 2 | Xmx on checkpoint only [3] | -Xmx12g (potential issues [4]) | -Xmx12g (potential issues) | -Xmx12g (potential issues) | -Xmx12g |
| 3 | Xmx on checkpoint only with dynamic adjustment for restore [5] | -Xmx12g (potential issues) | Implicit softmx3g | Implicit softmx6g | Implicit softmx12g (noop) |
| 4 | Xmx on checkpoint only with dynamic adjustment for checkpoint & restore | -Xmx12g (Implicit softmx6g) | Implicit softmx3g | Implicit softmx6g | Implicit softmx12g (noop) |
| 5 | Xmx on checkpoint, softmx on restore | -Xmx8g | softmx3g | softmx6g | softmx12g (outputs error message [6]) |
| 6 | No Options on checkpoint, softmx on restore | Implicit Xmx6g | softmx3g | softmx6g (noop) | softmx12g (outputs error message) |
| 7 | Xmx and Xms on checkpoint | Xmx6g Xms6g | softmx3g (outputs error message [7]) | softmx6g (noop) | softmx12g (outputs error message) |

[1] When running in a container the JVM will set Xmx to 75% of the available memory if no Xmx is set.
[2] Softmx applies a soft limit on the max heap size. This limit cannot be larger than the max heap limit at startup. It cannot be lower than the minimum heap limit at startup. When running in a container the JVM can implicitly set softmx to 75% of the available memory upon restore.
[3] When Xmx is set at checkpoint (JVM start) this value is retained for the lifetime of the JVM.
[4] If Xmx is set to be larger than the available memory on the system there could be an OOM if one is unable to commit all required memory, there could be excessive paging or a bus error.
[5] Dynamic Max Heap adjustment (we can come up with a better name) is a mode that "overrides" the default Xmx value by applying a softmx that is 75% of the available memory.
[6] Softmx cannot be set larger than max heap limit at startup.
[7] Softmx cannot be set lower than min heap limit.
[8] If the implicit softmx is calculated to be higher than the max heap limit, then an error message is output and softmx is set to the max heap limit.
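
The arithmetic behind footnotes [1], [2] and [8] can be sketched as follows. This is only an illustration of the proposal as described in this issue, not OpenJ9 code: the function name `implicit_softmx`, its parameters, and the note strings are hypothetical, and raising the value to the minimum heap follows the later commit message rather than footnote [7].

```python
GIB = 1024 ** 3

def implicit_softmx(available_bytes, max_heap_bytes, min_heap_bytes=0, fraction=0.75):
    """Return (softmx_bytes, note) for a restore in a container.

    Hypothetical sketch: the target is 75% of available memory, and the
    result is kept between the min and max heap limits fixed at startup.
    """
    target = int(available_bytes * fraction)
    if target > max_heap_bytes:
        # Footnote [8]: softmx cannot exceed the max heap chosen at startup.
        return max_heap_bytes, "implicit softmx above max heap; clamped to max heap"
    if target < min_heap_bytes:
        # softmx cannot go below the min heap chosen at startup.
        return min_heap_bytes, "implicit softmx below min heap; raised to min heap"
    return target, None

# Row 1 of the table: checkpoint on 8g gives an implicit Xmx of 6g.
for restore_gib in (4, 8, 16):
    softmx, note = implicit_softmx(restore_gib * GIB, 6 * GIB)
    print(restore_gib, softmx // GIB, note)
```

For the 16g restore the sketch yields 6g plus a note, which is exactly the case where the thread debates whether a message should be shown at all.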

Key Questions

  1. Should the Dynamic Max Heap adjustments in [4] and [5] be default policies, or should they be opt-in with a new option?
  2. Should Xmx on restore be an alias for softmx?
  3. Should implicit softmx failures output an error message or should it adjust silently?
@tajila tajila added comp:gc criu Used to track CRIU snapshot related work labels Sep 29, 2023
@tajila
Contributor Author

tajila commented Sep 29, 2023

Should the Dynamic Max Heap adjustments in [4] and [5] be default policies, or should they be opt-in with a new option?

Personally, I think this should all be opt-in. If we make the dynamic adjustment by default we are essentially overriding a specified user request.

If someone says they want -Xmx12g at checkpoint and they restore on a 4g node, there are clearly potential downsides to this. But what if that's what they want, how can they express a behaviour they have already expressed? The documentation already clearly states that this is the outcome if one sets Xmx in this manner.

We could add an option called -XautoSoftmx, defined such that it sets softmx to 75% of available memory. Users would then have more flexibility in how the heap is sized. This option could be specified at checkpoint and at restore.

@dmitripivkine
Contributor

I am with Tobi on this one. It would be hard to explain to a customer why the JVM throws an OOM when -Xmx has been specified explicitly but the JVM decided to use less heap memory for some "internal" reason. Creating a new non-default behaviour sounds reasonable to me.

@vijaysun-omr
Contributor

Should row 1 ("no options") say something like Implicit softmx 6g for the last column ("16g restore")? I don't think we can support a softmx value greater than the mx value chosen at startup, correct?

@vijaysun-omr
Contributor

vijaysun-omr commented Oct 2, 2023

The documentation already clearly states that this is the outcome if one sets Xmx in this manner.

I understand what is being said here, but just wanted to mention that the CRIU support feature has not technically GAed on its own at OpenJ9 (we are supporting Open Liberty's GA with the function as we have implemented for them). So, this should give us some freedom to change behavior if we want to.

@vijaysun-omr
Contributor

vijaysun-omr commented Oct 2, 2023

Should the Dynamic Max Heap adjustments in [4] and [5] be default policies, or should they be opt-in with a new option?

Is this question about rows 3 and 4 in the table? Just making sure, because those are the ones that talk about "dynamic adjustment". I generally agree with Dmitri and you that we should probably not change the default behavior by applying a dynamic adjustment factor if a user explicitly specified an -Xmx option at checkpoint.

i.e. Dynamic adjustment can be applied if
a) User did not specify -Xmx on the checkpoint side by default (i.e. no -XautoSoftmx needs to be specified)
or
b) User specified -XautoSoftmx in which case it has an effect even if the user specified -Xmx on the checkpoint side

That is my thinking now, but happy to hear thoughts.
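
Conditions a) and b) above amount to a small predicate. The sketch below only restates that rule; the function and parameter names are hypothetical, and -XautoSoftmx is the option name proposed in this thread, not an existing JVM flag.

```python
def dynamic_adjustment_applies(xmx_explicit: bool, auto_softmx: bool) -> bool:
    """Return True when the restore side may apply an implicit softmx.

    a) No -Xmx at checkpoint: adjustment applies by default.
    b) -XautoSoftmx given: adjustment applies even with an explicit -Xmx.
    """
    return auto_softmx or not xmx_explicit

# The only combination that disables adjustment is an explicit -Xmx
# at checkpoint without the opt-in option.
for xmx_explicit in (False, True):
    for auto_softmx in (False, True):
        print(xmx_explicit, auto_softmx,
              dynamic_adjustment_applies(xmx_explicit, auto_softmx))
```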

@vijaysun-omr
Contributor

For your other two questions:

  1. Yes, I think we should alias
  2. I feel a "warning" message along with silent adjustment is what I was picturing for this. A warning message would go into the javacore and trace output (or whatever the closest precedent would be in terms of where we issue warning but still start up the JVM).

@tajila
Contributor Author

tajila commented Oct 2, 2023

Should row 1 ("no options") say something like Implicit softmx 6g for the last column ("16g restore")? I don't think we can support a softmx value greater than the mx value chosen at startup, correct?

The default heuristic is to attempt to set it to 75%, hence 12g. We have a choice here: silently set it to 6g, or output a message saying something like "ideally you'd get 12g but we can't, so here is 6g".

@tajila
Contributor Author

tajila commented Oct 2, 2023

Is this question about rows 3 and 4 in the table ?

Yes. [4] and [5] refer to the section below the table. In hindsight I should have used letters :)

@dmitripivkine
Contributor

For scenario Restore 16g "Implicit softmx12g": "(error message [8])" does not make sense. I think returning an error for an -Xsoftmx option that was never specified explicitly is very confusing for the customer. (noop) looks like a better response in my opinion, particularly since nothing prevents the JVM from running properly.
And yes, -Xsoftmx cannot be specified larger than -Xmx; the error JVMJ9GC020E -Xsoftmx too large for heap is returned otherwise.

@amicic
Contributor

amicic commented Oct 2, 2023

Note that while making these decisions we also want to consider how we react to the other important h/w environment change (from snapshot to restore): the CPU (h/w thread count), which affects GC threads.

The current behaviour is rather dynamic: the restore side will adapt to the current environment, even if the h/w thread count is larger than the original snapshot GC thread count (which took considerable work to accommodate). This is true whether the GC thread count was left at its default or -Xgcthreads was explicitly set on the snapshot side.
Note also that we fully obey -Xgcthreads on the restore side (no special soft/restore gcthreads variant of the option was introduced).

[But while we adapt, I'll reiterate that we make some compromises (the larger the change in h/w thread count, the more visible the effects of the compromises may be), hence it's best to version snapshots for the various restore deployments.]

We may want to consider aligning the behavior. For example, should we obey the snapshot's -Xgcthreads, as we seem to plan to do with -Xmx?

@amicic
Contributor

amicic commented Oct 2, 2023

Should the Dynamic Max Heap adjustments in [4] and [5] be default policies, or should they be opt-in with a new option?

Is this question about rows 3 and 4 in the table? Just making sure, because those are the ones that talk about "dynamic adjustment". I generally agree with Dmitri and you that we should probably not change the default behavior by applying a dynamic adjustment factor if a user explicitly specified an -Xmx option at checkpoint.

i.e. Dynamic adjustment can be applied if a) User did not specify -Xmx on the checkpoint side by default (i.e. no -XautoSoftmx needs to be specified) or b) User specified -XautoSoftmx in which case it has an effect even if the user specified -Xmx on the checkpoint side

That is my thinking now, but happy to hear thoughts.

I'm OK either way, so it seems we are converging on the original proposal (case 2 is the default and 3/4 are opt-in).
The only dilemma left here is the option we use to opt in.

If we introduce a new option (that would be specified on snapshot side), I think it should have 'restore' in its name, for example -XautoSoftmxOnRestore is more clear than just -XautoSoftmx

Alternatively, we could extend the behavior of the existing -Xsoftmx option. If it is specified on the snapshot side (along with Xmx), it basically means that softmx functionality is activated from then on and restore should continue to auto-adjust regardless of the explicit Xmx. Since Xsoftmx takes an argument, it could also limit how far auto-on-restore could really adjust (by default softmx is 0, so it effectively disables auto-adjust on restore if Xsoftmx is not specified). I'm not pushing that approach, but perhaps someone likes it....

@amicic
Contributor

amicic commented Oct 2, 2023

[8] If the implicit softmx is calculated to be higher than the max heap limit, then an error message is output and softmx is set to the max heap limit.

Case 1, restore 16G, I believe we should proceed with softmx6G and no error message, since it's less intrusive.

Note that what will help with potential confusion is that we will report in restore's VGC (re-)initialize stanza that softmx was set (this is just a general change that we recently added, regardless if softmx was explicitly set or implicitly activated on restore).

@amicic
Contributor

amicic commented Oct 2, 2023

Case 7 (Xmx Xms set on snapshot) should be 2 sub-cases, with and without auto-adjust-softmx option.

If not specified, we should be consistent with case 2: if we firmly obey the snapshot's Xmx, we should also obey Xms, so we proceed with no softmx set.

If the option is specified, for both 4G and 16G restore failing is probably ok. The existing messages 'softmx too large for Xms/Xmx' might not be as confusing (as for the one that Dmitri commented about for case 1), since here we would have that extra auto-adjust-softmx option.

@vijaysun-omr
Contributor

vijaysun-omr commented Oct 2, 2023

I want to understand the thinking on "error message" when softmx value chosen is too large for the restore environment and hence reduced (e.g. 16g case in row 1). As I wrote in an earlier comment, I was not really picturing a message any place our customers usually look, e.g. javacore or tracepoint output from the JVM. Having a message (this is why I called it a "warning" instead of an "error") could be useful for our own service work in case we wanted to understand why a certain value was picked. In such a case, it would be like we made it work silently from a customer perspective (since they never see the message to begin with) and hopefully not confusing.

@vijaysun-omr
Contributor

vijaysun-omr commented Oct 2, 2023

This is true if either GC thread count was left default or -Xgcthreads was explicitly set on snapshot side.

Interesting, I did not know that we would adapt on the restore side even if a user explicitly sets the number of GC threads on the checkpoint side (a rare case, I feel) via the -Xgcthreads option. One other way of becoming more consistent with what we are discussing for -Xmx and -Xms in this issue is to change the behavior for -Xgcthreads to not adapt on the restore side if the user had specified -Xgcthreads on the checkpoint side.

Having said that, there may well be an inherent/significant difference in the difficulty level of the problem of adapting GC threads vs adapting Java heap memory sizes and so one need not have complete consistency in how the two problems are handled, though obviously if we can reasonably be consistent, then we can consider it.

@vijaysun-omr
Contributor

Regarding Aleks' comment #18217 (comment) I am fine with -XautoSoftmxOnRestore

I prefer that over the alternative in that comment, i.e. having the use of mx and softmx simultaneously on the checkpoint side mean that dynamic adjustment is to be applied on the restore side. One use case that needs to be kept in mind is that the Open Liberty server script can fail to restore under certain (hopefully rare) situations, and in such cases, it falls back to start Open Liberty up normally (in JVM mode, rather than a restore). Perhaps we want to think through this scenario with respect to all the cases in the table anyway, but this was one reason I was thinking a new option such as -XautoSoftmxOnRestore that is a nop if we are not restoring might be preferable (easier to reason about it).

@vijaysun-omr
Contributor

Regarding Aleks' comment #18217 (comment)
I agree that we should attempt to be as consistent in terms of -Xms as we are for -Xmx.

@tajila
Contributor Author

tajila commented Oct 26, 2023

Update:

In Row 1 (No Options (default mode))

  • do not obey default heuristics, do the best you can to continue running the JVM
  • no explicit warning message (verbose GC will indicate the choice that was made)
  • if the choice is a no-op then verbose:gc will not output a softmx message, which indicates that softmx did not kick in

Row 2

  • we always respect -Xmx, no automatic adjustments unless the user permits us to via something like -XdynamicHeapAdjustment

Row 3 & 4

  • -XdynamicHeapAdjustment can only be supplied at checkpoint time. It applies to both checkpoint and restore.

Row 7

  • Xms should have the same behaviour as Xmx

@tajila
Contributor Author

tajila commented Oct 26, 2023

@kangyining Please write documentation that captures the behaviours above.

@kangyining
Contributor

Proposed doc:

-Xsoftmx use cases in snapshot restore:

  1. If neither -Xmx nor -Xms is specified:
    We do our best to keep the JVM running whatever the restore side has, which means we deviate from the default heuristics when necessary.

    For example: if we take a snapshot on a machine with 8GB RAM running docker, the heuristic will set the max heap memory to 6GB. When we restore on another machine with 4GB RAM, we will set the max heap memory to the full 4GB instead of the default 3GB.

    Note that the JVM won't output an explicit warning message for a softmx change, but records it in verbose GC. If the JVM decides to do nothing, it outputs nothing, which indicates that softmx was not introduced.

  2. When -Xmx or -Xms is set too large, it can cause problems or even errors. We provide a new option, -XdynamicHeapAdjustment, which aims to adjust automatically and avoid the problems caused by -Xmx or -Xms being too large.

    2.1 If -Xmx or -Xms is specified without -XdynamicHeapAdjustment:
    We will always respect the user options and won't perform any automatic adjustments.

    Note: if -Xmx is larger than the available memory, several errors could happen: OOM, excessive paging, or a bus error. If -Xms is larger than the available memory or the softmx value, an error message is output and a fatal error occurs.

    2.2 If -Xmx or -Xms is specified together with -XdynamicHeapAdjustment:
    We will do our best to avoid any error caused by -Xmx or -Xms being too large.

    For example: if the machine has 8GB of memory but -Xmx12G is set, we will still use an implicit softmx of 6G, as the default heuristics give.
    Note that -XdynamicHeapAdjustment can only be specified at checkpoint time, and it applies to both checkpoint and restore.

@kangyining
Contributor

kangyining commented Nov 7, 2023

| Test # | XdynamicHeapAdjustmentRestore | Xmx | XX:MaxRAMPercentage= | Xms | Xsoftmx snapshot | Xsoftmx restore | Snapshot 8g | Restore 4g | Restore 8g | Restore 12g |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | true | N/A | N/A | N/A | N/A | N/A | max 6g | softmx 3g | noop | noop |
| 2 | true | 12g | N/A | N/A | N/A | N/A | max 12g | softmx 3g | softmx 6g | softmx 9g |
| 3 | true | N/A | 50 | N/A | N/A | N/A | max 4g | softmx 2g | noop | noop |
| 4 | true | 12g | 50 | N/A | N/A | N/A | max 12g | softmx 3g | softmx 6g | softmx 9g |
| 5 | true | N/A | N/A | 4g | N/A | N/A | max 6g min 4g | softmx 4g | noop | noop |
| 6 | true | N/A | N/A | N/A | 4g | N/A | max 6g softmx 4g | softmx 3g | softmx cleared | softmx cleared |
| 7 | true | N/A | N/A | N/A | N/A | 3g | max 6g softmx 4g | softmx 3g | softmx cleared | softmx cleared |
| 8 | true | N/A | N/A | N/A | 4g | 3g | max 6g softmx 4g | softmx 3g | softmx cleared | softmx cleared |
| 9 | N/A | N/A | N/A | N/A | N/A | N/A | max 6g | softmx 3g | noop | noop |
| 10 | N/A | 12g | N/A | N/A | N/A | N/A | max 12g | max 12g | max 12g | max 12g |
| 11 | N/A | N/A | 50 | N/A | N/A | N/A | max 4g | softmx 2g | noop | noop |
| 12 | N/A | 12g | 50 | N/A | N/A | N/A | max 12g | max 12g | max 12g | max 12g |
| 13 | N/A | N/A | N/A | 4g | N/A | N/A | max 6g softmx 4g | softmx 4g | noop | noop |
| 14 | N/A | N/A | N/A | N/A | 4g | N/A | max 6g softmx 4g | softmx 4g | softmx 4g | softmx 4g |
| 15 | N/A | N/A | N/A | N/A | N/A | 3g | max 6g | softmx 3g | softmx 3g | softmx 3g |
| 16 | N/A | N/A | N/A | N/A | 4g | 3g | max 6g softmx 4g | softmx 3g | softmx 3g | softmx 3g |
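
One way to read the rows that do not involve -Xsoftmx (tests 1 to 5 and 9 to 13) is the small model below. It is a sketch inferred from the table values only, not the OpenJ9 implementation: the function `expected_restore`, its parameters, and the rule that an explicit Xmx makes the adjustment use 75% rather than MaxRAMPercentage are all assumptions drawn from rows 4 and 12.

```python
DEFAULT_PERCENT = 75  # default container heuristic described in this issue

def expected_restore(restore_gb, *, dynamic=False, xmx_gb=None,
                     max_ram_percent=None, xms_gb=None, snapshot_gb=8):
    """Return the table cell expected for a restore (hypothetical model)."""
    # The max heap is fixed at snapshot time: explicit Xmx, else a percentage
    # of the snapshot machine's memory.
    if xmx_gb is not None:
        max_gb = xmx_gb
    else:
        max_gb = snapshot_gb * (max_ram_percent or DEFAULT_PERCENT) / 100

    # An explicit Xmx is respected unless dynamic adjustment was opted into.
    if xmx_gb is not None and not dynamic:
        return f"max {max_gb:g}g"

    # Assumption from rows 4 and 12: MaxRAMPercentage is only used when
    # Xmx was not given explicitly.
    percent = DEFAULT_PERCENT if xmx_gb is not None else (max_ram_percent or DEFAULT_PERCENT)
    candidate = restore_gb * percent / 100

    if candidate >= max_gb:
        return "noop"  # softmx would not lower the limit
    if xms_gb is not None:
        candidate = max(candidate, xms_gb)  # never go below Xms
    return f"softmx {candidate:g}g"

# Test 2: dynamic adjustment with -Xmx12g, restored on 4g, 8g and 12g.
print([expected_restore(r, dynamic=True, xmx_gb=12) for r in (4, 8, 12)])
```

The rows with -Xsoftmx on the snapshot or restore side (6 to 8 and 14 to 16) are left out on purpose, since the table alone does not pin down how those options interact with the adjustment.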

@kangyining
Contributor

Another question arises about -Xsoftmx on the snapshot/restore side:

On snapshot side, we probably shouldn't preserve its command line value since we can set it by the Java_com_ibm_java_lang_management_internal_MemoryMXBeanImpl_setMaxHeapSizeImpl() function. The command line option behaves really like an intermediate call.

On restore side, if we specify the softmx at command line we probably want to use it as a lifetime one, since this is somewhat similar to a second "Xmx" for restore machine.

Should we support these two different modes of softmx, or is there a better approach?

@vijaysun-omr
Contributor

Just so I understand, what is the down side of obeying Java_com_ibm_java_lang_management_internal_MemoryMXBeanImpl_setMaxHeapSizeImpl() function after restore ? i.e. have it affect softmx value specified on restore ?

I don't know if we need to change/complicate the behavior of softmx in order to make (I think) a rare scenario work differently (rare == someone has softmx explicitly specified on restore side and they call the MXBean api and they are looking for some autonomic behavior by the GC to give them some other behavior).

@kangyining
Contributor

@vijaysun-omr Hi Vijay, after some discussions with Aleks, I think I previously misunderstood the user-specified softmx option at restore. I'll just correct it here:

For snapshot side command line, we just store the user-provided value, and it behaves really like an intermediate call towards Java_com_ibm_java_lang_management_internal_MemoryMXBeanImpl_setMaxHeapSizeImpl().

For restore side initialization, we have two (potentially) softmx sources, one from default heuristic introduced by this PR, and one from the command line. We will just prioritize the user-provided softmx value here. As you said, we will simply obey Java_com_ibm_java_lang_management_internal_MemoryMXBeanImpl_setMaxHeapSizeImpl() after initialization.
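
The precedence described for restore-side initialization can be sketched as a simple choice between the two sources. The names below are hypothetical and the sketch covers only initialization; a later call through the MXBean API would still change the value afterwards, as stated above.

```python
from typing import Optional

def restore_softmx(user_softmx: Optional[int],
                   heuristic_softmx: Optional[int]) -> Optional[int]:
    """Pick the softmx applied at restore initialization (hypothetical sketch).

    A value given on the restore command line wins over the value computed
    by the default heuristic; None means no softmx is applied.
    """
    if user_softmx is not None:
        return user_softmx
    return heuristic_softmx

GIB = 1024 ** 3
# User asked for 3g on restore; the heuristic alone would have picked 6g.
print(restore_softmx(3 * GIB, 6 * GIB) // GIB)
```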

kangyining added a commit to kangyining/openj9 that referenced this issue Nov 20, 2023
When we snapshot on a machine with more physical memory and restore on one
with less physical memory, the InstantOn JVM heap size will be larger than
that of a usual JVM run on the restore machine, and the memory usage will be
abnormally high.

Update the extension with the correct physical memory limit and utilize the
existing softmx mechanism to solve the problem. Note we only update the softmx
if it is smaller than the Xmx size. If it is smaller than the
Xms size, then we upscale the softmx to match the Xms size.

Related:
eclipse-openj9#17596

Detailed behavior table:
eclipse-openj9#18217 (comment)

Signed-off-by: Frank Kang <frank.kang@ibm.com>
@amicic
Contributor

amicic commented Mar 5, 2024

Resolved by #18168 (and a couple of other PRs)
