Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for reading/writing a JITServer AOT cache in the background #16075

Merged
merged 5 commits into from Nov 11, 2022

Conversation

mpirvu
Copy link
Contributor

@mpirvu mpirvu commented Oct 7, 2022

This PR implements support for loading an AOT cache from a file and saving an AOT cache to a file. This is done by a compilation thread that works concurrently with other compilation threads that compile methods.
For reading, the sequence of actions is the following:

  1. Compilation thread at the server reads compilation request from client and determines whether the client wants to use the AOTCache
  2. Server checks its hashmap to see if the named cache already exists. If it exists, it will be used.
  3. If desired cache does not exist, server checks if the persistence feature is enabled and whether a file for the desired cache exists.
  4. If the AOT cache file exists, the server will memorize the name of the cache in a set, queue a special compilation request (with stream==0) and wake a compilation thread to process it. It will then proceed to compile the requested method, without waiting for the AOT cache to be loaded.
  5. If the AOT cache file does not exist (or persistence is disabled), the server will create an empty AOT cache which is going to be populated as methods get compiled.
  6. When a compilation thread extracts the special request from the queue, it will proceed to load the AOT cache from disk. The name of the AOT cache is retrieved from the set where it was memorized earlier. If there is a problem during the loading, the name of the cache will be included into a set of cached that are excluded from being loaded in the future.

For writing, the sequence of actions is the following:

  1. In outOfProcessCompilationEnd(), when a compilation finishes, we check if conditions for saving the AOT cache are met (in method triggerAOTCacheStoreToFileIfNeeded()). These conditions are: (A) "enough" methods were added to the in-memory AOT cache since the last snapshot and (B) "enough" time has passed since the last snapshot
  2. If we decide to save the cache then,
    (A) we set _saveOperationInProgress to prevent other threads doing a concurrent save;
    (B) add the name of the cache to a queue of caches to be saved;
    (C) queue a special compilation request with stream==SAVE_AOTCACHE_REQUEST
    (D) notify a sleeping compilation thread
  3. Later on, the special compilation request is picked up by a compilation thread in processEntry() and we call aotCacheMap->saveNextQueuedAOTCacheToFile();
  4. The name of the first cache to be saved is dequeued and we find the pointer to the cache
  5. Based on the cache name we construct the name of the file where the cache will be saved (e.g. JITServerAOTCache.default.J17)
  6. Try to open a possible existing cache file with the same name.
  7. If an existing snapshot already exists, read the header and determine if it contains more methods than we have. If it does, we don't perform the save
  8. If the existing snapshot is incompatible with our cache or if it contains fewer methods than what we have, we will save our own cache
  9. Create a temporary file and write our cache content to it
  10. Rename the temporary file to the final cache filename
  11. If the rename operation fails or if we cannot create the temporary file we mark this cache so that it we don't attempt other save operations for it
  12. Finalize the save operation by remembering the time when it took place and updating the number of methods that we need to have in order to allow for another save operation. Also reset _saveOperationInProgress flag

Signed-off-by: Marius Pirvu mpirvu@ca.ibm.com

@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Oct 7, 2022
@mpirvu mpirvu added this to In progress in JIT as a Service via automation Oct 7, 2022
@mpirvu mpirvu marked this pull request as draft October 7, 2022 20:03
@mpirvu
Copy link
Contributor Author

mpirvu commented Oct 7, 2022

Attn. @AlexeyKhrabrov @cjjdespres

runtime/compiler/runtime/JITServerAOTCache.hpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.hpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.hpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
std::string cacheFileName = buildCacheFileName(compInfo->getPersistentInfo()->getJITServerAOTCacheDir(), name);
OMRPORT_ACCESS_FROM_OMRPORT(TR::Compiler->omrPortLib);
J9FileStat buf;
int32_t rc = omrfile_stat(cacheFileName.c_str(), 0, &buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider not stating the file separately and instead deal with the missing file when we try to open it. This introduces a TOCTTOU issue (a minor one, but still) and adds syscall (and possibly I/O) latency to the current compilation request that bypasses the cache being loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote the code I thought about the scenario where a client asks for a cache name the server has not seen before and therefore there is no cache. If we proceed without verifying the existence of the cache file, the in-memory cache will be created later (when we attempt the read the nonexistent file) and all the methods compiled until then will not be cached.
I guess only the second client will be affected because the compilations from this one will be cached and the third client will have access to them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then it does make sense to check if the file exists. Maybe a better solution is to open the file here and pass the file descriptor/stream to the thread that will do the snapshot load.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that a better solution is to fopen() the file here. If it succeeds, add the FILE * along with the name to _cachesToLoadQueue. Otherwise the file either doesn't exist or we don't have permissions to read it, and in either case we should create a new cache and not try to read it from the file.

This way we avoid the scenario where cache directory permissions are misconfigured and each JITServer instance will try to read the cache instead of creating a new one, and the first few AOT methods will never be cached. I think that would also simplify the implementation a bit. But it's up to you whether you think it's worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking some more about this, I am leaning towards eliminating any system call at this point. In kubernetes the snapshot can be far away on NFS or COS (cloud object storage) and may take many milliseconds just to check whether the file exists. This could delay the current compilation (and others because of the cache map lock) a great deal.
The downside of proceeding without a snapshot existence check is that, the first N compilations from a first client that attaches to a JITServer which does not have the requested cache will not be put in the AOT cache. This may happen later on when another client requests the missing methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I wasn't even thinking of remote storage in my original suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thought. Once we support merging caches/snapshots, the issue of not caching the first N compilations could be fixed by merging the cache being loaded from the file into the in-memory cache (which we can create and start populating immediately). Assuming that merging doesn't measurably slow down the load.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have eliminated the stat-ing of the snapshot. In my experiments I only see one method being compiled before we choose to create an empty in-memory cache.

runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
std::string cacheFileName;
// If a directory is given, add it in the front of the cache name
if (!cacheDir.empty())
cacheFileName = cacheDir + "/"; // TODO: is there something more portable?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++17 has std::filesystem::path. Otherwise we can take inspiration from what the SCC code does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we cannot use C++17.

runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/env/J9PersistentInfo.hpp Show resolved Hide resolved
@mpirvu mpirvu force-pushed the AOTcachebackground branch 3 times, most recently from ac6d45f to 4602e46 Compare October 14, 2022 14:52
Copy link
Contributor

@AlexeyKhrabrov AlexeyKhrabrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some early feedback for the WIP implementation of periodic snapshots. The PR name and description also needs to be updated accordingly.

runtime/compiler/control/JITServerCompilationThread.cpp Outdated Show resolved Hide resolved
@@ -133,6 +133,8 @@ int32_t J9::Options::_sharedROMClassCacheNumPartitions = 16;
int32_t J9::Options::_reconnectWaitTimeMs = 1000;
int32_t J9::Options::_highActiveThreadThreshold = -1;
int32_t J9::Options::_veryHighActiveThreadThreshold = -1;
int32_t J9::Options::_numExtraAotMethodsToTriggerSaveOperation = 200;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these default values chosen?

I think for the number of methods ideally we need to ensure that the overhead (user+kernel CPU time) of persisting the cache is not bigger than the savings in JIT CPU time from reusing that number of cached methods for one client. Note that we always write out the whole cache, not just the delta of newly stored methods.

It's a bit more complicated for the time period because the overhead (e.g. percentage of total CPU time taken by writing snapshots) depends on the number of caches. 10 seconds sounds reasonable in general, but ideally this should be supported by at least some performance numbers and some reasoning about the trade-offs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values were sort of arbitrary. They might change after experiments will be performed. Currently waiting for the per-requisites to be delivered.

runtime/compiler/control/J9Options.hpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.hpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.hpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 2, 2022

Some output from running in Kubernetes with 2 JITServers:

First JITServer:

#JITServer: AOT cache: Cache file /cache/JITServerAOTCache.default.J17 does not exist
#JITServer: AOT cache: Created empty cache default for clientUID 12650522214143150127
#JITServer: AOT cache: Using existing cache default for clientUID 3383585214330991963
#JITServer: AOT cache: Using existing cache default for clientUID 16417070265669473128
#JITServer: AOT cache: t=75972 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=75982 Saved cache default to file /cache/JITServerAOTCache.default.J17. 200 methods saved in 2789 usec. Current time:Nov-02-2022_16:33:28
#JITServer: AOT cache: t=322361 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=322372 Saved cache default to file /cache/JITServerAOTCache.default.J17. 400 methods saved in 3960 usec. Current time:Nov-02-2022_16:37:34

Second JITServer:

#JITServer: AOT cache: Cache file /cache/JITServerAOTCache.default.J17 does not exist
#JITServer: AOT cache: Created empty cache default for clientUID 6544352212498487522
#JITServer: AOT cache: t=198237 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 200 vs 200.
#JITServer: AOT cache: t=210209 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=210220 Saved cache default to file /cache/JITServerAOTCache.default.J17. 208 methods saved in 2175 usec. Current time:Nov-02-2022_16:35:42

It's debatable whether the second JITServer save operation is that useful: saved 208 methods over the 200 methods that existed.

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 2, 2022

Some traces after load is applied to AcmeAir:
JITServer1:

#JITServer: AOT cache: Cache file /cache/JITServerAOTCache.default.J17 does not exist
#JITServer: AOT cache: Created empty cache default for clientUID 12650522214143150127
#JITServer: AOT cache: Using existing cache default for clientUID 3383585214330991963
#JITServer: AOT cache: Using existing cache default for clientUID 16417070265669473128
#JITServer: AOT cache: t=75972 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=75982 Saved cache default to file /cache/JITServerAOTCache.default.J17. 200 methods saved in 2789 usec. Current time:Nov-02-2022_16:33:28
#JITServer: AOT cache: t=322361 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=322372 Saved cache default to file /cache/JITServerAOTCache.default.J17. 400 methods saved in 3960 usec. Current time:Nov-02-2022_16:37:34
#JITServer: AOT cache: t=688386 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=688386 Saved cache default to file /cache/JITServerAOTCache.default.J17. 600 methods saved in 5281 usec. Current time:Nov-02-2022_16:43:40
#JITServer: AOT cache: t=1022423 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=1022423 Saved cache default to file /cache/JITServerAOTCache.default.J17. 800 methods saved in 6114 usec. Current time:Nov-02-2022_16:49:14
#JITServer: AOT cache: t=1522435 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=1522445 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1000 methods saved in 9398 usec. Current time:Nov-02-2022_16:57:34
#JITServer: AOT cache: t=2022438 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=2022448 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1200 methods saved in 13971 usec. Current time:Nov-02-2022_17:05:54
#JITServer: AOT cache: t=3022467 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3022487 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1400 methods saved in 10970 usec. Current time:Nov-02-2022_17:22:34
#JITServer: AOT cache: t=3038350 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3038361 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1773 methods saved in 12759 usec. Current time:Nov-02-2022_17:22:50
#JITServer: AOT cache: t=3762219 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3762239 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1973 methods saved in 19241 usec. Current time:Nov-02-2022_17:34:54
#JITServer: AOT cache: t=3772258 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3772288 Saved cache default to file /cache/JITServerAOTCache.default.J17. 2981 methods saved in 25735 usec. Current time:Nov-02-2022_17:35:04
#JITServer: AOT cache: t=3782292 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3782312 Saved cache default to file /cache/JITServerAOTCache.default.J17. 3464 methods saved in 26613 usec. Current time:Nov-02-2022_17:35:14
#JITServer: AOT cache: t=3797365 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3797395 Saved cache default to file /cache/JITServerAOTCache.default.J17. 3664 methods saved in 36463 usec. Current time:Nov-02-2022_17:35:29
#JITServer: AOT cache: t=3863743 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3863784 Saved cache default to file /cache/JITServerAOTCache.default.J17. 3864 methods saved in 43769 usec. Current time:Nov-02-2022_17:36:36

JITServer2:

#JITServer: AOT cache: Cache file /cache/JITServerAOTCache.default.J17 does not exist
#JITServer: AOT cache: Created empty cache default for clientUID 6544352212498487522
#JITServer: AOT cache: t=198237 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 200 vs 200.
#JITServer: AOT cache: t=210209 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=210220 Saved cache default to file /cache/JITServerAOTCache.default.J17. 208 methods saved in 2175 usec. Current time:Nov-02-2022_16:35:42
#JITServer: AOT cache: t=522197 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=522197 Saved cache default to file /cache/JITServerAOTCache.default.J17. 408 methods saved in 4550 usec. Current time:Nov-02-2022_16:40:54
#JITServer: AOT cache: t=1016173 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=1016183 Saved cache default to file /cache/JITServerAOTCache.default.J17. 608 methods saved in 4572 usec. Current time:Nov-02-2022_16:49:08
#JITServer: AOT cache: t=1516200 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=1516210 Saved cache default to file /cache/JITServerAOTCache.default.J17. 808 methods saved in 5845 usec. Current time:Nov-02-2022_16:57:28
#JITServer: AOT cache: t=2018220 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=2018230 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1008 methods saved in 9403 usec. Current time:Nov-02-2022_17:05:50
#JITServer: AOT cache: t=3020275 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3020285 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1208 methods saved in 9815 usec. Current time:Nov-02-2022_17:22:32
#JITServer: AOT cache: t=3038172 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3038192 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1645 methods saved in 14284 usec. Current time:Nov-02-2022_17:22:50
#JITServer: AOT cache: t=3761638 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3761648 Saved cache default to file /cache/JITServerAOTCache.default.J17. 1845 methods saved in 12495 usec. Current time:Nov-02-2022_17:34:54
#JITServer: AOT cache: t=3771650 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3771660 Saved cache default to file /cache/JITServerAOTCache.default.J17. 2730 methods saved in 17707 usec. Current time:Nov-02-2022_17:35:04
#JITServer: AOT cache: t=3781948 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=3781978 Saved cache default to file /cache/JITServerAOTCache.default.J17. 3177 methods saved in 28098 usec. Current time:Nov-02-2022_17:35:14
#JITServer: AOT cache: t=3807790 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3664 vs 3377.
#JITServer: AOT cache: t=3817804 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3664 vs 3402.
#JITServer: AOT cache: t=3827884 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3664 vs 3420.
#JITServer: AOT cache: t=3838741 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3664 vs 3424.
#JITServer: AOT cache: t=3848945 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3664 vs 3432.
#JITServer: AOT cache: t=3858961 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3664 vs 3490.
#JITServer: AOT cache: t=3869303 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3540.
#JITServer: AOT cache: t=3882063 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3540.
#JITServer: AOT cache: t=3894113 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3540.
#JITServer: AOT cache: t=3905068 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3540.
#JITServer: AOT cache: t=3915690 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3540.
#JITServer: AOT cache: t=3942920 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3540.
#JITServer: AOT cache: t=3953367 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3540.
#JITServer: AOT cache: t=3964604 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3540.
#JITServer: AOT cache: t=3974888 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3543.
#JITServer: AOT cache: t=3984978 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3543.
#JITServer: AOT cache: t=4001115 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3543.
#JITServer: AOT cache: t=4012178 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3543.
#JITServer: AOT cache: t=4026474 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3543.
#JITServer: AOT cache: t=4038310 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3543.
#JITServer: AOT cache: t=4050185 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3543.
#JITServer: AOT cache: t=4063252 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3544.
#JITServer: AOT cache: t=4075766 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3545.
#JITServer: AOT cache: t=4093271 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for default. Existing snapshot has more methods 3864 vs 3548.

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 2, 2022

Some observations based on the traces above:

  1. Often, the span between 2 save operations is more than 10 sec. However, there are a few instances where within 10 seconds JITServer produces ~500-1000 methods.
  2. The overhead of saving a snapshot increases with the number of records and it can reach tens of milliseconds (44 ms is the max). 44/10000 is only 0.44%, but this can grow with other applications that have more methods. Maybe we should increase that 10 sec threshold as the number of records increases.
  3. The second JITServer tries periodically to save, but aborts because the existing snapshot is better. Maybe it's better to avoid this useless trying by setting a target (in number of methods) that needs to be reached before trying again.

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 2, 2022

Output from new code. Also applied load with more JMeter threads:
JITServer1:

#JITServer: AOT cache: Cache file /cache/JITServerAOTCache.default.J17 does not exist
#JITServer: AOT cache: Created empty cache 'default' for clientUID 9866581823596286787
#JITServer: AOT cache: t=183152 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default'. Existing snapshot has more methods 200 vs 200.
#JITServer: AOT cache: t=194654 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=194654 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 205 methods saved in 3732 usec. Current time:Nov-02-2022_21:07:08
#JITServer: AOT cache: t=204744 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=204754 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 917 methods saved in 14029 usec. Current time:Nov-02-2022_21:07:18
#JITServer: AOT cache: t=214756 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=214766 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 1868 methods saved in 14497 usec. Current time:Nov-02-2022_21:07:28
#JITServer: AOT cache: t=224777 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=224797 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 2599 methods saved in 20251 usec. Current time:Nov-02-2022_21:07:38
#JITServer: AOT cache: t=240678 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default'. Existing snapshot has more methods 3076 vs 2799.
#JITServer: AOT cache: t=443123 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default'. Existing snapshot has more methods 3476 vs 3077.
#JITServer: AOT cache: Using existing cache 'default' for clientUID 14717696280519652969
#JITServer: AOT cache: Using existing cache 'default' for clientUID 15317750365123473221
#JITServer: AOT cache: t=1651038 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default'. Existing snapshot has more methods 3877 vs 3477.

JITServer2:

#JITServer: AOT cache: Cache file /cache/JITServerAOTCache.default.J17 does not exist
#JITServer: AOT cache: Created empty cache 'default' for clientUID 4873921356539607263
#JITServer: AOT cache: Using existing cache 'default' for clientUID 10877126703294860478
#JITServer: AOT cache: Using existing cache 'default' for clientUID 3449809124632071115
#JITServer: AOT cache: t=59369 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=59379 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 200 methods saved in 2607 usec. Current time:Nov-02-2022_21:04:52
#JITServer: AOT cache: t=197348 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=197358 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 400 methods saved in 5331 usec. Current time:Nov-02-2022_21:07:10
#JITServer: AOT cache: t=207361 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=207381 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 1526 methods saved in 19558 usec. Current time:Nov-02-2022_21:07:20
#JITServer: AOT cache: t=217386 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=217406 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 2345 methods saved in 21950 usec. Current time:Nov-02-2022_21:07:30
#JITServer: AOT cache: t=227767 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=227798 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3076 methods saved in 28258 usec. Current time:Nov-02-2022_21:07:41
#JITServer: AOT cache: t=244023 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=244054 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3276 methods saved in 29086 usec. Current time:Nov-02-2022_21:07:57
#JITServer: AOT cache: t=313706 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=313736 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3476 methods saved in 29968 usec. Current time:Nov-02-2022_21:09:07
#JITServer: AOT cache: t=992500 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=992530 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3676 methods saved in 32182 usec. Current time:Nov-02-2022_21:20:26
#JITServer: AOT cache: Using existing cache 'default' for clientUID 14521947310791318725
#JITServer: AOT cache: Using existing cache 'default' for clientUID 14085489992398127893
#JITServer: AOT cache: Using existing cache 'default' for clientUID 9753828751896748830
#JITServer: AOT cache: Using existing cache 'default' for clientUID 5939425927236923358
#JITServer: AOT cache: Using existing cache 'default' for clientUID 10589365158930452603
#JITServer: AOT cache: Using existing cache 'default' for clientUID 7435685288238698017
#JITServer: AOT cache: Using existing cache 'default' for clientUID 7428717972999594426
#JITServer: AOT cache: Using existing cache 'default' for clientUID 12281571577371307818
#JITServer: AOT cache: t=1540144 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=1540197 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3876 methods saved in 51495 usec. Current time:Nov-02-2022_21:29:33
#JITServer: AOT cache: t=1873299 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=1873349 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 4076 methods saved in 55018 usec. Current time:Nov-02-2022_21:35:06

JITServer3 which starts later, so this one will do a read of a snapshot first:

#JITServer: AOT cache: t=56540 Queued comp request to load cache 'default' from file in the background
#JITServer: AOT cache: t=56540 Opened file /cache/JITServerAOTCache.default.J17 to load cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Using existing cache 'default' for clientUID 17755666387465820341
#JITServer: AOT cache: t=56580 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default'. Existing snapshot has more methods 3676 vs 3676.
#JITServer: AOT cache: Using existing cache 'default' for clientUID 16249877294617044856
#JITServer: AOT cache: Using existing cache 'default' for clientUID 7556048835508074553
#JITServer: AOT cache: Using existing cache 'default' for clientUID 6431310014690316226
#JITServer: AOT cache: Using existing cache 'default' for clientUID 9714244543616161090
#JITServer: AOT cache: Using existing cache 'default' for clientUID 12218089079539151343
#JITServer: AOT cache: t=66590 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default'. Existing snapshot has more methods 3876 vs 3839.
#JITServer: AOT cache: t=132595 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=132646 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3877 methods saved in 44307 usec. Current time:Nov-02-2022_21:30:42
#JITServer: AOT cache: t=2514016 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=2514067 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 4077 methods saved in 48624 usec. Current time:Nov-02-2022_22:10:23

@mpirvu mpirvu closed this Nov 2, 2022
JIT as a Service automation moved this from In progress to Done Nov 2, 2022
@mpirvu mpirvu reopened this Nov 2, 2022
JIT as a Service automation moved this from Done to In progress Nov 2, 2022
@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 2, 2022

In the logs above I see the some inefficiency in that a JITServer is overwriting an existing snapshot just to add a few methods.
Examples:
JITServer 2 writes 200 methods, while JITServer 1 comes after and writes 205 methods.
JITServer 2 writes 3876 methods, while JITServer 3 comes after and writes 3877 methods.
JITServer 2 writes 4076 methods, while JITServer 3 comes after and writes 4077 methods.

Maybe we should impose the condition that a server can overwrite the snapshot only if it adds 200 more methods.

@mpirvu mpirvu changed the title Support for reading a JITServer AOT cache in the background Support for reading/writing a JITServer AOT cache in the background Nov 3, 2022
@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 8, 2022

Review comments have been addressed. PR is ready for another review.

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 8, 2022

Some traces from the last version of the code. Starting 2 JITServers and 20 AcmeAir coontainers and then applying load:

JITServer1:

#JITServer: AOT cache: Cache file /cache/JITServerAOTCache.default.J17 does not exist
#JITServer: AOT cache: Created empty cache 'default' for clientUID 9985767807270662574
#JITServer: AOT cache: Using existing cache 'default' for clientUID 9401846683300054448
#JITServer: AOT cache: Using existing cache 'default' for clientUID 1908892212535084701
#JITServer: AOT cache: Using existing cache 'default' for clientUID 15025416903702491277
#JITServer: AOT cache: Using existing cache 'default' for clientUID 13474397116947540347
#JITServer: AOT cache: Using existing cache 'default' for clientUID 4288072437234270029
#JITServer: AOT cache: Using existing cache 'default' for clientUID 3366163447438281684
#JITServer: AOT cache: Using existing cache 'default' for clientUID 6742072858498871875
#JITServer: AOT cache: t=71031 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=71031 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 200 methods saved in 5034 usec. Current time:Nov-08-2022_16:20:53
#JITServer: AOT cache: Using existing cache 'default' for clientUID 14519804746069585948
#JITServer: AOT cache: Using existing cache 'default' for clientUID 10414773304830688330
#JITServer: AOT cache: Using existing cache 'default' for clientUID 10961531260807825616
#JITServer: AOT cache: Using existing cache 'default' for clientUID 2716455396945928195
#JITServer: AOT cache: Using existing cache 'default' for clientUID 6499236187341184108
#JITServer: AOT cache: t=81114 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=81124 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 464 methods saved in 6114 usec. Current time:Nov-08-2022_16:21:03
#JITServer: AOT cache: t=129347 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=129368 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 664 methods saved in 16584 usec. Current time:Nov-08-2022_16:21:51
#JITServer: AOT cache: t=139395 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=139418 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 1469 methods saved in 21361 usec. Current time:Nov-08-2022_16:22:01
#JITServer: AOT cache: t=149425 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=149435 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 1900 methods saved in 19187 usec. Current time:Nov-08-2022_16:22:11
#JITServer: AOT cache: t=159437 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=159457 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 2362 methods saved in 20894 usec. Current time:Nov-08-2022_16:22:21
#JITServer: AOT cache: t=173356 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=173387 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 2562 methods saved in 23874 usec. Current time:Nov-08-2022_16:22:35
#JITServer: AOT cache: t=183398 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=183448 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3179 methods saved in 49162 usec. Current time:Nov-08-2022_16:22:45
#JITServer: AOT cache: t=208218 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=208310 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3379 methods saved in 85599 usec. Current time:Nov-08-2022_16:23:10
#JITServer: AOT cache: t=239943 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=239994 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3579 methods saved in 53672 usec. Current time:Nov-08-2022_16:23:42
#JITServer: AOT cache: t=386781 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=386841 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3779 methods saved in 55515 usec. Current time:Nov-08-2022_16:26:09
#JITServer: AOT cache: t=748407 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=748457 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 3979 methods saved in 54205 usec. Current time:Nov-08-2022_16:32:10

JITServer2:

#JITServer: AOT cache: Cache file /cache/JITServerAOTCache.default.J17 does not exist
#JITServer: AOT cache: Created empty cache 'default' for clientUID 3094955103293002888
#JITServer: AOT cache: Using existing cache 'default' for clientUID 11084497485763908621
#JITServer: AOT cache: Using existing cache 'default' for clientUID 13397199753352347234
#JITServer: AOT cache: Using existing cache 'default' for clientUID 1925706881385705151
#JITServer: AOT cache: Using existing cache 'default' for clientUID 2662793674406715983
#JITServer: AOT cache: Using existing cache 'default' for clientUID 17702926709551857251
#JITServer: AOT cache: t=74734 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 200 vs 200.
#JITServer: AOT cache: Using existing cache 'default' for clientUID 16207526092869569220
#JITServer: AOT cache: t=127089 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 401 vs 464.
#JITServer: AOT cache: t=137099 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=137119 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 1124 methods saved in 13471 usec. Current time:Nov-08-2022_16:21:59
#JITServer: AOT cache: t=147122 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=147142 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 1586 methods saved in 20119 usec. Current time:Nov-08-2022_16:22:09
#JITServer: AOT cache: t=157149 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=157169 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 2041 methods saved in 25537 usec. Current time:Nov-08-2022_16:22:19
#JITServer: AOT cache: t=168044 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 2241 vs 2362.
#JITServer: AOT cache: t=178047 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=178079 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 2884 methods saved in 30658 usec. Current time:Nov-08-2022_16:22:40
#JITServer: AOT cache: t=198252 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 3084 vs 3179.
#JITServer: AOT cache: t=241376 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 3379 vs 3579.

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 8, 2022

Started a third JITServer and re-started all the liberty pods. The third JITServer will read the AOT cache snapshot:

#JITServer: AOT cache: t=27370 Queued comp request to load cache 'default' from file in the background
#JITServer: AOT cache: t=27370 Opened file /cache/JITServerAOTCache.default.J17 to load cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Using existing cache 'default' for clientUID 70671829843765396
#JITServer: AOT cache: t=27481 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 3979 vs 3979.
#JITServer: AOT cache: Using existing cache 'default' for clientUID 10778118672436564718
#JITServer: AOT cache: Using existing cache 'default' for clientUID 11876546307625198757
#JITServer: AOT cache: Using existing cache 'default' for clientUID 13559553479372175125
#JITServer: AOT cache: Using existing cache 'default' for clientUID 7753202785764028094
#JITServer: AOT cache: Using existing cache 'default' for clientUID 15389435712332823637
#JITServer: AOT cache: Using existing cache 'default' for clientUID 1279860007096386656
#JITServer: AOT cache: Using existing cache 'default' for clientUID 12983123262324227456
#JITServer: AOT cache: Using existing cache 'default' for clientUID 15829186370000768502
#JITServer: AOT cache: t=491007 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 4179 vs 4179.

Copy link
Contributor

@AlexeyKhrabrov AlexeyKhrabrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments (mostly minor). Also, the commit messages are outdated:

  • "stream==0" in Load JITServer AOT cache file in the background.
  • -Xjit: option names in Save the JITServer AOT cache to file periodically.

runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
std::string cacheFileName = buildCacheFileName(compInfo->getPersistentInfo()->getJITServerAOTCacheDir(), name);
OMRPORT_ACCESS_FROM_OMRPORT(TR::Compiler->omrPortLib);
J9FileStat buf;
int32_t rc = omrfile_stat(cacheFileName.c_str(), 0, &buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that a better solution is to fopen() the file here. If it succeeds, add the FILE * along with the name to _cachesToLoadQueue. Otherwise the file either doesn't exist or we don't have permissions to read it, and in either case we should create a new cache and not try to read it from the file.

This way we avoid the scenario where cache directory permissions are misconfigured and each JITServer instance will try to read the cache instead of creating a new one, and the first few AOT methods will never be cached. I think that would also simplify the implementation a bit. But it's up to you whether you think it's worth it.

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 8, 2022

Another experiment testing scaling back to 0 (delete all JITServer instances) and then started 2 new instances and 20 AcmeAir instances.
JITServer 1:

#JITServer: AOT cache: t=20512 Queued comp request to load cache 'default' from file in the background
#JITServer: AOT cache: t=20512 Opened file /cache/JITServerAOTCache.default.J17 to load cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Using existing cache 'default' for clientUID 6602080437857706294
#JITServer: AOT cache: t=20592 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 4179 vs 4179.
#JITServer: AOT cache: Using existing cache 'default' for clientUID 6834422244102674832
#JITServer: AOT cache: Using existing cache 'default' for clientUID 12864565582795395563
#JITServer: AOT cache: Using existing cache 'default' for clientUID 9821848145892562700
#JITServer: AOT cache: Using existing cache 'default' for clientUID 15483784128407724604
#JITServer: AOT cache: Using existing cache 'default' for clientUID 8846143745920102038
#JITServer: AOT cache: Using existing cache 'default' for clientUID 4669160718405633303
#JITServer: AOT cache: Using existing cache 'default' for clientUID 16147204452172373040

JITServer2:

#JITServer: AOT cache: t=21326 Queued comp request to load cache 'default' from file in the background
#JITServer: AOT cache: t=21326 Opened file /cache/JITServerAOTCache.default.J17 to load cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Another thread is already loading cache 'default' from file
#JITServer: AOT cache: Using existing cache 'default' for clientUID 3715784029246681418
#JITServer: AOT cache: t=21426 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: Save operation aborted for cache 'default' because we don't have more methods than existing snapshot: 4179 vs 4179.
#JITServer: AOT cache: Using existing cache 'default' for clientUID 4541102138675813521
#JITServer: AOT cache: Using existing cache 'default' for clientUID 7906814618592385583
#JITServer: AOT cache: Using existing cache 'default' for clientUID 16508194663165058134
#JITServer: AOT cache: Using existing cache 'default' for clientUID 7544219559926071881
#JITServer: AOT cache: Using existing cache 'default' for clientUID 4613242010933333138
#JITServer: AOT cache: Using existing cache 'default' for clientUID 4047131869024311942
#JITServer: AOT cache: Using existing cache 'default' for clientUID 6270081620055274212
#JITServer: AOT cache: Using existing cache 'default' for clientUID 11882747137016827847
#JITServer: AOT cache: Using existing cache 'default' for clientUID 14017597409873414514
#JITServer: AOT cache: Using existing cache 'default' for clientUID 8180683746273783260
#JITServer: AOT cache: Using existing cache 'default' for clientUID 8154275294972987574
#JITServer: AOT cache: t=124769 Queued comp request to save cache 'default' to file in the background
#JITServer: AOT cache: t=124830 Saved cache 'default' to file /cache/JITServerAOTCache.default.J17. 4379 methods saved in 56181 usec. Current time:Nov-08-2022_17:24:13

@dsouzai
Copy link
Contributor

dsouzai commented Nov 8, 2022

Will wait until @AlexeyKhrabrov approves before running tests.

@dsouzai dsouzai self-assigned this Nov 8, 2022
// _cachesToSaveQueue is used to remember the caches that need to be saved to file
// A compilation thread dequeues the first name and tries to save the named cache to a file,
// possibly overwriting an existent file
PersistentList<std::string> _cachesToSaveQueue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we store cache names here rather than pointers to cache instances?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure whether caches can be unloaded and I didn't want to hold onto a pointer that could become invalid.
It would be nice to allow cache unloading in the future. If there is no client wanting a particular cache anymore, why hold onto that memory for nothing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. It should be relatively easy to implement periodic purging of caches that are unused for a certain time period from memory (possibly writing the final version to the snapshot file if it adds any methods). We should open an issue for this.

New options that were introduced:
`-XX:[+|-]JITServerAOTCachePersistence`  to enable/disable the AOTCache persistence feature
`-XX:JITServerAOTCacheDir=<name>` to specify directory where caches are located (default is current directory)

Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com>
Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com>
@mpirvu mpirvu force-pushed the AOTcachebackground branch 2 times, most recently from ceb7d6f to 7958281 Compare November 9, 2022 03:54
JITServer offers support for loading an AOT cache from a file.
This is done by a compilation thread that works concurrently with other
compilation threads that compile methods.
The sequence of actions is the following:
1. Compilation thread at the server reads compilation request from client
and determines whether the client wants to use the AOTCache
2. Server checks its hashmap to see if the named cache already exists
If it exists, it will be used.
3. If desired cache does not exist, server checks if the persistence feature
is enabled and whether a file for the desired cache exists.
4. If the AOT cache file exists, the server will memorize the name of the
cache in a set, queue a special compilation request and
wake a compilation thread to process it. It will then proceed to compile
the requestedn method, without waiting for the AOT cache to be loaded.
5. If the AOT cache file does not exist (or persistence is disabled),
the server will create an empty AOT cache which is going to be populated
as methods get compiled.
6. When a compilation thread extracts the special request from the queue,
it will proceed to load the AOT cache from disk. The name of the AOT cache
is retrieved from the set where it was memorized earlier. If there is a
problem during the loading, the name of the cache will be included into a
set of cached that are excluded from being loaded in the future.

Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com>
This commit implements the code that attempts to save a JITServer AOT cache
perodically to file. The save operation is done only under the following
conditions which must all be met:
1. The in-memory AOT cache has added at least 200 new methods since the last
successful save attempt (controlable by -Xjit:extraAotMethodsToTriggerSaveOperation=<N>)
2. At least 10 seconds have passed since the last save attempt (controlable
by -Xjit:aotCachePersistenceMinPeriodMs=<ms>)
3. The in-memory cache contains at least 200 more AOT methods that the cache on disk
(controlable with -Xjit:aotCachePersistenceMinDeltaMethods=<N>)

The save operation is done asynchronously by a compilation thread.
While a save operation is in progress, a flag is set to prevent other
threads from saving the same cache concurrently.

Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com>
When a first request for a named AOT cache is made, the code would check
for the existence of an appropriately named file cache and then place
a special compilation request to load the cache asynchronously.
However, since this file can be on network attached storage, the
check could take many milliseconds, blocking the current compilation
request and possibly others (due to holding the cache map monitor).
This commit removes the mentioned file check. The only downside is that,
we may delay the creation of an empty in-memory cache and thus we may
not store the first few AOT compilations.

Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com>
@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 10, 2022

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 11, 2022

pLinux timeouts in this test:

1:33:32  ===============================================
11:33:32  Running test cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3 ...
11:33:32  ===============================================
11:33:32  cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3 Start Time: Thu Nov 10 12:15:20 2022 Epoch Time (ms): 1668096920648
11:33:32  "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
11:33:32  JVMSHRC005I No shared class caches available
11:33:32  JVMSHRC005I No shared class caches available
11:33:32  cache cleanup done
11:33:32  variation: Mode112
11:33:32  JVM_OPTIONS: -XX:+UseJITServer -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs 
11:33:32  { success=0; \
11:33:32  itercnt=1; \
11:33:32  mkdir -p "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../TKG/output_16680926993108/cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3"; \
11:33:32  cd "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../TKG/output_16680926993108/cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3"; \
11:33:32  export LD_LIBRARY_PATH="/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/openjdk-test-image/openj9:/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/../lib/default:/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/../lib/j9vm:"; \
11:33:32  "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs  -Xshareclasses:none \
11:33:32  -DTEST_ROOT="/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests" \
11:33:32  -DJAR="/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitest.jar:/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../TKG/lib/asm-all.jar" \
11:33:32  -DEXE='"/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs  -Xdump ' \
11:33:32  -DEXTRA_Add_OPEN_OPTION='--add-opens=java.base/java.lang.reflect=ALL-UNNAMED' \
11:33:32  -jar "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdline_options_tester/cmdlinetester.jar" \
11:33:32  -config "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitests_hcr.xml" \
11:33:32  -explainExcludes -xids all,linux_ppc-64 -xlist "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitests_excludes_17.xml" -nonZeroExitWhenError; \
11:33:32  if [ $? -eq 0 ] ; then echo ""; echo "cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3""_PASSED"; echo ""; cd /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/..; rm -f -r "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../TKG/output_16680926993108/cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3"; else echo ""; echo "cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3""_FAILED"; echo ""; fi; } 2>&1 | tee -a "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../TKG/output_16680926993108/TestTargetResult";
20:12:14  Cancelling nested steps due to timeout
20:12:15  Sending interrupt signal to process
20:12:20  /bin/sh: line 13: 22242 Terminated              "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs -Xshareclasses:none -DTEST_ROOT="/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests" -DJAR="/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitest.jar:/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../TKG/lib/asm-all.jar" -DEXE='"/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/openjdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs  -Xdump ' -DEXTRA_Add_OPEN_OPTION='--add-opens=java.base/java.lang.reflect=ALL-UNNAMED' -jar "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdline_options_tester/cmdlinetester.jar" -config "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitests_hcr.xml" -explainExcludes -xids all,linux_ppc-64 -xlist "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_ppc64le_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitests_excludes_17.xml" -nonZeroExitWhenError
20:12:20  
20:12:20  cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3_FAILED

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 11, 2022

jenkins test sanity plinuxjit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 11, 2022

All tests have passed

@dsouzai dsouzai merged commit dcca03b into eclipse-openj9:master Nov 11, 2022
JIT as a Service automation moved this from In progress to Done Nov 11, 2022
@mpirvu mpirvu linked an issue Nov 11, 2022 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
Development

Successfully merging this pull request may close these issues.

Add persistence to the JITServer AOT cache
3 participants