Ability to disable the Achilles cache #2034

Closed
anthonysena opened this issue May 19, 2022 · 10 comments · Fixed by #2172

@anthonysena
Collaborator

anthonysena commented May 19, 2022

Per #2032, we'd like the ability to 1) disable the Achilles cache warming process and 2) control it using the priority of the results daimon. This also relates to the discussion on #2031.

@anthonysena
Collaborator Author

From discussion with @chrisknoll and @alex-odysseus this morning, there are a few questions we'd like to answer about the current (v2.11) behavior of the Achilles cache:

  • How long does it take to warm the Achilles cache? How many jobs are created based on N number of CDMs?
  • What is the storage size (GB) of the cached Achilles data in the WebAPI database?
  • Does each start of WebAPI perform a copy of the Achilles results or is there some mechanism that only copies updates on startup?
  • Does the caching mechanism handle purging of cache results when a source is deleted?

@chrisknoll @alex-odysseus: If I missed any questions, please post them here so we can keep track and document the current behavior. This will help decide how to proceed for v2.12.

@alex-odysseus
Contributor

Sergey, please chime in @ssuvorov-fls

@ssuvorov-fls
Contributor

  • How long does it take to warm the Achilles cache? How many jobs are created based on N number of CDMs?
    Number of jobs is set via parameter "cache.jobs.count". The default value is "3".
    Warming duration depends on the size of the data: approximately 20-30 minutes.
  • What is the storage size (GB) of the cached Achilles data in the WebAPI database?
    It depends on the size of the data: about 100-150 MB.
  • Does each start of WebAPI perform a copy of the Achilles results or is there some mechanism that only copies updates on startup?
    WebAPI overwrites previous records each time and inserts new records.
  • Does the caching mechanism handle purging of cache results when a source is deleted?
    No.

@anthonysena
Collaborator Author

Q: How long does it take to warm the Achilles cache? How many jobs are created based on N number of CDMs?

A: Number of jobs is set via parameter "cache.jobs.count". The default value is "3". Warming duration is based on the size of the data. Approximately 20-30 minutes

So in an environment with a large number of data sources (let's say 40), we'd want to set "cache.jobs.count" to 8, which would then spawn 5 jobs (40/8 == 5) to copy over the data? Considering the default job queue length of 10, we would still have 5 worker jobs available to service cohort generations, etc. Do I have this correct?
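To make the arithmetic in this question concrete, the sketch below compares two possible readings of "cache.jobs.count": as the number of warming jobs (how the answer above describes it) or as the number of sources per job (how the 40/8 == 5 calculation reads it). The queue length of 10 is the default mentioned in the question; `CacheJobMath` and its methods are hypothetical and do not reflect WebAPI's actual scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the arithmetic only, not WebAPI's scheduler.
class CacheJobMath {

    // Reading 1: cache.jobs.count is the number of warming jobs.
    // Sources are dealt round-robin, so job sizes differ by at most one.
    static List<List<Integer>> partitionByJobCount(int sourceCount, int jobCount) {
        if (jobCount <= 0) {
            throw new IllegalArgumentException("jobCount must be positive");
        }
        int jobs = Math.min(jobCount, sourceCount);
        List<List<Integer>> result = new ArrayList<>();
        for (int j = 0; j < jobs; j++) {
            result.add(new ArrayList<>());
        }
        for (int s = 0; s < sourceCount; s++) {
            result.get(s % jobs).add(s);
        }
        return result;
    }

    // Reading 2: cache.jobs.count is the number of sources handled by each job.
    static int jobsForSourcesPerJob(int sourceCount, int sourcesPerJob) {
        if (sourcesPerJob <= 0) {
            throw new IllegalArgumentException("sourcesPerJob must be positive");
        }
        return (sourceCount + sourcesPerJob - 1) / sourcesPerJob; // ceiling division
    }

    // Worker slots left for cohort generation etc. while warming jobs occupy the queue.
    static int freeWorkers(int queueLength, int warmingJobs) {
        return Math.max(0, queueLength - warmingJobs);
    }

    public static void main(String[] args) {
        int sources = 40;
        int setting = 8;      // value of cache.jobs.count in the example
        int queueLength = 10; // default job queue length mentioned in the question
        int jobs1 = partitionByJobCount(sources, setting).size();
        int jobs2 = jobsForSourcesPerJob(sources, setting);
        System.out.println("as job count:       " + jobs1 + " jobs, " + freeWorkers(queueLength, jobs1) + " free workers");
        System.out.println("as sources per job: " + jobs2 + " jobs, " + freeWorkers(queueLength, jobs2) + " free workers");
    }
}
```

Under the first reading, a value of 8 would start 8 jobs of 5 sources each and leave 2 of the 10 worker slots free; the "5 jobs, 5 free workers" outcome only holds under the second reading.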

Q: What is the storage size (GB) of the cached Achilles data in the WebAPI database?

A: It depends on the size of data. About 100-150 MB

This is good to know from an operations perspective.

Q: Does each start of WebAPI perform a copy of the Achilles results or is there some mechanism that only copies updates on startup?

A: WebAPI overwrites previous records each time and inserts new records.

It would be ideal if this were performed only once; presumably Achilles data is static unless a CDM is refreshed?
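One way to get the "copy once" behavior suggested here is to record, per source, a marker of the Achilles results that were cached (for example a CDM release date or an Achilles run timestamp) and skip the copy when that marker has not changed. This is a sketch under the assumption that such a marker is available; `CacheRefreshPolicy`, `needsWarming`, and `markWarmed` are hypothetical names, and per the answer above, v2.11 instead rewrites the records on every start.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical policy: decide whether a source's Achilles cache must be re-copied.
// A real version would persist the markers in the WebAPI database so they survive restarts.
class CacheRefreshPolicy {
    // sourceKey -> marker of the Achilles results last copied into the cache
    private final Map<String, String> lastCachedMarker = new HashMap<>();

    // True when the source was never cached or its Achilles results changed since.
    boolean needsWarming(String sourceKey, String currentMarker) {
        if (!lastCachedMarker.containsKey(sourceKey)) {
            return true;
        }
        return !Objects.equals(lastCachedMarker.get(sourceKey), currentMarker);
    }

    // Record the marker after a successful copy.
    void markWarmed(String sourceKey, String currentMarker) {
        lastCachedMarker.put(sourceKey, currentMarker);
    }

    public static void main(String[] args) {
        CacheRefreshPolicy policy = new CacheRefreshPolicy();
        String marker = "achilles-run-2022-05-01"; // assumed marker format
        System.out.println(policy.needsWarming("SYNPUF", marker)); // true: never cached
        policy.markWarmed("SYNPUF", marker);
        System.out.println(policy.needsWarming("SYNPUF", marker)); // false: unchanged
        System.out.println(policy.needsWarming("SYNPUF", "achilles-run-2022-08-01")); // true: CDM refreshed
    }
}
```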

Q: Does the caching mechanism handle purging of cache results when a source is deleted?

A: No

Also good to know from an operations perspective: people may want to purge this data, or, ideally, deleting a source from ATLAS would kick off a process to remove its cached results.
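The purge-on-delete idea can be sketched as a delete operation that removes the source and its cached rows together, so no orphaned cache entries remain. The in-memory maps below are hypothetical stand-ins for the source table and the cache table (the concept-id-to-count shape is assumed for illustration); in a relational schema the same effect could come from a foreign key declared with `ON DELETE CASCADE`, or from deleting both inside one transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the source registry and the Achilles cache table.
class SourceRegistry {
    private final Map<Integer, String> sources = new HashMap<>(); // sourceId -> sourceKey
    // sourceId -> (conceptId -> cached record count)
    private final Map<Integer, Map<Integer, Long>> achillesCache = new HashMap<>();

    void addSource(int sourceId, String sourceKey) {
        sources.put(sourceId, sourceKey);
    }

    boolean hasSource(int sourceId) {
        return sources.containsKey(sourceId);
    }

    void cacheRecordCount(int sourceId, int conceptId, long count) {
        achillesCache.computeIfAbsent(sourceId, id -> new HashMap<>()).put(conceptId, count);
    }

    int cachedRowCount(int sourceId) {
        Map<Integer, Long> rows = achillesCache.get(sourceId);
        return rows == null ? 0 : rows.size();
    }

    // Deleting a source also purges its cached results.
    void deleteSource(int sourceId) {
        sources.remove(sourceId);
        achillesCache.remove(sourceId);
    }

    public static void main(String[] args) {
        SourceRegistry registry = new SourceRegistry();
        registry.addSource(1, "SYNPUF");
        registry.cacheRecordCount(1, 201826, 1200L); // illustrative concept id and count
        registry.deleteSource(1);
        System.out.println(registry.cachedRowCount(1)); // 0
    }
}
```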

anthonysena modified the milestones: V2.12, V2.13 Aug 3, 2022
@anthonysena
Collaborator Author

Idea per @chrisknoll: could we extend the source_daimon table to include an int "caching" field? The default would be 0, and we would set it to 1 where we want to enable the cache.

@chrisknoll
Collaborator

Note: I'd put this at the 'source' level (not sourceDaimon), because I think caching applies to the entire source and not to individual daimons (i.e., CDM, RESULTS, VOCABULARY, etc.).

It might make sense to do it at the daimon level later, but for now I think a switch at the source level is a good first step.

@chrisknoll
Collaborator

I have a branch implementing is_cache_enabled on a source, and the warmCache function now filters by Vocabulary daimon + Results daimon + isCacheEnabled == true. However, I'm concerned about what happens when you open an Achilles report on a source that doesn't have the cache enabled.

@ssuvorov-fls: I think you mentioned in another thread that even when the cache is disabled, it will read from the cache. Does that mean that if the cache is not enabled, the WebAPI cache table will be empty, and the code will always read from the empty cache table? I would expect (and could try to implement) that when the cache is disabled, it should just read from the CDM results table directly. Can you confirm the behavior? If it always reads from the cache table even when the cache is disabled, can you propose where to check whether caching is enabled so that we read from the results table instead?
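The warmCache filter described here (a source is warmed only if it has both a vocabulary and a results daimon and the source-level cache flag is on) can be sketched with a stream filter. `DaimonType`, `CdmSource`, and `CacheWarmingFilter` are a simplified, hypothetical model rather than WebAPI's actual entities; the flag defaults to off, matching the "default is 0" idea, and requires Java 16+ for records.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model: not WebAPI's actual Source entity.
enum DaimonType { CDM, VOCABULARY, RESULTS }

// The cache flag lives on the source, not on individual daimons.
record CdmSource(String sourceKey, Set<DaimonType> daimons, boolean cacheEnabled) {}

class CacheWarmingFilter {
    // Keep only sources with a vocabulary daimon, a results daimon, and caching enabled.
    static List<CdmSource> sourcesToWarm(List<CdmSource> all) {
        return all.stream()
                .filter(s -> s.daimons().contains(DaimonType.VOCABULARY))
                .filter(s -> s.daimons().contains(DaimonType.RESULTS))
                .filter(CdmSource::cacheEnabled)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CdmSource> sources = List.of(
                new CdmSource("BIG_CLAIMS", EnumSet.allOf(DaimonType.class), true),
                new CdmSource("RARELY_USED", EnumSet.allOf(DaimonType.class), false), // flag left at default
                new CdmSource("VOCAB_ONLY", EnumSet.of(DaimonType.VOCABULARY), true));
        sourcesToWarm(sources).forEach(s -> System.out.println(s.sourceKey())); // BIG_CLAIMS
    }
}
```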

@ssuvorov-fls
Contributor

@chrisknoll
Hi,
There are 2 parameters: "cdm.result.cache.warming.enable" and "cdm.cache.cron.warming.enable".
When "cdm.result.cache.warming.enable" is true, the user can warm caches from the configuration page.
When both of them are true, the user can warm caches from the configuration page and the application can also warm them from a cron job.
However, when the user asks for CDM results, they are obtained from the cache or, if the cache is empty, they are obtained from the CDM source, put into the cache, and returned to the user.

The main difference between warming the CDM cache and caching CDM results during a request is speed: caching the whole CDM is much faster.
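The behavior described here can be modeled as two independent concerns: the two properties only gate who may warm the cache (configuration page vs. cron), while the request path is read-through in every case. The property names come from the comment above; `AchillesCacheBehavior` and its methods are hypothetical, and a `Function` stands in for the query against the CDM results.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the behavior described above; not WebAPI's classes.
class AchillesCacheBehavior {
    private final boolean resultCacheWarmingEnabled; // cdm.result.cache.warming.enable
    private final boolean cronWarmingEnabled;        // cdm.cache.cron.warming.enable
    private final Function<String, String> cdmSource; // stands in for a CDM results query
    private final Map<String, String> cache = new HashMap<>();
    private int sourceQueries = 0;

    AchillesCacheBehavior(boolean resultCacheWarmingEnabled, boolean cronWarmingEnabled,
                          Function<String, String> cdmSource) {
        this.resultCacheWarmingEnabled = resultCacheWarmingEnabled;
        this.cronWarmingEnabled = cronWarmingEnabled;
        this.cdmSource = cdmSource;
    }

    // Manual warming from the configuration page needs only the first flag.
    boolean canWarmFromConfigurationPage() {
        return resultCacheWarmingEnabled;
    }

    // Scheduled (cron) warming needs both flags.
    boolean canWarmFromCron() {
        return resultCacheWarmingEnabled && cronWarmingEnabled;
    }

    // Requests are read-through regardless of the warming flags:
    // a hit is returned directly; a miss queries the source, stores the result, and returns it.
    String getResult(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fresh = cdmSource.apply(key);
        sourceQueries++;
        cache.put(key, fresh);
        return fresh;
    }

    int sourceQueries() {
        return sourceQueries;
    }

    public static void main(String[] args) {
        AchillesCacheBehavior behavior = new AchillesCacheBehavior(true, false, key -> "rows for " + key);
        System.out.println(behavior.canWarmFromConfigurationPage()); // true
        System.out.println(behavior.canWarmFromCron());              // false
        behavior.getResult("dashboard");
        behavior.getResult("dashboard");
        System.out.println(behavior.sourceQueries());                // 1: second call was a cache hit
    }
}
```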

@chrisknoll
Collaborator

Understood. Thank you, @ssuvorov-fls. It sounds like there is a case in the code where, if the cache is empty, it will fetch the results from the CDM. I think you are saying that if caching is disabled on a source, it will find the cache empty and fetch from the source. The only issue is that it will then put the results into the cache and return them to the user, when I'd want it to just return the data to the user.

But for the first implementation, I believe it will work to turn caching on/off per source and let the fetching of CDM results fall back to pulling from the CDM source.

Let me know if I have anything wrong here. The goal is to let WebAPI start up with a minimal set of cached sources (i.e., frequently used ones can be cached=true), while the other, less-used CDMs can be cached later without holding up WebAPI startup.
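As discussed, the first implementation lets non-cached sources fall back to the read-through path, which still stores the fetched results. The stricter behavior asked for here (return the data without caching it) could look like the sketch below. `SourceAwareResultReader` and its `sourceKey/report` cache-key scheme are hypothetical, and a `Function` stands in for the query against the results schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: cache-enabled sources use read-through caching;
// other sources are queried directly and nothing is stored.
class SourceAwareResultReader {
    private final Map<String, String> cache = new HashMap<>(); // key: sourceKey + "/" + report
    private final Function<String, String> queryResultsSchema;

    SourceAwareResultReader(Function<String, String> queryResultsSchema) {
        this.queryResultsSchema = queryResultsSchema;
    }

    String getResult(String sourceKey, boolean cacheEnabled, String report) {
        String key = sourceKey + "/" + report;
        if (!cacheEnabled) {
            return queryResultsSchema.apply(key); // bypass: do not populate the cache
        }
        return cache.computeIfAbsent(key, queryResultsSchema); // read-through for cached sources
    }

    int cachedEntries() {
        return cache.size();
    }

    public static void main(String[] args) {
        SourceAwareResultReader reader = new SourceAwareResultReader(key -> "rows for " + key);
        reader.getResult("RARELY_USED", false, "person"); // queried directly, not cached
        reader.getResult("BIG_CLAIMS", true, "person");   // cached on first read
        System.out.println(reader.cachedEntries());        // 1
    }
}
```

Keeping the bypass in the read path means sources can be switched to caching later without changing how reports are requested.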

@ssuvorov-fls
Contributor

@chrisknoll
That's correct.

anthonysena modified the milestones: V2.13, v2.12.1 Dec 13, 2022
anthonysena linked a pull request Dec 13, 2022 that will close this issue
chrisknoll modified the milestones: v2.12.1, v2.13 Jan 23, 2023
chrisknoll added a commit that referenced this issue Feb 7, 2023
fixes #2034

Set cdm.cache.cron.warming.enable = false by default
Made achilles_result_concept_count a required table.
Modified ddl population for record count table
Records are cached from achilles_result_concept_count instead of achilles_results.

Co-authored-by: Chris Knoll <cknoll@ohdsi.org>

4 participants