Adding flag to disable Achilles cache - fixes #2034 #2172
Conversation
Another enhancement: in CDMCacheService.java, the function cacheRecords() contains a loop of Java code. This can be replaced with a query that produces the min/max concepts directly in SQL.
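To illustrate the suggestion, here is a hedged sketch (the record type, method, and values below are hypothetical, not the actual WebAPI code) of collapsing a per-row Java loop into a single aggregate pass; in the PR itself the equivalent would be one SQL query such as `SELECT MIN(concept_id), MAX(concept_id) FROM achilles_result_concept_count`:

```java
import java.util.List;
import java.util.LongSummaryStatistics;

public class MinMaxAggregate {
    // Hypothetical record-count row (conceptId, count); illustrative names only.
    record ConceptCount(long conceptId, long count) {}

    // Instead of iterating every row in Java to track min/max concept ids,
    // compute both in one aggregate pass (or push it down into SQL entirely).
    static long[] minMaxConceptId(List<ConceptCount> rows) {
        LongSummaryStatistics stats = rows.stream()
                .mapToLong(ConceptCount::conceptId)
                .summaryStatistics();
        return new long[] { stats.getMin(), stats.getMax() };
    }

    public static void main(String[] args) {
        List<ConceptCount> rows = List.of(
                new ConceptCount(1112807L, 120L),
                new ConceptCount(19019073L, 45L),
                new ConceptCount(201826L, 300L));
        long[] mm = minMaxConceptId(rows);
        System.out.println(mm[0] + " " + mm[1]); // prints "201826 19019073"
    }
}
```

Pushing the aggregation into SQL avoids shipping every row across the wire just to reduce it to two numbers.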
Thanks @chrisknoll - I've opened a separate issue for this change so that the focus of this PR stays on disabling the Achilles caching capability. I agree with what you have proposed above and will work on changing that functionality.
Made achilles_result_concept_count a required table. Modified the DDL population for the record count table. Records are now cached from achilles_result_concept_count instead of achilles_results.
The latest commit makes some changes: I split some of the logic into separate functions, since some of the naming was confusing, i.e. there were multiple cacheRecords function signatures, but one function was updating the WebAPI cache while the other was fetching from the CDM source, so I renamed them to be clearer about what they do. The main change is the requirement of achilles_result_concept_count, so most likely we want to put this into the 2.13 release and not the 2.12.1 hotfix.
Still working through validating the behavior in the caching layer. This is a report I'm seeing through the pgAdmin dashboard describing tuples in and tuples out: in this graph, we see an expected insert of 2,000 rows (tuple = row in PG), but we're also seeing a massive 800,000 tuples out, and it is not clear if this is some side effect of Hibernate or normal PG behavior; either way it is suspicious, and updates are going very slowly (I estimate between 20-30 inserts per second), which will make the 11,000,000 inserts needed to copy the record count records into the cache extremely slow. I'm trying to investigate whether it is something to do with our own PG configuration: before this type of caching, WebAPI would only insert data when an asset (i.e. a cohort definition or concept set) was saved, or record a job any time an analytical task was executed. That amounts to a workload of a dozen or so updates/inserts per minute, while the caching change shifts the load to thousands of updates per second if we ever hope to complete 11 million updates during WebAPI startup. My current thinking is that Hibernate was the wrong solution to implement a caching layer on: there are transaction semantics and other unknown variables (to me) that IMO add overhead and other complications that a pure caching mechanism should not be concerned with.
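As a rough illustration of the throughput concern above (not the actual WebAPI code), flushing the cache inserts in fixed-size chunks, e.g. via JDBC `addBatch`/`executeBatch` instead of one entity save per row, is a common way to move from tens to thousands of inserts per second. The helper below only sketches the chunking; the names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchChunker {
    // Walk a large row list and hand it to the flush callback in fixed-size
    // batches; in a real JDBC setting the callback would call addBatch() per
    // row and executeBatch() once per chunk. Returns the number of batches.
    static <T> int flushInBatches(List<T> rows, int batchSize, Consumer<List<T>> flush) {
        int batches = 0;
        for (int i = 0; i < rows.size(); i += batchSize) {
            flush.accept(rows.subList(i, Math.min(i + batchSize, rows.size())));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) rows.add(i);
        // 2500 rows in chunks of 1000 -> 3 batches (1000, 1000, 500)
        int batches = flushInBatches(rows, 1000, b -> { /* executeBatch() here */ });
        System.out.println(batches); // prints "3"
    }
}
```

Batching also keeps each transaction bounded, which sidesteps some of the per-row transaction overhead attributed to Hibernate above.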
Limit the cache-warming batch to only concepts that exist in the vocabulary.
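A minimal sketch of that filter (illustrative names only, assuming the vocabulary's concept ids are available as a set):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WarmFilter {
    // Keep only the concept ids that exist in the vocabulary, so the warming
    // batch never inserts cache rows for unknown concepts.
    static List<Long> filterToVocabulary(List<Long> batch, Set<Long> vocabularyIds) {
        List<Long> kept = new ArrayList<>();
        for (Long id : batch) {
            if (vocabularyIds.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<Long> vocab = Set.of(1L, 2L, 3L);
        System.out.println(filterToVocabulary(List.of(1L, 4L, 3L), vocab)); // prints "[1, 3]"
    }
}
```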
Making a note here for the release notes: the
…disable-achilles-cache
@anton-abushkevich, this PR is ready, but we wanted to have someone from outside our organization approve it. Can you review?
Adds the ability to turn off caching of Achilles results for all CDMs.