Source summary slow queries #1008
We also need those on concepts/mappings_collections, which may soon become a much larger table than concepts/mappings_sources. It would be extremely expensive to keep that metadata in sync on concepts/mappings_collections, and counts are inherently expensive on Postgres. Taking a step back, I feel that what we actually need is to save these active counts instead. Similarly, last_child_updated for sources/collections should be stored. These counts will only have to be updated for HEAD sources and collections. We should invalidate these values when concepts/mappings are modified, but only recompute and set them on request. I would save them in the DB for transaction safety and simplicity.
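A minimal sketch of the "store the count, invalidate on write, recompute on read" idea, using Python's built-in sqlite3 as a stand-in for Django and Postgres. The schema, the single active_concepts column, and the helpers invalidate_active_concepts and get_active_concepts are hypothetical names for illustration, not the actual OCL code; a NULL count is assumed to mean "invalidated".

```python
import sqlite3

# Hypothetical minimal schema; the real OCL tables and Django models differ.
SCHEMA = """
CREATE TABLE sources (
    id INTEGER PRIMARY KEY,
    active_concepts INTEGER  -- NULL means "invalidated, recompute on next read"
);
CREATE TABLE concepts (
    id INTEGER PRIMARY KEY,
    is_active INTEGER NOT NULL,
    retired INTEGER NOT NULL,
    is_latest_version INTEGER NOT NULL
);
CREATE TABLE concepts_sources (concept_id INTEGER, source_id INTEGER);
"""

# The same filter as the slow query in the logs.
ACTIVE_COUNT_SQL = """
SELECT COUNT(*) FROM concepts
JOIN concepts_sources ON concepts.id = concepts_sources.concept_id
WHERE concepts_sources.source_id = ?
  AND concepts.is_active AND NOT concepts.retired AND concepts.is_latest_version
"""


def invalidate_active_concepts(conn, source_id):
    """Called when a concept in the source changes: a cheap UPDATE, no COUNT."""
    conn.execute("UPDATE sources SET active_concepts = NULL WHERE id = ?", (source_id,))


def get_active_concepts(conn, source_id):
    """Return the stored count, running the expensive COUNT only if invalidated."""
    (cached,) = conn.execute(
        "SELECT active_concepts FROM sources WHERE id = ?", (source_id,)
    ).fetchone()
    if cached is not None:
        return cached
    (count,) = conn.execute(ACTIVE_COUNT_SQL, (source_id,)).fetchone()
    # Persist the fresh value and commit it in one transaction.
    with conn:
        conn.execute(
            "UPDATE sources SET active_concepts = ? WHERE id = ?", (count, source_id)
        )
    return count
```

Writes pay only for a one-row UPDATE; the expensive COUNT runs at most once per invalidation, on the first read. Under the same assumptions, active_mappings would get an identical column and pair of helpers, and only HEAD sources and collections would need them.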
Count queries are now leading to timeouts on staging. The more content we load into staging, the worse this issue gets. @snyaggarwal, is there any chance you could shuffle things on your plate to address it sooner rather than later?
…urce/Collection
@rkorytkowski I have added active_concepts and active_mappings on Source/Collection/Version. This should be fixed now.
Thanks for addressing it so quickly. I feel we need a few adjustments here. I'm not sure I caught all the cases from your code, so could you please describe the final approach? A few lines of documentation on the active_concepts and active_mappings fields would also be great, e.g. pointing to how and when they are populated.
When doing bulk imports, how do we update the fields? Is it per resource or in bulk?
For now, it updates after every resource import. I am looking at optimizing this. |
@snyaggarwal I would suggest having separate fields to mark counts as outdated (a super fast update) instead of updating them right away. At the end of the bulk import we could run a simple query to find which counts need to be updated and update them. This could even be done synchronously within the bulk import, since the import itself is already processed asynchronously.
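A minimal sketch of the suggested "outdated flag" variant, again with Python's sqlite3 standing in for Postgres. The active_concepts_outdated column and the helpers import_concept and refresh_outdated_counts are hypothetical names chosen for illustration; the real import pipeline is not shown in this thread.

```python
import sqlite3

# Hypothetical minimal schema; names are illustrative, not OCL's actual models.
SCHEMA = """
CREATE TABLE sources (
    id INTEGER PRIMARY KEY,
    active_concepts INTEGER NOT NULL DEFAULT 0,
    active_concepts_outdated INTEGER NOT NULL DEFAULT 0  -- cheap "dirty" flag
);
CREATE TABLE concepts (
    id INTEGER PRIMARY KEY,
    is_active INTEGER NOT NULL,
    retired INTEGER NOT NULL,
    is_latest_version INTEGER NOT NULL
);
CREATE TABLE concepts_sources (concept_id INTEGER, source_id INTEGER);
"""

ACTIVE_COUNT_SQL = """
SELECT COUNT(*) FROM concepts
JOIN concepts_sources ON concepts.id = concepts_sources.concept_id
WHERE concepts_sources.source_id = ?
  AND concepts.is_active AND NOT concepts.retired AND concepts.is_latest_version
"""


def import_concept(conn, concept_id, source_id, retired=False):
    """Per-resource import step: write the concept, then only mark the count stale."""
    conn.execute("INSERT INTO concepts VALUES (?, 1, ?, 1)", (concept_id, int(retired)))
    conn.execute("INSERT INTO concepts_sources VALUES (?, ?)", (concept_id, source_id))
    conn.execute(
        "UPDATE sources SET active_concepts_outdated = 1 WHERE id = ?", (source_id,)
    )


def refresh_outdated_counts(conn):
    """Run once after the bulk import: recount only the flagged sources."""
    with conn:  # commits the recounts (and any pending import writes) together
        outdated = [
            row[0]
            for row in conn.execute("SELECT id FROM sources WHERE active_concepts_outdated")
        ]
        for source_id in outdated:
            (count,) = conn.execute(ACTIVE_COUNT_SQL, (source_id,)).fetchone()
            conn.execute(
                "UPDATE sources SET active_concepts = ?, active_concepts_outdated = 0 "
                "WHERE id = ?",
                (count, source_id),
            )
    return outdated
```

Each imported resource costs one flag update; the COUNT runs once per touched source at the end, however many resources the import contained.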
… are bulk updated after content import
…ating parent counts
…ill only update concepts or mappings count when required
We might still be missing something. When I go to https://app.staging.openconceptlab.org/#/orgs/CIEL/sources/CIEL/?q=&isTable=true&isList=false&isSplit=false&page=1&exactMatch=off I still see in the logs:
It's as if the count was not taken from the cache.
Yeah, the listing API returns 'num_found' in the response headers. It can't take the count from the cached fields because the listing query can be of any type and combination (search/facets/scoped/global). It runs a count on the queryset returned for the query being asked.
The cached count is used in getting
This is deployed on all environments. Closing this out.
2021-08-26 13:00:58 UTC:10.1.11.186(47126):postgres@postgres:[20203]:LOG: duration: 153187.007 ms statement: SELECT COUNT(*) AS "__count" FROM "mappings" INNER JOIN "mappings_sources" ON ("mappings"."id" = "mappings_sources"."mapping_id") WHERE ("mappings_sources"."source_id" = 746 AND "mappings"."is_active" AND NOT "mappings"."retired" AND "mappings"."is_latest_version")
and
2021-08-26 13:02:20 UTC:10.1.11.186(41156):postgres@postgres:[13448]:LOG: duration: 145315.597 ms statement: SELECT COUNT(*) AS "__count" FROM "concepts" INNER JOIN "concepts_sources" ON ("concepts"."id" = "concepts_sources"."concept_id") WHERE ("concepts_sources"."source_id" = 746 AND "concepts"."is_active" AND NOT "concepts"."retired" AND "concepts"."is_latest_version")
Possible improvements:
Let's consider moving or duplicating is_active, retired and is_latest_version to mappings_sources and concepts_sources to avoid the join.
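The denormalization proposed above can be sketched as follows, assuming Python's sqlite3 in place of Postgres. The three flags are copied into the join table, a trigger keeps the copies in sync, and the count no longer needs the join. The trigger, the partial index, and all helper names are assumptions for illustration; as noted earlier in the thread, keeping such copies in sync on the much larger *_collections tables is itself costly.

```python
import sqlite3

SCHEMA = """
CREATE TABLE concepts (
    id INTEGER PRIMARY KEY,
    is_active INTEGER NOT NULL,
    retired INTEGER NOT NULL,
    is_latest_version INTEGER NOT NULL
);
-- Hypothetical join table with the three filter flags copied from concepts.
CREATE TABLE concepts_sources (
    concept_id INTEGER NOT NULL REFERENCES concepts(id),
    source_id INTEGER NOT NULL,
    is_active INTEGER NOT NULL,
    retired INTEGER NOT NULL,
    is_latest_version INTEGER NOT NULL
);
-- Partial index limited to the rows the summary count filters on.
CREATE INDEX concepts_sources_active_idx ON concepts_sources (source_id)
    WHERE is_active AND NOT retired AND is_latest_version;
-- Keep the copied flags in sync whenever a concept's flags change.
CREATE TRIGGER concepts_flags_sync
AFTER UPDATE OF is_active, retired, is_latest_version ON concepts
BEGIN
    UPDATE concepts_sources
    SET is_active = NEW.is_active,
        retired = NEW.retired,
        is_latest_version = NEW.is_latest_version
    WHERE concept_id = NEW.id;
END;
"""

# Original shape of the query: filter flags live on concepts, so a join is needed.
JOIN_COUNT_SQL = """
SELECT COUNT(*) FROM concepts
JOIN concepts_sources ON concepts.id = concepts_sources.concept_id
WHERE concepts_sources.source_id = ?
  AND concepts.is_active AND NOT concepts.retired AND concepts.is_latest_version
"""

# Same filter answered from the join table alone.
DENORMALIZED_COUNT_SQL = """
SELECT COUNT(*) FROM concepts_sources
WHERE source_id = ? AND is_active AND NOT retired AND is_latest_version
"""


def add_concept_to_source(conn, concept_id, source_id):
    """Link a concept to a source, copying its current flags into the join row."""
    conn.execute(
        "INSERT INTO concepts_sources "
        "SELECT id, ?, is_active, retired, is_latest_version FROM concepts WHERE id = ?",
        (source_id, concept_id),
    )


def count_active(conn, source_id, denormalized=True):
    """Count active, non-retired, latest-version concepts in a source."""
    sql = DENORMALIZED_COUNT_SQL if denormalized else JOIN_COUNT_SQL
    (count,) = conn.execute(sql, (source_id,)).fetchone()
    return count
```

Both queries return the same count, but the second reads only concepts_sources. Whether the partial index is actually used depends on the planner; PostgreSQL supports the same `CREATE INDEX ... WHERE` syntax.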