Source summary slow queries #1008

snyaggarwal · 2021-09-29T11:12:13Z

2021-08-26 13:00:58 UTC:10.1.11.186(47126):postgres@postgres:[20203]:LOG: duration: 153187.007 ms statement: SELECT COUNT(*) AS "__count" FROM "mappings" INNER JOIN "mappings_sources" ON ("mappings"."id" = "mappings_sources"."mapping_id") WHERE ("mappings_sources"."source_id" = 746 AND "mappings"."is_active" AND NOT "mappings"."retired" AND "mappings"."is_latest_version")
and
2021-08-26 13:02:20 UTC:10.1.11.186(41156):postgres@postgres:[13448]:LOG: duration: 145315.597 ms statement: SELECT COUNT(*) AS "__count" FROM "concepts" INNER JOIN "concepts_sources" ON ("concepts"."id" = "concepts_sources"."concept_id") WHERE ("concepts_sources"."source_id" = 746 AND "concepts"."is_active" AND NOT "concepts"."retired" AND "concepts"."is_latest_version")

Possible improvements:
Let's consider moving or duplicating is_active, retired and is_latest_version to mappings_sources and concepts_sources to avoid the join.
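A minimal sketch of the idea, using an in-memory SQLite database rather than Postgres. The table and column names on concepts/concepts_sources come from the logged query; the duplicated flag columns on concepts_sources are the hypothetical part. It only shows that both queries return the same count, not how much faster the second one would be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concepts (
    id INTEGER PRIMARY KEY,
    is_active INTEGER, retired INTEGER, is_latest_version INTEGER
);
CREATE TABLE concepts_sources (
    concept_id INTEGER, source_id INTEGER,
    -- duplicated flags: hypothetical columns, not in the current schema
    is_active INTEGER, retired INTEGER, is_latest_version INTEGER
);
""")

# (id, is_active, retired, is_latest_version)
rows = [(1, 1, 0, 1), (2, 1, 1, 1), (3, 1, 0, 0), (4, 1, 0, 1)]
conn.executemany("INSERT INTO concepts VALUES (?, ?, ?, ?)", rows)
# Copy the same flags onto the join table for source 746.
conn.executemany("INSERT INTO concepts_sources VALUES (?, 746, ?, ?, ?)", rows)

# Current shape of the query: the flags live on concepts, so a join is needed.
with_join = conn.execute("""
    SELECT COUNT(*) FROM concepts c
    INNER JOIN concepts_sources cs ON c.id = cs.concept_id
    WHERE cs.source_id = 746
      AND c.is_active AND NOT c.retired AND c.is_latest_version
""").fetchone()[0]

# Proposed shape: the flags are on the join table, so no join is needed.
without_join = conn.execute("""
    SELECT COUNT(*) FROM concepts_sources
    WHERE source_id = 746
      AND is_active AND NOT retired AND is_latest_version
""").fetchone()[0]
```

The cost of this approach is that every change to a flag on a concept or mapping has to be written to the join table as well.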

rkorytkowski · 2021-10-14T11:48:33Z

We also need those on concepts/mappings_collections, which may soon become a much larger table than concepts/mappings_sources. It will be extremely expensive to keep that metadata in sync on concepts/mappings_collections, and counts are inherently expensive on Postgres.

Taking a step back, I feel that what we actually need is to save these active counts instead. Similarly, last_child_updated for sources/collections should be stored. These counts only have to be updated for HEAD sources and collections. We should invalidate these values when concepts/mappings are modified, but only set them when they are requested. I would save them in the DB for transaction safety and simplicity.
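Roughly what I have in mind, as a plain-Python sketch rather than actual oclapi2 code. The `Source` and `Concept` classes, the method names and the `count_queries` counter are all hypothetical; `None` stands for an invalidated count and `_count_active` stands in for the slow COUNT query.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    is_active: bool = True
    retired: bool = False
    is_latest_version: bool = True


@dataclass
class Source:
    """Hypothetical stand-in for a HEAD source; not the oclapi2 model."""
    concepts: list = field(default_factory=list)
    active_concepts: Optional[int] = None  # None means "invalidated"
    count_queries: int = 0  # how many times the expensive count ran

    def _count_active(self) -> int:
        # Stands in for the slow SELECT COUNT(*) ... INNER JOIN query.
        self.count_queries += 1
        return sum(
            1 for c in self.concepts
            if c.is_active and not c.retired and c.is_latest_version
        )

    def add_concept(self, concept: Concept) -> None:
        self.concepts.append(concept)
        self.active_concepts = None  # invalidate only; no counting here

    def retire(self, concept: Concept) -> None:
        concept.retired = True
        self.active_concepts = None  # invalidate only; no counting here

    def get_active_concepts(self) -> int:
        # The stored value is set only when somebody asks for it.
        if self.active_concepts is None:
            self.active_concepts = self._count_active()
        return self.active_concepts
```

With this shape, any number of writes between two reads costs a single recount, and repeated reads cost none.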

rkorytkowski · 2021-11-05T15:56:36Z

Count queries now lead to timeouts on staging. The more content we load into staging, the worse this issue gets. @snyaggarwal is there any chance you could shuffle things on your plate to address it sooner rather than later?


snyaggarwal · 2021-11-08T06:05:30Z

@rkorytkowski I have added active_concepts and active_mappings on Source/Collection/Version. This should be fixed now.


rkorytkowski · 2021-11-08T14:13:13Z

Thanks for addressing it so quickly. I feel we need a few adjustments here. I'm not sure I caught all the cases from your code, so could you please describe the final approach? Also, a few lines of documentation on the active_concepts and active_mappings fields would be great, i.e. pointing to how/when they are populated.

rkorytkowski · 2021-11-08T14:14:40Z

When doing bulk imports how do we update the fields? Is it every resource or in bulk?

snyaggarwal · 2021-11-09T11:29:42Z

> When doing bulk imports how do we update the fields? Is it every resource or in bulk?

For now, it updates after every resource import. I am looking at optimizing this.
The issue with a single update of counts at the end of the import is figuring out which sources/collections to update. Right now we don't have restrictions on content belonging to multiple parents, so a single import file may have concepts/mappings from multiple parents. It gets even more complicated with parallel imports.

rkorytkowski · 2021-11-09T12:43:23Z

@snyaggarwal I would suggest having separate fields to mark counts as outdated (a super fast update) instead of updating them right away. At the end of the bulk import we could run a simple query to check which counts need to be updated and update them. It could even be done synchronously within the bulk import, as the import is already processed asynchronously.
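A rough sketch of that flow in plain Python, not tied to the actual models. `CountStore`, its attributes and the parent ids are all hypothetical; the `outdated` set plays the role of the "outdated" fields, and `refresh_outdated` is the single pass at the end of the import.

```python
from collections import defaultdict


class CountStore:
    """Hypothetical in-memory stand-in for the sources/collections tables."""

    def __init__(self):
        self.children = defaultdict(list)  # parent id -> list of active flags
        self.active_counts = {}            # parent id -> stored count
        self.outdated = set()              # parent ids whose count is stale
        self.recounts = 0                  # how many recounts were executed

    def import_resource(self, parent_id, is_active=True):
        self.children[parent_id].append(is_active)
        self.outdated.add(parent_id)  # cheap mark; no counting per resource

    def refresh_outdated(self):
        """Run once at the end of a bulk import."""
        refreshed = sorted(self.outdated)
        for parent_id in refreshed:
            # Stands in for one COUNT query per outdated parent.
            self.active_counts[parent_id] = sum(self.children[parent_id])
            self.recounts += 1
        self.outdated.clear()
        return refreshed
```

Because the set of outdated parents is discovered from the marks themselves, the import does not need to know up front which parents a file touches, which also covers files with content from multiple parents.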


rkorytkowski · 2021-11-12T14:57:25Z

We might still be missing something. When I go to https://app.staging.openconceptlab.org/#/orgs/CIEL/sources/CIEL/?q=&isTable=true&isList=false&isSplit=false&page=1&exactMatch=off

I still see in logs:

2021-11-12 14:52:56 UTC:10.1.11.167(42564):postgres@postgres:[15347]:LOG: duration: 1255.279 ms statement: SELECT COUNT(*) AS "__count" FROM "concepts" INNER JOIN "concepts_sources" ON ("concepts"."id" = "concepts_sources"."concept_id") INNER JOIN "sources" ON ("concepts_sources"."source_id" = "sources"."id") INNER JOIN "organizations" ON ("sources"."organization_id" = "organizations"."id") WHERE ("concepts"."is_active" AND "sources"."mnemonic" = 'CIEL' AND "sources"."version" = 'HEAD' AND "organizations"."mnemonic" = 'CIEL' AND NOT "concepts"."retired" AND "concepts"."is_latest_version" AND NOT ("concepts"."public_access" = 'None'))
2021-11-12 14:52:56 UTC:10.1.11.167(42564):postgres@postgres:[15347]:LOG: duration: 757.761 ms statement: SELECT COUNT(*) AS "__count" FROM "concepts" INNER JOIN "concepts_sources" ON ("concepts"."id" = "concepts_sources"."concept_id") INNER JOIN "sources" ON ("concepts_sources"."source_id" = "sources"."id") INNER JOIN "organizations" ON ("sources"."organization_id" = "organizations"."id") WHERE ("concepts"."is_active" AND "sources"."mnemonic" = 'CIEL' AND "sources"."version" = 'HEAD' AND "organizations"."mnemonic" = 'CIEL' AND NOT "concepts"."retired" AND "concepts"."is_latest_version" AND NOT ("concepts"."public_access" = 'None'))

It's as if the count was not taken from the cache.

snyaggarwal · 2021-11-15T02:24:48Z

Yes, the listing API returns 'num_found' in the response headers. It can't take the count from the cached fields because the listing query may be of any type and combination (search/facets/scoped/global). It runs a count on the queryset returned for the query that was asked.

snyaggarwal · 2021-11-15T02:25:30Z

The cached count is used when getting the /summary/ of a source/collection/version.
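To illustrate the difference, here is a simplified sketch, not the actual view code. `SourceVersion`, `summary` and `listing` are hypothetical names; only `num_found`, `active_concepts` and the flag names come from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class SourceVersion:
    """Hypothetical stand-in for a source/collection/version."""
    concepts: list = field(default_factory=list)  # list of dicts
    active_concepts: int = 0  # stored count, maintained elsewhere


def summary(source):
    # Summary: no count query, just read the stored field.
    return {"active_concepts": source.active_concepts}


def listing(source, predicate):
    # Listing: the filter is arbitrary, so the count has to come from
    # the filtered result and cannot be read from the stored field.
    results = [c for c in source.concepts if predicate(c)]
    headers = {"num_found": len(results)}
    return results, headers
```

The stored count only answers one fixed question (active, not retired, latest version), while a listing request can ask any question.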

snyaggarwal · 2021-12-09T10:07:07Z

This is deployed on all environments. Closing this out.

snyaggarwal added the api2 OCL API v2 label Sep 29, 2021

snyaggarwal self-assigned this Sep 29, 2021

snyaggarwal mentioned this issue Sep 29, 2021

Fix slow running DB queries #949

Closed

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 8, 2021

OpenConceptLab/ocl_issues#1008 | saving concepts/mappings count on So…

a572730

…urce/Collection

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 8, 2021

OpenConceptLab/ocl_issues#1008 | fixing pylints | unused arguments

5cb63fe

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 8, 2021

OpenConceptLab/ocl_issues#1008 | not eager loading concepts/mappings …

d0397e5

…owners

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 8, 2021

OpenConceptLab/ocl_issues#1008 | hierarchy asyn processing on concurr…

e27a347

…ent queue

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 9, 2021

OpenConceptLab/ocl_issues#1008 | concept/mapping counts async tasks

8f6fea4

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 10, 2021

OpenConceptLab/ocl_issues#1008 | concept/mapping parent active counts…

d30d4bc

… are bulk updated after content import

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 10, 2021

OpenConceptLab/ocl_issues#1008 | concept/mapping on retire/delete upd…

0177726

…ating parent counts

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 10, 2021

OpenConceptLab/ocl_issues#1008 | Source/Collection/Version retrieve w…

a7a8a19

…ill only update concepts or mappings count when required

snyaggarwal added a commit to OpenConceptLab/oclapi2 that referenced this issue Nov 10, 2021

OpenConceptLab/ocl_issues#1008 | Collection ref add/delete will updat…

7058083

…e child count

snyaggarwal closed this as completed Dec 9, 2021
