Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The experiment to redesign the critical case for web map #277

Closed
dadiorchen opened this issue Jan 30, 2023 · 9 comments
Closed

The experiment to redesign the critical case for web map #277

dadiorchen opened this issue Jan 30, 2023 · 9 comments
Assignees
Labels

Comments

@dadiorchen
Copy link
Contributor

dadiorchen commented Jan 30, 2023

Currently, when the app trying to render the map data, it request the db and join the data crossing tables, it is slow, we need to use a special way to give a good way to query and respond quickly, a rough idea is to denormalize the data and avoid crossing join.

The goal:

  • A solution, including the SQL and test report about the improvement.
  • And the db table/view desgin

Some consideration:

  • As shown below, these are two examples that performs poorly when rendering, a problem is the organization table is a tree structure, how do we speed the way to fetch all trees filtered by org (tree)
  • A previous volunteer proposed a JSON-style of way to go (as shown below), does it make sense, and how's the PostgreSQL json querying?
  • Other fields/data need to display/filter-by, basically, we now need to use JOIN and WHERE to do the trick, it is in general slow:
    • Filter by planter_id
    • Filter by wallet_id
    • Filter by species_id
    • Display species info
    • Display wallet info
    • Display token info
    • Display planter info
    • Display org info
  • The whole design concerns the way to update/generate the data incrementaly, it might lead to the usage of message queue, but this exceeds the scale of this issue.

A table structure proposed by @arunbakt :

                      Table "map_features.capture_feature"
       Column        |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
 id                  | uuid                     |           | not null |
 lat                 | numeric                  |           | not null |
 lon                 | numeric                  |           | not null |
 location            | geometry(Point,4326)     |           | not null |
 field_user_id       | bigint                   |           | not null |
 field_username      | character varying        |           | not null |
 device_identifier   | character varying        |           |          |
 attributes          | jsonb                    |           |          |
 tracking_session_id | uuid                     |           |          |
 map_name            | jsonb                    |           |          |
 token_id            | uuid                     |           |          |
 wallet_name         | character varying        |           |          |
 created_at          | timestamp with time zone |           | not null |
 updated_at          | timestamp with time zone |           | not null |
 species_name        | character varying        |           |          |
 capture_taken_at    | timestamp with time zone |           |          |
Indexes:
    "capture_feature_pkey" PRIMARY KEY, btree (id)
    "capturef_attrbs_idx" gin (attributes)
    "capturef_crdate_idx" btree (created_at)
    "capturef_fieldusr_idx" btree (field_user_id)
    "capturef_fieldusrname_id" btree (field_username)
    "capturef_map_name_idx" gin (map_name jsonb_path_ops)
    "capturef_spcsname_idx" btree (species_name)
    "capturef_token_idx" btree (token_id)
    "capturef_trckgsess_idx" btree (tracking_session_id)
    "capturef_wallet_idx" btree (wallet_name)

For example: the tree species, could be put into table: map_features.capture_feature:

attributes: {
species_id: xxx,
species_name: xxx,
}

So we can query the tree with species:
select * from map_features.capture_feature where attributes->species_id = xxxx

@dadiorchen
Copy link
Contributor Author

dadiorchen commented Jan 30, 2023

The two SQL case with poor performance

        /* sql case1 */
  
        WITH RECURSIVE organization_children AS (
           SELECT entity.id, entity_relationship.parent_id, 1 as depth, entity_relationship.type, entity_relationship.role
           FROM entity
           LEFT JOIN entity_relationship ON entity_relationship.child_id = entity.id 
           WHERE entity.id IN (SELECT id FROM entity WHERE map_name = 'TheHaitiTreeProject')
          UNION
           SELECT next_child.id, entity_relationship.parent_id, depth + 1, entity_relationship.type, entity_relationship.role
           FROM entity next_child
           JOIN entity_relationship ON entity_relationship.child_id = next_child.id 
           JOIN organization_children c ON entity_relationship.parent_id = c.id
          )
            ,org_tree_id AS (
              SELECT id FROM (
              SELECT trees.id as id from trees
                WHERE 
                  planter_id IN (
                    SELECT id FROM planter
                    JOIN (
                      SELECT id AS entity_id FROM organization_children LIMIT 20
                    ) org ON planter.organization_id = org.entity_id
                  )
              UNION
                select id from trees where planting_organization_id = (
                select id from entity where map_name = 'TheHaitiTreeProject'
                )
              ) ids

            )
     	
  SELECT 
    'cluster' AS type,
    'case1 with zoom target tile' AS log,
    cluster_1.id,
    cluster_1.estimated_geometric_location,
    cluster_1.latlon,
    cluster_1.region_type,
    cluster_1.count,
    cluster_1.count_text,
    zoom_target.centroid zoom_to
  FROM (
    
      /* sql case1 tile */
      
      SELECT 
      'cluster' AS type,
      'case1 tile' AS log,
      NULL AS zoom_to,
      region_id id, 
      st_point(LEAST(st_x(centroid), 170), st_y(centroid)) estimated_geometric_location,St_asgeojson(st_point(LEAST(st_x(centroid), 170), st_y(centroid))) latlon,
      type_id as region_type,
      count(tree_region.id) count,
      CASE WHEN count(tree_region.id) > 1000 
      THEN  (count(tree_region.id) / 1000) || 'K'
      ELSE count(tree_region.id) || ''
      END AS count_text
      FROM active_tree_region tree_region
      INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

      WHERE zoom_level = 2
      
      
      GROUP BY region_id, centroid, type_id
  ) cluster_1
  LEFT JOIN 
  (SELECT
    DISTINCT ON
    (region.id) region.id region_id,
    contained.region_id most_populated_subregion_id,
    contained.total,
    contained.zoom_level,
fetch
    ST_ASGeoJson(contained.centroid) centroid
  FROM
    (
    SELECT
      region_id,
      zoom_level
    FROM active_tree_region tree_region
    INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

    WHERE
      zoom_level = 2
      
      
    GROUP BY
      region_id,
      zoom_level ) populated_region
  JOIN region ON
    region.id = populated_region.region_id
  JOIN (
    SELECT
      region_id,
      zoom_level,
      count(tree_region.id) AS total,
      centroid
    FROM active_tree_region tree_region
    INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

    WHERE
      zoom_level = 4
      
      
    GROUP BY
      region_id,
      zoom_level,
      centroid ) contained ON
    ST_CONTAINS(region.geom,
    contained.centroid)
  WHERE
    TRUE
  ORDER BY
    region.id,
    total DESC
  ) zoom_target
  ON cluster_1.id = zoom_target.region_id

        WITH RECURSIVE organization_children AS (
           SELECT entity.id, entity_relationship.parent_id, 1 as depth, entity_relationship.type, entity_relationship.role
           FROM entity
           LEFT JOIN entity_relationship ON entity_relationship.child_id = entity.id 
           WHERE entity.id IN (SELECT id FROM entity WHERE map_name = 'bosque-la-tigra')
          UNION
           SELECT next_child.id, entity_relationship.parent_id, depth + 1, entity_relationship.type, entity_relationship.role
           FROM entity next_child
           JOIN entity_relationship ON entity_relationship.child_id = next_child.id 
           JOIN organization_children c ON entity_relationship.parent_id = c.id
          )
            ,org_tree_id AS (
              SELECT id FROM (
                SELECT trees.id as id from trees
                  WHERE 
                    planter_id IN (
                      SELECT id FROM planter
                      JOIN (
                        SELECT id AS entity_id FROM organization_children LIMIT 20
                      ) org ON planter.organization_id = org.entity_id
                    )
                UNION
                  select id from trees where planting_organization_id = (
                  select id from entity where map_name = 'bosque-la-tigra'
                  )
                ) ids
            )
     	
        SELECT 'cluster'                                           AS type,
        0 AS id,
        'case3 tile' AS log,
        NULL AS zoom_to,
        St_asgeojson(St_centroid(clustered_locations))                 latlon,
        St_centroid(clustered_locations) estimated_geometric_location,
        St_numgeometries(clustered_locations)                          count,
        CASE WHEN St_numgeometries(clustered_locations) > 1000 
        THEN  (St_numgeometries(clustered_locations) / 1000) || 'K'
        ELSE St_numgeometries(clustered_locations) || ''
        END AS count_text
        FROM   (
        SELECT Unnest(St_clusterwithin(estimated_geometric_location, 0.02)) clustered_locations
        FROM   trees 
        INNER JOIN org_tree_id ON org_tree_id.id = trees.id
        WHERE  active = true 
         
         
          
        ) clusters

@dadiorchen
Copy link
Contributor Author

@Mohmn I observed a slow case, it takes about 1 minute to finish

      /* sql case1 */

        WITH RECURSIVE organization_children AS (
           SELECT entity.id, entity_relationship.parent_id, 1 as depth, entity_relationship.type, entity_relationship.role
           FROM entity
           LEFT JOIN entity_relationship ON entity_relationship.child_id = entity.id
           WHERE entity.id IN (SELECT id FROM entity WHERE map_name = 'ashas')
          UNION
           SELECT next_child.id, entity_relationship.parent_id, depth + 1, entity_relationship.type, entity_relationship.role
           FROM entity next_child
           JOIN entity_relationship ON entity_relationship.child_id = next_child.id
           JOIN organization_children c ON entity_relationship.parent_id = c.id
          )
            ,org_tree_id AS (
              SELECT id FROM (
              SELECT trees.id as id from trees
                WHERE
                  planter_id IN (
                    SELECT id FROM planter
                    JOIN (
                      SELECT id AS entity_id FROM organization_children LIMIT 20
                    ) org ON planter.organization_id = org.entity_id
                  )
              UNION
                select id from trees where planting_organization_id = (
                select id from entity where map_name = 'ashas'
                )
              ) ids

            )

  SELECT
    'cluster' AS type,
    'case1 with zoom target tile' AS log,
    cluster_1.id,
    cluster_1.estimated_geometric_location,
    cluster_1.latlon,
    cluster_1.region_type,
    cluster_1.count,
    cluster_1.count_text,
    zoom_target.centroid zoom_to
  FROM (

      /* sql case1 tile */

      SELECT
      'cluster' AS type,
      'case1 tile' AS log,
      NULL AS zoom_to,
      region_id id,
      st_point(LEAST(st_x(centroid), 170), st_y(centroid)) estimated_geometric_location,St_asgeojson(st_point(LEAST(st_x(centroid), 170), st_y(centroid))) latlon,
      type_id as region_type,
      count(tree_region.id) count,
      CASE WHEN count(tree_region.id) > 1000
      THEN  (count(tree_region.id) / 1000) || 'K'
      ELSE count(tree_region.id) || ''
      END AS count_text
      FROM active_tree_region tree_region
      INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

      WHERE zoom_level = 2


      GROUP BY region_id, centroid, type_id
  ) cluster_1
  LEFT JOIN
  (SELECT
    DISTINCT ON
    (region.id) region.id region_id,
    contained.region_id most_populated_subregion_id,
    contained.total,
    contained.zoom_level,
    ST_ASGeoJson(contained.centroid) centroid
  FROM
    (
    SELECT
      region_id,
      zoom_level
    FROM active_tree_region tree_region
    INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

    WHERE
      zoom_level = 2


    GROUP BY
      region_id,
      zoom_level ) populated_region
  JOIN region ON
    region.id = populated_region.region_id
  JOIN (
    SELECT
      region_id,
      zoom_level,
      count(tree_region.id) AS total,
      centroid
    FROM active_tree_region tree_region
    INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

    WHERE
      zoom_level = 4


    GROUP BY
      region_id,
      zoom_level,
      centroid ) contained ON
    ST_CONTAINS(region.geom,
    contained.centroid)
  WHERE
    TRUE
  ORDER BY
    region.id,
    total DESC
  ) zoom_target
  ON cluster_1.id = zoom_target.region_id

The explain :

                                                                                            QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------
 Merge Left Join  (cost=1957025368.33..1957850596.97 rows=5050048 width=208)
   Merge Cond: (tree_region.region_id = region.id)
   CTE organization_children
     ->  Recursive Union  (cost=4.60..481.78 rows=951 width=27)
           ->  Hash Right Join  (cost=4.60..11.53 rows=1 width=27)
                 Hash Cond: (entity_relationship.child_id = entity.id)
                 ->  Seq Scan on entity_relationship  (cost=0.00..5.85 rows=285 width=23)
                 ->  Hash  (cost=4.59..4.59 rows=1 width=4)
                       ->  Nested Loop  (cost=0.56..4.59 rows=1 width=4)
                             ->  Index Scan using entity_map_name_idx on entity entity_1  (cost=0.28..2.29 rows=1 w
idth=4)
                                   Index Cond: ((map_name)::text = 'ashas'::text)
                             ->  Index Only Scan using entity_pkey on entity  (cost=0.28..2.29 rows=1 width=4)
                                   Index Cond: (id = entity_1.id)
           ->  Nested Loop  (cost=0.60..45.12 rows=95 width=27)
                 ->  Hash Join  (cost=0.33..8.19 rows=95 width=27)
                       Hash Cond: (entity_relationship_1.parent_id = c.id)
                       ->  Seq Scan on entity_relationship entity_relationship_1  (cost=0.00..5.85 rows=285 width=2
3)
                       ->  Hash  (cost=0.20..0.20 rows=10 width=8)
                             ->  WorkTable Scan on organization_children c  (cost=0.00..0.20 rows=10 width=8)
                 ->  Index Only Scan using entity_pkey on entity next_child  (cost=0.28..0.39 rows=1 width=4)
                       Index Cond: (id = entity_relationship_1.child_id)
   CTE org_tree_id
     ->  Unique  (cost=417127.15..427680.89 rows=2110749 width=4)
           ->  Sort  (cost=417127.15..422404.02 rows=2110749 width=4)
                 Sort Key: trees.id
                 ->  Append  (cost=361.65..178908.11 rows=2110749 width=4)
                       ->  Nested Loop  (cost=361.65..130504.63 rows=2044691 width=4)
                             ->  HashAggregate  (cost=361.22..370.11 rows=889 width=4)
                                   Group Key: planter.id
                                   ->  Nested Loop  (cost=0.28..359.00 rows=889 width=4)
                                         ->  Limit  (cost=0.00..0.40 rows=20 width=4)
                                               ->  CTE Scan on organization_children  (cost=0.00..19.02 rows=951 wi
dth=4)
                                         ->  Index Scan using planter_organization_id_idx on planter  (cost=0.28..1
7.48 rows=44 width=8)
                                               Index Cond: (organization_id = organization_children.id)
                             ->  Index Scan using trees_user_id on trees  (cost=0.43..123.38 rows=2300 width=8)
                                   Index Cond: (planter_id = planter.id)
                       ->  Index Scan using trees_planting_organization_id_idx on trees trees_1  (cost=2.73..16742.
24 rows=66058 width=4)
                             Index Cond: (planting_organization_id = $4)
                             InitPlan 2 (returns $4)
                               ->  Index Scan using entity_map_name_idx on entity entity_2  (cost=0.28..2.29 rows=1
 width=4)
                                     Index Cond: ((map_name)::text = 'ashas'::text)
   ->  GroupAggregate  (cost=3466033.34..3987703.78 rows=5050048 width=240)
         Group Key: tree_region.region_id, tree_region.centroid, tree_region.type_id
         ->  Sort  (cost=3466033.34..3498101.21 rows=12827146 width=44)
               Sort Key: tree_region.region_id, tree_region.centroid, tree_region.type_id
               ->  Hash Join  (cost=1018643.91..1500659.79 rows=12827146 width=44)
                     Hash Cond: (org_tree_id.id = tree_region.tree_id)
                     ->  CTE Scan on org_tree_id  (cost=0.00..42214.98 rows=2110749 width=4)
                     ->  Hash  (cost=911132.31..911132.31 rows=5050048 width=48)
                           ->  Index Scan using active_tree_region_zoom_level_idx on active_tree_region tree_region
  (cost=0.57..911132.31 rows=5050048 width=48)
                                 Index Cond: (zoom_level = 2)
   ->  Unique  (cost=1953110064.82..1953217073.44 rows=6633880 width=52)
         ->  Sort  (cost=1953110064.82..1953163569.13 rows=21401725 width=52)
               Sort Key: region.id, (count(tree_region_2.id)) DESC
               ->  Hash Join  (cost=1847399962.83..1949668265.85 rows=21401725 width=52)
                     Hash Cond: (populated_region.region_id = region.id)
                     ->  Subquery Scan on populated_region  (cost=1525341.52..1525586.62 rows=12255 width=4)
                           ->  HashAggregate  (cost=1525341.52..1525464.07 rows=12255 width=8)
                                 Group Key: tree_region_1.region_id, tree_region_1.zoom_level
                                 ->  Hash Join  (cost=998916.91..1461205.79 rows=12827146 width=8)
                                       Hash Cond: (org_tree_id_1.id = tree_region_1.tree_id)
                                       ->  CTE Scan on org_tree_id org_tree_id_1  (cost=0.00..42214.98 rows=2110749
 width=4)
                                       ->  Hash  (cost=911132.31..911132.31 rows=5050048 width=12)
                                             ->  Index Scan using active_tree_region_zoom_level_idx on active_tree_
region tree_region_1  (cost=0.57..911132.31 rows=5050048 width=12)
                                                   Index Cond: (zoom_level = 2)
                     ->  Hash  (cost=1599236829.87..1599236829.87 rows=11585187955 width=44)
                           ->  Nested Loop  (cost=3557563.03..1599236829.87 rows=11585187955 width=44)
                                 ->  GroupAggregate  (cost=3557562.62..3776295.39 rows=5239101 width=48)
                                       Group Key: tree_region_2.region_id, tree_region_2.zoom_level, tree_region_2.
centroid
                                       ->  Sort  (cost=3557562.62..3590830.97 rows=13307341 width=44)
                                             Sort Key: tree_region_2.region_id, tree_region_2.centroid
                                             ->  Hash Join  (cost=1026604.82..1515083.66 rows=13307341 width=44)
                                                   Hash Cond: (org_tree_id_2.id = tree_region_2.tree_id)
                                                   ->  CTE Scan on org_tree_id org_tree_id_2  (cost=0.00..42214.98
rows=2110749 width=4)
                                                   ->  Hash  (cost=915069.06..915069.06 rows=5239101 width=48)
                                                         ->  Index Scan using active_tree_region_zoom_level_idx on
active_tree_region tree_region_2  (cost=0.57..915069.06 rows=5239101 width=48)
                                                               Index Cond: (zoom_level = 4)
                                 ->  Index Scan using region_geom_index_gist on region  (cost=0.41..282.41 rows=221
1 width=132)
                                       Index Cond: (geom ~ tree_region_2.centroid)
                                       Filter: _st_contains(geom, tree_region_2.centroid)
(80 rows)

@dadiorchen
Copy link
Contributor Author

dadiorchen commented May 11, 2023

Weird, when I run the explain agian, it is much much faster then previous one:

 
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
-----------------------
 Subquery Scan on clusters  (cost=1226193.95..1226203.71 rows=100 width=200)
   CTE organization_children
     ->  Recursive Union  (cost=4.60..481.78 rows=951 width=27)
           ->  Hash Right Join  (cost=4.60..11.53 rows=1 width=27)
                 Hash Cond: (entity_relationship.child_id = entity.id)
                 ->  Seq Scan on entity_relationship  (cost=0.00..5.85 rows=285 width=23)
                 ->  Hash  (cost=4.59..4.59 rows=1 width=4)
                       ->  Nested Loop  (cost=0.56..4.59 rows=1 width=4)
                             ->  Index Scan using entity_map_name_idx on entity entity_1  (cost=0.28..2.29 rows=1 w
idth=4)
                                   Index Cond: ((map_name)::text = 'bosque-la-tigra'::text)
                             ->  Index Only Scan using entity_pkey on entity  (cost=0.28..2.29 rows=1 width=4)
                                   Index Cond: (id = entity_1.id)
           ->  Nested Loop  (cost=0.60..45.12 rows=95 width=27)
                 ->  Hash Join  (cost=0.33..8.19 rows=95 width=27)
                       Hash Cond: (entity_relationship_1.parent_id = c.id)
                       ->  Seq Scan on entity_relationship entity_relationship_1  (cost=0.00..5.85 rows=285 width=2
3)
                       ->  Hash  (cost=0.20..0.20 rows=10 width=8)
                             ->  WorkTable Scan on organization_children c  (cost=0.00..0.20 rows=10 width=8)
                 ->  Index Only Scan using entity_pkey on entity next_child  (cost=0.28..0.39 rows=1 width=4)
                       Index Cond: (id = entity_relationship_1.child_id)
   CTE org_tree_id
     ->  Unique  (cost=359076.60..367568.59 rows=1698399 width=4)
           ->  Sort  (cost=359076.60..363322.60 rows=1698399 width=4)
                 Sort Key: trees_1.id
                 ->  Append  (cost=331.90..170058.44 rows=1698399 width=4)
                       ->  Nested Loop  (cost=331.90..126754.82 rows=1627982 width=4)
                             ->  HashAggregate  (cost=331.47..338.11 rows=664 width=4)
                                   Group Key: planter.id
                                   ->  Nested Loop  (cost=0.28..329.81 rows=664 width=4)
                                         ->  Limit  (cost=0.00..0.40 rows=20 width=4)
                                               ->  CTE Scan on organization_children  (cost=0.00..19.02 rows=951 wi
dth=4)
                                         ->  Index Scan using planter_organization_id_idx on planter  (cost=0.28..1
6.13 rows=33 width=8)
                                               Index Cond: (organization_id = organization_children.id)
                             ->  Index Scan using trees_user_id on trees trees_1  (cost=0.43..165.87 rows=2452 widt
h=8)
                                   Index Cond: (planter_id = planter.id)
                       ->  Index Scan using trees_planting_organization_id_idx on trees trees_2  (cost=2.73..17827.
64 rows=70417 width=4)
                             Index Cond: (planting_organization_id = $4)
                             InitPlan 2 (returns $4)
                               ->  Index Scan using entity_map_name_idx on entity entity_2  (cost=0.28..2.29 rows=1
 width=4)
                                     Index Cond: ((map_name)::text = 'bosque-la-tigra'::text)
   ->  ProjectSet  (cost=841159.58..841160.10 rows=100 width=32)
         ->  Aggregate  (cost=841159.58..841159.59 rows=1 width=32)
               ->  Hash Join  (cost=740399.61..837094.88 rows=1625877 width=32)
                     Hash Cond: (org_tree_id.id = trees.id)
                     ->  CTE Scan on org_tree_id  (cost=0.00..33967.98 rows=1698399 width=4)
                     ->  Hash  (cost=623403.67..623403.67 rows=5759755 width=36)
                           ->  Seq Scan on trees  (cost=0.00..623403.67 rows=5759755 width=36)
                                 Filter: active
(48 rows)

treetracker=>

@dadiorchen
Copy link
Contributor Author

A case that takes 90s:


        /* sql case1 */

        WITH RECURSIVE organization_children AS (
           SELECT entity.id, entity_relationship.parent_id, 1 as depth, entity_relationship.type, entity_relationship.role
           FROM entity
           LEFT JOIN entity_relationship ON entity_relationship.child_id = entity.id
           WHERE entity.id IN (SELECT id FROM entity WHERE map_name = 'fcctrees')
          UNION
           SELECT next_child.id, entity_relationship.parent_id, depth + 1, entity_relationship.type, entity_relationship.role
           FROM entity next_child
           JOIN entity_relationship ON entity_relationship.child_id = next_child.id
           JOIN organization_children c ON entity_relationship.parent_id = c.id
          )
            ,org_tree_id AS (
              SELECT id FROM (
              SELECT trees.id as id from trees
                WHERE
                  planter_id IN (
                    SELECT id FROM planter
                    JOIN (
                      SELECT id AS entity_id FROM organization_children LIMIT 20
                    ) org ON planter.organization_id = org.entity_id
                  )
              UNION
                select id from trees where planting_organization_id = (
                select id from entity where map_name = 'fcctrees'
                )
              ) ids

            )

  SELECT
    'cluster' AS type,
    'case1 with zoom target tile' AS log,
    cluster_1.id,
    cluster_1.estimated_geometric_location,
    cluster_1.latlon,
    cluster_1.region_type,
    cluster_1.count,
    cluster_1.count_text,
    zoom_target.centroid zoom_to
  FROM (

      /* sql case1 tile */

      SELECT
      'cluster' AS type,
      'case1 tile' AS log,
      NULL AS zoom_to,
      region_id id,
      st_point(LEAST(st_x(centroid), 170), st_y(centroid)) estimated_geometric_location,St_asgeojson(st_point(LEAST(st_x(centroid), 170), st_y(centroid))) latlon,
      type_id as region_type,
      count(tree_region.id) count,
      CASE WHEN count(tree_region.id) > 1000
      THEN  (count(tree_region.id) / 1000) || 'K'
      ELSE count(tree_region.id) || ''
      END AS count_text
      FROM active_tree_region tree_region
      INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

      WHERE zoom_level = 2


      GROUP BY region_id, centroid, type_id
  ) cluster_1
  LEFT JOIN
  (SELECT
    DISTINCT ON
    (region.id) region.id region_id,
    contained.region_id most_populated_subregion_id,
    contained.total,
    contained.zoom_level,
    ST_ASGeoJson(contained.centroid) centroid
  FROM
    (
    SELECT
      region_id,
      zoom_level
    FROM active_tree_region tree_region
    INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

    WHERE
      zoom_level = 2


    GROUP BY
      region_id,
      zoom_level ) populated_region
  JOIN region ON
    region.id = populated_region.region_id
  JOIN (
    SELECT
      region_id,
      zoom_level,
      count(tree_region.id) AS total,
      centroid
    FROM active_tree_region tree_region
    INNER JOIN org_tree_id ON org_tree_id.id = tree_region.tree_id

    WHERE
      zoom_level = 4


    GROUP BY
      region_id,
      zoom_level,
      centroid ) contained ON
    ST_CONTAINS(region.geom,
    contained.centroid)
  WHERE
    TRUE
  ORDER BY
    region.id,
    total DESC
  ) zoom_target
  ON cluster_1.id = zoom_target.region_id

@dadiorchen
Copy link
Contributor Author

The insert to build the denormalized table by Mohmin:

insert into public.denormalized_trees_v2
(id,created_at,updated_at,estimated_geometric_point,planter_id,planter_name,species_id,species_name,token_id,wallet_id,wallet_name,organization_id,tags,organizations)
select t.id,t.time_created,time_updated,t.estimated_geometric_location,t.planter_id,plt.first_name,spc.uuid,spc.name,tkn.id,wlt.id,wlt.name,plt.organization_id,jsonb_agg(tag),
jsonb_agg(org)
-- for array
--Array(SELECT entity_id FROM getEntityRelationshipChildren(plt.organization_id))
from public.trees t
left join public.planter as plt on plt.id = t.planter_id
left join public.tree_species spc on spc.id = t.species_id
 left join public.tree_tag tree_tag on t.id = tree_tag.tree_id
left join public.tag tag on tag.id = tree_tag.tag_id
left join wallet.token as tkn on tkn.capture_id::text = t.uuid::text
left join wallet.wallet as wlt on tkn.wallet_id  = wlt.id
JOIN getEntityRelationshipChildren(plt.organization_id) AS org ON true
group by t.id,plt.first_name,spc.id,tkn.id,wlt.id, plt.organization_id

@dadiorchen
Copy link
Contributor Author

For the wallet case, seems we didn't get too much improvement by denormlization, right? @Mohmn

treetracker=> select count(*)
;treetracker-> from denormalized_trees
where wallet_id = '6ace3ada-deea-4a98-b2cc-4d242251abd4'
treetracker-> where wallet_id = '6ace3ada-deea-4a98-b2cc-4d242251abd4';
 count
--------
 552257
(1 row)

Time: 9740.282 ms (00:09.740)
treetracker=> explain select count(*)
from denormalized_trees
where wallet_id = '6ace3ada-deea-4a98-b2cc-4d242251abd4';
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=347647.72..347647.73 rows=1 width=8)
   ->  Index Scan using hash_index_wallet_id on denormalized_trees  (cost=0.00..346312.63 rows=534036 width=0)
         Index Cond: (wallet_id = '6ace3ada-deea-4a98-b2cc-4d242251abd4'::uuid)
(3 rows)

Time: 1.302 ms
treetracker=>  select count(*)
treetracker->  from trees
treetracker->  INNER JOIN wallet.token ON wallet.token.capture_id::text = trees.uuid
treetracker->  INNER JOIN wallet.wallet ON wallet.wallet.id = wallet.token.wallet_id
treetracker->  where wallet.wallet.id = '6ace3ada-deea-4a98-b2cc-4d242251abd4';;
 count
--------
 552257
(1 row)

Time: 12180.780 ms (00:12.181)
Time: 1.577 ms
treetracker=> explain select count(*)
 from trees
 INNER JOIN wallet.token ON wallet.token.capture_id::text = trees.uuid
 INNER JOIN wallet.wallet ON wallet.wallet.id = wallet.token.wallet_id
 where wallet.wallet.id = '6ace3ada-deea-4a98-b2cc-4d242251abd4';;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=308039.44..308039.45 rows=1 width=8)
   ->  Nested Loop  (cost=1001.27..306680.71 rows=543489 width=0)
         ->  Index Only Scan using wallet_pkey on wallet  (cost=0.28..2.30 rows=1 width=16)
               Index Cond: (id = '6ace3ada-deea-4a98-b2cc-4d242251abd4'::uuid)
         ->  Gather  (cost=1000.99..301243.52 rows=543489 width=16)
               Workers Planned: 2
               ->  Nested Loop  (cost=0.99..245894.62 rows=226454 width=16)
                     ->  Parallel Index Scan using token_wallet_id_idx on token  (cost=0.43..41324.14 rows=226454 width=32)
                           Index Cond: (wallet_id = '6ace3ada-deea-4a98-b2cc-4d242251abd4'::uuid)
                     ->  Index Only Scan using trees_uuid_idx1 on trees  (cost=0.56..0.90 rows=1 width=37)
                           Index Cond: (uuid = (token.capture_id)::text)
(11 rows)

Time: 2.301 ms
Time: 0.619 ms
treetracker=>

@dadiorchen
Copy link
Contributor Author

dadiorchen commented May 31, 2023

The organization case is great, @Mohmn there is a good improvement, I added a index to the jsonb

treetracker=> explain analyse select count(*)
from denormalized_trees
where organizations @> '[{"entity_id": 178}]'::jsonb or organizations @> '[{"parent_id": 178}]'::jsonb;
                                                                                 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------
 Aggregate  (cost=9621.68..9621.69 rows=1 width=8) (actual time=3199.201..3199.204 rows=1 loops=1)
   ->  Bitmap Heap Scan on denormalized_trees  (cost=272.26..9598.49 rows=9278 width=0) (actual time=607.180..3057.867 rows=1
794475 loops=1)
         Recheck Cond: ((organizations @> '[{"entity_id": 178}]'::jsonb) OR (organizations @> '[{"parent_id": 178}]'::jsonb))
         Heap Blocks: exact=87081
         ->  BitmapOr  (cost=272.26..272.26 rows=9282 width=0) (actual time=582.412..582.414 rows=0 loops=1)
               ->  Bitmap Index Scan on denormalized_trees_organizations_gin_idx  (cost=0.00..133.81 rows=4641 width=0) (actu
al time=314.360..314.360 rows=1794475 loops=1)
                     Index Cond: (organizations @> '[{"entity_id": 178}]'::jsonb)
               ->  Bitmap Index Scan on denormalized_trees_organizations_gin_idx  (cost=0.00..133.81 rows=4641 width=0) (actu
al time=268.049..268.050 rows=1794475 loops=1)
                     Index Cond: (organizations @> '[{"parent_id": 178}]'::jsonb)
 Planning Time: 0.160 ms
 Execution Time: 3199.262 ms
(11 rows)

Time: 3202.568 ms (00:03.203)
treetracker=> explain analyze select count(*)
from trees
where planter_id in (
  select id from planter where organization_id in (
  SELECT entity_id FROM getEntityRelationshipChildren(178)
  )
);
                                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
----------------------------------------
 Aggregate  (cost=192471.17..192471.18 rows=1 width=8) (actual time=5599.061..5599.066 rows=1 loops=1)
   ->  Nested Loop  (cost=518.53..181949.52 rows=4208663 width=0) (actual time=10.553..5405.406 rows=1794475 loops=1)
         ->  HashAggregate  (cost=518.09..539.99 rows=2190 width=4) (actual time=10.527..13.476 rows=1358 loops=1)
               Group Key: planter.id
               ->  Hash Join  (cost=17.25..512.62 rows=2190 width=4) (actual time=0.886..10.092 rows=1358 loops=1)
                     Hash Cond: (planter.organization_id = getentityrelationshipchildren.entity_id)
                     ->  Seq Scan on planter  (cost=0.00..453.33 rows=6733 width=8) (actual time=0.061..7.929 rows=6882 loops
=1)
                     ->  Hash  (cost=14.75..14.75 rows=200 width=4) (actual time=0.796..0.797 rows=18 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  HashAggregate  (cost=12.75..14.75 rows=200 width=4) (actual time=0.785..0.790 rows=18 loops=1)
                                 Group Key: getentityrelationshipchildren.entity_id
                                 ->  Function Scan on getentityrelationshipchildren  (cost=0.25..10.25 rows=1000 width=4) (ac
tual time=0.774..0.776 rows=18 loops=1)
         ->  Index Only Scan using trees_user_id on trees  (cost=0.43..63.62 rows=1922 width=4) (actual time=0.026..3.781 row
s=1321 loops=1358)
               Index Cond: (planter_id = planter.id)
               Heap Fetches: 623163
 Planning Time: 0.607 ms
 Execution Time: 5599.187 ms
(17 rows)

Time: 5795.393 ms (00:05.795)
treetracker=>

The SQL to index:

 63 CREATE INDEX denormalized_trees_organizations_gin_idx ON denormalized_trees USING gin (organizations);

@dadiorchen
Copy link
Contributor Author

dadiorchen commented Jun 23, 2023

The last comparison between: original, array

The orignial way to count the tree in org 178(freetown):

treetracker=>         WITH RECURSIVE organization_children AS (
treetracker(>            SELECT entity.id, entity_relationship.parent_id, 1 as depth, entity_relationship.type, entity_relati
onship.role
treetracker(>            FROM entity
treetracker(>            LEFT JOIN entity_relationship ON entity_relationship.child_id = entity.id
treetracker(>            WHERE entity.id = 178
treetracker(>           UNION
treetracker(>            SELECT next_child.id, entity_relationship.parent_id, depth + 1, entity_relationship.type, entity_rel
ationship.role
treetracker(>            FROM entity next_child
treetracker(>            JOIN entity_relationship ON entity_relationship.child_id = next_child.id
treetracker(>            JOIN organization_children c ON entity_relationship.parent_id = c.id
treetracker(>           )
treetracker->             ,org_tree_id AS (
treetracker(>               SELECT trees.id as id from trees
treetracker(>                 WHERE
treetracker(>                   planter_id IN (
treetracker(>                     SELECT id FROM planter
treetracker(>                     JOIN (
treetracker(>                       SELECT id AS entity_id FROM organization_children LIMIT 20
treetracker(>                     ) org ON planter.organization_id = org.entity_id
treetracker(>                   )
treetracker->         select count(*) from org_tree_id;;
                 ->  Hash Join  (cost=0.33..1.63 rows=10 width=76) (actual time=0.032..0.055 rows=8 loops=2)
                       Hash Cond: (entity_relationship_1.parent_id = c.id)
                       ->  Seq Scan on entity_relationship entity_relationship_1  (cost=0.00..1.15 rows=15 width=72) (actual 
time=0.008..0.015 rows=48 loops=2)
                       ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.008..0.009 rows=9 loops=2)
                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                             ->  WorkTable Scan on organization_children c  (cost=0.00..0.20 rows=10 width=8) (actual time=0.
003..0.005 rows=9 loops=2)
                 ->  Index Only Scan using entity_pkey on entity next_child  (cost=0.28..1.43 rows=1 width=4) (actual time=0.
005..0.005 rows=1 loops=17)
                       Index Cond: (id = entity_relationship_1.child_id)
                       Heap Fetches: 17
   CTE org_tree_id
     ->  Nested Loop  (cost=308.27..114058.91 rows=1791084 width=4) (actual time=32.765..55810.777 rows=1794475 loops=1)
           ->  HashAggregate  (cost=307.84..317.16 rows=932 width=4) (actual time=32.738..35.612 rows=1358 loops=1)
                 Group Key: planter.id
                 ->  Nested Loop  (cost=0.28..305.51 rows=932 width=4) (actual time=1.158..32.079 rows=1358 loops=1)
                       ->  Limit  (cost=0.00..0.40 rows=20 width=4) (actual time=0.729..1.046 rows=18 loops=1)
                             ->  CTE Scan on organization_children  (cost=0.00..2.02 rows=101 width=4) (actual time=0.728..1.
037 rows=18 loops=1)
                       ->  Index Scan using planter_organization_id_idx on planter  (cost=0.28..14.78 rows=47 width=8) (actua
l time=0.151..1.709 rows=75 loops=18)
                             Index Cond: (organization_id = organization_children.id)
           ->  Index Scan using trees_user_id on trees  (cost=0.43..102.82 rows=1922 width=8) (actual time=0.262..40.848 rows
=1321 loops=1358)
                 Index Cond: (planter_id = planter.id)
   ->  CTE Scan on org_tree_id  (cost=0.00..35821.68 rows=1791084 width=0) (actual time=32.768..56633.438 rows=1794475 loops=
1)
 Planning Time: 4.320 ms
 Execution Time: 56858.340 ms
(35 rows)

The result of array type org info:

treetracker=> explain analyse 
treetracker-> select count(*)
treetracker-> from denormalized_trees
treetracker-> where organization_arr2 @> ARRAY[178];
                                                                       QUERY PLAN                                             
                           
------------------------------------------------------------------------------------------------------------------------------
---------------------------
 Finalize Aggregate  (cost=503815.80..503815.81 rows=1 width=8) (actual time=1417.891..1486.643 rows=1 loops=1)
   ->  Gather  (cost=503815.59..503815.80 rows=2 width=8) (actual time=1417.763..1486.635 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=502815.59..502815.60 rows=1 width=8) (actual time=1400.226..1400.227 rows=1 loops=3)
               ->  Parallel Seq Scan on denormalized_trees  (cost=0.00..500947.39 rows=747281 width=0) (actual time=0.227..136
1.992 rows=598158 loops=3)
                     Filter: (organization_arr2 @> '{178}'::integer[])
                     Rows Removed by Filter: 948874
 Planning Time: 0.114 ms
 Execution Time: 1486.693 ms
(10 rows)

These two SQL return the same result: 1794475

@dadiorchen
Copy link
Contributor Author

Some record of creating the data, @Mohmn I create another array column which is simpler to give a organization list that own the tree.

--- function to get all organization id from planter id
CREATE OR REPLACE FUNCTION get_organization_list_by_planter_id(planter_id integer)
RETURNS integer[] AS 
$BODY$
WITH RECURSIVE ancient_organization(org_id) AS (
    select organization_id from planter where id = planter_id
  UNION ALL
    select parent_id from entity_relationship er, ancient_organization ao
    where er.child_id = ao.org_id
)
--- return all org id
select ARRAY_AGG(org_id) from ancient_organization;
$BODY$
LANGUAGE SQL

--- update denormalized_trees
update denormalized_trees
set organization_arr2 = get_organization_list_by_planter_id(planter_id)

--- create index
CREATE INDEX organization_arr2_idx on "denormalized_trees" USING GIN ("organization_arr2");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 🎉 Done
Development

No branches or pull requests

2 participants