Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi part expand query is very slow #12

Closed
starstuck opened this issue Jul 18, 2016 · 3 comments
Closed

Multi part expand query is very slow #12

starstuck opened this issue Jul 18, 2016 · 3 comments

Comments

@starstuck
Copy link
Contributor

When we run a query having multiple globs, it takes very long to expand metrics, 100 times longer than queries expanding only one.

For example dalmatiner query

SELECT 
  sum(sum('5c7df5ed-5ba9-4d11-bbfe-b99a63e0054e'.'stats'.'counters'.'web'.'login'.*.'successful'.'count' BUCKET '5c'), 5m) AS 'Successful Logins',
  sum(sum('5c7df5ed-5ba9-4d11-bbfe-b99a63e0054e'.'stats'.'counters'.'web'.'register'.*.'successful'.'count' BUCKET '5c'), 5m) AS 'Successful Registers'
  BETWEEN 1468837800 AND now

Will trigger SQL query to expand wildcards:

SELECT DISTINCT key FROM metrics WHERE bucket = '5c' AND
((id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'key_length' AND value = '8') AND
     (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'ddb' AND value = '5c7df5ed-5ba9-4d11-bbfe-b99a63e0054e') AND
         (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_2' AND value = 'stats') AND
             (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_3' AND value = 'counters') AND
                 (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_4' AND value = 'web') AND
                     (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_5' AND value = 'register') AND
                         (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_7' AND value = 'successful') AND
                             id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_8' AND value = 'count'))))))))
 OR
 (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'key_length' AND value = '8') AND
     (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_1' AND value = '5c7df5ed-5ba9-4d11-bbfe-b99a63e0054e') AND
         (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_2' AND value = 'stats') AND
             (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_3' AND value = 'counters') AND
                 (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_4' AND value ='web') AND
                     (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_5' AND value = 'login') AND
                         (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_7' AND value = 'successful') AND
                             id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_8' AND value = 'count')))))))));

This query will take over 4sec to run.

Strangely, if I run just half of this query, like:

SELECT DISTINCT key FROM metrics WHERE bucket = '5c' AND
  (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'key_length' AND value = '8') AND
     (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_1' AND value = '5c7df5ed-5ba9-4d11-bbfe-b99a63e0054e') AND
         (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_2' AND value = 'stats') AND
             (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_3' AND value = 'counters') AND
                 (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_4' AND value = 'web') AND
                     (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_5' AND value = 'register') AND
                         (id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_7' AND value = 'successful') AND
                             id IN (SELECT metric_id FROM dimensions WHERE  namespace = 'ddb' AND name = 'part_8' AND value = 'count'))))))));

, it will take just 20ms.

@starstuck starstuck changed the title Multiple extend query is very slow Multi part expand query is very slow Jul 18, 2016
@starstuck
Copy link
Contributor Author

I ran both queries from posgresql's EXPLAIN ANALYZE

For single path query it gives:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=574.11..574.12 rows=1 width=109) (actual time=0.613..0.613 rows=1 loops=1)
   Group Key: metrics.key
   ->  Nested Loop Semi Join  (cost=4.95..574.11 rows=1 width=109) (actual time=0.260..0.607 rows=1 loops=1)
         ->  Nested Loop Semi Join  (cost=4.38..557.83 rows=1 width=173) (actual time=0.256..0.599 rows=2 loops=1)
               ->  Nested Loop Semi Join  (cost=3.82..541.55 rows=1 width=165) (actual time=0.154..0.569 rows=12 loops=1)
                     ->  Nested Loop Semi Join  (cost=3.25..525.27 rows=1 width=157) (actual time=0.150..0.534 rows=12 loops=1)
                           ->  Nested Loop Semi Join  (cost=2.69..509.00 rows=1 width=149) (actual time=0.147..0.502 rows=12 loops=1)
                                 ->  Nested Loop Semi Join  (cost=2.12..492.72 rows=1 width=141) (actual time=0.143..0.467 rows=12 loops=1)
                                       ->  Nested Loop Semi Join  (cost=1.56..476.44 rows=1 width=133) (actual time=0.139..0.434 rows=12 loops=1)
                                             ->  Nested Loop  (cost=0.99..460.16 rows=1 width=125) (actual time=0.061..0.333 rows=14 loops=1)
                                                   ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_6  (cost=0.56..147.14 rows=37 width=8) (actual time=0.040..0.107 rows=29 loops=1)
                                                         Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_7'::text) AND (value = 'successful'::text))
                                                   ->  Index Scan using metrics_idx_id on metrics  (cost=0.43..8.45 rows=1 width=117) (actual time=0.007..0.007 rows=0 loops=29)
                                                         Index Cond: (id = dimensions_6.metric_id)
                                                         Filter: (bucket = '5c'::text)
                                                         Rows Removed by Filter: 1
                                             ->  Index Only Scan using dimensions_idx on dimensions  (cost=0.56..8.42 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=14)
                                                   Index Cond: ((metric_id = metrics.id) AND (namespace = 'ddb'::text) AND (name = 'key_length'::text) AND (value = '8'::text))
                                                   Heap Fetches: 12
                                       ->  Index Only Scan using dimensions_idx on dimensions dimensions_1  (cost=0.56..8.42 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=12)
                                             Index Cond: ((metric_id = metrics.id) AND (namespace = 'ddb'::text) AND (name = 'part_1'::text) AND (value = '5c7df5ed-5ba9-4d11-bbfe-b99a63e0054e'::text))
                                             Heap Fetches: 12
                                 ->  Index Only Scan using dimensions_idx on dimensions dimensions_2  (cost=0.56..8.42 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=12)
                                       Index Cond: ((metric_id = metrics.id) AND (namespace = 'ddb'::text) AND (name = 'part_2'::text) AND (value = 'stats'::text))
                                       Heap Fetches: 12
                           ->  Index Only Scan using dimensions_idx on dimensions dimensions_3  (cost=0.56..8.42 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=12)
                                 Index Cond: ((metric_id = metrics.id) AND (namespace = 'ddb'::text) AND (name = 'part_3'::text) AND (value = 'counters'::text))
                                 Heap Fetches: 12
                     ->  Index Only Scan using dimensions_idx on dimensions dimensions_4  (cost=0.56..8.42 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=12)
                           Index Cond: ((metric_id = metrics.id) AND (namespace = 'ddb'::text) AND (name = 'part_4'::text) AND (value = 'web'::text))
                           Heap Fetches: 12
               ->  Index Only Scan using dimensions_idx on dimensions dimensions_5  (cost=0.56..8.42 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12)
                     Index Cond: ((metric_id = metrics.id) AND (namespace = 'ddb'::text) AND (name = 'part_5'::text) AND (value = 'register'::text))
                     Heap Fetches: 2
         ->  Index Only Scan using dimensions_idx on dimensions dimensions_7  (cost=0.56..8.42 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=2)
               Index Cond: ((metric_id = metrics.id) AND (namespace = 'ddb'::text) AND (name = 'part_8'::text) AND (value = 'count'::text))
               Heap Fetches: 1
 Planning time: 16.981 ms
 Execution time: 0.692 ms
(39 rows)

@starstuck
Copy link
Contributor Author

For multiple paths query it gives:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=345190.06..345191.41 rows=135 width=109) (actual time=2794.304..2794.305 rows=2 loops=1)
   Group Key: metrics.key
   ->  Seq Scan on metrics  (cost=74178.07..345189.72 rows=136 width=109) (actual time=1686.495..2794.295 rows=2 loops=1)
         Filter: ((hashed SubPlan 1) AND (hashed SubPlan 2) AND (hashed SubPlan 3) AND (hashed SubPlan 4) AND (hashed SubPlan 5) AND (hashed SubPlan 6) AND (bucket = '5c'::text) AND (((hashed SubPlan 7) AND (hashed SubPlan 8)) OR ((hashed SubPlan 9) AND (hashed SubPlan 10))))
         Rows Removed by Filter: 4006053
         SubPlan 1
           ->  Bitmap Heap Scan on dimensions  (cost=450.05..40142.11 rows=11097 width=8) (actual time=84.115..1511.418 rows=306572 loops=1)
                 Recheck Cond: ((namespace = 'ddb'::text) AND (name = 'key_length'::text) AND (value = '8'::text))
                 Rows Removed by Index Recheck: 9248582
                 Heap Blocks: exact=41553 lossy=106465
                 ->  Bitmap Index Scan on dimensions_idx_namespace_name_value  (cost=0.00..447.28 rows=11097 width=0) (actual time=77.607..77.607 rows=306572 loops=1)
                       Index Cond: ((namespace = 'ddb'::text) AND (name = 'key_length'::text) AND (value = '8'::text))
         SubPlan 2
           ->  Bitmap Heap Scan on dimensions dimensions_1  (cost=275.23..25269.86 rows=6797 width=8) (actual time=2.158..11.892 rows=12553 loops=1)
                 Recheck Cond: ((namespace = 'ddb'::text) AND (name = 'part_2'::text) AND (value = 'stats'::text))
                 Heap Blocks: exact=3662
                 ->  Bitmap Index Scan on dimensions_idx_namespace_name_value  (cost=0.00..273.53 rows=6797 width=0) (actual time=1.779..1.779 rows=12553 loops=1)
                       Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_2'::text) AND (value = 'stats'::text))
         SubPlan 3
           ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_2  (cost=0.56..397.35 rows=102 width=8) (actual time=0.039..1.064 rows=2236 loops=1)
                 Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_3'::text) AND (value = 'counters'::text))
         SubPlan 4
           ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_3  (cost=0.56..343.43 rows=88 width=8) (actual time=0.021..6.740 rows=11945 loops=1)
                 Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_4'::text) AND (value = 'web'::text))
         SubPlan 5
           ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_4  (cost=0.56..147.14 rows=37 width=8) (actual time=0.022..0.049 rows=29 loops=1)
                 Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_7'::text) AND (value = 'successful'::text))
         SubPlan 6
           ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_5  (cost=0.56..6823.55 rows=1761 width=8) (actual time=0.029..20.712 rows=17389 loops=1)
                 Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_8'::text) AND (value = 'count'::text))
         SubPlan 7
           ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_6  (cost=0.56..8.59 rows=1 width=8) (actual time=0.018..0.018 rows=0 loops=1)
                 Index Cond: ((namespace = 'ddb'::text) AND (name = 'ddb'::text) AND (value = '5c7df5ed-5ba9-4d11-bbfe-b99a63e0054e'::text))
         SubPlan 8
           ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_7  (cost=0.56..297.24 rows=76 width=8) (never executed)
                 Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_5'::text) AND (value = 'register'::text))
         SubPlan 9
           ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_8  (cost=0.56..401.20 rows=103 width=8) (actual time=0.019..4.826 rows=6258 loops=1)
                 Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_1'::text) AND (value = '5c7df5ed-5ba9-4d11-bbfe-b99a63e0054e'::text))
         SubPlan 10
           ->  Index Scan using dimensions_idx_namespace_name_value on dimensions dimensions_9  (cost=0.56..297.24 rows=76 width=8) (actual time=0.046..0.138 rows=145 loops=1)
                 Index Cond: ((namespace = 'ddb'::text) AND (name = 'part_5'::text) AND (value = 'login'::text))
 Planning time: 2.368 ms
 Execution time: 2794.498 ms
(44 rows)

@Licenser
Copy link
Contributor

That is quite interesting, looks like it'd be faster to just run multiple small querries instead of one combined one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants