Speed up scope counts #7868
Replies: 9 comments
-
Can you provide some benchmarks for a possible improvement?
-
Would scope counts be a good place to use AJAX? Perhaps a plugin like @holyketzer's async_panel?
-
I suppose this optimized query doesn't give a meaningful performance boost. Its complexity should be the sum of the complexities of its sub-queries, so your only gain is the reduced network latency of fewer round trips to the database, which is a very small profit.
-
@holyketzer while doing all the calculations in one query does incur the cost of the calculations, it's still faster than doing them separately, because the database only has to load the records once and can then do the calculations while they're in memory. The gains multiply as you add more and more scopes. In this example, against a table with 10 million rows, 1 combined query is twice as fast as 6 separate queries (1 second vs. 2 seconds):
create table foo as select id from generate_series(1, 10000000) id;
SELECT 10000000
Time: 4908.401 ms
select count(*) from foo;
count
----------
10000000
(1 row)
Time: 329.191 ms
select count(*) from foo where id % 2 = 0;
count
---------
5000000
(1 row)
Time: 382.320 ms
select count(*) from foo where id % 2 = 1;
count
---------
5000000
(1 row)
Time: 357.944 ms
select count(*) from foo where id % 3 = 0;
count
---------
3333333
(1 row)
Time: 345.678 ms
select count(*) from foo where id % 4 = 0;
count
---------
2500000
(1 row)
Time: 351.968 ms
select count(*) from foo where id % 5 = 0;
count
---------
2000000
(1 row)
Time: 338.696 ms
select
count(*) as all_count,
count(*) filter (where id % 2 = 0) as even_count,
count(*) filter (where id % 2 = 1) as odd_count,
count(*) filter (where id % 3 = 0) as three_divisible_count,
count(*) filter (where id % 4 = 0) as four_divisible_count,
count(*) filter (where id % 5 = 0) as five_divisible_count
from foo;
all_count | even_count | odd_count | three_divisible_count | four_divisible_count | five_divisible_count
-----------+------------+-----------+-----------------------+----------------------+----------------------
10000000 | 5000000 | 5000000 | 3333333 | 2500000 | 2000000
(1 row)
Time: 936.567 ms
But in my experience, the biggest gain will be in not having to tax the database with expensive […]
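The consolidation above can also be generated mechanically from a list of scopes. A minimal sketch (the `combined_scope_counts_sql` helper and its hash-of-predicates input are assumptions made for illustration, not ActiveAdmin API) that emits the single FILTER-based statement:

```ruby
# Build one FILTER-based counting statement from named scope predicates.
# A nil predicate means "no filter", i.e. the plain count(*).
def combined_scope_counts_sql(table, scopes)
  selects = scopes.map do |name, predicate|
    if predicate
      "count(*) filter (where #{predicate}) as #{name}_count"
    else
      "count(*) as #{name}_count"
    end
  end
  "select\n  #{selects.join(",\n  ")}\nfrom #{table};"
end

sql = combined_scope_counts_sql("foo",
  all:  nil,
  even: "id % 2 = 0",
  odd:  "id % 2 = 1")
puts sql
```

Interpolating raw SQL fragments like this is only safe for trusted, literal scope definitions; anything user-supplied would have to go through bind parameters instead.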
-
We can also potentially eliminate the database request that Kaminari does to build pagination. I mentioned in the original ticket description that
select
count(case when status = 'enabled' then 1 end) as enabled_count,
count(case when status = 'disabled' then 1 end) as disabled_count
from users;
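The CASE WHEN form quoted above is the portable spelling of the same idea: `count` skips NULLs, and the CASE yields NULL for non-matching rows, so it behaves like a FILTER clause even on databases that lack one (MySQL, for example). A small sketch (the `case_when_count` helper name is hypothetical) that emits that form:

```ruby
# Emit the portable CASE WHEN equivalent of `count(*) filter (where ...)`.
# Only rows matching the predicate produce a non-NULL value to count.
def case_when_count(predicate, alias_name)
  "count(case when #{predicate} then 1 end) as #{alias_name}"
end

sql = <<~SQL
  select
    #{case_when_count("status = 'enabled'", "enabled_count")},
    #{case_when_count("status = 'disabled'", "disabled_count")}
  from users;
SQL
```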
-
It's possible to combine this query with the main query that renders the index page by using a window function, meaning we might be able to render the entire UI with just one database request. Here's a set of patches I wrote to test the idea:
ActiveAdmin::Scope.prepend Module.new {
attr_reader :embedded_count
def initialize(name, method = nil, options = {}, &block)
super
@embedded_count = options[:embedded_count]
end
}
class ActiveAdmin::Views::Scopes
def get_scope_count(scope)
if scope.embedded_count
# TODO: it needs access to the already-loaded records that'll be used to render the index, in order to avoid doing a database request here via `first`
collection_before_scope.first.public_send scope.embedded_count
else
collection_size(scope_chain(scope, collection_before_scope))
end
end
end
scope :all, embedded_count: :all_count, default: true
scope :enabled_count, embedded_count: :enabled_count
scope :disabled_count, embedded_count: :disabled_count
controller do
def scoped_collection
super.select <<-SQL.squish
users.*,
count(users.*) over () as all_count,
count(users.*) filter (where users.status = 'enabled') over () as enabled_count,
count(users.*) filter (where users.status = 'disabled') over () as disabled_count
SQL
end
end
I haven't benchmarked it yet. It may be slower than just doing a separate query, depending on how databases implement the count internally (whether the count is done just once, or once for each individual record).
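On the TODO above (avoiding the extra query via `first`): `get_scope_count` only needs any record from the already-loaded page, because a window function writes the same value onto every row. A sketch of that lookup, with a Struct standing in for an ActiveRecord row (`Row` and the sample counts are made up for illustration):

```ruby
# A stand-in for a record loaded by the window-function select above:
# every row carries identical embedded count columns.
Row = Struct.new(:id, :all_count, :enabled_count, :disabled_count)
records = [Row.new(1, 12, 5, 7), Row.new(2, 12, 5, 7)]

def embedded_count(records, column)
  # Every loaded row carries the same window-function value, so any row
  # works. With no rows to read (e.g. an empty page), a real
  # implementation would need to fall back to a separate count query;
  # this sketch simply returns 0.
  records.first&.public_send(column) || 0
end

embedded_count(records, :enabled_count)  # => 5
embedded_count([], :all_count)           # => 0
```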
-
🤔 So it looks like Postgres does the calculations only once, but they're consistently slower than when run separately?
select
count(*) as count,
count(*) filter (where id % 2 = 0) as even,
count(*) filter (where id % 2 = 1) as odd
from foo;
count | even | odd
----------+---------+---------
10000000 | 5000000 | 5000000
(1 row)
Time: 557.827 ms
select
foo.*,
count(*) over () as count,
count(*) filter (where id % 2 = 0) over () as even,
count(*) filter (where id % 2 = 1) over () as odd
from foo
limit 10;
id | count | even | odd
----+----------+---------+---------
1 | 10000000 | 5000000 | 5000000
2 | 10000000 | 5000000 | 5000000
3 | 10000000 | 5000000 | 5000000
4 | 10000000 | 5000000 | 5000000
5 | 10000000 | 5000000 | 5000000
6 | 10000000 | 5000000 | 5000000
7 | 10000000 | 5000000 | 5000000
8 | 10000000 | 5000000 | 5000000
9 | 10000000 | 5000000 | 5000000
10 | 10000000 | 5000000 | 5000000
(10 rows)
Time: 3173.349 ms
explain analyze select
foo.*,
count(*) over () as count,
count(*) filter (where id % 2 = 0) over () as even,
count(*) filter (where id % 2 = 1) over () as odd
from foo
limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.42 rows=10 width=28) (actual time=3986.701..3986.707 rows=10 loops=1)
-> WindowAgg (cost=0.00..419249.80 rows=10000048 width=28) (actual time=3986.701..3986.707 rows=10 loops=1)
-> Seq Scan on foo (cost=0.00..144248.48 rows=10000048 width=4) (actual time=0.016..874.944 rows=10000000 loops=1)
Planning time: 0.064 ms
Execution time: 4008.452 ms
(5 rows)
Time: 4008.842 ms
explain analyze select
foo.*,
count(*) over () as count,
count(*) filter (where id % 2 = 0) over () as even,
count(*) filter (where id % 2 = 1) over () as odd
from foo
limit 1000000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..41924.78 rows=1000000 width=28) (actual time=3921.544..4189.906 rows=1000000 loops=1)
-> WindowAgg (cost=0.00..419249.80 rows=10000048 width=28) (actual time=3921.543..4092.089 rows=1000000 loops=1)
-> Seq Scan on foo (cost=0.00..144248.48 rows=10000048 width=4) (actual time=0.013..869.517 rows=10000000 loops=1)
Planning time: 0.053 ms
Execution time: 4263.458 ms
(5 rows)
Time: 4263.826 ms
Ah, window functions don't support parallel query execution, while regular aggregates do:
explain analyze select
count(*) as count,
count(*) filter (where id % 2 = 0) as even,
count(*) filter (where id % 2 = 1) as odd
from foo;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=133958.44..133958.45 rows=1 width=24) (actual time=840.453..840.453 rows=1 loops=1)
-> Gather (cost=133958.10..133958.41 rows=3 width=24) (actual time=840.390..840.446 rows=4 loops=1)
Workers Planned: 3
Workers Launched: 3
-> Partial Aggregate (cost=132958.10..132958.11 rows=1 width=24) (actual time=833.231..833.232 rows=1 loops=4)
-> Parallel Seq Scan on foo (cost=0.00..76506.22 rows=3225822 width=4) (actual time=0.021..283.485 rows=2500000 loops=4)
Planning time: 0.070 ms
Execution time: 851.399 ms
(8 rows)
Time: 851.822 ms
Still, 2 queries is much better than 2 + (1 query per scope).
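Under that conclusion, the index page costs the same two statements no matter how many scopes are configured: one parallel-friendly aggregate carrying every scope count, plus the ordinary paged select. A sketch of the pair (reusing the benchmark table above):

```ruby
# One aggregate statement covers every scope count, and unlike the
# window-function variant it can use parallel workers.
counts_sql = <<~SQL
  select
    count(*) as all_count,
    count(*) filter (where id % 2 = 0) as even_count,
    count(*) filter (where id % 2 = 1) as odd_count
  from foo;
SQL

# The paged records query stays exactly as it was, with no window clauses.
page_sql = "select foo.* from foo limit 10;"

queries = [counts_sql, page_sql]
```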
-
As of Rails 5.2 we should be able to do […]. That means this could be a feature that's on by default, with no configuration changes necessary on the user's part. If people have scopes that join in associations, those should probably stay as their own database query.
-
For pre-5.2 there are other "methods" to extract the where clause with binds from a relation, but it's still all through private APIs in Active Record. The methods to extract that clause are different in 4.2 and 5.0, and 5.1 is (I believe) the same as 5.2. You can check out the history of the […]
-
In the case that you have a complex query applied to your index page (perhaps through a filter) that takes a long time, say 150 ms, you can end up losing around a second to scope counts. That's because we perform one database request per scope, and the page in question had five scopes.
Postgres 9.4 added an easy way to request all of these counts in a single database request: aggregate FILTER clauses.
For example, given 5 users who are enabled and 7 users who are disabled:
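The example was presumably along these lines (a reconstruction based on the FILTER queries later in this thread; the `status` column and its values are assumptions): a single statement returns every scope count in one pass over the table.

```sql
select
  count(*)                                    as all_count,
  count(*) filter (where status = 'enabled')  as enabled_count,
  count(*) filter (where status = 'disabled') as disabled_count
from users;

--  all_count | enabled_count | disabled_count
-- -----------+---------------+----------------
--         12 |             5 |              7
```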
The downside is that no other database supports it yet (though it is part of the SQL standard).
Questions: should scope counts use filter?