Speed up scope counts #7868
Replies: 9 comments
-
Can you provide some benchmarks for a possible improvement?
-
Would scope counts be a good place to use AJAX? Perhaps a plugin like @holyketzer's async_panel?
-
I suppose this optimized query doesn't give a meaningful performance boost. Its complexity should be the sum of the complexities of its sub-queries, so your only gain is the reduced network latency of fewer round trips to the database, which is a very small profit.
-
@holyketzer while doing all the calculations in one query does incur the cost of the calculations, it's still faster than doing them separately, because the database only has to load the records once and can then do the calculations while they're in memory. The gains multiply as you add more and more scopes. In this example, against a table with 10 million rows, 1 combined query is twice as fast as 6 separate queries (1 second vs. 2 seconds):
create table foo as select id from generate_series(1, 10000000) id;
SELECT 10000000
Time: 4908.401 ms
select count(*) from foo;
count
----------
10000000
(1 row)
Time: 329.191 ms
select count(*) from foo where id % 2 = 0;
count
---------
5000000
(1 row)
Time: 382.320 ms
select count(*) from foo where id % 2 = 1;
count
---------
5000000
(1 row)
Time: 357.944 ms
select count(*) from foo where id % 3 = 0;
count
---------
3333333
(1 row)
Time: 345.678 ms
select count(*) from foo where id % 4 = 0;
count
---------
2500000
(1 row)
Time: 351.968 ms
select count(*) from foo where id % 5 = 0;
count
---------
2000000
(1 row)
Time: 338.696 ms
select
count(*) as all_count,
count(*) filter (where id % 2 = 0) as even_count,
count(*) filter (where id % 2 = 1) as odd_count,
count(*) filter (where id % 3 = 0) as three_divisible_count,
count(*) filter (where id % 4 = 0) as four_divisible_count,
count(*) filter (where id % 5 = 0) as five_divisible_count
from foo;
all_count | even_count | odd_count | three_divisible_count | four_divisible_count | five_divisible_count
-----------+------------+-----------+-----------------------+----------------------+----------------------
10000000 | 5000000 | 5000000 | 3333333 | 2500000 | 2000000
(1 row)
Time: 936.567 ms
But in my experience, the biggest gain will be in not having to tax the database with expensive […]
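The consolidation above can also be generated mechanically from a list of scopes. A minimal sketch (the `combined_scope_counts_sql` helper and its hash-of-predicates input are assumptions made for illustration, not ActiveAdmin API) that emits the single FILTER-based statement:

```ruby
# Build one FILTER-based counting statement from named scope predicates.
# A nil predicate means "no filter", i.e. the plain count(*).
def combined_scope_counts_sql(table, scopes)
  selects = scopes.map do |name, predicate|
    if predicate
      "count(*) filter (where #{predicate}) as #{name}_count"
    else
      "count(*) as #{name}_count"
    end
  end
  "select\n  #{selects.join(",\n  ")}\nfrom #{table};"
end

sql = combined_scope_counts_sql("foo",
  all:  nil,
  even: "id % 2 = 0",
  odd:  "id % 2 = 1")
puts sql
```

Interpolating raw SQL fragments like this is only safe for trusted, literal scope definitions; anything user-supplied would have to go through bind parameters instead.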
-
We can also potentially eliminate the database request that Kaminari does to build pagination. I mentioned in the original ticket description that
select
count(case when status = 'enabled' then 1 end) as enabled_count,
count(case when status = 'disabled' then 1 end) as disabled_count
from users;
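The CASE WHEN form quoted above is the portable spelling of the same idea: `count` skips NULLs, and the CASE yields NULL for non-matching rows, so it behaves like a FILTER clause even on databases that lack one (MySQL, for example). A small sketch (the `case_when_count` helper name is hypothetical) that emits that form:

```ruby
# Emit the portable CASE WHEN equivalent of `count(*) filter (where ...)`.
# Only rows matching the predicate produce a non-NULL value to count.
def case_when_count(predicate, alias_name)
  "count(case when #{predicate} then 1 end) as #{alias_name}"
end

sql = <<~SQL
  select
    #{case_when_count("status = 'enabled'", "enabled_count")},
    #{case_when_count("status = 'disabled'", "disabled_count")}
  from users;
SQL
```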
-
It's possible to combine this query with the main query that renders the index page by using a window function, meaning we might be able to render the entire UI with just one database request. Here's a set of patches I wrote to test the idea:
ActiveAdmin::Scope.prepend Module.new {
attr_reader :embedded_count
def initialize(name, method = nil, options = {}, &block)
super
@embedded_count = options[:embedded_count]
end
}
class ActiveAdmin::Views::Scopes
def get_scope_count(scope)
if scope.embedded_count
# TODO: it needs access to the already-loaded records that'll be used to render the index, in order to avoid doing a database request here via `first`
collection_before_scope.first.public_send scope.embedded_count
else
collection_size(scope_chain(scope, collection_before_scope))
end
end
end
scope :all, embedded_count: :all_count, default: true
scope :enabled_count, embedded_count: :enabled_count
scope :disabled_count, embedded_count: :disabled_count
controller do
def scoped_collection
super.select <<-SQL.squish
users.*,
count(users.*) over () as all_count,
count(users.*) filter (where users.status = 'enabled') over () as enabled_count,
count(users.*) filter (where users.status = 'disabled') over () as disabled_count
SQL
end
end
I haven't benchmarked it yet. It may be slower than just doing a separate query, depending on how databases implement the count internally (whether the count is done just once, or once for each individual record).
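On the TODO above (avoiding the extra query via `first`): `get_scope_count` only needs any record from the already-loaded page, because a window function writes the same value onto every row. A sketch of that lookup, with a Struct standing in for an ActiveRecord row (`Row` and the sample counts are made up for illustration):

```ruby
# A stand-in for a record loaded by the window-function select above:
# every row carries identical embedded count columns.
Row = Struct.new(:id, :all_count, :enabled_count, :disabled_count)
records = [Row.new(1, 12, 5, 7), Row.new(2, 12, 5, 7)]

def embedded_count(records, column)
  # Every loaded row carries the same window-function value, so any row
  # works. With no rows to read (e.g. an empty page), a real
  # implementation would need to fall back to a separate count query;
  # this sketch simply returns 0.
  records.first&.public_send(column) || 0
end

embedded_count(records, :enabled_count)  # => 5
embedded_count([], :all_count)           # => 0
```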
-
🤔 So it looks like Postgres does the calculations only once, but they're consistently slower than when run separately?
select
count(*) as count,
count(*) filter (where id % 2 = 0) as even,
count(*) filter (where id % 2 = 1) as odd
from foo;
count | even | odd
----------+---------+---------
10000000 | 5000000 | 5000000
(1 row)
Time: 557.827 ms
select
foo.*,
count(*) over () as count,
count(*) filter (where id % 2 = 0) over () as even,
count(*) filter (where id % 2 = 1) over () as odd
from foo
limit 10;
id | count | even | odd
----+----------+---------+---------
1 | 10000000 | 5000000 | 5000000
2 | 10000000 | 5000000 | 5000000
3 | 10000000 | 5000000 | 5000000
4 | 10000000 | 5000000 | 5000000
5 | 10000000 | 5000000 | 5000000
6 | 10000000 | 5000000 | 5000000
7 | 10000000 | 5000000 | 5000000
8 | 10000000 | 5000000 | 5000000
9 | 10000000 | 5000000 | 5000000
10 | 10000000 | 5000000 | 5000000
(10 rows)
Time: 3173.349 ms
explain analyze select
foo.*,
count(*) over () as count,
count(*) filter (where id % 2 = 0) over () as even,
count(*) filter (where id % 2 = 1) over () as odd
from foo
limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.42 rows=10 width=28) (actual time=3986.701..3986.707 rows=10 loops=1)
-> WindowAgg (cost=0.00..419249.80 rows=10000048 width=28) (actual time=3986.701..3986.707 rows=10 loops=1)
-> Seq Scan on foo (cost=0.00..144248.48 rows=10000048 width=4) (actual time=0.016..874.944 rows=10000000 loops=1)
Planning time: 0.064 ms
Execution time: 4008.452 ms
(5 rows)
Time: 4008.842 ms
explain analyze select
foo.*,
count(*) over () as count,
count(*) filter (where id % 2 = 0) over () as even,
count(*) filter (where id % 2 = 1) over () as odd
from foo
limit 1000000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..41924.78 rows=1000000 width=28) (actual time=3921.544..4189.906 rows=1000000 loops=1)
-> WindowAgg (cost=0.00..419249.80 rows=10000048 width=28) (actual time=3921.543..4092.089 rows=1000000 loops=1)
-> Seq Scan on foo (cost=0.00..144248.48 rows=10000048 width=4) (actual time=0.013..869.517 rows=10000000 loops=1)
Planning time: 0.053 ms
Execution time: 4263.458 ms
(5 rows)
Time: 4263.826 ms
Ah, window functions don't support parallel query execution, while regular aggregates do:
explain analyze select
count(*) as count,
count(*) filter (where id % 2 = 0) as even,
count(*) filter (where id % 2 = 1) as odd
from foo;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=133958.44..133958.45 rows=1 width=24) (actual time=840.453..840.453 rows=1 loops=1)
-> Gather (cost=133958.10..133958.41 rows=3 width=24) (actual time=840.390..840.446 rows=4 loops=1)
Workers Planned: 3
Workers Launched: 3
-> Partial Aggregate (cost=132958.10..132958.11 rows=1 width=24) (actual time=833.231..833.232 rows=1 loops=4)
-> Parallel Seq Scan on foo (cost=0.00..76506.22 rows=3225822 width=4) (actual time=0.021..283.485 rows=2500000 loops=4)
Planning time: 0.070 ms
Execution time: 851.399 ms
(8 rows)
Time: 851.822 ms
Still, 2 queries is much better than 2 + (1 query per scope).
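Under that conclusion, the index page costs the same two statements no matter how many scopes are configured: one parallel-friendly aggregate carrying every scope count, plus the ordinary paged select. A sketch of the pair (reusing the benchmark table above):

```ruby
# One aggregate statement covers every scope count, and unlike the
# window-function variant it can use parallel workers.
counts_sql = <<~SQL
  select
    count(*) as all_count,
    count(*) filter (where id % 2 = 0) as even_count,
    count(*) filter (where id % 2 = 1) as odd_count
  from foo;
SQL

# The paged records query stays exactly as it was, with no window clauses.
page_sql = "select foo.* from foo limit 10;"

queries = [counts_sql, page_sql]
```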
-
As of Rails 5.2 we should be able to do […]. That means this could be a feature that's on by default, with no configuration changes necessary on the user's part. If people have scopes that join in associations, those should probably stay as their own database query.
-
For pre-5.2 there are other "methods" to extract the where clause with binds from a relation, but it's still all through private APIs in Active Record. The methods to extract that clause are different in 4.2 and 5.0, and 5.1 is (I believe) the same as 5.2. You can check out the history of the […]
-
In the case that you have a complex query applied to your index page (perhaps through a filter) that takes a long time, say 150 ms, you can end up losing around a second to scope counts. That's because we perform one database request per scope, and the page in question had five scopes.
Postgres 9.4 added an easy way to request all of these counts in a single database request: aggregate FILTER clauses.
For example, given 5 users who are enabled and 7 users who are disabled:
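The example was presumably along these lines (a reconstruction based on the FILTER queries later in this thread; the `status` column and its values are assumptions): a single statement returns every scope count in one pass over the table.

```sql
select
  count(*)                                    as all_count,
  count(*) filter (where status = 'enabled')  as enabled_count,
  count(*) filter (where status = 'disabled') as disabled_count
from users;

--  all_count | enabled_count | disabled_count
-- -----------+---------------+----------------
--         12 |             5 |              7
```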
The downside is that no other database supports it yet (though it is part of the SQL standard).
Questions: should scope counts use filter?