Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arel Extra #189

Merged
merged 16 commits into from
Oct 6, 2019
Merged

Arel Extra #189

merged 16 commits into from
Oct 6, 2019

Conversation

pedrocarmona
Copy link
Contributor

@pedrocarmona pedrocarmona commented Sep 15, 2019

Edit 18 September

  • Rename to Arel (pagy_arel).
  • Made query with Arel methods.
  • [update] Works with Postgres and MySQL.

Initial PR
Improve the query for counting rows of an active record collection with group by.

When user have an active record relation with includes group by clauses, the collection.count(:all) retrieves a hash.
In backend.rb, pagy checks if a result is a hash and returns count based on that. In this PR is introduced aggregated extra, in which the values are never loaded into active record as an hashes, and it knows how to perform count all of rows in group by relations.

  • Only tested in Postgres though. Not sure about other DBs.
  • Did not do actual performance test.

@ddnexus
Copy link
Owner

ddnexus commented Sep 16, 2019

Thank you Pedro!
That looks quite useful! Nice that you organized it as an extra, also consistent with the unwritten principles applied to other extras!

Since it will be an AR/Arel-only extra, it would probably be better if we could name it as something like arel_aggregated or arel_aggregate or arel_count... or whatever is more meaningful (and possibly shorter), still including arel that looks a fundamental component of the logic.

I have a couple of questions:

  • Is there any better way than using a sql string to query the count? It could be more portable between DBs...
  • Is that working with any collection, even not grouped?

@codecov
Copy link

codecov bot commented Sep 18, 2019

Codecov Report

Merging #189 into dev will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@         Coverage Diff         @@
##            dev   #189   +/-   ##
===================================
  Coverage   100%   100%           
===================================
  Files        25     26    +1     
  Lines       514    528   +14     
===================================
+ Hits        514    528   +14
Impacted Files Coverage Δ
lib/pagy/extras/arel.rb 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a2d9b0...2063c47. Read the comment docs.

@pedrocarmona pedrocarmona changed the title Aggregated / Grouped size Arel Extra Sep 18, 2019
@pedrocarmona
Copy link
Contributor Author

updated PR description

@pedrocarmona
Copy link
Contributor Author

now it will work with all collections, grouped and non-grouped (using #group_values)

@ddnexus
Copy link
Owner

ddnexus commented Sep 18, 2019

Hi Pedro, it's starting to look very good now! Thank you!

Just a few of comments:

  • I love the name!
  • When I asked whether "that" would work also with non-grouped collections, I meant to say "the query" not the code. The fewer conditions you test for, the better your code “tastes”. is a good principle, so does that query (or another query) work for both conditions without introducing a condition?
  • It looks like your tests are missing a line

always use `over` keyword even in non grouped collections
@pedrocarmona
Copy link
Contributor Author

Hi Domizio, followed your suggestion and removed condition.
Indeed the query works for both non grouped and grouped collections, so no need to apply condition.

@ddnexus
Copy link
Owner

ddnexus commented Sep 18, 2019

Than you Pedro.
You have a couple of robocop complains that make the tests fail.

@pedrocarmona
Copy link
Contributor Author

When using this in normal collections is not very performant, because it will execute a Seq Scan.

@ddnexus
Copy link
Owner

ddnexus commented Sep 18, 2019

Does that mean that the condition would save time after all?

@ddnexus
Copy link
Owner

ddnexus commented Sep 18, 2019

BTW, I like the idea to use Arel to get the right count instead of having to pass a custom count... we should check if we could find a way that does so efficiently, even if it doesn't "taste" so good :).

I am thinking that it would be a great extra to use in ALL the apps that use AR if it would be efficient. I mean: why would anyone use the vanilla backend if arel does it better?

@pedrocarmona
Copy link
Contributor Author

pedrocarmona commented Sep 18, 2019

Made a smallbenchmark using demo data from [movielens] (Small: 100,000 ratings)(https://grouplens.org/datasets/movielens/):

https://gist.github.com/pedrocarmona/00d6980758343cbd5abae4787a7250f6

it shows that pagy_arel_conditional_count is the best

@ddnexus
Copy link
Owner

ddnexus commented Sep 18, 2019

WOW. The vanilla pagy count with group is crazy slow compared to arel!
I don't know which one of the arel options I would choose though.

In that specific count it looks like the condition slows it down just a bit when it is grouped and improves a bit more in case of non-grouped, which is probably related to the total complexity of the query, since the condition is kind of fixed time IMO.

I may still have a little preference for the non-conditional one.
I think I prefer the non-conditional one, since you can still use the vanilla pagy non-grouping, if you really want to gain a few IPSs.

What do you think?

@pedrocarmona
Copy link
Contributor Author

Regarding benchmark:

Updated the benchmark, I had a loop inside of 1000 iterations inside which was not need. I also added 2 alternatives to the vanilla to check the damage of using is_a? and also storing into a variable. But with the new way of benchmark, there doesn't seam to be a big difference.

Regarding pagy PR changes:

Between benchmark pagy_arel_conditional_count and benchmark pagy_arel_count, I have preference for pagy_arel_conditional_count, because it handles both cases well. The pagy_arel_count is really bad in non grouped collections.

For me the vanilla has advantage of being super simple and readable code, and also, is built on top of the public active record api, which will not easily change - the pagy_arel_conditional_count uses group_values which might be a bit more susceptible to changes as new versions of rails are released.

I think most people should choose the vanilla pagy backend because it is built on top of stable public rails api. The pagy_arel (using the benchmark pagy_arel_conditional_count) could be used for performance in grouped collections (which might not be all cases).

@ddnexus
Copy link
Owner

ddnexus commented Sep 19, 2019

Thank you for using benckmark/ips: I always use it since it is accurate and very easy to read.
All your comments make sense. Please, use the condition as you suggested.

After the benchmarks I think I am going to change the way pagy counts the grouped collections: the group_values is very specific of AR, but also count(:all) is, so it makes sense. I should probably add a rescue that should inform the user about what to do if the collection is not an AR scope.

@ddnexus
Copy link
Owner

ddnexus commented Sep 24, 2019

Hey Pedro, do you plan other changes/additions or the code is complete?
Because if the code is complete, we miss only to update the doc for it.
Do you prefer to do it or do you want me to help?

@pedrocarmona
Copy link
Contributor Author

Hi Domizio,
Added the docs, it is kind of a draft, let me know if you have more suggestions.

While making the docs, I realised I didn't tested both pagy and pagy_arel against a small group (for instance a group resulting in 10 lines. I will add to benchmark that case soon.

@ddnexus ddnexus changed the base branch from master to dev October 6, 2019 04:48
@ddnexus ddnexus merged commit 4856782 into ddnexus:dev Oct 6, 2019
@ddnexus ddnexus added the merged label Mar 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants