-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide find_in_batches and find_each #87
Conversation
@robacarp I think this would be a great addition. I would also like a |
@robacarp I wanted to check in and see if you have made any progress on this feature. Looking forward to adding this new functionality! 💯 |
@drujensen I'm happy to finish this up, but I'm a little unsure of the direction the specs should take since several of the other methods aren't spec'd at all. |
Yeah, the tests are more full integration tests instead of unit tests. I found this more useful since we need to test all 3 dbs. Create a model and populate with test data. Verify find in batches works against all 3 drivers. Let me know if you need help creating these. |
@drujensen I see, I didn't expect specs for these methods to be in this file: https://github.com/amberframework/granite-orm/blob/master/spec/adapter/pg_spec.cr so I overlooked them completely. |
@drujensen Updated, now with specs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. thanks for adding some tests. is there any edge cases we should test like when find_in_batches
doesn't have enough items to create a full batch?
@drujensen yes, there are a few more edge cases which still need to be tested:
Any others you think are worth spec-ing? |
@robacarp that looks like an excellent list. Will you have time to add these? If not, we can look at doing them in another PR down the road. |
@drujensen I will, and should within the week. I can't guarantee that though 😅 This is a little out of scope of this pull, but the repetition between the three driver test suites makes me a little weary. It makes it difficult to see if there actually are any subtle differences between the drivers which need to be maintained. Is something like this out of the question? {% adapters = ["pg","mysql","sqlite"] %}
{% for adapter in adapters %}
{%
parent_class = "Parent#{adapter}".id
user_class = "User#{adapter}".id
child_class = "Child#{adapter}".id
role_class = "Role#{adapter}".id
%}
class {{ parent_class.id }} < Granite::ORM::Base
adapter {{ adapter.id }}
has_many :users
has_many :childs, through: :users
field name : String
timestamps
end
class {{ user_class.id }}< Granite::ORM::Base
adapter {{ adapter.id }}
belongs_to :parent
belongs_to :child
field name : String
field pass : String
field total : Int32
timestamps
end
class {{ child_class.id }}< Granite::ORM::Base
adapter {{ adapter.id }}
has_many :users
has_many :parents, through: :users
field name : String
end
class {{ role_class.id }}< Granite::ORM::Base
adapter {{ adapter.id }}
primary custom_id : Int32
field name : String
end
{{ parent_class }}.exec("DROP TABLE IF EXISTS parents;")
{{ parent_class }}.exec("CREATE TABLE parents (
id BIGSERIAL PRIMARY KEY,
name VARCHAR,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
")
{{ user_class }}.exec("DROP TABLE IF EXISTS users;")
{{ user_class }}.exec("CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
name VARCHAR,
pass VARCHAR,
total INT,
parent_id BIGINT,
child_id BIGINT,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
")
{{ child_class }}.exec("DROP TABLE IF EXISTS childs;")
{{ child_class }}.exec("CREATE TABLE childs (
id BIGSERIAL PRIMARY KEY,
name VARCHAR,
user_id BIGINT,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
")
{{ role_class }}.exec("DROP TABLE IF EXISTS roles;")
{{ role_class }}.exec("CREATE TABLE roles (
custom_id SERIAL PRIMARY KEY,
name VARCHAR
);
")
describe {{ adapter }} do
Spec.before_each do
{{ parent_class }}.clear
{{ child_class }}.clear
{{ role_class }}.clear
end
describe "#all" do
it "finds all the users" do
user = {{ user_class }}.new
user.name = "Test {{ user_class }}"
user.total = 10
user.save
user = {{ user_class }}.new
user.name = "Test {{ user_class }} 2"
user.save
users = {{ user_class }}.all
users.size.should eq 2
end
it "finds users matching clause using named substitution" do
user = {{ user_class }}.new
user.name = "Bob"
user.total = 1000
user.save
user = {{ user_class }}.new
user.name = "Joe"
user.total = 2000
user.save
user = {{ user_class }}.new
user.name = "Joline"
user.total = 3000
user.save
users = {{ user_class }}.all("WHERE name LIKE $1", ["Jo%"])
users.size.should eq 2
end
it "finds users matching clause using multiple named substitutions" do
user = {{ user_class }}.new
user.name = "Bob"
user.total = 1000
user.save
user = {{ user_class }}.new
user.name = "Joe"
user.total = 2000
user.save
user = {{ user_class }}.new
user.name = "Joline"
user.total = 3000
user.save
users = {{ user_class }}.all("WHERE name LIKE ANY(ARRAY[$1, $2])", ["Joe%", "Joline%"])
users.size.should eq 2
end
it "finds users matching clause using question mark substitution" do
user = {{ user_class }}.new
user.name = "Bob"
user.total = 1000
user.save
user = {{ user_class }}.new
user.name = "Joe"
user.total = 2000
user.save
user = {{ user_class }}.new
user.name = "Joline"
user.total = 3000
user.save
users = {{ user_class }}.all("WHERE name LIKE ?", ["Jo%"])
users.size.should eq 2
end
it "finds users matching clause using multiple question mark substitutions" do
user = {{ user_class }}.new
user.name = "Bob"
user.total = 1000
user.save
user = {{ user_class }}.new
user.name = "Joe"
user.total = 2000
user.save
user = {{ user_class }}.new
user.name = "Joline"
user.total = 3000
user.save
users = {{ user_class }}.all("WHERE name LIKE ANY(ARRAY[?, ?])", ["Joe%", "Joline%"])
users.size.should eq 2
end
end
{% end %} This is just bulk find/replace'd code, for demonstration. But I think this would DRY up the test suite, make it more clear that the bulk of methods across the drivers are identical, and most importantly highlight places where they differ. |
No, not at all. That is an excellent suggestion! There are some nuances between each driver but the adapters should try to handle those. What if you create a new test suite in the root folder for your feature and then use this technique for your testing? We can look at moving the other test suites over to this technique if we find it works well. |
@drujensen The most recent commit has a spec which attempts to be polymorphic about testing Let me know your thoughts. |
@robacarp The code is kinda ugly because of the large amount of macros but I think its better than copying the test 3 times per driver. I like that the feature testing is in one place for all three drivers. I think it will make things easier down the road. @amberframework/core-team wdyt? |
@@ -99,6 +99,7 @@ describe Granite::Adapter::Mysql do | |||
Spec.before_each do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this is a dangerous think to do. Spec.before_each
will run before every single test in the test suite. It is not scoped by file nor describe
block.
Often that leads to unintended consequences down the road.
That said, these statements look like they would do no harm, and if they would for a future test, there would likely be a compile time error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marksiemers Right, we are aware if that limitation. Do you have a better recommendation than doing it this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this code, I think it is fine - I don't recommend or expect any changes.
I'm more wary of it in generated code - as the end user may not know the implications of putting things in there.
Any solution I have isn't perfect. You can wrap the code you want in a macro, and insert that macro "before_each" test where it applies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specific pattern was copied straight out of the other tests, but I'm aware that it's not ideal. I toyed around with some other patterns, but it ultimately nothing cleaned up the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought I had is that, should this pattern propagate and be used in a greater portion of the test suite, it may be useful to create a set of models which are used throughout the test suite. Dropped/created in spec_helper, and truncated as here but with the definition of that truncation clearly placed in spec_helper as well.
Hopefully it would be easier to deal with the subtleties of database drivers which have different levels of adherence to the SQL spec in different ways, and that polymorphism would be constrained to the spec_helper as well (or perhaps in a kind of run-at-spec-start migration file for each driver).
As for the heavy use of macros, there may be a better way to write code and I'm open to suggestions. I think it's really important that a project like this have as much parity between drivers as possible. One good way to enforce that is through uniform testing. As features are added, it becomes more and more likely driver tests would diverge in subtle ways.
I'm unsure if this is in the spirit of this project or not, so I wanted to put some code out there and start a discussion before investing enough time to complete the algorithm and write specs.
.all
is a fine way to deal with a couple hundred or so rows in a table, but above that it becomes a performance risk. ActiveRecord provides a nice way to get around this problem by fetching N rows from the database at once, and iterating over them in step.find_each
iterates individual records andfind_in_batches
iterates batches as arrays of records. I consider this pattern essential for large datasets.I noticed that there aren't any tests around the existing finders, which might mean that I'd struggle to set an acceptable spec pattern for these finders.
Assuming that these functions are a desireable addition to granite, here is a list of things I think might be needed to merge this in:
find_in_batches
does not deal at all with the parameterized query componentsit also should check for limit and offset already present in the sql string? The only precedent here isfirst
which does not check for the presence ofLIMIT
at all, so maybe not.returning a lot of rows with SQLThis should be implemented elsewhere / later, I think. It'll require diving into the database drivers explicitly.LIMIT
andOFFSET
isn't the best pattern for these functions, but I think it is minimally viable for all the drivers. Postgres has cursors, I'm not sure what the rest of the databases have. The weakness ofLIMIT
/OFFSET
is that if the database is modified by the code here or elsewhere, duplicated or missing rows are possible.Thoughts?