Provide find_in_batches and find_each #87

robacarp · 2017-11-06T14:40:26Z

I'm unsure if this is in the spirit of this project or not, so I wanted to put some code out there and start a discussion before investing enough time to complete the algorithm and write specs.

.all is a fine way to deal with a couple hundred or so rows in a table, but above that it becomes a performance risk. ActiveRecord provides a nice way to get around this problem by fetching N rows from the database at once, and iterating over them in step. find_each iterates individual records and find_in_batches iterates batches as arrays of records. I consider this pattern essential for large datasets.

I noticed that there aren't any tests around the existing finders, which might mean that I'd struggle to set an acceptable spec pattern for these finders.

Assuming that these functions are a desireable addition to granite, here is a list of things I think might be needed to merge this in:

find_in_batches does not deal at all with the parameterized query components
~~it also should check for limit and offset already present in the sql string? The only precedent here is first which does not check for the presence of LIMIT at all, so maybe not.~~
returning a lot of rows with SQL LIMIT and OFFSET isn't the best pattern for these functions, but I think it is minimally viable for all the drivers. Postgres has cursors, I'm not sure what the rest of the databases have. The weakness of LIMIT/OFFSET is that if the database is modified by the code here or elsewhere, duplicated or missing rows are possible. This should be implemented elsewhere / later, I think. It'll require diving into the database drivers explicitly.
specs for this code?

Thoughts?

drujensen · 2017-11-08T03:00:08Z

@robacarp I think this would be a great addition. I would also like a last as well since I tend to use that often.

drujensen · 2017-11-11T17:31:44Z

@robacarp I wanted to check in and see if you have made any progress on this feature. Looking forward to adding this new functionality! 💯

robacarp · 2017-11-13T15:35:31Z

@drujensen I'm happy to finish this up, but I'm a little unsure of the direction the specs should take since several of the other methods aren't spec'd at all.

drujensen · 2017-11-13T16:12:26Z

Yeah, the tests are more full integration tests instead of unit tests. I found this more useful since we need to test all 3 dbs.

Create a model and populate with test data. Verify find in batches works against all 3 drivers.

Let me know if you need help creating these.

robacarp · 2017-11-14T00:16:16Z

@drujensen I see, I didn't expect specs for these methods to be in this file: https://github.com/amberframework/granite-orm/blob/master/spec/adapter/pg_spec.cr so I overlooked them completely.

robacarp · 2017-11-14T21:54:18Z

@drujensen Updated, now with specs.

drujensen

lgtm. thanks for adding some tests. is there any edge cases we should test like when find_in_batches doesn't have enough items to create a full batch?

robacarp · 2017-11-14T23:41:03Z

@drujensen yes, there are a few more edge cases which still need to be tested:

batch: returning less than batch_size results in a smaller batch
both: never yielding when query is empty
both: starting from some offset other than 0
both: passing in a parameterized query and getting the expected results iterated

Any others you think are worth spec-ing?

drujensen · 2017-11-14T23:47:41Z

@robacarp that looks like an excellent list. Will you have time to add these? If not, we can look at doing them in another PR down the road.

robacarp · 2017-11-15T00:18:07Z

@drujensen I will, and should within the week. I can't guarantee that though 😅

This is a little out of scope of this pull, but the repetition between the three driver test suites makes me a little weary. It makes it difficult to see if there actually are any subtle differences between the drivers which need to be maintained. Is something like this out of the question?

{% adapters = ["pg","mysql","sqlite"] %}

{% for adapter in adapters %}

  {%
   parent_class = "Parent#{adapter}".id
   user_class = "User#{adapter}".id
   child_class = "Child#{adapter}".id
   role_class = "Role#{adapter}".id
  %}

class {{ parent_class.id }} < Granite::ORM::Base
  adapter {{ adapter.id }}

  has_many :users
  has_many :childs, through: :users

  field name : String
  timestamps
end

class {{ user_class.id }}< Granite::ORM::Base
  adapter {{ adapter.id }}

  belongs_to :parent
  belongs_to :child

  field name : String
  field pass : String
  field total : Int32
  timestamps
end

class {{ child_class.id }}< Granite::ORM::Base
  adapter {{ adapter.id }}

  has_many :users
  has_many :parents, through: :users

  field name : String
end


class {{ role_class.id }}< Granite::ORM::Base
  adapter {{ adapter.id }}

  primary custom_id : Int32
  field name : String
end

{{ parent_class }}.exec("DROP TABLE IF EXISTS parents;")
{{ parent_class }}.exec("CREATE TABLE parents (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
")

{{ user_class }}.exec("DROP TABLE IF EXISTS users;")
{{ user_class }}.exec("CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR,
  pass VARCHAR,
  total INT,
  parent_id BIGINT,
  child_id BIGINT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
")

{{ child_class }}.exec("DROP TABLE IF EXISTS childs;")
{{ child_class }}.exec("CREATE TABLE childs (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR,
  user_id BIGINT,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
")

{{ role_class }}.exec("DROP TABLE IF EXISTS roles;")
{{ role_class }}.exec("CREATE TABLE roles (
  custom_id SERIAL PRIMARY KEY,
  name VARCHAR
);
")


describe {{ adapter }} do
  Spec.before_each do
    {{ parent_class }}.clear
    {{ child_class }}.clear
    {{ role_class }}.clear
  end

  describe "#all" do
    it "finds all the users" do
      user = {{ user_class }}.new
      user.name = "Test {{ user_class }}"
      user.total = 10
      user.save
      user = {{ user_class }}.new
      user.name = "Test {{ user_class }} 2"
      user.save
      users = {{ user_class }}.all
      users.size.should eq 2
    end

    it "finds users matching clause using named substitution" do
      user = {{ user_class }}.new
      user.name = "Bob"
      user.total = 1000
      user.save
      user = {{ user_class }}.new
      user.name = "Joe"
      user.total = 2000
      user.save
      user = {{ user_class }}.new
      user.name = "Joline"
      user.total = 3000
      user.save

      users = {{ user_class }}.all("WHERE name LIKE $1", ["Jo%"])
      users.size.should eq 2
    end

    it "finds users matching clause using multiple named substitutions" do
      user = {{ user_class }}.new
      user.name = "Bob"
      user.total = 1000
      user.save
      user = {{ user_class }}.new
      user.name = "Joe"
      user.total = 2000
      user.save
      user = {{ user_class }}.new
      user.name = "Joline"
      user.total = 3000
      user.save

      users = {{ user_class }}.all("WHERE name LIKE ANY(ARRAY[$1, $2])", ["Joe%", "Joline%"])
      users.size.should eq 2
    end

    it "finds users matching clause using question mark substitution" do
      user = {{ user_class }}.new
      user.name = "Bob"
      user.total = 1000
      user.save
      user = {{ user_class }}.new
      user.name = "Joe"
      user.total = 2000
      user.save
      user = {{ user_class }}.new
      user.name = "Joline"
      user.total = 3000
      user.save

      users = {{ user_class }}.all("WHERE name LIKE ?", ["Jo%"])
      users.size.should eq 2
    end

    it "finds users matching clause using multiple question mark substitutions" do
      user = {{ user_class }}.new
      user.name = "Bob"
      user.total = 1000
      user.save
      user = {{ user_class }}.new
      user.name = "Joe"
      user.total = 2000
      user.save
      user = {{ user_class }}.new
      user.name = "Joline"
      user.total = 3000
      user.save

      users = {{ user_class }}.all("WHERE name LIKE ANY(ARRAY[?, ?])", ["Joe%", "Joline%"])
      users.size.should eq 2
    end
  end

{% end %}

This is just bulk find/replace'd code, for demonstration. But I think this would DRY up the test suite, make it more clear that the bulk of methods across the drivers are identical, and most importantly highlight places where they differ.

drujensen · 2017-11-15T00:21:55Z

No, not at all. That is an excellent suggestion! There are some nuances between each driver but the adapters should try to handle those.

What if you create a new test suite in the root folder for your feature and then use this technique for your testing? We can look at moving the other test suites over to this technique if we find it works well.

robacarp · 2017-11-15T18:50:46Z

@drujensen The most recent commit has a spec which attempts to be polymorphic about testing #find_each and #find_in_batches, and the code isn't bad. Since granite doesn't provide migration functionality, it got messy around primary keys and other places in the table definitions, so I kept the schema for the test table to an absolute minimum. However, I think this is a good thing and will help to enforce testing only one thing at a time.

Let me know your thoughts.

drujensen · 2017-11-15T19:24:17Z

@robacarp The code is kinda ugly because of the large amount of macros but I think its better than copying the test 3 times per driver. I like that the feature testing is in one place for all three drivers. I think it will make things easier down the road.

@amberframework/core-team wdyt?

marksiemers · 2017-11-15T20:48:45Z

spec/adapter/mysql_spec.cr

@@ -99,6 +99,7 @@ describe Granite::Adapter::Mysql do
  Spec.before_each do


Unfortunately, this is a dangerous think to do. Spec.before_each will run before every single test in the test suite. It is not scoped by file nor describe block.

Often that leads to unintended consequences down the road.

That said, these statements look like they would do no harm, and if they would for a future test, there would likely be a compile time error.

@marksiemers Right, we are aware if that limitation. Do you have a better recommendation than doing it this way?

For this code, I think it is fine - I don't recommend or expect any changes.

I'm more wary of it in generated code - as the end user may not know the implications of putting things in there.

Any solution I have isn't perfect. You can wrap the code you want in a macro, and insert that macro "before_each" test where it applies.

This specific pattern was copied straight out of the other tests, but I'm aware that it's not ideal. I toyed around with some other patterns, but it ultimately nothing cleaned up the pattern.

One thought I had is that, should this pattern propagate and be used in a greater portion of the test suite, it may be useful to create a set of models which are used throughout the test suite. Dropped/created in spec_helper, and truncated as here but with the definition of that truncation clearly placed in spec_helper as well.

Hopefully it would be easier to deal with the subtleties of database drivers which have different levels of adherence to the SQL spec in different ways, and that polymorphism would be constrained to the spec_helper as well (or perhaps in a kind of run-at-spec-start migration file for each driver).

As for the heavy use of macros, there may be a better way to write code and I'm open to suggestions. I think it's really important that a project like this have as much parity between drivers as possible. One good way to enforce that is through uniform testing. As features are added, it becomes more and more likely driver tests would diverge in subtle ways.

Provide find_in_batches and find_each

ed35e5d

eliasjpr assigned drujensen Nov 7, 2017

specs for find_each and find_in_batches

14f12bd

drujensen approved these changes Nov 14, 2017

View reviewed changes

Fills out tests for find_each and find_in_batches

345fa6c

marksiemers approved these changes Nov 15, 2017

View reviewed changes

drujensen merged commit f57c7ac into amberframework:master Nov 18, 2017

robacarp deleted the find_each branch November 18, 2017 06:03

robacarp mentioned this pull request Dec 9, 2017

Cleaning up the test suite #100

Merged

faustinoaq added this to Done in Framework 2018 May 5, 2018

faustinoaq removed this from Done in Framework 2018 Jun 12, 2018

faustinoaq added this to Done in Amber Framework 2017 Jun 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide find_in_batches and find_each #87

Provide find_in_batches and find_each #87

robacarp commented Nov 6, 2017 •

edited

drujensen commented Nov 8, 2017

drujensen commented Nov 11, 2017

robacarp commented Nov 13, 2017

drujensen commented Nov 13, 2017

robacarp commented Nov 14, 2017

robacarp commented Nov 14, 2017

drujensen left a comment

robacarp commented Nov 14, 2017 •

edited

drujensen commented Nov 14, 2017

robacarp commented Nov 15, 2017

drujensen commented Nov 15, 2017

robacarp commented Nov 15, 2017

drujensen commented Nov 15, 2017

marksiemers Nov 15, 2017

drujensen Nov 15, 2017 •

edited

marksiemers Nov 15, 2017

robacarp Nov 16, 2017

robacarp Nov 16, 2017

		@@ -99,6 +99,7 @@ describe Granite::Adapter::Mysql do
		Spec.before_each do

Provide find_in_batches and find_each #87

Provide find_in_batches and find_each #87

Conversation

robacarp commented Nov 6, 2017 • edited

drujensen commented Nov 8, 2017

drujensen commented Nov 11, 2017

robacarp commented Nov 13, 2017

drujensen commented Nov 13, 2017

robacarp commented Nov 14, 2017

robacarp commented Nov 14, 2017

drujensen left a comment

Choose a reason for hiding this comment

robacarp commented Nov 14, 2017 • edited

drujensen commented Nov 14, 2017

robacarp commented Nov 15, 2017

drujensen commented Nov 15, 2017

robacarp commented Nov 15, 2017

drujensen commented Nov 15, 2017

marksiemers Nov 15, 2017

Choose a reason for hiding this comment

drujensen Nov 15, 2017 • edited

Choose a reason for hiding this comment

marksiemers Nov 15, 2017

Choose a reason for hiding this comment

robacarp Nov 16, 2017

Choose a reason for hiding this comment

robacarp Nov 16, 2017

Choose a reason for hiding this comment

robacarp commented Nov 6, 2017 •

edited

robacarp commented Nov 14, 2017 •

edited

drujensen Nov 15, 2017 •

edited