Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk import #168

Merged
merged 24 commits into from
Apr 26, 2018
Merged

Bulk import #168

merged 24 commits into from
Apr 26, 2018

Conversation

Blacksmoke16
Copy link
Contributor

@Blacksmoke16 Blacksmoke16 commented Mar 30, 2018

PR for #156

Progress

  • Create method/initial MySQL adapter code
  • Find a better way to handle quoting of String datatypes.
  • Test with various MySQL/granite datatypes
  • Handle ON DUPLICATE KEY UPDATE/IGNORE
  • Add Postgres/sqlite adapter versions
  • Add tests
  • Update Readme

Questions:

  1. How should the lifecycle hooks be handled?
    Either, hooks would not run for each model being inserted, but would get triggered once all are inserted and such.

next unless model.valid?
stmt << "("
stmt << model.to_h[primary_name].to_s + ", " unless primary_auto
stmt << fields.map { |field| model.to_h[field].is_a?(String) ? "'" + model.to_h[field].to_s + "'" : model.to_h[field] }.join(", ")
Copy link
Contributor Author

@Blacksmoke16 Blacksmoke16 Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of how I'm handling the quoting of String type fields. Any better ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The adapters now provide a quote method but the functionality is has some bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quoting method I do use for column/table names. However, since it quotes with the backtick that couldn't be used for string values afaik?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't see that you are quoting values. A parameterized query is better for that, and I believe the adapters automatically do what they need to with values in parameterized queries.

Copy link
Contributor Author

@Blacksmoke16 Blacksmoke16 Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial thought, however when I tried it I kept getting like db params invalid format or something like that (not home atm to look). The params should just be an array of values, which the first value of array maps to first ? placeholder correct?

I'll mess with it again and see if I can get the params working prob this weekend.

Copy link
Member

@robacarp robacarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, I think it's going to be a good feature.

.gitignore Outdated
@@ -4,6 +4,7 @@
/.crystal/
/doc/
/config/*.db
/.idea/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is an editor thing. You should setup and ignore your editor files in a system wide global gitignore file instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Blacksmoke16 interesting, Are you coding with Intellij Idea? just curious, What plugin are you using to have crystal support? 😅

Copy link
Contributor Author

@Blacksmoke16 Blacksmoke16 Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@faustinoaq I am yes :) Using Rubymine with https://github.com/intellij-crystal/intellij-crystal

Is pretty meh in regards to full native support like Ruby in Rubymine, but it provides syntax highlighting and that's good enough for me atm.

next unless model.valid?
stmt << "("
stmt << model.to_h[primary_name].to_s + ", " unless primary_auto
stmt << fields.map { |field| model.to_h[field].is_a?(String) ? "'" + model.to_h[field].to_s + "'" : model.to_h[field] }.join(", ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The adapters now provide a quote method but the functionality is has some bugs.

# the array must contain only one model class
# invalid model records will be skipped
def self.import(model_array)
raise ArgumentError.new("Model class mismatch: expected array of only #{self} models.") unless model_array.all? { |model| model.class == self }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this definition be changed so that the compiler will enforce the type constraint instead of a runtime error?

First guess, something like this might work:

def self.import(model_array : Array(self))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo yes! I will do that tonight (still getting used to static typed land :P)

@faustinoaq faustinoaq added this to To Do in Framework 2018 via automation Mar 30, 2018
@faustinoaq faustinoaq moved this from To Do to In progress in Framework 2018 Mar 30, 2018
Add type filter to import class method
Reformat some code
next unless model.valid?
stmt << '('
stmt << "?," unless primary_auto
stmt << Array.new(fields.size, '?').join(',')
Copy link
Contributor Author

@Blacksmoke16 Blacksmoke16 Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unless primary_auto references above/below here can be removed once #169 gets merged, since fields will cover it by default.

Probably also be able to remove the primary_auto param since that is only reason i needed it.

@Blacksmoke16
Copy link
Contributor Author

@robacarp What do you think the best way to handle the lifecycle hooks is for this?

Best approach/worth doing/etc?

@Blacksmoke16
Copy link
Contributor Author

Blacksmoke16 commented Apr 1, 2018

Am currently having some issues with timestamps and/or the import statement setting timestamps.

https://travis-ci.org/amberframework/granite-orm/builds/360749166#L579

Am not sure what is causing it...have to keep messing with it.

EDIT:
Locally it also breaks the timestamps tests, just by inserting in the records. However, if i do not inert then try and to Model.find x it works fine. It is only when trying to read a value AFTER bulk inserting a record does it blow up.

Error I am getting locally:

spec_1   |  11) mysql timestamps truncates the subsecond parts of updated_at
spec_1   | 
spec_1   |        MySql::ResultSet#read returned a String. A (Int64 | Nil) was expected.
spec_1   |        lib/db/src/db/result_set.cr:0:9 in 'read'
spec_1   |        spec/spec_models.cr:6:1 in 'set_attributes'
spec_1   |        spec/spec_models.cr:6:1 in 'from_sql'
spec_1   |        src/granite_orm/querying.cr:84:13 in 'find_by?'
spec_1   |        src/granite_orm/querying.cr:90:5 in 'find_by'
spec_1   |        src/granite_orm/querying.cr:77:12 in 'find'
spec_1   |        /usr/share/crystal/src/spec/expectations.cr:0:7 in '~proc10Proc(Nil)'
spec_1   |        /usr/share/crystal/src/spec/methods.cr:255:3 in 'it'
spec_1   |        /usr/share/crystal/src/spec/expectations.cr:0:7 in '~proc6Proc(Nil)'
spec_1   |        /usr/share/crystal/src/spec/context.cr:255:3 in 'describe'
spec_1   |        /usr/share/crystal/src/spec/methods.cr:16:5 in 'describe'
spec_1   |        spec/granite_orm/fields/timestamps_spec.cr:4:1 in '__crystal_main'
spec_1   |        /usr/share/crystal/src/crystal/main.cr:11:3 in '_crystal_main'
spec_1   |        /usr/share/crystal/src/crystal/main.cr:112:5 in 'main_user_code'
spec_1   |        /usr/share/crystal/src/crystal/main.cr:101:7 in 'main'
spec_1   |        /usr/share/crystal/src/crystal/main.cr:135:3 in 'main'
spec_1   |        __libc_start_main
spec_1   |        _start
spec_1   |        ???

A list of all the affected test:

spec_1   | crystal spec spec/granite_orm/transactions/import_spec.cr:3 # pg .import should import 3 new objects
spec_1   | crystal spec spec/granite_orm/transactions/import_spec.cr:3 # mysql .import should import 3 new objects
spec_1   | crystal spec spec/granite_orm/transactions/import_spec.cr:3 # sqlite .import should import 3 new objects
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # pg timestamps consistently uses UTC for created_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # pg timestamps consistently uses UTC for updated_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # pg timestamps truncates the subsecond parts of created_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # pg timestamps truncates the subsecond parts of updated_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # mysql timestamps consistently uses UTC for created_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # mysql timestamps consistently uses UTC for updated_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # mysql timestamps truncates the subsecond parts of created_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # mysql timestamps truncates the subsecond parts of updated_at

@robacarp
Copy link
Member

robacarp commented Apr 2, 2018

@Blacksmoke16

  • My guess about the error you're getting is that you're not adding the id to the select statement, or that the fields are in the wrong order.
  • Re: lifecycle events, what did you have in mind? Something other than the before/after create?

@Blacksmoke16
Copy link
Contributor Author

Blacksmoke16 commented Apr 2, 2018

I'll look into that, thanks!

Probably best to just keep it before_create, run import statement, after_create? Mostly my question is how that would specifically work since each model isn't saved normally so without running the callback in the .each of models_array it wouldn't really be similar to Model.new(xxx).save if that makes sense....

EDIT: To be more clear, should the callbacks be ran for EACH model like before_save, add to insert string, after_create. OR should the callbacks be executed before and after the import statement is ran.

@robacarp
Copy link
Member

robacarp commented Apr 2, 2018

@Blacksmoke16 ah, of course. I think they should run for each record. For example a user object which needs a password hash is not a valid record until the before_create executes.

Proposing a workflow here:

  • Model.import(array_of_records) is called
  • before_create is called on each object
  • every object is inserted
  • bail if something errored
  • after_create is called on each object
  • Model.import returns

@amberframework/granite-orm-contributors thoughts?

@robacarp
Copy link
Member

robacarp commented Apr 2, 2018

@Blacksmoke16 FYI, depending on your personal git workflow, you may need to update your .git/config to point to the corrected granite repository, as of my mistake in #151.

end.chomp(',')

if update_keys = options["on_duplicate_key_update"]?
statement += " ON CONFLICT (#{quote(primary_name)}) DO UPDATE SET "
Copy link
Member

@robacarp robacarp Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems right, but would you mind explaining the behavior in a terse comment for future maintainers? (And me...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I plan on updating the READ me with info on how to use everything. But basic this was part of Postgres' 9.5 version:

https://www.postgresql.org/docs/9.5/static/sql-insert.html

It's basic the PostGres version of MySQL ON DUPLICATE KEY UPDATE

From the docs:

INSERT INTO distributors (did, dname)
    VALUES (5, 'Gizmo Transglobal'), (6, 'Associated Computing, Inc')
    ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname;

Is basic saying if there is a conflict on the did value, then dname should be updated to the new value that would have otherwise been excluded.

@@ -76,6 +76,47 @@ class Granite::Adapter::Pg < Granite::Adapter::Base
end
end

def import(table_name : String, primary_name : String, fields, model_array, **options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your **options can be a little cleaner and have a similar effect:

def import(table_name : String, primary_name : String, fields, model_array : Array(self), *, update_on_duplicate_key = false)

Copy link
Contributor Author

@Blacksmoke16 Blacksmoke16 Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean i would need to have a defined param for each options right?

As of now there is on_duplicate_key_ignore: Bool, on_duplicate_key_update: Array(Strings) to be updated, and possibly in future also validate: true, as giving option to ignore validation might speed up large imports. I don't have strong feelings one way or another.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably have the best of both worlds using a crystal overloaded method signature. If you need, here are the docs on overloading

Copy link
Member

@robacarp robacarp Apr 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t have strong opinions on this either. I just noticed a rubyism that might be able to be cleaned up with some crystal.

@Blacksmoke16
Copy link
Contributor Author

Just a little update on this. Been busy with finals coming up and such, will make some more commits this weekend.

# The save method will check to see if the primary exists yet. If it does it
# will call the update method, otherwise it will call the create method.
# This will update the timestamps apropriately.
# This will update the timestamps appropriately.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

update_keys.each do |key|
statement += "#{quote(key)}=VALUES(#{quote(key)}), "
end
statement = statement.chomp(", ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does mysql not have an ON DUPLICATE IGNORE option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robacarp
Copy link
Member

robacarp commented Apr 7, 2018

@Blacksmoke16 I don't know because I didn't test this, but I think you'll be able to trigger the callbacks from the class level by using these methods. If it's not straightforward to trigger the callbacks externally, I'm fine pushing the callbacks to a later pull for the import feature. There's no need to get way out into the weeds.

@Blacksmoke16
Copy link
Contributor Author

I'll poke around with it this weekend and we can go from there. Thanks!

@Blacksmoke16
Copy link
Contributor Author

@robacarp from my testing I am able to trigger the callback successfully to a degree. Since we are in a class method, the method that is to be called in the model doesn't exist.

in macro '__run_before_create' expanded macro: macro_139699359621456:26, line 2:

   1.       
>  2.          increment
   3.       
   4.     

undefined local variable or method 'increment'

Where increment is the name of my before_create method.

@Blacksmoke16 Blacksmoke16 changed the title [WIP] Bulk import Bulk import Apr 12, 2018
README.md Outdated

Each model has an `import` class level method to import an array of models in one bulk insert statement.
```Crystal
models [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, missed = on these, i'll fix that tonight.

end
end

def self.import(model_array : Array(self), update_on_duplicate : Bool, columns : Array(String))
Copy link
Member

@drujensen drujensen Apr 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you would never use update_on_duplicate and ignore_on_duplicate what if we use an enum instead of boolean flags that has options for how to handle duplicates? maybe duplicates: ignore duplicates: update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a nice way to also require columns key when using the update option?

Copy link
Member

@drujensen drujensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Blacksmoke16 did you figure out the callbacks?

Also, what happens when one of the rows fails to import? Do all of them rollback?

@Blacksmoke16
Copy link
Contributor Author

Sadly no, the issue is trying to run a model's instance method from the class scope. As of now that one __run_before_create method does not have access to them so it cannot run. Possibly can revisit this in another issue, to at least get the feature out.

I would have to test it if you have a specific case, otherwise iirc the whole import fails since it is just one SQL query.

Copy link
Member

@robacarp robacarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work here, I think this is a nice feature to add.

Copy link
Member

@drujensen drujensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 Excellent work.

@drujensen drujensen merged commit 644491e into amberframework:master Apr 26, 2018
Framework 2018 automation moved this from In progress to Done Apr 26, 2018
@Blacksmoke16 Blacksmoke16 deleted the bulk-import branch May 9, 2018 04:34
@faustinoaq
Copy link
Contributor

I just added Bulk import to amber documentation as well 🎉

https://amberframework.gitbook.io/amber/guides/models/granite/bulk-insertions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants