Bulk import #168

Blacksmoke16 · 2018-03-30T02:46:59Z

PR for #156

Progress

Create method/initial MySQL adapter code
Find a better way to handle quoting of String datatypes.
Test with various MySQL/granite datatypes
Handle ON DUPLICATE KEY UPDATE/IGNORE
Add Postgres/sqlite adapter versions
Add tests
Update Readme

Questions:

How should the lifecycle hooks be handled?
Either, hooks would not run for each model being inserted, but would get triggered once all are inserted and such.

Blacksmoke16 · 2018-03-30T02:49:15Z

src/adapter/mysql.cr

+        next unless model.valid?
+        stmt << "("
+        stmt << model.to_h[primary_name].to_s + ", " unless primary_auto
+        stmt << fields.map { |field| model.to_h[field].is_a?(String) ? "'" + model.to_h[field].to_s + "'" : model.to_h[field] }.join(", ")


I'm not a fan of how I'm handling the quoting of String type fields. Any better ideas?

The adapters now provide a quote method but the functionality is has some bugs.

The quoting method I do use for column/table names. However, since it quotes with the backtick that couldn't be used for string values afaik?

Sorry, I didn't see that you are quoting values. A parameterized query is better for that, and I believe the adapters automatically do what they need to with values in parameterized queries.

That was my initial thought, however when I tried it I kept getting like db params invalid format or something like that (not home atm to look). The params should just be an array of values, which the first value of array maps to first ? placeholder correct?

I'll mess with it again and see if I can get the params working prob this weekend.

robacarp

Thanks for working on this, I think it's going to be a good feature.

robacarp · 2018-03-30T14:59:43Z

.gitignore

@@ -4,6 +4,7 @@
 /.crystal/
 /doc/
 /config/*.db
+/.idea/


Oh, this is an editor thing. You should setup and ignore your editor files in a system wide global gitignore file instead.

@Blacksmoke16 interesting, Are you coding with Intellij Idea? just curious, What plugin are you using to have crystal support? 😅

@faustinoaq I am yes :) Using Rubymine with https://github.com/intellij-crystal/intellij-crystal

Is pretty meh in regards to full native support like Ruby in Rubymine, but it provides syntax highlighting and that's good enough for me atm.

robacarp · 2018-03-30T15:01:33Z

src/adapter/mysql.cr

+        next unless model.valid?
+        stmt << "("
+        stmt << model.to_h[primary_name].to_s + ", " unless primary_auto
+        stmt << fields.map { |field| model.to_h[field].is_a?(String) ? "'" + model.to_h[field].to_s + "'" : model.to_h[field] }.join(", ")


The adapters now provide a quote method but the functionality is has some bugs.

robacarp · 2018-03-30T15:02:26Z

src/granite_orm/transactions.cr

+    # the array must contain only one model class
+    # invalid model records will be skipped
+    def self.import(model_array)
+      raise ArgumentError.new("Model class mismatch:  expected array of only #{self} models.") unless model_array.all? { |model| model.class == self }


Can this definition be changed so that the compiler will enforce the type constraint instead of a runtime error?

First guess, something like this might work:

def self.import(model_array : Array(self))

Ooo yes! I will do that tonight (still getting used to static typed land :P)

Add type filter to import class method Reformat some code

Blacksmoke16 · 2018-03-30T23:24:56Z

src/adapter/mysql.cr

+        next unless model.valid?
+        stmt << '('
+        stmt << "?," unless primary_auto
+        stmt << Array.new(fields.size, '?').join(',')


The unless primary_auto references above/below here can be removed once #169 gets merged, since fields will cover it by default.

Probably also be able to remove the primary_auto param since that is only reason i needed it.

Blacksmoke16 · 2018-03-31T15:01:41Z

@robacarp What do you think the best way to handle the lifecycle hooks is for this?

Best approach/worth doing/etc?

Handle timestamps Start of specs Ran crystal tool format

Blacksmoke16 · 2018-04-01T00:03:03Z

Am currently having some issues with timestamps and/or the import statement setting timestamps.

https://travis-ci.org/amberframework/granite-orm/builds/360749166#L579

Am not sure what is causing it...have to keep messing with it.

EDIT:
Locally it also breaks the timestamps tests, just by inserting in the records. However, if i do not inert then try and to Model.find x it works fine. It is only when trying to read a value AFTER bulk inserting a record does it blow up.

Error I am getting locally:

spec_1   |  11) mysql timestamps truncates the subsecond parts of updated_at
spec_1   | 
spec_1   |        MySql::ResultSet#read returned a String. A (Int64 | Nil) was expected.
spec_1   |        lib/db/src/db/result_set.cr:0:9 in 'read'
spec_1   |        spec/spec_models.cr:6:1 in 'set_attributes'
spec_1   |        spec/spec_models.cr:6:1 in 'from_sql'
spec_1   |        src/granite_orm/querying.cr:84:13 in 'find_by?'
spec_1   |        src/granite_orm/querying.cr:90:5 in 'find_by'
spec_1   |        src/granite_orm/querying.cr:77:12 in 'find'
spec_1   |        /usr/share/crystal/src/spec/expectations.cr:0:7 in '~proc10Proc(Nil)'
spec_1   |        /usr/share/crystal/src/spec/methods.cr:255:3 in 'it'
spec_1   |        /usr/share/crystal/src/spec/expectations.cr:0:7 in '~proc6Proc(Nil)'
spec_1   |        /usr/share/crystal/src/spec/context.cr:255:3 in 'describe'
spec_1   |        /usr/share/crystal/src/spec/methods.cr:16:5 in 'describe'
spec_1   |        spec/granite_orm/fields/timestamps_spec.cr:4:1 in '__crystal_main'
spec_1   |        /usr/share/crystal/src/crystal/main.cr:11:3 in '_crystal_main'
spec_1   |        /usr/share/crystal/src/crystal/main.cr:112:5 in 'main_user_code'
spec_1   |        /usr/share/crystal/src/crystal/main.cr:101:7 in 'main'
spec_1   |        /usr/share/crystal/src/crystal/main.cr:135:3 in 'main'
spec_1   |        __libc_start_main
spec_1   |        _start
spec_1   |        ???

A list of all the affected test:

spec_1   | crystal spec spec/granite_orm/transactions/import_spec.cr:3 # pg .import should import 3 new objects
spec_1   | crystal spec spec/granite_orm/transactions/import_spec.cr:3 # mysql .import should import 3 new objects
spec_1   | crystal spec spec/granite_orm/transactions/import_spec.cr:3 # sqlite .import should import 3 new objects
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # pg timestamps consistently uses UTC for created_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # pg timestamps consistently uses UTC for updated_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # pg timestamps truncates the subsecond parts of created_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # pg timestamps truncates the subsecond parts of updated_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # mysql timestamps consistently uses UTC for created_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # mysql timestamps consistently uses UTC for updated_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # mysql timestamps truncates the subsecond parts of created_at
spec_1   | crystal spec spec/granite_orm/fields/timestamps_spec.cr:4 # mysql timestamps truncates the subsecond parts of updated_at

robacarp · 2018-04-02T17:02:35Z

@Blacksmoke16

My guess about the error you're getting is that you're not adding the id to the select statement, or that the fields are in the wrong order.
Re: lifecycle events, what did you have in mind? Something other than the before/after create?

Blacksmoke16 · 2018-04-02T17:06:34Z

I'll look into that, thanks!

Probably best to just keep it before_create, run import statement, after_create? Mostly my question is how that would specifically work since each model isn't saved normally so without running the callback in the .each of models_array it wouldn't really be similar to Model.new(xxx).save if that makes sense....

EDIT: To be more clear, should the callbacks be ran for EACH model like before_save, add to insert string, after_create. OR should the callbacks be executed before and after the import statement is ran.

robacarp · 2018-04-02T17:42:51Z

@Blacksmoke16 ah, of course. I think they should run for each record. For example a user object which needs a password hash is not a valid record until the before_create executes.

Proposing a workflow here:

Model.import(array_of_records) is called
before_create is called on each object
every object is inserted
bail if something errored
after_create is called on each object
Model.import returns

@amberframework/granite-orm-contributors thoughts?

robacarp · 2018-04-02T17:44:27Z

@Blacksmoke16 FYI, depending on your personal git workflow, you may need to update your .git/config to point to the corrected granite repository, as of my mistake in #151.

robacarp · 2018-04-02T23:48:45Z

src/adapter/pg.cr

+    end.chomp(',')
+
+    if update_keys = options["on_duplicate_key_update"]?
+      statement += " ON CONFLICT (#{quote(primary_name)}) DO UPDATE SET "


This seems right, but would you mind explaining the behavior in a terse comment for future maintainers? (And me...)

Yea, I plan on updating the READ me with info on how to use everything. But basic this was part of Postgres' 9.5 version:

https://www.postgresql.org/docs/9.5/static/sql-insert.html

It's basic the PostGres version of MySQL ON DUPLICATE KEY UPDATE

From the docs:

INSERT INTO distributors (did, dname) VALUES (5, 'Gizmo Transglobal'), (6, 'Associated Computing, Inc') ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname;

Is basic saying if there is a conflict on the did value, then dname should be updated to the new value that would have otherwise been excluded.

robacarp · 2018-04-02T23:51:06Z

src/adapter/pg.cr

@@ -76,6 +76,47 @@ class Granite::Adapter::Pg < Granite::Adapter::Base
    end
  end

+  def import(table_name : String, primary_name : String, fields, model_array, **options)


your **options can be a little cleaner and have a similar effect:

def import(table_name : String, primary_name : String, fields, model_array : Array(self), *, update_on_duplicate_key = false)

That would mean i would need to have a defined param for each options right?

As of now there is on_duplicate_key_ignore: Bool, on_duplicate_key_update: Array(Strings) to be updated, and possibly in future also validate: true, as giving option to ignore validation might speed up large imports. I don't have strong feelings one way or another.

You can probably have the best of both worlds using a crystal overloaded method signature. If you need, here are the docs on overloading

I don’t have strong opinions on this either. I just noticed a rubyism that might be able to be cleaned up with some crystal.

Blacksmoke16 · 2018-04-04T19:19:40Z

Just a little update on this. Been busy with finals coming up and such, will make some more commits this weekend.

robacarp · 2018-04-07T16:15:53Z

src/granite_orm/transactions.cr

    # The save method will check to see if the primary exists yet. If it does it
    # will call the update method, otherwise it will call the create method.
-    # This will update the timestamps apropriately.
+    # This will update the timestamps appropriately.


robacarp · 2018-04-07T16:17:33Z

src/adapter/mysql.cr

+      update_keys.each do |key|
+        statement += "#{quote(key)}=VALUES(#{quote(key)}), "
+      end
+      statement = statement.chomp(", ")


does mysql not have an ON DUPLICATE IGNORE option?

It does but its https://github.com/amberframework/granite/pull/168/files/4b10e27aaa4d3f76b14e13976899b13232ce0233#diff-ce7945b70352823cf7315115d935f0ccR86

INSERT IGNORE ...

robacarp · 2018-04-07T16:22:24Z

@Blacksmoke16 I don't know because I didn't test this, but I think you'll be able to trigger the callbacks from the class level by using these methods. If it's not straightforward to trigger the callbacks externally, I'm fine pushing the callbacks to a later pull for the import feature. There's no need to get way out into the weeds.

Blacksmoke16 · 2018-04-07T16:26:54Z

I'll poke around with it this weekend and we can go from there. Thanks!

Blacksmoke16 · 2018-04-08T01:39:09Z

@robacarp from my testing I am able to trigger the callback successfully to a degree. Since we are in a class method, the method that is to be called in the model doesn't exist.

in macro '__run_before_create' expanded macro: macro_139699359621456:26, line 2:

   1.       
>  2.          increment
   3.       
   4.     

undefined local variable or method 'increment'

Where increment is the name of my before_create method.

Blacksmoke16 · 2018-04-12T20:07:17Z

README.md

+
+Each model has an `import` class level method to import an array of models in one bulk insert statement.
+```Crystal
+models [


Oops, missed = on these, i'll fix that tonight.

drujensen · 2018-04-22T17:30:43Z

src/granite_orm/transactions.cr

+      end
+    end
+
+    def self.import(model_array : Array(self), update_on_duplicate : Bool, columns : Array(String))


since you would never use update_on_duplicate and ignore_on_duplicate what if we use an enum instead of boolean flags that has options for how to handle duplicates? maybe duplicates: ignore duplicates: update?

Would there be a nice way to also require columns key when using the update option?

drujensen

@Blacksmoke16 did you figure out the callbacks?

Also, what happens when one of the rows fails to import? Do all of them rollback?

Blacksmoke16 · 2018-04-22T20:59:38Z

Sadly no, the issue is trying to run a model's instance method from the class scope. As of now that one __run_before_create method does not have access to them so it cannot run. Possibly can revisit this in another issue, to at least get the feature out.

I would have to test it if you have a specific case, otherwise iirc the whole import fails since it is just one SQL query.

robacarp

Thanks for your work here, I think this is a nice feature to add.

drujensen

💯 Excellent work.

faustinoaq · 2018-05-21T21:55:39Z

I just added Bulk import to amber documentation as well 🎉

https://amberframework.gitbook.io/amber/guides/models/granite/bulk-insertions

Start of bulk-import method

814de94

Blacksmoke16 commented Mar 30, 2018

View reviewed changes

robacarp reviewed Mar 30, 2018

View reviewed changes

faustinoaq added pr:needs-review pr:wip and removed pr:needs-review labels Mar 30, 2018

faustinoaq added this to To Do in Framework 2018 via automation Mar 30, 2018

faustinoaq moved this from To Do to In progress in Framework 2018 Mar 30, 2018

Use parameterized statement

d1b2134

Add type filter to import class method Reformat some code

Blacksmoke16 commented Mar 30, 2018

View reviewed changes

Blacksmoke16 added 4 commits March 30, 2018 23:15

Handle duplicate key IGNORE/UPDATE

93b6e67

Add commas between update values

6435d63

merge in the fields unity code

85c36a0

Remove now undeeded primary_auto variable

73e7de5

Blacksmoke16 added 3 commits March 31, 2018 15:20

Add import for pg adapter

1a03145

Also quote other column key

54dd5b7

Add adapter for sqlite

c74d126

Handle timestamps Start of specs Ran crystal tool format

robacarp reviewed Apr 2, 2018

View reviewed changes

faustinoaq and others added 4 commits April 4, 2018 16:28

Merge branch 'master' into bulk-import

e0e1be4

Fix spec

84364d9

Add more tests, fix sqlite adapter

2d15769

Update readme

bbd2f7f

robacarp reviewed Apr 7, 2018

View reviewed changes

Blacksmoke16 added 2 commits April 7, 2018 13:24

Fix indentations

87268a0

Add overload methods for update/ignore_on_duplicate, update readme

f6dcf2a

Blacksmoke16 changed the title ~~[WIP] Bulk import~~ Bulk import Apr 12, 2018

Blacksmoke16 commented Apr 12, 2018

View reviewed changes

Add = sign to model arrays in readme

e3e525a

robacarp requested a review from a team April 12, 2018 22:25

faustinoaq added the pr:needs-review label Apr 13, 2018

eliasjpr mentioned this pull request Apr 19, 2018

Should Granite have a built in Migrator? #182

Closed

faustinoaq removed the pr:wip label Apr 19, 2018

Merge branch 'master' into bulk-import

371d0b4

drujensen reviewed Apr 22, 2018

View reviewed changes

robacarp approved these changes Apr 23, 2018

View reviewed changes

Blacksmoke16 and others added 3 commits April 25, 2018 00:15

Specify imports do not trigger callbacks yet

9fe4bcb

proper grammar

5b31b80

Merge branch 'master' into bulk-import

027be38

drujensen approved these changes Apr 26, 2018

View reviewed changes

drujensen removed the pr:needs-review label Apr 26, 2018

drujensen merged commit 644491e into amberframework:master Apr 26, 2018

Framework 2018 automation moved this from In progress to Done Apr 26, 2018

This was referenced Apr 26, 2018

Ability to bulk import array of models #156

Closed

Support callbacks on bulk imports #185

Closed

Blacksmoke16 deleted the bulk-import branch May 9, 2018 04:34

Bulk import #168

Bulk import #168

Conversation

Blacksmoke16 commented Mar 30, 2018 • edited Loading

Progress

Questions:

Blacksmoke16 Mar 30, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Blacksmoke16 Mar 30, 2018 • edited Loading

Choose a reason for hiding this comment

robacarp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Blacksmoke16 Mar 30, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Blacksmoke16 Mar 30, 2018 • edited Loading

Choose a reason for hiding this comment

Blacksmoke16 commented Mar 31, 2018

Blacksmoke16 commented Apr 1, 2018 • edited Loading

robacarp commented Apr 2, 2018

Blacksmoke16 commented Apr 2, 2018 • edited Loading

robacarp commented Apr 2, 2018

robacarp commented Apr 2, 2018

robacarp Apr 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Blacksmoke16 Apr 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robacarp Apr 3, 2018 • edited Loading

Choose a reason for hiding this comment

Blacksmoke16 commented Apr 4, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robacarp commented Apr 7, 2018

Blacksmoke16 commented Apr 7, 2018

Blacksmoke16 commented Apr 8, 2018

Choose a reason for hiding this comment

drujensen Apr 22, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

drujensen left a comment

Choose a reason for hiding this comment

Blacksmoke16 commented Apr 22, 2018

robacarp left a comment

Choose a reason for hiding this comment

drujensen left a comment

Choose a reason for hiding this comment

faustinoaq commented May 21, 2018

Blacksmoke16 commented Mar 30, 2018 •

edited

Loading

Blacksmoke16 Mar 30, 2018 •

edited

Loading

Blacksmoke16 Mar 30, 2018 •

edited

Loading

Blacksmoke16 Mar 30, 2018 •

edited

Loading

Blacksmoke16 Mar 30, 2018 •

edited

Loading

Blacksmoke16 commented Apr 1, 2018 •

edited

Loading

Blacksmoke16 commented Apr 2, 2018 •

edited

Loading

robacarp Apr 2, 2018 •

edited

Loading

Blacksmoke16 Apr 2, 2018 •

edited

Loading

robacarp Apr 3, 2018 •

edited

Loading

drujensen Apr 22, 2018 •

edited

Loading