Easier batching of bulk inserts #710
Replies: 6 comments 4 replies
-
Hey @SebastienGllmt, thanks for the suggestions! I think we can have a helper that performs the chunking for you. Basically, it will perform something like this:

```rust
for chunk in inserts.chunks(chunk_size) {
    Self::insert_many(chunk.to_vec()).exec(&db).await?;
}
```

Regarding the parameter limit (with the number of columns taken into account): it's database-dependent, and it should be provided by the user. The user can substitute it into the batch size just like what you have done. PRs are welcome! :P
-
I imagine such an API to be like … PR welcomed.

Edit: …
-
I ran into the same problem this afternoon. I'm considering going with a less performant approach and chunking on a low number, but I thought I'd check in here to see if anyone has made progress toward a more elegant solution.
-
I don't consider the suggested solutions performant. When thinking about batch inserts, remember that inserting 20k rows can be as fast as inserting 10k rows, so you don't want to waste time executing the query in relatively small chunks. I think this problem would not occur if sea-orm translated batch inserts into an insert with `unnest` and arrays.
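A sketch of the `unnest` idea: the statement binds one array parameter per column regardless of row count, so the parameter limit no longer scales with the number of rows. The table name, column types, and SQL-building helper below are illustrative, not part of sea-orm:

```rust
// Build a Postgres `unnest`-based insert: one array bind parameter per
// column, independent of how many rows those arrays contain.
// `cols` pairs a column name with its Postgres array element type.
fn unnest_insert_sql(table: &str, cols: &[(&str, &str)]) -> String {
    let names: Vec<&str> = cols.iter().map(|(name, _)| *name).collect();
    let params: Vec<String> = cols
        .iter()
        .enumerate()
        .map(|(i, (_, ty))| format!("${}::{}[]", i + 1, ty))
        .collect();
    format!(
        "INSERT INTO {} ({}) SELECT * FROM unnest({})",
        table,
        names.join(", "),
        params.join(", ")
    )
}

fn main() {
    // Two columns -> two bind parameters total, even for 20k rows.
    println!("{}", unnest_insert_sql("cake", &[("id", "int4"), ("name", "text")]));
    // INSERT INTO cake (id, name) SELECT * FROM unnest($1::int4[], $2::text[])
}
```

The generated statement would then be executed with one array bound per column (e.g. a `Vec<i32>` for `id` and a `Vec<String>` for `name`).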
-
Using …
-
One edge case that needs careful consideration for any function using internal chunking: what happens if the call is not run in a transaction but is split into chunks? a. Should an implicit transaction be started, to keep up the behavior that inserts normally all succeed or all fail? (Not an issue with Postgres if the array(s) + `unnest` approach is used.)
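A toy illustration of the all-or-nothing question, with a `Vec` standing in for the table and a staged copy standing in for the implicit transaction (none of this is sea-orm API; a real database would need an actual transaction around the chunks):

```rust
// Simulates chunked insertion that either fully commits or leaves the
// "table" untouched, which is the guarantee an implicit transaction
// around the chunks would provide.
fn insert_chunked_atomic(
    table: &mut Vec<i32>,
    rows: &[i32],
    chunk_size: usize,
    fail_on_chunk: Option<usize>, // simulate an error while inserting this chunk
) -> Result<(), String> {
    let mut staged = table.clone(); // work on a copy: the "transaction"
    for (i, chunk) in rows.chunks(chunk_size).enumerate() {
        if fail_on_chunk == Some(i) {
            return Err(format!("chunk {i} failed")); // copy is discarded
        }
        staged.extend_from_slice(chunk);
    }
    *table = staged; // "commit" only after every chunk succeeded
    Ok(())
}

fn main() {
    let mut table = Vec::new();
    // A failure in chunk 1 leaves the table empty rather than half-written.
    assert!(insert_chunked_atomic(&mut table, &[1, 2, 3, 4, 5], 2, Some(1)).is_err());
    assert!(table.is_empty());
    // Without a failure, all rows land.
    assert!(insert_chunked_atomic(&mut table, &[1, 2, 3, 4, 5], 2, None).is_ok());
    assert_eq!(table, vec![1, 2, 3, 4, 5]);
}
```

Without the staged copy (i.e. without a transaction), a failure in a later chunk would leave the earlier chunks committed, which silently changes the all-or-nothing semantics users expect from a single insert.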
-
Summary
Postgres only allows a maximum of u16::MAX input parameters. If you exceed this amount, you will get an error from sqlx. Getting around this limitation while using entities is kind of ugly.
Motivation
sea-orm provides a function on entities called `insert_many`, such as in the following example. This insertion requires two SQL parameters because `insert_many` uses the `VALUES` syntax.

This works fine up until you hit the u16 limit of parameters. The naive workaround for the Postgres limit would be to do something like this. However, this won't work because we have to divide by the number of columns!

That means the proper solution (as far as I can tell) ends up being
This is already kind of ugly for a one-time patch in your codebase, but having this kind of chunk logic all over the place makes things even worse
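For concreteness, the chunk-size arithmetic behind that workaround can be sketched as follows (the constant and helper are illustrative, not sea-orm API):

```rust
// Postgres caps bind parameters per statement at u16::MAX (65_535).
// Each row consumes one parameter per column, so the safe chunk size is
// the parameter limit divided by the column count.
const PG_MAX_PARAMS: usize = u16::MAX as usize;

fn max_rows_per_insert(columns_per_row: usize) -> usize {
    PG_MAX_PARAMS / columns_per_row
}

fn main() {
    // A 2-column entity can insert at most 32_767 rows per statement.
    println!("{}", max_rows_per_insert(2)); // 32767
    // A 10-column entity drops to 6_553 rows per statement.
    println!("{}", max_rows_per_insert(10)); // 6553
}
```

This is exactly the division step that is easy to forget, and why scattering the chunk logic across a codebase is error-prone.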
It would be nice if `insert_many` either handled this for you, or if there was some other utility function for this.