Bulk inserts are tricky to do efficiently #59
@blackbeam Thanks for the detailed reply:
It looks like at the protocol level, the MySQL server receives the query with the filename, then sends back a request to the client asking for the contents of that file, which the client then sends to the server. Based on this it seems the client could actually send data from memory rather than a file, which opens up a lot of possibilities.
For reference, I'm inserting ~2 million rows into one table with fairly large rows, and ~10 million rows into another table which is much more lightweight. Currently it takes around ~2 hours to insert all the data, even using a batch size of 512 (yep, discovered that UTF-8 bug ~30 mins into the import 😒). Despite the large quantity of data, I don't think it should take that long.
@Diggsey have you tried things like …
@0xpr03 I solved this using …
Sorry, do you mind showing an example of the LocalInfileHandler that you implemented? I'm lost as to which traits the handler should implement.
@hpca01 there's an example in the documentation: https://docs.rs/mysql/18.0.0/mysql/struct.LocalInfileHandler.html
@Diggsey I apologize, as I am still new to the language. I got the whole thing to work by doing the following:
I wanted to re-write it to exclude the closure and use a concrete struct that implements whatever trait is required, to make it easier for the next person.
The trait that's required is … I'm not really sure what you hope to make easier by moving it to a struct, but …
@Diggsey thanks so much for the clear explanation 😄 I will just leave it the way it is, and write some detailed comments instead. Thanks again for your patience, and your work on the crate.
Hey @blackbeam! I have a use case where I need to insert (or update) thousands of rows per second, so my only real option (as far as I know) is your 2nd one, handcrafting the query with … I've tried to use your pseudo-Rust code with the latest 20.0.1 version; however, I get an error: …
For objects I've used … Is there a way to fix my issue? Will there be a "native" option in the library to do these kinds of operations easily later?
@orangesoup, hi.
Thanks, will check!
Thanks @blackbeam for sharing the code snippet (the second option). I found a tiny issue with the code: a `stmt.pop();` is needed right before the line `let mut params = Vec::new();`. Other than that the bulk insert code works quite nicely.
@midnightcodr, I've updated the snippet. Thanks.
@blackbeam, the 2nd option's `bulk_insert`:

```rust
pub fn bulk_insert<F, P, T>(
    pool: &crate::Pool,
    table: String,
    cols: Vec<String>,
    objects: Vec<T>,
    fun: F,
) -> crate::Result<()>
where
    F: Fn(&T) -> P,
    P: Into<Params>,
    ...
```

By making such an update, …
I had to implement some kind of batching myself to improve performance. This involved building up a query string over time, something like
```sql
INSERT INTO table (fields...) VALUES (?, ?, ?), (?, ?, ?), (?, ?, ?), ...
```
The query builder is not particularly well suited to this, and building up the query via string concatenation seems relatively inefficient. Also, I could only support positional parameters, since the name-parsing code is not exposed (which means I couldn't use `Params::make_positional`). I don't know if there's a more efficient way to do this at the protocol level?
At the very least I think one or more of the following would be useful: