
insert_batch error when array is too large #2680

Closed
fzhan opened this issue Oct 10, 2013 · 10 comments

fzhan commented Oct 10, 2013

I sometimes need to insert over 5,000 items into the db, and using insert_batch overloads the maximum string length allowed by preg_match. In that case I can only write a joined query by hand to get around it.

The problem is that I can't tell when to use a hand-built query versus insert_batch, because the array size doesn't necessarily tell me the total string length that preg_match will allow.

Is there a way insert_batch and update_batch can handle large arrays? Or is there a known limit, so that we could run insert_batch and update_batch in chunks?

@mattduran420

http://www.karlrixon.co.uk/writing/php-regular-expression-fails-silently-on-long-strings/

If you read that, it looks like there is a cap on how much data the PHP regex engine can handle. There is a setting in the PHP config file that lets you raise the cap. Try adjusting that and see if it works; the link above explains it further.
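For example, something along these lines ought to raise the limits at runtime; the values and table name here are only placeholders, and the same settings can also go in php.ini:

```php
<?php
// Raise PHP's PCRE limits before the large insert_batch() call.
// Both settings are also configurable in php.ini; the values below are examples.
ini_set('pcre.backtrack_limit', '10000000');  // default is 1000000
ini_set('pcre.recursion_limit', '1000000');   // default is 100000

$this->db->insert_batch('items', $rows);      // 'items' and $rows are placeholders

// If a pattern still fails silently, preg_last_error() says why,
// e.g. PREG_BACKTRACK_LIMIT_ERROR.
if (preg_last_error() !== PREG_NO_ERROR) {
    log_message('error', 'PCRE limit hit: ' . preg_last_error());
}
```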


fzhan commented Oct 10, 2013

@mduran16 Thanks. Still, I think this should be handled by the insert_batch function itself.

Plus, it's not smart to insert 5,000 rows at once just by dumping the array into SQL and parsing the query with a regex one piece at a time; I do think there should be another way of doing that.

@mattduran420

Well, you can just iterate through the array and insert the rows individually, but I believe the objective of the batch functions is to minimize the number of queries needed.


fzhan commented Oct 10, 2013

@mduran16 You are right, and I think we both know that for over 5,000 inserts:

  1. a single plain multi-row query is the fastest
  2. insert_batch, which generates that single query, comes second
  3. iterating and inserting one row at a time is the slowest

Using 3 as a way around 2 puts this function in an awkward state - why do we need it?
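To make 1 and 2 above concrete, this is roughly what the two options look like (table and column names are made up):

```php
<?php
// Option 1: build one multi-row INSERT by hand and run it as a single query.
$values = array();
foreach ($rows as $row) {
    $values[] = '(' . $this->db->escape($row['name']) . ', '
                    . $this->db->escape($row['qty']) . ')';
}
$this->db->query('INSERT INTO items (name, qty) VALUES ' . implode(', ', $values));

// Option 2: let insert_batch() build essentially the same statement for you.
$this->db->insert_batch('items', $rows);
```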

@mattduran420

@fzhan So the batch insert/update functions use regex to process the arrays. That's fine and dandy, since they accomplish #1. The problem with regex is that PHP caps it by default. This is for a few reasons; mainly, if it weren't capped, someone could wreak havoc with a 1 GB text file. So there isn't anything wrong with the batch functions; they're just limited by the PHP config, which can be edited as needed.

Now, with iterating and single queries you have more control over things such as error handling. For example, if you were running 10k+ queries and wanted to see whether any of them failed so you could spot a pattern, that's something that can't be done with a batch.
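Roughly, an iterating version with per-row error tracking could look like this (the table name is just an example, and it assumes db_debug is turned off so that insert() returns FALSE instead of showing an error page):

```php
<?php
// Insert rows one at a time so individual failures can be recorded,
// which insert_batch() cannot report per row.
$failed = array();
foreach ($rows as $i => $row) {
    if ($this->db->insert('items', $row) === FALSE) {
        $failed[] = $i;  // remember which rows failed, to look for a pattern
    }
}
log_message('error', count($failed) . ' of ' . count($rows) . ' inserts failed');
```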


fzhan commented Oct 10, 2013

@mduran16 I agree with you. I've updated the algorithm and the PHP caps.

The thing is, all of the error handling was done before the batch process; the batch process itself throwing a PHP error is a separate case. I think we should leave this issue open until the error handling (regex) is addressed or a better solution is provided.


aanbar commented Oct 11, 2013

Using InnoDB as your storage engine (speaking of MySQL), you can use transactions: iterate over the 10,000 rows inside a transaction and then commit. It will be fast enough, and you won't get these errors.
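Something along these lines, using CodeIgniter's transaction helpers (table name and $rows are placeholders):

```php
<?php
// Wrap the per-row inserts in a single transaction so InnoDB commits once
// at the end instead of flushing after every statement.
$this->db->trans_start();
foreach ($rows as $row) {
    $this->db->insert('items', $row);
}
$this->db->trans_complete();

if ($this->db->trans_status() === FALSE) {
    log_message('error', 'Import was rolled back');
}
```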


fzhan commented Oct 12, 2013

Hi @aanbar, thanks for the tip. I am using InnoDB but not transactions; I will look into this option. Cheers,


narfbg commented Oct 14, 2013

Both insert_batch() and update_batch() split the data into chunks and only handle 100 rows at a time. For example, if you pass an array with 420 elements to insert_batch(), then 5 separate inserts will be executed. Unfortunately, this is the only limit that we can build into it, as there's practically no way to see the actual data size in bytes, nor the limit that PCRE or the database itself has for a single ... eh, operation.

In other words, there's nothing that we can do about this, and such large amounts of data should be imported separately, via LOAD DATA INFILE or whatever import mechanism your database has.

That being said, you should ask for more help on the forums or IRC, we only handle bug reports in here.
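(For what it's worth, if even 100-row chunks end up too long for PCRE, the data can always be split further before it reaches the driver; a minimal sketch, with an arbitrarily chosen chunk size and placeholder names:)

```php
<?php
// Split a very large data set into smaller pieces before handing each one
// to insert_batch(), which itself works through them 100 rows at a time.
foreach (array_chunk($rows, 500) as $chunk) {
    $this->db->insert_batch('items', $chunk);
}
```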

narfbg closed this as completed Oct 14, 2013

fzhan commented Oct 18, 2013

@narfbg Okay, got it, thanks!
