Basic implementation of :streaming #223

Merged
merged 13 commits into from Feb 14, 2012

Projects

None yet

7 participants

@benmcredmond

A simple implementation of :streaming using mysql_use_result rather than mysql_store_result. Been working pretty well for me so far.

@brianmario brianmario and 1 other commented on an outdated diff Nov 10, 2011
ext/mysql2/result.c
}
- if (cacheRows && wrapper->lastRowProcessed == wrapper->numberOfRows) {
- // we've already read the entire dataset from the C result into our
- // internal array. Lets hand that over to the user since it's ready to go
- for (i = 0; i < wrapper->numberOfRows; i++) {
- rb_yield(rb_ary_entry(wrapper->rows, i));
+ if (streaming) {
+ if(!wrapper->streamingComplete) {
+ VALUE row;
+
+ do {
+ row = rb_mysql_result_fetch_row(self, db_timezone, app_timezone, symbolizeKeys, asArray, castBool, cast);
@brianmario
brianmario Nov 10, 2011

would you mind indenting the stuff inside the do block?

@benmcredmond
benmcredmond Nov 10, 2011

Argh, sorry, vim has been acting up for me today on the indentation front. Will fix this and below.

@brianmario brianmario commented on an outdated diff Nov 10, 2011
ext/mysql2/result.c
VALUE row;
if (cacheRows && i < rowsProcessed) {
row = rb_ary_entry(wrapper->rows, i);
} else {
row = rb_mysql_result_fetch_row(self, db_timezone, app_timezone, symbolizeKeys, asArray, castBool, cast);
if (cacheRows) {
- rb_ary_store(wrapper->rows, i, row);
+ rb_ary_store(wrapper->rows, i, row);
@brianmario
brianmario Nov 10, 2011

also indent here?

@brianmario
Owner

Initial review looks good!

We should figure out a way to test this. Also, if each is interrupted before iterating over all the results and the user tries to iterate again - what should happens?

@benmcredmond

Yeah that's an awkward one. Currently, it'll pick up where it left off — which isn't ideal but there aren't really any other options. We can't throw an exception because there needs to be a way for the user to finish iterating through the results (if they want to use this connection again). Thoughts?

@brianmario brianmario and 2 others commented on an outdated diff Nov 10, 2011
@@ -212,6 +212,21 @@ This is especially helpful since it saves the cost of creating the row in Ruby i
If you only plan on using each row once, then it's much more efficient to disable this behavior by setting the `:cache_rows` option to false.
This would be helpful if you wanted to iterate over the results in a streaming manner. Meaning the GC would cleanup rows you don't need anymore as you're iterating over the result set.
+### Streaming
+
+`Mysql2::Client` can optionally only fetch rows from the server on demand by setting `:streaming => true`. This is handy when handling very large result sets which might not fit in memory on the client.
+
+``` ruby
+result = client.query("SELECT * FROM really_big_Table", :streaming => true)
@brianmario
brianmario Nov 10, 2011

Not a huge deal by any means, but I kinda feel like :stream => true seems like a better name for this? Waddya think?

@benmcredmond
benmcredmond Nov 10, 2011

Yup, I prefer that too.

@dmitriy-kiriyenko
dmitriy-kiriyenko Dec 2, 2011

Yeah. :stream => true looks better for me.

Ben McRedmond added some commits Nov 10, 2011
@jeshuaborges

I cannot wait to use this.

@brianmario
Owner

Getting a spec failure from "should cache previously yielded results by default" was it passing for you?

@dmitriy-kiriyenko

=)
Never use git add .

@mksm

Awesome! Goodbye mysql gem! :D

@benmcredmond

"should cache previously yielded results by default" is failing for me too, Brian. Weird bug, going to look into it now.

@benmcredmond

Fixed spec issue, they all run for me now. Any ideas on how to test this, Brian? (Or anyone?)

@brianmario
Owner

Thanks for fixing that up.

For testing, maybe we can start by doing something similar to the cache rows tests (checking object ids and such). I'll try and think about how we can more specifically test the fact that we're "streaming".

benmcredmond added some commits Dec 10, 2011
@benmcredmond benmcredmond Added some tests 07261d9
@benmcredmond benmcredmond Moved/tidied tests. Bug fix in client.c
The bug fix in client.c is less than ideal but there isn't really
another option without some rearchitecting. Basically, wrapper->active
should not be set to 0 when mysql_use_result is being used until we
have iterated through the full result set (as the wrpaper is infact
still active). Unfortunately, we would need to change this within
Mysql2::Result, which isn't possible at the moment unless we pass the
client to the result object (which seems overkill). So, for now, the
query will fail with a "commands out of sync" error.

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	modified:   ext/mysql2/client.c
#	modified:   spec/mysql2/client_spec.rb
#	modified:   spec/mysql2/result_spec.rb
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	Gemfile.lock
d40bc00
@benmcredmond

I've added some tests for this.

"should yield different value for #first if streaming" and "should yield the same value for #first if not streaming": kind of test that we are streaming, I think they are good for now.

The rest of the tests just make sure that streaming is working properly (and helped me find a bug!).

@capotej

+1, this would be awesome to use

@seamusabshere

This is great to see! @brianmario, what are your plans for it?

@brianmario
Owner

I just need to test it some more, definitely want to merge this soon

@benmcredmond

Brian, any update on this?

@brianmario brianmario merged commit e512ab9 into brianmario:master Feb 14, 2012
@brianmario
Owner

Sorry, been crazy busy. I'll do my best to give it another round of testing then push out another release as soon as I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment