Iterator#next #3

Closed
Raynos opened this Issue Jan 20, 2013 · 21 comments

Raynos commented Jan 20, 2013

Can it be one callback?

Say, instead of end, pass null, null, null to the next callback.

This way we won't have a "either one or the other callback gets called" situation.
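A minimal sketch of that convention (makeIterator below is a hypothetical in-memory stand-in for the real binding): next() always invokes the same callback, and a (null, null, null) call signals the end of the range.

```javascript
// Hypothetical in-memory iterator illustrating the single-callback proposal.
function makeIterator(entries) {
  var i = 0
  return {
    next: function (cb) {
      if (i >= entries.length) return cb(null, null, null) // end signal
      var e = entries[i++]
      cb(null, e.key, e.value)
    }
  }
}

// usage: drain the iterator by pulling until key is null
var it = makeIterator([{ key: 'a', value: 1 }, { key: 'b', value: 2 }])
var seen = []
;(function pull() {
  it.next(function (err, key, value) {
    if (err) throw err
    if (key === null) return // ended
    seen.push(key)
    pull()
  })
})()
```

With one callback there is never an "either callback may fire" ambiguity: every call to next() produces exactly one callback invocation.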

juliangruber commented Jan 20, 2013

Or have #next and #end but #next doesn't get the endCallback

Raynos commented Jan 20, 2013

@juliangruber #end is like .close, not like .on("end")

juliangruber commented Jan 20, 2013

Then #end and #next are too confusing. Maybe the naming should change? What about #dispose instead of #end?

And #next always gets null as the err argument, that's not cool.

Raynos commented Jan 20, 2013

Or #close

var it = createIterator(db, { ... })

it.next(function (err, key, value) {
  if (err) {
    // this iterator is broken
  }

  if (key === null) {
    // ended
  } else {
    // do stuff
    it.close(function (err) {
      // closed?
    })
  }
})

juliangruber commented Jan 20, 2013

Then I'd leave it as it is, doing the !key check isn't better than an extra function.

rvagg commented Jan 20, 2013

We could pass in an instance of EventEmitter to the binding and just use that, emit 'data' and 'end' events (and 'error' when we properly handle error situations). Then next() can be argument-less.
Or is that a bit too abstract from the actual implementation? The double-callback is there because it was easy, but as a public API it's pure code-smell and I'd like to see a nicer choice.

rvagg commented Jan 20, 2013

Also, I should mention that status isn't actually handled (yet): we collect it from iterator->status() but don't use it. So nextCallback() should have an active first argument in case of error; we just don't use it currently.

Raynos commented Jan 20, 2013

@juliangruber the !key check makes it more deterministic.

@rvagg I like mongo's cursor as it's a pull model. ( http://mongodb.github.com/node-mongodb-native/api-generated/cursor.html#nextobject )

it.next(function (err, result) {
  // read a value from it.
  if (result === null) { /* then no more */ }
  else { /* do stuff with result.value and result.key */ }
})

// close it when we are done
it.close();

I think 'data' events are a massive smell because that's push, and that's not how iterators should work.

rvagg commented Jan 20, 2013

yeah, fair enough, result===null would work fine.

dominictarr commented Jan 21, 2013

I like this suggestion. Also, I agree dispose is better than end, since it means you want to throw the iterator in the garbage.

The main objective here should be to expose a minimal api,
one that will not get in your way if you want to implement a classic stream, or a streams2 stream, on top.
It should just be as close to the underlying api as possible (but of course, changing blocking calls to async calls with callbacks)

I am strongly in favor of having just one callback,
because having two suggests that possibly, both are called.

A callback must be called exactly once.
An event can be emitted 0 or more times.

One callback is explicit, it says either: "here is the data", or "the range has ended".

three questions:

  1. is it possible to know you are on the last item without calling next again?
  2. previously we discussed retrieving a bunch of items at once, before calling back to js, as in leveled. Is this still a good idea? @juliangruber ?
  3. do we need limit in the binding? if that doesn't enable some optimization, we should leave that in levelUP.

rvagg commented Jan 21, 2013

+1 on all counts.

  1. no you can't know you're at the last item without calling next again
  2. it is a good idea to batch fetch many items, but there are 2 cases: (1) get all entries from an iterator and return it in an array (or some other structure) and (2) a traditional multi-get where you know the keys ahead of time. Being able to make a single call through the JS/C++ barrier for multiple items in either case would be a good feature even for a low-level lib like this.
  3. perhaps we don't need limit but we may want it if we want to fetch all keys for a given iterator (as above).
Raynos commented Jan 21, 2013

@rvagg if we're going to do multiple gets it should be next(n, cb) where n is the number of bytes or the number of keys to fetch.

I don't think we need limit, just stop calling next if we have enough.

dominictarr commented Jan 21, 2013

I agree with @Raynos (a strange feeling :)

Yes, this was my next question:
if there is a feature for getting a bunch of keys at once (bytes doesn't make sense),
then how is that passed to the user?

how about:

var i = db.iterator()
var N = 10
i.next(N, function (err, array) {
  // [{ key: k, value: v }, ...]
  // note the array may not have N items in it, so check the length
  i.dispose() // if we only wanted the first N items!
})

if (array.length < N) then it must be the last group; therefore,
if next is called after the iterator has finished, it should return an empty array.

maybe: if you don't pass N we get (err, key, value)

var i = db.iterator()
i.next(function (err, key, value) {
  //There is probably a reasonable case for this if creating an array and object per item is noticeable overhead.
  i.dispose() //if we only wanted the first item.
})

and if you want the entire range? what about:

i.next(true, function (err, array) {...})

Number.MAX_VALUE or Infinity is probably gonna be awkward when it gets to C.
I figure this will end up in C as an overloaded function (iirc), so a different type is better.

or, alias to all

i.all(function (err, array) {...})
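The next(N) contract described above can be sketched over in-memory data (createIterator here is hypothetical, not the real binding): an array shorter than N marks the last group, and any call after the end yields an empty array.

```javascript
// Hypothetical in-memory iterator implementing the next(N) batching contract.
function createIterator(entries) {
  var pos = 0
  return {
    next: function (n, cb) {
      var batch = entries.slice(pos, pos + n) // up to n entries
      pos += batch.length
      cb(null, batch)
    },
    dispose: function () { pos = entries.length } // nothing left to read
  }
}

var i = createIterator([
  { key: 'a', value: 1 },
  { key: 'b', value: 2 },
  { key: 'c', value: 3 }
])
var lengths = []
i.next(2, function (err, array) { lengths.push(array.length) })  // full group: 2
i.next(10, function (err, array) { lengths.push(array.length) }) // short group: last one
i.next(10, function (err, array) { lengths.push(array.length) }) // after the end: empty
```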

rvagg commented Jan 21, 2013

perhaps avoid the iterator interface for these multi-gets even when it uses an iterator internally?

// plain multi-get that uses standard `get()` operations
db.mget([ 'foo', 'bar', 'baz' ], function (err, valuesArray) {})
// → [ 'v1', 'v2', 'v3' ]

// standard iterator options, uses an iterator to get the values
db.mget({ start: 'foo', end: 'baz' }, function (err, entriesArray) {})
// → [ { key: 'foo', value: 'v1' }, { key: 'bar', value: 'v2' }, { key: 'baz', value: 'v3' } ] 

Or perhaps use the key/value entries array type for both operation types?

Or is that kind of overloading a bit too much?
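One way the overload could be dispatched (a sketch only: a plain object stands in for the db, and the range form is assumed lexicographic and inclusive on both ends):

```javascript
// Hypothetical dispatch for the two mget() forms shown above.
function mget(db, keysOrOptions, cb) {
  if (Array.isArray(keysOrOptions)) {
    // plain multi-get: values only, in the order the keys were given
    return cb(null, keysOrOptions.map(function (k) { return db[k] }))
  }
  // range form: key/value entries, in key order
  var entries = Object.keys(db).sort()
    .filter(function (k) { return k >= keysOrOptions.start && k <= keysOrOptions.end })
    .map(function (k) { return { key: k, value: db[k] } })
  cb(null, entries)
}

var db = { bar: 'v2', baz: 'v3', foo: 'v1' }
var values, entries
mget(db, ['foo', 'bar'], function (err, v) { values = v })
// → ['v1', 'v2']
mget(db, { start: 'bar', end: 'baz' }, function (err, e) { entries = e })
// → [{ key: 'bar', value: 'v2' }, { key: 'baz', value: 'v3' }]
```

Dispatching on the argument type keeps a single public name, at the cost of the two forms returning differently shaped results.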

Raynos commented Jan 21, 2013

i.all is bad; I have no good use-case for it. i.next(N) I do have a use-case for, which is efficient chunked reading.

As for having a nice api.

i.next(10, function (err, list) {
  assert.equal(list.length, 10)

  i.next(function (err, result) {
    assert.ok(result.value)
    assert.ok(result.key)
  })

  i.next(1, function (err, list) {
    assert.equal(list.length, 1)
  })

  i.next(0, function (err, list) {
    assert.equal(list.length, 0)
  })

  i.next(-1, function (err, list) {
    assert.equal(err.message, "negative number invalid")
  })

  i.next(100, function (err, list) {
    assert.equal(list.length, 32); // ended

    i.next(function (err, result) {
      assert.equal(result, null)
    })

    i.next(10, function (err, list) {
      assert.equal(list.length, 0)

      i.dispose(function () {
        i.next(function (err, result) {
          assert.equal(err.message, "cannot read from closed iterator")
        })
      })
    })
  })
})

Raynos commented Jan 21, 2013

multi get and reading 10 chunks from an iterator in one go are COMPLETELY different things.

The latter is an optimization for efficient iterating over an iterator

The former is sugar for doing many asynchronous reads in one go and should go away into a higher level library.

I think reading all the chunks of an iterator in one go is silly and goes against low memory footprint of streaming apis. Might as well add .getAll() to fetch the entire leveldb into memory.

dominictarr commented Jan 21, 2013

I have a few cases where I want to read all the elements in a range, and can't act until I have them all, so I don't need streaming. As @juliangruber says, it's expensive to go from c to js, so I'd end up calling it with i.next(LARGE_NUMBER, cb)

for the same reason mget is also a c-js optimization. Relying on the order of keys to identify the values would be bad, because it means you have to return them in the same order; it would free up the implementation more to just return the [{key: k, value: v}, ...] style, which is also internally consistent.

dominictarr commented Jan 21, 2013

@Raynos you don't like i.next(function (err, key, value) {...}) ?

Also, what should the defined behavior be if i.next is called in parallel, like @Raynos has written?
It should probably not be supported, I think. I'd want it either to queue (too highlevel) or to throw.

Raynos commented Jan 21, 2013

@dominictarr I like callbacks that do (err, result) because they are easier to do meta things on, like an execute helper which aggregates all the result values into a list or hash.

Multiple next calls on the same iterator in parallel should throw, because it ruins the order guarantee, and queuing is too complex.
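The throw-on-parallel-next rule could look like this (InMemoryIterator is a hypothetical stand-in that defers callbacks the way the real async binding would):

```javascript
// Hypothetical iterator that rejects overlapping next() calls.
function InMemoryIterator(entries) {
  this._entries = entries
  this._pos = 0
  this._reading = false
}

InMemoryIterator.prototype.next = function (cb) {
  if (this._reading) throw new Error('next() called before previous next() returned')
  this._reading = true
  var self = this
  process.nextTick(function () { // defer, like the real async binding
    self._reading = false
    if (self._pos >= self._entries.length) return cb(null, null) // null result = end
    cb(null, self._entries[self._pos++])
  })
}

// a second next() issued before the first calls back throws immediately
var it = new InMemoryIterator([{ key: 'a', value: 1 }])
var threw = false
it.next(function () {})
try { it.next(function () {}) } catch (e) { threw = true }
// threw === true
```

A simple boolean flag is enough to enforce the "exactly one outstanding read" invariant without any queuing machinery.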

I think mget is a good idea but should be a separate issue for discussion.

@dominictarr LARGE_NUMBER is what you want. using all naively allows you to load 200GB of disk data into memory which crashes your process.

dominictarr commented Jan 21, 2013

Good point. Okay, so all is a bikeshed, I don't really care how we do this, as long as it's possible.

leveldb is like the Hole Hawg: if you use it naively you are gonna hurt yourself.

see this essay http://artlung.com/smorgasborg/C_R_Y_P_T_O_N_O_M_I_C_O_N.shtml

dominictarr referenced this issue in nodejs/node-v0.x-archive Feb 25, 2013

Closed

stream: Do not switch to objectMode implicitly #4835

rvagg commented Mar 4, 2013

done, single callback it.next(cb) where null data means end.

rvagg closed this Mar 4, 2013
