New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I do not think we can continue to return `Iterator` as a result of our queries. What should we return instead? #132

Closed
sgrif opened this Issue Jan 24, 2016 · 4 comments

Comments

Projects
None yet
2 participants
@sgrif
Member

sgrif commented Jan 24, 2016

In 0.4 we return a Cursor<ST, T> which is an Iterator<Item=T>. Right now, 0.5 has changed this to be Box<Iterator<Item=T>> to somewhat insulate us against future breakage. However, I do not think this will be sufficient in the long term. There are two primary reasons for this.

The first is that deserialization can technically fail. Right now we just panic if that ever happens, as there's no "reasonable" time this can happen (off the top of my head, reading null into a non-option, integer types returning fewer bytes than expected, and strings containing nul bytes are the only case). However, when we add support for other backends, this becomes more "reasonable" (the lack of a proper boolean type is the most obvious case). If we eagerly go through deserialization, we can just bubble up the error rather than panicking.

The most important case though is that we just can't return an iterator when we're dealing with associations. Let's say you want some users, and all of their posts, which would be the type Iterator<Item=(User, Vec<Post>)>. The problem is that we can't assume the order of the resulting rows, so they can come back like this:

[
    (sean, Some(post1)),
    (tess, Some(post2)),
    (sean, Some(post3)),
    (jim, None),
]

Since we can't assume that the posts belonging to a user are grouped, we actually need to iterate through the entire result set just to return the first value for #next. Since an exponential time algorithm is obviously undesirable, we would certainly want to iterate once and group up everything in a single pass. While we could still return an iterator in that case, it seems wasteful to do that if we're just allocating anyway.

So we need to return a concrete type with the data allocated (on the heap). The question is what do we return? The simplest answer is that we just return a Vec. The more I think about it, the more that seems reasonable, as the only other type you might want is some kind of Map, but you wouldn't actually have anything to use as the key so you can't actually do anything other than iterate over it. We'll probably end up using a Map of some sort during construction if we're dealing with one to many associations, but we could certainly store the results in a Vec and just use the map to keep the indices.

I'm interested to hear what others have to say on the issue. Are there types that people want us to return other than Vec<T> as the alternative to Box<Iterator<Item=T>>? Is there something I'm missing that would make returning an Iterator reasonable?

Whatever we do, I'd like to do it sooner rather than later. Even if we're just changing from Box<Iterator> to Vec, basically all existing code will break as we're removing the collect call. It's a pretty mechanical change, but I'd like to just rip off the bandaid there as quickly as possible.

So actually here's another way to phrase the question: If we accept that we need to return something other than Box<Iterator> (if the thing we return isn't Vec, it'll probably have some trait like FromQueryResult<T>), would returning Vec<T> and then changing to something more general cause significant breakage?

@sgrif sgrif added this to the 0.5 milestone Jan 24, 2016

@bradurani

This comment has been minimized.

bradurani commented Jan 25, 2016

It seems like in the future you might want to return more than just the data returned by the query. What if you have query timing turned on for instance, and want to return the execution time along with the results?

@bradurani

This comment has been minimized.

bradurani commented Jan 25, 2016

I'm still very new to Rust, so forgive me if I'm way off base, but returning Vec<T> requires you to do the grouping before reading the first result, which means you're doing extra work in the odd case that the query results are never used, right? With Box<Iterator<Item=T>>, you can group the results only after the first .next() and be lazier, yet still avoid the exponential algorithm, no?

@sgrif

This comment has been minimized.

Member

sgrif commented Jan 25, 2016

Given that you explicitly need to call a method to execute the query, I'm not super concerned with the case of "I executed SELECT * FROM users, but then decided I don't want the results". We've already paid the bigger costs of round tripping to the database.

@sgrif

This comment has been minimized.

Member

sgrif commented Feb 5, 2016

Fixed by #142

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment