Paging support? #23

Closed
BrianVallelunga opened this issue Mar 17, 2020 · 13 comments

Comments

@BrianVallelunga

This looks like a great new library that I'd love to use in place of my own hacked together wrapper. Are you expecting the caller to do the paging, or should that be included when returning a query result? I'm not sure if ofAsyncEnum does that or not, but the code doesn't seem to check .HasMoreResults.

@aaronpowell
Owner

I'm going to admit that I'm still wrapping my head around how the IAsyncEnumerable and AsyncSeq work, given we don't have the await foreach in F# that C# would tend to unpack.

Because of that, I'm not completely sure that AsyncSeq understands pagination the same way.

Do you have an example of what you're trying to do?

@BrianVallelunga
Author

BrianVallelunga commented Mar 18, 2020

What I'm doing right now (with the v3 SDK) is this:

let public fetchAllItems(feedIterator: FeedIterator<'a>) =
    asyncSeq {
        while feedIterator.HasMoreResults do
            let! response = feedIterator.ReadNextAsync() |> Async.AwaitTask
            yield! response |> AsyncSeq.ofSeq
    }

This returns an AsyncSeq<'a> and seems to work, but I have my doubts if it is the "right" way. I believe someone helped me with the final yield! line.

@BrianVallelunga
Author

BrianVallelunga commented Mar 18, 2020

For a bit more explanation, response is a FeedResponse<'a> which implements IEnumerable<'a>. response |> AsyncSeq.ofSeq then gives AsyncSeq<'a> which is then merged into the parent sequence via yield!.

See: https://theburningmonk.com/2011/09/fsharp-yield-vs-yield/
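To make the `yield!` step concrete, here is a minimal sketch that runs with FSharp.Core alone. `FakeFeedIterator` is a hypothetical stand-in for `FeedIterator<'a>` (it is not part of any SDK), and a plain `seq` replaces `asyncSeq` so no extra package is needed. The flattening behaviour of `yield!` is the same idea in both builders.

```fsharp
// Hypothetical stand-in for FeedIterator<'a>: hands out one batch per call.
type FakeFeedIterator<'a>(batches: 'a list list) =
    let mutable remaining = batches
    member _.HasMoreResults = not (List.isEmpty remaining)
    member _.ReadNext() : 'a list =
        match remaining with
        | batch :: rest ->
            remaining <- rest
            batch
        | [] -> []

// Same shape as fetchAllItems above, but with seq instead of asyncSeq.
let fetchAllItems (feedIterator: FakeFeedIterator<'a>) =
    seq {
        while feedIterator.HasMoreResults do
            // yield! splices each element of the batch into the outer sequence;
            // a plain yield would emit the whole batch as a single element.
            yield! feedIterator.ReadNext()
    }

let items =
    fetchAllItems (FakeFeedIterator [ [ 1; 2 ]; [ 3 ]; [ 4; 5 ] ])
    |> Seq.toList

printfn "%A" items // prints [1; 2; 3; 4; 5]
```

The caller sees one flat sequence of items and never learns where the batch boundaries were.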

@aaronpowell
Owner

Looking into the v3 SDK source and comparing it to v4, it looks like it works a bit differently. In v4 there isn't the FeedIterator<'T> that v3 used; instead it's AsyncPageable<'T>.

Now, digging through this a bit further, it turns out that this is really a wrapper over FeedIterator that hides away the paging unless you call AsyncPageable.AsPages(), to which you can pass the size of the pages you need.

So, I would probably have to add an execPagedAsync where you can provide the right info, and that could then return the results as an AsyncSeq, but I'll have to play with it (I'm trying to work out how to handle the Insert API at present).

@seankearon

seankearon commented Mar 18, 2020

Could you not just use OFFSET and LIMIT in your query "SQL" and remember what page index you're on? Or have I missed something here?

Edit: I believe that's the official CosmosDB approach for paging.

@BrianVallelunga
Author

@seankearon I don't think these two types of paging are the same thing, though I may be wrong. The type I'm referring to is more akin to batching. The Cosmos SDK client won't return everything all at once. You have to continually call it to get the next batch of data. I'll take a look at what's in V4 when I get a chance.

@seankearon

Yeah, I'm wondering where the difference is between those two.

If an async stream is like using a drinking straw to drink from a lake, then paging/batching is like using a bucket to drink from a lake.

If I'm passing you the bucket to drink from, do you care whether I fill it all at once or in little steps using my drinking straw? Probably not - you just want the next bucket of water.

I'm thinking that the way to fill up bucket n using the straw would be something like this:

let usersByPage(page: Int32) =
    host
    |> Cosmos.host
    |> Cosmos.connect key
    |> Cosmos.database "UserDb"
    |> Cosmos.container "UserContainer"
    |> Cosmos.query "SELECT u.FirstName, u.LastName FROM u WHERE u.LastName = @name OFFSET xyz LIMIT pqr"
    |> etc. etc. etc.
    |> Cosmos.execAsync<User>

But then, it's been a loooong day! :)
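As a rough sketch of the "remember what page index you're on" idea, the `OFFSET xyz LIMIT pqr` placeholders can be computed from a page index and a page size. `pagedQuery` is a hypothetical helper, not part of FSharp.CosmosDb, and the page size of 25 is an arbitrary choice for illustration. Pages are assumed to be zero-based.

```fsharp
// Hypothetical helper: append an OFFSET/LIMIT clause for a zero-based page index.
let pagedQuery (baseQuery: string) (pageSize: int) (page: int) =
    if pageSize <= 0 then invalidArg "pageSize" "pageSize must be positive"
    if page < 0 then invalidArg "page" "page must be zero or greater"
    // Skip the items on all earlier pages, then take one page's worth.
    sprintf "%s OFFSET %d LIMIT %d" baseQuery (page * pageSize) pageSize

let query =
    pagedQuery "SELECT u.FirstName, u.LastName FROM u WHERE u.LastName = @name" 25 2

printfn "%s" query
// prints SELECT u.FirstName, u.LastName FROM u WHERE u.LastName = @name OFFSET 50 LIMIT 25
```

The resulting string would be passed to `Cosmos.query` in place of the hard-coded one. An `ORDER BY` clause is probably worth adding to the base query so that consecutive pages are cut from a stable ordering.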

@aaronpowell
Owner

I've finally had some time to come back and revisit this issue and work out if it's possible/viable to do pagination support.

TL;DR: Use OFFSET and LIMIT as @seankearon has suggested; I don't think I can put anything into the API to do it for you. The best I can do is have a way to get batched results per iteration of the AsyncSeq.

We're going to dive through a rabbit hole now, so choose if you want to read on as I'm partially writing this down for my own sake. I'm going to trace through a bunch of the Azure.Cosmos code as it currently stands.

When you execute a GET query you call the method GetItemQueryResultsAsync (note: the Async suffix is added after -preview4, so my code doesn't use it, but it will eventually), and this creates a FeedIteratorCore, which is what handles the ReadNextAsync operation to fetch records from CosmosDB.

This type is then wrapped in FuncAsyncPageable to return the AsyncPageable<T> response that is consumed by AsyncSeq in F# to make our nice API.

AsyncPageable, the base class of FuncAsyncPageable, has the AsPages and MoveNext methods defined on it, MoveNext being what is ultimately called by the state machine to go over the iterator (it bubbles through a few other types, but it's ultimately where we land). What's interesting about the implementation is that it actually calls AsPages anyway, so the AsPages method is our important one.

Our AsPages calls the func passed in here which is a call to GetPagesAsync on PageIteratorCore.

Now, if we trace through here, AsPages takes a continuationToken and pageSizeHint, but GetPagesAsync on our iterator only takes the continuationToken; the pageSizeHint is dropped along the way. My guess is that the pageSizeHint doesn't map to anything that is available on the CosmosDB REST API, so it can't be used and is discarded in turn.

So, what's the difference between iterating over the AsyncPageable<T> and the IAsyncEnumerable<Page<T>> (how it currently works vs calling .AsPages)? Whether you get a single result or a batch of results. Page<T> has a Values property, which in my testing contained up to 100 T items that you need to unpack. This means it comes down to "do you want to work with a single result each iteration, or with a batch of items?" (nowhere can I find an exposed "HasMore" property; that's just determined by whether you keep iterating).

I tested this against a large Cosmos store I have with the following code:

async Task Main()
{
	var client = new CosmosClient("...");
	
	var container = client.GetDatabase("...").GetContainer("...");
	
	var qd = new QueryDefinition("SELECT c.id FROM c");

	var nonCount = 0;
	"Non-paged query".Dump();
	await foreach (var response in container.GetItemQueryIterator<Dictionary<string, string>>(qd))
	{
		nonCount++;
	}

	"Paged query".Dump();
	var pageCount = 0;
	await foreach (var response in container.GetItemQueryIterator<Dictionary<string, string>>(qd).AsPages())
	{
		pageCount++;
	}
	
	$"Non-Paged ran {nonCount} times to Paged {pageCount}".Dump();
}

And here's the response:

Non-paged query
Paged query
Non-Paged ran 3811 times to Paged 39

The iteration count dropped, and a network trace showed the same number of network requests happening in both cases.
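The two counts line up with simple chunking arithmetic. The sketch below assumes a page size of 100 (the Page<T> size mentioned earlier in this comment, not something read from the SDK) and uses plain integers in place of query results.

```fsharp
let itemCount = 3811 // iterations reported for the non-paged loop
let pageSize = 100   // assumed number of items per Page<T>

// Split the items into consecutive chunks of at most pageSize items.
let pages =
    Seq.init itemCount id
    |> Seq.chunkBySize pageSize
    |> Seq.toList

printfn "%d pages, last page has %d items" pages.Length (List.last pages).Length
// prints 39 pages, last page has 11 items
```

So 38 full pages plus one partial page of 11 items gives the 39 iterations seen in the paged loop.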

I might look at putting in a queryBatch or something like that which returns AsyncSeq<Page<T>> to give feature parity with the underlying API.

@BrianVallelunga
Author

Thanks for the detailed write-up. I'm waiting to use this on my project until the v4 Cosmos API SDK is fully released, but this looks great.

@aaronpowell
Owner

I've added a "pagination" option in a new branch: https://github.com/aaronpowell/FSharp.CosmosDb/tree/pagination

Basically, all it does is add a new method, Cosmos.execBatchAsync, which returns AsyncSeq<Page<T>> so you can get some paged results, but it's not really paged, due to what I mentioned above.

@aaronpowell
Owner

This will be coming in the next release.

@aaronpowell
Owner

If anyone wants to test, grab the 0.3.0 pre-release packages from https://github.com/aaronpowell?tab=packages&repo_name=FSharp.CosmosDb

@aaronpowell
Owner

Available on NuGet now.
