
very long ResponseContinuation on certain query #61

Closed
Geetarman opened this issue Oct 13, 2015 · 25 comments


@Geetarman

If I run this query
SELECT * FROM c WHERE c.ObjectType="Document"
I get a
ResponseContinuation = "+RID:16gQALRRBgM-AAAAAAAAAA==#RT:1"

If I run the following query (on the same collection):
SELECT * FROM c WHERE c.CustomerAreaId = "1"
the ResponseContinuation is 7,755 bytes long:

ResponseContinuation = "+RID:16gQALRRBgPrggsAAAAAAA==#RT:1#FPC:AgEuPC6KBu8CAPDd9/3e9/f9//f3+3v1/v7/+//v9/e/f3//fn+rRsDf7//ff/W9///96Pt9b/u+f3///v/f7//7+//3/t/vv3///v/+//3v3/v+///Bfb/f37+///++/77fu7uv3e+3vW/+//+/v+2+//vf//7/t/b97/9v2/v+9/v79/f7/f7v7zvt/f37/d9/v/v9ff+/3+v+/vf3//f7/...

Is this ResponseContinuation correct, or should it look like the one in the first example?
If I edit it to remove everything after the #RT:1, it appears to work, but it is very difficult to be sure with the amount of data in the collection. If the ResponseContinuation is correct, I will have to re-engineer the paging mechanism in my web site, as this is far too much data to transfer.

@ghost

ghost commented Oct 13, 2015

The continuation token helps avoid redundant work on future roundtrips. We persist information in this token so we know exactly where to resume, without needing to repeat any work on later roundtrips. Overall, this significantly reduces the cumulative RUs across the query's roundtrips.
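For context, a minimal sketch of how this token is round-tripped with the .NET SDK, inside an async method (client, collectionUri, and the query are illustrative):

    using Microsoft.Azure.Documents.Client;
    using Microsoft.Azure.Documents.Linq;

    string continuation = null;
    do
    {
        var options = new FeedOptions
        {
            MaxItemCount = 100,                // illustrative page size
            RequestContinuation = continuation // resume where the previous page stopped
        };

        var query = client.CreateDocumentQuery<dynamic>(
                collectionUri, "SELECT * FROM c WHERE c.ObjectType = \"Document\"", options)
            .AsDocumentQuery();

        var page = await query.ExecuteNextAsync<dynamic>();
        continuation = page.ResponseContinuation; // the token discussed above
        // ... consume the page here ...
    } while (!string.IsNullOrEmpty(continuation));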

May I ask, what is the concern about 7 KB? It is within the supported size limit for HTTP headers.

@Geetarman
Author

Just that I am building a highly scalable website and I want to minimize the amount of data transferred. The request continuation is being transferred in the URL, as I assumed it was always small... that was obviously a mistaken assumption on my part.

@ghost

ghost commented Oct 13, 2015

Fair enough. Perhaps you could keep the token on your server, in session state, instead of round-tripping it to the client? But yes, if you need to send this to the client, then not putting it in the URL would be best.
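A minimal sketch of that idea in a classic ASP.NET MVC controller (the session key and the feedResponse variable are illustrative):

    // Keep the bulky token server-side; only the session cookie travels to the client.
    Session["docdb-continuation"] = feedResponse.ResponseContinuation;

    // On the next page request, read it back and pass it as FeedOptions.RequestContinuation.
    string continuation = (string)Session["docdb-continuation"];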

@Geetarman
Author

How big could that continuation token get?
Knowing that might help me decide on the best solution. I might decide to store it in an unindexed collection.

@Kevin-TokyWoky

Kevin-TokyWoky commented Feb 15, 2017

Same problem here.
We have a problem transferring this token from the client machine to our server, which then makes the DocDB request.

We first passed it to our server in a query string. But it was sometimes already too long, and the request would fail. So we passed the continuation token in a custom header instead. That worked for a while, probably more than 6 months or so.

And today... I just stumbled on a 12.8 KB token.
Now the request is just too long for the browser, even using headers. See the attached file.
continuationToken.txt

We also Base64-encode this token so it can be transferred anywhere safely (since it's JSON with some {} and such), and it becomes 17.1 KB in Base64.
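For reference, a sketch of that encode/decode round-trip; the roughly 33% inflation comes from Base64 representing every 3 bytes of input as 4 output characters (rawToken is assumed to hold the JSON continuation):

    using System;
    using System.Text;

    // Encode the raw JSON token so it travels safely in a header or query string.
    string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(rawToken));

    // Decode it again server-side before handing it back to DocumentDB.
    string decoded = Encoding.UTF8.GetString(Convert.FromBase64String(encoded));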

I could create a 'short' token on our side, which would be associated with the real DocDB token. But that means that for each DocDB request, I'd have to query another system to get the real token...
And to me it's not my job, but Microsoft's job, to give us users a token that we can easily handle.

Shall this issue be reopened?

@Geetarman
Author

For what it is worth, I decided (really, I had no option) to store the continuation token in a cache object if it was greater than a certain length, and to replace it with a GUID. When the request comes in again, I can determine whether it is a real continuation token or a cached one, in which case I retrieve the 'true' continuation token. Currently the cache I use is DocumentDB itself, but you could use a table or Redis Cache.
I guess the problem is how long to keep it. I use a dedicated, unindexed collection for this caching purpose; it (auto) deletes entries after a day (plenty of time), and it works fine for me at the moment.
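A minimal sketch of that pattern, assuming an illustrative ITokenCache abstraction with a one-day TTL (the interface, the size threshold, and the "cached:" prefix are all assumptions; the backing store could be DocumentDB, Table storage, or Redis):

    using System;

    // Illustrative cache abstraction; back it with DocumentDB, Table storage, or Redis.
    public interface ITokenCache
    {
        void Put(string key, string value, TimeSpan ttl);
        string Get(string key); // returns null when the key is missing or expired
    }

    public class ContinuationTokenShortener
    {
        private const int MaxInlineLength = 512; // illustrative size threshold
        private readonly ITokenCache _cache;

        public ContinuationTokenShortener(ITokenCache cache) => _cache = cache;

        // Outbound: replace oversized tokens with a prefixed GUID key.
        public string Shorten(string continuationToken)
        {
            if (string.IsNullOrEmpty(continuationToken) ||
                continuationToken.Length <= MaxInlineLength)
            {
                return continuationToken;
            }

            string key = Guid.NewGuid().ToString("N");
            _cache.Put(key, continuationToken, TimeSpan.FromDays(1));
            return "cached:" + key;
        }

        // Inbound: resolve a prefixed key back to the real token; pass real tokens through.
        public string Resolve(string value)
        {
            if (value != null && value.StartsWith("cached:"))
            {
                return _cache.Get(value.Substring("cached:".Length));
            }

            return value;
        }
    }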

It would be nice, though, not to have to worry about this and to remove the need to cache it myself...

@Kevin-TokyWoky

Yes, I plan on using Redis. But first I'm trying to see if I can trim this token and keep everything except the #FPC part... I'm giving myself 30 minutes of hacking to try this :D

@Geetarman
Author

Good luck. I think the term Continuation Token is a misnomer in this case - it is Continuation Data!

@Geetarman
Author

Put a request on https://feedback.azure.com/forums/263030-documentdb
I'd vote for it....

@Kevin-TokyWoky

Kevin-TokyWoky commented Feb 15, 2017

Ahah, yes.
Well, at first glance, the shorter token I just created by removing the long #FPC part works.
I've got to compare with what we have on our production server and make sure it gives the same results as this hacked token.

@Geetarman
Author

Hmmm, I don't like the sound of that. If it isn't needed, it must be a bug in DocumentDB, in which case it should be reported. I have a vague feeling I tried something like that and had problems (it was a while ago).

@Kevin-TokyWoky

Kevin-TokyWoky commented Feb 15, 2017

Same... I don't like it. Especially since I would be relying on their token format... which could change any day.
:/
But ryancrawcour mentioned that there is persisted data inside this token to save them work on future roundtrips, which makes us do more work instead. And that's probably inside this #FPC part.

Well... I think I'll go the Redis way and generate our own token. Not sure yet.

@rnagpal
Contributor

rnagpal commented Feb 26, 2017

@zfang will follow up on this issue.

@Kevin-TokyWoky

Kevin-TokyWoky commented Feb 27, 2017

Hello, thanks for reopening this.

So, a bit of follow-up, since I had almost forgotten about this GitHub issue:
following my conversation with Geetarman, I contacted Azure support, since we have a subscription.
I raised my concerns about the token and received a reply from Microsoft. See below:

  • For the query continuation token, its length could go up to 16 KB. The query engine uses the token to serialize its state so that it can resume execution correctly. Along with the resume state, the query engine also serializes some of the index lookup work onto the continuation token, to avoid repeating the same work on each continuation.
    If this is really a blocking issue for you, then I could give you some hints on trimming the continuation token before sending it back. By all means, we do not recommend this unless it is an absolute must, and it is meant to be a temporary solution.
    From our side, we're considering allowing the user to specify a maximum continuation token length, with the caveat that if serializing the resume state does not fit in the specified max size, the query execution will fail with an error. We don't have a timeline for this work yet, though.

  • For the short term, you could trim the token by removing #FPC. Please keep in mind that in some cases you might get #FPP instead (i.e. either #FPC or #FPP).
    We'll be sure to prioritize this work item, and hopefully we can get to it soon.
    Best Regards,

Very nice to see things moving forward; +1 to Microsoft. They listen.

As for us, we are indeed trimming the token right now, but we only remove the #FPC part, as at the time I didn't know about the #FPP variant. So far it seems to work great, but I suspect it must cost us a little more on our DocDB subscription, since we remove some optimization from the token. Probably.

@ansario

ansario commented Aug 3, 2017

@rnagpal it appears that our token is null, even though our query has more results and we would need the continuation token to get them. We were using the method of stripping FPC and FPP.

"{"token":null,"range":{"min":"05C1E5D191B78A083134303331323800","max":"FF"}}"

Did anything change recently?

@Kevin-TokyWoky

Kevin-TokyWoky commented Aug 3, 2017

Yep. It changed like... one month ago or something.
I changed my token regex on June 1st. My commit message reads:

Instead of FPC at the end there is now FPP

Here is the regex we are now using:

private static readonly Regex ContinuationTokenDataRegex = new Regex(@"(\+RID:.*#RT:.*#TRC:.*#RTD:.*)#(?:FPC|FPP).*", RegexOptions.Compiled | RegexOptions.Singleline);

BTW as a bonus, here is our code to shorten the tokens:

        private static string ShortenToken(string phatToken)
        {
            try
            {
                // The continuation is a JSON envelope; the bulky part lives in its "token" field.
                dynamic jsonToken = JsonConvert.DeserializeObject(phatToken);
                Match match = ContinuationTokenDataRegex.Match((string)jsonToken.token);
                if (!match.Success)
                {
                    // Unexpected format: leave the token untouched.
                    return phatToken;
                }

                // Keep the resume state, drop the #FPC/#FPP index data.
                jsonToken.token = match.Groups[1].Value;
                return jsonToken.ToString();
            }
            catch
            {
                // If anything goes wrong, fall back to the full token.
                return phatToken;
            }
        }

EDIT: However, we still have a valid token field.

@ansario

ansario commented Aug 3, 2017

@Kevin-TokyWoky that's fine, but the actual token is still coming back as null even though the response continuation itself is not null. So we can't do any regex on a null token.

@Kevin-TokyWoky

Kevin-TokyWoky commented Aug 3, 2017

Yep, I don't know. I just checked on our side and everything works properly.
I can paginate stuff, the token is not null for us.

@kirankumarkolli
Member

kirankumarkolli commented Aug 14, 2017

@ansario could you please share the 'activity id', and we will take a look?
It would also be good if you could create a new issue to track it.

@kirankumarkolli
Member

@ansario I am closing this issue, as there is a lot of other context in it as well. In case you are still blocked, feel free to raise a new issue.

@joopscheer

joopscheer commented Aug 29, 2017

@Kevin-TokyWoky Thanks for your code example. I've had to change it a bit to get it to work in my code.

// Requires: using System.Text.RegularExpressions; and using Newtonsoft.Json;
private static readonly Regex ContinuationTokenDataRegex = new Regex(@"(\+RID:.*#RT:.*#TRC:.*#RTD:.*)#(?:FPC|FPP).*", RegexOptions.Compiled | RegexOptions.Singleline);

private static string ShortenToken(string phatToken)
{
    if (string.IsNullOrEmpty(phatToken))
    {
        return phatToken;
    }

    try
    {
        // The continuation is a JSON envelope; the bulky part lives in its "token" field.
        dynamic jsonToken = JsonConvert.DeserializeObject(phatToken);
        var match = ContinuationTokenDataRegex.Match((string)jsonToken.token);
        if (!match.Success)
        {
            return phatToken;
        }

        // Keep the resume state, drop the #FPC/#FPP index data.
        jsonToken.token = match.Groups[1].Value;
        return JsonConvert.SerializeObject(jsonToken);
    }
    catch
    {
        return phatToken;
    }
}

@jamesthurley

jamesthurley commented Sep 4, 2019

From our side, we’re considering allowing the user to specify maximum continuation token length.

This has since been implemented as the ResponseContinuationTokenLimitInKb on the FeedOptions object.
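A minimal sketch of setting it (the limit and page size values are illustrative):

    using Microsoft.Azure.Documents.Client;

    var options = new FeedOptions
    {
        MaxItemCount = 25,                      // illustrative page size
        ResponseContinuationTokenLimitInKb = 1  // cap the continuation token at ~1 KB
    };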

@thomaslevesque

This has since been implemented as the ResponseContinuationTokenLimitInKb on the FeedOptions object.

The original quote said (emphasis mine):

  • From our side, we're considering allowing the user to specify a maximum continuation token length, with the caveat that if serializing the resume state does not fit in the specified max size, the query execution will fail with an error. We don't have a timeline for this work yet, though.

So it's not really a solution. You can specify a max length for the token, but it will cause requests to fail...

@jamesthurley

@thomaslevesque Happily, the way they have implemented it, they simply prune the continuation token to keep it under the desired limit, rather than failing with an error.

The caveat is that resuming the query may take a bit more work (and therefore RUs) if the continuation token has been pruned.

There is a bit more information which I found useful here: https://stackoverflow.com/a/54242859/37725

@thomaslevesque

@jamesthurley good to know, thanks!
Too bad that the max length is expressed in KB, so we can't say e.g. "no more than 128 bytes". The "minimal" continuation token is only a few bytes, so there's still no easy way to get that...
