dynamodb: not possible to use "LastEvaluatedKey" from a Count query #237

Open
jonathaningram opened this issue Sep 23, 2014 · 5 comments


@jonathaningram

Correct me if I am wrong, but there seems to be no way to run a count query and then take the LastEvaluatedKey and pass it on to the next request (i.e. to get the total count across all pages). For example, here is a response from DynamoDB for a count query on an index, where you actually want to "follow" the last key:

{
    "Count":209,
    "LastEvaluatedKey":{
        "Id":{
            "S":"359a4dce-52b7-487c-9a4a-7fa6baaa3934"
        },
        "Status":{
            "S":"Complete"
        }
    },
    "ScannedCount":209
}

(FYI this is for a query which is essentially "count records where status=Complete")
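To make the shape of that response concrete, here is a minimal sketch of decoding it with the standard library only. The names countPage, attributeValue, parseCountPage and hasMore are hypothetical and are not part of goamz; the sketch also assumes every key attribute is a string ("S"), as in the response above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// attributeValue models only the string ("S") form of a DynamoDB
// attribute value, which is all the response above contains.
type attributeValue struct {
	S string `json:"S"`
}

// countPage is a hypothetical type (not part of goamz) for one page
// of a COUNT query response.
type countPage struct {
	Count            int64                     `json:"Count"`
	ScannedCount     int64                     `json:"ScannedCount"`
	LastEvaluatedKey map[string]attributeValue `json:"LastEvaluatedKey"`
}

// parseCountPage decodes one raw JSON response body.
func parseCountPage(body []byte) (countPage, error) {
	var p countPage
	err := json.Unmarshal(body, &p)
	return p, err
}

// hasMore reports whether the response carried a LastEvaluatedKey,
// i.e. whether there is another page to follow.
func (p countPage) hasMore() bool {
	return len(p.LastEvaluatedKey) > 0
}

func main() {
	body := []byte(`{
		"Count": 209,
		"LastEvaluatedKey": {
			"Id": {"S": "359a4dce-52b7-487c-9a4a-7fa6baaa3934"},
			"Status": {"S": "Complete"}
		},
		"ScannedCount": 209
	}`)

	p, err := parseCountPage(body)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(p.Count, p.hasMore(), p.LastEvaluatedKey["Id"].S)
}
```

Both pieces of information are in the same response, so a caller needs an API that hands back the count and the key together.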

There is no variation of CountQuery that returns the last evaluated key, and you can't use QueryTable because, even though it returns the last evaluated key, it doesn't return the count. If that method were modified to continue when Items doesn't exist (per https://github.com/crowdmob/goamz/issues/236), it might be possible to take cap(results), but I'm not sure that would work, and it's a bit hacky.

What do you think? Happy to provide a solution if you think there's a good one.

@alimoeeny
Contributor

Hi @jonathaningram, I've never used a "count" query myself, so let's see if I understand how it works. If you want to count the results of a query, you set 'count' to true in your query request (I don't know if this is necessary), and you get Count and LastEvaluatedKey back. You then keep the count and repeat the request with the LastEvaluatedKey as ExclusiveStartKey until the response no longer includes a LastEvaluatedKey. The sum of all counts is your answer.
If I understood the procedure correctly, it looks as if we have a couple of options. What do you think about this solution:
1- keep QueryOnIndex the way it is.
2- create a new function, which we can call advancesqueryindex or queryindexwithcount or something like that, which accepts count as a parameter and returns both the count and the LastEvaluatedKey.
3- have another function that uses the function from (2) and repeats the query until it has the total count. We can call it something like totalindexcount or totalindexquerycount ...
What do you think?
And absolutely, please send your PR.
Thanks for your contribution,
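
The procedure described here can be sketched independently of goamz. In the sketch below, pageFunc, totalCount and fakePages are hypothetical names, and the start key is simplified to a *string standing in for a real DynamoDB key. It assumes the loop should end when no further key is returned rather than on a zero count, since an intermediate page may legitimately report zero.

```go
package main

import "fmt"

// pageFunc is a hypothetical callback that runs one COUNT query starting
// at startKey (nil for the first page). It returns that page's count and
// the key to resume from, or a nil key when there are no more pages.
type pageFunc func(startKey *string) (count int64, nextKey *string, err error)

// totalCount sums the per-page counts, following the returned key until
// a page comes back without one.
func totalCount(fetch pageFunc) (int64, error) {
	var total int64
	var key *string
	for {
		c, next, err := fetch(key)
		if err != nil {
			return 0, err
		}
		total += c
		if next == nil {
			return total, nil
		}
		key = next
	}
}

// fakePages stands in for the real service: it serves the given counts
// one page at a time and returns a nil key with the last page.
func fakePages(counts []int64) pageFunc {
	i := 0
	return func(_ *string) (int64, *string, error) {
		if i >= len(counts) {
			return 0, nil, fmt.Errorf("no page %d", i)
		}
		c := counts[i]
		i++
		if i == len(counts) {
			return c, nil, nil
		}
		key := fmt.Sprintf("page-%d", i)
		return c, &key, nil
	}
}

func main() {
	// The middle page reports zero, but the loop keeps going because
	// a resume key was still returned.
	total, err := totalCount(fakePages([]int64{209, 0, 41}))
	fmt.Println(total, err)
}
```

Taking the single-page call as a function argument keeps the aggregation separate from the query itself, which is roughly the split between options (2) and (3).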

@jonathaningram
Author

@alimoeeny yep you set the Select like this:

q.AddSelect("COUNT")

I'm not sure it's necessary to provide methods that aggregate the count for you; I think it's enough to provide a method that returns the count and the next key. So just as you have:

func (t *Table) QueryTable(q *Query) ([]map[string]*Attribute, *Key, error) {
}

We need some variation like this (the internals are just a rough copy and paste and mashing from QueryTable):

func (t *Table) CountTable(q *Query) (int64, *Key, error) {
    jsonResponse, err := t.Server.queryServer("DynamoDB_20120810.Query", q)
    if err != nil {
        return 0, nil, err
    }

    json, err := simplejson.NewJson(jsonResponse)
    if err != nil {
        return 0, nil, err
    }

    // Int64 (rather than Int) matches the int64 return type.
    itemCount, err := json.Get("Count").Int64()
    if err != nil {
        message := fmt.Sprintf("Unexpected response %s", jsonResponse)
        return 0, nil, errors.New(message)
    }

    var lastEvaluatedKey *Key
    if lastKeyMap := json.Get("LastEvaluatedKey").MustMap(); lastKeyMap != nil {
        lastEvaluatedKey = parseKey(t, lastKeyMap)
    }

    return itemCount, lastEvaluatedKey, nil
}

What do you think about that? The calling code looks something like this:

func CountCompleteJobs() (int64, error) {
    t := jobsTable()
    q := dynamodb.NewQuery(t)
    q.AddIndex("Status-Index") // Note: use of an index here is inconsequential - could be a query without an index too
    q.AddKeyConditions([]dynamodb.AttributeComparison{
        *dynamodb.NewEqualStringAttributeComparison("Status", "Complete"),
    })

    // First page: declare count, lastKey and err together.
    count, lastKey, err := t.CountTable(q)
    if err != nil {
        return 0, err
    }
    for lastKey != nil {
        var c int64
        q.AddExclusiveStartKey(t, lastKey)
        c, lastKey, err = t.CountTable(q)
        if err != nil {
            return 0, err
        }
        count += c
    }
    return count, nil
}

The name CountTable is not great. I would like CountQuery, but that's already taken and we can't change its signature. Maybe CountQueryLastEvaluated... not sure.

@alimoeeny
Contributor

Thanks @jonathaningram .
Your choice. In the next iteration (V2.0) we will need to clean up the names anyway, then we'll call it CountQuery.
Go for it, please.

@ian-kent
Contributor

If it's of any use... I couldn't find any obvious way to count on a scan either (i.e., count everything with no conditions). It has the same kind of issues (no count or last evaluated key available), so I hacked in this temporary fix: https://gist.github.com/ian-kent/93e7d33dd413d85876ae

Can send that as a PR if you think it's worth it?

@alimoeeny
Contributor

Yeah, I imagine there are other people who can benefit.
I (like you and many others) have a bunch of util functions that do many of these kinds of things, but I never got around to cleaning them up to contribute back.
Thank you @ian-kent


3 participants