Paginated query over 2i with range return non existent data #498

Open
peczenyj opened this Issue Feb 27, 2014 · 1 comment


Hi all

I am using Riak 1.4.2 with the LevelDB backend and I found something strange.

We have one index called 'expiration_epoch_int'; it works like a TTL for each key in this bucket. To find expired data to delete, we just query expiration_epoch_int for the range between 0 and 'now'. For a small amount of data this works really well.
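For reference, the range query described above could be written against the HTTP interface roughly as follows. This is only a sketch: I actually use PBC, and the helper name `expired_keys_url`, the base URL, and the bucket name are made up for illustration. The path layout and the `max_results`, `return_terms` and `continuation` parameters are the ones Riak 1.4 documents for 2i range queries.

```python
import time
from urllib.parse import quote, urlencode


def expired_keys_url(base_url, bucket, now=None, max_results=1000, continuation=None):
    """Build the URL of a paginated 2i range query over expiration_epoch_int.

    The range is [0, now], i.e. every key whose expiration time has passed.
    `expired_keys_url` is a hypothetical helper, not part of any Riak client.
    """
    if now is None:
        now = int(time.time())
    params = {"max_results": max_results, "return_terms": "true"}
    if continuation:
        # Opaque token returned with the previous page of results.
        params["continuation"] = continuation
    path = "/buckets/{}/index/expiration_epoch_int/0/{}".format(
        quote(bucket, safe=""), int(now)
    )
    return base_url.rstrip("/") + path + "?" + urlencode(params)
```

Calling it with a fixed `now` makes the result reproducible; passing the continuation from one response into the next call walks through the pages.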

But today I found this: the first 10 results from this query are non-existent keys. They were already deleted, and I receive 'HTTP/1.1 404 Object Not Found' if I try to inspect them.

If I use a small range, such as +/- 1 second around now, I get good results (keys that exist in Riak), but if I start from 0 (or 1), at least the beginning of the result set consists of keys that do not exist.

If I use return_terms=true I can also see the expiration_epoch_int values (the returned values are from 10 to 20 days ago). I am using the PBC interface for both the query and the delete.

So, my question: why does this happen? Can it be related to pagination (maybe some cache)? Our cleanup process handles, for example, ~7 x 10^6 keys.

To control the expiration of a huge amount of data, is it safe to use only one secondary index? Is there some limit for a huge number of keys? I have no idea where to start investigating this.

I will try to run a more complete test to find the % of deleted keys returned by Riak.
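A rough sketch of how that test could be structured, assuming two callables that are not part of any Riak client: `fetch_page(continuation)` returns one page of the index query as `(keys, next_continuation)`, and `key_exists(key)` reports whether a fetch of that key finds an object. Both names are placeholders for whatever the client in use provides.

```python
def stale_key_ratio(fetch_page, key_exists):
    """Walk a paginated index query and count keys that no longer exist.

    fetch_page(continuation) -> (list_of_keys, next_continuation_or_None)
    key_exists(key)          -> True if the object can still be fetched
    Both callables are hypothetical stand-ins for real client calls.
    Returns (missing, total, percent_missing).
    """
    total = 0
    missing = 0
    continuation = None
    while True:
        keys, continuation = fetch_page(continuation)
        for key in keys:
            total += 1
            if not key_exists(key):
                missing += 1
        if not continuation:
            # No continuation token means this was the last page.
            break
    percent = (100.0 * missing / total) if total else 0.0
    return missing, total, percent
```

Keeping the paging and the existence check behind callables means the same loop can be run with different page sizes, or without pagination at all, to see whether the ratio changes.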

Contributor

jaredmorrow commented Mar 24, 2014

Also cc'ing @engelsanchez and setting the milestone to 2.0.1, since I don't know whether this was already fixed in the 1.4.x series.

@jaredmorrow jaredmorrow added this to the 2.0.1 milestone Mar 24, 2014
