Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

as_epoch is potentially too inaccurate for safe pagination #1621

Open
peterbe opened this issue May 3, 2018 · 5 comments
Open

as_epoch is potentially too inaccurate for safe pagination #1621

peterbe opened this issue May 3, 2018 · 5 comments
Labels

Comments

@peterbe
Copy link
Contributor

peterbe commented May 3, 2018

What if you have 100 records in the same parent. They all have uniquely different last_modified. E.g. 2018-04-20 16:21:28.000001, 2018-04-20 16:21:28.000002, 2018-04-20 16:21:28.000003, etc. Postgres has no problem indexing and sorting these 100% reliably.

But they all have the same as_epoch(last_modified) :(
In this example: 1524241288164, 1524241288164, 1524241288164, etc.

So if you try to paginate these 100 records, in batches of 10, how can you do the second (and third and forth etc.) batch if you use as_epoch.

@peterbe
Copy link
Contributor Author

peterbe commented May 3, 2018

This is my attempt to file a followup to this comment where I found that selecting 10,000 records at a time from 709,000 records caused it to miss two records that had distinctly different ID and distinctly different last_modified, yet they got missed.

@Natim
Copy link
Member

Natim commented May 3, 2018

Ok so the mistake is to use as_epoch rather than the more precise unique value?

@Natim
Copy link
Member

Natim commented May 3, 2018

I though we were precise at the milliseconds so as_epoch should be unique

@peterbe
Copy link
Contributor Author

peterbe commented May 3, 2018

I though we were precise at the milliseconds so as_epoch should be unique

No as_epoch rounds to what looks like the nearest second or something.

From my database:

buildhub=# select id, as_epoch(last_modified), last_modified from records where id in ('devedition_aurora_58-0b6rc1_linux-x86_64_as', 'devedition_aurora_59-0b6_win32_uk', 'devedition_aurora_58-0b6rc1_linux-x86_64_et', 'devedition_aurora_59-0b6_win64_bs');
                     id                      |   as_epoch    |       last_modified
---------------------------------------------+---------------+----------------------------
 devedition_aurora_58-0b6rc1_linux-x86_64_as | 1524241288164 | 2018-04-20 16:21:28.163601
 devedition_aurora_58-0b6rc1_linux-x86_64_et | 1524241288164 | 2018-04-20 16:21:28.16417
 devedition_aurora_59-0b6_win32_uk           | 1524240460834 | 2018-04-20 16:07:40.833634
 devedition_aurora_59-0b6_win64_bs           | 1524240460834 | 2018-04-20 16:07:40.834225
(4 rows)

@leplatrem leplatrem added the bug label May 3, 2018
@leplatrem
Copy link
Contributor

In case you wonder:

--
-- Convert timestamps to milliseconds epoch integer
--
CREATE OR REPLACE FUNCTION as_epoch(ts TIMESTAMP) RETURNS BIGINT AS $$
BEGIN
RETURN (EXTRACT(EPOCH FROM ts) * 1000)::BIGINT;
END;
$$ LANGUAGE plpgsql
IMMUTABLE;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants