Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize large OFFSET queries on large datastore tables #3646

Closed
3 tasks
grugnog opened this issue Aug 30, 2021 · 1 comment · Fixed by #3703
Closed
3 tasks

Optimize large OFFSET queries on large datastore tables #3646

grugnog opened this issue Aug 30, 2021 · 1 comment · Fixed by #3703

Comments

@grugnog
Copy link
Member

grugnog commented Aug 30, 2021

User Story

As a DKAN operator I want to avoid excessive resource usage when API requests have a large offset on a large table.

Currently an offset of 1000000 on a large table requires over a second for MySQL to return the query results due to it having to count over this many records.

Unfortunately this isn't something that can be resolved by using indexes, but it is possible to improve performance by around a factor of 4 by doing the offset processing on the primary key field alone and then using this to select the fields using a join.

Current query:

SELECT t.field1 AS field1, t.field2 AS field2
FROM datastore_abc123 t
WHERE t.field1 = "unicorn"
LIMIT 500 OFFSET 1000000

Proposed query:

SELECT t.field1 AS field1, t.field2 AS field2
FROM datastore_abc123 t
INNER JOIN
  (SELECT record_number
   FROM datastore_abc123 t
   WHERE t.field1 = "unicorn"
   LIMIT 500 OFFSET 1000000)
AS i USING(record_number);

Acceptance Criteria

  • Query results are unchanged
  • Query performance in this case is ~4x faster
  • Query performance in other cases is not adversely affected in a significant way
@dafeder
Copy link
Member

dafeder commented Oct 27, 2021

New approach proposed in #3700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants