Skip to content

IvoPereira/Efficient-Pagination-SQL-PoC

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 

How to Create an Efficient Pagination in SQL (PoC)

OverviewHow To RunObservationsLicense

Overview

This proof of content proves that the most widely used way of making a pagination in most RDBMS - using LIMIT and OFFSET - is not the most efficient one, clearly observed when you start gathering huge amounts of data and need to paginate through them.

To run our experiment, we made available a docker-compose infra that loads a 10M rows dump on a MariaDB instance. When running a sequence of commands we are able to compare the efficiency of two different approaches:

  • Using LIMIT and OFFSET: This approach is the globally accepted as the "way" to do it - and the wrong one as well as it ends up being the most expensive one as it requires the database to perform a full table scan every time it runs without cache. When building a pagination using this approach the developer should pass the OFFSET and the LIMIT to the query.

    SELECT * FROM `docs` LIMIT 10 OFFSET 2850001
  • Using the last retrieved ID and LIMIT: This approach is the one we are trying to prove that should be used instead. By directly limiting the query to search from the last retrieved ID, we prevent the database the need to lookup every sequential row and start the lookup directly from where it needs to. When building a pagination using this approach the developer should pass the last retrieved ID in the page before and the LIMIT to the query.

    SELECT * FROM `docs` WHERE id > 2850000 LIMIT 10

Before running the experiments, we explicitly disable query caching by setting "query_cache_size" to "0", as we want to measure the real impact a bad approach could have1.

1: MariaDB's Query Cache is inconsistent across different versions, so we turn it off manually

How to Run the PoC

To clone and run this PoC, you'll need Git, Docker-Compose and gzip (to unzip the dump.sql.gz) installed on your computer.

To run it, start the MariaDB service running the following command:

make start

Once mysqld is ready for connections (something like mariadb | 2020-06-02 21:29:12 0 [Note] mysqld: ready for connections. should appear in the CLI - should take 30-40 seconds), run the PoC in a separate terminal window:

make poc

Both queries should execute and you should now be able to compare the uncached execution times of both approaches.

Observations

After running the instructions provided in "How to Run the PoC" these are the results we got:

Size Approach comparison

As we can observe in a sample provided by the PoC, we have seen an increase of 193171.17% in execution time, considering the first query as the one using the LIMIT/OFFSET and the second one using the last retrieved ID and a LIMIT.

Feel free to run make poc several times to compare the results and to take your own conclusions on which is the most efficient option.

Considering this, the second approach should be used to create a Pagination system.

You may also like...

License

MIT


ivopereira.net  ·  GitHub @ivopereira  ·  Twitter @ivoecpereira  ·  LinkedIn

About

A PoC that provides better guidance on creating a more efficient pagination without using SQL OFFSET and LIMIT.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published