Unbuffered datastore queries for download #3703
Merged
Another approach to #3646. Previously we had been optimizing our CSV streaming by breaking the query up into multiple chunks, but that approach does not scale once a query sorts on any column other than the primary key. Sorting a large table is always slow, and with chunking the sort has to be repeated for every chunk. Each sort may only add a few seconds, but as a table grows, both the cost of each sort and the number of chunks (and therefore sorts) increase.
A better solution may be to use unbuffered queries: run the entire query just once and stream the results directly from the database server. By default, Drupal and most other PHP/MySQL projects use buffered queries, meaning that the entire result set is transferred to PHP and held in memory as soon as the query executes.
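To make the trade-off concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for MySQL. This is an assumption made only so the example runs anywhere; the PR itself works through PDO in PHP. The table name datastore and all helper names are hypothetical. chunked_export mimics the old approach, re-running a sorted LIMIT/OFFSET query for every chunk, while streamed_export runs the sorted query once and writes each row to the CSV as the cursor yields it, which is roughly what an unbuffered query allows.

```python
import csv
import io
import sqlite3
from typing import Tuple


# Hypothetical stand-in for a large datastore table. SQLite is used only so
# the sketch runs anywhere; the PR itself targets MySQL through PDO.
def make_table(rows: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datastore (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO datastore (id, name) VALUES (?, ?)",
        ((i, f"row-{(i * 7919) % rows}") for i in range(1, rows + 1)),
    )
    return conn


# Sort on a non-primary-key column, with id as a tiebreaker so chunks are stable.
QUERY = "SELECT id, name FROM datastore ORDER BY name, id"


def chunked_export(conn: sqlite3.Connection, chunk_size: int) -> Tuple[str, int]:
    """Old approach: one sorted query per chunk, so the ORDER BY runs every time."""
    out = io.StringIO()
    writer = csv.writer(out)
    queries = 0
    offset = 0
    while True:
        rows = conn.execute(QUERY + " LIMIT ? OFFSET ?", (chunk_size, offset)).fetchall()
        queries += 1
        if not rows:
            break
        writer.writerows(rows)
        offset += chunk_size
    return out.getvalue(), queries


def streamed_export(conn: sqlite3.Connection) -> Tuple[str, int]:
    """New approach: run the sorted query once and write rows as they arrive."""
    out = io.StringIO()
    writer = csv.writer(out)
    cursor = conn.execute(QUERY)
    for row in cursor:  # iterating the cursor pulls rows one at a time
        writer.writerow(row)
    return out.getvalue(), 1  # exactly one query was executed
```

For a 1,000-row table and 100-row chunks, chunked_export issues 11 queries (10 full chunks plus a final empty one), each repeating the sort, whereas streamed_export issues one; both produce the same CSV.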
Here, we create a second database connection object based on the default connection, but with the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute set to FALSE. This connection will be used for all datastore module database operations. That should be fine, but we should watch for any unexpected side effects.
Known issues, misc
This adds to the parts of DKAN that assume MySQL is the underlying database. It would be good to have a fallback for other databases, or to make it more obvious how one might add support for PostgreSQL or other PDO drivers. Most database backends have some equivalent to this cursor-based fetching, but it is not standardized in PDO.
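The same gap exists outside PDO: Python's DB-API, for instance, offers fetchmany(), but whether rows are actually streamed from the server depends on the driver. Below is a minimal sketch of one way to keep the driver-specific part small, assuming a hypothetical STREAMING_CURSORS registry keyed by driver name that falls back to an ordinary cursor when nothing is registered. The commented PostgreSQL entry uses psycopg2's named (server-side) cursors and is illustrative only; it is not exercised here.

```python
import sqlite3
from typing import Any, Callable, Dict, Iterator, Sequence

# Hypothetical registry: driver name -> function returning a cursor that
# streams rows from the server instead of buffering the whole result set.
CursorFactory = Callable[[Any], Any]
STREAMING_CURSORS: Dict[str, CursorFactory] = {}

# Illustrative only, not exercised here: psycopg2 creates a server-side
# cursor when cursor() is given a name.
# STREAMING_CURSORS["pgsql"] = lambda conn: conn.cursor(name="dkan_download")


def default_cursor(conn: Any) -> Any:
    """Fallback: an ordinary DB-API cursor (may buffer, depending on the driver)."""
    return conn.cursor()


def stream_rows(
    conn: Any,
    driver: str,
    sql: str,
    params: Sequence[Any] = (),
    batch_size: int = 500,
) -> Iterator[tuple]:
    """Yield rows batch by batch, using a registered streaming cursor if available."""
    factory = STREAMING_CURSORS.get(driver, default_cursor)
    cursor = factory(conn)
    try:
        cursor.execute(sql, params)
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            yield from batch
    finally:
        cursor.close()


if __name__ == "__main__":
    # Demo with SQLite, which has no registry entry and so uses the fallback.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE t (id INTEGER)")
    demo.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
    for row in stream_rows(demo, "sqlite", "SELECT id FROM t ORDER BY id", batch_size=3):
        print(row)
```

Supporting another backend would then mean registering one cursor factory rather than touching the download code.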
QA Steps
Coming... In general, just download some large CSVs and make sure they complete correctly!