Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Octet Streaming of Datastore Dump #7839

Merged
merged 26 commits into from Oct 17, 2023

Conversation

JVickery-TBS
Copy link
Contributor

fix(views): datastore ext dump octet streaming;

  • Fixed view method for datastore dump to actually stream data in Flask response.
  • Fixed possible issue in sql return.

Fixes

Datastore dump does not actually stream the data. The view method has now be re-written to properly stream the data via Flask Response.

Also includes small condition for possible issue in the return of some sql stuff.

Features:

  • includes tests covering changes
  • includes updated documentation
  • includes user-visible changes
  • includes API changes
  • includes bugfix for possible backport

Please [X] all the boxes above that apply

- Fixed view method for datastore dump to actually stream data in Flask response.
- Fixed possible issue in sql return.
- Added change log file.
- Fixed flake8 syntax reporting.
- Fixed typing reporting.
@wardi
Copy link
Contributor

wardi commented Sep 27, 2023

Love to see this @JVickery-TBS

Datastore dump for large datasets has been broken since 2.9 and really needs this fix.

- Changed member dump view to not use datastore writer.
- Updated typing for datastore dump inner methods.
- Fix more typing issues.
- Even more typing fixes.
- Please let this be the last fix.
@JVickery-TBS
Copy link
Contributor Author

Okay typing should be good now. And added a fix to member_dump from group view as I contributed it using the datastore csv_writer which now changes with this PR

Copy link
Member

@amercader amercader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, just added a couple of comments before merging

ckanext/datastore/blueprint.py Outdated Show resolved Hide resolved
ckanext/datastore/blueprint.py Outdated Show resolved Hide resolved
ckanext/datastore/blueprint.py Show resolved Hide resolved
- Removed `u` strings in new code.
- Changed resource id validator to datastore search.
- Reworked new code into `dump_to`.
- Removed unused `Invalid` import.
- Reworked the `dump_to` method to work for the view method and the cli method.
- Removed `bom` flag from cli command and handled condition based on format.
- Fixed unbound for `bom`.
ckanext/datastore/cli.py Outdated Show resolved Hide resolved
- Made `dump_to` always return a generator.
- Removed unnecessary params from `dump_to`.
- Brought back BOM flag in cli dump command.
@amercader
Copy link
Member

@JVickery-TBS some related tests are failing:

https://app.circleci.com/pipelines/github/ckan/ckan/5846/workflows/c3297949-45cb-4f09-8798-19f858e1c68e/jobs/13838/tests

I think circle ci working needs to be fixed on your end, perhaps authorizing GitHub again?

@JVickery-TBS
Copy link
Contributor Author

@amercader sadly it was a recent change to CircleCI. Users need to have write permissions to the github repo, and also be added to the project in CircleCI to initialize a CircleCI build/workflow.

I had to talk to CircleCI support about it as we recently moved to CircleCI. And of course the week we started using it was the week they made this change. They said it was for security purposes so that anyone cannot just initialize a CircleCI build.

But thanks for running the workflow, I will update these tests to work with the new code

- Removed byte strings in StringIO buffer usage.
@JVickery-TBS
Copy link
Contributor Author

@amercader Okay the tests should hopefully work now. At least they worked locally for me. Seems like github is just taking a while to update this PR with the new push. But hopefully it updates sometime soon

- Use `BytesIO` instead of `StringIO`.
- Added `utf-8` encoding for byte writing.
- Added `utf-8` encoding for byte writing.
- Added `utf-8` encoding for byte writing.
- Added `utf-8` encoding for byte writing.
- Reverted some `str` to `bytes` changes.
- Typing,
- Indents.
- Reverted some `str` to `bytes` changes.
- More attempts at byte writing for datastore dump.
- More attempts at byte writing for datastore dump.
- Moved output buffers into the datastore writer methods.
- Made write record methods return bytes.
- Added end file method for returning final bytes of the files.
- Lint fixes.
- Typing fixes.
- Removed method param comments.
@JVickery-TBS
Copy link
Contributor Author

@amercader Okay Ian came to my rescue and this should all be working now. Had to move the StringIO/BytesIO stuff into the writers as the XML one wanted Bytes and the others String. So you will no longer need to supply a file-like object to the writer methods, they will make one for you and yield the bytes in write records

@amercader amercader merged commit 308b52c into ckan:master Oct 17, 2023
8 checks passed
@amercader
Copy link
Member

Thanks @JVickery-TBS , and it looks you sorted out your circleci issues!

As this is quite a big change, would you mind submitting a PR with the same changes against dev-v2.10 to make backporting easier? Many thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants