Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4398] Trash batch-purging #4399

Closed
wants to merge 7 commits into from
Closed

[4398] Trash batch-purging #4399

wants to merge 7 commits into from

Conversation

jqnatividad
Copy link
Contributor

@jqnatividad jqnatividad commented Aug 14, 2018

Related to #4398

Proposed fixes:

Purge datasets 10 at a time to avoid timeouts when there are a lot of trash items.

Make the trash list a numbered list. Since the list does not do pagination, and the purge button is all the way in the bottom, the sysadmin needs to go to the bottom of the list to press "Purge." Making it a numbered list, gives the sysadmin useful info when there a lot of datasets available for purging.

Features:

  • includes tests covering changes
  • includes updated documentation
  • includes user-visible changes
  • includes API changes
  • includes bugfix for possible backport

Please [X] all the boxes above that apply

Proposed fixes:
Purge datasets 10 at a time to avoid timeouts when there are a lot of trash items.

Make the trash list a numbered list. Since the list does not do pagination, and the purge button is all the way in the bottom, the sysadmin needs to go to the bottom of the list to press "Purge."  Making it a numbered list, gives the sysadmin useful info when there a lot of datasets available for purging.

Features:
  [ ] includes tests covering changes
  [ ] includes updated documentation
  [X] includes user-visible changes
  [ ] includes API changes
  [ ] includes bugfix for possible backport

Please [X] all the boxes above that apply
@TkTech
Copy link
Member

TkTech commented Aug 14, 2018

I'd suggest doing this a bit differently, especially because we allow hooking package delete/purge which can cause some pretty lengthy tasks to run (such as unmarking it for publication on the source site). Your batching may prevent a database operation from timing out, but it won't stop the HTTP server from timing out.

  1. When datasets are selected for purge, mark their state as something new (say deleted_pending_purge). If we want to be fully backwards compatible we'd have to add a new flag to the model instead (ex: purge=True, but keep state=DELETED)
  2. Put the actual purge task into a scheduled background task (or trigger it immediately when purge is pressed) which does the actual deletion of all packages matching purge=True

With the removal of revisions the purge method can be simplified quite a bit as well.

@wardi
Copy link
Contributor

wardi commented Aug 14, 2018

I'm fine with this quick hack, but @TkTech 's approach is the best long term fix if someone wants to work on it.

@jqnatividad
Copy link
Contributor Author

We had a large 2.5.x installation with hundreds of datasets in the Trash, and we were getting web server timeouts when their sysadmin was trying to purge the trash.

Had to pull this together as they were running out of DB space as the datastore was filling it up. Even had to use @metaodi's "quick-and-dirty" paster command to delete orphaned tables (#3422 ).

They're now on 2.7.3, and find we still need this because of their heavy usage of the site results in trash filling up.

Perhaps, until someone works on the long-term fix?

@TkTech
Copy link
Member

TkTech commented Aug 15, 2018

Sorry, my comment really should have been on the issue rather than the PR. I'm 100% 👍 on this change going in, but the issue is still an issue and should remain open until we can move it to the background workers.

@wardi
Copy link
Contributor

wardi commented Sep 25, 2018

This is handled by the flask views now in master. Can you make the same change to https://github.com/ckan/ckan/blob/master/ckan/views/admin.py ?

@kmbn
Copy link
Contributor

kmbn commented Jan 22, 2019

The Travis build shows up as pending here, but if you click through, all the tests pass. @jqnatividad made the requested changes for Flask. Is there anything else that needs to be done before merging the PR, @wardi?

@kmbn kmbn added the Awaiting tech team feedback This PR or issue needs feedback from the tech team. label Jan 24, 2019
@jqnatividad
Copy link
Contributor Author

jqnatividad commented Nov 20, 2020

Closing this stale PR as this has been "partly" fixed with the removal of revisions.

"Partly" because the datasets are only marked as deleted and not actually purged from the datastore, resulting in orphaned tables.

#5589

#3422 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting tech team feedback This PR or issue needs feedback from the tech team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants