Introduce garbage collection for validation data #6763
Merged
westonruter merged 8 commits into develop on Dec 7, 2021
b50939d to fe878a7
Contributor: Plugin builds for 0e6c4e6 are ready 🛎️!
westonruter (Member, Author) commented on Dec 7, 2021:
Before deploying the changes here to a site, there were 55 Validated URLs and 647 Validation Errors. After deployment, these counts dropped to 14 and 148, respectively.
dhaval-parekh approved these changes on Dec 7, 2021
dhaval-parekh (Collaborator) left a comment:
Changes look good to me.
schlessera requested changes on Dec 7, 2021
```php
 * @return void
 */
public function process( ...$args ) { // phpcs:ignore VariableAnalysis.CodeAnalysis.VariableAnalysis.UnusedVariable
	AMP_Validated_URL_Post_Type::garbage_collect_validated_urls( 100, '1 week ago' );
```
Collaborator comment:
Should we provide filters for both of these values (or a single combined one) so that sites can adapt them for optimization or debugging purposes?
Co-authored-by: Alain Schlesser <alain.schlesser@gmail.com>
…-toolbox-php<0.9.0" This reverts commit b305db3.
…ation-garbage-collection
* 'develop' of github.com:ampproject/amp-wp:
  Update Composer lock file
  Update to amp-toolbox 0.9.2
  Update Gutenberg package dependencies
  Update unit test case
  Use AMP_Validated_URL_Post_Type::get_url_from_post() instead of ->post_title
schlessera approved these changes on Dec 7, 2021
Summary
Fixes #4779
This introduces garbage collection for validation data (`amp_validated_url` posts and `amp_validation_error` terms).

With Site Scanning in v2.2, the most recently published post will be validated on a weekly basis. If the user never views the list of Validated URLs (for example, because DevTools is turned off), the number of validated URLs grows indefinitely, and validation data takes up more and more of the database over time. When all of the validation errors associated with a validated URL are unreviewed, or when all of them are also associated with other validated URLs, there is no need to keep the old validated URL in perpetuity; it should be garbage-collected.

This PR introduces a new cron task that runs daily. It obtains a random set of `amp_validated_url` posts that are older than 1 week. For each stale post, it checks which `amp_validation_error` taxonomy terms are associated with it. The validated URL is garbage-collected unless it has a unique validation error (one not associated with any other URL) that has been marked as reviewed or has a non-default removed state.

After the validated URLs have been garbage-collected, the cron task finally calls `AMP_Validation_Error_Taxonomy::delete_empty_terms()` to remove any `amp_validation_error` taxonomy terms that no longer have any associated validated URLs. This is the same action that previously required the user to click the “Clear Empty” button on the Error Index screen.

The logic described above prevents deleting validated URLs that would cause validation error terms to become empty, unless those terms are unreviewed and in the default removed state.
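To make the deletion rule concrete, here is a minimal PHP sketch of one garbage-collection pass. It is an in-memory model under stated assumptions, not the plugin's implementation: plain arrays stand in for `amp_validated_url` posts and `amp_validation_error` terms, the review status and removed state are simplified to two booleans, and `garbage_collect()`, `is_garbage_collectable()`, `count_other_urls_with_error()` and `delete_empty_errors()` are hypothetical helpers.

```php
<?php
/*
 * Hypothetical, WordPress-free model of the daily garbage-collection pass.
 * $urls:   url_id   => ['modified' => int Unix time, 'errors' => string[] error ids]
 * $errors: error_id => ['reviewed' => bool, 'default_removed_state' => bool]
 */

// Count validated URLs other than $url_id that reference $error_id.
function count_other_urls_with_error(array $urls, $url_id, string $error_id): int
{
    $count = 0;
    foreach ($urls as $other_id => $url) {
        if ((string) $other_id !== (string) $url_id && in_array($error_id, $url['errors'], true)) {
            $count++;
        }
    }
    return $count;
}

// A URL is collectable unless it holds a unique error that is reviewed or in a non-default removed state.
function is_garbage_collectable(array $urls, array $errors, $url_id): bool
{
    foreach ($urls[$url_id]['errors'] as $error_id) {
        $error      = $errors[$error_id];
        $is_unique  = count_other_urls_with_error($urls, $url_id, $error_id) === 0;
        $is_curated = $error['reviewed'] || ! $error['default_removed_state'];
        if ($is_unique && $is_curated) {
            return false; // Deleting this URL would empty a term the user has acted on.
        }
    }
    return true;
}

// Remove error terms that no validated URL references anymore (the "Clear Empty" step).
function delete_empty_errors(array $urls, array &$errors): int
{
    $in_use = [];
    foreach ($urls as $url) {
        foreach ($url['errors'] as $error_id) {
            $in_use[$error_id] = true;
        }
    }
    $deleted = 0;
    foreach (array_keys($errors) as $error_id) {
        if (! isset($in_use[$error_id])) {
            unset($errors[$error_id]);
            $deleted++;
        }
    }
    return $deleted;
}

// One cron pass: sample up to $limit stale URLs at random, delete collectable ones, then prune empty terms.
function garbage_collect(array &$urls, array &$errors, int $limit, int $cutoff): array
{
    $stale = [];
    foreach ($urls as $url_id => $url) {
        if ($url['modified'] < $cutoff) {
            $stale[] = $url_id;
        }
    }
    shuffle($stale);
    $deleted_urls = 0;
    foreach (array_slice($stale, 0, $limit) as $url_id) {
        // Evaluated against the current state, so deletions earlier in this pass are taken into account.
        if (is_garbage_collectable($urls, $errors, $url_id)) {
            unset($urls[$url_id]);
            $deleted_urls++;
        }
    }
    return [$deleted_urls, delete_empty_errors($urls, $errors)];
}

// Usage example with illustrative data.
$now      = time();
$week_ago = $now - 7 * 24 * 60 * 60;
$errors   = [
    'e_new'      => ['reviewed' => false, 'default_removed_state' => true],
    'e_reviewed' => ['reviewed' => true,  'default_removed_state' => true],
    'e_lonely'   => ['reviewed' => false, 'default_removed_state' => true],
];
$urls = [
    'old_shared'   => ['modified' => $week_ago - 10, 'errors' => ['e_new']],      // error also on a recent URL
    'old_reviewed' => ['modified' => $week_ago - 10, 'errors' => ['e_reviewed']], // unique reviewed error: kept
    'old_lonely'   => ['modified' => $week_ago - 10, 'errors' => ['e_lonely']],   // unique but unreviewed: deleted
    'recent'       => ['modified' => $now,           'errors' => ['e_new']],      // not stale: untouched
];
[$deleted_urls, $deleted_errors] = garbage_collect($urls, $errors, 100, $week_ago);
// $deleted_urls === 2 (old_shared, old_lonely); $deleted_errors === 1 (e_lonely)
```

Because each candidate is checked against the current state rather than a snapshot taken before the pass, two stale URLs that share a reviewed error (and no other URL has it) cannot both be deleted: whichever is visited second sees the error as unique and is kept, so the reviewed term never becomes empty.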
Checklist