Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36334 - Delete orphaned content units after indexing and improve missing content units error on Pulp 3 copy #10542

Merged
merged 5 commits into from
May 31, 2023

Conversation

ianballou
Copy link
Member

What are the changes introduced in this pull request?

Ensures that content units belonging to no repositories are destroyed at repo indexing time. This is necessary because content units that have been removed from Pulp can still remain attached to content view filters, which will cause blocking errors at CV publish time.

I also improved the error returned from Pulp that occurs when trying to copy units that don't exist.

Note that I didn't rewrite the special rpm copy API error. In testing, I found that the rpm copy API ignores content units that don't exist.

Considerations taken when implementing this change?

I considered if content units should be destroyed during content unit importing, or perhaps if units could be disassociated from filters instead. Finding the filter relations at sync time would be too query-intensive. Determining the content units to remove during ContentUnitIndexer.import_all would require searching other repositories to find out if it's truly orphaned, which I ended up doing higher up in the stack anyway.

What are the testing steps for this pull request?

  1. Create a content view with some module streams and container image tags
  2. Make filters that filter in those tags and module streams
  3. Modify the pulp_hrefs for some module stream and some container image tag to be something that doesn't exist in Pulp
  4. Try publishing the CV. You should see a failed task in Dynflow with the error message rewrite that this PR implements.
  5. Perform a complete sync on the related repositories.
  6. The publish should now work. I think you'll have to skip the failing Dynflow task and try the publish again.

Also, try verifying that orphaned content units are indeed destroyed using something other than module streams and container image tags.

@theforeman-bot
Copy link

Issues: #36334

else
where.not(:repository_id => ::Katello::Repository.all)
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle module streams that don't exist in pulp that are still referenced by Katello CV repositories. (non-library). I am guessing the testing steps are throwing me off here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, in that case they wouldn't get removed. To fix this, I'll look again at having some sort of "correct filters" method that will run after a repository is synced. That'll ensure that ID filter rules exist only if they relate to content that is in the library repositories.

@ianballou
Copy link
Member Author

@sjha4 I updated the code to clean filter rules after disassociating module streams, package groups, and errata. I didn't bother putting it in an async task because the disassociation should be pretty rare.

@ianballou
Copy link
Member Author

@sjha4 I added the rake task.


::Katello::ContentViewErratumFilterRule.all.each do |rule|
content_view = rule.filter.content_view
unless ::Katello::Erratum.in_repositories(content_view.repositories).pluck(:errata_id).include?(rule.errata_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add some nil checks here.

not(@model_class.unit_id_field => assocication_tracker.unit_ids).delete_all
repo_associations_to_destroy = @model_class.repository_association_class.where(repository_id: @repository.id).where.
not(@model_class.unit_id_field => assocication_tracker.unit_ids)
clean_filter_rules(repo_associations_to_destroy) if repo_associations_to_destroy.present? && [::Katello::ModuleStream, ::Katello::Erratum, ::Katello::PackageGroup].include?(@model_class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also check affected_content_view_ids.present? here to avoid calling this if no CVs have this repo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a return statement below.

@ianballou
Copy link
Member Author

@sjha4 I added rule cleanup when removing repositories from content views. I also did a little refactoring with the content view filter rule queries to dry up the code.

@sjha4
Copy link
Member

sjha4 commented May 23, 2023

Code looks good..Will test the updates. 👍🏼

Copy link
Contributor

@parthaa parthaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APJ

Copy link
Member

@sjha4 sjha4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Able to reproduce the error of missing content unit and task fails with updated error message:
    Failed_to_publish
  2. After complete sync on failing repository, able to publish successfully
    Screenshot from 2023-05-23 16-03-52
  3. Filter rules by Id are removed if their repo association is removed during indexing
  4. Tested the rake task on missing module streams from master and then switching to this branch and running rake task. Able to delete stream filter rules with the task.
    Screenshot from 2023-05-22 15-17-31

Code looks good. Added a commit to fix tests etc which was APJ'ed by @parthaa
Ack on the PR. Will merge after tests are green!

@sjha4
Copy link
Member

sjha4 commented May 23, 2023

[test katello]

@ianballou
Copy link
Member Author

@sjha4 my latest push should've fixed the test.

Copy link
Member

@sjha4 sjha4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.. 👍🏼

@ianballou ianballou merged commit aed6299 into Katello:master May 31, 2023
4 of 5 checks passed
@ianballou ianballou deleted the 36334-orphaned-content-units branch May 31, 2023 14:10
wbclark pushed a commit to wbclark/katello that referenced this pull request Jun 6, 2023
…ve missing content units error on Pulp 3 copy (Katello#10542)

* Fixes #36334 - Delete orphaned content units after indexing and improve missing content units error on Pulp 3 copy

* Refs #36334 - remove filter rules when related units get disassociated

* Refs #36334 - add rake task to clean up orphaned filter rules

* Refs #36334 - clean filter rules when removing repositories

* Refs #36334 - Fix rubocop and tests

---------

Co-authored-by: Samir Jha <sjha4@ncsu.edu>
(cherry picked from commit aed6299)
wbclark pushed a commit that referenced this pull request Jun 6, 2023
…ve missing content units error on Pulp 3 copy (#10542)

* Fixes #36334 - Delete orphaned content units after indexing and improve missing content units error on Pulp 3 copy

* Refs #36334 - remove filter rules when related units get disassociated

* Refs #36334 - add rake task to clean up orphaned filter rules

* Refs #36334 - clean filter rules when removing repositories

* Refs #36334 - Fix rubocop and tests

---------

Co-authored-by: Samir Jha <sjha4@ncsu.edu>
(cherry picked from commit aed6299)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants