Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[M] Refresh Mapping Resolution (ENT-5384) #3616

Closed
wants to merge 3 commits into from

Conversation

Ceiu
Copy link
Contributor

@Ceiu Ceiu commented Oct 6, 2022

Added logic for detecting and fixing entity mapping issues during refresh

  • During refresh and manifest import, certain issues with org-entity
    mapping issues will be detected and, if present, will remap products
    and content to ensure correct entity usage

Reverted product/content link from activation key and environments

  • ActivationKey no longer references products directly, instead
    referencing products by Red Hat product ID
  • Environment no longer references content directly, instead
    referencing content by Red Hat content ID
  • Removed the ActivationKey and Environment entity remapping logic
    from org refresh and product/content update flows
  • Internal entity model API improvements, cleanup, and changes
  • Removed some long-unused code

Removed orphaned product grace period functionality

  • Removed the orphaned product grace period functionality entirely,
    as its purpose is superceded by removing the hard link between
    activation key and environment to product and content

@nikosmoum nikosmoum requested review from a team and bakajstep and removed request for a team October 6, 2022 16:41
@Ceiu
Copy link
Contributor Author

Ceiu commented Oct 6, 2022

This PR is a pretty heavy change, primarily focused on addressing an issue where a pool or product can reference a product or content that exists outside of the org. This update will resolve most cases of this class of issue and trigger a complete remapping of the org's products and/or content when if occurs.

This also had knock-on effects of hitting the activation key and environment content, as remapping becomes incredibly complex in those cases, and would eventually run into the same issue/complaint we received with deleting orphaned products: the extra churn on maintaining activation keys and environments is not worth the gain in a handful of bytes of data.

Naturally, this leaned into the orphaned product grace period code, which itself is painful and unnecessary, and has been removed in this PoC.

Additional work to be done:

  • explicit unit tests built for testing the remapping detection and code
  • another changeset to remove the now-useless orphaned date field on owner_products
  • fix failing spec tests

Copy link
Contributor

@joshmalbrecht joshmalbrecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the changes make sense! During the actual implementation, it would be interesting to get some performance metrics of the RefreshWorker against an SQL dump of non-prod.

@@ -264,6 +264,7 @@ public static final class Attributes {

@ManyToOne
@JoinColumn(name = "derived_product_uuid", nullable = true)
@Immutable // is this necessary?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it makes sense here. If someone wants to update the derivedProduct, they should be updating that entity directly and not by modifying the reference of the object, and then making an update and expecting the changes to cascade

Comment on lines +47 to +55
<!--
<changeSet id="20221003160605-5" author="crog">
<preConditions onFail="MARK_RAN">
<tableExists tableName="cp2_activation_key_products"/>
</preConditions>

<dropTable tableName="cp2_activation_key_products"/>
</changeSet>
-->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the actual implementation should we drop cp2_activation_key_products and cp2_environment_content? You may have had this commented to verify the data migration was successful and didn't uncomment it, but it seems like we should be good to drop the tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. In the final implementation, these will be uncommented and possibly moved to another changeset, perhaps in a later commit. We need some kind of delay to allow for a restoration of the original data in the event the migration doesn't go smoothly. Mostly in hosted, but depending on the size of the change, also Satellite.

These are mostly just markers for me to remember to go back and do that.

@@ -508,18 +578,43 @@ public RefreshResult execute(Owner owner) {
this.contentMapper.clearExistingEntities();

// Add in our existing entities
this.poolMapper.addExistingEntities(
this.poolCurator.listByOwnerAndTypes(owner.getId(), PoolType.NORMAL, PoolType.DEVELOPMENT));
List<Pool> pools = this.poolCurator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need obtain a system lock for the poolCurator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. At least, not at present. Pools are currently gated behind the owner locks elsewhere in refresh and the various pool manipulating operations. If pool scope changes, or we give our locking model a proper pass/refactor, that'd definitely be something to consider.

Comment on lines +137 to +145
// Ensure it's set to this environment. Really this should be immutable and created internally.
envcontent.setEnvironment(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include this to the setEnvironmentContent(Set<EnvironmentContent> environmentContent) method for the elements in that collection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we probably should. That whole method should probably be written to encapsulate the collection rather than just using whatever external collection it's given as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@nikosmoum nikosmoum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first change (entity mapping resolution) seems good, but I am unsure if the removal of hard links between environment/content will make the revert of the orphan grace period possible

Comment on lines +8891 to +8809
contentId:
type: string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that changing this API will affect katello: https://github.com/Katello/katello/blob/6ec0e826b64dda5bad1080112f83d35c586bfb03/app/lib/actions/candlepin/environment/set_content.rb#L53-L55 (there might be more places where this is used; this is just the most obvious reference I found).

Depending on how they use this, they need the uuid here instead of the rh id (or even both)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which makes me think: Is the removal of the orphan grace period even possible? If that is reverted, it means during a manifest refresh an orphaned content that's in an environment is removed, then Katello tries to do a GET /owners/owner_key/content/content_id and it's not there, so Satellite still has the same problem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, that's expected behavior, even with the customers hitting the issue. What the problem ended up being is that with the hard links, when the content/product was cleaned up, we'd remove the reference from the environment or activation key immediately -- which, again, was fine for the state -- but when they then renewed their subscriptions and the entity was restored, they would have to go back and manually re-add all the content and products to the environments or actkeys that got updated when the entity went missing.

So basically, the orphan grace period stuff was written specifically to solve this case:

  • customer has subscription that expires
  • customer does import that contains expired subscription
  • import/refresh op removes now-unused products/content
  • import/refresh op updates activation keys and environments to no longer reference removed entities
  • customer renews subscription
  • customer imports manifest containing renewed subscription
  • environment/keys aren't restored to previous state as expected

Normally this would be a "yeah, don't let your subscriptions expire if you don't want a bunch of extra work" -- but due to some design choices from earlier versions (CP0-CP2 had weak refs; CP3 didn't remove unused entities) the original behavior would be for the keys and environments to be restored to the initial state; even if it makes zero sense in terms of Candlepin entity mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of whether or not this protects us from the need for orphan removal, yes. Basically, we revert to the CP1 days and just look up all the products as the environment or activation key is processed, and ignore entities which no longer exist within the org. The key management APIs still permit the usual CRUD operations on them, so they're not permanently stuck on there, but the entities can blip and and out of existence without requiring explicit management of the container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm... investigating the origins of the environment/key cleanup code, it goes back a very long time -- roughly 10 years [1].

So apparently the way refresh worked in the past was what was saving them -- we would unlink products and content from pools as they were cleaned up and deleted, but the products or content themselves would stick around indefinitely. But if the content was ever manually deleted, it would be removed from the environment all the same.

I really hate the orphan grace period code and flow, but it seems this change would also introduce weird behavior relative to past versions. Going to think on this a bit before fully committing to that cleanup removal; maybe there's a better option. The UUID -> ID change is probably still a net good one, but we'd still need to do some cleanup on the deletion of products and content regardless.

[1] 678f1f0

…resh

- During refresh and manifest import, certain issues with org-entity
  mapping issues will be detected and, if present, will remap products
  and content to ensure correct entity usage
- ActivationKey no longer references products directly, instead
  referencing products by Red Hat product ID
- Environment no longer references content directly, instead
  referencing content by Red Hat content ID
- Removed the ActivationKey and Environment entity remapping logic
  from org refresh and product/content update flows
- Internal entity model API improvements, cleanup, and changes
- Removed some long-unused code
- Removed the orphaned product grace period functionality entirely,
  as its purpose is superceded by removing the hard link between
  activation key and environment to product and content
@Ceiu Ceiu force-pushed the crog/poc/refresh_mapper_refactor branch from 1505a0c to c99cb43 Compare November 1, 2022 17:15
@Ceiu
Copy link
Contributor Author

Ceiu commented Nov 3, 2022

Closing this as the production version of it will be handled in other PRs.

@Ceiu Ceiu closed this Nov 3, 2022
@Ceiu Ceiu deleted the crog/poc/refresh_mapper_refactor branch February 14, 2023 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants