Add command 'antctl upgrade api-storage' in antctl #5198

Merged
merged 1 commit into from Jul 28, 2023

Contributor

@hongliangl hongliangl commented Jul 4, 2023

Add command 'antctl upgrade api-storage' in antctl to upgrade
existing objects of Antrea CRDs to the latest API version.

For #4832

TODO:

  • unit tests
  • doc
  • misc

@hongliangl hongliangl marked this pull request as ready for review July 12, 2023 05:02
@hongliangl hongliangl changed the title [WIP] Add command 'antctl upgrade api-storage' in antctl Add command 'antctl upgrade api-storage' in antctl Jul 12, 2023
}

// patchCRDStoredVersions is used to patch field status.storedVersion of a CRD.
func patchCRDStoredVersions(k8sClient client.Client, crd *apiextv1.CustomResourceDefinition) error {
Contributor

I suppose users have to set storage to true for one and only one version when the CRD has multiple versions, so why do we need to patch status.storedVersions of the CRD?

Contributor Author

We may get a CRD with more than one value in status.storedVersions in this case:

  • create v1alpha1 of the CRD, with storage set to true in v1alpha1,
  • create a CR of version v1alpha1,
  • add v1beta1 of the CRD, with storage set to true in v1beta1 and set to false in v1alpha1,
  • create a CR of version v1beta1.

We will then get a CRD whose status.storedVersions has two values.

Users may have objects of the CRD stored in multiple versions. After upgrading all objects, status.storedVersions should only contain the latest version.
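
The sequence above can be modeled with a small, self-contained Go sketch. It is a simplified stand-in, not the apiserver's code: the `crd` struct and the `setStorageVersion` helper are hypothetical names, and the only behavior assumed is that every version that has been marked as the storage version is recorded in status.storedVersions and never removed automatically.

```go
package main

import "fmt"

// crd is a hypothetical, stripped-down stand-in for a CustomResourceDefinition.
type crd struct {
	storageVersion string   // the version whose storage field is true
	storedVersions []string // models status.storedVersions
}

// setStorageVersion marks v as the storage version and records it in
// storedVersions if it is not there yet. Existing entries are never removed.
func (c *crd) setStorageVersion(v string) {
	c.storageVersion = v
	for _, s := range c.storedVersions {
		if s == v {
			return
		}
	}
	c.storedVersions = append(c.storedVersions, v)
}

func main() {
	c := &crd{}
	c.setStorageVersion("v1alpha1") // v1alpha1 is created as the storage version
	c.setStorageVersion("v1beta1")  // v1beta1 is added and becomes the storage version
	fmt.Println(c.storedVersions)   // both versions are now listed
}
```

In this model the list only grows, which is why a separate step is needed to shrink it once all objects have been re-stored.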

Contributor

Is status.storedVersions managed and updated by K8s automatically? Have you tried updating all CRs to the latest version and then checking the field after the upgrade?
cc @tnqn, who may have more ideas.

Member

As I remember, K8s doesn't remove versions from it automatically; this is also what the doc says.
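
Since nothing removes old entries automatically, the tool has to rewrite the list itself once every object has been stored again at the storage version. A minimal sketch of that last step follows. `pruneStoredVersions` is a hypothetical helper, and refusing to prune when the storage version was never recorded is an assumption of this sketch; real code would then patch the CRD's status through the API.

```go
package main

import "fmt"

// pruneStoredVersions returns the value status.storedVersions should have
// after every object has been rewritten at storageVersion. As a safety
// assumption of this sketch, it returns an error when the storage version
// is not already present in the recorded list.
func pruneStoredVersions(stored []string, storageVersion string) ([]string, error) {
	for _, v := range stored {
		if v == storageVersion {
			return []string{storageVersion}, nil
		}
	}
	return nil, fmt.Errorf("storage version %q not found in storedVersions %v", storageVersion, stored)
}

func main() {
	pruned, err := pruneStoredVersions([]string{"v1alpha1", "v1beta1"}, "v1beta1")
	fmt.Println(pruned, err)
}
```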

docs/antctl.md Outdated
antctl upgrade api-storage
```

This command is used to upgrade persisted resources for Antrea CRD `antreaagentinfos.crd.antrea.io` to the latest API
Contributor

ditto

Comment on lines 100 to 101
Short: "Execute a user-provided upgrade",
Long: "Execute a user-provided upgrade",
Contributor

This is not easy to understand, maybe:

Suggested change
- Short: "Execute a user-provided upgrade",
- Long: "Execute a user-provided upgrade",
+ Short: "Upgrade version of Kubernetes resources",
+ Long: "Upgrade version of Kubernetes resources",

@antoninbas @tnqn any suggestions?

Contributor

How about:
"Upgrade Antrea CRD resources"
"Upgrade Antrea CRD resources to the latest API version"

Contributor Author

My thinking was that upgrade will be a group of commands. We could upgrade not only CRD resources but maybe also something else used by Antrea. Do we need to limit the upgrade command to K8s resources or CRDs right now?

@luolanzone luolanzone linked an issue Jul 20, 2023 that may be closed by this pull request
@luolanzone luolanzone added the action/release-note Indicates a PR that should be included in release notes. label Jul 20, 2023
@hongliangl hongliangl force-pushed the 20230628-api-upgrade-tool branch 4 times, most recently from abd6ca8 to 16fbaf1 Compare July 20, 2023 08:22
@hongliangl hongliangl force-pushed the 20230628-api-upgrade-tool branch 2 times, most recently from a7cd65a to aa88234 Compare July 20, 2023 10:26
pkg/antctl/raw/upgrade/apistorage/command.go Outdated
Duration: time.Second,
Steps: 5,
}, func(err error) bool {
return validateErr(err) != nil
Member

validateErr also checks if err is nil, which can never happen here.
It's wrong to retry on NotFound error.
It should just check apierrors.IsConflict(err)

Contributor Author

We should not retry on NotFound, but we should retry on other errors, not only on IsConflict. Is that right?
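
The two positions can be compared with a small stand-alone sketch. It uses only the standard library: `errConflict` and `errNotFound` are hypothetical stand-ins for the errors that apierrors.IsConflict and apierrors.IsNotFound would recognize, and `retryOnError` is a simplified, sleep-free imitation of a retry helper such as retry.OnError from client-go, not its implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for API errors; real code would inspect the
// error returned by the Kubernetes client instead.
var (
	errConflict = errors.New("conflict")
	errNotFound = errors.New("not found")
)

// retryOnError calls fn up to steps times. It stops as soon as fn succeeds
// or returns an error for which retriable reports false.
func retryOnError(steps int, retriable func(error) bool, fn func() error) error {
	var err error
	for i := 0; i < steps; i++ {
		if err = fn(); err == nil || !retriable(err) {
			return err
		}
	}
	return err
}

// onlyConflict retries on conflicts and on nothing else.
func onlyConflict(err error) bool { return errors.Is(err, errConflict) }

// allButNotFound retries on every error except NotFound.
func allButNotFound(err error) bool { return !errors.Is(err, errNotFound) }

func main() {
	attempts := 0
	err := retryOnError(5, onlyConflict, func() error {
		attempts++
		if attempts < 3 {
			return errConflict // the first two attempts hit a conflict
		}
		return nil
	})
	fmt.Println(attempts, err)
}
```

Both predicates give up immediately on NotFound; they differ only in how they treat errors that are neither Conflict nor NotFound.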

}, func(err error) bool {
return validateErr(err) != nil
}, func() error {
return k8sClient.Update(context.TODO(), &copyObj)
Member

Shouldn't it get the object again on each retry? Otherwise the update can never succeed, since it carries a stale resource version. Even if it succeeded by removing the resource version from the request, it would override updates made after you listed the objects.

Contributor Author

Updated: every time we update an object, we first get it by the namespace+name taken from the list.
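
The get-before-update pattern described here can be sketched against an in-memory model. Everything below is hypothetical: `object` and `store` only imitate the optimistic concurrency check that the apiserver performs on resourceVersion, and `upgradeObject` is an illustration of the idea rather than the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errConflict = errors.New("conflict: stale resourceVersion")
	errNotFound = errors.New("not found")
)

// object and store are in-memory stand-ins for a custom resource and the
// apiserver; only optimistic concurrency is modeled.
type object struct {
	name            string
	resourceVersion int
}

type store struct{ objects map[string]object }

func (s *store) get(name string) (object, error) {
	obj, ok := s.objects[name]
	if !ok {
		return object{}, errNotFound
	}
	return obj, nil
}

// update rejects writes whose resourceVersion is not the current one.
func (s *store) update(obj object) error {
	cur, ok := s.objects[obj.name]
	if !ok {
		return errNotFound
	}
	if cur.resourceVersion != obj.resourceVersion {
		return errConflict
	}
	obj.resourceVersion++
	s.objects[obj.name] = obj
	return nil
}

// upgradeObject re-reads the object before every attempt, so a retry after a
// conflict carries the latest resourceVersion instead of the listed one.
func upgradeObject(s *store, name string, steps int) error {
	var err error
	for i := 0; i < steps; i++ {
		var fresh object
		if fresh, err = s.get(name); err != nil {
			return err // for example, the object was deleted: do not retry
		}
		if err = s.update(fresh); !errors.Is(err, errConflict) {
			return err // success (nil) or a non-retriable error
		}
	}
	return err
}

func main() {
	s := &store{objects: map[string]object{"a": {name: "a", resourceVersion: 7}}}
	stale, _ := s.get("a") // snapshot taken at list time
	_ = s.update(stale)    // another writer bumps the version
	fmt.Println(s.update(stale))          // the stale snapshot is rejected
	fmt.Println(upgradeObject(s, "a", 5)) // re-reading first succeeds
}
```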

Comment on lines 208 to 212
expectedStorageVersion := getCRDStorageVersion(crd)
// This ensures that the storage version is not changed during the upgrade.
if getCRDStorageVersion(freshCRD) != expectedStorageVersion {
return newUnexpectedChangeError(crd)
}
Member

This is perhaps unnecessary because:

  1. Even if you check it, it's still possible that the CRD is updated after you get the "fresh" version.
  2. We should leverage the resourceVersion to let apiserver fail the update request when the CRD is updated after we get it.

Contributor Author

  1. Even if you check it, it's still possible that the CRD is updated after you get the "fresh" version.

Got it, I have removed the related code.

  2. We should leverage the resourceVersion to let apiserver fail the update request when the CRD is updated after we get it.

Could you give more details about this?

Member

The code does not seem to have been removed. What I mean is, we could assume the CRD is not updated during the upgrade. It could just update the StoredVersions to the storage version in the CRD's spec. If the CRD is updated during the upgrade, the update will fail because the resourceVersion doesn't match. ResourceVersion is used for this situation, preventing one client from overriding another's write accidentally.

If the update request fails with a Conflict error, it means the CRD might have been updated in some way by users. We could just return an error and abort the process.

Contributor

@jianjuns jianjuns left a comment

The documentation part looks good to me.

Contributor

@luolanzone luolanzone left a comment

LGTM overall, one comment about logs.

pkg/antctl/raw/upgrade/apistorage/command.go Outdated
failed++
}
}
fmt.Fprintf(writer, "Successfully upgraded %d objects of CRD %q.\n", len(objList.Items)-failed, crd.Name)
Member

My comment was not really about logging. The command should NOT succeed when anything goes wrong. Think about when the tool is integrated into a cluster management system: how can its caller, a program, know whether it succeeded or failed? The exit code must indicate that.

I think it should just return an error and abort the process when encountering the first error.
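
The fail-fast behavior requested here can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `upgradeAll`, `run`, and `errSimulated` are hypothetical names, and the upgrade step is simulated by a function that fails only for an object named "broken".

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errSimulated is a hypothetical failure used only by this sketch.
var errSimulated = errors.New("simulated failure")

// upgradeAll applies upgrade to each name in order and stops at the first
// failure, returning an error that identifies the failing object.
func upgradeAll(names []string, upgrade func(string) error) error {
	for _, name := range names {
		if err := upgrade(name); err != nil {
			return fmt.Errorf("failed to upgrade object %q: %w", name, err)
		}
	}
	return nil
}

// run returns the process exit code: 0 on success, 1 on the first failure.
func run(names []string, upgrade func(string) error) int {
	if err := upgradeAll(names, upgrade); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return 1
	}
	return 0
}

func main() {
	// Simulated upgrade step: only the object named "broken" fails.
	upgrade := func(name string) error {
		if name == "broken" {
			return errSimulated
		}
		return nil
	}
	// Object names come from the command line in this sketch. Running it
	// with the arguments "a broken c" exits with status 1 before reaching "c".
	os.Exit(run(os.Args[1:], upgrade))
}
```

Returning the exit code from `run` keeps `os.Exit` in one place, so a calling program only needs to check the process status.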

Contributor Author

Got that, I have updated the PR as you suggested.

@hongliangl hongliangl force-pushed the 20230628-api-upgrade-tool branch 2 times, most recently from 19a029c to 4beeeba Compare July 27, 2023 01:51
@hongliangl hongliangl requested a review from tnqn July 27, 2023 01:54
docs/antctl.md Outdated
### Upgrade existing objects of CRDs

antctl supports upgrading existing objects of Antrea CRDs to the latest version.
The related sub-commands should be run out-of-cluster.
Member

We should mention the required permissions. They include get, list, and update for each CRD and its objects.
Also make sure a proper error is printed when some permissions are missing.

Contributor Author

Will add.

Contributor Author

Like the multicluster sub-commands, antctl upgrade can currently only be used outside the cluster, with a kubeconfig file to access the K8s apiserver. If so, do we still need to mention the required permissions?

Member

@tnqn tnqn Jul 27, 2023

Running outside the cluster doesn't mean the kubeconfig file has admin privileges. Normally we should add the required permissions of this command to ClusterRole antctl, as the existence of the ClusterRole implies it contains all permissions its commands require. However, the required permissions of upgrade api-storage could make antctl a super user, while its privileges are currently quite limited: the only create permission it gets is for creating supportbundles.

So a conservative approach is to document clearly what permissions are required for the upgrade command, so that users know what they need to do when they encounter the issue.

I don't know how antctl mc guides users to get the required permissions. But for other commands, we have added all permissions to ClusterRole antctl. For upgrade api-storage, we need to at least document it.

cc @jianjuns @antoninbas

Contributor Author

Got that, I'll add the required permissions to the doc and print errors when permissions are lacking.

Contributor

I think no documentation or permissions were added to the ClusterRole when we added antctl mc. I will create an issue to track this first.

Contributor Author

Done

pkg/antctl/raw/upgrade/apistorage/command.go Outdated
Kind: crd.Spec.Names.Kind,
})
if getErr = k8sClient.Get(context.TODO(), client.ObjectKeyFromObject(&obj), objToUpdate); getErr != nil {
return getErr
Contributor

I think we can ignore the not found error here if the object is deleted during retry?

Contributor Author

If we get a not found error, it will not retry and will return the error directly.

existing objects of Antrea CRDs to the storage version.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
Member

@tnqn tnqn left a comment

LGTM

Member

tnqn commented Jul 28, 2023

/test-all

@tnqn tnqn merged commit ddf4d44 into antrea-io:main Jul 28, 2023
46 of 55 checks passed
@hongliangl hongliangl deleted the 20230628-api-upgrade-tool branch July 31, 2023 02:10
Labels
action/release-note Indicates a PR that should be included in release notes.
Development

Successfully merging this pull request may close these issues.

[Antrea 2.0]CLI for API Deprecation/Removal
5 participants