chore: update grpc runtime library #8929

Merged
merged 14 commits into from
Apr 18, 2022

@leoluz (Collaborator) commented Mar 29, 2022

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

TL;DR

This PR updates the gRPC runtime library used in ArgoCD to v1.45.0, the latest version at the time of writing.

Analysis

In ArgoCD we have a mix of proto3 and proto2 used for gRPC messages:

  • proto2 is used by the Application gRPC APIs and uses gogo-protobuf annotations.
  • proto3 is used for all other operations provided by the API server.

Given this mix, updating the gRPC runtime library works for most UI/CLI operations but fails for the Application ones with the following error:

09:13:53  api-server | panic: protobuf tag not enough fields in Empty.state: 
09:13:53  api-server | goroutine 403 [running]:
09:13:53  api-server | github.com/gogo/protobuf/proto.(*unmarshalInfo).computeUnmarshalInfo(0xc000fa2500)
09:13:53  api-server | 	/Users/lalmeida1/dev/go/src/github.com/argoproj/argo-cd/vendor/github.com/gogo/protobuf/proto/table_unmarshal.go:341 +0x138a

Details on the error can be found here. Since the panic comes from the gogo marshaling code, another option would be to remove the gogo plugin from the project and use the standard implementation provided by google.protobuf. Unfortunately, this is not yet possible because of incompatibility issues with the Kubernetes proto files, described here.

Solution

Given these constraints, the only way to update the gRPC runtime library in ArgoCD is to remove all gogo annotations from application.proto, so that we no longer rely on the generated marshaller code and avoid the incompatibility. However, we still need the gogo plugin to generate the Go stubs in order to stay compatible with the Kubernetes protos.

This PR is an attempt to implement the solution above.
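
For readers less familiar with proto2 in Go: optional scalar fields in generated proto2 structs are pointers (the `Refresh *string` field in the generated-code diff quoted later on this page is one instance), so callers need nil-safe access when reading them and a helper when setting them. The following is only a minimal sketch of that pattern. The struct is a hand-written stand-in, not the real generated Argo CD type, and the `String` helper is hypothetical.

```go
package main

import "fmt"

// ApplicationQuery is a hand-written stand-in for a proto2-generated message.
// Optional scalars are pointers so "unset" and "set to empty" can be told apart.
type ApplicationQuery struct {
	Name     *string
	Refresh  *string
	Projects []string
}

// GetRefresh follows the nil-safe getter style used by generated code:
// it returns the zero value when the receiver or the field is nil.
func (q *ApplicationQuery) GetRefresh() string {
	if q != nil && q.Refresh != nil {
		return *q.Refresh
	}
	return ""
}

// String returns a pointer to s (hypothetical helper for setting optional fields).
func String(s string) *string { return &s }

func main() {
	unset := &ApplicationQuery{}
	set := &ApplicationQuery{Refresh: String("hard")}

	fmt.Println(unset.Refresh == nil, unset.GetRefresh() == "") // true true
	fmt.Println(set.Refresh != nil, set.GetRefresh())           // true hard
}
```

Code that previously read such fields directly has to go through the getter (or a nil check) once the field becomes a pointer.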

Relates to #4972

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).

leoluz and others added 4 commits March 29, 2022 16:56
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
@leoluz leoluz marked this pull request as ready for review April 1, 2022 19:32
@leoluz leoluz changed the title chore: update grpc POC chore: update grpc runtime library Apr 1, 2022
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
codecov bot commented Apr 1, 2022

Codecov Report

Merging #8929 (df0e4b5) into master (0e92705) will increase coverage by 1.48%.
The diff coverage is 13.90%.

@@            Coverage Diff             @@
##           master    #8929      +/-   ##
==========================================
+ Coverage   43.40%   44.88%   +1.48%     
==========================================
  Files         186      212      +26     
  Lines       23373    25310    +1937     
==========================================
+ Hits        10145    11361    +1216     
- Misses      11779    12341     +562     
- Partials     1449     1608     +159     
| Impacted Files | Coverage Δ |
|---|---|
| cmd/argocd/commands/app.go | 9.15% <0.00%> (-0.03%) ⬇️ |
| cmd/argocd/commands/app_actions.go | 0.00% <0.00%> (ø) |
| util/grpc/grpc.go | 0.00% <0.00%> (ø) |
| server/application/application.go | 30.13% <21.50%> (-1.24%) ⬇️ |
| server/server.go | 55.57% <100.00%> (+0.16%) ⬆️ |
| pkg/apis/application/v1alpha1/hack.go | 0.00% <0.00%> (ø) |
| ...licationset/generators/generator_spec_processor.go | 57.57% <0.00%> (ø) |
| applicationset/utils/utils.go | 77.10% <0.00%> (ø) |
| ... and 27 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>
@alexmt (Collaborator) left a comment

LGTM.

Code changes look good. I tested the CLI locally as well and could not find any bugs. Let's merge it!

@leoluz leoluz merged commit bcc69bd into argoproj:master Apr 18, 2022
wojtekidd pushed a commit to wojtekidd/argo-cd that referenced this pull request Apr 25, 2022
* chore: update grpc POC

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* fix: register gogo codec (#1)

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>

* Remove gogo annotations from application.proto

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix unit tests

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix e2e test

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix lint

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix lint

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix unit-test

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix LogEntry.Last required field not populated

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix LogEntry required fields

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix get log content

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix app actions list

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix ApplicationPodLogsQuery

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

* Fix RunResourceAction

Signed-off-by: Leonardo Luz Almeida <leonardo_almeida@intuit.com>

Co-authored-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
Signed-off-by: wojtekidd <wojtek.cichon@protonmail.com>
szpnygo added a commit to szpnygo/argo-cd that referenced this pull request May 11, 2022
alexmt pushed a commit that referenced this pull request May 30, 2022
In #8929, the project parameter was changed to projects.

https://github.com/argoproj/argo-cd/blob/5f5d7aa59b4c818192b178f260eed8c0ac0b3669/server/application/application.proto#L23
Signed-off-by: neosu <neo@neobaran.com>
alexmt pushed a commit that referenced this pull request May 31, 2022
In #8929, the project parameter was changed to projects.

https://github.com/argoproj/argo-cd/blob/5f5d7aa59b4c818192b178f260eed8c0ac0b3669/server/application/application.proto#L23
Signed-off-by: neosu <neo@neobaran.com>
@@ -44,13 +43,13 @@ type ApplicationQuery struct {
 	// forces application reconciliation if set to true
 	Refresh *string `protobuf:"bytes,2,opt,name=refresh" json:"refresh,omitempty"`
 	// the project names to restrict returned list applications
-	Projects []string `protobuf:"bytes,3,rep,name=project" json:"project,omitempty"`
+	Projects []string `protobuf:"bytes,3,rep,name=projects" json:"projects,omitempty"`
Contributor

FYI, this change led to all of our apps being deleted, since we filtered with project=XX. The project filter was no longer applied, so all of our apps were listed and then deleted. This was not in the changelog.
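
The failure mode described here can be reproduced in miniature with the standard library alone. The sketch below is an illustration under assumptions, not the Argo CD or grpc-gateway implementation: `App` and `listApps` are hypothetical, and the only behavior it models is that a query parameter the server does not recognize is ignored, leaving the filter empty.

```go
package main

import (
	"fmt"
	"net/url"
)

// App is a hypothetical, minimal application record.
type App struct {
	Name    string
	Project string
}

// listApps filters apps by the values of the query parameter named param.
// When that parameter is absent, no filter applies and every app is returned.
func listApps(apps []App, rawQuery, param string) ([]App, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	wanted := values[param]
	if len(wanted) == 0 {
		return apps, nil // no filter supplied: match everything
	}
	var out []App
	for _, a := range apps {
		for _, p := range wanted {
			if a.Project == p {
				out = append(out, a)
				break
			}
		}
	}
	return out, nil
}

func main() {
	apps := []App{{"a", "team-x"}, {"b", "team-y"}, {"c", "team-x"}}

	// The client keeps sending project=team-x in both cases.
	before, _ := listApps(apps, "project=team-x", "project")  // server reads "project"
	after, _ := listApps(apps, "project=team-x", "projects") // server now reads "projects"

	fmt.Println(len(before), len(after)) // 2 3
}
```

The request succeeds in both cases; only the size of the result changes, which is why a client that deletes whatever the list call returns gets no warning.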

Member

@victorboissiere would you mind opening an issue requesting that the upgrade guide be changed to include a note about this?

We probably also need to establish a stricter policy about API backwards-compatibility.

Member

> We probably also need to establish a stricter policy about API backwards-compatibility.

I think we already promise backwards compatibility for the API. And I think this one just slipped through the cracks.

Collaborator Author

This file is auto-generated. The breaking change originated here. It was probably caused by a misunderstanding of how gogoproto.customname works. Unfortunately, this wasn't caught by our automated tests on that occasion. I agree with @jannfis that a stricter policy isn't the answer to this incident. Our API is exposed at /api/v1 and that shouldn't break. What we could do to avoid this type of problem in the future is improve our automated tests. Some form of contract testing would probably be beneficial.
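
One possible shape for such a contract check, sketched with the standard library only: record the wire names of a message in a golden map and fail when the struct tags drift from it. Everything here is an assumption made for illustration. The struct is a stand-in whose tags are copied from the pre-change line of the diff above, and `wireNames` and `checkContract` are hypothetical helpers, not part of Argo CD or of the proposal referenced in this thread.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// ApplicationQuery is a stand-in; its tags copy the pre-change line of the diff.
type ApplicationQuery struct {
	Refresh  *string  `protobuf:"bytes,2,opt,name=refresh" json:"refresh,omitempty"`
	Projects []string `protobuf:"bytes,3,rep,name=project" json:"project,omitempty"`
}

// wireNames maps each Go field name to the JSON name declared in its struct tag.
func wireNames(v interface{}) map[string]string {
	t := reflect.TypeOf(v)
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	names := map[string]string{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag := f.Tag.Get("json")
		if tag == "" {
			continue
		}
		names[f.Name] = strings.Split(tag, ",")[0]
	}
	return names
}

// checkContract lists every golden field whose current wire name differs.
func checkContract(golden, actual map[string]string) []string {
	var diffs []string
	for field, want := range golden {
		if got := actual[field]; got != want {
			diffs = append(diffs, fmt.Sprintf("%s: want %q, got %q", field, want, got))
		}
	}
	sort.Strings(diffs) // map iteration order is random; keep output stable
	return diffs
}

func main() {
	// The golden map is the published contract; it is edited only on purpose.
	golden := map[string]string{"Refresh": "refresh", "Projects": "project"}
	for _, d := range checkContract(golden, wireNames(ApplicationQuery{})) {
		fmt.Println("contract drift:", d)
	}
}
```

With the stand-in above the check prints nothing. Renaming the `json` tag of `Projects` to `projects` would produce one drift line, which is the kind of signal that was missing when the generated file changed.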

Collaborator Author

@victorboissiere I am sorry about the incident that this change caused. To prevent this type of issue from happening again, I created the following proposal: #12589

Member

@victorboissiere what was your client? I'm investigating whether this is a problem with the CLI.

Contributor

Thanks for the issue! It was not the CLI. We've built an internal tool that periodically cleans up the applications of some projects. The filter stopped working and we lost ~8000 pods.
Thanks for the effort to avoid this in the future 🙏

Member

Sweet, thanks! PR to fix the issue: #12594

I had written up some warnings about the CLI, but after investigation, I don't think the CLI is impacted. Only the JSON API.

5 participants