Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hubble: bump proto deps for cmd/protoc-gen-go-grpc/v1.1.0 #13405

Closed
wants to merge 5 commits into from

Conversation

rolinh
Copy link
Member

@rolinh rolinh commented Oct 5, 2020

No description provided.

@rolinh rolinh added release-note/misc This PR makes changes that have no direct user impact. sig/hubble Impacts hubble server or relay labels Oct 5, 2020
@rolinh rolinh requested review from a team as code owners October 5, 2020 13:03
@rolinh rolinh requested a review from a team October 5, 2020 13:03
@rolinh rolinh marked this pull request as draft October 5, 2020 15:53
@rolinh
Copy link
Member Author

rolinh commented Oct 5, 2020

Converted to draft as I have to understand and fix this build failure:

CGO_ENABLED=0 go build  -mod=vendor -ldflags ' -X "github.com/cilium/cilium/pkg/version.ciliumVersion=1.8.90 785a5d6 2020-10-05T13:03:51+00:00" -s -w -X "github.com/cilium/cilium/pkg/envoy.RequiredEnvoyVersionSHA=1177896bebde79915fe5f9092409bf0254084b4e" -X "github.com/cilium/cilium/pkg/datapath/loader.DatapathSHA=6596ac1f04af6806f50131aef2402db4a497151b" ' -tags=osusergo  -o cilium-agent
# github.com/cilium/cilium/api/v1/observer
../api/v1/observer/observer_grpc.pb.go:14:11: undefined: grpc.SupportPackageIsVersion7
# github.com/cilium/cilium/api/v1/peer
../api/v1/peer/peer_grpc.pb.go:14:11: undefined: grpc.SupportPackageIsVersion7
make[1]: Leaving directory '/go/src/github.com/cilium/cilium/daemon'
make[1]: *** [Makefile:15: cilium-agent] Error 2
make: *** [Makefile:129: build-container] Error 2
The command '/bin/sh -c make RACE=$RACE NOSTRIP=$NOSTRIP LOCKDEBUG=$LOCKDEBUG PKG_BUILD=1 V=$V LIBNETWORK_PLUGIN=$LIBNETWORK_PLUGIN     SKIP_DOCS=true DESTDIR=/tmp/install build-container install-container' returned a non-zero code: 2
Makefile.docker:26: recipe for target 'docker-cilium-image' failed
make: *** [docker-cilium-image] Error 2
Error: Process completed with exit code 2.

@@ -14,11 +14,12 @@ go mod init github.com/cilium/hubble/protoc

# latest tag at the time. For some reason `go get foo/bar/baz@vX.Y.Z` doesn't
# work with nested go.mod definitions.
go get google.golang.org/grpc/cmd/protoc-gen-go-grpc@cee815d
# b2c5f4a == tag: cmd/protoc-gen-go-grpc/v1.0.0
go get google.golang.org/grpc/cmd/protoc-gen-go-grpc@b2c5f4a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not using @1.0.0 here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The label is literally cmd/protoc-gen-go-grpc/v1.0.0, not v1.0.0 (which would point to an older release of grpc-go, the most recent one being 1.32.0). And this format does not work with go get:

$ go get google.golang.org/grpc/cmd/protoc-gen-go-grpc@cmd/protoc-gen-go-grpc/v1.0.0: google.golang.org/grpc/cmd/protoc-gen-go-grpc@cmd/protoc-gen-go-grpc/v1.0.0: invalid version: version "cmd/protoc-gen-go-grpc/v1.0.0" invalid: disallowed version string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The label is literally cmd/protoc-gen-go-grpc/v1.0.0

D:

go build google.golang.org/grpc/cmd/protoc-gen-go-grpc

# protoc-gen-go-json doesn't have releases, this is the latest commit at the time
go get github.com/mitchellh/protoc-gen-go-json@8fbb6f3
go get github.com/mitchellh/protoc-gen-go-json@364b693
go build github.com/mitchellh/protoc-gen-go-json

go get github.com/golang/protobuf@v1.4.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may need to double check this version is compatible with 3.13.0

@rolinh rolinh force-pushed the pr/rolinh/hubble-proto-gen-go-grpc-v1.0.0 branch from 7340429 to cc24518 Compare October 6, 2020 13:00
@@ -6,6 +6,7 @@

PROTOC ?= protoc

EXTERNAL_PROTO_SOURCE := /usr/local/include
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this path should be auto-inlcuded, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to be the case when running as a regular user.

@rolinh rolinh force-pushed the pr/rolinh/hubble-proto-gen-go-grpc-v1.0.0 branch from cc24518 to 18d5266 Compare October 7, 2020 07:47
@rolinh rolinh added the dont-merge/waiting-for-upstream Only merge once upstream library or kernel features have landed label Oct 7, 2020
@rolinh rolinh marked this pull request as ready for review October 7, 2020 07:52
@rolinh
Copy link
Member Author

rolinh commented Oct 7, 2020

# github.com/cilium/cilium/api/v1/peer
../api/v1/peer/peer_grpc.pb.go:14:11: undefined: grpc.SupportPackageIsVersion7
# github.com/cilium/cilium/api/v1/observer
../api/v1/observer/observer_grpc.pb.go:14:11: undefined: grpc.SupportPackageIsVersion7

This compilation error actually relates to the problem reported in #12767:

Note: I wanted to bump grpc-go to v1.31.0 but I can't due to a breaking change introduced in v1.30.0 ("Remove the deprecated naming package.") that in turn breaks go.etcd.io/etcd (see etcd-io/etcd#12124 for details).

There's a PR opened for etcd to upgrade grpc-go (etcd-io/etcd#12155) but it's stalled due to other breaking changes introduced in a minor version of grpc-go which breaks the API.

As of today, etcd-io/etcd#12124 is still open, which in turn means we can't upgrade google.golang.org/grpc to a version greater than v1.29.1. This is a problem because we need v1.32.0 or greater for the grpc.SupportPackageIsVersion7 constant to be defined (and thus avoid the above compilation error).

rolinh added a commit to cilium/hubble that referenced this pull request Oct 8, 2020
With a dial timeout, the error on failed dialing attempt always ends up
being `context deadline exceeded` which helps very little in understand
what the root cause might be.

With `grpc.FailOnNonTempDialError(true)`, we get more descriptive errors
"in some cases" (ie errors that are estimated to be non-temporary).
Example:

    connection error: desc = "transport: error while dialing: dial tcp [::1]:4245: connect: connection refused"

However, I tested to dial using TLS on a non-TLS target and in this
case, the option above does not help.

As of grpc-go v1.30.0, there is another option that is available:
`WithReturnConnectionError()`. With this option, we'd finally get
informative errors such as:

    connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for goodname.example.com, not badname.example.com"

However, we are currently stuck with grpc-go v1.29.1 due to breaking
changes introduced by this version and that prevent Cilium (and Hubble
CLI) to use a more recent version of grpc-go. For more details about
this issue, see[0].

[0]: cilium/cilium#13405 (comment)

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
rolinh added a commit that referenced this pull request Oct 9, 2020
To connect to peers via gRPC, Hubble Relay uses `grpc.DialContext()`
with a dial timeout. When a connection attempt to a peer fails, the
following error message is returned: `context deadline exceeded`.
This provides no value when trying to debug the issue. Is the peer
unreachable? Is this a TLS configuration issue? etc.

By using the dial option `grpc.FailOnNonTempDialError(true)`, we get
better error messages on non-temporary dial errors, such as:

    connection error: desc = "transport: error while dialing: dial tcp 172.18.0.4:4244: connect: connection refused"

Using the option `grpc.WithReturnConnectionError()`, we would get a
message that contains both the last connection error that occurred and
the `context.DeadlineExceeded` error. This greatly improves the
situation for errors that are deemed transient. Example:

    context deadline exceeded: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for *.hubble-relay.cilium.io, not localhost"

However, this is only available starting with grpc-go v1.30.0 and due to
the issue described here[0], we are currently stuck with v1.29.1. Thus,
this commit adds the option but comments it with a TODO note so that it
can be caught up and uncommented when upgrading grpc-go becomes
possible.

[0]: #13405 (comment)

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
gandro pushed a commit that referenced this pull request Oct 12, 2020
To connect to peers via gRPC, Hubble Relay uses `grpc.DialContext()`
with a dial timeout. When a connection attempt to a peer fails, the
following error message is returned: `context deadline exceeded`.
This provides no value when trying to debug the issue. Is the peer
unreachable? Is this a TLS configuration issue? etc.

By using the dial option `grpc.FailOnNonTempDialError(true)`, we get
better error messages on non-temporary dial errors, such as:

    connection error: desc = "transport: error while dialing: dial tcp 172.18.0.4:4244: connect: connection refused"

Using the option `grpc.WithReturnConnectionError()`, we would get a
message that contains both the last connection error that occurred and
the `context.DeadlineExceeded` error. This greatly improves the
situation for errors that are deemed transient. Example:

    context deadline exceeded: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for *.hubble-relay.cilium.io, not localhost"

However, this is only available starting with grpc-go v1.30.0 and due to
the issue described here[0], we are currently stuck with v1.29.1. Thus,
this commit adds the option but comments it with a TODO note so that it
can be caught up and uncommented when upgrading grpc-go becomes
possible.

[0]: #13405 (comment)

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
qmonnet pushed a commit that referenced this pull request Oct 12, 2020
[ upstream commit e2e3661 ]

To connect to peers via gRPC, Hubble Relay uses `grpc.DialContext()`
with a dial timeout. When a connection attempt to a peer fails, the
following error message is returned: `context deadline exceeded`.
This provides no value when trying to debug the issue. Is the peer
unreachable? Is this a TLS configuration issue? etc.

By using the dial option `grpc.FailOnNonTempDialError(true)`, we get
better error messages on non-temporary dial errors, such as:

    connection error: desc = "transport: error while dialing: dial tcp 172.18.0.4:4244: connect: connection refused"

Using the option `grpc.WithReturnConnectionError()`, we would get a
message that contains both the last connection error that occurred and
the `context.DeadlineExceeded` error. This greatly improves the
situation for errors that are deemed transient. Example:

    context deadline exceeded: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for *.hubble-relay.cilium.io, not localhost"

However, this is only available starting with grpc-go v1.30.0 and due to
the issue described here[0], we are currently stuck with v1.29.1. Thus,
this commit adds the option but comments it with a TODO note so that it
can be caught up and uncommented when upgrading grpc-go becomes
possible.

[0]: #13405 (comment)

[ Backport note:
    Minor conflict when applying this patch. Only add
    grpc.FailOnNonTempDialError() and related message, do not remove
    grpc.WithInsecure() or add TLSConfig: to pool.GRPCClientConnBuilder,
    as those were changed in v1.9. ]

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
tklauser pushed a commit that referenced this pull request Oct 12, 2020
[ upstream commit e2e3661 ]

To connect to peers via gRPC, Hubble Relay uses `grpc.DialContext()`
with a dial timeout. When a connection attempt to a peer fails, the
following error message is returned: `context deadline exceeded`.
This provides no value when trying to debug the issue. Is the peer
unreachable? Is this a TLS configuration issue? etc.

By using the dial option `grpc.FailOnNonTempDialError(true)`, we get
better error messages on non-temporary dial errors, such as:

    connection error: desc = "transport: error while dialing: dial tcp 172.18.0.4:4244: connect: connection refused"

Using the option `grpc.WithReturnConnectionError()`, we would get a
message that contains both the last connection error that occurred and
the `context.DeadlineExceeded` error. This greatly improves the
situation for errors that are deemed transient. Example:

    context deadline exceeded: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for *.hubble-relay.cilium.io, not localhost"

However, this is only available starting with grpc-go v1.30.0 and due to
the issue described here[0], we are currently stuck with v1.29.1. Thus,
this commit adds the option but comments it with a TODO note so that it
can be caught up and uncommented when upgrading grpc-go becomes
possible.

[0]: #13405 (comment)

[ Backport note:
    Minor conflict when applying this patch. Only add
    grpc.FailOnNonTempDialError() and related message, do not remove
    grpc.WithInsecure() or add TLSConfig: to pool.GRPCClientConnBuilder,
    as those were changed in v1.9. ]

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
@aanm aanm added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Oct 13, 2020
@rolinh rolinh marked this pull request as draft October 26, 2020 09:35
@rolinh rolinh force-pushed the pr/rolinh/hubble-proto-gen-go-grpc-v1.0.0 branch from 18d5266 to 5cb3e27 Compare October 26, 2020 09:47
@rolinh rolinh changed the title hubble: bump proto deps for cmd/protoc-gen-go-grpc/v1.0.0 hubble: bump proto deps for cmd/protoc-gen-go-grpc/v1.0.1 Oct 26, 2020
@aanm aanm removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Nov 10, 2020
@rolinh rolinh force-pushed the pr/rolinh/hubble-proto-gen-go-grpc-v1.0.0 branch from 5cb3e27 to 267ab44 Compare December 7, 2020 12:14
@stale
Copy link

stale bot commented Jan 10, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 10, 2021
@rolinh rolinh removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 10, 2021
aanm pushed a commit that referenced this pull request Jan 30, 2021
We cannot update the grpc beyond 1.29.1 until we bump etcd to 3.04, see
#13405 (comment),
etcd-io/etcd#12124 and
#14787 (comment) for
more information.

We also ignore github.com/miekg/dns because we use our own fork of it
via replace and it seems the long-term plan is to replace the DNS proxy
with Envoy, see
#14790 (comment)

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
@rolinh rolinh force-pushed the pr/rolinh/hubble-proto-gen-go-grpc-v1.0.0 branch 2 times, most recently from eb859cd to 626ce23 Compare February 26, 2021 09:10
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
When run as root, the owner of newly generated files ends up being root.
Run the container as a regular user to fix this issue.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
@rolinh rolinh force-pushed the pr/rolinh/hubble-proto-gen-go-grpc-v1.0.0 branch from 626ce23 to bdd32e7 Compare February 26, 2021 09:19
@rolinh rolinh changed the base branch from master to pr/rolinh/vendor-grpc-etcd February 26, 2021 09:20
lyveng pushed a commit to lyveng/cilium that referenced this pull request Mar 4, 2021
We cannot update the grpc beyond 1.29.1 until we bump etcd to 3.04, see
cilium#13405 (comment),
etcd-io/etcd#12124 and
cilium#14787 (comment) for
more information.

We also ignore github.com/miekg/dns because we use our own fork of it
via replace and it seems the long-term plan is to replace the DNS proxy
with Envoy, see
cilium#14790 (comment)

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
@rolinh rolinh force-pushed the pr/rolinh/vendor-grpc-etcd branch 3 times, most recently from 05d6163 to da41ef4 Compare May 19, 2021 13:05
@rolinh rolinh force-pushed the pr/rolinh/vendor-grpc-etcd branch from da41ef4 to f087433 Compare May 20, 2021 07:23
@rolinh rolinh force-pushed the pr/rolinh/vendor-grpc-etcd branch from f087433 to 0e2fc97 Compare June 1, 2021 11:57
@rolinh rolinh force-pushed the pr/rolinh/vendor-grpc-etcd branch 3 times, most recently from a7a1c39 to b7bdb41 Compare June 21, 2021 12:41
@rolinh rolinh force-pushed the pr/rolinh/vendor-grpc-etcd branch 3 times, most recently from c6282f5 to cecf40d Compare July 1, 2021 09:29
@rolinh rolinh force-pushed the pr/rolinh/vendor-grpc-etcd branch from cecf40d to 2c469a7 Compare July 13, 2021 19:04
Base automatically changed from pr/rolinh/vendor-grpc-etcd to master July 16, 2021 11:36
@rolinh
Copy link
Member Author

rolinh commented Jul 16, 2021

Superseded by #16915, closing.

@rolinh rolinh closed this Jul 16, 2021
@rolinh rolinh deleted the pr/rolinh/hubble-proto-gen-go-grpc-v1.0.0 branch July 16, 2021 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dont-merge/waiting-for-upstream Only merge once upstream library or kernel features have landed release-note/misc This PR makes changes that have no direct user impact. sig/hubble Impacts hubble server or relay
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants