cilium, gops: remap to fixed port to avoid collision with nodeport range
Lee reported a warning in the kube-proxy log: its bind protection could
not bind a specific port in the NodePort range. It turns out gops was
already using that port through its auto-binding (127.0.0.1:0). This
means that if gops collides with a NodePort service, we might not be
able to pull gops data from that port, since either kube-proxy or the
kube-proxy-free variant will redirect us to the actual service instead.
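
To illustrate the collision, here is a minimal Go sketch. It assumes the
Kubernetes default NodePort range of 30000-32767; both that range and the
node's ephemeral port range are configurable, so whether an auto-assigned
port actually lands in the NodePort range depends on the cluster. The helper
names inNodePortRange and ephemeralLoopbackPort are hypothetical and are not
part of Cilium or gops.

```go
package main

import (
	"fmt"
	"net"
)

// Assumed Kubernetes default NodePort range; clusters can override it.
const (
	nodePortMin = 30000
	nodePortMax = 32767
)

// inNodePortRange reports whether port lies within the assumed NodePort range.
func inNodePortRange(port int) bool {
	return port >= nodePortMin && port <= nodePortMax
}

// ephemeralLoopbackPort binds 127.0.0.1:0, lets the kernel pick a free port,
// and returns the chosen port after closing the listener again.
func ephemeralLoopbackPort() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	return ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := ephemeralLoopbackPort()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	// The auto-assigned port differs from run to run.
	fmt.Printf("kernel picked port %d, in NodePort range: %v\n", port, inNodePortRange(port))
	// A fixed port outside the range can never collide with a NodePort service.
	fmt.Printf("fixed port 9890, in NodePort range: %v\n", inNodePortRange(9890))
}
```

The auto-assigned port is only known at runtime, which is why it cannot be
checked against the NodePort range ahead of time.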

Since it is unpredictable which port the agent will bind for gops, remap
it to a fixed default port and add a user-configurable knob that allows
using a different one if necessary. Since the agent, operator,
clustermesh-apiserver and hubble-relay all start the gops listener, add
the --gops-port flag to each of them. The CNI enables gops only in debug
mode, not by default, so it is left unchanged for now, as it is unlikely
to be run this way in production.
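
The "fixed default plus override" pattern can be sketched as follows. This is
a simplified illustration, not the code from this commit: it uses the Go
standard library flag package instead of cobra/viper, the helper gopsAddr is
hypothetical, and the port range check is an extra assumption that the actual
change does not perform. Only the flag name and the default ports 9890 and
9891 are taken from the commit.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// Default gops ports as introduced by this commit.
const (
	gopsPortAgent    = 9890
	gopsPortOperator = 9891
)

// gopsAddr parses args for --gops-port and returns the loopback address
// the gops listener would bind to, falling back to defaultPort.
func gopsAddr(args []string, defaultPort int) (string, error) {
	fs := flag.NewFlagSet("component", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	port := fs.Int("gops-port", defaultPort, "Port for gops server to listen on")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// Assumption: reject 0 so the listener cannot fall back to auto-binding.
	if *port < 1 || *port > 65535 {
		return "", fmt.Errorf("invalid gops port %d", *port)
	}
	return fmt.Sprintf("127.0.0.1:%d", *port), nil
}

func main() {
	addr, err := gopsAddr(nil, gopsPortAgent)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(addr) // prints "127.0.0.1:9890"

	addr, err = gopsAddr([]string{"--gops-port=9999"}, gopsPortOperator)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(addr) // prints "127.0.0.1:9999"
}
```

Binding to 127.0.0.1 keeps the gops endpoint reachable only from the node
itself, matching the "listening on 127.0.0.1" notes in the documentation
change.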

Fixes: cilium#14218
Reported-by: Lee Hu via Slack
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
borkmann committed Dec 10, 2020
1 parent 0a444b7 commit 7757d31
Showing 16 changed files with 79 additions and 15 deletions.
1 change: 1 addition & 0 deletions Documentation/cmdref/cilium-agent.md

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Documentation/cmdref/cilium-operator-aws.md

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Documentation/cmdref/cilium-operator-azure.md

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Documentation/cmdref/cilium-operator-generic.md

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Documentation/cmdref/cilium-operator.md

Some generated files are not rendered by default.

10 changes: 7 additions & 3 deletions Documentation/operations/system_requirements.rst
@@ -318,16 +318,20 @@ ICMP 8/0 egress ``worker-sg`` (self) health checks

 The following ports should also be available on each node:

-======================== ==========================================
+======================== ===========================================================
 Port Range / Protocol    Description
-======================== ==========================================
+======================== ===========================================================
 4240/tcp                 cluster health checks (``cilium-health``)
 4244/tcp                 Hubble server
 4245/tcp                 Hubble Relay
 6942/tcp                 operator Prometheus metrics
 9090/tcp                 cilium-agent Prometheus metrics
 9876/tcp                 cilium-agent health status API
-======================== ==========================================
+9890/tcp                 cilium-agent gops server (listening on 127.0.0.1)
+9891/tcp                 operator gops server (listening on 127.0.0.1)
+9892/tcp                 clustermesh-apiserver gops server (listening on 127.0.0.1)
+9893/tcp                 Hubble Relay gops server (listening on 127.0.0.1)
+======================== ===========================================================

 .. _admin_mount_bpffs:

5 changes: 5 additions & 0 deletions Documentation/operations/upgrade.rst
@@ -423,6 +423,11 @@ Deprecated Options
   the Helm option ``bpf.hostRouting=true`` can be used. If the underlying kernel
   does not implement the needed BPF features, then the agent will fallback and rely
   on host routing automatically.
+* For the agent, operator, clustermesh-apiserver and hubble-relay, the gops listener
+  has been mapped to fixed ports instead of port auto-binding. Meaning, the agent's
+  gops server will listen on 9890, the operator on 9891, the clustermesh-apiserver on
+  9892, and hubble-relay on port 9893 by default. If needed, the port can also be
+  remapped for each through using the ``--gops-port`` flag.

 .. _1.9_helm_options:

1 change: 1 addition & 0 deletions Documentation/spelling_wordlist.txt
@@ -407,6 +407,7 @@ gocheck
 golang
 golangci
 golangci-lint
+gops
 grafana
 grep
 hairpinned
18 changes: 14 additions & 4 deletions clustermesh-apiserver/main.go
@@ -68,6 +68,17 @@ var (
 		Use: "clustermesh-apiserver",
 		Short: "Run the ClusterMesh apiserver",
 		Run: func(cmd *cobra.Command, args []string) {
+			// Open socket for using gops to get stacktraces of the agent.
+			addr := fmt.Sprintf("127.0.0.1:%d", viper.GetInt(option.GopsPort))
+			addrField := logrus.Fields{"address": addr}
+			if err := gops.Listen(gops.Options{
+				Addr: addr,
+				ReuseSocketAddrAndPort: true,
+			}); err != nil {
+				log.WithError(err).WithFields(addrField).Fatal("Cannot start gops server")
+			}
+			log.WithFields(addrField).Info("Started gops server")
+
 			runServer(cmd)
 		},
 	}
@@ -160,14 +171,13 @@ func readMockFile(path string) error {
 }

 func runApiserver() error {
-	if err := gops.Listen(gops.Options{}); err != nil {
-		return fmt.Errorf("unable to start gops: %s", err)
-	}
-
 	flags := rootCmd.Flags()
 	flags.BoolP(option.DebugArg, "D", false, "Enable debugging mode")
 	option.BindEnv(option.DebugArg)

+	flags.Int(option.GopsPort, defaults.GopsPortApiserver, "Port for gops server to listen on")
+	option.BindEnv(option.GopsPort)
+
 	flags.Duration(option.CRDWaitTimeout, 5*time.Minute, "Cilium will exit if CRDs are not available within this duration upon startup")
 	option.BindEnv(option.CRDWaitTimeout)

14 changes: 11 additions & 3 deletions daemon/cmd/daemon_main.go
@@ -115,10 +115,15 @@ var (
 			}

 			// Open socket for using gops to get stacktraces of the agent.
-			if err := gops.Listen(gops.Options{}); err != nil {
-				fmt.Fprintf(os.Stderr, "unable to start gops: %s", err)
-				os.Exit(1)
+			addr := fmt.Sprintf("127.0.0.1:%d", viper.GetInt(option.GopsPort))
+			addrField := logrus.Fields{"address": addr}
+			if err := gops.Listen(gops.Options{
+				Addr: addr,
+				ReuseSocketAddrAndPort: true,
+			}); err != nil {
+				log.WithError(err).WithFields(addrField).Fatal("Cannot start gops server")
 			}
+			log.WithFields(addrField).Info("Started gops server")

 			bootstrapStats.earlyInit.Start()
 			initEnv(cmd)
@@ -777,6 +782,9 @@ func init() {
 	flags.MarkHidden(option.CMDRef)
 	option.BindEnv(option.CMDRef)

+	flags.Int(option.GopsPort, defaults.GopsPortAgent, "Port for gops server to listen on")
+	option.BindEnv(option.GopsPort)
+
 	flags.Int(option.ToFQDNsMinTTL, 0, fmt.Sprintf("The minimum time, in seconds, to use DNS data for toFQDNs policies. (default %d )", defaults.ToFQDNsMinTTL))
 	option.BindEnv(option.ToFQDNsMinTTL)

11 changes: 10 additions & 1 deletion hubble-relay/cmd/serve/serve.go
@@ -35,6 +35,7 @@ import (
 const (
 	keyPprof = "pprof"
 	keyGops = "gops"
+	keyGopsPort = "gops-port"
 	keyDialTimeout = "dial-timeout"
 	keyRetryTimeout = "retry-timeout"
 	keyListenAddress = "listen-address"
@@ -67,6 +68,10 @@ func New(vp *viper.Viper) *cobra.Command {
 	flags.Bool(
 		keyGops, true, "Run gops agent",
 	)
+	flags.Int(
+		keyGopsPort,
+		defaults.GopsPort,
+		"Port for gops server to listen on")
 	flags.Duration(
 		keyDialTimeout,
 		defaults.DialTimeout,
@@ -184,7 +189,11 @@ func runServe(vp *viper.Viper) error {
 	}
 	gopsEnabled := vp.GetBool(keyGops)
 	if gopsEnabled {
-		if err := agent.Listen(agent.Options{}); err != nil {
+		addr := fmt.Sprintf("127.0.0.1:%d", vp.GetInt(keyGopsPort))
+		if err := agent.Listen(agent.Options{
+			Addr: addr,
+			ReuseSocketAddrAndPort: true,
+		}); err != nil {
 			return fmt.Errorf("failed to start gops agent: %v", err)
 		}
 	}
3 changes: 3 additions & 0 deletions operator/flags.go
@@ -264,6 +264,9 @@ func init() {
 	flags.MarkHidden(option.CMDRef)
 	option.BindEnv(option.CMDRef)

+	flags.Int(option.GopsPort, defaults.GopsPortOperator, "Port for gops server to listen on")
+	option.BindEnv(option.GopsPort)
+
 	flags.Duration(option.K8sHeartbeatTimeout, 30*time.Second, "Configures the timeout for api-server heartbeat, set to 0 to disable")
 	option.BindEnv(option.K8sHeartbeatTimeout)

13 changes: 9 additions & 4 deletions operator/main.go
@@ -70,11 +70,16 @@ var (
 				os.Exit(0)
 			}

-			// Open socket for using gops to get stacktraces of the operator.
-			if err := gops.Listen(gops.Options{}); err != nil {
-				fmt.Fprintf(os.Stderr, "unable to start gops: %s", err)
-				os.Exit(1)
+			// Open socket for using gops to get stacktraces of the agent.
+			addr := fmt.Sprintf("127.0.0.1:%d", viper.GetInt(option.GopsPort))
+			addrField := logrus.Fields{"address": addr}
+			if err := gops.Listen(gops.Options{
+				Addr: addr,
+				ReuseSocketAddrAndPort: true,
+			}); err != nil {
+				log.WithError(err).WithFields(addrField).Fatal("Cannot start gops server")
 			}
+			log.WithFields(addrField).Info("Started gops server")

 			initEnv()
 			runOperator()
9 changes: 9 additions & 0 deletions pkg/defaults/defaults.go
@@ -22,6 +22,15 @@ const (
 	// AgentHealthPort is the default value for option.AgentHealthPort
 	AgentHealthPort = 9876

+	// GopsPortAgent is the default value for option.GopsPort in the agent
+	GopsPortAgent = 9890
+
+	// GopsPortOperator is the default value for option.GopsPort in the operator
+	GopsPortOperator = 9891
+
+	// GopsPortApiserver is the default value for option.GopsPort in the apiserver
+	GopsPortApiserver = 9892
+
 	// IPv6ClusterAllocCIDR is the default value for option.IPv6ClusterAllocCIDR
 	IPv6ClusterAllocCIDR = IPv6ClusterAllocCIDRBase + "/64"

2 changes: 2 additions & 0 deletions pkg/hubble/relay/defaults/defaults.go
@@ -26,6 +26,8 @@ const (
 	// DialTimeout is the timeout that is used when establishing a new
 	// connection.
 	DialTimeout = 5 * time.Second
+	// GopsPort is the default port for gops to listen on.
+	GopsPort = 9893
 	// RetryTimeout is the duration to wait between reconnection attempts.
 	RetryTimeout = 30 * time.Second
 	// HubbleTarget is the address of the local Hubble instance.
3 changes: 3 additions & 0 deletions pkg/option/config.go
@@ -155,6 +155,9 @@ const (
 	// EnvoyLog sets the path to a separate Envoy log file, if any
 	EnvoyLog = "envoy-log"

+	// GopsPort is the TCP port for the gops server.
+	GopsPort = "gops-port"
+
 	// ProxyPrometheusPort specifies the port to serve Cilium host proxy metrics on.
 	ProxyPrometheusPort = "proxy-prometheus-port"

