Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add subnetCount to pod_ip_metrics.go #205

Closed
wants to merge 0 commits into from

Conversation

yiningou
Copy link
Contributor

No description provided.

@yiningou
Copy link
Contributor Author

Add ipv4 & ipv6 subnet Count and Utilization metrics to pod_ip_metrics.go
Please review my codes. Thanks!
/assign jingyuanliang
/assign MrHohn

pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
@google-oss-prow google-oss-prow bot added size/XXL and removed size/L labels Mar 20, 2023
@yiningou yiningou changed the title Add subnetCount and utilization to pod_ip_metrics.go Add subnetCount to pod_ip_metrics.go Mar 20, 2023
Copy link
Collaborator

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Yining, thanks for the PR! The approach generally looks good to me, leaving a few comments on the details.

@@ -37,6 +39,7 @@ const (
ipv6
dual

gkeCNIConfigDir = "/etc/cni/net.d/10-gke-ptp.conflist"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest follow the same pattern as what we did in init container and derive the actual CNI config name from CNI_SPEC_NAME:

output_file="/host/etc/cni/net.d/${CNI_SPEC_NAME}"

In that way we have a lower chance of hitting any mismatch problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is tricky. How will we handle the Calico case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - with Calico there are a few more things to consider:

  • CNI file may not be available on netd startup.
  • Need to pipe through the Calico CNI name.
  • Parsing is a bit different as Calico puts { "subnet": "usePodCidr" } instead of a plain CIDR.

Thought about parsing the CNI template file instead but that may not work as the template file will be removed by Calico pod promptly.

It may be fine to start without supporting this case though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CNI file may not be available on netd startup.

Not really. install-cni.sh waits for calico's conflist before moving ahead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, Calico's regular conf:

{
   "name":"k8s-pod-network",
   "cniVersion":"0.3.1",
   "plugins":[
      {
         "type":"calico",
         "mtu":1460,
         "log_level":"info",
         "log_file_path":"/var/log/calico/cni/cni.log",
         "datastore_type":"kubernetes",
         "nodename":"gke-calico-netd-default-pool-8ce288fc-wb07",
         "ipam":{
            "type":"host-local",
            "subnet":"usePodCidr"
         },
         "policy":{
            "type":"k8s"
         },
         "kubernetes":{
            "kubeconfig":"/etc/cni/net.d/calico-kubeconfig"
         }
      },
      {
         "type":"portmap",
         "capabilities":{
            "portMappings":true
         },
         "snat":true
      },
      {
         "type":"bandwidth",
         "capabilities":{
            "bandwidth":true
         }
      }
   ]
}

Direct path conf:

{
   "name":"gke-pod-network",
   "cniVersion":"0.3.1",
   "plugins":[
      {
         "type":"calico",
         "mtu":1460,
         "log_level":"warning",
         "log_file_path":"/var/log/calico/cni/cni.log",
         "datastore_type":"kubernetes",
         "nodename":"gke-calico-netd-v6-default-pool-fc507ad4-vmx2",
         "ipam":{
            "type":"host-local",
            "ranges":[
               [
                  {
                     "subnet":"usePodCidr"
                  }
               ],
               [
                  {
                     "subnet":"2600:2d00:4005:c4e0:a80:3a:0:0/112"
                  }
               ]
            ],
            "routes":[
               {
                  "dst":"0.0.0.0/0"
               },
               {
                  "dst":"2001:4860:8040::/42"
               }
            ]
         },
         "policy":{
            "type":"k8s"
         },
         "kubernetes":{
            "kubeconfig":"/etc/cni/net.d/calico-kubeconfig"
         }
      },
      {
         "type":"portmap",
         "capabilities":{
            "portMappings":true
         },
         "snat":true
      },
      {
         "type":"bandwidth",
         "capabilities":{
            "bandwidth":true
         }
      }
   ]
}

/etc/cni/net.d/10-calico.conflist

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the complexity aroundusePodCidr I'm not going to block this PR on calico support - that may come afterward if needed.

pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics_test.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics_test.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics_test.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics_test.go Outdated Show resolved Hide resolved
Copy link
Member

@jingyuanliang jingyuanliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review tests because they may be changed based on how other parts change.

@@ -37,6 +39,7 @@ const (
ipv6
dual

gkeCNIConfigDir = "/etc/cni/net.d/10-gke-ptp.conflist"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is tricky. How will we handle the Calico case?

pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
@yiningou yiningou force-pushed the master branch 2 times, most recently from ca3b15d to 212cd27 Compare March 24, 2023 18:10
@yiningou yiningou force-pushed the master branch 2 times, most recently from 4ba3150 to 6380e00 Compare March 28, 2023 18:25
Copy link
Collaborator

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yining, adding second around of comments - this looks good overall.

pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics_test.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
pkg/metrics/collector/pod_ip_metrics.go Outdated Show resolved Hide resolved
@yiningou yiningou force-pushed the master branch 3 times, most recently from d2994e7 to f4de66d Compare April 3, 2023 17:58
Copy link
Collaborator

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yining, two minor comments. Could you fix the commit message to remove the redundant message from the fix-ups?

return nil
}

// hostsPerNetwork returns the number of available hosts in a subnet.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should match the method name - // countIPsFromRange returns ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now thinking, why are we doing it by parsing the CNI config file from the beginning?

I previously thought, by doing this we can probably handle more different scenarios where the CNI is configured by some different party or in a different layout, but in the current implementation, what it can handle is pretty limited to the CNI config written by install-cni.sh using the default cni_spec_template (which is even not a part of this repo), and it currently just populates with .spec.podCIDR.

Given this, what about repeating what install-cni.sh does here and read .spec.podCIDR here then export the range size? This will happen to work well with calico's config too. By the way, is the control plane currently populating podCIDRs correctly for IPv6 and what about directpath?

I apologize for not addressing your comment earlier, as I was focusing on fixing the above-mentioned comments. I'd be more than happy to discuss this issue further and explore possible solutions together. How about we schedule a quick meeting or have a discussion session?

if cniSpecName == "" {
glog.Errorf("CNI_SPEC_NAME environment variable is not set")
} else {
subnetFile := "/host/etc/cni/net.d/" + cniSpecName
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make "/host/etc/cni/net.d/" a constant.

@@ -91,18 +94,31 @@ var (
"Indicates the IP reuse duration in millisecond for all IPs.",
nil, nil,
)
podIPMetricsWatcherSetup = false
assignedIPv4AddrCountDesc = prometheus.NewDesc(
"IPv4_assigned_count",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe keep this all lowercase for consistency

// Number of hosts is calculated with the formula:
// IPv4: 2^x – 2, not consider network and broadcast address
// IPv6: 2^x - 1, not consider network address
// where x is the number of host bits in the subnet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we consider full RangeStart/End support, instead of just the default values?

}
max := uint64(1) << uint(bits-ones)
// Don't use the network's ".0" address,
if max == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we'll never run into this case?

return 0, fmt.Errorf("subnet includes only the network address")
}
max--
if strings.Contains(subnet.IP.String(), ".") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this condition looks hacky

return nil
}

func (c *podIPMetricsCollector) Update(ch chan<- prometheus.Metric) error {
if !podIPMetricsWatcherSetup {
if !podIPMetricsWatcherIsInitialized {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract all these into a separate function?

nil, nil,
)
podIPMetricsWatcherIsInitialized = false
assignedIPExported = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably want to put these inside struct podIPMetricsCollector.

@jingyuanliang
Copy link
Member

I'm now thinking, why are we doing it by parsing the CNI config file from the beginning?

I previously thought, by doing this we can probably handle more different scenarios where the CNI is configured by some different party or in a different layout, but in the current implementation, what it can handle is pretty limited to the CNI config written by install-cni.sh using the default cni_spec_template (which is even not a part of this repo), and it currently just populates with .spec.podCIDR.

Given this, what about repeating what install-cni.sh does here and read .spec.podCIDR here then export the range size? This will happen to work well with calico's config too. By the way, is the control plane currently populating podCIDRs correctly for IPv6 and what about directpath?

@MrHohn
Copy link
Collaborator

MrHohn commented Apr 4, 2023

Yining, Jingyuan and myself did a quick sync on this and agreed going with .spec.podCIDR and .spec.podCIDRs seems like a cleaner and more consistent way to go with less dependencies to be added.

Regarding directpath - since it doesn't have an angle into IP allocation it should be fine to put it aside (it won't show up in podCIDR).

@yiningou yiningou force-pushed the master branch 3 times, most recently from f44e8b3 to 74f9e80 Compare April 17, 2023 07:39
@yiningou yiningou force-pushed the master branch 7 times, most recently from 48a9e2c to aca7a91 Compare April 26, 2023 20:23
Copy link
Collaborator

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yining! One minor comment and LGTM.
/approve

continue
}
allCIDRs = append(allCIDRs, podCIDR)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like:

	for _, podCIDR := range node.Spec.PodCIDRs {
		allCIDRs = append(allCIDRs, podCIDR)
	}
        if len(allCIDRs) == 0 {
                 if node.Spec.PodCIDR != "" {
		        allCIDRs = append(allCIDRs, node.Spec.PodCIDR)
                 }
        }

@google-oss-prow google-oss-prow bot added the lgtm label Apr 26, 2023
@MrHohn
Copy link
Collaborator

MrHohn commented Apr 26, 2023

/hold

Copy link
Collaborator

@MrHohn MrHohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will keep the hold in case Jingyuan wants to take a final look.
/lgtm

@google-oss-prow
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@google-oss-prow
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MrHohn, yiningou

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants