Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ExternalIPPool status #2490

Merged
merged 1 commit into from Aug 20, 2021

Conversation

wenqiq
Copy link
Contributor

@wenqiq wenqiq commented Jul 29, 2021

Implement ExternalIPPool status

Report the number of allocated versus free IPs to the status of the Egress ExternalIPPool.

For example:
kubectl get eip

kubectl get eip 
NAME            TOTAL   USED   AGE
eip-test-ipv4   345     1      20s

kubectl get eip -o yaml

  status:
    usage:
      total: 91
      used: 1
kubectl get eip -o yaml                     
apiVersion: v1
items:
- apiVersion: crd.antrea.io/v1alpha2
  kind: ExternalIPPool
  metadata:
    creationTimestamp: "2021-08-02T17:48:17Z"
    generation: 1
    name: eip-test-ipv4
    resourceVersion: "116090"
    uid: fbc8b2ac-a147-4937-b842-88ce6e261a33
  spec:
    ipRanges:
    - end: 10.10.10.100
      start: 10.10.10.10
    nodeSelector:
      matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - kind-worker2
  status:
    usage:
      total: 91
      used: 1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

For #2454

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial suggestion (and the scope of #2454) was to report usage data (number of allocated IPs vs total number of IPs). It seems that this PR wants to return a list of allocated IPs, which could lead to a significantly large Status. Is this PR trying to address something else besides #2454? It seems that it is missing some context, as there is no explanation of what Preallocated is compared to Allocated.

@wenqiq
Copy link
Contributor Author

wenqiq commented Jul 30, 2021

My initial suggestion (and the scope of #2454) was to report usage data (number of allocated IPs vs total number of IPs). It seems that this PR wants to return a list of allocated IPs, which could lead to a significantly large Status.

I don't know whether it needs to be consistent with IPPool status describing in #2442?

I also think reporting usage data (number of allocated IPs vs total number of IPs) is better in ExternalIPPool status, this avoids a significantly large Status.

It seems that it is missing some context, as there is no explanation of what Preallocated is compared to Allocated.

‘Preallocated’ represents the IP allocated to Egress but not assigned to Node yet.

‘Allocated’ represents the IP allocated to Egress and has been assigned to Node.

If we report the status with a list of allocated IPs.

Is this PR trying to address something else besides #2454?

This PR is based on #2444, I was thinking rebase after merged, however they don't actually have dependencies. I will separate these two commits.

@codecov-commenter
Copy link

codecov-commenter commented Jul 30, 2021

Codecov Report

Merging #2490 (dfe2676) into main (7f90fc0) will decrease coverage by 0.01%.
The diff coverage is 59.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2490      +/-   ##
==========================================
- Coverage   42.00%   41.98%   -0.02%     
==========================================
  Files         153      154       +1     
  Lines       18565    18658      +93     
==========================================
+ Hits         7798     7834      +36     
- Misses      10067    10118      +51     
- Partials      700      706       +6     
Flag Coverage Δ
unit-tests 41.98% <59.18%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/controller/egress/ipallocator/allocator.go 80.41% <0.00%> (-2.57%) ⬇️
pkg/controller/egress/controller.go 84.12% <63.04%> (-3.08%) ⬇️
pkg/ovs/openflow/ofctrl_packetout.go 66.96% <0.00%> (-0.61%) ⬇️
pkg/ovs/openflow/ofctrl_builder.go 18.34% <0.00%> (-0.27%) ⬇️
pkg/ovs/openflow/ofctrl_action.go 11.25% <0.00%> (-0.20%) ⬇️
pkg/agent/controller/networkpolicy/packetin.go 4.63% <0.00%> (-0.05%) ⬇️
pkg/ovs/openflow/ofctrl_group.go 0.00% <0.00%> (ø)
pkg/ovs/openflow/ofctrl_nxfields.go 12.50% <0.00%> (ø)
pkg/agent/openflow/pipeline.go 24.62% <0.00%> (+0.03%) ⬆️
pkg/antctl/command_definition.go 53.97% <0.00%> (+0.11%) ⬆️
... and 2 more

"k8s.io/client-go/util/workqueue"
"k8s.io/klog/v2"

"antrea.io/antrea/pkg/apis/controlplane"
egressv1alpha2 "antrea.io/antrea/pkg/apis/crd/v1alpha2"
. "antrea.io/antrea/pkg/apis/crd/v1alpha2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an encouraged usage, which misleads people that the types are in same package. Package qualifier is very useful to indicate where the dependency is. The code shouldn't pretend they are in same package which makes dividing them in different packages pointless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@tnqn
Copy link
Member

tnqn commented Jul 30, 2021

‘Preallocated’ represents the IP allocated to Egress but not assigned to Node yet.

‘Allocated’ represents the IP allocated to Egress and has been assigned to Node.

This will then involve more components to achieve it and need a chain of reactions but I don't see much value it adds to the API. We already have enough information in Egress API about whether the Egress IP is just allocated or is configured on a Node.
My original thinking of the status is also just number of allocated IPs vs total number of IPs, which helps users to understand how many IPs have been used and whether they need to extend the pool.

@wenqiq wenqiq force-pushed the externalIPPool-status branch 2 times, most recently from 862daa8 to 7defc3e Compare August 3, 2021 01:03
@wenqiq
Copy link
Contributor Author

wenqiq commented Aug 3, 2021

As we discussed above, there is no need to keep the ExternalIPPool status consistent with IPPool status CRD. So I implemented the ExternalIPPool status with a new method which only reporting usage data (number of allocated IPs and total number of IPs).

@wenqiq wenqiq marked this pull request as ready for review August 3, 2021 03:05
type: object
required:
- used
- total
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: if they are set to "required", does it mean we must set it even when creating a new pool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for feedbacks. I have verified that it can work and create a ExternalIPPool successfully, It seems status.usage is not required.

@@ -311,6 +315,37 @@ func (c *EgressController) setIPAllocation(egressName string, ip net.IP, poolNam
}
}

func (c *EgressController) updateExternalIPPoolStatus(poolName string, total, used int) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can be called concurrently in pool's eventHandler and egress's workers. So it seems that a race condition can make they override each other and the final value in API may be wrong. Maybe creating multiple Egresses can reproduce it.
If this is true, I would suggest have dedicated workers to sync pool's status, other places only enqueue a pool that needs to sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good consideration. Will reproduce and fix it.

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, some minor comments

@@ -92,6 +95,8 @@ type EgressController struct {
egressGroupStore storage.Interface
// queue maintains the EgressGroup objects that need to be synced.
queue workqueue.RateLimitingInterface
// queueEIP maintains the ExternalIPpool objects that need to be synced.
queueEIP workqueue.RateLimitingInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more natural to call it eipQueue, but I think poolQueue is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -311,6 +311,10 @@ func testEgressCRUD(t *testing.T, data *TestData) {
require.NoError(t, err, "Failed to check if IP exists on Node")
assert.False(t, exists, "Found stale IP on Node")
}
pool, err = data.crdClient.CrdV1alpha2().ExternalIPPools().Get(context.TODO(), pool.Name, metav1.GetOptions{})
assert.NoError(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert.NoError(t, err)
require.NoError(t, err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pool, err = data.crdClient.CrdV1alpha2().ExternalIPPools().Get(context.TODO(), pool.Name, metav1.GetOptions{})
assert.NoError(t, err)
assert.Equal(t, 1, pool.Status.Usage.Used, "ExternalIPPool status length not match")
t.Logf("pool %s status: %v", pool.Name, pool.Status)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.Logf("pool %s status: %v", pool.Name, pool.Status)
t.Logf("ExternalPool %s status: %v", pool.Name, pool.Status)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -311,6 +311,10 @@ func testEgressCRUD(t *testing.T, data *TestData) {
require.NoError(t, err, "Failed to check if IP exists on Node")
assert.False(t, exists, "Found stale IP on Node")
}
pool, err = data.crdClient.CrdV1alpha2().ExternalIPPools().Get(context.TODO(), pool.Name, metav1.GetOptions{})
assert.NoError(t, err)
assert.Equal(t, 1, pool.Status.Usage.Used, "ExternalIPPool status length not match")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can add expectedTotal to the table and check it too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -24,12 +24,14 @@ import (
"sync"
"time"

apimachineryerrors "k8s.io/apimachinery/pkg/api/errors"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: apierrors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -92,6 +95,8 @@ type EgressController struct {
egressGroupStore storage.Interface
// queue maintains the EgressGroup objects that need to be synced.
queue workqueue.RateLimitingInterface
// poolQueue maintains the ExternalIPpool objects that need to be synced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// poolQueue maintains the ExternalIPpool objects that need to be synced.
// poolQueue maintains the ExternalIPPool objects that need to be synced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

build/yamls/base/crds.yml Show resolved Hide resolved
@wenqiq wenqiq closed this Aug 17, 2021
@wenqiq wenqiq reopened this Aug 17, 2021
tnqn
tnqn previously approved these changes Aug 18, 2021
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @antoninbas do you want to take a look since you suggested the enhancement?

@tnqn
Copy link
Member

tnqn commented Aug 18, 2021

/test-all

antoninbas
antoninbas previously approved these changes Aug 18, 2021
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this

Comment on lines +255 to +256
// The current status of the ExternalIPPool.
Status ExternalIPPoolStatus `json:"status"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tnqn we don't need omitempty for this? I know that when kubebuilder generates CRD types, it always adds the omitempty tag for the Status field.

I also see it for upstream resources (at least for Pod)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there is no omitempty tag for all the Status field in the project. I think we can take another PR to implement it? @antoninbas @tnqn

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it's set for many Status fields of native Kubernetes resources, I just found it doesn't have any impact as long as the field is a customized struct. I tried marshalling an empty ExternalIPPool, it got the same result regardless of omitempty is set:

{
  "metadata": {
    "creationTimestamp": null
  },
  "spec": {
    "ipRanges": null,
    "nodeSelector": {}
  },
  "status": {
    "usage": {
      "total": 0,
      "used": 0
    }
  }
}

According to https://golang.google.cn/pkg/encoding/json/#Marshal, it works for the built-in types only.

The "omitempty" option specifies that the field should be omitted from the encoding if the field has an empty value, defined as false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming guys.

pkg/controller/egress/controller.go Outdated Show resolved Hide resolved
if gotEgress.Spec.ExternalIPPool != "" && gotEgress.Spec.EgressIP != "" {
poolName := gotEgress.Spec.ExternalIPPool
eip, err := controller.crdClient.CrdV1alpha2().ExternalIPPools().Get(context.TODO(), poolName, metav1.GetOptions{})
assert.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert.NoError(t, err)
require.NoError(t, err)

assert will continue with the test, even though eip will be nil in case of an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

eip, err := controller.crdClient.CrdV1alpha2().ExternalIPPools().Get(context.TODO(), poolName, metav1.GetOptions{})
assert.NoError(t, err)
usage := eip.Status.Usage
assert.Equal(t, 1, usage.Used, fmt.Sprintf("ExternalIPPool status not match: %v", usage))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert.Equal(t, 1, usage.Used, fmt.Sprintf("ExternalIPPool status not match: %v", usage))
assert.Equal(t, 1, usage.Used, "Expected one used IP in EgressIPPool Status")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@wenqiq wenqiq dismissed stale reviews from antoninbas and tnqn via fbd2754 August 18, 2021 03:22
antoninbas
antoninbas previously approved these changes Aug 19, 2021
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas
Copy link
Contributor

/test-all

@tnqn
Copy link
Member

tnqn commented Aug 20, 2021

@wenqiq e2e failed:

=== RUN   TestEgress/testEgressCRUD
=== RUN   TestEgress/testEgressCRUD/single_matching_Node
    egress_test.go:320: 
        	Error Trace:	egress_test.go:320
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 0
        	Test:       	TestEgress/testEgressCRUD/single_matching_Node
        	Messages:   	ExternalIPPool status usage used num not match
    egress_test.go:322: ExternalPool pool-tk58b status: {{2 0}}
=== RUN   TestEgress/testEgressCRUD/two_matching_Nodes
    egress_test.go:320: 
        	Error Trace:	egress_test.go:320
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 0
        	Test:       	TestEgress/testEgressCRUD/two_matching_Nodes
        	Messages:   	ExternalIPPool status usage used num not match
    egress_test.go:322: ExternalPool pool-jdsn5 status: {{2 0}}
=== RUN   TestEgress/testEgressCRUD/no_matching_Node
    egress_test.go:320: 
        	Error Trace:	egress_test.go:320
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 0
        	Test:       	TestEgress/testEgressCRUD/no_matching_Node
        	Messages:   	ExternalIPPool status usage used num not match

@wenqiq
Copy link
Contributor Author

wenqiq commented Aug 20, 2021

Fixed ExternalIPPool status checking logic order in Egress e2e test case.
/test-all

Report the number of allocated versus free IPs to the status of the Egress ExternalIPPool.

For example:

kubectl get eip

```
NAME            TOTAL   USED   AGE
eip-test-ipv4   345     1      20s
```

kubectl get eip -o yaml

```
   ...
items:
- apiVersion: crd.antrea.io/v1alpha2
  kind: ExternalIPPool
   ...
  spec:
    ipRanges:
    - end: 10.10.10.100
      start: 10.10.10.10
  status:
    usage:
      total: 91
      used: 1
   ...
```

Signed-off-by: Wenqi Qiu <wenqiq@vmware.com>
@wenqiq
Copy link
Contributor Author

wenqiq commented Aug 20, 2021

/test-all

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas
Copy link
Contributor

All the e2e test failures are unrelated to this PR: see #2618

@antoninbas antoninbas merged commit 9ccddc0 into antrea-io:main Aug 20, 2021
@antoninbas antoninbas added this to the Antrea v1.3 release milestone Aug 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants