Extends the Endpoints support from 500 to 800, extra ones will be dropped in AntreaProxy #2101

Merged
merged 3 commits into from
Apr 29, 2021

hongliangl
Contributor

@hongliangl hongliangl commented Apr 16, 2021

For #2092

Due to the OpenFlow message size limit and the way AntreaProxy implements Services, the maximum number of Endpoints that AntreaProxy can now support per Service is 800. If a Service has more than 800 Endpoints, the extra Endpoints are ignored.

In AntreaProxy, the OVS group is the key part of the Service implementation. Antrea currently uses OpenFlow 1.3 to communicate with OVS. In the previous design, every bucket of an OVS group had five actions. The two actions that load the Endpoint IP and port into registers, plus the resubmit action, must stay in the bucket. The other two actions, which load values into registers, can be moved to flows (in the current patch, they are moved to table 41), so that one message can hold more buckets. As a result, the maximum number of Endpoints increases from 511 to 800. Unfortunately, to keep AntreaProxy running correctly, any Endpoints beyond that limit are ignored.
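
The arithmetic behind the 511 and 800 figures can be sketched roughly as follows. This is only an illustration: the 65535-byte cap follows from the 16-bit length field in the OpenFlow header, but the per-structure byte sizes below (group-mod header, bucket header, register-load action, resubmit action) are assumptions about the wire encoding and are not taken from this PR. The function name `maxBuckets` is hypothetical.

```go
package main

// Sizes in bytes. Only maxMessageLen is a hard protocol limit; the other
// values are assumed encodings used for illustration.
const (
	maxMessageLen  = 65535 // OpenFlow header length field is 16 bits
	groupModHeader = 16    // assumed size of the group-mod header
	bucketHeader   = 16    // assumed size of one bucket header
	regLoadAction  = 24    // assumed size of one register-load action
	resubmitAction = 16    // assumed size of one resubmit action
)

// maxBuckets returns how many buckets fit in a single group-mod message
// when every bucket carries regLoads register-load actions plus one
// resubmit action.
func maxBuckets(regLoads int) int {
	bucketLen := bucketHeader + regLoads*regLoadAction + resubmitAction
	return (maxMessageLen - groupModHeader) / bucketLen
}
```

Under these assumptions, `maxBuckets(4)` (five actions per bucket) gives 511 and `maxBuckets(2)` (three actions per bucket) gives 818, which is consistent with the old limit of 511 and leaves some headroom above the new limit of 800.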

pkg/agent/openflow/pipeline.go (outdated, resolved)
pkg/agent/openflow/pipeline.go (resolved)
@hongliangl hongliangl changed the title Add support for insert-bucket in openflow group Fix #2092 Apr 20, 2021
@hongliangl hongliangl force-pushed the Fix-#2092 branch 2 times, most recently from 275ef43 to 22b0306 Compare April 20, 2021 18:42
@codecov-commenter

codecov-commenter commented Apr 20, 2021

Codecov Report

Merging #2101 (bc95f63) into main (c67106c) will increase coverage by 4.02%.
The diff coverage is 22.85%.

@@            Coverage Diff             @@
##             main    #2101      +/-   ##
==========================================
+ Coverage   61.23%   65.26%   +4.02%     
==========================================
  Files         270      269       -1     
  Lines       20376    20812     +436     
==========================================
+ Hits        12478    13583    +1105     
+ Misses       6609     5842     -767     
- Partials     1289     1387      +98     
Flag Coverage Δ
e2e-tests 56.39% <22.85%> (?)
kind-e2e-tests 52.12% <20.00%> (+0.11%) ⬆️
unit-tests 41.36% <14.28%> (-0.19%) ⬇️

Impacted Files Coverage Δ
pkg/agent/proxy/endpoints.go 65.00% <0.00%> (-2.54%) ⬇️
pkg/agent/openflow/pipeline.go 80.66% <11.11%> (+10.63%) ⬆️
pkg/agent/proxy/proxier.go 63.60% <27.27%> (-3.18%) ⬇️
pkg/agent/openflow/client.go 71.54% <100.00%> (+11.70%) ⬆️
pkg/antctl/raw/traceflow/command.go 23.82% <0.00%> (-1.92%) ⬇️
...ntroller/networkpolicy/networkpolicy_controller.go 69.35% <0.00%> (-0.97%) ⬇️
...agent/controller/traceflow/traceflow_controller.go 73.14% <0.00%> (-0.14%) ⬇️
pkg/ipfix/ipfix_process.go 100.00% <0.00%> (ø)
pkg/ipfix/ipfix_set.go
pkg/agent/openflow/network_policy.go 76.41% <0.00%> (+0.59%) ⬆️
... and 50 more

Contributor

@jianjuns jianjuns left a comment

@hongliangl : I would prefer to add descriptions about the actual changes in the commit title and message. We can list the issue number in the commit message too.

pkg/agent/proxy/proxier.go (outdated, resolved)
pkg/agent/openflow/pipeline.go (outdated, resolved)
pkg/agent/openflow/client.go (outdated, resolved)
Member

@tnqn tnqn left a comment

LGTM overall, would like to see @jianjuns @antoninbas's opinions.

build/yamls/base/conf/antrea-agent.conf (outdated, resolved)
@tnqn
Member

tnqn commented Apr 21, 2021

@hongliangl please update the PR's title as well, otherwise it's not clear what this is about from the PR list.

@tnqn
Member

tnqn commented Apr 21, 2021

And typo in the commit message: "AtreaProxy"

@hongliangl hongliangl changed the title Fix #2092 Fix issue of the maximum supporting Endpoints in AntreaProxy Apr 21, 2021
Contributor

@antoninbas antoninbas left a comment

we are not really fixing the issue IMO. We should either keep it open or open a new one.

pkg/agent/openflow/client.go (outdated, resolved)
maxRetryForOFSwitch = 5
// Due to the message size of openflow 1.3 and implementation of Service in Antrea, the maximum Endpoint that Antrea
// can support now is 800. If the Endpoints of Service exceed 800, the exceeding Endpoints will be ignored.
maxEndpoints = 800
Contributor

so we went from 500 to 800? This is a bit underwhelming to be honest, albeit still an improvement

Contributor Author

@hongliangl hongliangl Apr 21, 2021

Yeah, it's still an improvement, though maybe a bit too small.

pkg/agent/openflow/client.go (outdated, resolved)
build/yamls/base/conf/antrea-agent.conf (outdated, resolved)
@hongliangl hongliangl changed the title Fix issue of the maximum supporting Endpoints in AntreaProxy Fix issue of the maximum number of Endpoints supporting in AntreaProxy Apr 21, 2021
@hongliangl
Contributor Author

hongliangl commented Apr 21, 2021

> @hongliangl please update the PR's title as well, otherwise it's not clear what this is about from the PR list.

> And typo in the commit message: "AtreaProxy"

Thanks for the reminder, updated.

Comment on lines 294 to 299
for _, endpoint := range endpointUpdateList {
	if _, ok := endpointsInstalled[endpoint.String()]; !ok {
		needUpdateEndpoints = true
		break
	}
}
Contributor Author

@hongliangl hongliangl Apr 26, 2021

@tnqn

The extra endpoints won't be in "endpointsInstalled" and "needUpdateEndpoints" will always be true regardless of how many times it has been reconciled.

endpointUpdateList is sorted so that all Endpoints are in a fixed order before the list is cut. If the retained Endpoints are all installed, needUpdateEndpoints will not be true.

In more detail, needUpdateEndpoints is recalculated from the cut endpointUpdateList and endpointsInstalled. It will not be true if the retained Endpoints in the cut endpointUpdateList are all installed. This adds some overhead, but only when the Service is oversized.
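
The sort-then-cut idea described above can be sketched as follows. This is a simplified illustration, not the code from the PR: Endpoints are modelled as plain strings, and the helper names `capEndpoints` and `anyMissing` are hypothetical.

```go
package main

import "sort"

// capEndpoints returns a sorted copy of endpoints, truncated to at most
// limit entries. Sorting before cutting puts the list in a fixed order,
// so the same input always yields the same retained subset.
func capEndpoints(endpoints []string, limit int) []string {
	sorted := append([]string(nil), endpoints...)
	sort.Strings(sorted)
	if len(sorted) > limit {
		sorted = sorted[:limit]
	}
	return sorted
}

// anyMissing reports whether at least one retained Endpoint is absent
// from the installed set, which is the condition that requires an update.
func anyMissing(kept []string, installed map[string]struct{}) bool {
	for _, ep := range kept {
		if _, ok := installed[ep]; !ok {
			return true
		}
	}
	return false
}
```

With a limit of 2 and the Endpoints `10.0.0.3:80`, `10.0.0.1:80`, `10.0.0.2:80`, the retained list is `10.0.0.1:80`, `10.0.0.2:80`. If both are already installed, `anyMissing` returns false even though the third Endpoint was never installed.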

Member

I think L274 will set needUpdateEndpoints to true.

Contributor Author

@hongliangl hongliangl Apr 26, 2021

If a Service is oversized, L274 will always set needUpdateEndpoints to true because the extra Endpoints are not installed.
L290 resets needUpdateEndpoints to false because endpointUpdateList has been cut and it is not yet known whether all retained Endpoints in endpointUpdateList are installed, so L294~L299 checks whether any Endpoint in endpointUpdateList is not installed.

Member

I missed L290. However, it still seems wrong to reset needUpdateEndpoints as there are other conditions that could set it to true, for example, needUpdateEndpoints = pSvcInfo.SessionAffinityType() != svcInfo.SessionAffinityType(). And it's redundant to "Check if the cut endpointUpdateList are all installed" twice when it exceeds the size. Could we cut the endpoints in the below place? https://github.com/vmware-tanzu/antrea/blob/eadf0921f88712b373df9f00c778ef81642def2f/pkg/agent/proxy/proxier.go#L244-L246

Contributor Author

> it still seems wrong to reset needUpdateEndpoints as there are other conditions that could set it to true, for example, needUpdateEndpoints = pSvcInfo.SessionAffinityType() != svcInfo.SessionAffinityType().

Agreed, needUpdateEndpoints = pSvcInfo.SessionAffinityType() != svcInfo.SessionAffinityType() should be taken into account.

> Could we cut the endpoints in the below place?

The variable endpoints is a map, so IMO we can't cut it at the place linked above.

Member

It needs to be converted to a list sooner or later, so is there any harm in doing it earlier? It would save the repeated calculation of needUpdateEndpoints in your current code (L273-L278 and L294-303) and reduce the code complexity somewhat, I think.
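
One way to read this suggestion is sketched below: convert the map to a sorted list and cut it up front, then compute needUpdateEndpoints once, so that other conditions such as a session affinity change are never reset. This is a hedged illustration only. The function `planEndpointUpdate`, its parameters, and the use of string keys are hypothetical and do not reflect the actual proxier.go code.

```go
package main

import "sort"

// planEndpointUpdate converts the expected Endpoint set to a sorted list,
// cuts it to limit, and then decides in a single pass whether the
// installed state has to change. affinityChanged stands in for any other
// condition that already requires an update.
func planEndpointUpdate(expected, installed map[string]struct{}, limit int, affinityChanged bool) ([]string, bool) {
	kept := make([]string, 0, len(expected))
	for ep := range expected {
		kept = append(kept, ep)
	}
	sort.Strings(kept)
	if len(kept) > limit {
		kept = kept[:limit]
	}

	// A size mismatch means something must be added or removed.
	needUpdate := affinityChanged || len(kept) != len(installed)
	for _, ep := range kept {
		if _, ok := installed[ep]; !ok {
			// A retained Endpoint is not installed yet.
			needUpdate = true
			break
		}
	}
	return kept, needUpdate
}
```

Because the list is cut before the comparison, an oversized Service whose retained Endpoints are all installed no longer reports an update on every sync, and there is no second pass to maintain.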

Contributor Author

Updated.

@hongliangl hongliangl requested a review from tnqn April 27, 2021 15:17
@hongliangl hongliangl force-pushed the Fix-#2092 branch 2 times, most recently from 10b3620 to 6ad0b0c Compare April 28, 2021 03:00
@hongliangl
Contributor Author

./test-all

Member

@tnqn tnqn left a comment

LGTM

@tnqn
Member

tnqn commented Apr 28, 2021

/test-windows-networkpolicy
/test-windows-e2e
/test-networkpolicy
/test-e2e

@tnqn
Member

tnqn commented Apr 29, 2021

/test-windows-e2e

@jianjuns @antoninbas would you take another look at this PR?

@hongliangl
Contributor Author

/test-windows-e2e

Contributor

@jianjuns jianjuns left a comment

A suggestion on the feature gate doc.

@@ -51,7 +51,11 @@ example, to enable `AntreaProxy` on Linux, edit the Agent configuration in the
 `AntreaProxy` implements Service load-balancing for ClusterIP Services as part
 of the OVS pipeline, as opposed to relying on kube-proxy. This only applies to
 traffic originating from Pods, and destined to ClusterIP Services. In
-particular, it does not apply to NodePort Services.
+particular, it does not apply to NodePort Services. Please note that due to
+some restrictions on the implementation of Services in Antrea, the maximum
Contributor

How about:
Please note that in the current AntreaProxy implementation there is a restriction that the maximum number...

Contributor

Unless you feel strongly about it, I suggest merging as it is. We have been postponing v1.0.1 for a while now, and this is pretty much the last change we are waiting for.

Contributor

That works for me.

for _, endpoint := range endpoints { // Check if there is any installed Endpoint which is not expected anymore.
if _, ok := endpointsInstalled[endpoint.String()]; !ok { // There is an expected Endpoint which is not installed.
needUpdateEndpoints = true
if len(endpoints) > maxEndpoints {
Contributor

personally I would have liked to see a unit test for this (with maxEndpoints set to small value). However, this is strictly better than what we had before, and there is little risk in merging this.

Contributor Author

Makes sense. I would also prefer to add a unit test. However, maxEndpoints is a constant, and I couldn't come up with a good way to test this. BTW, thanks for merging this.
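
One possible way to make the limit testable, sketched here purely as an illustration, is to keep the constant as the default but pass the limit through a small struct so a test can construct it with a tiny value. The names `defaultMaxEndpoints`, `endpointLimiter`, and `newEndpointLimiter` are hypothetical and are not part of the Antrea code base.

```go
package main

// defaultMaxEndpoints mirrors the limit of 800 discussed in this PR.
const defaultMaxEndpoints = 800

// endpointLimiter carries the limit as a field instead of reading a
// constant directly, so tests can use a small limit.
type endpointLimiter struct {
	max int
}

// newEndpointLimiter returns a limiter with the production default.
func newEndpointLimiter() *endpointLimiter {
	return &endpointLimiter{max: defaultMaxEndpoints}
}

// limit returns the Endpoints to install and the number that were
// dropped. The input is expected to be sorted already.
func (l *endpointLimiter) limit(sorted []string) (kept []string, dropped int) {
	if len(sorted) <= l.max {
		return sorted, 0
	}
	return sorted[:l.max], len(sorted) - l.max
}
```

A unit test could then build `&endpointLimiter{max: 2}`, pass in three Endpoints, and check that two are kept and one is reported as dropped, without ever creating 800 Endpoints.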

@antoninbas antoninbas merged commit 007995d into antrea-io:main Apr 29, 2021
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Apr 29, 2021
…pped in AntreaProxy (antrea-io#2101)

For antrea-io#2092

Due to the message size and the implementation of Service in AntreaProxy,
the maximum number of Endpoints that AntreaProxy can support now is 800.
If the number of Endpoints in a given Service exceeds 800, the extra
Endpoints will be dropped and a warning will be logged.

In AntreaProxy, the OVS group is the key part of the Service implementation.
For now, Antrea is using OpenFlow 1.3 to communicate with OVS. In the previous
design, every bucket of an OVS group had five actions. The two actions for
loading the Endpoint IP and port to registers and the resubmit action must be
preserved. The other two actions for loading values to registers can be moved
to flows (in the current patch, they are moved to table 41), and then one
message can hold more buckets. As a result, the maximum number of Endpoints
has changed from 511 to 800. Unfortunately, to ensure AntreaProxy runs
correctly, the extra Endpoints will be dropped.
antoninbas pushed a commit that referenced this pull request Apr 30, 2021
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Apr 30, 2021
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Apr 30, 2021
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Apr 30, 2021
antoninbas pushed a commit that referenced this pull request Apr 30, 2021
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Apr 30, 2021
antoninbas pushed a commit that referenced this pull request May 1, 2021
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request May 1, 2021
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request May 1, 2021
antoninbas pushed a commit that referenced this pull request May 3, 2021
@jianjuns jianjuns mentioned this pull request Sep 17, 2021
@hongliangl hongliangl deleted the Fix-#2092 branch April 18, 2022 09:34
hongliangl added a commit to hongliangl/antrea that referenced this pull request Jan 30, 2024
The current implementation limits the maximum number of buckets in an OVS group
add/insert_bucket message to 800. This constraint is based on the fact that each
bucket has 3 actions, such as `set_field:0xa0a0007->reg0`,
`set_field:0x50/0xffff->reg4`, and `resubmit(,EndpointDNAT)`. However, an update
in antrea-io#5205 introduced a new action, `set_field:0x4000000/0x4000000->reg4`, for
remote Endpoints, making it impossible to accommodate 800 buckets with 4 actions
in an OVS group add/insert_bucket message. Another case is that the message cannot
hold 800 buckets with 3 actions, such as `set_field:0xa0a0007->xxreg0`,
`set_field:0x50/0xffff->reg4` and `resubmit(,EndpointDNAT)`, for IPv6 Endpoints.

To address this limitation, we have the following changes in this patch:

- The action for loading `EpToLearnRegMark` or `EpSelectedRegMark` in table
  `ServiceLB` flows is moved back to the OVS group bucket actions. The original
  change was introduced in antrea-io#2101 as a workaround to accommodate as many
  Endpoints as possible in an OVS group add message in Openflow 1.3, where an OVS
  group can only be created by an add message and cannot be updated. Now we use
  Openflow 1.5, where an insert_bucket message can be used to append buckets to
  an existing OVS group. Moving the action for loading `EpToLearnRegMark` or
  `EpSelectedRegMark` back to the OVS group bucket actions is more logical, as
  the mark is then loaded after Service Endpoint selection rather than being set
  before the selection in table ServiceLB.
- Set the maximum number of buckets to 400. This value is derived from the
  worst-case scenario, where each bucket includes 4 actions like:
  `set_field:0xa0a0007->xxreg0`, `set_field:0x50/0xffff->reg4`,
  `set_field:0x100000/0x100000->reg4`, `load:0x2->NXM_NX_REG4[16..18]` and
  `resubmit(,EndpointDNAT)`. We can use the following command to verify this:
  ```bash
  ovs-ofctl mod-group br-int group_id=100,type=select,$(for i in {0..400}; do echo -n "bucket=bucket_id:$i,weight:100,actions=set_field:0xa0a0007->xxreg0,set_field:0x50/0xffff->reg4,set_field:0/0x100000->reg4,load:0x2->NXM_NX_REG4[16..18],resubmit(,EndpointDNAT),"; done)
  ```

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
This pull request was closed.