Add subscription feature MTU3900 #1672
Maybe to clarify, here's my thinking on why I put this in ARO-RP. I could be wrong in my assumptions.

As best as I can tell, the installer code does not currently query Azure subscription flags. Replicating that functionality in the installer fork just for this feature seems inappropriate and unlikely to be accepted by upstream.

We could also have split the code across ARO-RP and the installer fork, with the subscription flag check in the RP and the "business logic" in the installer. But it was not clear to me how to add this logic to the installer in an API-compatible way, aside from just bolting on public function(s) somewhere and calling them directly from the RP. I imagine such an approach is also unlikely to be accepted by upstream, since they'd be stuck maintaining what is, from their perspective, dead code. That would leave us with a permanent carry-patch in the installer fork, and I'm loath to make the installer rebase work any more difficult than it already is.

I understand this code is tightly coupled to the installer, but for the reasons above I thought the RP would be the best place.
Please rebase the pull request.
FeatureFlagMTU3900 is the feature in the subscription that causes new OpenShift cluster nodes to use the largest available Maximum Transmission Unit (MTU) on Azure virtual networks, which as of late 2021 is 3900 bytes. Otherwise cluster nodes will use the DHCP-provided MTU of 1500 bytes.
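To make the flag's effect concrete, here is a minimal sketch of the decision described above. It is not the code from this PR: the `RegisteredFeature` type, the `nodeMTU` helper, and the string value given to `FeatureFlagMTU3900` are assumptions for illustration only. Only the two MTU values come from the description.

```go
package main

import "fmt"

// FeatureFlagMTU3900: the string value here is an assumption for
// illustration; the thread names the constant but not its value.
const FeatureFlagMTU3900 = "Microsoft.RedHatOpenShift/MTU3900"

const (
	defaultMTU = 1500 // DHCP-provided MTU
	largeMTU   = 3900 // largest MTU on Azure virtual networks as of late 2021
)

// RegisteredFeature is a hypothetical, simplified view of a feature
// registered on an Azure subscription.
type RegisteredFeature struct {
	Name  string
	State string
}

// nodeMTU returns the MTU that new cluster nodes should use: 3900 when the
// subscription has the flag in the "Registered" state, 1500 otherwise.
func nodeMTU(features []RegisteredFeature) int {
	for _, f := range features {
		if f.Name == FeatureFlagMTU3900 && f.State == "Registered" {
			return largeMTU
		}
	}
	return defaultMTU
}

func main() {
	withFlag := []RegisteredFeature{{Name: FeatureFlagMTU3900, State: "Registered"}}
	fmt.Println(nodeMTU(withFlag)) // 3900
	fmt.Println(nodeMTU(nil))      // 1500
}
```

Passing a hand-built slice like `withFlag` is also one way the flag could be faked in a unit test before the real subscription feature exists.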
Force-pushed a replacement commit that heavily refactors the code. I faked the subscription feature flag to test, but aside from that it works.
I got so focused on setting MTU on the hosts at install time that I forgot to check if cluster-network-operator is picking up the increased host MTU when deriving a default MTU for SDN/OVN. It's not. 😞 Further investigation required. getDefaultMTU is the cluster-network-operator function I thought would detect the higher host MTU. I wonder if the operator first comes up on the bootstrap node? This PR doesn't alter MTU on the bootstrap node.
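For context on why the host MTU matters here: the cluster network MTU is normally the host MTU minus the overlay's encapsulation overhead. The sketch below shows that arithmetic only. It is not the operator's `getDefaultMTU` code; the `overlayMTU` helper is hypothetical, and the overhead values (50 bytes for OpenShift SDN's VXLAN, 100 bytes for OVN-Kubernetes' Geneve) are the commonly documented ones, assumed here.

```go
package main

import "fmt"

const (
	vxlanOverhead  = 50  // assumed overhead for OpenShift SDN (VXLAN)
	geneveOverhead = 100 // assumed overhead for OVN-Kubernetes (Geneve)
)

// overlayMTU is a hypothetical helper that derives the cluster network MTU
// from the host MTU by subtracting the encapsulation overhead.
func overlayMTU(hostMTU int, networkType string) (int, error) {
	switch networkType {
	case "OpenShiftSDN":
		return hostMTU - vxlanOverhead, nil
	case "OVNKubernetes":
		return hostMTU - geneveOverhead, nil
	default:
		return 0, fmt.Errorf("unknown network type %q", networkType)
	}
}

func main() {
	for _, hostMTU := range []int{1500, 3900} {
		for _, nt := range []string{"OpenShiftSDN", "OVNKubernetes"} {
			mtu, err := overlayMTU(hostMTU, nt)
			if err != nil {
				fmt.Println(err)
				continue
			}
			fmt.Printf("host MTU %d, %s: cluster network MTU %d\n", hostMTU, nt, mtu)
		}
	}
}
```

Under these assumptions, a detector that runs on a host still at 1500 bytes would derive 1450 or 1400 rather than 3850 or 3800, which would match the symptom described above.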
New commit gets SDN/OVN configured properly.
I'd like to hold off merging this until #1492 lands, which is very close.
We have other ways to do this. I think all this PR did was introduce one more code pattern into our codebase. And lastly, the new pattern was added with no tests at all, neither E2E nor unit tests. I think this was merged prematurely.
We already introduce custom machine configs in our installer fork. Take a look at the commits which introduce custom DNS as a reference. BTW, as far as I know, OCP is going to work on supporting custom DNS. I imagine that having (more or less) properly implemented installer features in our fork is going to be a huge help (as well as an enhancement proposal in this case). Also, it is OK to ask for help if something is not clear. I can help with this (or at least show the right course). I think Amber, MJ and Ben can help, and probably some other people too.
Upstream is OK with having ARO- and OKD-specific code gated: we already have examples of this. BTW, the second example, while ARO-specific, was still useful to upstream: it was one of the requirements for Azure Stack Hub support.
Even if we end up with a permanent carry-patch (I doubt it), I think it will be better than doing a hack in the RP codebase which is inconsistent with what we already do (for DNS, etc). I imagine that the installer patch will create a new asset (installer terminology), so it will be easy to rebase. I think it will be very similar to the DNS patch, but simpler and smaller. And as mentioned above, we have at least 2 cases where ARO-specific carry-patches were useful to upstream in implementing new features.
Which issue this PR addresses:
This implements 10080838 Support for larger MTUs at cluster install time.
What this PR does / why we need it:
See Matt Woodson's summary in the ADO feature. (It's customer-specific so not sure I should paste details here.)
Test plan for issue:
So far only manual testing until the subscription feature flag exists.
Is there any documentation that needs to be updated for this PR?
Microsoft will be delivering documentation about the larger MTU in the October/November timeframe, but presumably it will not mention Azure Red Hat OpenShift. Waiting to see what Microsoft delivers but likely we'll need to add a mention under Azure Red Hat OpenShift documentation.