Split operator to separate out cloud provider specific code #9920
Comments
Per the discussion in the meeting today, the first task here will be to define more details. Just a few initial thoughts below. The two paths I can think of so far are these (I think we mentioned this on the call):
From a first look at the operator code, I think it would make sense to separate out some packages, as it's currently a single (and relatively large) package.
I also noted that the operator currently depends on
@tgraf do we mostly care about the size of the binary and the image, or do we also want to isolate the dependencies? I am mostly leaning towards the route where there are just separate binaries/images; it just seems like the path of least resistance to start with. Since this was referenced from #10056, I assume the binary size matters to us primarily for its impact on memory usage, and not so much for image download time. If that's the case, we can probably keep the image the same and just have separate binaries, which would make the build changes simpler.
Thinking more about the route with separate binaries, it looks like there are two ways:
Exactly, AFAICS binary size mainly matters because it increases the RSS of the process. Regarding image size, I think the operator size is not the main factor, and we also have some other leverage there to reduce size; see e.g. #10542.
I'm not very familiar with the operator yet, but I'd guess this way would be easier to implement given the current build system. It would probably also need less duplicated code than the separated
it's simpler to have a single
Sure, thanks for the clarification! I just wanted to double-check, as sometimes dependency sub-tree conflicts can be an issue, but I guess the SDKs do maintain good dependency hygiene and don't introduce conflicting versions of
These are the numbers I got for the 64-bit macOS binaries (
I must say that refactoring the operator main package into a few sub-packages is not very easy, as we are using quite a few global variables, and I am not sure whether to get rid of some of those globals or do something else... E.g. it would be possible to move code around inside the main package without breaking it apart, or maybe create a sub-package with the global variables? Aside from the IP allocators inside the operator main package, there is also one additional offender: cilium/pkg/policy/groups/providers.go (lines 15 to 20 in 10fcc77).
It appears that this
Which global variables?
@errordeveloper I guess we need to look at it case by case. For example, that file can easily be fixed by adding a Go build tag to it which only the cilium-operator-aws build would set.
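A minimal sketch of what that could look like, assuming a hypothetical registration hook (the `operator_aws` tag matches the one added in the commits referenced below); this is not the actual Cilium code:

```go
//go:build operator_aws
// +build operator_aws

// providers_aws.go: compiled only when the operator_aws build tag is set,
// so non-AWS operator builds never pull in the AWS SDK through this file.
package groups

// providerFunc resolves a cloud security-group name to a list of IPs.
// In real code this registry would live in a shared, untagged file of the
// same package; it is declared here only to keep the sketch self-contained.
type providerFunc func(group string) ([]string, error)

var providers = map[string]providerFunc{}

func init() {
	// In an AWS build this callback would call into the AWS SDK to resolve
	// the security group; here it is just a stub.
	providers["aws"] = func(group string) ([]string, error) {
		return nil, nil
	}
}
```

Building with `go build -tags operator_aws` would then include the AWS registration (and its SDK dependency), while a plain `go build` would leave both out.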
It's the following (at 0f49fc6):
- cilium/operator/cilium_node.go, line 34
- cilium/operator/cilium_node.go, line 109
- line 73
I'll have another go at these now; I have an idea.
@errordeveloper An upcoming refactoring in the GCP IPAM PR will remove the global. Are there any other blockers that you have identified? How complex is the work overall?
@tgraf is that refactoring work on a branch yet? I started having a go at it, and I am not very far from finishing the refactoring. The intention so far is to have
@tgraf I have got to a stage where
- Move AWS policy groups provider registration into operator, as it's not used anywhere else
- Add `operator_aws` build tag to exclude this dependency in non-AWS builds (see #9920)

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Since #10758 was merged, we need to discuss how we are going to ship the operator. One option would be to keep shipping the large binary that we ship now, and add the smaller binaries alongside it. We can make Helm select the small binaries and eventually deprecate the large binary. Another, slightly different option would be to have a wrapper binary (perhaps a shell script) that calls the right binary based on what flag was passed in. Alternatively, we can just start shipping the new small binaries, namely 3 of them, and make sure the Helm chart does the right thing. We would have to document this in the changelog and upgrade guide as a breaking change for those who aren't using the official Helm chart or the manifests that we ship in the repo. From a UX perspective, it seems nicer if we can avoid an instant breaking change and provide a window of 1 or 2 releases, but in a way this can be considered a low-level detail and it would only affect those who aren't using the Helm chart, so I wouldn't be too worried if we prefer the quicker route.
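For the wrapper-binary idea, a rough sketch of how the dispatch could work is below; the binary paths and the use of the `--ipam` flag as the selector are assumptions for illustration, not the actual implementation:

```go
// Wrapper entrypoint: pick a per-cloud operator binary based on the IPAM
// mode passed on the command line, then replace this process with it.
package main

import (
	"os"
	"strings"
	"syscall"
)

func main() {
	// Default to the generic build when no cloud-specific IPAM mode is set.
	target := "/usr/bin/cilium-operator-generic"
	for _, arg := range os.Args[1:] {
		switch {
		case strings.HasPrefix(arg, "--ipam=eni"):
			target = "/usr/bin/cilium-operator-aws"
		case strings.HasPrefix(arg, "--ipam=azure"):
			target = "/usr/bin/cilium-operator-azure"
		}
	}
	// Exec (Unix-only) replaces the wrapper process with the chosen binary,
	// forwarding all original arguments and the environment.
	if err := syscall.Exec(target, append([]string{target}, os.Args[1:]...), os.Environ()); err != nil {
		os.Exit(1)
	}
}
```

This would keep the published entrypoint stable while the individual binaries stay small, at the cost of an extra process hand-off at startup.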
After thinking about it for a while, I think that personally I'd probably prefer the approach where the fat operator binary is the default for non-Helm installs, so the image will have the old large binary as well as the new smaller ones, and the smaller ones will have the IPAM flag removed. Helm installs will make use of the new smaller binaries.
Which non-Helm installs are you thinking about? I can think of the
Yes, agreed. This also ensures we are not breaking anything.
This makes sense to me. We already have Helm options for several cloud providers (
Summary
The `cilium-operator` currently imports cloud-provider-specific SDKs, which bloat the binary size and lead to unnecessary complexity when the operator runs outside the context of that cloud provider.

Details

TBD