endpoint: store state in ep_config.json #31559
Force-pushed from 9b326f7 to f910f95
/test
Force-pushed from f910f95 to f1553ac
/test
Force-pushed from f1553ac to 2c566c3
/test
Can you add some quick upgrade / downgrade test cases?
If you can tell me how?
Maybe :-) This will have to be a ginkgo test, since we will need to test adding and removing Cilium. The easiest will be basing it off of a test in
Ok, I'll take a look. How are the tests you would like different from the other upgrade / downgrade tests we already do? If I understand correctly, we have some CI which does upgrade / downgrade from the previous stable.
We don't have that many upgrade tests, only ipsec and cluster-mesh. So a quick ginkgo test wouldn't be a bad idea. If this winds up being a total mess, then skip it. It would be good to have, though.
Wait, why should we ever add more ginkgo tests at this point? The decision was made a long time ago to move off of ginkgo in favor of cilium-cli driven tests, and many efforts have been made in this direction over the past few years. If there's an issue with up/downgrades, the cilium-cli e2e tests you mentioned will catch those. I think the goal was to add more up/downgrade tests at some point, but since these are slow by definition, I think we're not adding more. Summoning @brb for a bit more context here.
The only kind of tests that I think could make sense here are unit tests based on golden files rendered by older/newer versions of Cilium, but there's no real downgrade path that needs to be tested here. This PR only adds a double-write and opportunistically uses it. Maybe some other form of unit test could work, I dunno.
@squeed we discussed this on our weekly call. Quick summary:
Please let me know what you'd like to see to get this unblocked.
If you’re comfortable with the test coverage, that’s OK with me. The IPsec upgrade test has a habit of being disabled, though. I realize we try to avoid new ginkgo tests, but sometimes they are hard to avoid.
Force-pushed from 2c566c3 to 0ac5e51
/ci-e2e-upgrade
Force-pushed from 0ac5e51 to e358a8a
Got an almost clean e2e-upgrade run: https://github.com/cilium/cilium/actions/runs/8552399781
@tommyp1ckles can you take another look please?
/test
/ci-e2e-upgrade
Commit d81a5cd ("pkg/endpoint: Simplify search for C header file") retained earlier retry behaviour around stat, with a comment saying that we can remove the logic if the bug hasn't manifested in a while. That was in 2020. The time to remove the check is now.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
On startup, the agent attempts to restore existing endpoint state from disk. It does this by reading the per-endpoint ep_config.h, parsing that file for a line containing a magic string, and then decoding a base64-encoded JSON blob from that line. ep_config.h is in turn used by the loader to compile the per-endpoint BPF programs. In effect, the concern of persisting endpoint state is mixed up with how we compile per-endpoint programs.

Instead, write the JSON state into a separate ep_config.json file in addition to ep_config.h. The old behaviour is retained so that downgrades do not lose endpoint state unnecessarily.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Force-pushed from e358a8a to f315d2a
/test
/ci-e2e-upgrade
endpoint: remove stat retry on restore
endpoint: store state in ep_config.json