Centralized Wasm caching #51125

Open
kyessenov opened this issue May 17, 2024 · 10 comments

Comments

@kyessenov
Contributor

kyessenov commented May 17, 2024

FR to move Wasm fetching over to Istiod. This ensures a uniform reliability expectation for all configuration, both regular xDS and Wasm. A separate distribution channel for Wasm (e.g. OCI) adds risk from distributed failure (e.g. every pod can fail independently) and complicates coordination with xDS (e.g. deciding when a listener can start using Wasm).

To reduce the load on Istiod, we can limit the binary sizes to O(10MB) to capture the common Lua usage pattern. For very large modules, we can rely on binary files shipped separately (FUSE to GCS or built into container).
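
As an illustration of the size bound described above, here is a minimal Go sketch of a gate that a control plane could apply before inlining a module. The constant `maxInlineWasmBytes`, the error `ErrModuleTooLarge`, and the function `checkInlineSize` are hypothetical names chosen for this example; none of them are existing Istio APIs, and the 10MB value is just the order of magnitude mentioned in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// maxInlineWasmBytes is a hypothetical cap; the issue only suggests O(10MB).
const maxInlineWasmBytes = 10 << 20

// ErrModuleTooLarge marks modules that should be shipped out of band
// (for example, built into the container) instead of inlined.
var ErrModuleTooLarge = errors.New("wasm module exceeds inline size limit")

// checkInlineSize returns nil when the module fits under the cap and a
// wrapped ErrModuleTooLarge otherwise.
func checkInlineSize(module []byte) error {
	if len(module) > maxInlineWasmBytes {
		return fmt.Errorf("%w: %d bytes > %d bytes",
			ErrModuleTooLarge, len(module), maxInlineWasmBytes)
	}
	return nil
}

func main() {
	small := make([]byte, 64<<10) // 64KiB, typical of a small filter
	large := make([]byte, 11<<20) // 11MiB, over the assumed cap

	fmt.Println("small:", checkInlineSize(small))
	fmt.Println("large:", checkInlineSize(large))
}
```

Rejecting oversized modules up front keeps the cost on the control plane predictable; what happens to rejected modules is left open here, as it is in the proposal.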

As a side benefit this will further reduce pilot agent dependencies by removing OCI clients and the ECDS proxy.

@kyessenov
Contributor Author

@igsong @howardjohn

@howardjohn
Member

IMO the gold standard here is to use the node's container runtime. It was built for this task and also has fancy things like custom snapshotters, p2p fetching, ... And you get a per-node cache.

I wonder what https://developer.fermyon.com/spin/v2/kubernetes is using -- worth exploring.

If we did that, we may or may not actually get rid of OCI, though, depending on how we did things.

@kyessenov
Contributor Author

kyessenov commented May 17, 2024

I don't think it makes sense to use the node for what are essentially Lua scripts. There's a place for very large Wasm binaries, but that's not the target for this feature request since we'll bound the size by 5-10MB. Very large Wasm binaries are completely unproven in general right now in the context of a mesh. A proxy is not a VM runtime and we don't run general Wasm applications on it.

@howardjohn
Member

10MB and Lua scripts are very different IMO. 10MB is not even a super uncommon Docker image size; I think we have a few that small.

@kyessenov
Contributor Author

kyessenov commented May 17, 2024

5MB is relatively minor with respect to the overall xDS configuration size. Regular xDS configuration can be pushed from OCI, too, but we don't recommend that because we can't promise any reliability for alternative channels, including the node registry. If Wasm proves to be common on k8s, the platform will deliver an ability to dynamically mount Wasm modules on live pods, but the promise seems to always be in the future, while there's a clear use case for Lua replacement with small modules.

@hzxuzhonghu
Member

I do not think istiod has the capability to fetch all the Wasm modules. Not to mention distribution: it should not be working as an image registry.

@kyessenov
Contributor Author

> I do not think istiod has the capability to fetch all the Wasm modules. Not to mention distribution: it should not be working as an image registry.

It certainly can, since this ability is present and used in each sidecar independently of k8s. We don't need istiod as a registry; it will convert Wasm into xDS and treat it as xDS config going forward.
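
To make the "fetch once centrally, then treat it as config" idea concrete, here is a rough Go sketch under stated assumptions. The types `inlineModule` and `moduleCache` and the injected `fetch` function are hypothetical stand-ins invented for this example; they are not Istiod or Envoy types, and the actual conversion to xDS is not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// inlineModule is a hypothetical stand-in for Wasm bytes embedded in
// pushed configuration, together with a checksum proxies could verify.
type inlineModule struct {
	SHA256 string
	Code   []byte
}

// moduleCache fetches each module URL at most once and reuses the result
// for every proxy that references it.
type moduleCache struct {
	mu      sync.Mutex
	fetch   func(url string) ([]byte, error) // injected; e.g. an HTTP or OCI client
	entries map[string]inlineModule
}

func newModuleCache(fetch func(string) ([]byte, error)) *moduleCache {
	return &moduleCache{fetch: fetch, entries: make(map[string]inlineModule)}
}

// get returns the cached module for url, fetching it on first use.
// The lock is held during the fetch for simplicity, which serializes
// fetches; failed fetches are not cached.
func (c *moduleCache) get(url string) (inlineModule, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.entries[url]; ok {
		return m, nil
	}
	code, err := c.fetch(url)
	if err != nil {
		return inlineModule{}, fmt.Errorf("fetch %s: %w", url, err)
	}
	sum := sha256.Sum256(code)
	m := inlineModule{SHA256: hex.EncodeToString(sum[:]), Code: code}
	c.entries[url] = m
	return m, nil
}

func main() {
	fetches := 0
	cache := newModuleCache(func(url string) ([]byte, error) {
		fetches++ // count upstream fetches to show the cache effect
		return []byte("fake wasm bytes for " + url), nil
	})

	// Simulate three proxies that reference the same module.
	for i := 0; i < 3; i++ {
		m, err := cache.get("https://registry.example.com/filters/auth.wasm")
		if err != nil {
			panic(err)
		}
		fmt.Println("proxy", i, "module sha256:", m.SHA256)
	}
	fmt.Println("upstream fetches:", fetches)
}
```

The point of the sketch is that a registry outage after the first fetch no longer affects individual pods, which is the failure mode the proposal is trying to remove.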

@zirain
Member

zirain commented May 22, 2024

I'm not sure it is worth doing this in istiod. Users can set up a mutating webhook to rewrite the URL to point at a proxy or centralized backend.
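
For reference, the URL rewrite such a webhook would perform could look roughly like the Go sketch below. It only shows the string transformation, not the admission webhook plumbing. The function `rewriteToCache`, the cache host name, and the scheme of prefixing the original host to the path are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// rewriteToCache points moduleURL at a central cache host and keeps the
// original host as the first path segment, so the cache knows where to
// fetch from. URLs that already target the cache are returned unchanged.
func rewriteToCache(moduleURL, cacheHost string) (string, error) {
	u, err := url.Parse(moduleURL)
	if err != nil {
		return "", fmt.Errorf("parse module URL: %w", err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("module URL %q has no host", moduleURL)
	}
	if u.Host == cacheHost {
		return moduleURL, nil
	}
	u.Path = "/" + u.Host + u.Path
	u.Host = cacheHost
	return u.String(), nil
}

func main() {
	// Both the registry and the cache host are made-up example names.
	out, err := rewriteToCache(
		"https://registry.example.com/filters/auth.wasm",
		"wasm-cache.mesh.internal",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

This keeps fetching in each proxy but funnels it through one backend, so it centralizes the cache without changing how configuration is delivered.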

@kyessenov
Contributor Author

@zirain I agree that for large binaries, xDS is a poor fit for content distribution. However, many things go wrong when modules are large, and there are a lot of use cases that can be covered with small modules.

@zirain
Member

zirain commented May 22, 2024

We've done a lot of things in the past to reduce the size of xDS, and distributing content in xDS sounds like storing binary data (e.g. photos, raw JSON) in a database: it works, but we don't usually do that (S3/GCS is a better choice).
