Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved network architecture which makes it working on Kubernetes too #110

Closed
wants to merge 9 commits into from

Conversation

harduino
Copy link
Contributor

@harduino harduino commented Aug 1, 2020

Suggestion to change configuration of created by orborus/worker Docker containers. At the moment we have to determine network name and pass it to child containers. After investigation I found how we can improve architecture and as a bonus make Shuffle working on Kubernetes cluster (current implementation doesn't work on K8s as Docker Compose and K8s uses network little different). I tested on local Minikube cluster only yet.
The main idea is that instead of creating Network Endpoint with required network name, we would pass to created Docker-container HostConfig's "NetworkMode" & "IpcMode" with value "container:12345", where "12345" - is parent's container id (orborus/worker). So new created container will take full network settings from passed container id. When orborus would create worker, it would determine self-container-id and pass configurated NetworkMode/IpcMode params with included self-container-id. That new worker when creating apps will pass self container's id too. So all chain's containers will be located in the same network wherever Shuffle's stack runs in Docker or Kubernetes mode.
image

https://docs.docker.com/engine/api/v1.24/

NetworkMode - Sets the networking mode for the container. Supported standard values are: bridge, host, none, and container:<name|id>. Any other value is taken as a custom network’s name to which this container should connect to.

TODO. It should be more right to implement creating worker & apps as temporary pods+services when running in K8s mode, but such solution is a much longer to implement and suggested solution is really much easier, so it makes sense.

P.S. I had to replace Scratch with Alpine as Scratch+Minikube didn't work properly with downloading remote data via https. Also we have to get access to Docker's file system (we need file /proc/self/cgroup) to figure out container's id.

@frikky
Copy link
Member

frikky commented Aug 2, 2020

I'll take a closer look at this next week. Definitely looks promising, but I'll have to go deeper. Good shit 🔥

@harduino
Copy link
Contributor Author

harduino commented Aug 5, 2020

added some improvements

@harduino
Copy link
Contributor Author

harduino commented Aug 5, 2020

P.S. We could get rid of new environment variable RUNNING_MODE=Kubernetes/Docker and form unified container-self-id script-finder for both, but I predict, that such ENV could be useful in future, so leaved it.

@harduino
Copy link
Contributor Author

UPD. Improvements above make Shuffle working on AWS K8s. Shared IPC has been removed as it's not required for Shuffle and didn't work in AWS.

@frikky
Copy link
Member

frikky commented Aug 14, 2020

Hey @harduino - you up for another chat about this next week / this weekend? I've been away on holiday, so haven't been able to look at this much sadly

@harduino
Copy link
Contributor Author

Hey @frikky. No problem, I suggest nearest Monday.

@peterclemenko
Copy link

Oh please yes, this would be a killer feature. Please make sure you include deployment manfiests

@peterclemenko
Copy link

Any update on this?

@harduino
Copy link
Contributor Author

harduino commented Sep 2, 2020

Hi, @AoiGhost
I'm testing this PR and all seems work fine.
Just if on K8s there're >1 nodes you should make sure, that Backend & Orborus services are located on the same node as they are sharing Docker images.

@peterclemenko
Copy link

Alright, any chance this can get mainlined?

@frikky
Copy link
Member

frikky commented Sep 2, 2020

Sorry guys, I've been out of it for a bit and it's hard to prioritize things right now. I'll get some real testing done on this by Sunday.

Thanks for the hard work @harduino

@harduino
Copy link
Contributor Author

harduino commented Sep 2, 2020

Hey, @frikky. No problem, there is no rush as it's all based on enthusiasm.

@frikky
Copy link
Member

frikky commented Sep 5, 2020

Heyo! This is now a part of master and the 0.7.0 release! All your commits are a part of this pull request: #136

Thanks again for your help on this sort of thing. Please test it out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants