
OrleansDockerUtils -> Docker and Docker Swarm Clusters for Orleans #2571

Closed
galvesribeiro opened this issue Jan 5, 2017 · 6 comments

@galvesribeiro
Member

galvesribeiro commented Jan 5, 2017

This issue is for discussing the design of OrleansDockerUtils so we can add native support for running Orleans silos in Docker containers. The implementation is in PR #2569.

Goal

The idea behind this effort is to enable an Orleans silo to run in containers with minimal effort by leveraging both Orleans' provider model and the Docker APIs. The PR introduces a DockerMembershipOracle and a DockerGatewayProvider following the design of #2542, but using the Docker APIs instead of Service Fabric services.

Some context

Without going too deep, here is a brief explanation of common topics in the Docker ecosystem so reviewers feel more comfortable with it.

Docker Daemon

This is the Docker process running on the underlying OS (dockerd on Linux and macOS, dockerd.exe on Windows). It is responsible for managing containers and for talking to the underlying OS kernel (using libcontainer) in order to provide the container runtime environment. Each Docker host (in production or on your development machine, whether single-node or clustered) has one instance of this process. The daemon also exposes an API so third-party developers can integrate with and automate container management.
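
As an illustration of that API from .NET, here is a minimal sketch assuming the Docker.DotNet client package; the endpoint URI is illustrative and varies per platform.

```csharp
// Minimal sketch: talking to the Docker daemon's API from .NET.
// Assumes the Docker.DotNet NuGet package; the endpoint URI is illustrative
// (unix:///var/run/docker.sock on Linux, npipe://./pipe/docker_engine on Windows).
using System;
using System.Threading.Tasks;
using Docker.DotNet;

public static class DockerApiSample
{
    public static async Task Main()
    {
        var endpoint = new Uri("unix:///var/run/docker.sock");
        using (var client = new DockerClientConfiguration(endpoint).CreateClient())
        {
            // Ping the daemon to confirm the API is reachable.
            await client.System.PingAsync();
            Console.WriteLine("Docker daemon is reachable.");
        }
    }
}
```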

Docker Swarm

Swarm is the most widely used clustering (and orchestration) technology for Docker and is created by Docker Inc. Swarm implements almost 100% of the Docker APIs, which means that Docker CLI (and API) commands issued to Swarm are (almost) no different from the ones issued to a single-node Docker daemon. While in Swarm mode, commands are issued to the Swarm manager nodes, which apply the requested actions across the cluster of Docker daemons.

Docker Compose

A YAML file describing grouped/correlated services/containers and the dependencies between them, as well as networking and volume mapping settings. This file can be pushed to a single Docker daemon or to a Swarm cluster and will deploy the whole set of containers, networks, and whatever other artifacts are described in it. It is the most used composition technology for Docker containers nowadays.

Container

A container is basically an isolated share of the host kernel with its own processes. PID 1 of each container is the command passed when the container starts, or the one described in the Dockerfile (the container image description file). In our case it would be our silo host application. If for whatever reason that process crashes or returns/exits, the container will also exit.
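
To make the PID 1 point concrete, here is a minimal sketch of a container entrypoint for a silo host; StartSiloAndWaitForShutdownAsync is a placeholder for whatever actually starts and runs the silo, not a real Orleans API.

```csharp
// Sketch of a container entrypoint: this process is PID 1 inside the container,
// so the container lives exactly as long as the silo host process does.
using System;
using System.Threading.Tasks;

public static class Program
{
    public static async Task<int> Main()
    {
        try
        {
            // Placeholder: start the Orleans silo and block until it shuts down.
            await StartSiloAndWaitForShutdownAsync();
            return 0; // clean exit -> container exits normally
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine($"Silo crashed: {ex}");
            return 1; // non-zero exit -> Docker sees the container as failed
        }
    }

    // Hypothetical helper standing in for the real silo startup/shutdown logic.
    private static Task StartSiloAndWaitForShutdownAsync() => Task.Delay(-1);
}
```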

Labels

Docker allows one to add metadata to containers and/or images through labels. They are essentially key/value pairs. They can hold anything as long as it can be represented as a string (key or value) or JSON-serialized (value only). You can even apply a label without a value.

Design

I'm trying to simplify the clustering of Orleans silos by leveraging the existing features of Docker. I'm not going to go into detail about how a Swarm (or any other) cluster works, since that is beyond the scope here, but some aspects will be mentioned when necessary. Whether running against a single Docker daemon, a Swarm cluster, or some other clustering stack, as long as it implements the Docker API we should be fine.

The main idea is to follow almost the same implementation as the Service Fabric membership oracle and gateway provider, except that instead of having partitions and SF service discovery to keep the cluster data, we are not keeping the cluster data anywhere! (I know it sounds crazy, but keep reading.)

Five labels were defined. They can be applied to each container at startup, in docker-compose.yml, or in the image so they are always applied to every container derived from it (and they can be overridden when starting a container).

IS_DOCKER_SILO = "com.microsoft.orleans.silo" -> Should be set on Docker images/containers that run an Orleans silo host
SILO_PORT = "com.microsoft.orleans.siloport" -> The inter-silo port used by all silos in the cluster
GATEWAY_PORT = "com.microsoft.orleans.gatewayport" -> The client gateway port
GENERATION = "com.microsoft.orleans.generation" -> The silo generation
DEPLOYMENT_ID = "com.microsoft.orleans.deploymentid" -> The key for a deployment using this component. It makes sure that everyone involved in the Orleans deployment (client or silo) talks only to members of the same deployment.
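
To make the label handling concrete, here is a minimal sketch of how these keys might be represented and how a container's label dictionary could be checked against them; the class and method names are illustrative, not necessarily the ones used in PR #2569.

```csharp
// Sketch: the label keys as constants, and a helper that decides whether a
// container (represented here by its label dictionary) belongs to this
// Orleans deployment.
using System.Collections.Generic;

public static class OrleansDockerLabels
{
    public const string IsDockerSilo = "com.microsoft.orleans.silo";
    public const string SiloPort     = "com.microsoft.orleans.siloport";
    public const string GatewayPort  = "com.microsoft.orleans.gatewayport";
    public const string Generation   = "com.microsoft.orleans.generation";
    public const string DeploymentId = "com.microsoft.orleans.deploymentid";

    public static bool IsSiloOfDeployment(IDictionary<string, string> labels, string deploymentId)
    {
        // IS_DOCKER_SILO may be present without a value; only its presence matters.
        return labels.ContainsKey(IsDockerSilo)
            && labels.TryGetValue(DeploymentId, out var id)
            && id == deploymentId;
    }
}
```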

Besides the common Orleans membership protocol (pings, etc.), which is already known, and the existing implementation of the SF membership oracle, here is a description of the key components of this membership oracle for Docker:

DockerSiloResolver -> This class is responsible for reading the Docker API (currently on a Timer), refreshing its silo list, and publishing it to IDockerStatusListener listeners. It queries the Docker API for all containers that have the DEPLOYMENT_ID label set to the deploymentId and the IS_DOCKER_SILO label with no value. This ensures that even if there are other containers using that deploymentId for something else (say the developer uses it as a general aggregate of containers), only the containers that are supposed to host silos are returned.
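
For illustration, the container query the resolver issues could look roughly like the following sketch, assuming the Docker.DotNet client library and the Docker API's label-filter syntax ("label=key" and "label=key=value"); the exact types used in PR #2569 may differ.

```csharp
// Sketch of querying the Docker API for the containers of one Orleans deployment.
using System.Collections.Generic;
using System.Threading.Tasks;
using Docker.DotNet;
using Docker.DotNet.Models;

public class DockerSiloQuery
{
    private readonly IDockerClient _client;
    private readonly string _deploymentId;

    public DockerSiloQuery(IDockerClient client, string deploymentId)
    {
        _client = client;
        _deploymentId = deploymentId;
    }

    public Task<IList<ContainerListResponse>> GetSiloContainersAsync()
    {
        // Only running containers carrying both the silo marker label and the
        // matching deployment id label are returned.
        var parameters = new ContainersListParameters
        {
            Filters = new Dictionary<string, IDictionary<string, bool>>
            {
                ["label"] = new Dictionary<string, bool>
                {
                    ["com.microsoft.orleans.silo"] = true,
                    [$"com.microsoft.orleans.deploymentid={_deploymentId}"] = true
                }
            }
        };
        return _client.Containers.ListContainersAsync(parameters);
    }
}
```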

DockerMembershipOracle -> When initialized, it registers itself with the IDockerSiloResolver registered in DI and waits for updates. Its implementation is essentially a copy of the SF one, so there's not much to explain about it. (I would suggest, in a later PR, making that shared code abstract or something similar so we can remove this boilerplate.)

DockerGatewayProvider -> When initialized, it registers itself with a DockerSiloResolver (no DI on the client!) and waits for updates. It inherits most of the implementation of the SF one and uses the same filter logic, but only returns to the client the silos that have a gateway. Yes, for now I'm assuming all silos in the cluster are gateways. We could change that and ask for the ones carrying the GATEWAY_PORT label as well. For the sake of V1, we can live with it.
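
As a sketch of the refinement mentioned above (treating only containers that carry the GATEWAY_PORT label as gateways), the filtering could look like this; ContainerInfo is a hypothetical shape for whatever the resolver hands over, not a Docker or Orleans type.

```csharp
// Hypothetical shape for the per-container data the resolver already gathered.
using System;
using System.Collections.Generic;

public sealed class ContainerInfo
{
    public string IPAddress { get; set; }
    public IDictionary<string, string> Labels { get; set; }
}

public static class GatewayFilter
{
    // Only containers carrying a gateway port label are exposed as gateways.
    public static IEnumerable<Uri> GetGateways(IEnumerable<ContainerInfo> containers)
    {
        foreach (var container in containers)
        {
            if (container.Labels != null
                && container.Labels.TryGetValue("com.microsoft.orleans.gatewayport", out var port)
                && int.TryParse(port, out var gatewayPort))
            {
                yield return new Uri($"tcp://{container.IPAddress}:{gatewayPort}");
            }
        }
    }
}
```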

This implementation works. Silos come up and down (and crash), and the silo/gateway lists are refreshed accordingly. You are either up or down. Since a container is immutable, you don't restart a container because the silo process died; if it dies, the container is automatically killed and discarded. You need to start another fresh container, which will have a different container name, IP address, ID, etc.

Silo health

The common membership providers we have (Azure, SQL, AWS) rely on a consistent persistence table to store the cluster's silo health. In the membership protocol, silos "ping" each other; if a silo doesn't get a reply, it suspects that the other silo is dead and writes its own identity plus the timestamp of that attempt to the suspected silo's record in the membership table. If the number of suspicions exceeds the configured threshold, that silo is marked as dead and no messages/activations are sent to it.

OK. Given that context, we have a problem with this implementation that relies on Docker...

persistence table to store the cluster's silo health

We don't have a table! This implementation doesn't have an IMembershipTable; we are instead implementing an IMembershipOracle. The idea behind this provider is to rely on Docker labels to aggregate silos and gateways by attaching labels to the containers that are part of the cluster.

There are two ways to detect whether a silo is alive in this implementation:

  1. The container is in the running state.
  2. The HEALTHCHECK instruction in the Dockerfile -> This runs an arbitrary command inside the container. We can use it to try to connect to the silo ports (a common approach in Linux-based containers); a sketch of such a probe follows this list. If it doesn't connect or times out, Docker will kill the container and the next refresh of the membership oracle will catch that change.
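
To illustrate item 2, here is a minimal sketch of a probe program that a HEALTHCHECK command could invoke; the port and timeout values are placeholders.

```csharp
// Minimal TCP health probe: exits 0 if the silo port accepts a connection,
// 1 otherwise, so Docker's HEALTHCHECK can interpret the result.
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class HealthProbe
{
    public static async Task<int> Main(string[] args)
    {
        var port = args.Length > 0 ? int.Parse(args[0]) : 11111; // illustrative silo port
        try
        {
            using (var client = new TcpClient())
            {
                var connect = client.ConnectAsync("localhost", port);
                var finished = await Task.WhenAny(connect, Task.Delay(TimeSpan.FromSeconds(5)));
                if (finished != connect || !client.Connected)
                {
                    return 1; // timed out or failed to connect
                }
            }
            return 0; // healthy
        }
        catch
        {
            return 1; // unhealthy
        }
    }
}
```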

The problem is: what if the other containers/silos in the cluster can't ping a given silo, for example during a network partition? That is why we have the votes. The trouble is that we can't store anything in Docker: the labels aren't updatable, and the Raft store in Swarm is internal to Swarm and we can't touch it.

My first idea was to use another common approach on Unix systems: lock files.

Suggested approach to votes

The votes would happen this way (a sketch follows the list):

  1. SiloA tries to ping SiloB and fails (no response from SiloB for whatever reason).
  2. SiloA asks Docker to check, inside that container, the number of files in a specific directory (an arbitrary directory that holds those vote files).
  3. If the number of files is greater than the allowed threshold, it asks Docker to kill that container.
  4. If it is not, it asks Docker to create an empty lock file named with SiloA's identity + timestamp in SiloB's container at that directory. -> Note that there are no races when writing these files, since each silo only asks to write its own identity, so there is zero chance of conflict.
  5. Repeat.
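
Here is a rough sketch of the voter side of this scheme. IDockerExec is a hypothetical abstraction over running a command inside another container and killing it via the Docker API; it is not a real client type, and the directory path is illustrative.

```csharp
// Rough sketch of the vote flow when SiloA fails to ping SiloB.
using System;
using System.Threading.Tasks;

public interface IDockerExec
{
    Task<int> CountFilesAsync(string containerId, string directory);
    Task CreateEmptyFileAsync(string containerId, string path);
    Task KillContainerAsync(string containerId);
}

public class SuspicionVoter
{
    private const string VoteDirectory = "/var/orleans/votes"; // illustrative path
    private readonly IDockerExec _docker;
    private readonly string _mySiloIdentity;
    private readonly int _deathThreshold;

    public SuspicionVoter(IDockerExec docker, string mySiloIdentity, int deathThreshold)
    {
        _docker = docker;
        _mySiloIdentity = mySiloIdentity;
        _deathThreshold = deathThreshold;
    }

    public async Task VoteAsync(string suspectContainerId)
    {
        var votes = await _docker.CountFilesAsync(suspectContainerId, VoteDirectory);
        if (votes > _deathThreshold)
        {
            // Enough silos already suspect this one: declare it dead.
            await _docker.KillContainerAsync(suspectContainerId);
            return;
        }

        // Each voter writes a file named after its own identity, so no conflicts.
        var voteFile = $"{VoteDirectory}/{_mySiloIdentity}-{DateTime.UtcNow:yyyyMMddHHmmss}";
        await _docker.CreateEmptyFileAsync(suspectContainerId, voteFile);
    }
}
```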

In case of a network partition (or any other reason), SiloB's oracle also keeps checking its own directory, so it can shut itself down if the number of vote files exceeds the allowed threshold.
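
The self-check side is plain local file I/O from the suspected silo's point of view; in this sketch the vote directory and the shutdown callback are placeholders.

```csharp
// Sketch of the suspected silo checking its own vote directory and shutting
// itself down once too many peers have voted against it.
using System;
using System.IO;

public class SelfVoteCheck
{
    private const string VoteDirectory = "/var/orleans/votes"; // illustrative path
    private readonly int _deathThreshold;
    private readonly Action _shutdownSilo; // placeholder for the real shutdown logic

    public SelfVoteCheck(int deathThreshold, Action shutdownSilo)
    {
        _deathThreshold = deathThreshold;
        _shutdownSilo = shutdownSilo;
    }

    public void Check()
    {
        if (!Directory.Exists(VoteDirectory)) return;

        var votes = Directory.GetFiles(VoteDirectory).Length;
        if (votes > _deathThreshold)
        {
            Console.WriteLine($"Received {votes} suspicion votes; shutting down.");
            _shutdownSilo();
        }
    }
}
```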

I know dealing with files isn't the nicest approach, but I don't see another way to make the votes work. Suggestions are appreciated.

IMHO this is enough for a V1 but I appreciate any suggestions.

Challenges for a V2

Label updates

Since containers are immutable, they can't be changed/updated. However, labels are essentially metadata. Docker allows you to update a label at runtime, but there are two downsides:

  1. The container is paused while updating. This means the whole container memory is frozen, the label changes are applied, a commit on that container happens, and then the container is running again. This may not be noticeable while starting a silo (the update is fast), but I wonder what side effects it would have on the cluster if we did it after the silo had started and joined it. Examples of updates would be marking a silo as dead (so that on the next refresh, even if the silo process and the container are running, we no longer keep that frozen container in the list), or generating the GENERATION value at silo start and updating the label.
  2. Windows Containers do not yet support updating running containers. Linux works just fine (I tested it).

Docker event stream

It would be nice if, instead of only polling the Docker API, we could listen to the Docker API's stream of events such as container_created, container_running, and container_dead and react accordingly to update the silo list. However, the stream API is very unstable/experimental at the moment and we shouldn't rely on it yet, because only Docker daemons with experimental mode turned on emit those events. Production environments like AWS or Azure ACS don't have it.

Kill an unresponsive silo

Somehow we could kill a container that is still running but has become unresponsive. This is just a matter of invoking an API. I just don't know when to do it, or the best way to declare a silo dead so that this API can be invoked.
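
For reference, killing a container through the API could look roughly like this, assuming the Docker.DotNet client library.

```csharp
// Sketch of what "invoking an API" to kill an unresponsive container could look like.
using System.Threading.Tasks;
using Docker.DotNet;
using Docker.DotNet.Models;

public static class UnresponsiveSiloKiller
{
    public static Task KillAsync(IDockerClient client, string containerId)
    {
        // Sends SIGKILL to the container's PID 1 (the silo host process).
        return client.Containers.KillContainerAsync(
            containerId,
            new ContainerKillParameters { Signal = "SIGKILL" });
    }
}
```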

I think that is it.

cc: @gabikliot @sergeybykov @ReubenBond

@sergeybykov
Contributor

Cross-referencing the concern from #2542 that applies here as well - #2542 (comment).

@galvesribeiro
Member Author

I'm closing this issue and the related PR since it isn't required anymore.

With the latest release of Docker 1.13 we don't need an extra provider to run Orleans in it.

I'll create a doc page and a sample to explain how to make it work.

@gabikliot
Contributor

Interesting. Please do. I will be interested to read.

@mtdaniels

@galvesribeiro - did you ever get around to creating the doc page? I am just starting to investigate how to get our service running in Docker and could use some direction :-).

@galvesribeiro
Member Author

@mtdaniels - Will try to get to it today. If you have any immediate questions, please ping me on Gitter and I'll try to help.

@galvesribeiro
Member Author

@gabikliot and others interested in Orleans+Docker: check out #3004 and #3005.
