Question: Running Che on Mesos #2536

Spritekin · 2016-09-22T03:53:48Z

Hi,

Note: Originally posted in the che forums but someone suggested to post here instead.

This question is theoretical, just looking for comments.

I was wondering if I can run che on a mesos cluster. For example, start che launcher using marathon, then mount an NFS to store the workspaces.

As the NFS is connected to all the slaves, then marathon can start che in any node and it will find the NFS with the workspaces.

It also means that if the che launcher fails or the node crashes, marathon will relocate che to another node for minimal downtime. There may be small file losses, but nothing critical.

My nodes are AWS m4.2xlarge nodes with 8 CPUs and 32GB RAM each.

So my questions:
a. How much RAM and CPU are recommended for the launcher and the slave?

b. How much RAM is required per concurrent session? Does each session start an independent runtime?

c. I imagine the runtime runs in the same machine as the che server. Is it possible to set the used RAM for the runtime?

d. Is there some tool in Eclipse Che to burn my program into the runtime and output a new image ready to work (I use python)? Is there a tool to build a new runtime image from a previous runtime? Imagine starting from python:2.7, then pip install boto3, build a new image, push the image to my registry, and change the runtime to the new image, all without leaving che.

e. Is it possible to ask che to run the programs by starting the runtime on another server?

f. How resilient is che to a crash? Any file or project losses?

g. Any plans to make this a framework... like installing a mesos framework plugin into eclipse? Then I could run my program as a daemon on another server and leave it running as a test of my machine learning code, while I continue coding.

Just a few questions that came to my head. I think che is awesome and just what I needed, I just need to integrate this into my architecture.

Thanks!

TylerJewell · 2016-09-22T21:05:55Z

Hi @Spritekin - what you are thinking about with Mesos, sounds like it could be possible. The biggest challenge will be that (right now) Che must have direct access to the Docker daemon. We are working on an SPI for later this year which will allow different platform providers to intercept our workspace management commands that are directed for a Docker daemon and then invoke specific APIs of a provider such as Mesos or OpenShift.

So, if you are going to snapshot and move files around, you will need to capture the Che server, the Che configuration, all of the projects (which are stored in CHE_DATA_FOLDER), and then also all of the workspace snapshots, which will be Docker images, stored in the appropriate Docker location (or registry). All of those assets will need to be on an NFS mountable directory.
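As a rough illustration of the setup discussed above (not an official configuration), the sketch below builds a Marathon app definition that gives the Che server container access to the host Docker daemon and keeps its data on the NFS mount. The field layout follows the Marathon Docker app format as I understand it. The image name, the `/data` container path, and the CPU/RAM values are assumptions, so check them against your Che and Marathon versions.

```python
import json


def che_marathon_app(nfs_data_dir, image="eclipse/che-server", cpus=1.0, mem_mb=2048):
    """Build a Marathon app definition (as a dict) for a single Che server.

    Assumptions (not taken from the thread): the image name, the /data
    container path, and the default cpus/mem values are placeholders.
    """
    return {
        "id": "/che-server",
        "instances": 1,
        "cpus": cpus,
        "mem": mem_mb,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "HOST"},
            "volumes": [
                # Che must have direct access to the Docker daemon of the host.
                {
                    "containerPath": "/var/run/docker.sock",
                    "hostPath": "/var/run/docker.sock",
                    "mode": "RW",
                },
                # Projects and configuration live on the NFS share mounted on every node.
                {"containerPath": "/data", "hostPath": nfs_data_dir, "mode": "RW"},
            ],
        },
    }


if __name__ == "__main__":
    # Print JSON that could be submitted to Marathon's app endpoint.
    print(json.dumps(che_marathon_app("/mnt/nfs/che"), indent=2))
```

Note that this only covers the Che server and its data folder. Workspace snapshots are Docker images, so they would still need a registry (or shared Docker storage) reachable from every node.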

a. How much RAM and CPU are recommended for the launcher and the slave?
TYLER: launcher is stateless - it destroys itself after it finishes starting or stopping a che server. The che server (your slave here), doesn't need a lot of RAM to run. The workspaces, however, may need a lot of RAM / CPU depending upon what your users need to do.

b. How much RAM is required per concurrent session? Does each session start an independent runtime?
TYLER: The Che server itself can handle 1000s of concurrent sessions, and it scales very linearly. The issue you need to zero in on is the workspaces. Each running workspace will be its own separate containers, and they will have RAM requirements. So the bottleneck of scale will really be how your system handles many concurrently running workspaces. Stopped workspaces are snapshotted, so you also have to manage the total disk footprint of all workspaces, as each snapshot can be a big image.
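To make the sizing reasoning concrete, here is a minimal back-of-the-envelope sketch. The per-workspace RAM, the RAM reserved for the Che server and the OS, and the average snapshot size are all assumed inputs that you would have to measure for your own stacks. None of the numbers come from Che itself.

```python
def max_running_workspaces(node_ram_gb, ram_per_workspace_gb, reserved_gb=2.0):
    """Estimate how many workspaces can run at once on a single node.

    reserved_gb is an assumed allowance for the Che server, Docker and the OS.
    """
    if ram_per_workspace_gb <= 0:
        raise ValueError("ram_per_workspace_gb must be positive")
    usable_gb = node_ram_gb - reserved_gb
    if usable_gb <= 0:
        return 0
    return int(usable_gb // ram_per_workspace_gb)


def snapshot_disk_gb(total_workspaces, avg_snapshot_gb):
    """Estimate the disk needed to keep one snapshot image per workspace."""
    return total_workspaces * avg_snapshot_gb


if __name__ == "__main__":
    # A 32 GB node with an assumed 2 GB per workspace and 2 GB reserved.
    print(max_running_workspaces(32, 2))
    # 100 workspaces overall with an assumed 1.5 GB per snapshot.
    print(snapshot_disk_gb(100, 1.5))
```

With these assumed figures, a 32 GB node fits 15 running workspaces, and 100 stored workspaces need about 150 GB of snapshot storage.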

c. I imagine the runtime runs in the same machine as the che server. Is it possible to set the used RAM for the runtime?
TYLER: Each workspace's runtime runs in separate containers. I don't think the agents in each workspace do a lot of reporting on current workspace state. But I know that for ARTIK we built a resource monitor that we can inject. I also know that we are working on having our agents detect OOM errors. But I don't think we are reporting how much RAM is in active use. We assume that once your ws has been allocated "x" of RAM, all of it is in use.

d. Is there some tool in Eclipse Che to burn my program into the runtime and output a new image ready to work (I use python)? Is there a tool to build a new runtime image from a previous runtime? Imagine starting from python:2.7, then pip install boto3, build a new image, push the image to my registry, and change the runtime to the new image, all without leaving che.
TYLER: We do not have such a tool. While Dockerfiles are somewhat similar to just executing processes within a shell, they are also fairly different. So unfortunately, you do need to write a Dockerfile for your particular runtime. We will ship in a few weeks a stack editor that makes adding new stacks with runtimes to che a lot easier.
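Since the Dockerfile has to be written by hand, a small helper can at least generate it from a base image and a package list. The sketch below is a hypothetical helper, not part of Che. It covers only the python:2.7 plus boto3 case from the question, and a real Che stack image may need additional tools that are not shown here.

```python
def render_dockerfile(base_image, pip_packages):
    """Render a minimal Dockerfile that installs pip packages on a base image.

    Hypothetical helper for illustration; it does not add anything that a
    Che workspace agent might additionally expect inside the image.
    """
    lines = ["FROM " + base_image]
    if pip_packages:
        lines.append("RUN pip install --no-cache-dir " + " ".join(pip_packages))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the Dockerfile for the python:2.7 + boto3 example from the question.
    with open("Dockerfile", "w") as handle:
        handle.write(render_dockerfile("python:2.7", ["boto3"]))
```

The resulting file can then be built and pushed to a private registry with the usual `docker build` and `docker push` commands, and the image referenced from a custom stack.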

e. Is it possible to ask che to run the programs by starting the runtime on another server?
TYLER - not sure I understand what this means. Are you asking whether Che can allow workspace containers to run on different machines? Codenvy supports this today. Che does not. When we add the SPI in the future, someone can implement algorithms for sending workspaces to different physical nodes.

f. How resilient is che to a crash? Any file or project losses?
TYLER - there is no HA built into the system. If Che crashes, all project files will be fine. But if the workspace image was not snapshotted, then any non-project state changes will be lost. But Che crashing does not mean that the workspace will crash. They are independent.

g. Any plans to make this a framework... like installing a mesos framework plugin into eclipse? Then I could run my program as a daemon on another server and leave it running as a test of my machine learning code, while I continue coding.
TYLER: We already provide chedir and the che CLI, which allow for running che servers in a variety of scenarios. The chedir capability is designed to be competitive with Vagrant. Also, Che itself is just a set of WAR files - so those can be run directly within another app server if you wanted.

Spritekin · 2016-09-23T04:36:40Z

Thanks Tyler!

I will do some tests and post again on my experiment results.

Luck!

Spritekin · 2016-10-13T23:25:11Z

Closing, as I think this will be required for it to work at even a minimal level:
#2692

A more complicated approach would be to use a volume management solution like flocker, but that would be a very heavy requirement, so I prefer to stay away from it.

TylerJewell · 2016-10-14T01:03:42Z

@Spritekin - you may want to look at the work of Google, which has some preliminary work to get Che working on Kubernetes. They are using docker in docker to achieve some interesting behaviors. https://github.com/wernight/kubernetes-che

Spritekin · 2016-10-16T01:11:18Z

@TylerJewell
Interesting, I would need to set up Kubernetes support for mesos and see if it works with the YAML.

The problem is not making Che itself work. The problem is that if the host crashes and Che is automatically restarted, it will probably go to another node, so it won't find the workspaces. By having the workspaces on an NFS, it doesn't matter on which host it starts, and I can start multiple che servers.

You have no idea how much I want to put Che to work. I have researchers and each one likes their platform (Windows, Linux, OSX). Having Che as their common IDE sharing workspaces would be a boon.

TylerJewell · 2016-10-16T02:55:24Z

Also - you may want to take a look at this PR - which simplifies how che runs as a container. We have simplified everything down to a single volume mount. #2786

shangyaqi · 2017-11-27T03:13:43Z

We also want to run Che on Mesos. Could you please give me some advice and guidance?

gorkem · 2017-11-27T03:24:42Z

We should soon have instructions for running Che on kubernetes. I think running Che on kubernetes running mesos is your best bet.

@l0rd Do you know if we have an issue created for kubernetes support that can be followed up?

l0rd · 2017-11-27T09:52:04Z

@gorkem no there is no issue dedicated to k8s. I had in mind to merge what I had done for JavaOne to master but the official k8s support should come with Che 6 and the SPI. Anyway if it can be useful here is the branch with the code, the bash script to install Che on k8s, and mariolet/che-server:kube-201710101353 is a che image ready to use with k8s.

shangyaqi · 2017-12-04T00:58:22Z

Oh, I see, there is a new SPI for v6. I would like to know when it will be released.

ghost added the kind/question label Sep 22, 2016

Spritekin closed this as completed Oct 13, 2016
