Optimize orchestration scripts #10

Closed
davidonlaptop opened this issue Feb 21, 2015 · 2 comments

@davidonlaptop
Member

Create a parameterizable script that can set up a complete genomic processing environment (SNAP, ADAM, Avocado, Spark, HDFS). The script should be the same regardless of whether it is executed on 1 machine or on a cluster.

Parameters

An array of objects with the following properties (see the sketch after this list):

  • container hostname / IP
  • docker image
  • container data directory
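
A minimal sketch of how this parameter array might be expressed, assuming a simple whitespace-separated file; the file name nodes.conf, the hostnames, and the image names are all placeholders:

# Hypothetical nodes.conf: one line per container
# <container-hostname-or-ip>  <docker-image>  <container-data-directory>
node1.example.org   gelog/hdfs    /data/hdfs
node2.example.org   gelog/spark   /data/spark

# Reading it from bash
while read -r host image datadir; do
  echo "would provision ${image} on ${host} with data in ${datadir}"
done < nodes.conf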

What was done

Let's start with the bash scripts created by Sebastien Bonami and update them with the new Docker images.

Potential solution

Fig seems to be the best solution since it is maintained by the Docker team, and it is planned to be integrated into the Docker project soon and renamed Docker Compose (see here, and here).
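
As a rough illustration of the Fig approach (a sketch only; the service definitions and image names such as gelog/hdfs and gelog/spark are placeholders, not our actual images):

# Describe the services in a fig.yml, then bring everything up with one command
cat > fig.yml <<'EOF'
hdfs:
  image: gelog/hdfs
  volumes:
    - /data/hdfs:/data
spark:
  image: gelog/spark
  links:
    - hdfs
EOF

fig up -d           # start all services in the background
fig scale spark=3   # add more Spark containers if needed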

@flangelier
Contributor

DISCUSSION

Images we need:

  • Snap
    • The setup script should only pull the image on a single Docker host, since we cannot distribute the job anyway
    • When we want to run one of the jobs, we just run the container with the right parameters (see the sketch after this list)
  • Adam
  • Avocado
  • Spark
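
For the Snap case, a sketch of such a one-off run; the image name gelog/snap and the SNAP arguments are placeholders:

docker run --rm \
  -v /data/genomics:/data \
  gelog/snap \
  snap paired /data/index /data/reads_1.fq /data/reads_2.fq -o /data/output.sam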

@davidonlaptop
Member Author

Design Template

Service Design Template

This applies to Hadoop HDFS and Spark for now. Later on, we'll add MapReduce.

orchestrate <env> <service> <service-params>
orchestrate <env> hdfs <nn-host> <snn-host> <dn1-host> <dn2-host> ...
orchestrate <env> spark <spark-master-host> <worker1-host> <worker2-host> ...

# For localhost environment
# Uses config files (Hadoop): env/local/hadoop/hdfs-site.xml
# Uses config files (Spark): env/local/spark/spark-config-file.yml
orchestrate local hdfs localhost localhost localhost
orchestrate local spark localhost localhost

# For Mac mini cluster
# Uses config files (Hadoop): env/macmini/hadoop/hdfs-site.xml
# Uses config files (Spark): env/macmini/spark/spark-config-file.yml
orchestrate macmini hdfs mini1 mini1 mini2 mini3 mini4
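
A rough sketch of what the orchestrate dispatcher could look like, assuming config files live under env/<env>/<service>/ as in the examples above; the variable names and echo placeholders are illustrative only:

#!/bin/bash
env_name="$1"; service="$2"; shift 2
conf_dir="env/${env_name}/${service}"

case "${service}" in
  hdfs)
    nn_host="$1"; snn_host="$2"; shift 2
    echo "would start the namenode on ${nn_host} and the secondary namenode on ${snn_host} using ${conf_dir}/hdfs-site.xml"
    for dn_host in "$@"; do
      echo "would start a datanode on ${dn_host}"
    done
    ;;
  spark)
    master_host="$1"; shift
    echo "would start the Spark master on ${master_host} using ${conf_dir}"
    for worker_host in "$@"; do
      echo "would start a Spark worker on ${worker_host}"
    done
    ;;
esac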

Genomic Design Template

# Spawns a docker container named 'snap1', 'snap2', ... with the requested params
snap <snap-host> <snap-params>

# Spawns a docker container named 'adam1', 'adam2', ... with the requested params
adam <adam-host> <adam-params>
or, if needed:
adam <adam-host> <spark-host> <adam-params>

# and so on...
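
A sketch of how the snap wrapper could pick the next container name (snap1, snap2, ...); the image name gelog/snap and the assumption that the remote Docker daemon listens on TCP port 2375 are both placeholders:

host="$1"; shift
# count existing snapN containers on that host to pick the next name
n=$(docker -H "tcp://${host}:2375" ps -a | grep -c 'snap[0-9]*$' || true)
docker -H "tcp://${host}:2375" run -d --name "snap$((n + 1))" gelog/snap snap "$@"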
