extend requirements for HPC: use of nix instead of modules for instance #16

Open
Trophime opened this issue Jan 22, 2018 · 8 comments

@Trophime

So far, the orchestrator plugin is designed for HPC systems that use modules and Slurm.
Would it be possible to extend it to other requirements, such as "nix"?

@emepetres emepetres self-assigned this Feb 5, 2018
@emepetres
Contributor

@Trophime what do you mean by "nix"? NixOS? How would it work?

@Trophime
Author

Trophime commented Feb 7, 2018

At the Grenoble Meso center, they use Nix instead of modules:

Nix is a package manager (also the basis of the NixOS Linux distribution) that allows local package installation. It is independent of the host's distribution and allows an unprivileged user to install packages of their choice. It is particularly useful in an HPC context, where users need many different libraries, sometimes in several versions. Nix also offers a way to develop in an isolated and reproducible environment, and those environments or the resulting packages can be easily and efficiently shared between users.

Currently, Nix is only available on Froggy and Luke, but the goal is to have it installed on every Ciment cluster. Also note that it's very easy to use Nix on your own desktop computer, where you'll find a very similar environment. Check this quickstart to set up Nix on your own Linux computer; it's also possible on OS X.

See also: https://gricad.github.io/calcul/nix/hpc/2017/05/15/nix-on-hpc-platforms.html
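For the orchestrator, the practical difference shows up in how a batch script prepares its software environment. Below is a minimal sketch, assuming that with Nix the packages are installed beforehand in the user's profile (for example with `nix-env -i <pkg>`, no root needed) and that, as in the Grenoble setup, Nix is enabled by sourcing `/applis/site/nix.sh`. The function name `env_setup`, the backend names `modules`/`nix` and the package names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the environment-setup lines of a batch script
# for a given backend ("modules" or "nix") and a list of packages.
env_setup() {
  local backend=$1
  shift
  case "$backend" in
    modules)
      # Environment Modules / Lmod: load each package inside the job script.
      echo "module purge"
      local pkg
      for pkg in "$@"; do
        echo "module load $pkg"
      done
      ;;
    nix)
      # Nix: packages already live in the user's profile, so the job script
      # only has to enable Nix. This path is the Grenoble one (site-specific).
      echo "source /applis/site/nix.sh"
      echo "# expected in the Nix profile: $*"
      ;;
    *)
      echo "env_setup: unknown backend '$backend'" >&2
      return 1
      ;;
  esac
}

env_setup modules openmpi boost
env_setup nix openmpi boost
```

With modules, the package list ends up in every job script; with Nix it moves to a one-time, per-user installation step, which is what makes the environment reproducible across jobs.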

@Trophime
Author

Trophime commented Feb 7, 2018

Here is an example of a script that launches a calculation with 256 MPI processes on 16 nodes, using OAR and Nix:

#!/bin/bash
#OAR -n HL-31
##OAR -l /nodes=16/core=16
#OAR -l /nodes=16,walltime=2:00:00
#OAR -t devel
##OAR --stdout HL-31_%jobid%.out
##OAR --stderr HL-31_%jobid%.err
#OAR --project hpcfeelpp
#OAR --notify exec:/usr/local/bin/sendmail.sh

# Ensure Nix is loaded. The following line should also be in your ~/.bashrc file.
source /applis/site/nix.sh

# Run the program
# Number of cores (OAR lists each node once per allocated core)
nbcores=$(wc -l < "$OAR_NODE_FILE")
# Number of nodes
nbnodes=$(sort -u "$OAR_NODE_FILE" | wc -l)
# Name of the first node
firstnode=$(head -n 1 "$OAR_NODE_FILE")
# Number of cores allocated on the first node (it is the same on all the nodes)
pernode=$(grep -cx "$firstnode" "$OAR_NODE_FILE")
echo "nbcores=$nbcores"
echo "nbnodes=$nbnodes"
echo "pernode=$pernode"

HIFIMAGNET_APPSDIR=/scratch/trophime/feelpp_build/clang-3.7/research/hifimagnet/applications
# 256 MPI processes = 16 nodes x 16 cores; oarsh is used by Open MPI to start remote processes
mpirun -np 256 \
  -machinefile "$OAR_NODE_FILE" -mca plm_rsh_agent oarsh \
  "$HIFIMAGNET_APPSDIR"/MagnetModels/feelpp_magnetmodels3DP1N1_linear_reg \
  --config-file HL-31-H1H8-Leads-air_singular_256_json.cfg
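The node and core counting in the script relies on `$OAR_NODE_FILE` listing each reserved node once per allocated core. The sketch below reproduces that logic against a fake node file, so it can be checked outside a job; the hostnames `froggy1`/`froggy2` and the 16 cores per node are made up for the example. `sort -u` counts distinct hostnames and `grep -cx` counts exact-line matches for the first node.

```bash
#!/bin/bash
# Build a fake OAR node file: 2 hypothetical nodes x 16 cores, one line per core.
OAR_NODE_FILE=$(mktemp)
for node in froggy1 froggy2; do
  for _ in $(seq 16); do
    echo "$node"
  done
done > "$OAR_NODE_FILE"

# Same quantities as in the job script.
nbcores=$(wc -l < "$OAR_NODE_FILE")                # one line per core
nbnodes=$(sort -u "$OAR_NODE_FILE" | wc -l)        # distinct hostnames
firstnode=$(head -n 1 "$OAR_NODE_FILE")
pernode=$(grep -cx "$firstnode" "$OAR_NODE_FILE")  # whole-line matches only

echo "nbcores=$nbcores nbnodes=$nbnodes pernode=$pernode"
rm -f "$OAR_NODE_FILE"
```

Matching whole lines with `grep -x` avoids counting hostnames that merely end with the first node's name, and it is not affected by the dots in fully qualified hostnames being regex wildcards.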

@Trophime
Author

Trophime commented Feb 7, 2018

This is typically the kind of script I would like the orchestrator to generate for use on the Meso center.
Note that the Meso center in Strasbourg is also considering moving to Nix instead of modules.

OAR is specific to Grenoble.
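As a rough idea of what the orchestrator would have to emit, here is a minimal sketch of a generator for an OAR + Nix job script modelled on the example above. The function name `gen_oar_nix_job` and its positional parameters are hypothetical; the `/applis/site/nix.sh` path, `oarsh` as Open MPI's remote-shell agent and the `#OAR` directives come from the Grenoble script, and other sites would need their own values.

```bash
#!/bin/bash
# Hypothetical generator: print an OAR + Nix batch script on stdout.
# Usage: gen_oar_nix_job NAME NODES CORES_PER_NODE WALLTIME PROJECT EXE [ARGS...]
gen_oar_nix_job() {
  local name=$1 nodes=$2 cores=$3 walltime=$4 project=$5
  shift 5
  local np=$(( nodes * cores ))   # one MPI process per allocated core
  # Unquoted here-doc: $name etc. are expanded now, \$OAR_NODE_FILE is kept
  # literal for the job, and \\ becomes the line-continuation backslash.
  cat <<EOF
#!/bin/bash
#OAR -n $name
#OAR -l /nodes=$nodes/core=$cores,walltime=$walltime
#OAR --project $project

# Enable Nix (site-specific path, here Grenoble).
source /applis/site/nix.sh

mpirun -np $np \\
  -machinefile "\$OAR_NODE_FILE" -mca plm_rsh_agent oarsh \\
  $*
EOF
}

gen_oar_nix_job HL-31 16 16 2:00:00 hpcfeelpp \
  ./feelpp_magnetmodels3DP1N1_linear_reg \
  --config-file HL-31-H1H8-Leads-air_singular_256_json.cfg
```

The output would be redirected to a file, made executable, and submitted with `oarsub -S ./job.sh`, which tells OAR to read the `#OAR` directives from the script itself.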

@Trophime
Author

Trophime commented Feb 7, 2018

More info on the Grenoble Meso center and Nix can be found here.

@victorsndvg
Member

Hi @Trophime,

If I understood correctly, Nix is an alternative to LMOD, right?

If so, I don't understand how to use it from your example; maybe I'm missing something. Could you extend the usage example? For example, a table comparing its usage with LMOD would be awesome.

On the other hand, you mention OAR, which seems to be a resource manager. Could you describe it in more detail? Again, a table comparing it with Slurm usage would be fantastic!
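Since the orchestrator already emits Slurm scripts, one way to compare the two schedulers is a helper that writes the same job header for either of them. This is only a sketch: the function name `batch_header` is hypothetical, and only the directives used in the example script are covered (job name, node count, walltime, project/account, stdout file).

```bash
#!/bin/bash
# Hypothetical helper: print the scheduler directives for one job, so that
# Slurm and OAR syntax can be compared for the same request.
# Usage: batch_header slurm|oar NAME NODES WALLTIME PROJECT
batch_header() {
  local sched=$1 name=$2 nodes=$3 walltime=$4 project=$5
  case "$sched" in
    slurm)  # submitted with: sbatch job.sh
      echo "#SBATCH --job-name=$name"
      echo "#SBATCH --nodes=$nodes"
      echo "#SBATCH --time=$walltime"
      echo "#SBATCH --account=$project"
      echo "#SBATCH --output=${name}_%j.out"
      ;;
    oar)    # submitted with: oarsub -S ./job.sh
      echo "#OAR -n $name"
      echo "#OAR -l /nodes=$nodes,walltime=$walltime"
      echo "#OAR --project $project"
      echo "#OAR --stdout ${name}_%jobid%.out"
      ;;
    *)
      echo "batch_header: unknown scheduler '$sched'" >&2
      return 1
      ;;
  esac
}

batch_header slurm HL-31 16 2:00:00 hpcfeelpp
echo
batch_header oar HL-31 16 2:00:00 hpcfeelpp
```

In the job body, `$OAR_NODE_FILE` (one line per allocated core) plays roughly the role of Slurm's `$SLURM_JOB_NODELIST`, which is given in compressed form and can be expanded with `scontrol show hostnames`.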

@Trophime
Author

For OAR, I found this guide from your Luxembourg colleagues.

@Trophime
Author

Trophime commented Feb 10, 2018

Nix is indeed an alternative to LMOD.
I will try to find some docs illustrating the use of Nix in an HPC context.
There is an article from the IT guy in Grenoble.
