
How to run parcels on Lorenz or Gemini


Lorenz

Lorenz is the IMAU oceanography group's server and the preferred machine for running Parcels simulations.

Connecting

When working at the university, go directly to step 2.

  1. In a terminal, type ssh yoursolisid@gemini.science.uu.nl and enter your solis-password.
  2. On gemini, type ssh yoursolisid@lorenz.science.uu.nl and enter your solis-password.
  3. To disconnect, type exit once (at the office) or twice (at home).

Unfortunately, this login procedure is rather cumbersome, especially from home. Luckily it's possible to store most details, saving keystrokes and human memory use. Read more on these workflow improvements below.

Automated SSH login steps

You can automate the login procedure with the following two workflow improvements.

Workflow improvement 1: store SSH hostnames

  1. On your local machine, edit the file ~/.ssh/config (create it if it doesn't exist).
  2. Add the following lines:
Host gemini
   hostname gemini.science.uu.nl
   user <yoursolisid>

Host lorenz
   hostname lorenz.science.uu.nl
   user <yoursolisid>
   ProxyJump gemini
  3. Save the file and exit.

From now on, you can connect with just ssh lorenz. The connection will be routed through gemini automatically, thanks to ProxyJump.
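Because these host aliases live in your SSH config, they also work with other SSH-based tools such as scp and rsync, for example (the file paths below are just placeholders):

scp lorenz:~/my_run/output.nc .
rsync -av ./my_scripts/ lorenz:~/my_scripts/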

Workflow improvement 2: automated SSH authentication

If you don't want to enter your password every time, add your public SSH key to ~/.ssh/authorized_keys on lorenz, or follow these instructions (Linux/macOS):

  1. On your local machine, type ssh-keygen (default filename id_rsa is OK, passphrase is optional).
  2. Type ssh-copy-id gemini, enter your solis-password when asked.
  3. Type ssh-copy-id lorenz, enter your solis-password when asked.

Windows users may want to use PuTTYgen. Alternatively, if you use Windows 10, you can follow these instructions:

  1. On your local machine, open PowerShell (in Administrator mode) and type ssh-keygen (default filename id_rsa is OK, passphrase is optional).
  2. Type cat ~/.ssh/id_rsa.pub and copy the output.
  3. Type ssh gemini, and once connected type nano ~/.ssh/authorized_keys (or create the file if necessary).
  4. Paste the output from step (2).
  5. Repeat steps (3) and (4), replacing gemini with lorenz.

Setting up parcels

The latest official release of parcels

If you only want to run parcels, you can do module load parcels and you will get the latest release of parcels.

However, if you also want to use other Python packages, it is best to create your own conda environment. To do that, follow these steps:

  1. On the head node (i.e., not a compute node) first load miniconda with module load miniconda
  2. Then, install parcels and all its dependencies with conda create --prefix ~/parcels_env -c conda-forge parcels
  3. Update your bash settings with conda init bash. Re-log in so that the changes to your ~/.bashrc file take effect.

The next time you log in, the only steps to take are

  1. module load miniconda
  2. conda activate ~/parcels_env

You could also add these last two commands to your ~/.bashrc file so that they are automatically executed when you log in; see the sketch below. NOTE: these two commands must be placed below the # <<< conda initialize <<< block of commands.
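For reference, a minimal sketch of how the end of your ~/.bashrc could then look (the conda initialize block is generated by conda init bash and should be left as-is; the last two lines are the ones you add yourself):

# >>> conda initialize >>>
# (block generated by 'conda init bash'; leave as-is)
# <<< conda initialize <<<

# load miniconda and activate the parcels environment at every login
module load miniconda
conda activate ~/parcels_env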

The developer version of parcels

If you want to use the developer version of parcels (because you want/need features that are not yet in a release, or because you want to make fixes yourself), you can follow these steps:

  1. On the head node (i.e., not a compute node) first load miniconda with module load miniconda
  2. Clone the parcels git repository with git clone https://github.com/OceanParcels/parcels.git
  3. Go into the new parcels directory with cd parcels
  4. Create a conda environment in your home directory with conda env create -f environment.yml --prefix ~/parcels_env
  5. Activate the new environment with conda activate ~/parcels_env
  6. Finish the installation with pip install --no-build-isolation --no-deps -e .

Don't worry if it seems that version 0.0.0 has been installed; that is a known quirk. You can check whether your developer installation works as follows:

  1. First check that python points to the new environment: which python should return ~/parcels_env/bin/python
  2. Then go into the Python command line (by simply calling python), import parcels and then print(parcels.__file__). That last command should return /storage/home/USERNAME/parcels/parcels/__init__.py

The next time you log in, the only steps to take are

  1. module load miniconda
  2. conda activate ~/parcels_env

You could also add these last two commands to your ~/.bashrc file so that they are automatically executed when you log in.

This setup will give you access to the latest master branch of parcels, and you can also pull, push and change branches with git commands in your parcels directory.

Hydrodynamic data

The hydrodynamic data on lorenz is stored at /storage/shared/oceanparcels. If you need access to more datasets there, let Erik know. See here for information on how to use the MOi data, including some example code for creating a FieldSet.

Automatic GitHub authentication

GitHub facilitates basic interaction with repositories (pull/push) over two different protocols: SSH and HTTPS. The following instructions set you up for automatic authentication for HTTPS repositories; for SSH (using SSH keys), see here.

  1. Generate a Personal Access Token (PAT) on this page. Name it "lorenz". Regarding scopes, select "repo", "read:org" and "workflow". Copy the resulting token to your clipboard.
  2. On Lorenz, load parcels-dev if you haven't already (module load parcels-dev). You'll need this for the gh command in step 3.
  3. Log in with your new PAT using gh auth login. Choose the options GitHub.com, HTTPS, GitHub credentials, authentication token. In the last step, paste the token from step 1.
  4. From now on, authentication should be automatic (read: no passwords required) when interacting with GitHub HTTPS repositories, for instance the Parcels clone that you made in the preceding section.
  5. In case it still doesn't work, you may need to add the following lines to the file ~/.gitconfig:
[credential "https://github.com"]
        helper = !gh auth git-credential

Running interactive jobs

The Lorenz cluster is composed of one main node (where you login) and several compute nodes. It is strongly preferred to do all compute-intensive or data-intensive work on a compute node. You request an interactive compute-node allocation as follows:

srun -n 1 -t 2:00:00 --pty bash -il

This will request a single CPU core (-n 1) for 2 hours (-t). When you're done working, please release your allocation by typing

exit

A more advanced script for requesting interactive nodes.

You can save the following script as request_interactive_node.sh or something similar, so that you do not need to remember the above line. Running bash request_interactive_node.sh will automatically log you into a compute node. You can still request a specific partition, node or a different runtime using the options [-p partition] [-n node_name] [-t runtime].

#!/bin/bash

# Default partition is "normal"
partition="normal"
# Default node name is unset
node_name=""
# Default runtime is 8 hours for "normal" partition
runtime="08:00:00"

# Parse command line arguments
while getopts "p:n:t:" opt; do
  case ${opt} in
    p ) partition="$OPTARG"
      ;;
    n ) node_name="$OPTARG"
      if [ "$node_name" == "node09" ]; then
        partition="short"
        runtime="03:00:00"
      fi
      ;;
    t ) runtime="$OPTARG"
      ;;
    \? ) echo "Usage: request_interactive_node.sh [-p partition] [-n node_name] [-t runtime]"
      exit 1
      ;;
  esac
done

if [ "$partition" == "short" ]; then
  echo "Requesting interactive node on short partition"
  runtime="03:00:00"
  node_name="node09"
else
  echo "Requesting interactive node on normal partition"
fi

# Request an interactive session
if [ -z "$node_name" ]; then
  srun --partition="$partition" --time="$runtime" --pty bash -il
else
  srun --partition="$partition" --nodelist="$node_name" --time="$runtime" --pty bash -il
fi
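
For example, to request a session on the short partition, or a specific node with a different runtime (node02 here is just an illustration):

bash request_interactive_node.sh -p short
bash request_interactive_node.sh -n node02 -t 04:00:00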

Running batch jobs

See https://github.com/IMAU-oceans/Lorenz

Note that most Parcels jobs use only one core, so set #SBATCH -n 1
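
As a minimal sketch of such a batch script (the job name, output file and run_parcels.py are placeholders; adjust the partition and time limit to your needs, and see the IMAU-oceans/Lorenz documentation for the full set of options):

#!/bin/bash
#SBATCH -J parcels_run           # job name (placeholder)
#SBATCH -n 1                     # most Parcels jobs use a single core
#SBATCH -p normal                # partition
#SBATCH -t 24:00:00              # wall-clock time limit
#SBATCH -o parcels_run.%j.out    # output file (%j is the job ID)

module load miniconda
conda activate ~/parcels_env

python run_parcels.py            # replace with your own script

Submit it with sbatch jobscript.sh and monitor it with squeue -u yoursolisid.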

Using VS Code on a compute node

Another way to develop, debug and analyze your work on Lorenz is through VS Code. VS Code supports interactive Spyder-like development, debugging, Git, and Jupyter notebooks. It also has a built-in file browser. Lastly, VS Code supports many plug-ins, among which GitHub Copilot: an AI assistant that can give code suggestions (similar to ChatGPT). GitHub Copilot is free for students, teachers and open-source developers. UU students (including PhD candidates) can apply for a Student Developer Pack by sending an email to info.rdm@uu.nl. Post-docs and staff can apply for Teacher Benefits via GitHub Education. See this page for more info about GitHub @UtrechtUniversity.

To use VS Code on Lorenz:

  1. Install VS Code on your own computer.
  2. Set up SSH authentication properly. See the instructions at the top of this page, under 'Connecting' (Workflow improvement 1: store SSH hostnames and Workflow improvement 2: automated SSH authentication).
  3. In VS Code, install the Remote - SSH extension.
  4. On the bottom-left, there is now a green remote connection button. If you click on it, a menu pops up, where you can click Connect to Host. This will be used to connect to the compute nodes later.
  5. For Python development, install the Python, Pylance, and Jupyter extensions. Optionally install the GitHub Copilot AI extension.
  6. It is important to always use VS Code on a compute node, so that adequate resources are reserved for you and you don't slow down the main login node. You need to add the compute nodes as available hosts in your SSH config. In VS Code, open the command palette (CMD/CTRL + SHIFT + P) and type ssh config.
  7. Choose your configuration file (usually the first).
  8. Add the compute nodes to your config file:
Host lorenz-compute
	ForwardAgent yes
	StrictHostKeyChecking no
	UserKnownHostsFile=/dev/null
	ProxyCommand ssh lorenz "/opt/slurm/bin/salloc --nodes=1 --mem=64G --cpus-per-task=1 --partition=normal --time=8:00:00 /bin/bash -c 'nc \$SLURM_NODELIST 22'"
	User 1234567
Host lorenz-short
	ForwardAgent yes
	StrictHostKeyChecking no
	UserKnownHostsFile=/dev/null
	ProxyCommand ssh lorenz "/opt/slurm/bin/salloc --nodes=1 --mem=64G --cpus-per-task=1 --partition=short --time=3:00:00 /bin/bash -c 'nc \$SLURM_NODELIST 22'"
	User 1234567

Make sure to use your own user name (replace 1234567 with your Solis-ID). NOTE: If you are on a Windows machine, remove the backslash before $SLURM_NODELIST in the above config lines.

Now you can choose the compute nodes on the normal and short partitions as available hosts in VS Code. Note that in the above example, a memory restriction of 64GB is set, and you'll run on one CPU (you can change this if you need multiple CPUs).

⚠️ Never run VS Code on the main node. When choosing the SSH host in VS Code, always choose lorenz-compute or lorenz-short. Never simply choose lorenz.

If you are done using the remote connection, close it properly via 'File > Close Remote Connection'. If you don't do this, processes can linger on the compute node. If for some reason the connection closed unexpectedly, you'll have to clean up the remaining server processes yourself:

  1. Open a terminal and type: ssh lorenz
  2. Log into the compute node that you were on: e.g. ssh node01
  3. Check if there are processes running with ps -ef | grep SOLISID (replacing SOLISID with your Solis-ID).
  4. This will show whether you still have running processes; the VS Code ones will include .vscode-server in their command.
  5. Kill these processes using their PID (second column), e.g. kill 12345. It is also possible to kill all your open processes using killall -u SOLISID, but note that this will also kill processes that are not related to VS Code, so don't use this option if you have important computations running on this node.

Connection errors related to turned off nodes (timeout)

Lorenz has a power-saving feature: nodes that have been idle for a while are turned off. In principle, the SSH configuration that we've specified above should force SLURM to automatically find a suitable node for your interactive session. If there is no powered-on node with the right resources, it will boot up a new node and redirect VS Code to it. However, VS Code is impatient and doesn't like to wait for the node to be powered on; instead it will return a timeout error.

There are 2 ways to circumvent this issue:

  1. You can change the amount of time that VS Code will wait before returning a time-out error. This can give SLURM the time to allocate resources and boot up a node. To do so, go to settings (gear icon in the bottom left), search for ssh and change the Remote.SSH: Connect Timeout setting to a higher number, for instance 60 seconds.
  2. You can also turn on a powered-off node yourself, so that when VS Code tries to find a powered-on interactive node, it can reach one immediately. You can turn on a node by requesting a 'dummy' interactive session: open a plain terminal and SSH to Lorenz, check which nodes are idle or in use with sinfo and/or squeue, and then use the request_interactive_node.sh script from "A more advanced script for requesting interactive nodes." above to request an interactive session on a node that needs to be powered on. For instance, bash request_interactive_node.sh -n node02 requests an interactive session on node02; if node02 has been powered off, requesting the session will boot it. Once you're on the node, type exit to leave the interactive session and release the associated resources. Now that at least one node (node02 in this example) is powered on, you should be able to request an interactive session from VS Code again.

Other connection errors

In case you have other issues connecting to Lorenz via VS Code, first check whether you have too many jobs running. At present, the limit on the number of concurrent jobs is 3. Sometimes when VS Code instantiates an interactive job and doesn't connect correctly (patchy Wi-Fi, lost connection, etc.), the job remains running for the allocated time requested. To check this, open a terminal (or PowerShell if using Windows), and type ssh lorenz. Once connected, type squeue -u solisid, where solisid is replaced with your own Solis-ID. If there are several interactive jobs running, you can terminate them using the command scancel jobid, where jobid is the job you want to cancel.

Jupyter Lab on a compute node

Running an interactive Jupyter notebook is a great way to work, and this can easily be done via VS Code, which handles the SSH tunneling for you.

  1. Connect to a compute node through VS Code Remote - SSH.
  2. Run conda activate ~/parcels_env (assuming you followed the setup instructions above), and install Jupyter Lab if you haven't already (conda install -c conda-forge jupyterlab).
  3. cd into your folder of choice and run jupyter lab.
  4. Click the link printed in your terminal. This will open a browser on your local machine connected to the Jupyter Lab instance on the server.

Gemini

Gemini is the name of the UU ssh-server that can be used by students and staff to run simulations.

How to connect to gemini?

  • In your terminal/command prompt, type ssh yoursolisid@gemini.science.uu.nl and enter your solis-password
  • If you don't want to do this again every time, create a config file in ~/.ssh/ (see the Lorenz instructions above)
  • To disconnect, type exit

Setting up parcels on gemini

You can create a conda environment, and set up parcels within it by completing the following steps:

  • Load miniconda by typing module load miniconda/3. Put this line into your ~/.bashrc so that it is executed automatically on every login. Make sure that your ~/.bash_profile contains source $HOME/.bashrc
  • Clone parcels into your home directory, i.e. go to your home directory and type git clone https://github.com/OceanParcels/parcels.git. You can of course also put it somewhere else; in that case you need to use that path when setting the PYTHONPATH variable later
  • Go into the newly created parcels folder with cd parcels.
  • Install the needed environment (if it does not exist yet from previous parcels versions) by typing conda env create -f environment.yml. Note that you have very limited space on gemini, so you cannot have many environments. If you want to delete an old one, check out this page.
  • Activate the environment: conda activate parcels. To be able to use conda activate you might need to first initialize the conda commands by typing conda init bash.
  • Still in the parcels directory, type pip install --no-build-isolation --no-deps -e .

parcels is now set up. To run it, always activate the parcels environment first with conda activate parcels

Running jobs on gemini

You can run your code with python just as usual: python your_file.py. However, this will run the file on the front node, which is not good practice. You can use it for a very short test, but not for a large run. For larger runs, submit a job with qsub (http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html), see this example. To submit a job, use qsub -V examplejob.sh (the -V is necessary!).

Sometimes, you may need to adapt the parameters of qsub to run larger experiments. A more advanced script can look like this:

#!/bin/bash
# SGE Options
#$ -S /bin/bash
# Shell environment forwarding
#$ -V
# Job Name
#$ -N <YOUR_EXPERIMENT_NAME_HERE>
# Notifications
#$ -M <YOUR_UU_MAIL_ADDRESS_HERE>
# When notified (b : begin, e : end, s : error)
#$ -m es
# Set memory limit (important when you use a lot of field or particle data)
#  Guideline: try with 20G - if you get a mail with subject "Job <some_number_here> (<EXPERIMENT_NAME>) Aborted",
#  then check the output line "Max vmem". If that is bigger than what you typed in here, your experiment ran
#  out of memory. In that case, raise this number a bit.
#  (1) 'h_vmem' cannot be >256G (hard limit); (2) Do not start with 200G or more - remember that you
#  share this computer with your next-door neighbour and colleague.
#$ -l h_vmem=20G
# Set runtime limit
#$ -l h_rt=24:00:00
# To run the job on the queue for long-running processes, add the line: #$ -q long.q

echo 'Running Stommel example ...'
cd ${HOME}/parcels/parcels/examples
python3 example_stommel.py
echo 'Finished computation.'

You can put this directly into a shell script, e.g. experiment.sh, replace python3 example_stommel.py with your own experiment and the placeholders with your own information, and then run it via qsub -V experiment.sh.

Warning for Windows users: if you create a shell script on your own Windows device, the job submission might return an error starting with "failed searching requested shell because:". If so, make sure the line endings in your shell script do not include the "^M" (carriage return) character. To check this you can use vim as described here and solve the problem with the answers here.

Once a job is submitted, you can check for warnings and errors in the file <job_name>.e<job_ID>, where the job ID is the number shown when you submit the job. Note that there is an unsolved issue with bash that always leads to the following error:

/bin/bash: module: line 1: syntax error: unexpected end of file
/bin/bash: error importing function definition for `BASH_FUNC_module'

This should not cause any problems for your python script however, so you can simply ignore it.

Useful commands for submitted jobs

  • qstat displays the jobs you have submitted
  • qdel 123 to kill the job with id 123 (you can see the id when using qstat). You can of course only kill your own jobs.

Hydrodynamic data on gemini

Our data is stored in /data/oceanparcels/input_data. If you encounter an error due to not having permission to access the data, contact e.vansebille@uu.nl

Space and scratch directories

You have a maximum of 15 GB of space in your home directory on gemini. To check how much storage you have used, you can type du -s in your home directory. If you write large files, write them to your scratch directory: /scratch/yourname. If this directory does not exist, create it with mkdir /scratch/yourname. The files in your scratch folder may be removed after a few weeks of inactivity. For long term storage of output files use /data/oceanparcels/output_data/data_<YOUR_NAME>/.

Copying files between your local computer and gemini

Use scp or rsync. E.g. type on your local computer scp wichm003@gemini.science.uu.nl:/home/staff/wichm003/file ./ to copy the file file to your computer.

The two nodes (important if you don't find your output files in scratch!)

There are two nodes available on gemini, called science-bs35 and science-bs36. After connecting to gemini, you will be logged in to bs35. To switch to bs36, type ssh science-bs36, and exit to go back. Importantly, you have different scratch-directories on both nodes. If you just submit your job as explained above, either one of the nodes could be used, so check both scratch-directories if you are missing output files. If you want to submit to a specific node, use the -l hostname=<nodename> option: qsub -l hostname=science-bs36 -V TestSubmit.sh. Warning: The science-bs35 node does not execute scripts with the option -q long.q. If you use the parcels module, note that it might be necessary to load the parcels module from the same terminal window on the node you want to execute from. Also note that the parcels versions on both nodes are not necessarily the same. Check this beforehand - it might lead to errors otherwise!

Working with parcels

There are two options to work with parcels on gemini: with your local parcels version or with the parcels module available on gemini. In either case, jobs need to be submitted with the -V option: qsub -V jobscript.sh.

  1. Use the parcels module available on gemini. Type (or add to your .bashrc) the command module load parcels. To display the available modules, type module avail. Other options for modules are displayed after logging in to gemini.
  2. Use your local version, i.e. just as on your own computer. In order to do that, you need to add the parcels directory to your PYTHONPATH in your ~/.bashrc: export PYTHONPATH="$PYTHONPATH:/home/staff/wichm003/parcels".

Remote access via Jupyter and a browser

Using Jupyter with Gemini can easily be done with VS Code, as detailed above; however, this time you connect to Gemini instead of a Lorenz compute node.

Access to gemini on Windows

Bitvise SSH Client is a program that makes it easy to access gemini on Windows. It has a file browser and a terminal. Use host gemini.science.uu.nl and port 22, and log in with your Solis-ID and solis-password.

In case of problems

If you have problems, ask the other people in the group. If they don't know, ask Carel van der Werf.

Cartesius

The Cartesius cluster no longer exists; the instructions below are kept for reference only.


How to connect to cartesius?

  • You need an account. Ask the PI of your project to request one by e-mail to helpdesk@surfsara.nl.
  • Connect via ssh yourcartesiusname@cartesius.surfsara.nl

Using cartesius

Use the sbatch command for job submission. There is very good documentation available here: https://userinfo.surfsara.nl/systems/cartesius/usage. Read the important parts about the number of cores per node and the parameters that should be in your script.

Setting up parcels on cartesius

  • This is similar to gemini, but you load anaconda in the following way (put it in your ~/.bashrc):

module load 2019
module load Anaconda3/2018.12

  • The rest is the same as on gemini.

Other modules

  • There are many modules available on cartesius, which have to be loaded manually. For example, type module load nco. The documentation of SURFsara is very good! If you are missing a module, just google how to load it on cartesius.
  • Note that ncdump will work within your parcels environment

Scratch-spaces

There are two scratch directories: local and shared. You can create your own directory in the scratch spaces as explained here: https://userinfo.surfsara.nl/systems/cartesius/filesystems. If you write data to scratch, write it to scratch-shared/yourname (not to scratch-local; it does not work).

Special issue on submitting several jobs with parcels

So far, parcels can be run only in serial mode. During compilation, some files are created in your local scratch for each execution of parcels, and this can cause conflicts when several runs are started at the same time. This is solved by putting a sleep command between the different runs, as in the sketch below, which starts five runs, each on a different core (note the & at the end of the line).
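
A sketch of what such a job script could look like (run_parcels.py and its argument are placeholders; the sleep duration is indicative):

#!/bin/bash
#SBATCH -N 1                  # one node; the five runs share its cores
#SBATCH -t 24:00:00

# start five independent runs in the background (note the &),
# with a sleep in between so that the compiled files do not conflict
for i in 1 2 3 4 5; do
  python run_parcels.py --run $i &
  sleep 60
done
wait   # keep the job alive until all background runs have finished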

Hydrodynamic data on cartesius

Our data is stored at /projects/0/topios.

In case of problems

SURFsara has a very good helpdesk where you usually receive an answer within a few hours: helpdesk@surfsara.nl.