Welcome to the Cerulean tutorial. This tutorial demonstrates the basics of using Cerulean: using local and remote file systems, running processes locally and remotely, and using schedulers.
To install Cerulean, use
pip install cerulean
If you're using Cerulean in a program, you will probably want to use a virtualenv and install Cerulean into that, together with your other dependencies.
The file access functions of Cerulean use a pathlib-like API, but unlike pathlib, Cerulean supports remote file systems. That means that there is no longer just the local file system, but multiple file systems, and that each Path object belongs to a particular file system.
Of course, Cerulean also supports the local file system. To make an object representing the local file system, you use this:
import cerulean
fs = cerulean.LocalFileSystem()
And then you can make a path on the file system using:
import cerulean
fs = cerulean.LocalFileSystem()
my_home_dir = fs / 'home' / 'username'
In this example, my_home_dir will be a cerulean.Path object, which is very similar to a normal Python pathlib.PosixPath. For example, you can read the contents of a file through it:
import cerulean
fs = cerulean.LocalFileSystem()
passwd_file = fs / 'etc' / 'passwd'
users = passwd_file.read_text()
print(users)
Note that cerulean.Path does not support open(). Cerulean can copy files and stream data from and to them, but it does not offer random access, as not all remote file access protocols support this.
You can use the / operator to build paths from components as with pathlib, and there is a wide variety of supported operations. See the API documentation for cerulean.Path for details.
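Since the API mirrors pathlib, the same path-building style applies. As a quick refresher, here is that style using the standard library's pathlib; the Cerulean equivalents carry a file system along, but the / operator and the accessors work the same way:

```python
from pathlib import PurePosixPath

# Build a path from components with the / operator, just as with a
# Cerulean file system object.
p = PurePosixPath('/') / 'home' / 'username' / 'test.txt'

print(p)         # /home/username/test.txt
print(p.name)    # test.txt
print(p.parent)  # /home/username
print(p.suffix)  # .txt
```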
Cerulean supports remote file systems through the SFTP protocol. (It uses the Paramiko library internally for this.) Accessing a remote file system through SFTP goes like this:
import cerulean
credential = cerulean.PasswordCredential('username', 'password')
with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term:
    with cerulean.SftpFileSystem(term) as fs:
        my_home_dir = fs / 'home' / 'username'
        test_txt = (my_home_dir / 'test.txt').read_text()
        print(test_txt)
Since we are going to connect to a remote system, we need a credential. Cerulean has two types of credentials, PasswordCredential and PubKeyCredential. They are what you would expect: one holds a username and a password, the other a username, a local path to a public key file, and optionally a passphrase for the key.
Once we have a credential, we can open a terminal. Like a terminal window on your desktop, a Terminal object lets you run commands. Cerulean supports local terminals, and remote terminals through SSH. Since the SFTP protocol is an extension of the SSH protocol, we need an SSH terminal connection first, so we make one, connecting to a host, on a port, with our credential. This terminal holds an SSH connection, which needs to be closed when we are done with it. SshTerminal is therefore a context manager and needs to be used in a with statement. Note that LocalTerminal is not a context manager, as it does not hold any resources.
Once we have the terminal, we can make an SftpFileSystem object, and from there on it works just like a local file system. Just like SshTerminal, SftpFileSystem is a context manager, so we need another with statement.
When running jobs on HPC machines, you often start by copying the input files from the local system to the HPC machine, and finish by copying the results back. Cerulean's copy function takes care of this for you, and works as you would expect:
import cerulean
local_fs = cerulean.LocalFileSystem()
credential = cerulean.PasswordCredential('username', 'password')
with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term:
    with cerulean.SftpFileSystem(term) as remote_fs:
        input_file = local_fs / 'home' / 'username' / 'input.txt'
        job_dir = remote_fs / 'home' / 'username' / 'my_job'
        cerulean.copy(input_file, job_dir)

        # run job and wait for it to finish

        output_file = local_fs / 'home' / 'username' / 'output.txt'
        cerulean.copy(job_dir / 'output.txt', output_file)
If you have read the above, then the secret is already out: running commands using Cerulean is done using a Terminal. For example, you can run a command locally using:
import cerulean
term = cerulean.LocalTerminal()
exit_code, stdout_text, stderr_text = term.run(
    10.0, 'ls', ['-l'], None, '/home/username')
The first argument to Terminal.run is a timeout value in seconds, which determines how long Cerulean will wait for the command to finish. The second argument is the command to run, followed by a list of arguments. Next is an optional string that, if you specify it, will be fed to the standard input of the program you are starting. The final argument is a string specifying the working directory in which to execute the command.
The function returns a tuple containing three values: the exit code of the process (or None if it didn't finish in time), a string containing text printed to standard output, and a string containing text printed to standard error by the command you ran.
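For comparison, here is a rough local-only equivalent of that call using only the standard library. This is not Cerulean code, just an illustration of the same timeout, stdin, and working-directory semantics with subprocess:

```python
import subprocess
import sys

# Run a command with a 10-second timeout, no stdin data, in a given
# working directory; capture what it prints, much like Terminal.run.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello")'],
    input=None, capture_output=True, text=True, timeout=10.0, cwd='.')

exit_code = result.returncode    # 0 on success
stdout_text = result.stdout      # 'hello\n'
stderr_text = result.stderr      # ''
```

One difference: subprocess raises TimeoutExpired when the timeout passes, whereas Terminal.run returns None as the exit code.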
Running commands remotely through SSH of course works in exactly the same way, except that you use an SshTerminal, as above:
import cerulean
credential = cerulean.PasswordCredential('username', 'password')
with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term:
    exit_code, stdout_text, stderr_text = term.run(
        10.0, 'ls', ['-l'], None, '/home/username')
On High Performance Computing machines, you don't run commands directly. Instead, you submit batch jobs to a scheduler, which will place them in a queue, and run them when everyone else in line before you is done. The most popular scheduler at the moment seems to be Slurm, but Cerulean also supports Torque/PBS.
The usual way of working with a scheduler is to use ssh to connect to the cluster, where you run commands that submit jobs and check on their status. Cerulean works in the same way:
import cerulean
import time
credential = cerulean.PasswordCredential('username', 'password')
with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term:
    sched = cerulean.SlurmScheduler(term)

    job = cerulean.JobDescription()
    job.name = 'cerulean_test'
    job.command = 'ls'
    job.arguments = ['-l']

    job_id = sched.submit_job(job)
    time.sleep(5)
    status = sched.get_status(job_id)
    if status == cerulean.JobStatus.DONE:
        exit_code = sched.get_exit_code(job_id)
        print('Job exited with code {}'.format(exit_code))
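The single time.sleep(5) above is only a placeholder; a real script would poll the scheduler until the job leaves the queue. A minimal, scheduler-agnostic polling sketch in pure Python, where the get_status callable and the done_states set stand in for sched.get_status and the terminal cerulean.JobStatus values:

```python
import time

def wait_for_job(get_status, job_id, done_states,
                 interval=5.0, timeout=3600.0):
    """Poll get_status(job_id) until it returns a state in done_states.

    Returns the final state, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status in done_states:
            return status
        time.sleep(interval)
    return None

# Example with a stub scheduler that finishes on the third poll:
states = iter(['RUNNING', 'RUNNING', 'DONE'])
final = wait_for_job(lambda job_id: next(states), 42, {'DONE'}, interval=0.0)
print(final)  # DONE
```

Keep the polling interval generous on shared clusters; querying the scheduler too often puts needless load on the head node.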
Of course, if you intend to run your submission script on the head node, then the scheduler is local, and you want to use a LocalTerminal with your SlurmScheduler. If your HPC machine runs Torque/PBS, use a TorqueScheduler instead.
To find all the details of what Cerulean can do and how to do it, please refer to the API documentation.