> [!Warning] 
> **This project is still in an early phase of development.**
>
> The [Python API](../api.html) is not yet stable, and some aspects of the schema for the [blueprint](../terminology.html#term-blueprint) will likely evolve.
> Therefore, whilst you are welcome to try out the package, we cannot yet guarantee backwards compatibility.
> We expect to reach a more stable version in Q1 2025.
>
> To see which systems C-Star has been tested on so far, see [Supported Systems](../machines.html).

# Tracking runs executed as jobs on HPC systems

## Contents
1. [Introduction](#1.-Introduction)
2. [Importing an example Simulation and running it on HPC with a job scheduler](#2.-Importing-an-example-Simulation-and-running-it-on-HPC-with-a-job-scheduler)
   - [A quick look at the system's scheduler](#2i.-A-quick-look-at-the-system's-scheduler)
   - [Submitting a job to the scheduler queue](#2ii.-Submitting-a-job-to-the-scheduler-queue)
3. [Tracking the submitted job](#3.-Tracking-the-submitted-job)
   - [Viewing the submitted script](#3i.-Viewing-the-submitted-script)
   - [Checking the job ID](#3ii.-Checking-the-job-ID)
   - [Checking the status](#3iii.-Checking-the-status)
   - [Viewing the output file path](#3iv.-Viewing-the-output-file-path)
   - [Receiving live updates from the output file](#3v.-Receiving-live-updates-from-the-output-file)
4. [Cancelling a job](#4.-Cancelling-a-job)
5. [Summary](#5.-Summary)

## 1. Introduction

[(return to top)](#Contents)

On this page, we will look at how to use C-Star on supported HPC systems with job schedulers, including:

- Submitting a job to a scheduler queue
- Checking the ID of a job submitted to the queue
- Checking the status of a job submitted to the queue
- Receiving live updates from a job submitted to the queue
- Cancelling a job submitted to the queue



## 2. Importing an example Simulation and running it on HPC with a job scheduler
We will import and set up the same simulation as our [tutorial](../tutorials/2_importing_and_running_a_simulation_from_a_blueprint.html) on importing and running Simulations.

In [1]:
from cstar.roms import ROMSSimulation

example_simulation_1 = ROMSSimulation.from_blueprint(
    blueprint="https://raw.githubusercontent.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example/netcdf_inputs/cstar_blueprint_example_with_netcdf_inputs.yaml",
    directory="../../examples/example_case/",
    start_date="2012-01-03 12:00:00",
    end_date="2012-01-06 12:00:00",
)

### 2i. A quick look at the system's scheduler

Before running the case, let's take a look at this system's (i.e. NERSC Perlmutter's) scheduler. We can do this via the global variable `cstar_sysmgr`, using its `scheduler` property:

In [2]:
from cstar.system.manager import cstar_sysmgr
print(cstar_sysmgr.scheduler)

SlurmScheduler
--------------
primary_queue: regular
queues:
- regular
- shared
- debug
other_scheduler_directives: {'-C': 'cpu'}
global max cpus per node: 256
global max mem per node: 503.02734375GB
documentation: https://docs.nersc.gov/systems/perlmutter/architecture/


From here we can see some global properties of the current system's scheduler, including its queues and a link to its official documentation.

We can query a queue to see its time limit before submitting a job to it:

In [3]:
print(cstar_sysmgr.scheduler.get_queue("shared"))

SlurmQOS:
--------
name: shared
max_walltime: 48:00:00
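
Before choosing a `walltime` for a job, it can help to compare the request against the queue's limit. The sketch below is plain Python and does not call C-Star. It assumes both values are `HH:MM:SS` strings like the `max_walltime` shown above, and the helper names `walltime_to_seconds` and `fits_in_queue` are hypothetical.

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fits_in_queue(requested: str, max_walltime: str) -> bool:
    """Return True if the requested walltime does not exceed the queue limit."""
    return walltime_to_seconds(requested) <= walltime_to_seconds(max_walltime)


# A 10-minute request checked against the 48-hour limit of the "shared" queue
print(fits_in_queue("00:10:00", "48:00:00"))
```

Slurm also accepts other time formats (for example with a leading day count), which this sketch does not handle.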



### 2ii. Submitting a job to the scheduler queue
We can now set up and run the job [as in the corresponding tutorial](../tutorials/2_importing_and_running_a_simulation_from_a_blueprint.html), assigning the `SlurmJob` instance returned by `ROMSSimulation.run()` to a variable we can keep track of.

In [4]:
example_simulation_1.setup()
example_simulation_1.build()
example_simulation_1.pre_run()

hpc_job = example_simulation_1.run(account_key="m4746", walltime="00:10:00", queue_name="shared")

Configuring ROMSSimulation
--------------------------
Setting up ROMSExternalCodeBase...
#######################################################
C-STAR: ROMS_ROOT not found in current cstar_sysmgr.environment. 
if this is your first time running C-Star with an instance of ROMSExternalCodeBase, you will need to set it up.
It is recommended that you install this external codebase in 
/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/cstar/externals/ucla-roms
This will also modify your `~/.cstar.env` file.
#######################################################


Would you like to do this now? ('y', 'n', or 'custom' to install at a custom path)
 y


Cloned repository https://github.com/CESR-lab/ucla-roms.git to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/cstar/externals/ucla-roms
Checked out main in git repository /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/cstar/externals/ucla-roms
Updating environment in C-Star configuration file ~/.cstar.env
Updating environment in C-Star configuration file /global/homes/d/dafydd/.cstar.env
Compiling UCLA ROMS' NHMG library...
Compiling Tools-Roms package for UCLA ROMS...
UCLA-ROMS is installed at /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/cstar/externals/ucla-roms
Setting up MARBLExternalCodeBase...
#######################################################
C-STAR: MARBL_ROOT not found in current cstar_sysmgr.environment. 
if this is your first time running C-Star with an instance of MARBLExternalCodeBase, you will need to set it up.
It is recommended that you install this external codebase in 
/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/cstar/externals/MARBL
This will also modify yo

Would you like to do this now? ('y', 'n', or 'custom' to install at a custom path)
 y


Cloned repository https://github.com/marbl-ecosys/MARBL.git to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/cstar/externals/MARBL
Checked out marbl0.45.0 in git repository /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/cstar/externals/MARBL
Updating environment in C-Star configuration file ~/.cstar.env
Compiling MARBL...
MARBL successfully installed at /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/cstar/externals/MARBL

Fetching compile-time code code...
----------------------------------
Cloned repository https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example.git to /tmp/tmp4c2nlmcu
Checked out main in git repository /tmp/tmp4c2nlmcu
copying bgc.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
copying bulk_frc.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
copying cppdefs.opt to /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code

Downloading file 'roms_grd.nc' from 'https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example/raw/netcdf_inputs/input_datasets_netcdf/roms_grd.nc' to '/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets'.
Downloading file 'roms_ini.nc' from 'https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example/raw/netcdf_inputs/input_datasets_netcdf/roms_ini.nc' to '/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets'.
Downloading file 'roms_tides.nc' from 'https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example/raw/netcdf_inputs/input_datasets_netcdf/roms_tides.nc' to '/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets'.
Downloading file 'roms_bry.nc' from 'https://github.com/CWorthy-ocean/cstar_blueprint_roms_marbl_example/raw/netcdf_inputs/input_datasets_netcdf/roms_bry.nc' to '/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/i

Compiling UCLA-ROMS configuration...
UCLA-ROMS compiled at /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code
Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_grd.nc into (3,3)
Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_ini.nc into (3,3)
Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_tides.nc into (3,3)
Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_bry.nc into (3,3)
Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_bry_bgc.nc into (3,3)
Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_datasets/roms_frc.nc into (3,3)
Partitioning /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/input_

## 3. Tracking the submitted job
### 3i. Viewing the submitted script
We can see the script that was submitted to the scheduler using the `script` property:

In [5]:
print(hpc_job.script)

#!/bin/bash
#SBATCH --job-name=cstar_job_20250226_164342
#SBATCH --output=/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/output/cstar_job_20250226_164342.out
#SBATCH --qos=shared
#SBATCH --ntasks=9
#SBATCH --account=m4746
#SBATCH --export=ALL
#SBATCH --mail-type=ALL
#SBATCH --time=00:10:00
#SBATCH -C cpu

srun -n 9 /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/compile_time_code/roms /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/example_case/ROMS/runtime_code/roms.in
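
To check the submitted settings programmatically rather than by eye, the `#SBATCH` lines can be collected into a dictionary. This is a minimal sketch in plain Python, assuming the script is available as a string (as the printed output above suggests). The function name `sbatch_directives` is hypothetical and not part of C-Star, and the example string is an abbreviated copy of the script shown above.

```python
def sbatch_directives(script: str) -> dict[str, str]:
    """Collect '#SBATCH' directives from a job script into a dict."""
    directives = {}
    for line in script.splitlines():
        if not line.startswith("#SBATCH"):
            continue
        body = line[len("#SBATCH"):].strip()
        # Long options use '--key=value'; short options use '-k value'
        if "=" in body:
            key, value = body.split("=", 1)
        else:
            key, _, value = body.partition(" ")
        directives[key] = value.strip()
    return directives


# Abbreviated copy of the script printed above
example_script = """#!/bin/bash
#SBATCH --job-name=cstar_job_20250226_164342
#SBATCH --qos=shared
#SBATCH --ntasks=9
#SBATCH --time=00:10:00
#SBATCH -C cpu
"""

directives = sbatch_directives(example_script)
print(directives["--qos"], directives["--time"], directives["-C"])
```

In a live session, the same function could presumably be applied to `hpc_job.script` directly.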


We can see where the script is saved using the `script_path` property:

In [6]:
hpc_job.script_path

PosixPath('/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/docs/howto_guides/cstar_job_20250226_164342.sh')

### 3ii. Checking the job ID
We can check the scheduler-assigned job ID using the `id` property:

In [7]:
hpc_job.id

36284834

### 3iii. Checking the status
We can check the job status using the `status` property. Possible values are:

- `UNSUBMITTED`: the job is not yet submitted to the scheduler
- `PENDING`: the job is in the queue
- `RUNNING`: the job is underway
- `COMPLETED`: the job is finished
- `CANCELLED`: the job was cancelled by the user
- `FAILED`: the job finished unsuccessfully
- `HELD`: the job is being held in the queue
- `ENDING`: the job is in the process of finishing
- `UNKNOWN`: the job status cannot be determined

In [9]:
hpc_job.status

<ExecutionStatus.RUNNING: 3>
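
A common pattern is to poll the status until the job reaches a final state. The sketch below illustrates this with stand-in classes so that it runs without a scheduler: `StubStatus`, `StubJob`, and `wait_for_job` are hypothetical names, not C-Star classes. It assumes only that `job.status` returns an enum member whose `name` matches one of the values listed above.

```python
import time
from enum import Enum, auto


class StubStatus(Enum):
    """Stand-in for the status enum; only the member names matter here."""
    PENDING = auto()
    RUNNING = auto()
    COMPLETED = auto()


class StubJob:
    """Hypothetical job whose status advances each time it is read."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    @property
    def status(self):
        # Return the next status, then stay on the final one
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]


# Status names after which no further change is expected (an assumption)
TERMINAL = {"COMPLETED", "CANCELLED", "FAILED"}


def wait_for_job(job, poll_interval=30.0, timeout=3600.0):
    """Poll job.status until it is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = job.status
        if status.name in TERMINAL or time.monotonic() >= deadline:
            return status
        time.sleep(poll_interval)


job = StubJob([StubStatus.PENDING, StubStatus.RUNNING, StubStatus.COMPLETED])
final_status = wait_for_job(job, poll_interval=0.01, timeout=5.0)
print(final_status.name)
```

With a real job, a longer `poll_interval` is advisable so that the scheduler is not queried too often.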

### 3iv. Viewing the output file path
The output file contains the standard output and error streams returned by the job. We can see its path using the `output_file` property:

In [8]:
hpc_job.output_file

PosixPath('/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/alpha_example/example_case/output/cstar_job_20241217_040743.out')

### 3v. Receiving live updates from the output file
While the job is running, we can stream any new lines written to the output file using the `updates()` method. This method takes a `seconds` parameter and provides live updates for that many seconds (default 10). If `seconds=0` is specified, updates continue indefinitely until stopped with a keyboard interrupt (typically `Ctrl-C`).

In [10]:
hpc_job.updates(seconds=0.5)

 doing BGC with MARBL
     73 4383.5506 5.12349924213-03 4.4860296698-03  0.005048004060  0.004185154992     11     27   12
 doing BGC with MARBL
     74 4383.5513 5.12251601734-03 4.4832235674-03  0.005017372766  0.004176282261     11     27   12
 doing BGC with MARBL
     75 4383.5520 5.12157658667-03 4.4804579761-03  0.004989313894  0.004166898153     11     27   12
 doing BGC with MARBL
     76 4383.5527 5.12072658584-03 4.4777445740-03  0.004964180878  0.004157188270     11     27   12
 doing BGC with MARBL
     77 4383.5534 5.11981172189-03 4.4751024064-03  0.004952864963  0.003930362089     12     26   12
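
Because the output is an ordinary text file, it can also be read incrementally with plain Python, for example from a separate session. The sketch below is not how `updates()` is implemented. It is a generic illustration in which the function name `read_new_lines` and the temporary file are hypothetical. It remembers a byte offset between calls and returns only complete lines.

```python
import tempfile
from pathlib import Path


def read_new_lines(path, offset=0):
    """Return lines added to `path` since byte `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a partial final line is left for next time
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode(errors="replace").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "example.out"  # hypothetical stand-in for an output file
    log.write_bytes(b"step 1\nstep 2\n")
    first, offset = read_new_lines(log)

    # Simulate the job appending one complete line and one unfinished line
    with open(log, "ab") as f:
        f.write(b"step 3\npartial")
    second, offset = read_new_lines(log, offset)

print(first, second)
```

In a live session, the path passed in would be the one returned by `hpc_job.output_file`.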


## 4. Cancelling a job
We can cancel the job using the `cancel` method:

In [11]:
hpc_job.cancel()

Job 36284834 cancelled


In [12]:
hpc_job.status

<ExecutionStatus.CANCELLED: 5>
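
In scripts, it can be useful to cancel only jobs that are still queued or running. The sketch below uses stand-in classes so that it runs without a scheduler: `StubStatus`, `StubJob`, and `cancel_if_active` are hypothetical names, and the set of "active" status names is an assumption based on the list in section 3iii.

```python
from enum import Enum, auto


class StubStatus(Enum):
    """Stand-in for the status enum; only the member names matter here."""
    RUNNING = auto()
    COMPLETED = auto()
    CANCELLED = auto()


class StubJob:
    """Hypothetical job exposing a `status` attribute and a `cancel()` method."""

    def __init__(self, status):
        self.status = status

    def cancel(self):
        self.status = StubStatus.CANCELLED


# Status names treated as "still active" (an assumption)
ACTIVE = {"PENDING", "RUNNING", "HELD"}


def cancel_if_active(job) -> bool:
    """Cancel the job only if it is still active; report whether we did."""
    if job.status.name in ACTIVE:
        job.cancel()
        return True
    return False


running = StubJob(StubStatus.RUNNING)
finished = StubJob(StubStatus.COMPLETED)
print(cancel_if_active(running), cancel_if_active(finished))
```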

## 5. Summary

[(return to top)](#Contents)

In this guide, we set up and ran the example `Simulation` that we built in [another tutorial](../tutorials/2_importing_and_running_a_simulation_from_a_blueprint.html), with a particular focus on the `SchedulerJob` instance associated with the run. We looked at tracking the run's status and output files, and cancelling the run.