> [!Warning] 
> **This project is still in an early phase of development.**
>
> The [Python API](https://c-star.readthedocs.io/en/latest/api.html) is not yet stable, and some aspects of the schema for the [blueprint](https://c-star.readthedocs.io/en/latest/terminology.html#term-blueprint) will likely evolve.
> Therefore, whilst you are welcome to try out the package, we cannot yet guarantee backwards compatibility.
> We expect to reach a more stable version in Q1 2025.
>
> To see which systems C-Star has been tested on so far, see [Supported Systems](https://c-star.readthedocs.io/en/latest/machines.html).

# Handling jobs on HPC systems
On this page, we will look at how to use C-Star on supported HPC systems with job schedulers, including:

- Submitting a job to a scheduler queue
- Checking the ID of a job submitted to the queue
- Checking the status of a job submitted to the queue
- Receiving live updates from a job submitted to the queue
- Cancelling a job submitted to the queue

## Importing an example Case and running it on HPC with a job scheduler
We will import and set up the case from the [previous example](2_importing_and_running_a_case_from_a_blueprint.html).

In [2]:
import cstar

example_case_1 = cstar.Case.from_blueprint(blueprint  = "../examples/alpha_example/cstar_blueprint_alpha_example.yaml",
                                           caseroot   = "../examples/alpha_example/example_case", 
                                           start_date = "2012-01-01 12:00:00", 
                                           end_date   = "2012-01-03 12:00:00")

## A quick look at the system's scheduler

Before running the case, let's take a look at this system's (i.e. NERSC Perlmutter's) scheduler. We can do this via the global variable `cstar_sysmgr`, using its `scheduler` property:

In [3]:
from cstar.system.manager import cstar_sysmgr
print(cstar_sysmgr.scheduler)

SlurmScheduler
--------------
primary_queue: regular
queues:
- regular
- shared
- debug
other_scheduler_directives: {'-C': 'cpu'}
global max cpus per node: 256
global max mem per node: 1007.12890625GB
documentation: https://docs.nersc.gov/systems/perlmutter/architecture/


From here we can see some global properties of the current system's scheduler, including its queues and a link to its official documentation.

We can query a queue to see its time limit before submitting a job to it:

In [4]:
print(cstar_sysmgr.scheduler.get_queue("shared"))

SlurmQOS:
--------
name: shared
max_walltime: 48:00:00
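
Before submitting, it can be useful to check that the walltime we plan to request fits within the queue's limit. The sketch below is not part of C-Star: `walltime_to_seconds` and `fits_in_queue` are hypothetical helpers, and it assumes both values are plain `HH:MM:SS` strings like those shown above (Slurm's `D-HH:MM:SS` form is not handled).

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds (hypothetical helper, not a C-Star function)."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fits_in_queue(requested: str, max_walltime: str) -> bool:
    """Return True if the requested walltime does not exceed the queue limit."""
    return walltime_to_seconds(requested) <= walltime_to_seconds(max_walltime)


# Values taken from this page: a 10-minute job on the "shared" queue (48-hour limit).
print(fits_in_queue("00:10:00", "48:00:00"))
```

Comparing in seconds avoids the pitfalls of comparing the strings directly, which would only work when both values are zero-padded to the same width.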



## Submitting a job to the scheduler queue
We can now set up and run the job as in the [previous example](2_importing_and_running_a_case_from_a_blueprint.html), assigning the `SlurmJob` instance returned by `Case.run()` to a variable we can keep track of.

In [None]:
example_case_1.setup()
example_case_1.build()
example_case_1.pre_run()

hpc_job = example_case_1.run(account_key="m4746", walltime="00:10:00", queue="shared")

## Tracking the submitted job
### Viewing the submitted script
We can see the script that was submitted to the scheduler using the `script` property:

In [6]:
print(hpc_job.script)

#!/bin/bash
#SBATCH --job-name=cstar_job_20241217_040743
#SBATCH --output=/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/alpha_example/example_case/output/cstar_job_20241217_040743.out
#SBATCH --qos=shared
#SBATCH --ntasks=9
#SBATCH --account=m4746
#SBATCH --export=ALL
#SBATCH --mail-type=ALL
#SBATCH --time=00:10:00
#SBATCH -C cpu

srun -n 9 /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/alpha_example/example_case/additional_source_code/ROMS/roms /global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/alpha_example/example_case/namelists/ROMS/roms.in


We can see where the script is saved using the `script_path` property:

In [7]:
hpc_job.script_path

PosixPath('/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/docs/cstar_job_20241217_040743.sh')

### Viewing the output file path
The output file contains the standard output and error streams returned by the job. We can see where it will be written using the `output_file` property:

In [8]:
hpc_job.output_file

PosixPath('/global/cfs/cdirs/m4746/Users/dafydd/my_c_star/examples/alpha_example/example_case/output/cstar_job_20241217_040743.out')

### Checking the job ID
We can check the scheduler-assigned job ID using the `id` property:

In [9]:
hpc_job.id

34020872

### Checking the status
We can check the job status using the `status` property. Possible values are:

- `UNSUBMITTED`: the job is not yet submitted to the scheduler
- `PENDING`: the job is in the queue
- `RUNNING`: the job is underway
- `COMPLETED`: the job is finished
- `CANCELLED`: the job was cancelled by the user
- `FAILED`: the job finished unsuccessfully
- `HELD`: the job is being held in the queue
- `ENDING`: the job is in the process of finishing
- `UNKNOWN`: the job status cannot be determined

In [15]:
hpc_job.status

<JobStatus.RUNNING: 3>
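
A common pattern is to poll `status` until the job reaches a final state. The sketch below illustrates that pattern and is not C-Star functionality: `wait_until_finished` is a hypothetical helper, the set of terminal states is our reading of the list above, and statuses are compared by name (via the `name` attribute that Python enum members have) so that the example can run here with a stand-in object rather than a real job.

```python
import time
from enum import Enum, auto

# Status names we treat as final, based on the list above (an assumption).
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED"}


def wait_until_finished(job, poll_interval=30.0, timeout=3600.0):
    """Poll job.status until it is terminal or the timeout expires.

    Hypothetical helper: `job` is any object whose `status` has a `name`.
    Returns the name of the last status seen.
    """
    deadline = time.monotonic() + timeout
    while True:
        name = job.status.name
        if name in TERMINAL_STATES or time.monotonic() >= deadline:
            return name
        time.sleep(poll_interval)


# Stand-ins for demonstration only; a real run would pass `hpc_job` instead.
class FakeStatus(Enum):
    PENDING = auto()
    RUNNING = auto()
    COMPLETED = auto()


class FakeJob:
    """Reports PENDING, then RUNNING, then COMPLETED on successive reads."""

    def __init__(self):
        self._states = iter([FakeStatus.PENDING, FakeStatus.RUNNING])

    @property
    def status(self):
        return next(self._states, FakeStatus.COMPLETED)


final_state = wait_until_finished(FakeJob(), poll_interval=0.01, timeout=5.0)
print(final_state)
```

With a real job, a longer `poll_interval` avoids querying the scheduler more often than necessary.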

### Receiving live updates from a job submitted to the queue
While the job is running, we can stream any new lines written to the output file using the `updates()` method. This method takes a `seconds` parameter and provides live updates for that many seconds (default 10). If `seconds=0`, updates continue indefinitely until stopped with a keyboard interrupt (typically `Ctrl-C`).

In [16]:
hpc_job.updates(seconds=5)

     10 4383.5069 5.02261364210-03 4.6229272130-03  0.006169338248  0.004148496602     12     30   10
 doing BGC with MARBL
     11 4383.5076 5.02524104236-03 4.6174088782-03  0.006089739153  0.004130219590     12     30   10
 doing BGC with MARBL
     12 4383.5083 5.02943938880-03 4.6138698068-03  0.006006833682  0.004103156765     12     30   10
 doing BGC with MARBL
     13 4383.5090 5.03496502648-03 4.6120087550-03  0.005923808501  0.003714023558     12     30   11
 doing BGC with MARBL
     14 4383.5097 5.04151895536-03 4.6114496961-03  0.005847022951  0.003678689848     12     30   11
 doing BGC with MARBL
     15 4383.5104 5.04862756371-03 4.6117434009-03  0.005773042437  0.003638573100     12     30   11
 doing BGC with MARBL
     16 4383.5111 5.05595116522-03 4.6124263070-03  0.005702953333  0.003595429362     12     30   11
 doing BGC with MARBL
     17 4383.5118 5.06309274221-03 4.6130131036-03  0.005638502215  0.003550950282     12     30   11
 doing BGC with MARBL
     18 
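
To illustrate the idea behind this kind of streaming, the sketch below follows a file and collects lines appended to it for a fixed number of seconds. It is not C-Star's implementation of `updates()`: `follow_file` is a hypothetical function, and the temporary file and writer thread exist only to make the example self-contained.

```python
import os
import tempfile
import threading
import time


def follow_file(path, seconds=10.0, poll_interval=0.1):
    """Yield lines appended to `path` for roughly `seconds` seconds.

    Hypothetical sketch: starts at the current end of the file, so only
    new content is reported. Assumes the writer emits whole lines at once.
    """
    deadline = time.monotonic() + seconds
    with open(path, "r") as f:
        f.seek(0, os.SEEK_END)  # skip what is already in the file
        while time.monotonic() < deadline:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_interval)  # nothing new yet; wait and retry


def _write_lines(path):
    """Simulate a running job appending to its output file."""
    with open(path, "a") as f:
        for step in range(3):
            time.sleep(0.2)
            f.write(f"step {step}\n")
            f.flush()


# Create a file that already has content, which should not be reported.
with tempfile.NamedTemporaryFile("w", suffix=".out", delete=False) as tmp:
    tmp.write("old line\n")

writer = threading.Thread(target=_write_lines, args=(tmp.name,))
writer.start()
new_lines = list(follow_file(tmp.name, seconds=2.0))
writer.join()
os.remove(tmp.name)
print(new_lines)
```

Polling with a short sleep keeps the sketch portable; it reports only lines written after the call begins, which matches the "new lines" behaviour described above.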

### Cancelling a job
We can cancel the job using the `cancel` method:

In [17]:
hpc_job.cancel()

Job 34020872 cancelled


In [18]:
hpc_job.status

<JobStatus.CANCELLED: 5>