## Generating multiple SLURM scripts

This notebook shows how to generate one or more SLURM scripts for use on a supercomputer that runs the SLURM queue system. If you want to use it, **you will need to heavily edit it to suit your own particular setup.**

This notebook assumes that you have a local copy of the input files on the machine where you are running it. If you don't, you will need to edit it further.

In the case presented below, there are 16 folders each containing ~100 sets of input files for Sorcha. An example of a folder layout and file/folder naming system is shown below.

![Example folder layout and file/folder naming system](example_file_structure.png)

In [None]:
import glob
import os
import re

Below are defined a number of parameters, most of which go into the header of the SLURM scripts. You will likely need to edit these to match your own preferences.

The first parameter, number_of_files, controls how many SLURM scripts are generated, one per input folder. It's perfectly fine to set it to 1 if you only need one.

In [None]:
number_of_files = 16
filename = 'batch_script'

job_name = 'Sorcha_batch'
ntasks = '100'
mem_per_cpu = '7G'
output_path = 'path/to/terminal/output'
time = '3:00:00'
partition = 'your_partition'

inputs_in should be the folder prefix under which your input files are **currently** located, i.e. on the machine where you are running this notebook. The folder number is appended to this prefix, and the code uses the resulting path to get the list of input filenames. If you are running this notebook on the supercomputer itself, inputs_in is the same path as inputs_out below.

In this example, the expectation is that the input folders are called './dp03_inputs_kelvin/kelvin_dp03_batch_1', './dp03_inputs_kelvin/kelvin_dp03_batch_2', './dp03_inputs_kelvin/kelvin_dp03_batch_3', etc, as shown in the above graphic.

In [None]:
inputs_in = './dp03_inputs_kelvin/kelvin_dp03_batch_'
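
The generation loop at the end of this notebook assumes the input folders are numbered 1 to number_of_files with no gaps. As a rough sketch, the check below lists which numbered folders actually exist for a given prefix. The helper name `find_batch_indices` is hypothetical and not part of this notebook or of Sorcha.

```python
import glob
import os
import re


def find_batch_indices(prefix):
    """Return the sorted integer suffixes of folders named prefix + <integer>.

    Hypothetical helper: 'prefix' plays the role of inputs_in above.
    """
    base = os.path.basename(prefix)
    pattern = re.compile(re.escape(base) + r"(\d+)$")
    indices = []
    # glob.escape guards against special characters in the prefix itself
    for path in glob.glob(glob.escape(prefix) + "*"):
        match = pattern.match(os.path.basename(path))
        if match and os.path.isdir(path):
            indices.append(int(match.group(1)))
    return sorted(indices)


# Prints an empty list if no matching folders exist on this machine.
print(find_batch_indices('./dp03_inputs_kelvin/kelvin_dp03_batch_'))
```

If the returned list is not the same as `list(range(1, number_of_files + 1))`, some of the generated scripts would point at folders that do not exist.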

The below parameters define where the inputs, configuration file, output folder, pointing database and SPICE files are located on the machine on which you will be running Sorcha. Edit these to where these files and folders will be located on your supercomputer.

In [None]:
inputs_out = '/supercomputer_inputs_location/sorcha_batch_'
config = '/supercomputer_inputs_location/sorcha_config.ini'
outputs_out = '/supercomputer_outputs_location/sorcha_batch_'
pointing = '/supercomputer_outputs_location/baseline_v2.0_10yrs.db'
ar_data_path = '/supercomputer_cache_location/sorcha_cache_files'

The below function creates the header of the SLURM scripts, including any introductory commands such as loading Anaconda and activating the correct Conda environment. Once again, you will likely need to edit this heavily for your own setup.

In [None]:
def print_header(filename, n, job_name, ntasks, mem_per_cpu, output_path, time, partition):

    with open(filename, "w") as the_file: # "w" starts each script afresh, so re-running the notebook does not append to an old copy
        the_file.write("#!/bin/bash\n")
        the_file.write("#SBATCH --job-name=" + job_name + str(n) + "\n")
        the_file.write("#SBATCH --ntasks=" + ntasks + "\n")
        the_file.write("#SBATCH --mem-per-cpu=" + mem_per_cpu + "\n")
        the_file.write("#SBATCH --cpus-per-task=1\n")
        the_file.write("#SBATCH --output=" + os.path.join(output_path, job_name + str(n) + ".out") + "\n") # join so the log lands inside output_path
        the_file.write("#SBATCH --time=" + time + "\n")
        the_file.write("#SBATCH --partition=" + partition + "\n")
        the_file.write("#SBATCH --mail-user=YOUR EMAIL ADDRESS GOES HERE\n") # put your own email address in here!!
        the_file.write("#SBATCH --mail-type=BEGIN,FAIL,END\n")
        the_file.write("\n")
        the_file.write("dt=$(date '+%d/%m/%Y %H:%M:%S');\n")
        the_file.write("echo \"$dt Beginning Sorcha.\"\n")
        the_file.write("\n")
        the_file.write("module load apps/anaconda3/2022.10/bin\n")
        the_file.write("\n")
        the_file.write("source activate sorcha\n")
        the_file.write("\n")
        the_file.write("\n")
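
To preview the kind of directive block the header function writes, without creating any files, the sketch below formats a dictionary of options into `#SBATCH` lines. The helper `format_sbatch_directives` is hypothetical and not used elsewhere in this notebook. The values are copied from the parameter cell above, with the job name shown as it would appear for the first script.

```python
def format_sbatch_directives(options):
    """Turn a dict of long option names and values into '#SBATCH --name=value' lines."""
    return ["#SBATCH --{}={}".format(name, value) for name, value in options.items()]


# Example values taken from the parameter cell; edit to match your own setup.
example = {
    "job-name": "Sorcha_batch1",
    "ntasks": "100",
    "mem-per-cpu": "7G",
    "cpus-per-task": 1,
    "time": "3:00:00",
    "partition": "your_partition",
}

print("\n".join(["#!/bin/bash"] + format_sbatch_directives(example)))
```

Checking the printed lines against your cluster's documentation before submitting anything is usually quicker than debugging a rejected job.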

The below prints a footer to the SLURM scripts, which you can also edit if you like.

In [None]:
def print_footer(filename):
    
    with open(filename, "a") as the_file:
        the_file.write("\n")
        the_file.write("dt=$(date '+%d/%m/%Y %H:%M:%S');\n")
        the_file.write("echo \"$dt Sorcha complete.\"\n")

The below function shouldn't need to be edited.

In [None]:
def get_sorted_list_of_files(filepath, stem):
    """Globs for a list of files using the suggested filepath and stem (which should
    include wildcards) then sorts the list. If no files are found, a message is
    printed and an empty list is returned.

    Parameters:
    -----------
    filepath (string): filepath of folder where files are located

    stem (string): string containing filename pattern to search for

    Returns:
    -----------
    globbed_list (list): sorted list of filename strings

    """

    globbed_list = glob.glob(os.path.join(filepath, stem))
    globbed_list.sort()

    if not globbed_list:
        print("Could not find any files on given input path {} using stem {}.".format(filepath, stem))

    return globbed_list
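
Note that `list.sort()` orders strings lexicographically, so a file ending in `_10` sorts before one ending in `_2`. The orbit and parameter lists are sorted the same way, so this mainly affects the order of commands in the script. If numeric order matters to you, a sort key along these lines is one option. The helper name `natural_sort_key` and the example filenames are hypothetical.

```python
import re


def natural_sort_key(path):
    """Split a name into text and integer chunks so that '_10' sorts after '_2'."""
    chunks = re.split(r"(\d+)", path)
    # re.split with a capturing group puts the digit runs at the odd positions
    return [int(chunk) if i % 2 else chunk.lower() for i, chunk in enumerate(chunks)]


names = ["batch_1_orbit_10.csv", "batch_1_orbit_2.csv", "batch_1_orbit_1.csv"]

print(sorted(names))                        # plain lexicographic order
print(sorted(names, key=natural_sort_key))  # numeric-aware order
```

To use it, the `globbed_list.sort()` call could be given `key=natural_sort_key`.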

The below function may need to be edited. It assumes that your input files take a specific format where the orbit files contain the pattern \*orbit\* and the physical parameters files contain the pattern \*physical\*. You may also wish to edit Sorcha's command line arguments here.

In [None]:
def add_SLURM_commands(filename, n, inputs_in, inputs_out, config, outputs_out, pointing):

    sorcha_base_command = "srun --exclusive -N1 -n1 -c1 sorcha" # you may want to edit this if you know what you're doing

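    orbits = get_sorted_list_of_files(inputs_in+str(n), '*orbit*') # edit these two lines if your files have a different naming pattern
    params = get_sorted_list_of_files(inputs_in+str(n), '*physical*')

    # the loop below pairs the two lists by position, so they must be the same length
    if len(orbits) != len(params):
        raise ValueError("Found {} orbit files but {} physical parameter files in {}.".format(len(orbits), len(params), inputs_in+str(n)))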

    for i, orbits_fn in enumerate(orbits):

        root_fn = os.path.basename(os.path.splitext(orbits_fn)[0]).replace('_orbit', '')

        params_fn_new = os.path.join(inputs_out+str(n), os.path.basename(params[i]))
        orbits_fn_new = os.path.join(inputs_out+str(n), os.path.basename(orbits_fn))

        output_folder = os.path.join(outputs_out+str(n), root_fn)
        mkdir_command = " ".join(["mkdir", "-p", output_folder]) # -p creates missing parent folders and does not fail if the folder already exists

        full_command = [
            sorcha_base_command, # you may want to edit the command line arguments for Sorcha
            "-c",
            config,
            "-ob",
            orbits_fn_new,
            "-p",
            params_fn_new,
            "-pd",
            pointing,
            "-o",
            output_folder,
            "-t",
            "_".join(['SorchaOutput', root_fn]),
        ]

        #ephem_out = os.path.join(output_folder, "_".join(["ephem", root_fn + ".txt"]))
        #full_command.extend(["-ew", ephem_out])

        full_command.extend(["-ar", ar_data_path]) # ar_data_path is read from the notebook's global scope rather than passed in as an argument

        command_out = " ".join(full_command)

        with open(filename, "a") as the_file:
            the_file.write(mkdir_command + "\n")
            the_file.write(command_out + " & \n")

    with open(filename, "a") as the_file:
        the_file.write("wait\n")
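
The function above pairs orbit and parameter files by their position in the two sorted lists. If your folders might contain unmatched files, pairing by a shared root name is a more defensive alternative. The sketch below assumes that removing `_orbit` and `_physical` from the two filenames leaves identical roots. That is an assumption about the naming scheme, and `pair_by_root` is a hypothetical helper, not part of this notebook.

```python
import os


def pair_by_root(orbits, params, orbit_tag="_orbit", param_tag="_physical"):
    """Pair orbit and parameter files whose names match once the tags are removed.

    Assumes (hypothetically) that the two filenames differ only by their tag.
    Returns (pairs, unmatched_orbits).
    """
    def root(path, tag):
        return os.path.basename(os.path.splitext(path)[0]).replace(tag, "")

    params_by_root = {root(p, param_tag): p for p in params}
    pairs, unmatched = [], []
    for orbits_fn in orbits:
        key = root(orbits_fn, orbit_tag)
        if key in params_by_root:
            pairs.append((orbits_fn, params_by_root[key]))
        else:
            unmatched.append(orbits_fn)
    return pairs, unmatched


# Hypothetical filenames: one orbit file has no matching parameter file.
pairs, unmatched = pair_by_root(
    ["in/a_orbit_1.csv", "in/a_orbit_2.csv", "in/a_orbit_3.csv"],
    ["in/a_physical_2.csv", "in/a_physical_1.csv"],
)
print(pairs)
print(unmatched)
```

Any orbit file reported in `unmatched` can then be skipped or investigated before a script is written for it.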

Run the below cell to generate your SLURM scripts.

In [None]:
for i in range(1, number_of_files+1):

    script_filename = filename+str(i)+'.sh'
    
    print_header(script_filename, i, job_name, ntasks, mem_per_cpu, output_path, time, partition)
    add_SLURM_commands(script_filename, i, inputs_in, inputs_out, config, outputs_out, pointing)
    print_footer(script_filename)
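
Once the scripts have been copied to the supercomputer, each one is submitted with `sbatch`. A minimal sketch of doing this from Python is shown below. It assumes it is run on the machine where `sbatch` is available and where the scripts live. `submit_scripts` is a hypothetical helper and defaults to a dry run that only prints the commands.

```python
import subprocess


def submit_scripts(filename, number_of_files, dry_run=True):
    """Build, and optionally run, one 'sbatch <script>' command per generated script."""
    commands = [["sbatch", "{}{}.sh".format(filename, i)] for i in range(1, number_of_files + 1)]
    for command in commands:
        if dry_run:
            print(" ".join(command))  # show what would be submitted
        else:
            subprocess.run(command, check=True)  # raises if sbatch exits with an error
    return commands


# Dry run for three scripts using the filename stem from the parameter cell.
submit_scripts("batch_script", 3)
```

Pass `dry_run=False` only once you have checked the generated scripts by eye.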