## Submit notebook to SLURM

This notebook lets you:
- load a template notebook and its settings as a dictionary
- change the starting settings, e.g. to point to a different dataset
- save the result as a new notebook
- submit it as a job to the SLURM cluster.

  **Note: You need to have set up your SSH key to access WILSON before doing this!**

Creator(s):
- Mohsen Danaie (ePSIC) - mohsen.danaie@diamond.ac.uk

In [18]:
# __future__ imports come first so the cell also works as a plain script
from __future__ import print_function, unicode_literals
import glob
import os
import pprint
import re
import subprocess
import sys

# Make the ePSIC tools importable before importing them
sys.path.append('/dls_sw/e02/software/epsic_tools')
import epsic_tools.api as ep

In [19]:
starting_notebook_path = '/dls/science/groups/e02/Mohsen/code/jupyterhub_active/user_visits_notebooks/MG38764-5_Liddy_GasCell'
starting_notebook_name = 'import_K3-EELS_align_template'
nb = ep.notebook_utils.NotebookHelper(starting_notebook_path, starting_notebook_name)

The template notebook should hold the starting settings in the **third cell (index 2)** as a markdown cell (the index can easily be changed), with the following format:

property_name=string

Example:

images_path=/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (1)/STEM SI_HAADF Image.dm4

**Put each setting on its own line, with no spaces around the = sign.**

After reading and modifying these settings, we inject the new settings into an empty cell at index 3 (the fourth cell from the top). It is therefore important to keep that cell empty in the template notebook!
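The `property_name=string` format described above can be parsed with a few lines of plain Python. The sketch below is not the `epsic_tools` implementation; `parse_settings` and `format_settings` are hypothetical helper names used only for illustration. It splits on the first `=` only, so a value that itself contains `=` survives, and it skips blank lines.

```python
def parse_settings(text):
    """Parse 'key=value' lines into a dict (hypothetical helper, not part of epsic_tools)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between settings
        key, sep, value = line.partition('=')  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        settings[key] = value
    return settings


def format_settings(settings):
    """Turn a dict back into the one-setting-per-line text form."""
    return '\n'.join(f"{key}={value}" for key, value in settings.items())


example = (
    "images_path=/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (1)/STEM SI_HAADF Image.dm4\n"
    "save_path=/dls/e01/data/2025/mg38764-5/processing/SI(1)"
)
parsed = parse_settings(example)
print(parsed['save_path'])
```

Because values are kept as plain strings, numeric settings still need converting inside the template notebook itself.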

In [20]:
old_settings = nb.get_settings(2)  # settings should be in cell index 2
old_settings = [line for line in old_settings.split('\n') if line.strip()]  # skip blank lines
# Split on the first '=' only, so values containing '=' stay intact
old_keys = [i.split('=', 1)[0] for i in old_settings]
old_vals = [i.split('=', 1)[1] for i in old_settings]
old_dict = dict(zip(old_keys, old_vals))
pprint.pprint(old_dict)

2025-04-16 12:11:05,756:/dls_sw/e02/software/epsic_tools/epsic_tools/toolbox/notebook_utils.py:31:importing jupyter code
2025-04-16 12:11:05,759:/dls_sw/e02/software/epsic_tools/epsic_tools/toolbox/notebook_utils.py:34:reading notebook from /dls/science/groups/e02/Mohsen/code/jupyterhub_active/user_visits_notebooks/MG38764-5_Liddy_GasCell/import_K3-EELS_align_template.ipynb


{'cal_hl_path': '/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (4)/STEM '
                'SI_EELS HL SI.dm4',
 'cal_ll_path': '/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (4)/STEM '
                'SI_EELS LL SI.dm4',
 'file_path_HL': '/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (1)/STEM '
                 'SI_EELS HL SI.dm5',
 'file_path_LL': '/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (1)/STEM '
                 'SI_EELS LL SI.dm5',
 'images_path': '/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (1)/STEM '
                'SI_HAADF Image.dm4',
 'save_path': '/dls/e01/data/2025/mg38764-5/processing/SI(1)'}


In [21]:
old_settings

['file_path_HL=/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (1)/STEM SI_EELS HL SI.dm5',
 'file_path_LL=/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (1)/STEM SI_EELS LL SI.dm5',
 'images_path=/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (1)/STEM SI_HAADF Image.dm4',
 'cal_ll_path=/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (4)/STEM SI_EELS LL SI.dm4',
 'cal_hl_path=/dls/e01/data/2025/mg38764-5/raw/SCD2070/InSitu (4)/STEM SI_EELS HL SI.dm4',
 'save_path=/dls/e01/data/2025/mg38764-5/processing/SI(1)']

In [22]:
old_keys

['file_path_HL',
 'file_path_LL',
 'images_path',
 'cal_ll_path',
 'cal_hl_path',
 'save_path']

Once the job runs, the path defined below will contain the job's logs (error and output), the bash script defining the job, the modified notebook, the Python script generated from the notebook, and a separate text file with all the outputs from the notebook cells (file name: **log_nb.out**).
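After a job has finished, it can be handy to check which of these files actually exist. The helper below is a minimal sketch; `list_job_outputs` is a hypothetical name, and the file names are the ones used in the submission cell of this notebook.

```python
import glob
import os


def list_job_outputs(save_path):
    """Return the job files found in save_path, grouped by kind (illustrative helper)."""
    patterns = {
        'slurm_error': 'error_*.out',    # written via '#SBATCH --error'
        'slurm_output': 'output_*.out',  # written via '#SBATCH --output'
        'notebook_log': 'log_nb.out',    # stdout/stderr of the converted script
        'bash_script': 'cluster_submit.sh',
        'notebook': 'submitted_version.ipynb',
        'script': 'submitted_version.py',
    }
    # glob.escape protects any special characters in the folder name itself
    base = glob.escape(save_path)
    return {kind: sorted(glob.glob(os.path.join(base, pattern)))
            for kind, pattern in patterns.items()}
```

An empty list for a given kind means that file has not been written (yet), for example because the job is still queued.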

In [23]:
save_path_new = '/dls/e01/data/2025/mg38764-5/processing/SI(1)_test'
os.makedirs(save_path_new, exist_ok=True)  # no error if the folder already exists

In [24]:
# Submit as script to get output
save_path = save_path_new  # job files go to the folder created above
new_notebook_path = os.path.join(save_path, 'submitted_version.ipynb')
new_text_path = os.path.join(save_path, 'submitted_version.txt')
new_script_path = os.path.join(save_path, 'submitted_version.py')
new_setting = old_dict.copy()
# Here we are changing a setting:
new_setting['save_path'] = save_path_new

# With the line below we are injecting the new settings into the modified notebook
nb.set_settings(new_setting, new_notebook_path, blank_cell_index=3)

bash_script_path = os.path.join(save_path, 'cluster_submit.sh')
with open (bash_script_path, 'w') as f:
    f.write('''#!/usr/bin/env bash
#SBATCH --partition=cs04r
#SBATCH --job-name epsic_notebook
#SBATCH --time=01:00:00
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=2
#SBATCH --mem=60G
'''
f"#SBATCH --error={save_path}{os.sep}error_%j.out\n"
f"#SBATCH --output={save_path}{os.sep}output_%j.out\n"
f"module load python/epsic3.10\n"
f"jupyter nbconvert --to notebook --inplace --ClearMetadataPreprocessor.enabled=True '{new_notebook_path}'\n"
f"jupyter nbconvert --to script '{new_notebook_path}' \n"
f"mv '{new_text_path}' '{new_script_path}' \n"
f"python '{new_script_path}' &> '{save_path}{os.sep}log_nb.out'\n"
           )


# SUBMIT!
sshProcess = subprocess.Popen(['ssh',
                               '-tt',
                               'wilson'],
                               stdin=subprocess.PIPE, 
                               stdout = subprocess.PIPE,
                               universal_newlines=True,
                               bufsize=0)
sshProcess.stdin.write("ls .\n")
sshProcess.stdin.write("echo END\n")
sshProcess.stdin.write(f"sbatch '{bash_script_path}'\n")
sshProcess.stdin.write("uptime\n")
sshProcess.stdin.write("logout\n")
sshProcess.stdin.close()


for line in sshProcess.stdout:
    if line == "END\n":
        break
    print(line,end="")

#to catch the lines up to logout
for line in  sshProcess.stdout: 
    print(line,end="")


2025-04-16 12:11:11,610:/dls_sw/e02/software/epsic_tools/epsic_tools/toolbox/notebook_utils.py:76:importing jupyter code
2025-04-16 12:11:11,611:/dls_sw/e02/software/epsic_tools/epsic_tools/toolbox/notebook_utils.py:79:reading notebook from /dls/science/groups/e02/Mohsen/code/jupyterhub_active/user_visits_notebooks/MG38764-5_Liddy_GasCell/import_K3-EELS_align_template.ipynb


ls .
echo END
sbatch '/dls/e01/data/2025/mg38764-5/processing/SI(1)_test/cluster_submit.sh'
uptime
logout
 ╔════════════════════════════════════════════════════════════════╗
 ║ Welcome To Wilson - The DLS Slurm Cluster                      ║
 ║ For Help or Support - Visit: https://schelpdesk.diamond.ac.uk  ║
 ╠════════════════════════════════════════════════════════════════╣
 ║ Please Refrain From Running any Task Directly on This Node.    ║
 ║ It is a Submission Only node.                                  ║
 ╠════════════════════════════════════════════════════════════════╣
 ║ For jobs needing to access the fast ethernet filesystem, please║
 ║ use the cs04r partition. For the Infiniband filesystem, please ║
 ║ use the cs05r partition status.                                ║
 ║                                                                ║
 ║ To get an interactive session on a cs04r partition node enter  ║
 ║ "iact02", for a cs05r partition node enter "iact03"  

Connection to wilson closed.
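The interactive session above echoes every command and the login banner. If only the job ID is of interest, a non-interactive call is one alternative. This is a sketch under assumptions rather than the method used in this notebook: it assumes `sbatch` is on the `PATH` of a non-interactive SSH session on `wilson`, which may not hold without a login shell. `build_submit_command`, `parse_job_id` and `submit` are hypothetical helper names.

```python
import re
import shlex
import subprocess


def build_submit_command(bash_script_path, host='wilson'):
    """Build an ssh command that runs sbatch on the submission node."""
    # The remote shell re-parses the command string, so quote the path for it.
    return ['ssh', host, 'sbatch ' + shlex.quote(bash_script_path)]


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' line, or None."""
    match = re.search(r'Submitted batch job (\d+)', sbatch_output)
    return int(match.group(1)) if match else None


def submit(bash_script_path, host='wilson'):
    """Submit the script and return the job ID (raises if ssh or sbatch fails)."""
    result = subprocess.run(build_submit_command(bash_script_path, host),
                            capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)
```

Calling `submit(bash_script_path)` would then return an integer job ID that can be passed to tools such as `squeue` or `scancel`.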


## Below is an example of submitting one job per dataset for multiple datasets

In [61]:
# Create a list of the datasets to submit this notebook for
base_path = '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3'
data_files = glob.glob(base_path+ '/*/*.hdf5')
len(data_files)

8

In [62]:
data_files

['/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_140507/20230724_140507.hdf5',
 '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_134213/20230724_134213.hdf5',
 '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_141239/20230724_141239.hdf5',
 '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_135709/20230724_135709.hdf5',
 '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_141931/20230724_141931.hdf5',
 '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_143414/20230724_143414.hdf5',
 '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_142627/20230724_142627.hdf5',
 '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_135013/20230724_135013.hdf5']

In [63]:
for file in data_files:
    new_analysis_folder_name = 'ACOM_analysis_compare_with_p4mbm_better_peak_finding_v1'
    save_path = os.path.join(os.path.dirname(file), new_analysis_folder_name)
    print(save_path)
    os.makedirs(save_path, exist_ok=True)  # no error if the folder already exists

    # update the settings
    new_setting = old_dict.copy()
    new_setting['raw_data_path'] = file
    # new_setting['corr_factor'] = 1.2
    # new_setting['cif_file2_path'] = '/dls/e02/data/2023/mg34931-2/processing/notebooks/MFAPbI3_2300363_Chen_2023_JACS_noH.cif'
    new_setting['save_path_name'] = new_analysis_folder_name
    pprint.pprint(new_setting)

    new_notebook_path = os.path.join(save_path, 'submitted_version_two_crystals.ipynb')
    nb.set_settings(new_setting, new_notebook_path)

    
    # Create a bash script to submit to SLURM 
    bash_script_path = os.path.join(save_path, 'cluster_submit.sh')
    with open (bash_script_path, 'w') as f:
        f.write('''#!/usr/bin/env bash
#SBATCH --partition=cs05r
#SBATCH --job-name epsic_notebook
#SBATCH --time=05:00:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node 1
#SBATCH --tasks-per-node=1
#SBATCH --mem=60G
'''
f"#SBATCH --error={save_path}{os.sep}error_%j.out\n"
f"#SBATCH --output={save_path}{os.sep}output_%j.out\n"
f"module load python/epsic3.10\n"
# Quote the notebook path so that spaces or parentheses in it survive the shell
f"jupyter nbconvert --to notebook --inplace --ClearMetadataPreprocessor.enabled=True '{new_notebook_path}'\n"
f"jupyter nbconvert --to notebook --allow-errors --execute '{new_notebook_path}'\n"
           )


    # SUBMIT!
    sshProcess = subprocess.Popen(['ssh',
                                   '-tt',
                                   'wilson'],
                                   stdin=subprocess.PIPE, 
                                   stdout = subprocess.PIPE,
                                   universal_newlines=True,
                                   bufsize=0)
    sshProcess.stdin.write("ls .\n")
    sshProcess.stdin.write("echo END\n")
    sshProcess.stdin.write(f"sbatch '{bash_script_path}'\n")  # quoted, as in the single-job cell
    sshProcess.stdin.write("uptime\n")
    sshProcess.stdin.write("logout\n")
    sshProcess.stdin.close()
    
    
    for line in sshProcess.stdout:
        if line == "END\n":
            break
        print(line,end="")
    
    #to catch the lines up to logout
    for line in  sshProcess.stdout: 
        print(line,end="")


/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_140507/ACOM_analysis_compare_with_p4mbm_better_peak_finding_v1
{'cif_file1_path': '/dls/e02/data/2023/mg34931-2/processing/notebooks/FAPbI3_mod_simple.cif',
 'cif_file2_path': '/dls/science/groups/e02/Mohsen/code/jupyterhub_active/user_visits_notebooks/mg34931-2_Jihoo/cif_files/p4mbm.cif',
 'corr_factor': '1.1',
 'crop_q': '130',
 'fill_cross': '1',
 'hot_pix_thresh': '2',
 'load_prepared_data': '0',
 'prepared_data_path': '',
 'probe_path': '',
 'raw_data_path': '/dls/e02/data/2023/mg34931-2/processing/Merlin/FA_50pct_3/20230724_140507/20230724_140507.hdf5',
 'save_path_name': 'ACOM_analysis_compare_with_p4mbm_better_peak_finding_v1',
 'syn_probe_rad': '3',
 'syn_probe_width': '3',
 'synthetic_probe': '1',
 'v_max': '0.1',
 'v_min': '0.01'}

 _______   __        ______         __    __  _______    ______
|       \ |  \      /      \       |  \  |  \|       \  /      \
| $$$$$$$\| $$     |  $$$$$$\      | $$  | $$| $$$$

Connection to wilson closed.



 _______   __        ______         __    __  _______    ______
|       \ |  \      /      \       |  \  |  \|       \  /      \
| $$$$$$$\| $$     |  $$$$$$\      | $$  | $$| $$$$$$$\|  $$$$$$\
| $$  | $$| $$     | $$___\$$      | $$__| $$| $$__/ $$| $$   \$$
| $$  | $$| $$      \$$    \       | $$    $$| $$    $$| $$
| $$  | $$| $$      _\$$$$$$\      | $$$$$$$$| $$$$$$$ | $$   __
| $$__/ $$| $$_____|  \__| $$      | $$  | $$| $$      | $$__/  \
| $$    $$| $$     \\$$    $$      | $$  | $$| $$       \$$    $$
 \$$$$$$$  \$$$$$$$$ \$$$$$$        \$$   \$$ \$$        \$$$$$$

Welcome To Wilson - The DLS Slurm Cluster
For Help or Support - Visit: https://schelpdesk.diamond.ac.uk

Please Refrain From Running Any Tasks Directly On This Node - It Is A Submission Node Only

For jobs needing to access the GPFS02 filesystem please use the cs04r partition, for the GPFS03 filesystem please use the cs05r partition status

To get an interactive session on a GPFS02 node enter "iact02", for a GPF

Connection to wilson closed.
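The loop above submits one independent job per dataset. SLURM also offers job arrays (`--array`), where a single `sbatch` call starts one task per index. The sketch below only builds the script text and is not part of `epsic_tools`: `build_array_script` is a hypothetical helper, and it assumes the notebook paths have been written one per line to a list file. The resource lines are copied from the cell above.

```python
import shlex


def build_array_script(list_file, n_tasks, log_dir):
    """Return sbatch script text running one notebook per array task (illustrative)."""
    if n_tasks < 1:
        raise ValueError("need at least one task")
    lines = [
        "#!/usr/bin/env bash",
        "#SBATCH --partition=cs05r",
        "#SBATCH --job-name epsic_notebook",
        "#SBATCH --time=05:00:00",
        "#SBATCH --nodes=1",
        "#SBATCH --gpus-per-node 1",
        "#SBATCH --mem=60G",
        f"#SBATCH --array=0-{n_tasks - 1}",
        # %A is the array job ID, %a the task index
        f"#SBATCH --error={log_dir}/error_%A_%a.out",
        f"#SBATCH --output={log_dir}/output_%A_%a.out",
        "module load python/epsic3.10",
        # sed line numbers start at 1, array indices at 0
        f'NOTEBOOK=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" {shlex.quote(list_file)})',
        'jupyter nbconvert --to notebook --allow-errors --execute "$NOTEBOOK"',
    ]
    return "\n".join(lines) + "\n"
```

With this approach the notebooks would still be prepared in the loop with `nb.set_settings`, but only one script is written and submitted, and the whole set of tasks shares a single array job ID.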



