# Compiling and Running Programs on an HPC System

This notebook will guide you through the steps necessary to compile and run a computationally intensive C program on a High-Performance Computing (HPC) system. We will cover both basic and advanced topics, focusing on using specific compilers and modules available on the HPC.

## Why Use an HPC for Compiling?

Compiling and running programs on an HPC system can significantly enhance performance for compute-intensive tasks. This is due to several advantages that HPC systems provide:
- **Access to specialized compilers and libraries:** Optimized to exploit the hardware capabilities like multiple cores, high-performance GPUs, and fast interconnects.
- **Module systems for easy software management:** Allows users to easily load and switch between different software environments and libraries needed for different applications.
- **Enhanced computational power:** With more processors, memory, and storage than a typical desktop or laptop, HPC systems can handle much larger computations.

## Example Program: `calculate_pi.c`

Instead of a simple hello world program, we will use a more complex C program that calculates the value of Pi using the Monte Carlo method. This method involves simulating random points and assessing how many fall within a quarter circle inscribed in a unit square. The ratio of points inside the circle to the total points approximates Pi/4.
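
To make the method concrete, here is a minimal Python sketch of the same estimator. It is illustrative only: the function name `estimate_pi` and the fixed seed are assumptions added so the sketch is reproducible, and it is not a replacement for the C program used in the rest of this notebook.

```python
import random

def estimate_pi(samples: int, seed: int = 0) -> float:
    """Estimate Pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point lies inside the quarter circle
            inside += 1
    # inside / samples approximates Pi / 4, so scale by 4
    return 4.0 * inside / samples

print(estimate_pi(100_000))
```

Pure Python is far slower per sample than compiled C, which is one reason the heavy version of this computation is written in C.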

Here's the source code for `calculate_pi.c`:


In [17]:
import os

# Define the path for the C program file
c_program_path = "calculate_pi.c"

# Remove the existing file if it exists
if os.path.exists(c_program_path):
    os.remove(c_program_path)

# Create and write the C program
c_program = """
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <iterations>\\n", argv[0]);
        return 1;
    }
    
    int iterations = atoi(argv[1]);
    if (iterations <= 0) {
        fprintf(stderr, "Please provide a positive integer for iterations.\\n");
        return 1;
    }

    int inside = 0;
    double x, y, pi;

    srand(time(NULL)); // Seed the random number generator

    for (int i = 0; i < iterations; i++) {
        x = (double)rand() / RAND_MAX;
        y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1) {
            inside++;
        }
    }

    pi = (double)inside / iterations * 4;
    printf("Approximation of Pi: %f\\n", pi);

    return 0;
}
"""

# Write the C program to a file
with open(c_program_path, "w") as file:
    file.write(c_program)

print(f"Complex C program written to {c_program_path} with command-line argument support.")


Complex C program written to calculate_pi.c with command-line argument support.


### Compile the Program

HPC systems use module systems to manage software environments, so the usual first step is to load the appropriate compiler module (for example, `module load gcc`). The cell below calls `gcc` directly, which works when the compiler is already on the `PATH`; otherwise, load the module first.

Use the `gcc` command to compile `calculate_pi.c` and generate an executable named `calculate_pi`:

In [18]:
import subprocess
import os

# Compile the C program using gcc
compile_command = "gcc calculate_pi.c -o calculate_pi"
compile_process = subprocess.run(compile_command, shell=True, capture_output=True, text=True)

# Print the output and error (if any) after compilation attempt
print("Compiling the C program...")
if compile_process.stdout:
    print("Output:", compile_process.stdout)
if compile_process.stderr:
    print("Error:", compile_process.stderr)

# Check if the executable was created
if os.path.exists("calculate_pi"):
    print("Compilation successful, executable 'calculate_pi' created.")
else:
    print("Compilation failed.")


Compiling the C program...
Compilation successful, executable 'calculate_pi' created.
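
On clusters where compilers are only available through environment modules, the module load and the compile have to happen in the same shell. The sketch below only builds such a command string. The helper name `build_compile_command`, the module name `gcc`, and the `-O2` flag are assumptions, so check `module avail` on your system for the real module names.

```python
import shlex

def build_compile_command(source, output, modules=("gcc",), flags=("-O2",)):
    """Return one shell command that loads modules, then compiles."""
    loads = [f"module load {shlex.quote(m)}" for m in modules]
    compile_cmd = " ".join(
        ["gcc", *flags, shlex.quote(source), "-o", shlex.quote(output)]
    )
    # '&&' stops at the first step that reports a failure
    return " && ".join([*loads, compile_cmd])

command = build_compile_command("calculate_pi.c", "calculate_pi")
print(command)
# To execute it, a login shell is typically needed so that `module` is defined:
# subprocess.run(["bash", "-lc", command], capture_output=True, text=True)
```

Keeping the command construction separate from its execution makes it easy to inspect the exact line before running it on a shared system.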


### Run the Program

Now we will execute the program. 

**This run performs 100,000,000 iterations, so it may take a while. Be patient!**

Execute the program with the following command to see the output:


In [23]:
import subprocess

# Compile the C program first if it hasn't been compiled
compile_command = ["gcc", "calculate_pi.c", "-o", "calculate_pi"]
subprocess.run(compile_command)

# Run the compiled program
run_program = subprocess.run(["./calculate_pi", "100000000"], capture_output=True, text=True)

# Print the output of the program
print(run_program.stdout)
print(run_program.stderr)


Approximation of Pi: 3.141493
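
How close should such an estimate be? Each sample is a Bernoulli trial with success probability p = Pi/4, so the estimate 4 * inside / N has a standard error of 4 * sqrt(p * (1 - p) / N). The sketch below evaluates that formula. It assumes ideal, independent uniform samples, which `rand()` only approximates, and the function name `pi_standard_error` is an illustrative choice.

```python
import math

def pi_standard_error(samples: int) -> float:
    """Standard error of the Monte Carlo Pi estimate for a given sample count."""
    p = math.pi / 4  # probability that a point lands inside the quarter circle
    return 4.0 * math.sqrt(p * (1.0 - p) / samples)

for n in (10**6, 10**8):
    print(f"N = {n:>11,d}: standard error ~ {pi_standard_error(n):.6f}")
```

For 10^8 samples the standard error is roughly 1.6e-4, so the recorded result of 3.141493 (about 1e-4 away from Pi) is within the expected range. Because the error shrinks only with the square root of N, each extra digit of accuracy costs about 100 times more samples.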




# Resource managers and Slurm
## What is a Resource Manager?
An HPC system is made up of smaller constituent systems all working together. So far, all of our interactions have been with one computer, the login node of the cluster, because we have not yet used a _resource manager_. A _resource manager_ is a program that consists of a server, running on a head node, and any number of clients, running on worker nodes. The clients let worker nodes ask the head node for work, and the server hands out jobs to carry out. Almost all clusters have some form of resource manager, which allows users to submit jobs to the worker nodes and monitor them. Most resource managers also include a scheduling system that decides the order in which jobs run based on a number of parameters.




## What is an HPC Cluster?

A High-Performance Computing (HPC) cluster is a collection of interconnected computers that work together to perform complex computations. Each computer in the cluster is known as a node, and the nodes are connected through a high-speed network.

### Key Components of an HPC Cluster
- **Login Nodes**: Used for preparing jobs and submitting them to the scheduler. Not for running heavy computations.
- **Compute Nodes**: Dedicated to running computational jobs.
- **Scheduler**: Manages job submission, resource allocation, and job execution. SLURM is a popular scheduler.

## Introduction to SLURM

SLURM (Simple Linux Utility for Resource Management) is a powerful scheduler that helps manage resources and schedule jobs on an HPC cluster.

The following image describes the job flow of Slurm, a commonly used resource manager:

![SLURM architecture](https://slurm.schedmd.com/arch.gif)

### Main SLURM Commands
- `srun`: Run parallel jobs.
- `sbatch`: Submit a batch job script to the scheduler.
- `squeue`: View the job queue.
- `scancel`: Cancel a job.
- `sinfo`: View information about the nodes and partitions.

In this notebook, we will create, compile, and run a simple C program using SLURM.


# Understanding Cluster Configuration with `sinfo`

The `sinfo` command in SLURM provides detailed information about the current state of the nodes and partitions within the HPC cluster. This command is essential for users to understand the availability and status of resources before submitting jobs.

## Key Outputs of `sinfo`

- **PARTITION**: Shows the partition names.
- **AVAIL**: Indicates if the partition is available (`up`) or not (`down`).
- **TIMELIMIT**: Lists the maximum time that jobs are allowed to run in the partition.
- **NODES**: Shows the number of nodes in each state.
- **STATE**: Indicates the state of the nodes (e.g., `idle`, `alloc` for allocated, etc.).
- **NODELIST**: Provides the specific names or identifiers of the nodes.

By default, `sinfo` displays a brief summary. To get more detailed information, you can use various flags with this command.

## Example Commands

- `sinfo`: Provides a basic overview of the cluster.
- `sinfo -l`: Provides a detailed view.
- `sinfo -N`: Lists information node by node.
- `sinfo -s`: Displays a summary per partition, without per-state node details.

Let's run a basic `sinfo` command to see the current state of the cluster.


In [None]:
!sinfo
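
When the cluster state is needed inside Python rather than just on screen, the column layout described above can be parsed into dictionaries. The following is a minimal sketch: the function name `parse_sinfo` is an assumption, and the sample text is hypothetical rather than output from a real cluster.

```python
def parse_sinfo(text: str) -> list[dict]:
    """Turn whitespace-aligned `sinfo` output into a list of row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

# Hypothetical sample output; real partition and node names will differ.
sample = """\
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpubase_b*   up  infinite     1 alloc node1
cpubase_b*   up  infinite     1  idle node2
"""

rows = parse_sinfo(sample)
idle_nodes = sum(int(r["NODES"]) for r in rows if r["STATE"] == "idle")
print(f"Idle nodes: {idle_nodes}")
```

On a live system the text could come from `subprocess.run(["sinfo"], capture_output=True, text=True).stdout`. Node states can carry suffixes such as `*`, so an exact comparison against `"idle"` is a simplification.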

## Creating and Submitting a SLURM Job

Users submit jobs to a queue. The scheduler orders them according to priority rules set by the administrators and runs them on whatever compute resources become available.

**srun** submits a job for execution in real time, while **sbatch** submits a job script for later execution.

They both accept practically the same set of parameters. The main difference is that srun is interactive and blocking (you get the result in your terminal and you cannot enter other commands until it is finished), while sbatch is batch processing and non-blocking (results are written to a file and you can enter other commands right away).

If you run **srun** in the background with the `&` sign, it no longer blocks, but it is still interactive: its output will clutter your terminal, and the srun processes remain tied to your terminal. If you disconnect, you will lose control over them, or they might be killed (depending, basically, on whether they use stdout). They will also be killed if the machine from which you submit jobs is rebooted.

To run our compiled program on the HPC cluster, we need to create a SLURM job script. This script specifies the resources required and the commands to execute.

### SLURM Job Script Example
Below is a simple SLURM script that requests 1 compute node for 5 minutes and runs our `calculate_pi` executable.

```bash
#!/bin/bash
#SBATCH --job-name=calculate_pi
#SBATCH --output=calculate_pi.out
#SBATCH --error=calculate_pi.err
#SBATCH --time=00:05:00
#SBATCH --nodes=1
#SBATCH --mem=1G  # Allocates 1 GB of total memory to the job

# Load necessary modules
module load gcc

# Run the executable
srun ./calculate_pi 1000000000
```


In [24]:
import os

# Define the SLURM job script path
slurm_script_path = "calculate_pi.slurm"

# Remove existing SLURM script if it exists
if os.path.exists(slurm_script_path):
    os.remove(slurm_script_path)

# Create the SLURM job script with explicit path to bash
slurm_script = """#!/bin/bash
#SBATCH --job-name=calculate_pi
#SBATCH --output=calculate_pi.out
#SBATCH --error=calculate_pi.err
#SBATCH --time=00:05:00
#SBATCH --nodes=1
#SBATCH --mem=1G  # Allocates 1 GB of total memory to the job

# Load necessary modules
module load gcc

# Run the executable
srun ./calculate_pi 1000000000
"""

# Write the SLURM job script to a file
with open(slurm_script_path, "w") as file:
    file.write(slurm_script)

# Confirm the file has been written
print(f"SLURM job script written to {slurm_script_path}.")

# Make the script executable
os.chmod(slurm_script_path, 0o755)

# Read and print the contents of the SLURM job script
with open(slurm_script_path, "r") as file:
    script_content = file.read()

print("\nContents of the SLURM job script:")
print("----------------------------------")
print(script_content)




SLURM job script written to calculate_pi.slurm.

Contents of the SLURM job script:
----------------------------------
#!/bin/bash
#SBATCH --job-name=calculate_pi
#SBATCH --output=calculate_pi.out
#SBATCH --error=calculate_pi.err
#SBATCH --time=00:05:00
#SBATCH --nodes=1
#SBATCH --mem=1G  # Allocates 1 GB of total memory to the job

# Load necessary modules
module load gcc

# Run the executable
srun ./calculate_pi 1000000000
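
Because the job script is plain text, it can also be generated from parameters, which helps when the same program is submitted with different iteration counts or time limits. The helper below is a hypothetical convenience, not part of SLURM, and it only emits the `#SBATCH` options already used in this notebook.

```python
def make_job_script(job_name, command, time_limit="00:05:00",
                    nodes=1, mem="1G", modules=("gcc",)):
    """Build the text of a SLURM batch script from a few parameters."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --output={job_name}.out",
        f"#SBATCH --error={job_name}.err",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --mem={mem}",
        "",
    ]
    lines += [f"module load {m}" for m in modules]
    lines += ["", f"srun {command}"]
    return "\n".join(lines) + "\n"

script = make_job_script("calculate_pi", "./calculate_pi 1000000000")
print(script)
```

The returned string can be written to a file in the same way as the hand-written script above.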



### Submitting and Monitoring a SLURM Job in Jupyter Notebook

This section of the notebook demonstrates how to submit a SLURM job using the `sbatch` command and monitor its status using the `squeue` command. We will execute these commands directly from the Jupyter Notebook using the `!` syntax, which allows us to run shell commands in a more interactive manner.

#### Submitting the SLURM Job

We use the `sbatch` command to submit a job to the SLURM scheduler. The job script `calculate_pi.slurm` contains instructions for the SLURM workload manager on how to execute the task. This script specifies the resources needed and the executable to run.

#### Allowing Time for Job Queueing
To give the job time to appear in the queue before we check its status, we include a short delay using Python's `time.sleep()` function. SLURM may take a few moments to update the queue, especially on busy systems.

#### Checking the Job Status
After submitting the job, we use the `squeue` command to check the status of jobs in the queue. This command lists all jobs that are currently queued or running, allowing us to monitor the status of our job.


In [25]:
import time
import subprocess

# Submit the SLURM job using the `!` syntax for direct shell command execution
!sbatch {"calculate_pi.slurm"}

# Wait for a few seconds to ensure the job is queued
time.sleep(3)

# Check the status of the job queue
!squeue

Submitted batch job 47
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                31 cpubase_b spawner-  user001  R      49:39      1 node1
                47 cpubase_b calculat  user001  R       0:03      1 node2
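
`sbatch` reports the new job's ID in its `Submitted batch job <id>` message, while plain `squeue` lists every job in the queue. Capturing the ID makes it possible to query just that job with `squeue -j`. A minimal sketch follows; the function name `parse_job_id` is an assumption.

```python
import re

def parse_job_id(sbatch_output: str) -> int:
    """Extract the job ID from sbatch's 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"No job ID found in: {sbatch_output!r}")
    return int(match.group(1))

job_id = parse_job_id("Submitted batch job 47")
print(f"squeue -j {job_id}")
```

On a live system, the input string could come from `subprocess.run(["sbatch", "calculate_pi.slurm"], capture_output=True, text=True).stdout`.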


### Examining SLURM Job Output and Error Files

Once a SLURM job is submitted and executed, it generates output and error files specified in the job script. These files contain valuable information about the execution of the program, including any results printed to the console and any error messages that occurred during execution.

#### Understanding Output and Error Files

##### Output File (`calculate_pi.out`)

The output file contains the standard output from the program execution. This includes any `printf` statements or other console outputs generated by the C program. In our case, this file will contain the approximate value of Pi calculated by our program.

##### Error File (`calculate_pi.err`)

The error file captures anything the job writes to standard error. This includes runtime errors and warnings from the program, as well as messages from commands in the job script such as `module load` or `srun`.

#### Code to Display the Contents of Output and Error Files

Let's write code to read and display the contents of these files, allowing us to verify the results and diagnose any potential issues.


In [27]:
import os

# Paths to the output and error files
output_file = "calculate_pi.out"
error_file = "calculate_pi.err"

# Check and display the contents of the output file
if os.path.exists(output_file):
    print(f"\nContents of {output_file}:")
    print("----------------------------------")
    with open(output_file, "r") as file:
        output_content = file.read()
        print(output_content)
else:
    print(f"\n{output_file} does not exist.")

# Check and display the contents of the error file
if os.path.exists(error_file):
    print(f"\nContents of {error_file}:")
    print("----------------------------------")
    with open(error_file, "r") as file:
        error_content = file.read()
        print(error_content)
else:
    print(f"\n{error_file} does not exist.")



Contents of calculate_pi.out:
----------------------------------
Approximation of Pi: 3.141561


Contents of calculate_pi.err:
----------------------------------



# Understanding `srun` in SLURM

In SLURM, both `sbatch` and `srun` are used to execute jobs on an HPC cluster, but they serve different purposes and are used in distinct scenarios. Understanding when to use each command is essential for effective job management and resource utilization.

## `sbatch` vs. `srun`

### `sbatch`

- **Purpose**: Submits a batch job script to the scheduler to be executed at a later time when resources become available.
- **Usage**: Primarily used for batch processing of non-interactive tasks, where you write a script with job specifications and submit it to the queue.
- **Execution**: The job runs according to the specified resources and constraints in the SLURM script without user interaction during execution.

### `srun`

- **Purpose**: Launches parallel tasks and can be used for both interactive and non-interactive job execution.
- **Usage**: Often used for interactive jobs or to launch parallel tasks within an already scheduled job.
- **Execution**: `srun` can be used to run tasks interactively on compute nodes or to start tasks within a running job environment, providing more flexibility for dynamic task execution.

## When to Use `srun`

- **Interactive Jobs**: Use `srun` to start an interactive session on a compute node for testing, debugging, or running tasks interactively.
- **Within Scripts**: Use `srun` within an `sbatch` script to launch parallel tasks that require coordination across multiple CPUs or nodes.
- **Dynamic Execution**: Use `srun` to dynamically allocate resources and run tasks without needing to pre-write a batch script.

## Example Usage

The cell below demonstrates how to use `srun` to start a simple interactive job.



In [None]:
# Use srun to start an interactive session on a compute node
# Note: This command is typically run in a terminal, not directly executable in a Jupyter Notebook.

!srun --pty bash -i

# Explanation:
# --pty: Runs the task in a pseudo-terminal on the allocated compute node, allowing interactive command execution.
# bash -i: Starts an interactive bash shell session.


[user001@node1 hpc_lab_24]$

## Interactive SLURM Usage in Jupyter Terminal

This guide will help you explore SLURM commands interactively within a Jupyter terminal. By practicing these commands, you'll gain familiarity with job scheduling, monitoring, and resource management on an HPC cluster.

### 1. Access the Shell in Jupyter

#### Open a New Terminal

- **Open a Launcher**: Click on the `+` icon or `New Launcher` to open the launcher.
- **Select Terminal**: From the launcher, click on `Terminal` to open a new shell session. This terminal acts like a login node interface.

### 2. Run Basic Linux Commands

Before diving into SLURM, familiarize yourself with some basic Linux commands to navigate and manage your files.

- **List Files and Directories**:
  - Run the command `ls` to show the content of the current folder.

- **Print Current Directory**:
  - Run the command `pwd` to display the current directory path.

### 3. SLURM Commands for Job Management

Learn how to interact with SLURM to manage and monitor your computational jobs.

- **Check Available Partitions**:
  - Run the command `sinfo` to display available partitions and their status. This is useful for determining resource availability and node types.

- **View Job Queue**:
  - Run the command `squeue` to show the current job queue. This command displays jobs currently running or waiting, along with their IDs, user names, and statuses.

- **Submit a Job Script**:
  - Use the command `sbatch calculate_pi.slurm` to submit a batch job to the SLURM scheduler for execution when resources are available. Replace `calculate_pi.slurm` with the name of your actual job script.

- **Check Your Job Status**:
  - Use `squeue -u $USER` to list all jobs submitted by the current user, allowing you to monitor their progress and status.

- **Cancel a Job**:
  - Run `scancel <job_id>` to cancel a job specified by its job ID. Replace `<job_id>` with the actual job ID you wish to cancel.

### 4. Running Interactive Jobs

Explore interactive job sessions to dynamically test and run tasks on compute nodes.

- **Start an Interactive Session**:
  - Use `srun --pty bash -i` to allocate resources and start an interactive bash session on a compute node. This is ideal for debugging and interactive computations.

  **What You Can Do**:
  - Run commands interactively.
  - Test scripts with immediate feedback.
  - Explore resource usage in real-time.

### 5. Analyze Job Performance with `sacct`

After jobs have completed, use `sacct` to gather detailed information about their execution.

- **View Completed Job Details**:
  - Run `sacct --format=JobID,JobName,User,State,Elapsed,CPUTime,MaxRSS` to provide detailed statistics for completed jobs, such as CPU time, memory usage, and job state. This helps in understanding job performance and resource utilization.

## Discussion and Reflection

- **Efficiency**: Reflect on how interactive SLURM commands enhance your ability to manage computational workloads effectively.
- **Troubleshooting**: Consider how interactive sessions can assist in diagnosing job issues and refining scripts.
- **Further Exploration**: Explore additional SLURM commands and options to optimize job scheduling and resource allocation.

By following this guide, you will gain hands-on experience with SLURM and Linux shell commands, equipping you with the skills needed to navigate and utilize HPC resources effectively.


## Understanding `sacct` in SLURM

The `sacct` command in SLURM is used to report accounting information about jobs and job steps that are managed by the SLURM workload manager. It provides detailed information about the jobs, such as resource usage, runtime statistics, and job states, which are crucial for performance analysis and optimization.

### Key Features of `sacct`

- **Job and Step Information**: `sacct` provides data on both jobs and individual job steps, offering insights into how resources were utilized at each stage of execution.
- **Comprehensive Metrics**: Reports on CPU time, memory usage, job states, exit codes, and more, helping users identify bottlenecks or inefficiencies.
- **Historical Data**: Accesses records of past jobs, allowing users to review previous job performances and resource consumption.

### Common `sacct` Options

- `-j <job_id>`: Specifies a particular job ID to retrieve information for that job.
- `--format`: Customizes the output format by specifying the fields to display.
- `--starttime`: Limits the report to jobs that started after a specified time.
- `-a` or `--allusers`: Displays information for all users (requires admin privileges).

### Example Usage

We'll demonstrate how to use `sacct` to view detailed information about completed jobs, including custom formatting options.



#### Retrieving Job IDs of Previous Jobs Using `sacct`

The `sacct` command is useful for retrieving detailed information about past jobs, including their job IDs, which can be essential for tracking, debugging, or resubmitting jobs. By querying jobs based on a specific start time or other criteria, we can easily identify and work with past job records.

##### Example: Retrieve Job IDs for Jobs Started After a Specific Date

In this example, we will use `sacct` to list jobs that started after a specified date, focusing on displaying their job IDs, names, and other relevant metrics.



In [None]:
# Use sacct to view jobs started after a specific date, focusing on retrieving job IDs
!sacct --starttime=2024-08-01 --format=JobID,JobName,User,State,Elapsed,CPUTime,MaxRSS

# Explanation:
# --starttime=2024-08-01: Restricts the report to jobs that started on or after August 1, 2024.
#  --format: Customizes the output to show JobID, JobName, User, State, Elapsed time, CPUTime, and MaxRSS (maximum resident set size).


### Here we use job ID 42; replace it with any job ID you want

In [None]:
# Use sacct to retrieve information about a specific job
# Note: This command is typically run in a terminal or a Jupyter Notebook with shell access.

!sacct -j 42 --format=JobID,JobName,User,State,Elapsed,MaxRSS,MaxVMSize

# Explanation:
# -j <job_id>: Replace 42 with your job ID to view specific job details.
# --format: Specifies the fields to display. In this example, JobID, JobName, User, State, Elapsed, MaxRSS (maximum resident set size), and MaxVMSize (maximum virtual memory size) are shown.



In [None]:
!sacct -N node2 --starttime 2024-08-10
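
For scripted analysis, `sacct` can emit pipe-delimited rows with `--parsable2` (`-P`), which are easier to parse than the aligned default layout. The sketch below parses such output. The function name `parse_sacct` is an assumption, and the sample text is hypothetical, not taken from a real run.

```python
import csv
import io

def parse_sacct(text: str) -> list[dict]:
    """Parse `sacct --parsable2` output (pipe-delimited, header first)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))

# Hypothetical output of:
#   sacct -j 42 --parsable2 --format=JobID,JobName,State,Elapsed,MaxRSS
sample = """\
JobID|JobName|State|Elapsed|MaxRSS
42|calculate_pi|COMPLETED|00:00:21|
42.batch|batch|COMPLETED|00:00:21|1024K
42.0|calculate_pi|COMPLETED|00:00:20|896K
"""

for row in parse_sacct(sample):
    print(row["JobID"], row["State"], row["Elapsed"], row["MaxRSS"] or "-")
```

Memory figures such as `MaxRSS` are reported per job step, which is why the top-level job row in the sample leaves that field empty.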


## HPC Job Submission with Multiple Nodes in SLURM

In this section, we will explore how to submit a job that utilizes two nodes in an HPC cluster using the SLURM workload manager. We will cover basic job submission scripts, monitoring job status, and retrieving output. We will use GROMACS, a popular molecular dynamics simulation package, to demonstrate parallel execution using MPI.

### What is GROMACS?

GROMACS (GROningen MAchine for Chemical Simulations) is a powerful and versatile package for molecular dynamics, primarily designed for simulating biomolecular systems such as proteins, lipids, and nucleic acids. It is capable of running on various types of computer architectures and can scale efficiently on HPC systems.

#### Running GROMACS with `gmx_mpi mdrun`

The `gmx_mpi mdrun` command is the main computational engine of GROMACS. It performs molecular dynamics simulations using the input files generated by pre-processing tools. The `-s` option specifies the run input file, typically a `.tpr` file, which bundles the system's topology, starting coordinates, and simulation parameters. For this example, we run a short simulation designed to complete quickly.

#### Basic Job Submission

To run a GROMACS simulation on two nodes, we need to create a SLURM batch script. This script will specify the resources required and the commands to execute.
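
The resource request and the MPI launch line have to agree: SLURM allocates `nodes × ntasks-per-node` task slots, and an `mpirun -np N` that asks for more than that is typically rejected by the MPI launcher (Open MPI reports it as "not enough slots available"). A small sanity check, sketched here with hypothetical helper names:

```python
def total_tasks(nodes: int, ntasks_per_node: int) -> int:
    """Number of task slots requested from SLURM."""
    return nodes * ntasks_per_node

def check_mpi_ranks(np_ranks: int, nodes: int, ntasks_per_node: int) -> str:
    """Compare the -np value against the requested task slots."""
    slots = total_tasks(nodes, ntasks_per_node)
    if np_ranks > slots:
        return f"-np {np_ranks} exceeds the {slots} allocated task slot(s)"
    return f"-np {np_ranks} fits in {slots} allocated task slot(s)"

print(check_mpi_ranks(2, nodes=2, ntasks_per_node=1))
print(check_mpi_ranks(2, nodes=1, ntasks_per_node=1))
```

Running a check like this before submission is cheaper than waiting in the queue only to have the job fail at launch.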

#### Creating a SLURM Job Script

Below is a Python code snippet to create a SLURM script for running a GROMACS job on two nodes using MPI:



In [12]:
import os

# Define the SLURM job script path
slurm_script_path = "quick_gromacs_job.slurm"

# Remove existing SLURM script if it exists
if os.path.exists(slurm_script_path):
    os.remove(slurm_script_path)

# Create a SLURM job script for a short GROMACS run
slurm_script = """#!/bin/bash
#SBATCH --job-name=quick_gromacs_job    # Job name
#SBATCH --output=quick_gromacs_job.out  # Standard output
#SBATCH --error=quick_gromacs_job.err   # Standard error
#SBATCH --time=00:05:00                 # Time limit of 5 minutes
#SBATCH --nodes=2                       # Number of nodes
#SBATCH --ntasks-per-node=1             # Number of tasks per node
#SBATCH --mem=2G                        # Allocates 2 GB of memory per node

# Load necessary modules
module load gromacs

# Run a short GROMACS simulation using MPI
mpirun -np 2 gmx_mpi mdrun -s short_topol.tpr -nsteps 100
"""

# Write the SLURM job script to a file
with open(slurm_script_path, "w") as file:
    file.write(slurm_script)

# Confirm the file has been written
print(f"SLURM job script written to {slurm_script_path}.")

# Make the script executable
os.chmod(slurm_script_path, 0o755)

# Read and print the contents of the SLURM job script
with open(slurm_script_path, "r") as file:
    script_content = file.read()

print("\nContents of the SLURM job script:")
print("----------------------------------")
print(script_content)


SLURM job script written to quick_gromacs_job.slurm.

Contents of the SLURM job script:
----------------------------------
#!/bin/bash
#SBATCH --job-name=quick_gromacs_job    # Job name
#SBATCH --output=quick_gromacs_job.out  # Standard output
#SBATCH --error=quick_gromacs_job.err   # Standard error
#SBATCH --time=00:05:00                 # Time limit of 5 minutes
#SBATCH --nodes=1                       # Number of nodes
#SBATCH --ntasks-per-node=1             # Number of tasks per node
#SBATCH --mem=2G                        # Allocates 1 GB of total memory per node

# Load necessary modules
module load gromacs

# Run a short GROMACS simulation using MPI
mpirun -np 2 gmx_mpi mdrun -s short_topol.tpr -nsteps 100



In [13]:
import time
import subprocess

# Submit the SLURM job using the `!` syntax for direct shell command execution
!sbatch {"quick_gromacs_job.slurm"}

# Wait for a few seconds to ensure the job is queued
time.sleep(2)

# Check the status of the job queue
!squeue

Submitted batch job 41
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                39 cpubase_b quick_gr  user001 PD       0:00      2 (Resources)
                41 cpubase_b quick_gr  user001 PD       0:00      1 (Priority)
                31 cpubase_b spawner-  user001  R      23:46      1 node1


In [14]:
# Wait for a few seconds to ensure the job is queued
time.sleep(5)

# Check the status of the job queue
!squeue

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                39 cpubase_b quick_gr  user001 PD       0:00      2 (Resources)
                31 cpubase_b spawner-  user001  R      23:58      1 node1
                41 cpubase_b quick_gr  user001  R       0:01      1 node2
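
The `ST` column distinguishes running jobs (`R`) from pending ones (`PD`), and the last column gives either the node list or the reason a job is waiting. The sketch below parses the default `squeue` layout using the output shown above as sample text. The function name `parse_squeue` is an assumption, and the parser would not cope with job names that contain spaces.

```python
def parse_squeue(text: str) -> list[dict]:
    """Parse default `squeue` output into one dict per job."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Limiting the split keeps the last column in one piece
    return [dict(zip(header, line.split(None, len(header) - 1)))
            for line in lines[1:]]

# Sample text copied from the squeue output above
sample = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                39 cpubase_b quick_gr  user001 PD       0:00      2 (Resources)
                31 cpubase_b spawner-  user001  R      23:58      1 node1
                41 cpubase_b quick_gr  user001  R       0:01      1 node2
"""

for job in parse_squeue(sample):
    status = "running" if job["ST"] == "R" else "pending or other"
    print(job["JOBID"], job["NAME"], status, job["NODELIST(REASON)"])
```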


#### Wait until the job has finished to get the output
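
A fixed `time.sleep()` is a guess at how long the job will take. One alternative is to poll `squeue` until the job is no longer listed. The sketch below assumes `squeue` is on the `PATH`; the helper names `job_in_queue` and `wait_for_job` and the default interval and timeout are illustrative choices, not SLURM features.

```python
import subprocess
import time

def job_in_queue(job_id: int) -> bool:
    """Return True while squeue still lists the job."""
    result = subprocess.run(
        ["squeue", "-h", "-j", str(job_id)],  # -h suppresses the header line
        capture_output=True, text=True,
    )
    # After a job leaves the queue, squeue prints nothing or reports an invalid ID
    return result.returncode == 0 and bool(result.stdout.strip())

def wait_for_job(job_id, in_queue=job_in_queue, interval=5.0, timeout=600.0):
    """Poll until the job leaves the queue; return False on timeout."""
    deadline = time.monotonic() + timeout
    while in_queue(job_id):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True

# Example: wait_for_job(41), where 41 is the job ID reported by sbatch
```

The `in_queue` parameter exists so the polling logic can be exercised without a scheduler, for instance by passing a function that returns a scripted sequence of values.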

In [15]:
import os

# Paths to the output and error files
output_file = "quick_gromacs_job.out"
error_file = "quick_gromacs_job.err"

# Check and display the contents of the output file
if os.path.exists(output_file):
    print(f"\nContents of {output_file}:")
    print("----------------------------------")
    with open(output_file, "r") as file:
        output_content = file.read()
        print(output_content)
else:
    print(f"\n{output_file} does not exist.")

# Check and display the contents of the error file
if os.path.exists(error_file):
    print(f"\nContents of {error_file}:")
    print("----------------------------------")
    with open(error_file, "r") as file:
        error_content = file.read()
        print(error_content)
else:
    print(f"\n{error_file} does not exist.")



Contents of quick_gromacs_job.out:
----------------------------------


Contents of quick_gromacs_job.err:
----------------------------------
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

  gmx_mpi

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to t