# VASP Recommendations Analysis

In this study, the ESIF VASP Benchmarks (https://github.com/NREL/ESIFHPC3/tree/code-examples/VASP) 1 and 2 were used. Benchmark 1 is a system of 16 atoms (Cu<sub>4</sub>In<sub>4</sub>Se<sub>8</sub>), and Benchmark 2 is a system of 519 atoms (Ag<sub>504</sub>C<sub>4</sub>H<sub>10</sub>S<sub>1</sub>). 

On Swift, the default builds of VASP installed on the system as modules were used. The Intel MPI build was built with Intel compilers and the mkl math library, and it was accessed via the "vaspintel" module. The OpenMPI build was compiled with gnu using gcc and fortran compilers and used OpenMPI's math libraries, and was accessed via the "vasp" module. Both builds run VASP 6.1.1. 

On Eagle, the default build of VASP installed on the system is an Intel MPI version of VASP. The Intel MPI build was built with Intel compilers and the mkl math library, and it was accessed via the "vasp" module. It runs VASP 6.1.2. No Open MPI VASP build is accessible through the default modules on Eagle, but an Open MPI build can be accessed in an environment via "source /nopt/nrel/apps/210830a/myenv.2108301742, ml vasp/6.1.1-l2mkbb2". The OpenMPI build was compiled with gnu using gcc and fortran compilers and used OpenMPI's math libraries. It runs VASP 6.1.1.

The VASP repo contains scripts that can be used to run the Intel MPI and Open MPI builds used in the study to perform calculations on Swift and Eagle.

# Table of Contents :
* [Running VASP on Eagle](#Running-VASP-on-Eagle)
> * [Recommendations for Running VASP on Eagle](#Recommendations-for-Running-VASP-on-Eagle)
> * [Benchmark 2](#Benchmark-2-on-Eagle)
>> * [Running on half-full nodes](#Running-on-half-full-nodes-on-Eagle)
>> * [Running on GPUs](#Running-on-GPUs-on-Eagle-with-Benchmark-2)
>> * [Comparing CPUs/Node and GPUs](#Comparing-CPUs/Node-and-GPUs)
>> * [MPI](#MPI-on-Eagle-with-Benchmark-2)
>> * [cpu-bind](#cpu-bind-on-Eagle-with-Benchmark-2)
> * [Benchmark 1](#Benchmark-1-on-Eagle)
>> * [KPOINTS Scaling](#KPOINTS-Scaling-on-Eagle)
>> * [Changing KPAR and NPAR](#Changing-KPAR-and-NPAR-on-Eagle)
>> * [Running on GPUs](#Running-on-GPUs-on-Eagle-with-Benchmark-1)
>> * [MPI](#MPI-on-Eagle-with-Benchmark-1)
>> * [cpu-bind](#cpu-bind-on-Eagle-with-Benchmark-1)
* [Running VASP On Swift](#Running-VASP-On-Swift)
> * [Recommendations for Running VASP on Swift](#Recommendations-for-Running-VASP-on-Swift)
> * [Benchmark 2](#Benchmark-2-on-Swift)
>> * [Using Different CPUs/Node](#Using-Different-CPUs/Node)
>> * [MPI](#MPI-on-Swift-with-Benchmark-2)
>> * [cpu-bind](#cpu-bind-on-Swift-with-Benchmark-2)
> * [Benchmark 1](#Benchmark-1-on-Swift)
>> * [KPOINTS Scaling](#KPOINTS-Scaling-on-Swift)
>> * [Changing KPAR and NPAR](#Changing-KPAR-and-NPAR-on-Swift)
>> * [MPI](#MPI-on-Swift-with-Benchmark-1)
>> * [cpu-bind](#cpu-bind-on-Swift-with-Benchmark-1)

# Import necessary packages

In [16]:
import numpy as np
import pandas as pd
import datetime as dt
from matplotlib import pyplot as plt
import os
import plotly.express as px
import plotly.io as io

<IPython.core.display.Javascript object>

In [17]:
# load formatting extension, not necessary
%load_ext nb_black

The nb_black extension is already loaded. To reload it, use:
  %reload_ext nb_black


<IPython.core.display.Javascript object>

In [18]:
# read aggregate data file
data = pd.read_csv("aggregate_data.csv")

<IPython.core.display.Javascript object>

In [19]:
# remove lines with calculations that have errors, which is marked in the energy column
# marked if no energy value is produced or if an error is found by inspecting raw data
data_valid = data[data.Energy != "error"]

<IPython.core.display.Javascript object>

In [20]:
# get time expressed in seconds
def get_seconds(row):
    time_str = row["Runtime"]
    baseline = "00:00:00"
    read_time = dt.datetime.strptime(time_str, "%H:%M:%S")
    read_baseline = dt.datetime.strptime(baseline, "%H:%M:%S")
    total_time = read_time - read_baseline
    seconds = total_time.total_seconds()
    return seconds


data_valid["Runtime(s)"] = data_valid.apply(get_seconds, axis=1)



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



<IPython.core.display.Javascript object>

In [21]:
# correct spelling inconsistencies
# only correcting intel_impi inconsistency and HPC System inconsistency here because these are the only one that has been causing issues so far.
# Haven't checked for others yet
def intel_impi_label(row):
    if (
        row["MPI"] == "intel-impi"
        or row["MPI"] == "intel_impi"
        or row["MPI"] == "intelmpi"
    ):
        return "intelmpi"
    if (
        row["MPI"] == "intel_impi_clara"
        or row["MPI"] == "intel_impi-clara"
        or row["MPI"] == "intelmpi_clara"
    ):
        return "intel_impi_clara"
    else:
        return row["MPI"]


def HPC_sys_label(row):
    if row["HPC System"] == "Swift" or row["HPC System"] == "swift":
        return "Swift"
    elif row["HPC System"] == "Eagle" or row["HPC System"] == "eagle":
        return "Eagle"
    if row["HPC System"] == "Vermillion" or row["HPC System"] == "vermillion":
        return "Vermillion"
    else:
        return row["HPC System"]


data_valid["MPI"] = data_valid.apply(intel_impi_label, axis=1)
data_valid["HPC System"] = data_valid.apply(HPC_sys_label, axis=1)
data_valid = data_valid[data_valid["MPI"] != "intelmpi_tim"]
data_valid = data_valid[data_valid["MPI"] != "intel_impi_clara"]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



<IPython.core.display.Javascript object>

In [22]:
# create a new column with run time/num. electronic steps
def get_scaled_time(row):
    time = row["Runtime(s)"]
    elec_steps = row["Electronic Steps"]
    time_per_step = float(time) / float(elec_steps)
    return time_per_step


data_scaled_elec = data_valid
data_scaled_elec["time_per_step"] = data_valid.apply(get_scaled_time, axis=1)

<IPython.core.display.Javascript object>

In [23]:
# create a new column that calculates a "rate" from the scaled time values calculated in the previous cell
def get_scaled_rate(row):
    time_per_step = row["time_per_step"]
    rate = 1.0 / float(time_per_step)
    return rate


data_scaled_elec["scaled_rate"] = data_scaled_elec.apply(get_scaled_rate, axis=1)

<IPython.core.display.Javascript object>

In [24]:
def get_processor(row):
    partition = row["Partition"]
    if "gpu" in partition:
        return "GPU"
    else:
        return "CPU"


data_scaled_elec["Processor"] = data_scaled_elec.apply(get_processor, axis=1)

<IPython.core.display.Javascript object>

In [25]:
# view data scaled by number of electronic steps
data_scaled_elec

Unnamed: 0,Date,HPC System,Job ID,Partition,Benchmark Code,kpoints,math library,MPI,cpu-bind,Nodes,...,Energy,Electronic Steps,Runtime,KPAR,NPAR,node_fill,Runtime(s),time_per_step,scaled_rate,Processor
0,Thu Jun 30 16:25:35 MDT 2022,Swift,635515,parallel,2,1x1x1,mkl,intelmpi,cores,4,...,-1.27E+03,49,0:56:47,1,D,full,3407.0,69.530612,0.014382,CPU
1,2022-06-10T16:35:03,Swift,613797,parallel,2,1x1x1,mkl,intelmpi,cores,8,...,-1.27E+03,35,1:30:34,1,1,virtual,5434.0,155.257143,0.006441,CPU
2,Fri Jan 21 01:29:49 MST 2022,Eagle,8183122,short,1,10x10x5,mkl,intelmpi,cores,16,...,-6.03E+01,13,0:27:21,9,sqrt,full,1641.0,126.230769,0.007922,CPU
3,Fri Jan 21 01:50:17 MST 2022,Eagle,8183123,short,1,4x4x2,mkl,intelmpi,cores,16,...,-6.03E+01,14,0:00:41,9,sqrt,full,41.0,2.928571,0.341463,CPU
4,Thu Dec 2 21:45:36 MST 2021,Eagle,7952615,short,2,1x1x1,mkl,intelmpi,cores,16,...,-1.27E+03,36,0:25:49,1,D,full,1549.0,43.027778,0.023241,CPU
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1817,Mon Jul 11 12:09:04 MDT 2022,Swift,648422,parallel,1,8x8x4,mkl,intelmpi,cores,8,...,-6.03E+01,15,0:02:56,4,4,full,176.0,11.733333,0.085227,CPU
1818,Mon Jul 11 12:15:29 MDT 2022,Swift,648428,parallel,1,8x8x4,openmpi,openmpi,rank,8,...,-6.03E+01,15,0:06:15,4,4,full,375.0,25.000000,0.040000,CPU
1819,Mon Jul 11 12:18:48 MDT 2022,Swift,648434,parallel,1,8x8x4,mkl,intelmpi,rank,8,...,-6.03E+01,15,0:02:51,4,4,full,171.0,11.400000,0.087719,CPU
1820,Mon Jul 11 12:25:37 MDT 2022,Swift,648440,parallel,1,8x8x4,openmpi,openmpi,none,8,...,-6.03E+01,15,0:06:35,4,4,full,395.0,26.333333,0.037975,CPU


<IPython.core.display.Javascript object>

In [26]:
# define a function that calculates performance time scaling two values of a given parameter for a data frame


def get_scaling(df, column, val1, val2):
    kpoints_increase = []
    df_val1 = df[df[column] == val1]
    df_val2 = df[df[column] == val2]
    for cores in np.unique(df_val1["Cores"]):
        if cores in np.unique(df_val2["Cores"]):
            df_val1_time = df_val1[df_val1["Cores"] == cores]["time_per_step"]
            df_val2_time = df_val2[df_val2["Cores"] == cores]["time_per_step"]
            df_val1_avg = np.average(df_val1_time.to_numpy())
            df_val2_avg = np.average(df_val2_time.to_numpy())
            kpoints_increase.append(float(df_val1_avg) / float(df_val2_avg))
    return np.average(kpoints_increase)

<IPython.core.display.Javascript object>

In [27]:
# define a function that calculates performance time scaling two values of a given parameter for a data frame


def get_scaling_nodes(df, column, val1, val2):
    kpoints_increase = []
    df_val1 = df[df[column] == val1]
    df_val2 = df[df[column] == val2]
    for cores in np.unique(df_val1["Nodes"]):
        if cores in np.unique(df_val2["Nodes"]):
            df_val1_time = df_val1[df_val1["Nodes"] == cores]["time_per_step"]
            df_val2_time = df_val2[df_val2["Nodes"] == cores]["time_per_step"]
            df_val1_avg = np.average(df_val1_time.to_numpy())
            df_val2_avg = np.average(df_val2_time.to_numpy())
            kpoints_increase.append(float(df_val1_avg) / float(df_val2_avg))
    return np.average(kpoints_increase)

<IPython.core.display.Javascript object>

# Running VASP on Eagle

In [28]:
data_eagle = data_scaled_elec[data_scaled_elec["HPC System"] == "Eagle"]

<IPython.core.display.Javascript object>

## Recommendations for Running VASP on Eagle
### Recommended CPUs/Node

Run VASP on full nodes (36 CPUs/node). While using fewer cores per node yields an improvement in runtime/core, it will results in a larger allocation charge, as Eagle charges per node used, regardless of how much of the node is used. Eagle charges 3AUs/node-hour. 

### VASP on GPU Nodes

Two versions of the vasp_gpu executable are available on Eagle. The older vasp_gpu build on Eagle is built using CUDA, and the newer version is built using OpenACC. The OpenACC GPU build was found to run in an average of 27.7% of the time as the old GPU build using Benchmark 2. Little runtime improvement (compared to running on CPUs) was seen using the older, CUDA GPU build. Find a script for running the OpenACC GPU build in [this section](#Scripts-for-Running-VASP-on-Eagle).

Running the OpenACC GPU build of VASP (vasp_gpu) on GPU nodes improves performance for larger VASP calculations, but may increase the runtime for smaller calculations. Benchmark 2 calculations on using the OpenACC build on GPU nodes ran in an average of 27.7% of the time as CPU calculations on the same number of nodes, but Benchmark 1 calculationson using the OpenACC build on GPU nodes ran in an average of 150% of the time as CPU calculations on the same number of nodes.

   * Memory limitation: GPU nodes on Eagle cannot provide as much memory as CPU nodes for VASP jobs, and large VASP jobs may require more GPU nodes to provide enough memory for the calculation. For Benchmark 2, at least 2 full nodes were needed to provide enough memory to complete a calculation. Using more complicated parallelization schemes, the number of nodes necessary to provide enough memory scaled with the increase in number of problems handled simultaneousely. 

### MPI

Intel MPI is recommended over Open MPI. Using an Intel MPI build of VASP and running over Intel MPI, Benchmark 2 ran in average of 50% of the time as the same calculations using an Open MPI build of VASP over Open MPI. For Benchmark 1, Intel MPI calculations ran in an average of 63.5% of the time as Open MPI calculcations. 

### --cpu-bind Flag
The --cpu-bind flag changes how tasks are assigned to cores throughout the node. Setting --cpu-bind=cores or rank showed no improvement in the performance of VASP on 36 CPUs/node. Using 18 CPUs/node setting --cpu-bind=cores shows a small improvement in runtime (~5% decrease) using both Intel MPI and Open MPI.

cpu-bind can be set as a flag in an srun command, such as 
```
srun --cpu-bind=cores vasp_std
```
### KPAR

KPAR determines the number of groups across which to divide calculations at each kpoint, and one calculation from each group is performed at a time. The value of KPAR can be defined in the INCAR file. 
  * Per [VASP documentation](https://www.vasp.at/wiki/index.php/KPAR), the KPAR tag file should always be set to a value that evenly divides the total number of cores used. 
  * We found that runtime starts to increase (in other words, runtime scales positively with the number of nodes, rather than negatively) if you increase the number of nodes past the value of KPAR, so it is recommended to set KPAR no lower than the number of nodes used. 
  * Lower values of KPAR might be better for lower node counts. For example, Benchmark 1 calculations on 1-4 nodes were fastest using KPAR=4, but Benchmark 1 calculations on more than 4 nodes were fastest using KPAR=9.
  * The [GPU VASP documentation](https://www.vasp.at/wiki/index.php/OpenACC_GPU_port_of_VASP) recommends setting KPAR equal to the total number of GPUs used. Our results are relatively consistent with this recommendation. 

### K-Points Scaling

Runtime does not scale well with the number of kpoints. Benchmark 1 uses a 10x10x5 kpoints grid (500 kpoints). When run with a 4x4x2 kpoints grid (16 kpoints), we should expect the runtime to scale by 16/500 (3.2%) since calculations are being performed at 16 points rather than 500. However, the average scaling factor between Benchmark 1 jobs on Eagle with 10x10x5 grids and 4x4x2 grids is 28% (ranging from ~20%-57%). 

## Benchmark 2 on Eagle
Benchmark 2 is a system of 519 atoms (Ag<sub>504</sub>C<sub>4</sub>H<sub>10</sub>S<sub>1</sub>)

In [29]:
data_eagle_2 = data_eagle[data_eagle["Benchmark Code"] == "2"]

<IPython.core.display.Javascript object>

### Running on half-full nodes on Eagle

We ran Benchmark 2 on Eagle with 18 CPUs/Node (half-full) to compare the performance to running on 36 CPUs/Node

In [224]:
data_eagle_2_no_gpu = data_eagle_2[data_eagle_2["Partition"] != "gpu"]
data_eagle_2_no_gpu_intel = data_eagle_2_no_gpu[
    data_eagle_2_no_gpu["MPI"] == "intelmpi"
]

fig = px.scatter(
    data_eagle_2_no_gpu_intel,
    x="Nodes",
    y="scaled_rate",
    color="node_fill",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Bench. 2 on Eagle: half (18 CPUs/node) vs. full (36 CPUs/node) nodes with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [223]:
data_eagle_2_no_gpu = data_eagle_2[data_eagle_2["Partition"] != "gpu"]
data_eagle_2_no_gpu_open = data_eagle_2_no_gpu[data_eagle_2_no_gpu["MPI"] == "openmpi"]

fig = px.scatter(
    data_eagle_2_no_gpu_open,
    x="Nodes",
    y="scaled_rate",
    color="node_fill",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Bench. 2 on Eagle: half (18 CPUs/node) vs. full (36 CPUs/node) nodes with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

### Running on GPUs on Eagle with Benchmark 2

We ran Benchmark 2 on Eagle's GPUs, using 2 GPUs/node. Two versions of the vasp_gpu executable are available on Eagle. The older vasp_gpu build on Eagle is built using CUDA, and the newer version is built using OpenACC. In the graphs below, the old gpu build is labeled as "gres:2", and the OpenACC build is labeled as "gres:2_OpenACC."

In [32]:
data_eagle_2_cut = data_eagle_2[data_eagle_2["Nodes"] < 17]
data_eagle_2_cut = data_eagle_2_cut[data_eagle_2_cut["node_fill"] != "partial"]
data_eagle_2_cut = data_eagle_2_cut[data_eagle_2_cut["node_fill"] != "half"]
data_eagle_2_cut = data_eagle_2_cut[data_eagle_2_cut["MPI"] == "intelmpi"]

# to get plot colors to match Benchmark 1 GPU plot colors
"""
def updata_gpus(row):
    if row["node_fill"] == "gres:2_openACC":
        return "gres:1_openACC"
    else:
        return row["node_fill"]


data_eagle_2_cut["node_fill"] = data_eagle_2_cut.apply(updata_gpus, axis=1)
"""

data_eagle_2_cut = data_eagle_2_cut.sort_values(by=["node_fill"])

fig = px.scatter(
    data_eagle_2_cut,
    x="Nodes",
    y="scaled_rate",
    color="node_fill",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Eagle: CPUs vs. GPUs with Intel MPI",
)
fig.show()
print(
    "'full'=36 CPUs/node, 'gres:2'=old gpu build with 2 GPUs/node, 'gres:2_OpenACC'=OpenACC gpu build with 2 GPUs/node"
)

'full'=36 CPUs/node, 'gres:2'=old gpu build with 2 GPUs/node, 'gres:2_OpenACC'=OpenACC gpu build with 2 GPUs/node


<IPython.core.display.Javascript object>

In [33]:
data_eagle_2_intel = data_eagle_2[data_eagle_2["MPI"] == "intelmpi"]
proportion_gpu = get_scaling_nodes(data_eagle_2_intel, "node_fill", "gres:2", "full")
proportion_gpu_openacc = get_scaling_nodes(
    data_eagle_2_intel, "node_fill", "gres:2_openACC", "full"
)

print(
    f"Using the same number of nodes, calculations with two gpus per node using the old gpu build with Intel MPI nodes run in an average of {proportion_gpu:.4f} the amount of time as calculations on full nodes.",
)

print("\n")

print(
    f"Using the same number of nodes, calculations with two gpus per node using the OpenACC gpu build with Intel MPI run in an average of {proportion_gpu_openacc:.4f} the amount of time as calculations on full nodes.",
)

Using the same number of nodes, calculations with two gpus per node using the old gpu build with Intel MPI nodes run in an average of 0.9659 the amount of time as calculations on full nodes.


Using the same number of nodes, calculations with two gpus per node using the OpenACC gpu build with Intel MPI run in an average of 0.2767 the amount of time as calculations on full nodes.


<IPython.core.display.Javascript object>

### Comparing CPUs/Node and GPUs

Comparing runtime using 18 CPUs/Node, 36 CPUs/Node and 2 GPUs/Node

In [34]:
def get_average_runtime(system, benchmark, node_fill, MPI, time_metric, **kwargs):
    for key, value in kwargs.items():
        key = key
    if system == "Swift":
        if benchmark == 1:
            data = data_swift_1
        elif benchmark == 2:
            data = data_swift_2
    elif system == "Eagle":
        if benchmark == 1:
            data = data_eagle_1
        elif benchmark == 2:
            data = data_eagle_2
    data = data[data["node_fill"] == node_fill]
    data = data[data["MPI"] == MPI]
    if key == ("nodes" or "Nodes"):
        data = data[data["Nodes"] == value]
    elif key == ("cores" or "Cores"):
        data = data[data["Cores"] == value]
    return np.mean(data[time_metric].to_numpy())

<IPython.core.display.Javascript object>

In [35]:
from tabulate import tabulate

table_tps_eagle_2 = [
    [
        " ",
        "IntelMPI, N=1",
        "IntelMPI, N=2",
        "IntelMPI, N=4",
    ],
    [
        "36 CPUs/Node",
        f"{get_average_runtime('Eagle', 2, 'full', 'intelmpi','time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Eagle', 2, 'full', 'intelmpi','time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'full', 'intelmpi','time_per_step', nodes=4):.2f}",
    ],
    [
        "18 CPUs/Node",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi','time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi','time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi','time_per_step', nodes=4):.2f}",
    ],
    [
        "2 GPUs/Node (OpenACC)",
        "",
        f"{get_average_runtime('Eagle', 2, 'gres:2_openACC', 'intelmpi','time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'gres:2_openACC', 'intelmpi','time_per_step', nodes=4):.2f}",
    ],
    [
        "18 CPUs / 36 CPUs",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi','time_per_step', nodes=1) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'time_per_step', nodes=1) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi','time_per_step', nodes=2) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'time_per_step', nodes=2) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi','time_per_step', nodes=4) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'time_per_step', nodes=4) :.2f}",
    ],
    [
        "2 GPUs / 36 CPUs",
        "",
        f"{get_average_runtime('Eagle', 2, 'gres:2_openACC', 'intelmpi','time_per_step', nodes=2) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'time_per_step', nodes=2) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'gres:2_openACC', 'intelmpi','time_per_step', nodes=4) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'time_per_step', nodes=4) :.2f}",
    ],
]

table_rt_eagle_2 = [
    [
        " ",
        "IntelMPI, N=1",
        "IntelMPI, N=2",
        "IntelMPI, N=4",
    ],
    [
        "36 CPUs/Node",
        f"{get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=4):.2f}",
    ],
    [
        "18 CPUs/Node",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=4):.2f}",
    ],
    [
        "2 GPUs/Node (OpenACC)",
        "",
        f"{get_average_runtime('Eagle', 2, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', nodes=4):.2f}",
    ],
    [
        "18 CPUs / 36 CPUs",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=1) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=1) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=2) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=2) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=4) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=4) :.2f}",
    ],
    [
        "2 GPUs / 36 CPUs",
        "",
        f"{get_average_runtime('Eagle', 2, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', nodes=2) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=2) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', nodes=4) / get_average_runtime('Eagle', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=4) :.2f}",
    ],
]

print("Average Runtime per Electronic Step (s) Using Intel MPI")
print(tabulate(table_tps_eagle_2, headers="firstrow", tablefmt="fancy_grid"))
print("Average Total Runtime (s) to complete one job Using Intel MPI")
print(tabulate(table_rt_eagle_2, headers="firstrow", tablefmt="fancy_grid"))

Average Runtime per Electronic Step (s) Using Intel MPI
╒═══════════════════════╤═════════════════╤═════════════════╤═════════════════╕
│                       │ IntelMPI, N=1   │   IntelMPI, N=2 │   IntelMPI, N=4 │
╞═══════════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ 36 CPUs/Node          │ 175.00          │           88.17 │           66.9  │
├───────────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ 18 CPUs/Node          │ 295.21          │          141.21 │           74.04 │
├───────────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ 2 GPUs/Node (OpenACC) │                 │           25.61 │           17.58 │
├───────────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ 18 CPUs / 36 CPUs     │ 1.69            │            1.6  │            1.11 │
├───────────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ 2 GPUs / 36 CPUs      │                 │            0.29 │   

<IPython.core.display.Javascript object>

In [36]:
table_tps_eagle_2 = [
    [
        " ",
        "OpenMPI, N=1",
        "OpenMPI, N=2",
        "OpenMPI, N=4",
    ],
    [
        "36 CPUs/Node",
        f"{get_average_runtime('Eagle', 2, 'full', 'openmpi','time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Eagle', 2, 'full', 'openmpi','time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'full', 'openmpi','time_per_step', nodes=4):.2f}",
    ],
    [
        "18 CPUs/Node",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi','time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi','time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi','time_per_step', nodes=4):.2f}",
    ],
    [
        "18 CPUs / 36 CPUs",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi','time_per_step', nodes=1) / get_average_runtime('Eagle', 2, 'full', 'openmpi', 'time_per_step', nodes=1) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi','time_per_step', nodes=2) / get_average_runtime('Eagle', 2, 'full', 'openmpi', 'time_per_step', nodes=2) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi','time_per_step', nodes=4) / get_average_runtime('Eagle', 2, 'full', 'openmpi', 'time_per_step', nodes=4) :.2f}",
    ],
]

table_rt_eagle_2 = [
    [
        " ",
        "OpenMPI, N=1",
        "OpenMPI, N=2",
        "OpenMPI, N=4",
    ],
    [
        "36 CPUs/Node",
        f"{get_average_runtime('Eagle', 2, 'full', 'openmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Eagle', 2, 'full', 'openmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'full', 'openmpi', 'Runtime(s)', nodes=4):.2f}",
    ],
    [
        "18 CPUs/Node",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi', 'Runtime(s)', nodes=4):.2f}",
    ],
    [
        "18 CPUs / 36 CPUs",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi', 'Runtime(s)', nodes=1) / get_average_runtime('Eagle', 2, 'full', 'openmpi', 'Runtime(s)', nodes=1) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi', 'Runtime(s)', nodes=2) / get_average_runtime('Eagle', 2, 'full', 'openmpi', 'Runtime(s)', nodes=2) :.2f}",
        f"{get_average_runtime('Eagle', 2, 'half', 'openmpi', 'Runtime(s)', nodes=4) / get_average_runtime('Eagle', 2, 'full', 'openmpi', 'Runtime(s)', nodes=4) :.2f}",
    ],
]

print("Average Runtime per Electronic Step (s) using Open MPI")
print(tabulate(table_tps_eagle_2, headers="firstrow", tablefmt="fancy_grid"))
print("Average Total Runtime (s) to complete one job using Open MPI")
print(tabulate(table_rt_eagle_2, headers="firstrow", tablefmt="fancy_grid"))

Average Runtime per Electronic Step (s) using Open MPI
╒═══════════════════╤════════════════╤════════════════╤════════════════╕
│                   │   OpenMPI, N=1 │   OpenMPI, N=2 │   OpenMPI, N=4 │
╞═══════════════════╪════════════════╪════════════════╪════════════════╡
│ 36 CPUs/Node      │         236.79 │         152.51 │         122.21 │
├───────────────────┼────────────────┼────────────────┼────────────────┤
│ 18 CPUs/Node      │         396.11 │         205.29 │         138.55 │
├───────────────────┼────────────────┼────────────────┼────────────────┤
│ 18 CPUs / 36 CPUs │           1.67 │           1.35 │           1.13 │
╘═══════════════════╧════════════════╧════════════════╧════════════════╛
Average Total Runtime (s) to complete one job using Open MPI
╒═══════════════════╤════════════════╤════════════════╤════════════════╕
│                   │   OpenMPI, N=1 │   OpenMPI, N=2 │   OpenMPI, N=4 │
╞═══════════════════╪════════════════╪════════════════╪════════════════╡
│ 36 CPU

<IPython.core.display.Javascript object>

### MPI on Eagle with Benchmark 2

We ran Benchmark 2 using Intel MPI and Open MPI and compared the performance. The two versions were compiled as described at the top of this document and run with the respective MPI. 

In [37]:
data_eagle_2_full = data_eagle_2[data_eagle_2["node_fill"] == "full"]
data_eagle_2_half = data_eagle_2[data_eagle_2["node_fill"] == "half"]
data_eagle_2_gpu = data_eagle_2[data_eagle_2["Partition"] == "gpu"]

<IPython.core.display.Javascript object>

In [38]:
fig = px.scatter(
    data_eagle_2_full,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Eagle: Intel MPI vs. Open MPI on Full Nodes (36 CPUs/Node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [39]:
proportion = get_scaling(data_eagle_2_full, "MPI", "intelmpi", "openmpi")
print(
    f"On full nodes, calculations using intel mpi on full nodes run in an average of {proportion:.4f} the amount of time as calculations using open mpi.",
)

On full nodes, calculations using intel mpi on full nodes run in an average of 0.5108 the amount of time as calculations using open mpi.


<IPython.core.display.Javascript object>

In [40]:
fig = px.scatter(
    data_eagle_2_half,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Eagle: Intel MPI vs. Open MPI on Half-Full Nodes (18 CPUs/Node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [41]:
proportion = get_scaling(data_eagle_2_half, "MPI", "intelmpi", "openmpi")
print(
    f"On full nodes, calculations using intel mpi on half-full run in an average of {proportion:.4f} the amount of time as calculations using open mpi.",
)

On full nodes, calculations using intel mpi on half-full run in an average of 0.6010 the amount of time as calculations using open mpi.


<IPython.core.display.Javascript object>

### cpu-bind on Eagle with Benchmark 2

The --cpu-bind flag changes how tasks are assigned to cores throughout the node. We ran Benchmark 2 on Eagle with cpu-bind=cores and cpu-bind=rank to compare the performance to running without setting cpu-bind.

In [42]:
data_eagle_2_full_open = data_eagle_2_full[data_eagle_2_full["MPI"] == "openmpi"]

fig = px.scatter(
    data_eagle_2_full_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Eagle: cpu-bind on Full Nodes (36 CPUs/Node) with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [43]:
proportion_rank = get_scaling(data_eagle_2_full_open, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on full nodes with Open MPI using --cpu-bind=rank run in {proportion_rank:.4f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_eagle_2_full_open, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on full nodes with Open MPI using --cpu-bind=cores run in {proportion_cores:.4f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on full nodes with Open MPI using --cpu-bind=rank run in 0.9891 the amount of time as calculations without --cpu-bind.

On average, calculations on full nodes with Open MPI using --cpu-bind=cores run in 1.0029 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [44]:
data_eagle_2_full_intel = data_eagle_2_full[data_eagle_2_full["MPI"] == "intelmpi"]

fig = px.scatter(
    data_eagle_2_full_intel,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Eagle: cpu-bind on Full Nodes (36 CPUs/Node) with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [45]:
proportion_rank = get_scaling(data_eagle_2_full_intel, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on full nodes with Intel MPI using --cpu-bind=rank run in {proportion_rank:.4f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_eagle_2_full_intel, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on full nodes with Intel MPI using --cpu-bind=cores run in {proportion_cores:.4f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on full nodes with Intel MPI using --cpu-bind=rank run in 0.9650 the amount of time as calculations without --cpu-bind.

On average, calculations on full nodes with Intel MPI using --cpu-bind=cores run in 1.0490 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [46]:
data_eagle_2_half_open = data_eagle_2_half[data_eagle_2_half["MPI"] == "openmpi"]

fig = px.scatter(
    data_eagle_2_half_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="MPI",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Eagle: cpu-bind on Half-Full Nodes (18 CPUs/Node) with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [47]:
proportion_rank = get_scaling(data_eagle_2_half_open, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on half-filled nodes with Open MPI using --cpu-bind=rank run in {proportion_rank:.4f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_eagle_2_half_open, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on half-filled nodes with Open MPI using --cpu-bind=cores run in {proportion_cores:.4f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on half-filled nodes with Open MPI using --cpu-bind=rank run in 1.1881 the amount of time as calculations without --cpu-bind.

On average, calculations on half-filled nodes with Open MPI using --cpu-bind=cores run in 0.9766 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [216]:
data_eagle_2_half_intel = data_eagle_2_half[data_eagle_2_half["MPI"] == "intelmpi"]

fig = px.scatter(
    data_eagle_2_half_intel,
    x="Nodes",
    y="scaled_rate",
    color="cpu-bind",
    symbol="MPI",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Eagle: cpu-bind on Half-Full Nodes (18 CPUs/Node) with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [49]:
proportion_rank = get_scaling(data_eagle_2_half_intel, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on half-filled nodes with Intel MPI using --cpu-bind=rank run in {proportion_rank:.4f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_eagle_2_half_intel, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on half-filled nodes with Intel MPI using --cpu-bind=cores run in {proportion_cores:.4f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on half-filled nodes with Intel MPI using --cpu-bind=rank run in 1.1841 the amount of time as calculations without --cpu-bind.

On average, calculations on half-filled nodes with Intel MPI using --cpu-bind=cores run in 0.9515 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [50]:
data_eagle_2_gpu_old = data_eagle_2_gpu[data_eagle_2_gpu["node_fill"] == "gres:2"]

fig = px.scatter(
    data_eagle_2_gpu_old,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="node_fill",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Eagle: cpu-bind on GPU Nodes using the old GPU Build",
)
fig.show()

<IPython.core.display.Javascript object>

In [51]:
proportion_rank = get_scaling(data_eagle_2_gpu_old, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on gpu nodes using the old gpu build with --cpu-bind=rank run in {proportion_rank:.4f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_eagle_2_gpu_old, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on gpu nodes using the old build with with --cpu-bind=cores run in {proportion_cores:.4f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on gpu nodes using the old gpu build with --cpu-bind=rank run in 1.0529 the amount of time as calculations without --cpu-bind.

On average, calculations on gpu nodes using the old build with with --cpu-bind=cores run in 1.0045 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

## Benchmark 1 on Eagle
Benchmark 1 is a system of 16 atoms (Cu<sub>4</sub>In<sub>4</sub>Se<sub>8</sub>). Because it is smalled than Benchmark 2, Benchmark 1 was used to explore kpoints scaling as well as changes in performance due to the KPAR and NPAR tags.

In [53]:
data_eagle_1 = data_eagle[data_eagle["Benchmark Code"] == "1"]

<IPython.core.display.Javascript object>

### KPOINTS Scaling on Eagle

To get a good idea of scaling between calculations with 4x4x2 kpoints grids and 10x10x5 kpoints grids, look at the graphs in the following section. Each graph is followed by an average kpoints scaling for the given KPAR/NPAR configuration, which gives the average amount of time needed to run a 4x4x2 calculation expressed as a percentage of the time needed to run a 10x10x5 calculation on the same number of cores. The averages are compiled in tables at the end of the next section to compare kpoints scaling across KPAR/NPAR combinations.

KPOINTS scaling data and KPAR/NPAR data was only generated using Intel MPI, as it was found to be much faster than Open MPI.

### Changing KPAR and NPAR on Eagle

KPAR and NPAR are both tags that determine the parallelization scheme for VASP. KPAR determines the number of groups across which kpoint calculations are divided, and one kpoint from each group is worked on by the machine at a time. NPAR determines the number of bands that are treated at the same time. 

By default, VASP sets KPAR = 1 and NPAR = # of cores. This is represented in the first graph and is used as a baseline to determine runtime improvements from using other combinations of KPAR and NPAR. VASP ESIF Benchmark 1 is conigured to be run with KPAR = 9 and NPAR = 4, and in this analysis, we consider combinations of KPAR = 1,4,9 and NPAR = 1, # of cores, sqrt(# of cores). 

The graphs in this section show the performance of each KPAR/NPAR combination at different node counts. Each graph is followed by values that give the average runtime for the given KPAR/NPAP configuration expressed as a percentage of the default KPAR/NPAR configuration (KPAR=1, NPAR=# of cores).

kpoints scaling for the given KPAR/NPAR configuration, which gives the average amount of time needed to run a 4x4x2 calculation expressed as a percentage of the time needed to run a 10x10x5 calculation on the same number of cores. The averages are compiled in tables at the end of the next section to compare kpoints scaling across KPAR/NPAR combinations.

In [54]:
data_eagle_1_no_gpu = data_eagle_1[data_eagle_1["Partition"] != "gpu"]
data_eagle_1_no_gpu = data_eagle_1[data_eagle_1["node_fill"] == "full"]

data_eagle_1_K1 = data_eagle_1_no_gpu[data_eagle_1["KPAR"] == 1]
data_eagle_1_K4 = data_eagle_1_no_gpu[data_eagle_1["KPAR"] == 4]
data_eagle_1_K9 = data_eagle_1_no_gpu[data_eagle_1["KPAR"] == 9]

data_eagle_1_K1_N4 = data_eagle_1_K1[data_eagle_1_K1["NPAR"] == "4"]
data_eagle_1_K1_ND = data_eagle_1_K1[data_eagle_1_K1["NPAR"] == "D"]
data_eagle_1_K1_Nsqrt = data_eagle_1_K1[data_eagle_1_K1["NPAR"] == "sqrt"]
data_eagle_1_K4_ND = data_eagle_1_K4[data_eagle_1_K4["NPAR"] == "D"]
data_eagle_1_K9_N4 = data_eagle_1_K9[data_eagle_1_K9["NPAR"] == "4"]
data_eagle_1_K4_Nsqrt = data_eagle_1_K4[data_eagle_1_K4["NPAR"] == "sqrt"]
data_eagle_1_K4_N4 = data_eagle_1_K4[data_eagle_1_K4["NPAR"] == "4"]
data_eagle_1_K9_Nsqrt = data_eagle_1_K9[data_eagle_1_K9["NPAR"] == "sqrt"]


Boolean Series key will be reindexed to match DataFrame index.


Boolean Series key will be reindexed to match DataFrame index.


Boolean Series key will be reindexed to match DataFrame index.



<IPython.core.display.Javascript object>

In [55]:
# define a new function that alters the get_scaling function to compare two KPAR/NPAR configurations


def find_nearest(array, value):
    array = [int(i) for i in array]
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]


def get_scaling_KN(df1, df2, kpoints):
    kpoints_increase = []
    df1 = df1[df1["kpoints"] == kpoints]
    df2 = df2[df2["kpoints"] == kpoints]
    for cores_1 in np.unique(df1["Cores"]):
        cores_2 = find_nearest(np.unique(df2["Cores"]), cores_1)
        if np.abs(cores_2 - cores_1) < 3:
            df1_time = df1[df1["Cores"] == cores_1]["time_per_step"]
            df2_time = df2[df2["Cores"] == cores_2]["time_per_step"]
            df1_avg = np.average(df1_time.to_numpy())
            df2_avg = np.average(df2_time.to_numpy())
            kpoints_increase.append(float(df1_avg) / float(df2_avg))
    return np.average(kpoints_increase)

<IPython.core.display.Javascript object>

In [56]:
fig = px.scatter(
    data_eagle_1_K1_ND,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: KPAR=1, NPAR=# of cores with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [57]:
proportion_K1_ND = get_scaling(data_eagle_1_K1_ND, "kpoints", "4x4x2", "10x10x5")
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_K1_ND:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

proportion_4x4x2_K1_ND = get_scaling_KN(data_eagle_1_K1_ND, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5_K1_ND = get_scaling_KN(
    data_eagle_1_K1_ND, data_eagle_1_K1_ND, "10x10x5"
)

On average, 4x4x2 kpoints grid calculations run in 0.2021 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [58]:
fig = px.scatter(
    data_eagle_1_K1_N4,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: KPAR=1, NPAR=4 with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [59]:
proportion_K1_N4 = get_scaling(data_eagle_1_K1_N4, "kpoints", "4x4x2", "10x10x5")
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_K1_N4:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2241 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [60]:
proportion_4x4x2_K1_N4 = get_scaling_KN(data_eagle_1_K1_N4, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5_K1_N4 = get_scaling_KN(
    data_eagle_1_K1_N4, data_eagle_1_K1_ND, "10x10x5"
)
print(
    f"\nOn average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 ran in {proportion_4x4x2_K1_N4:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 ran in {proportion_10x10x5_K1_N4:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)


On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 ran in 0.4615 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 ran in 0.5315 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [61]:
fig = px.scatter(
    data_eagle_1_K1_Nsqrt,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: KPAR=1, NPAR=sqrt with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [62]:
proportion_K1_Nsqrt = get_scaling(data_eagle_1_K1_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_K1_Nsqrt:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI."
)

On average, 4x4x2 kpoints grid calculations run in 0.2276 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [63]:
proportion_4x4x2_K1_Nsqrt = get_scaling_KN(
    data_eagle_1_K1_Nsqrt, data_eagle_1_K1_ND, "4x4x2"
)
proportion_10x10x5_K1_Nsqrt = get_scaling_KN(
    data_eagle_1_K1_Nsqrt, data_eagle_1_K1_ND, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in {proportion_4x4x2_K1_Nsqrt:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI."
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in {proportion_10x10x5_K1_Nsqrt:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI."
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in 0.6876 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in 0.5152 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [220]:
fig = px.scatter(
    data_eagle_1_K4_ND,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Nodes",
        "Partition",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: KPAR=4, NPAR=# of cores with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [219]:
proportion_K4_ND = get_scaling(data_eagle_1_K4_ND, "kpoints", "4x4x2", "10x10x5")
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_K4_ND:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2642 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [66]:
proportion_4x4x2_K4_ND = get_scaling_KN(data_eagle_1_K4_ND, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5_K4_ND = get_scaling_KN(
    data_eagle_1_K4_ND, data_eagle_1_K1_ND, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=# of cores ran in {proportion_4x4x2_K4_ND:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=# of cores ran in {proportion_10x10x5_K4_ND:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=# of cores ran in 0.3692 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=# of cores ran in 0.4101 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [67]:
data_eagle_1_K9_N4_intel = data_eagle_1_K9_N4[data_eagle_1_K9_N4["MPI"] == "intelmpi"]


data_eagle_1_K9_N4_intel = data_eagle_1_K9_N4_intel.sort_values(by=["kpoints"])

fig = px.scatter(
    data_eagle_1_K9_N4_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: KPAR=9, NPAR=4 with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [68]:
proportion_K9_N4 = get_scaling(data_eagle_1_K9_N4_intel, "kpoints", "4x4x2", "10x10x5")
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_K9_N4:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.5706 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [69]:
proportion_4x4x2_K9_N4 = get_scaling_KN(data_eagle_1_K9_N4, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5_K9_N4 = get_scaling_KN(
    data_eagle_1_K9_N4, data_eagle_1_K1_ND, "10x10x5"
)
print(
    "On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4",
    proportion_4x4x2_K9_N4,
    "the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    "\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 with Intel MPI",
    proportion_10x10x5_K9_N4,
    "the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.567565649763784 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 with Intel MPI 0.4452716210121061 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [70]:
fig = px.scatter(
    data_eagle_1_K4_Nsqrt,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: KPAR=4, NPAR=sqrt with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [71]:
proportion_K4_Nsqrt = get_scaling(data_eagle_1_K4_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_K4_Nsqrt:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2230 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [72]:
proportion_4x4x2_K4_Nsqrt = get_scaling_KN(
    data_eagle_1_K4_Nsqrt, data_eagle_1_K1_ND, "4x4x2"
)
proportion_10x10x5_K4_Nsqrt = get_scaling_KN(
    data_eagle_1_K4_Nsqrt, data_eagle_1_K1_ND, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores ran in) {proportion_4x4x2_K4_Nsqrt:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores ran in) {proportion_10x10x5_K4_Nsqrt:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores ran in) 0.3579 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores ran in) 0.5626 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [73]:
fig = px.scatter(
    data_eagle_1_K4_N4,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: KPAR=4, NPAR=4 with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [74]:
proportion_K4_N4 = get_scaling(data_eagle_1_K4_N4, "kpoints", "4x4x2", "10x10x5")
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_K4_N4:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2436 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [75]:
proportion_4x4x2_K4_N4 = get_scaling_KN(data_eagle_1_K4_N4, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5_K4_N4 = get_scaling_KN(
    data_eagle_1_K4_N4, data_eagle_1_K1_ND, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=4 {proportion_4x4x2_K4_N4:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=4 {proportion_10x10x5_K4_N4:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=4 0.3759 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=4 0.4265 the amount of time needed for calculations with the default KPAR/NPAR settings Intel MPI.


<IPython.core.display.Javascript object>

In [76]:
fig = px.scatter(
    data_eagle_1_K9_Nsqrt,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: KPAR=9, NPAR=sqrt(# of cores) with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [77]:
proportion_K9_Nsqrt = get_scaling(data_eagle_1_K9_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_K9_Nsqrt:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2575 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [78]:
proportion_4x4x2_K9_Nsqrt = get_scaling_KN(
    data_eagle_1_K9_Nsqrt, data_eagle_1_K1_ND, "4x4x2"
)
proportion_10x10x5_K9_Nsqrt = get_scaling_KN(
    data_eagle_1_K9_Nsqrt, data_eagle_1_K1_ND, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) {proportion_4x4x2_K9_Nsqrt:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) {proportion_10x10x5_K9_Nsqrt:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) 0.4161 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) 0.3885 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [79]:
data_eagle_1_gpu = data_eagle_1[data_eagle_1["Partition"] == "gpu"]


def to_string(row):
    return str(row["KPAR"])


data_eagle_1_gpu["KPAR"] = data_eagle_1_gpu.apply(to_string, axis=1)

fig = px.scatter(
    data_eagle_1_gpu,
    x="Nodes",
    y="scaled_rate",
    color="KPAR",
    # symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: OpenACC GPU build with Intel MPI and a 4x4x2 kpoints grid"
    # Changing KPAR for OpenACC GPU build with Intel MPI",
)
fig.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



<IPython.core.display.Javascript object>

In [80]:
data_eagle_1_gpu_KPAR1 = data_eagle_1_gpu[data_eagle_1_gpu["KPAR"] == "1"]
data_eagle_1_gpu_KPAR4 = data_eagle_1_gpu[data_eagle_1_gpu["KPAR"] == "4"]
data_eagle_1_gpu_KPAR9 = data_eagle_1_gpu[data_eagle_1_gpu["KPAR"] == "9"]

proportion_KPAR1_gpu = get_scaling(
    data_eagle_1_gpu_KPAR1, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, OpenACC gpu calculations using KPAR=1 with a 4x4x2 kpoint grid run in {proportion_KPAR1_gpu:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

proportion_KPAR4_gpu = get_scaling(
    data_eagle_1_gpu_KPAR4, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, OpenACC gpu calculations using KPAR=4 with a 4x4x2 kpoint grid run in {proportion_KPAR4_gpu:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

proportion_KPAR9_gpu = get_scaling(
    data_eagle_1_gpu_KPAR9, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, OpenACC gpu calculations using KPAR=9 with a 4x4x2 kpoint grid run in {proportion_KPAR9_gpu:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, OpenACC gpu calculations using KPAR=1 with a 4x4x2 kpoint grid run in 0.2176 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.
On average, OpenACC gpu calculations using KPAR=4 with a 4x4x2 kpoint grid run in 0.2907 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.
On average, OpenACC gpu calculations using KPAR=9 with a 4x4x2 kpoint grid run in 0.2577 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [81]:
data_eagle_1_gpu_4x4x2 = data_eagle_1_gpu[data_eagle_1_gpu["kpoints"] == "4x4x2"]

proportion_KPAR1_4x4x2_gpu = get_scaling(data_eagle_1_gpu_4x4x2, "KPAR", "1", "1")
proportion_KPAR4_4x4x2_gpu = get_scaling(data_eagle_1_gpu_4x4x2, "KPAR", "4", "1")
proportion_KPAR9_4x4x2_gpu = get_scaling(data_eagle_1_gpu_4x4x2, "KPAR", "9", "1")
print(
    f"On average, OpenACC gpu calculations using a 4x4x2 KPOINTS grid with KPAR=4 ran in {proportion_KPAR4_4x4x2_gpu:.4f} the amount of time needed for calculations with the default KPAR=1.",
)
print(
    f"\nOn average, OpenACC gpu calculations using a 4x4x2 KPOINTS grid with KPAR=9 ran in {proportion_KPAR9_4x4x2_gpu:.4f} the amount of time needed for calculations with the default KPAR=1.",
)

data_eagle_1_gpu_10x10x5 = data_eagle_1_gpu[data_eagle_1_gpu["kpoints"] == "10x10x5"]

proportion_KPAR1_10x10x5_gpu = get_scaling(data_eagle_1_gpu_10x10x5, "KPAR", "1", "1")
proportion_KPAR4_10x10x5_gpu = get_scaling(data_eagle_1_gpu_10x10x5, "KPAR", "4", "1")
proportion_KPAR9_10x10x5_gpu = get_scaling(data_eagle_1_gpu_10x10x5, "KPAR", "9", "1")
print(
    f"\nOn average, OpenACC gpu calculations using a 10x10x5 KPOINTS grid with KPAR=4 ran in {proportion_KPAR4_10x10x5_gpu:.4f} the amount of time needed for calculations with the default KPAR=1.",
)
print(
    f"\nOn average, OpenACC gpu calculations using a 10x10x5 KPOINTS grid with KPAR=9 ran in {proportion_KPAR9_10x10x5_gpu:.4f} the amount of time needed for calculations with the default KPAR=1.",
)

On average, OpenACC gpu calculations using a 4x4x2 KPOINTS grid with KPAR=4 ran in 0.6493 the amount of time needed for calculations with the default KPAR=1.

On average, OpenACC gpu calculations using a 4x4x2 KPOINTS grid with KPAR=9 ran in 0.5841 the amount of time needed for calculations with the default KPAR=1.

On average, OpenACC gpu calculations using a 10x10x5 KPOINTS grid with KPAR=4 ran in 0.4918 the amount of time needed for calculations with the default KPAR=1.

On average, OpenACC gpu calculations using a 10x10x5 KPOINTS grid with KPAR=9 ran in 0.5021 the amount of time needed for calculations with the default KPAR=1.


<IPython.core.display.Javascript object>

In [82]:
table_KPAR_NPAR_CPU = [
    [
        " ",
        "KPOINTS Scaling",
        "4x4x2 % of Default Runtime",
        "10x10x5 % of Default Runtime",
    ],
    [
        "KPAR=1, NPAR=# of cores",
        f"{100*proportion_K1_ND:.2f}%",
        f"{100*proportion_4x4x2_K1_ND:.2f}%",
        f"{100*proportion_10x10x5_K1_ND:.2f}%",
    ],
    [
        "KPAR=1, NPAR=4",
        f"{100*proportion_K1_N4:.2f}%",
        f"{100*proportion_4x4x2_K1_N4:.2f}%",
        f"{100*proportion_10x10x5_K1_N4:.2f}%",
    ],
    [
        "KPAR=1, NPAR=sqrt",
        f"{100*proportion_K1_Nsqrt:.2f}%",
        f"{100*proportion_4x4x2_K1_Nsqrt:.2f}%",
        f"{100*proportion_10x10x5_K1_Nsqrt:.2f}%",
    ],
    [
        "KPAR=4, NPAR=# of cores",
        f"{100*proportion_K4_ND:.2f}%",
        f"{100*proportion_4x4x2_K4_ND:.2f}%",
        f"{100*proportion_10x10x5_K4_ND:.2f}",
    ],
    [
        "KPAR=4,NPAR=sqrt(# of cores)",
        f"{100*proportion_K4_Nsqrt:.2f}%",
        f"{100*proportion_4x4x2_K4_Nsqrt:.2f}%",
        f"{100*proportion_10x10x5_K4_Nsqrt:.2f}%",
    ],
    [
        "KPAR=4, NPAR=4",
        f"{100*proportion_K4_N4:.2f}%",
        f"{100*proportion_4x4x2_K4_N4:.2f}%",
        f"{100*proportion_10x10x5_K4_N4:.2f}%",
    ],
    [
        "KPAR=9, NPAR=4",
        f"{100*proportion_K9_N4:.2f}%",
        f"{100*proportion_4x4x2_K9_N4:.2f}%",
        f"{100*proportion_10x10x5_K9_N4:.2f}%",
    ],
    [
        "KPAR=9,NPAR=sqrt(# of cores)",
        f"{100*proportion_K9_Nsqrt:.2f}%",
        f"{100*proportion_4x4x2_K9_Nsqrt:.2f}%",
        f"{100*proportion_10x10x5_K9_Nsqrt:.2f}%",
    ],
]

print("KPOINTS Scaling and KPAR/NPAR Runtimes for Benchmark 1 on CPUs with Intel MPI")
print(tabulate(table_KPAR_NPAR_CPU, headers="firstrow", tablefmt="fancy_grid"))
print(
    "The 'KPOINTS Scaling' column gives the average 4x4x2 runtime as a percentage of the 10x10x5 runtime. The '4x4x2 % of Default Runtime' gives the average runtime for the given KPAR/NPAR configuration with a 4x4x2 grid as a percentage of the default (KPAR=1, NPAR=# of cores) runtime with a 4x4x2 grid. The '10x10x5 % of Default Runtime' shows the equivalent for calculations using a 10x10x5 grid."
)

KPOINTS Scaling and KPAR/NPAR Runtimes for Benchmark 1 on CPUs with Intel MPI
╒══════════════════════════════╤═══════════════════╤══════════════════════════════╤════════════════════════════════╕
│                              │ KPOINTS Scaling   │ 4x4x2 % of Default Runtime   │ 10x10x5 % of Default Runtime   │
╞══════════════════════════════╪═══════════════════╪══════════════════════════════╪════════════════════════════════╡
│ KPAR=1, NPAR=# of cores      │ 20.21%            │ 100.00%                      │ 100.00%                        │
├──────────────────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┤
│ KPAR=1, NPAR=4               │ 22.41%            │ 46.15%                       │ 53.15%                         │
├──────────────────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┤
│ KPAR=1, NPAR=sqrt            │ 22.76%            │ 68.76%                       │ 51.52%             

<IPython.core.display.Javascript object>

In [83]:
table_KPAR_NPAR_GPU = [
    [
        " ",
        "KPOINTS Scaling",
        "4x4x2 % of Default Runtime",
        "10x10x5 % of Default Runtime",
    ],
    [
        "KPAR=1, NPAR=4",
        f"{100*proportion_KPAR1_gpu:.2f}%",
        f"{100*proportion_KPAR1_4x4x2_gpu:.2f}%",
        f"{100*proportion_KPAR1_10x10x5_gpu:.2f}%",
    ],
    [
        "KPAR=4, NPAR=4",
        f"{100*proportion_KPAR4_gpu:.2f}%",
        f"{100*proportion_KPAR4_4x4x2_gpu:.2f}%",
        f"{100*proportion_KPAR4_10x10x5_gpu:.2f}%",
    ],
    [
        "KPAR=9, NPAR=4",
        f"{100*proportion_KPAR9_gpu:.2f}%",
        f"{100*proportion_KPAR9_4x4x2_gpu:.2f}%",
        f"{100*proportion_KPAR9_10x10x5_gpu:.2f}%",
    ],
]

print("KPOINTS Scaling and KPAR/NPAR Runtimes for Benchmark 1 on GPUs with Intel MPI")
print(tabulate(table_KPAR_NPAR_GPU, headers="firstrow", tablefmt="fancy_grid"))
print(
    "The 'KPOINTS Scaling' column gives the average 4x4x2 runtime as a percentage of the 10x10x5 runtime. The '4x4x2 % of Default Runtime' gives the average runtime for the given KPAR/NPAR configuration with a 4x4x2 grid as a percentage of the default (KPAR=1, NPAR=4) runtime with a 4x4x2 grid. The '10x10x5 % of Default Runtime' shows the equivalent for calculations using a 10x10x5 grid."
)

KPOINTS Scaling and KPAR/NPAR Runtimes for Benchmark 1 on GPUs with Intel MPI
╒════════════════╤═══════════════════╤══════════════════════════════╤════════════════════════════════╕
│                │ KPOINTS Scaling   │ 4x4x2 % of Default Runtime   │ 10x10x5 % of Default Runtime   │
╞════════════════╪═══════════════════╪══════════════════════════════╪════════════════════════════════╡
│ KPAR=1, NPAR=4 │ 21.76%            │ 100.00%                      │ 100.00%                        │
├────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┤
│ KPAR=4, NPAR=4 │ 29.07%            │ 64.93%                       │ 49.18%                         │
├────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┤
│ KPAR=9, NPAR=4 │ 25.77%            │ 58.41%                       │ 50.21%                         │
╘════════════════╧═══════════════════╧══════════════════════════════╧═════════════════════════════

<IPython.core.display.Javascript object>

### Since KPAR=9, NPAR=4 performed well for on CPUs and GPUs, we will use these jobs to continue our analysis. While KPAR=9, NPAR=4 does not show the fastest average rate, it is the configruation that performs best on high node counts. The other rates for the other configurations plateau before reaching 4 nodes. 

### Running on GPUs on Eagle with Benchmark 1

We ran Benchmark 1 on Eagle's GPUs, using 2 GPUs/node. Benchmark 1 was only run with the OpenACC GPU build. In the graphs below the OpenACC build is labeled as "gres:2_OpenACC."

In [84]:
data_cpu_comp = data_eagle_1_K9_N4_intel[data_eagle_1_K9_N4_intel["Nodes"] < 5]
data_cpu_comp = data_cpu_comp[data_cpu_comp["node_fill"] != "half"]
data_cpu_comp = data_cpu_comp[data_cpu_comp["node_fill"] != "partial"]

data_cpu_gpu_comp = pd.concat([data_cpu_comp, data_eagle_1_gpu_KPAR9])

<IPython.core.display.Javascript object>

In [86]:
data_cpu_gpu_comp_4x4x2 = data_cpu_gpu_comp[data_cpu_gpu_comp["kpoints"] == "4x4x2"]
data_cpu_gpu_comp_4x4x2 = data_cpu_gpu_comp_4x4x2.sort_values(by=["node_fill"])

fig = px.scatter(
    data_cpu_gpu_comp_4x4x2,
    x="Nodes",
    y="scaled_rate",
    color="node_fill",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: CPUs vs. GPUs with Intel MPI using a 4x4x2 kpoints grid",
)
fig.show()

print("'full'=36 CPUs/Node, 'gres:2_OpenACC'=OpenACC GPU build on 2 GPUs/Node")

'full'=36 CPUs/Node, 'gres:2_OpenACC'=OpenACC GPU build on 2 GPUs/Node


<IPython.core.display.Javascript object>

In [87]:
data_cpu_gpu_comp_10x10x5 = data_cpu_gpu_comp[data_cpu_gpu_comp["kpoints"] == "10x10x5"]
data_cpu_gpu_comp_10x10x5 = data_cpu_gpu_comp_10x10x5.sort_values(by=["node_fill"])

fig = px.scatter(
    data_cpu_gpu_comp_10x10x5,
    x="Nodes",
    y="scaled_rate",
    color="node_fill",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: CPUs vs. GPUs with Intel MPI using a 10x10x5 kpoints grid",
)
fig.show()

print("'full'=36 CPUs/Node, 'gres:2_OpenACC'=OpenACC GPU build on 2 GPUs/Node")

'full'=36 CPUs/Node, 'gres:2_OpenACC'=OpenACC GPU build on 2 GPUs/Node


<IPython.core.display.Javascript object>

In [88]:
def get_average_runtime_K9N4(
    system, benchmark, node_fill, MPI, time_metric, kpoints, proc, **kwargs
):
    for key, value in kwargs.items():
        key = key
    if proc == "CPU":
        data = data_eagle_1_K9_N4
    elif proc == "GPU":
        data = data_eagle_1_gpu_KPAR9
    data = data[data["kpoints"] == kpoints]
    data = data[data["node_fill"] == node_fill]
    data = data[data["MPI"] == MPI]
    if key == ("nodes" or "Nodes"):
        data = data[data["Nodes"] == value]
    elif key == ("cores" or "Cores"):
        data = data[data["Cores"] == value]
    return np.mean(data[time_metric].to_numpy())

<IPython.core.display.Javascript object>

In [89]:
table_tps_eagle_1 = [
    [
        " ",
        "IntelMPI, N=1",
        "IntelMPI, N=2",
        "IntelMPI, N=4",
        "OpenMPI, N=1",
        "OpenMPI, N=2",
        "OpenMPI, N=4",
    ],
    [
        "36 CPUs/Node",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '4x4x2', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '4x4x2', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '4x4x2', 'CPU', nodes=4):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'time_per_step', '4x4x2', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'time_per_step', '4x4x2', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'time_per_step', '4x4x2', 'CPU', nodes=4):.2f}",
    ],
    [
        "2 GPUs/Nodes",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '4x4x2', 'GPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '4x4x2', 'GPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '4x4x2', 'GPU', nodes=4):.2f}",
        "",
        "",
        "",
        # f"{one_node_avg_tps_eagle_gpu_open_1:.2f}",
        # f"{two_node_avg_tps_eagle_gpu_open_1:.2f}",
        # f"{four_node_avg_tps_eagle_gpu_open_1:.2f}",
    ],
    [
        "2GPUs/36CPUs",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '4x4x2', 'GPU', nodes=1) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '4x4x2', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '4x4x2', 'GPU', nodes=2) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '4x4x2', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '4x4x2', 'GPU', nodes=4) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '4x4x2', 'CPU', nodes=4):.2f}",
        "",
        "",
        "",
        # f"{float(one_node_avg_tps_eagle_gpu_open_1) / float(one_node_avg_tps_eagle_full_open_1):.2f}",
        # f"{float(two_node_avg_tps_eagle_gpu_open_1) / float(two_node_avg_tps_eagle_full_open_1):.2f}",
        # f"{float(four_node_avg_tps_eagle_gpu_open_1) / float(four_node_avg_tps_eagle_full_open_1):.2f}",
    ],
]

table_rt_eagle_1 = [
    [
        " ",
        "IntelMPI, N=1",
        "IntelMPI, N=2",
        "IntelMPI, N=4",
        "OpenMPI, N=1",
        "OpenMPI, N=2",
        "OpenMPI, N=4",
    ],
    [
        "36 CPUs",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=4):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=4):.2f}",
    ],
    [
        "2 GPUs/Node",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '4x4x2', 'GPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '4x4x2', 'GPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '4x4x2', 'GPU', nodes=4):.2f}",
        "",
        "",
        "",
        # f"{one_node_avg_rt_eagle_gpu_open_1:.2f}",
        # f"{two_node_avg_rt_eagle_gpu_open_1:.2f}",
        # f"{four_node_avg_rt_eagle_gpu_open_1:.2f}",
    ],
    [
        "2GPUs/36CPUs",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '4x4x2', 'GPU', nodes=1) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '4x4x2', 'GPU', nodes=2) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '4x4x2', 'GPU', nodes=4) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '4x4x2', 'CPU', nodes=4):.2f}",
        "",
        "",
        "",
        # f"{float(one_node_avg_rt_eagle_gpu_open_1) / float(one_node_avg_rt_eagle_full_open_1):.2f}",
        # f"{float(two_node_avg_rt_eagle_gpu_open_1) / float(two_node_avg_rt_eagle_full_open_1):.2f}",
        # f"{float(four_node_avg_rt_eagle_gpu_open_1) / float(four_node_avg_rt_eagle_full_open_1):.2f}",
    ],
]

print(
    "Average Runtime per Electronic Step (s) for Benchmark 1 with KPAR=4, NPAR=9 and a 4x4x2 KPOINTS grid."
)
print(tabulate(table_tps_eagle_1, headers="firstrow", tablefmt="fancy_grid"))
print(
    "Average Total Runtime (s) to complete one job for Benchmark 1 with KPAR=4, NPAR=9 and a 4x4x2 KPOINTS grid."
)
print(tabulate(table_rt_eagle_1, headers="firstrow", tablefmt="fancy_grid"))

Average Runtime per Electronic Step (s) for Benchmark 1 with KPAR=4, NPAR=9 and a 4x4x2 KPOINTS grid.
╒══════════════╤═════════════════╤═════════════════╤═════════════════╤════════════════╤════════════════╤════════════════╕
│              │   IntelMPI, N=1 │   IntelMPI, N=2 │   IntelMPI, N=4 │ OpenMPI, N=1   │ OpenMPI, N=2   │ OpenMPI, N=4   │
╞══════════════╪═════════════════╪═════════════════╪═════════════════╪════════════════╪════════════════╪════════════════╡
│ 36 CPUs/Node │            8.71 │            5.4  │            3.69 │ 11.73          │ 10.04          │ 8.62           │
├──────────────┼─────────────────┼─────────────────┼─────────────────┼────────────────┼────────────────┼────────────────┤
│ 2 GPUs/Nodes │           10.67 │            6.33 │            5.93 │                │                │                │
├──────────────┼─────────────────┼─────────────────┼─────────────────┼────────────────┼────────────────┼────────────────┤
│ 2GPUs/36CPUs │            1.22 │          

<IPython.core.display.Javascript object>

In [90]:
table_tps_eagle_1 = [
    [
        " ",
        "IntelMPI, N=1",
        "IntelMPI, N=2",
        "IntelMPI, N=4",
        "OpenMPI, N=1",
        "OpenMPI, N=2",
        "OpenMPI, N=4",
    ],
    [
        "36 CPUs/Node",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '10x10x5', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '10x10x5', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '10x10x5', 'CPU', nodes=4):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'time_per_step', '10x10x5', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'time_per_step', '10x10x5', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'time_per_step', '10x10x5', 'CPU', nodes=4):.2f}",
    ],
    [
        "2 GPUs/Node",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '10x10x5', 'GPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '10x10x5', 'GPU', nodes=2):.2f}",
        "",
        "",
        "",
        "",
        # f"{one_node_avg_tps_eagle_gpu_open_1:.2f}",
        # f"{two_node_avg_tps_eagle_gpu_open_1:.2f}",
        # f"{four_node_avg_tps_eagle_gpu_open_1:.2f}",
    ],
    [
        "2GPUs/36CPUs",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '10x10x5', 'GPU', nodes=1) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '10x10x5', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'time_per_step', '10x10x5', 'GPU', nodes=2) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'time_per_step', '10x10x5', 'CPU', nodes=2):.2f}",
        "",
        "",
        "",
        "",
        # f"{float(one_node_avg_tps_eagle_gpu_open_1) / float(one_node_avg_tps_eagle_full_open_1):.2f}",
        # f"{float(two_node_avg_tps_eagle_gpu_open_1) / float(two_node_avg_tps_eagle_full_open_1):.2f}",
        # f"{float(four_node_avg_tps_eagle_gpu_open_1) / float(four_node_avg_tps_eagle_full_open_1):.2f}",
    ],
]

table_rt_eagle_1 = [
    [
        " ",
        "IntelMPI, N=1",
        "IntelMPI, N=2",
        "IntelMPI, N=4",
        "OpenMPI, N=1",
        "OpenMPI, N=2",
        "OpenMPI, N=4",
    ],
    [
        "36 CPUs/Node",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '10x10x5', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '10x10x5', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '10x10x5', 'CPU', nodes=4):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'Runtime(s)', '10x10x5', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'Runtime(s)', '10x10x5', 'CPU', nodes=2):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'full', 'openmpi', 'Runtime(s)', '10x10x5', 'CPU', nodes=4):.2f}",
    ],
    [
        "2 GPUs/Node",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '10x10x5', 'GPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '10x10x5', 'GPU', nodes=2):.2f}",
        "",
        "",
        "",
        "",
        # f"{one_node_avg_rt_eagle_gpu_open_1:.2f}",
        # f"{two_node_avg_rt_eagle_gpu_open_1:.2f}",
        # f"{four_node_avg_rt_eagle_gpu_open_1:.2f}",
    ],
    [
        "2GPUs/36CPUs",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '10x10x5', 'GPU', nodes=1) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '10x10x5', 'CPU', nodes=1):.2f}",
        f"{get_average_runtime_K9N4('Eagle', 1, 'gres:2_openACC', 'intelmpi', 'Runtime(s)', '10x10x5', 'GPU', nodes=2) / get_average_runtime_K9N4('Eagle', 1, 'full', 'intelmpi', 'Runtime(s)', '10x10x5', 'CPU', nodes=2):.2f}",
        "",
        "",
        "",
        "",
        # f"{float(one_node_avg_rt_eagle_gpu_open_1) / float(one_node_avg_rt_eagle_full_open_1):.2f}",
        # f"{float(two_node_avg_rt_eagle_gpu_open_1) / float(two_node_avg_rt_eagle_full_open_1):.2f}",
        # f"{float(four_node_avg_rt_eagle_gpu_open_1) / float(four_node_avg_rt_eagle_full_open_1):.2f}",
    ],
]

print(
    "Average Runtime per Electronic Step (s) for Benchmark 1 with KPAR=4, NPAR=9 and a 10x10x5 KPOINTS grid."
)
print(tabulate(table_tps_eagle_1, headers="firstrow", tablefmt="fancy_grid"))
print(
    "Average Total Runtime (s) to complete one job with KPAR=4, NPAR=9 and a 10x10x5 KPOINTS grid."
)
print(tabulate(table_rt_eagle_1, headers="firstrow", tablefmt="fancy_grid"))

Average Runtime per Electronic Step (s) for Benchmark 1 with KPAR=4, NPAR=9 and a 10x10x5 KPOINTS grid.
╒══════════════╤═════════════════╤═════════════════╤═════════════════╤════════════════╤════════════════╤════════════════╕
│              │   IntelMPI, N=1 │   IntelMPI, N=2 │ IntelMPI, N=4   │ OpenMPI, N=1   │ OpenMPI, N=2   │ OpenMPI, N=4   │
╞══════════════╪═════════════════╪═════════════════╪═════════════════╪════════════════╪════════════════╪════════════════╡
│ 36 CPUs/Node │           30.22 │           16.93 │ 10.27           │ 40.83          │ 23.84          │ 15.40          │
├──────────────┼─────────────────┼─────────────────┼─────────────────┼────────────────┼────────────────┼────────────────┤
│ 2 GPUs/Node  │           43.87 │           23.27 │                 │                │                │                │
├──────────────┼─────────────────┼─────────────────┼─────────────────┼────────────────┼────────────────┼────────────────┤
│ 2GPUs/36CPUs │            1.45 │        

<IPython.core.display.Javascript object>

### MPI on Eagle with Benchmark 1
We ran Benchmark 1 using Intel MPI and Open MPI and compared the performance. The two versions were compiled as described at the top of this document and run with the respective MPI. 

In [92]:
data_eagle_1_K9_N4_4x4x2 = data_eagle_1_K9_N4[data_eagle_1_K9_N4["kpoints"] == "4x4x2"]
data_eagle_1_K9_N4_10x10x5 = data_eagle_1_K9_N4[
    data_eagle_1_K9_N4["kpoints"] == "10x10x5"
]

<IPython.core.display.Javascript object>

In [93]:
fig = px.scatter(
    data_eagle_1_K9_N4_4x4x2,
    x="Nodes",
    y="scaled_rate",
    color="MPI",
    # symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: Intel MPI vs. Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [94]:
fig = px.scatter(
    data_eagle_1_K9_N4_10x10x5,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: Intel MPI vs. Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [95]:
data_eagle_1_K9_N4_10x10x5 = data_eagle_1_K9_N4[
    data_eagle_1_K9_N4["kpoints"] == "10x10x5"
]
proportion = get_scaling(data_eagle_1_K9_N4_10x10x5, "MPI", "intelmpi", "openmpi")
print(
    f"Calculations using intel mpi run in an average of {proportion:.4f} the amount of time as calculations using open mpi using a 10x10x5 kpoints grid.",
)

Calculations using intel mpi run in an average of 0.6225 the amount of time as calculations using open mpi using a 10x10x5 kpoints grid.


<IPython.core.display.Javascript object>

In [96]:
data_eagle_1_K9_N4_4x4x2 = data_eagle_1_K9_N4[data_eagle_1_K9_N4["kpoints"] == "4x4x2"]
proportion = get_scaling(data_eagle_1_K9_N4_4x4x2, "MPI", "intelmpi", "openmpi")
print(
    f"Calculations using intel mpi run in an average of {proportion:.4f} the amount of time as calculations using open mpi using a 4x4x2 kpoints grid.",
)

Calculations using intel mpi run in an average of 0.6537 the amount of time as calculations using open mpi using a 4x4x2 kpoints grid.


<IPython.core.display.Javascript object>

### cpu-bind on Eagle with Benchmark 1
The --cpu-bind flag changes how tasks are assigned to cores throughout the node. We ran Benchmark 1 on Eagle with cpu-bind=cores and cpu-bind=rank to compare the performance to running without setting cpu-bind.

In [97]:
data_eagle_1_K9_N4_10x10x5_intel = data_eagle_1_K9_N4_10x10x5[
    data_eagle_1_K9_N4_10x10x5["MPI"] == "intelmpi"
]
data_eagle_1_K9_N4_10x10x5_open = data_eagle_1_K9_N4_10x10x5[
    data_eagle_1_K9_N4_10x10x5["MPI"] == "openmpi"
]
data_eagle_1_K9_N4_4x4x2_intel = data_eagle_1_K9_N4_4x4x2[
    data_eagle_1_K9_N4_4x4x2["MPI"] == "intelmpi"
]
data_eagle_1_K9_N4_4x4x2_open = data_eagle_1_K9_N4_4x4x2[
    data_eagle_1_K9_N4_4x4x2["MPI"] == "openmpi"
]

<IPython.core.display.Javascript object>

In [98]:
fig = px.scatter(
    data_eagle_1_K9_N4_10x10x5_intel,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: Intel MPI with a 10x10x5 grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [99]:
proportion_rank = get_scaling(
    data_eagle_1_K9_N4_10x10x5_intel, "cpu-bind", "rank", "none"
)
proportion_cores = get_scaling(
    data_eagle_1_K9_N4_10x10x5_intel, "cpu-bind", "cores", "none"
)
print(
    f"Calculations using --cpu-bind=rank run in an average of {proportion_rank:.4f} the amount of time as calculations without --cpu-bind using a 10x10x5 kpoints grid with Intel MPI.",
)
print(
    f"\nCalculations using --cpu-bind=cores run in an average of {proportion_cores:.4f} the amount of time as calculations without --cpu-bind using a 10x10x5 kpoints grid with Intel MPI.",
)

Calculations using --cpu-bind=rank run in an average of 1.0140 the amount of time as calculations without --cpu-bind using a 10x10x5 kpoints grid with Intel MPI.

Calculations using --cpu-bind=cores run in an average of 1.0260 the amount of time as calculations without --cpu-bind using a 10x10x5 kpoints grid with Intel MPI.


<IPython.core.display.Javascript object>

In [100]:
fig = px.scatter(
    data_eagle_1_K9_N4_4x4x2_intel,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: Intel MPI with a 4x4x2 grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [101]:
proportion_rank = get_scaling(
    data_eagle_1_K9_N4_4x4x2_intel, "cpu-bind", "rank", "none"
)
proportion_cores = get_scaling(
    data_eagle_1_K9_N4_4x4x2_intel, "cpu-bind", "cores", "none"
)
print(
    f"Calculations using --cpu-bind=rank run in an average of {proportion_rank:.4f} the amount of time as calculations without --cpu-bind using a 4x4x2 kpoints grid  with Intel MPI.",
)
print(
    f"\nCalculations using --cpu-bind=cores run in an average of {proportion_cores:.4f} the amount of time as calculations without --cpu-bind using a 4x4x2 kpoints grid with Intel MPI.",
)

Calculations using --cpu-bind=rank run in an average of 0.9920 the amount of time as calculations without --cpu-bind using a 4x4x2 kpoints grid  with Intel MPI.

Calculations using --cpu-bind=cores run in an average of 2.7891 the amount of time as calculations without --cpu-bind using a 4x4x2 kpoints grid with Intel MPI.


<IPython.core.display.Javascript object>

In [102]:
fig = px.scatter(
    data_eagle_1_K9_N4_10x10x5_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: Open MPI with a 10x10x5 grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [103]:
proportion_rank = get_scaling(
    data_eagle_1_K9_N4_10x10x5_open, "cpu-bind", "rank", "none"
)
proportion_cores = get_scaling(
    data_eagle_1_K9_N4_4x4x2_open, "cpu-bind", "cores", "none"
)
print(
    f"Calculations using --cpu-bind=rank run in an average of {proportion_rank:.4f} the amount of time as calculations without --cpu-bind using a 10x10x5 kpoints grid with Open MPI.",
)
print(
    f"\nCalculations using --cpu-bind=cores run in an average of {proportion_cores:.4f} the amount of time as calculations without --cpu-bind using a 10x10x5 kpoints grid with Open MPI.",
)

Calculations using --cpu-bind=rank run in an average of 0.9494 the amount of time as calculations without --cpu-bind using a 10x10x5 kpoints grid with Open MPI.

Calculations using --cpu-bind=cores run in an average of 0.6560 the amount of time as calculations without --cpu-bind using a 10x10x5 kpoints grid with Open MPI.


<IPython.core.display.Javascript object>

In [104]:
fig = px.scatter(
    data_eagle_1_K9_N4_4x4x2_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
        "Runtime(s)",
        "Job ID",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Eagle: Open MPI with a 4x4x2 grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [105]:
proportion_rank = get_scaling(data_eagle_1_K9_N4_4x4x2_open, "cpu-bind", "rank", "none")
proportion_cores = get_scaling(
    data_eagle_1_K9_N4_4x4x2_open, "cpu-bind", "cores", "none"
)
print(
    f"Calculations using --cpu-bind=rank run in an average of {proportion_rank:.4f} the amount of time as calculations without --cpu-bind using a 4x4x2 kpoints grid with Open MPI.",
)
print(
    f"\nCalculations using --cpu-bind=cores run in an average of {proportion_cores:.4f} the amount of time as calculations without --cpu-bind using a 4x4x2 kpoints grid with Open MPI.",
)

Calculations using --cpu-bind=rank run in an average of 0.5906 the amount of time as calculations without --cpu-bind using a 4x4x2 kpoints grid with Open MPI.

Calculations using --cpu-bind=cores run in an average of 0.6560 the amount of time as calculations without --cpu-bind using a 4x4x2 kpoints grid with Open MPI.


<IPython.core.display.Javascript object>

# Running VASP On Swift

In [106]:
data_swift = data_scaled_elec[data_scaled_elec["HPC System"] == "Swift"]

<IPython.core.display.Javascript object>

## Recommendations for Running VASP on Swift

### Recommended CPUs/Node

On Swift, VASP is most efficiently run on partially full nodes. 32 CPUs/node was found to have the fastest runtime/core, followed by 64 CPUs/node and 128 CPUs/node. Compared to jobs on 64 CPUs/node, jobs on 32 CPUs/node using the same total number of cores ran in 70%-90% of the 64 CPUs/node runtime. On Swift, each node has 64 physical cores, and each core is subdivided into two virtual cores in a processes that is identical to hyperthreading. Because of this, up 128 cores can be requested from a single Swift node, but each core will only represent half of a physical core. 

Unlike on Eagle, Swift charges for only the portion of the node requested by a job, as long as the memory requested for the job is no more than 2GB/CPU. If the entire 256GB of memory is requested per node, but only half of the CPUs per node are requested, you will be charged for the full node. Swift charges 5 AU/hour when running on 128 nodes (one full node), so running on 32 CPUs, for example, would charge only (32/128) * 5 AUs/hour rather than the full 5 AUs/node-hour. 

Unlike on Eagle, multiple jobs can run on the same nodes on Swift. This runtime performance was simualted by the "shared" nodes in the graphs. So if you are only using a fraction of a node, other users' jobs could be assigned to the rest of the node, which we suspect might deteriorate the performance since "shared" nodes in the graphs below are shown to have the slowest rates. Setting "#SBATCH --exclusive" in your run script prevents other users from using the same node as you, but you will be charged the full 5AUs/node, regardless of the number of CPUs/node you are using. In some cases, running on 32 CPUs/node with the --exculsive flag set might minimize your allocation charge. For example, in the "Open MPI, performance/node" graph, we see that 32 CPUs/node shows consistently the fastest runtime per node for all jobs using 2 or more nodes, so using 32 CPUs/node on 2 nodes could complete faster than 64 CPUs/node on 2 nodes. 

The graphs in the KPAR section below help identify the number of CPUs/node will be most efficient in running their jobs. The graphs that show "performance/core" match jobs that use the same number of cores (but different number of nodes depending on CPUs/node), and they show the efficiency per core. These graphs are most useful if you will be charged by CPU (i.e. --exclusive tag is not set). The graphs that show "performance/node" match jobs that use the same number of nodes (but different number of cores depending on CPUs/node), and they show efficiency per node. These graphs are most useful if you will be charged by node (i.e. --exclusive tag is set). You can change this by setting "y=Nodes" or "y=Cores" in the plots' code.

### MPI

Intel MPI is recommended over Open MPI for all VASP calculations on Swift. Using an Intel MPI build of VASP and running over Intel MPI, Benchmark 2 ran in average of 76%, 72% and 46% of the time as the same calculations using an Open MPI build of VASP over Open MPI on 32, 64 and 128 CPUs/node, respectively. For Benchmark 1, Intel MPI calculations ran in an average of 76.89% of the time as Open MPI calculcations. 

### --cpu-bind Flag

The --cpu-bind flag changes how tasks are assigned to cores throughout the node. On Swift, it is recommended not to use cpu-bind. Running VASP on 64 CPUs/node and 128 CPUs/node, setting --cpu-bind=cores or rank showed no improvement in runtime. Running VASP on 32 CPUs/node, setting --cpu-bind=cores or rank increased runtime by up to 40%. 
```
srun --cpu-bind=cores vasp_std
```

### KPAR

KPAR determines the number of groups across which to divide calculations at each kpoint, and one calculation from each group is performed at a time. The value of KPAR can be defined in the INCAR file. 
  * Per [VASP documentation](https://www.vasp.at/wiki/index.php/KPAR), the KPAR tag file should always be set to a value that evenly divides the total number of cores used as well as the total number of KPOINTS. VASP will not run if KPAR does not evenly divide the total number of cores, and so core counts were slightly altered for the jobs represented in the KPAR=9 graph in order to get VASP to run on Swift with KPAR=9. 
  * We found significant difference in runtime for Benchmark 1 between using a value of KPAR that does not evenly divide the number of kpoints and using a value of KPAR that does evenly divide the number of kpoints. In the KPAR=8 and KPAR=9 graphs below, notice that the 4x4x2 kpoints grid (16 total kpoints) runs much faster at all core counts using KPAR=8 than using KPAR=9. For the 10x10x5 kpoints grid (500 kpoints), 500 is not divisible by either 8 or 9, and we notice that the performance is approximately the same for KPAR=8 and KPAR=9. 
  * We found that runtime scales poorly if you increase the number of nodes past the value of KPAR , so it is recommended to set KPAR no lower than the number of nodes used.
  * Lower values of KPAR might be better for lower node counts. For example, Benchmark 1 calculations on 1-4 nodes were fastest using KPAR=4, but Benchmark 1 calculations on more than 4 nodes were fastest using KPAR=8.

      
### K-Points Scaling 

Runtime does not scale well with the number of kpoints. Benchmark 1 uses a 10x10x5 kpoints grid (500 kpoints). When run with a 4x4x2 kpoints grid (16 kpoints), we should expect the runtime to scale by 16/500 (3.2%) since calculations are being performed at 16 points rather than 500. However, the average scaling factor between Benchmark 1 jobs on Swift with 10x10x5 grids and 4x4x2 grids is 28% (ranging from ~19%-39%).

## Benchmark 2 on Swift
Benchmark 2 is a system of 519 atoms (Ag<sub>504</sub>C<sub>4</sub>H<sub>10</sub>S<sub>1</sub>).

In [107]:
data_swift_2 = data_swift[data_swift["Benchmark Code"] == "2"]

<IPython.core.display.Javascript object>

### Running on half filled nodes on Swift

In [108]:
data_swift_2

Unnamed: 0,Date,HPC System,Job ID,Partition,Benchmark Code,kpoints,math library,MPI,cpu-bind,Nodes,...,Energy,Electronic Steps,Runtime,KPAR,NPAR,node_fill,Runtime(s),time_per_step,scaled_rate,Processor
0,Thu Jun 30 16:25:35 MDT 2022,Swift,635515,parallel,2,1x1x1,mkl,intelmpi,cores,4,...,-1.27E+03,49,0:56:47,1,D,full,3407.0,69.530612,0.014382,CPU
1,2022-06-10T16:35:03,Swift,613797,parallel,2,1x1x1,mkl,intelmpi,cores,8,...,-1.27E+03,35,1:30:34,1,1,virtual,5434.0,155.257143,0.006441,CPU
31,Thu Jun 30 13:52:28 MDT 2022,Swift,635505,parallel,2,1x1x1,mkl,intelmpi,cores,8,...,-1.27E+03,35,0:34:13,1,D,full,2053.0,58.657143,0.017048,CPU
45,2021-11-11T15:45:30,Swift,16609,test,2,1x1x1,opnmpi,openmpi,cores,8,...,-1271.6723,35,1:33:04,1,D,full,5584.0,159.542857,0.006268,CPU
55,2022-06-10T12:48:32,Swift,613796,parallel,2,1x1x1,mkl,intelmpi,cores,4,...,-1.27E+03,35,0:58:22,1,1,virtual,3502.0,100.057143,0.009994,CPU
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1623,Thu Nov 18 18:03:46 MST 2021,Swift,23864,parallel,2,1x1x1,openmpi,openmpi,rank,1,...,-1271.6723,35,3:22:39,1,D,half,12159.0,347.400000,0.002879,CPU
1638,Tue Mar 1 14:16:41 MST 2022,Swift,294968,parallel,2,1x1x1,openmpi,openmpi,rank,1,...,-1.27E+03,35,3:22:31,1,D,half,12151.0,347.171429,0.002880,CPU
1685,Thu Nov 18 19:36:06 MST 2021,Swift,23866,parallel,2,1x1x1,mkl,intelmpi,rank,1,...,-1271.6723,36,4:52:25,1,D,partial,17545.0,487.361111,0.002052,CPU
1703,2021-11-06T15:16:32,Swift,15912,nosmt,2,1x1x1,opnmpi,openmpi,rank,1,...,-1271.6723,35,4:30:31,1,D,partial,16231.0,463.742857,0.002156,CPU


<IPython.core.display.Javascript object>

## Using Different CPUs/Node

<strong>Shared Nodes:</strong> On Swift, it's possible to run multiple jobs on the same node. To simulate VASP runtimes for this scenario, we ran two of the same VASP jobs on the same nodes at the same time. Each job ran on 32 CPUs/node, so both jobs together ran on 64 CPUs/node. This data is represented as "shared" in the graphs below. Each "shared" data point represents the runtime to complete both of the jobs (ran simultaneousely and finished around the same time) on only the number of cores used for a single job. For example, the runtime for "32 cores" represents the total time to run both of the jobs simultaneousely on 64 total CPUs, but "32 cores" corresponds to the CPUs allocated to a single job. 

Additionally, we ran Benchmark 2 on Swift using 32 CPUs/Node, 64 CPUs/Node and 128 CPU/Node. On Swift, each node has 64 physical cores, and each core is subdivided into two virtual cores in a processes that is identical to hyperthreading. Because of this, up 128 cores can be requested from a single Swift node, but each core will only represent half of a physical core. 

In [226]:
data_swift_2_open = data_swift_2[data_swift_2["MPI"] == "openmpi"]
data_swift_2_open = data_swift_2_open[data_swift_2_open["node_fill"] != "partial"]

# Sort dataframe to get correct colors and order for final plots
def get_order(row):
    if row["node_fill"] == "full":
        return "b"
    elif row["node_fill"] == "half":
        return "a"
    elif row["node_fill"] == "virtual":
        return "c"
    elif row["node_fill"] == "shared":
        return "d"


data_swift_2_open["plot_order"] = data_swift_2_open.apply(get_order, axis=1)
data_swift_2_open = data_swift_2_open.sort_values(by=["plot_order"])

fig = px.scatter(
    data_swift_2_open,
    x="Nodes",
    y="scaled_rate",
    color="node_fill",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
        "time_per_step",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift with Open MPI",
)
fig.show()

print(
    "'half'=32 CPUs/node, 'full'=64 CPUs/node', 'shared'=2 jobs on the same nodes, 32 CPUs/node/job, 'virtual'=128 CPUs/node"
)

'half'=32 CPUs/node, 'full'=64 CPUs/node', 'shared'=2 jobs on the same nodes, 32 CPUs/node/job, 'virtual'=128 CPUs/node


<IPython.core.display.Javascript object>

In [229]:
data_swift_2_intel = data_swift_2[data_swift_2["MPI"] != "openmpi"]
data_swift_2_intel = data_swift_2_intel[data_swift_2_intel["node_fill"] != "partial"]
data_swift_2_intel = data_swift_2_intel[data_swift_2_intel["Cores"] < 600]

# Sort dataframe to get correct colors and order for final plots
def get_order(row):
    if row["node_fill"] == "full":
        return "b"
    elif row["node_fill"] == "half":
        return "a"
    elif row["node_fill"] == "virtual":
        return "c"
    elif row["node_fill"] == "shared":
        return "d"


data_swift_2_intel["plot_order"] = data_swift_2_intel.apply(get_order, axis=1)
data_swift_2_intel = data_swift_2_intel.sort_values(by=["plot_order"])


fig = px.scatter(
    data_swift_2_intel,
    x="Cores",
    y="scaled_rate",
    color="node_fill",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
        "time_per_step",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift with Intel MPI",
)
fig.show()
print(
    "'half'=32 CPUs/node, 'full'=64 CPUs/node', 'shared'=2 jobs on the same nodes, 32 CPUs/node/job, 'virtual'=128 CPUs/node"
)

'half'=32 CPUs/node, 'full'=64 CPUs/node', 'shared'=2 jobs on the same nodes, 32 CPUs/node/job, 'virtual'=128 CPUs/node


<IPython.core.display.Javascript object>

In [111]:
from tabulate import tabulate

table_tps = [
    [
        " ",
        "32 CPUs",
        "64 CPUs",
        "128 CPUs",
        "256 CPUs",
        "512 CPUs",
    ],
    [
        "Shared Nodes",
        "",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'time_per_step', cores=32):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'time_per_step', cores=64):.2f}",
        "",
        "",
    ],
    [
        "64 CPUs/Node",
        "",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=64):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=512):.2f}",
    ],
    [
        "128 CPUs/Node",
        "",
        "",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', cores=512):.2f}",
    ],
    [
        "32 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=32):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=64):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=512):.2f}",
    ],
    [
        "Shared / 64 CPUs",
        "",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'time_per_step', cores=32) / (2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=64)):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'time_per_step', cores=64) / (2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=128)):.2f}",
        "",
        "",
    ],
    [
        "128 CPUs / 64 CPUs",
        "",
        "",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', cores=128) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=128):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', cores=256) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=256):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', cores=512) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=512):.2f}",
    ],
    [
        "32 CPUs / 64 CPUs",
        "",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=64) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=64):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=128) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=128):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=256) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=256):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', cores=512) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', cores=512):.2f}",
    ],
]

print("Average Runtime per Electronic Step (s) for Intel MPI")
print(tabulate(table_tps, headers="firstrow", tablefmt="fancy_grid"))

table_tps_nodes = [
    [
        " ",
        "1 Node",
        "2 Nodes",
        "4 Nodes",
        "8 Nodes",
    ],
    [
        "Shared Nodes",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'time_per_step', nodes=2):.2f}",
        "",
        "",
    ],
    [
        "64 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=8):.2f}",
    ],
    [
        "128 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', nodes=8):.2f}",
    ],
    [
        "32 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', nodes=8):.2f}",
    ],
    [
        "Shared / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'time_per_step', nodes=1) / (2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=1)):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'time_per_step',nodes=2) / (2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=2)):.2f}",
        "",
        "",
    ],
    [
        "128 CPUs / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', nodes=1) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', nodes=2) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'time_per_step', nodes=4) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=4):.2f}",
        "",
    ],
    [
        "32 CPUs / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', nodes=1) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', nodes=2) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', nodes=4) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=4):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'time_per_step', nodes=8) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'time_per_step', nodes=8):.2f}",
    ],
]

print("Average Runtime per Electronic Step (s) for Intel MPI")
print(tabulate(table_tps_nodes, headers="firstrow", tablefmt="fancy_grid"))

Average Runtime per Electronic Step (s) for Intel MPI
╒════════════════════╤═══════════╤═══════════╤════════════╤════════════╤════════════╕
│                    │ 32 CPUs   │ 64 CPUs   │   128 CPUs │ 256 CPUs   │ 512 CPUs   │
╞════════════════════╪═══════════╪═══════════╪════════════╪════════════╪════════════╡
│ Shared Nodes       │           │ 660.32    │     310.68 │            │            │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 64 CPUs/Node       │           │ 377.89    │     228.62 │ 144.25     │ 118.21     │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 128 CPUs/Node      │           │           │     373.22 │ 255.42     │ 199.94     │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 32 CPUs/Node       │ 524.06    │ 331.06    │     183.81 │ 120.92     │ nan        │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤



Mean of empty slice.


invalid value encountered in double_scalars


Mean of empty slice.


invalid value encountered in double_scalars



<IPython.core.display.Javascript object>

In [112]:
table_tps = [
    [
        " ",
        "32 CPUs",
        "64 CPUs",
        "128 CPUs",
        "256 CPUs",
        "512 CPUs",
    ],
    [
        "Shared Nodes",
        "",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'time_per_step', cores=32):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'time_per_step', cores=64):.2f}",
        "",
        "",
    ],
    [
        "64 CPUs/Node",
        "",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=64):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=512):.2f}",
    ],
    [
        "128 CPUs/Node",
        "",
        "",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', cores=512):.2f}",
    ],
    [
        "32 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', cores=32):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', cores=64):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', cores=256):.2f}",
        "",
    ],
    [
        "Shared / 64 CPUs",
        "",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'time_per_step', cores=32) / (2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=64)):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'time_per_step', cores=64) / (2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=128)):.2f}",
        "",
        "",
    ],
    [
        "128 CPUs / 64 CPUs",
        "",
        "",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', cores=128) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=128):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', cores=256) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=256):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', cores=512) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=512):.2f}",
    ],
    [
        "32 CPUs / 64 CPUs",
        "",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', cores=64) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=64):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', cores=128) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=128):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', cores=256) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', cores=256):.2f}",
        "",
    ],
]

print("Average Runtime per Electronic Step (s) for Open MPI")
print(tabulate(table_tps, headers="firstrow", tablefmt="fancy_grid"))

table_tps_nodes = [
    [
        " ",
        "1 Node",
        "2 Nodes",
        "4 Nodes",
        "8 Nodes",
    ],
    [
        "Shared Nodes",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'time_per_step', nodes=2):.2f}",
        "",
        "",
        "",
    ],
    [
        "64 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=8):.2f}",
    ],
    [
        "128 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', nodes=4):.2f}",
        "",
    ],
    [
        "32 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', nodes=8):.2f}",
    ],
    [
        "Shared / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'time_per_step', nodes=1) / (2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=1)):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'time_per_step',nodes=2) / (2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=2)):.2f}",
        "",
        "",
    ],
    [
        "128 CPUs / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', nodes=1) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', nodes=2) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'time_per_step', nodes=4) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=4):.2f}",
    ],
    [
        "32 CPUs / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', nodes=1) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', nodes=2) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=2):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', nodes=4) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=4):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'time_per_step', nodes=8) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'time_per_step', nodes=8):.2f}",
    ],
]

print("Average Runtime per Electronic Step (s) for Open MPI")
print(tabulate(table_tps, headers="firstrow", tablefmt="fancy_grid"))

Average Runtime per Electronic Step (s) for Open MPI
╒════════════════════╤═══════════╤═══════════╤════════════╤════════════╤════════════╕
│                    │ 32 CPUs   │ 64 CPUs   │   128 CPUs │ 256 CPUs   │ 512 CPUs   │
╞════════════════════╪═══════════╪═══════════╪════════════╪════════════╪════════════╡
│ Shared Nodes       │           │ 653.40    │     408.44 │            │            │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 64 CPUs/Node       │           │ 495.66    │     427.5  │ 341.79     │ 318.65     │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 128 CPUs/Node      │           │           │     637.58 │ 563.75     │ 570.36     │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 32 CPUs/Node       │ 566.73    │ 395.92    │     314.46 │ 234.17     │            │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│

<IPython.core.display.Javascript object>

In [113]:
table_rt = [
    [
        " ",
        "32 CPUs",
        "64 CPUs",
        "128 CPUs",
        "256 CPUs",
        "512 CPUs",
    ],
    [
        "Shared Nodes",
        "",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'Runtime(s)', cores=32):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'Runtime(s)', cores=64):.2f}",
        "",
        "",
    ],
    [
        "64 CPUs/Node",
        "",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=64):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=256):.2f}",
    ],
    [
        "128 CPUs/Node",
        "",
        "",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', cores=512):.2f}",
    ],
    [
        "32 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', cores=32):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', cores=64):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', cores=256):.2f}",
        "",
    ],
    [
        "Shared / 64 CPUs",
        "",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'Runtime(s)', cores=32) / (2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=64)):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'Runtime(s)', cores=64) / (2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=128)):.2f}",
        "",
        "",
    ],
    [
        "128 CPUs / 64 CPUs",
        "",
        "",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', cores=128) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=128):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', cores=256) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=256):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', cores=512) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=512):.2f}",
    ],
    [
        "32 CPUs / 64 CPUs",
        "",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', cores=64) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=64):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', cores=128) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=128):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', cores=256) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', cores=256):.2f}",
        "",
    ],
]

print("Average Runtime (s) for Intel MPI")
print(tabulate(table_rt, headers="firstrow", tablefmt="fancy_grid"))

table_rt_nodes = [
    [
        " ",
        "1 Node",
        "2 Nodes",
        "4 Nodes",
        "8 Nodes",
    ],
    [
        "Shared Nodes",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        "",
        "",
    ],
    [
        "64 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=8):.2f}",
    ],
    [
        "128 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', nodes=4):.2f}",
        "",
    ],
    [
        "32 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=8):.2f}",
    ],
    [
        "Shared / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'Runtime(s)', nodes=1) / (2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=1)):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'intelmpi', 'Runtime(s)',nodes=2) / (2*get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=2)):.2f}",
        "",
        "",
    ],
    [
        "128 CPUs / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', nodes=1) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', nodes=2) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'intelmpi', 'Runtime(s)', nodes=4) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=4):.2f}",
        "",
    ],
    [
        "32 CPUs / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=1) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=2) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=4) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=4):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'intelmpi', 'Runtime(s)', nodes=8) / get_average_runtime('Swift', 2, 'full', 'intelmpi', 'Runtime(s)', nodes=8):.2f}",
    ],
]

print("Average Runtime (s) for Intel MPI")
print(tabulate(table_rt_nodes, headers="firstrow", tablefmt="fancy_grid"))

Average Runtime (s) for Intel MPI
╒════════════════════╤═══════════╤═══════════╤════════════╤════════════╤════════════╕
│                    │ 32 CPUs   │ 64 CPUs   │   128 CPUs │ 256 CPUs   │ 512 CPUs   │
╞════════════════════╪═══════════╪═══════════╪════════════╪════════════╪════════════╡
│ Shared Nodes       │           │ 23771.67  │   11183    │            │            │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 64 CPUs/Node       │           │ 13226.00  │    8077.33 │ 5942.00    │ 5942.00    │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 128 CPUs/Node      │           │           │   13314    │ 9612.00    │ 6998.00    │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 32 CPUs/Node       │ 18866.00  │ 11918.00  │    6504    │ 4338.00    │            │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ Shared / 64 CPUs  

<IPython.core.display.Javascript object>

In [114]:
table_rt = [
    [
        " ",
        "32 CPUs",
        "64 CPUs",
        "128 CPUs",
        "256 CPUs",
        "512 CPUs",
    ],
    [
        "Shared Nodes",
        "",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'Runtime(s)', cores=32):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'Runtime(s)', cores=64):.2f}",
        "",
        "",
    ],
    [
        "64 CPUs/Node",
        "",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=64):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=512):.2f}",
    ],
    [
        "128 CPUs/Node",
        "",
        "",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', cores=256):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', cores=512):.2f}",
    ],
    [
        "32 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', cores=32):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', cores=64):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', cores=128):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', cores=256):.2f}",
        "",
    ],
    [
        "Shared / 64 CPUs",
        "",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'Runtime(s)', cores=32) / (2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=64)):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'Runtime(s)', cores=64) / (2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=128)):.2f}",
        "",
        "",
    ],
    [
        "128 CPUs / 64 CPUs",
        "",
        "",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', cores=128) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=128):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', cores=256) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=256):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', cores=512) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=512):.2f}",
    ],
    [
        "32 CPUs / 64 CPUs",
        "",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', cores=64) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=64):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', cores=128) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=128):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', cores=256) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', cores=256):.2f}",
        "",
    ],
]

print("Average Runtime (s) for Open MPI")
print(tabulate(table_rt, headers="firstrow", tablefmt="fancy_grid"))

table_rt = [
    [
        " ",
        "1 Node",
        "2 Nodes",
        "4 Nodes",
        "8 Nodes",
    ],
    [
        "Shared Nodes",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'Runtime(s)', nodes=2):.2f}",
        "",
        "",
    ],
    [
        "64 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=8):.2f}",
    ],
    [
        "128 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', nodes=4):.2f}",
        "",
    ],
    [
        "32 CPUs/Node",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', nodes=4):.2f}",
        f"{2*get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', nodes=8):.2f}",
        "",
    ],
    [
        "Shared / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'Runtime(s)', nodes=1) / (2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=1)):.2f}",
        f"{get_average_runtime('Swift', 2, 'shared', 'openmpi', 'Runtime(s)', nodes=2) / (2*get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=2)):.2f}",
        "",
        "",
    ],
    [
        "128 CPUs / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', nodes=1) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', nodes=2) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Swift', 2, 'virtual', 'openmpi', 'Runtime(s)', nodes=4) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=4):.2f}",
        "",
    ],
    [
        "32 CPUs / 64 CPUs",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', nodes=1) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=1):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', nodes=2) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=2):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', nodes=4) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=4):.2f}",
        f"{get_average_runtime('Swift', 2, 'half', 'openmpi', 'Runtime(s)', nodes=8) / get_average_runtime('Swift', 2, 'full', 'openmpi', 'Runtime(s)', nodes=8):.2f}",
    ],
]

print("Average Runtime (s) for Open MPI")
print(tabulate(table_rt, headers="firstrow", tablefmt="fancy_grid"))

Average Runtime (s) for Open MPI
╒════════════════════╤═══════════╤═══════════╤════════════╤════════════╤════════════╕
│                    │ 32 CPUs   │ 64 CPUs   │   128 CPUs │ 256 CPUs   │ 512 CPUs   │
╞════════════════════╪═══════════╪═══════════╪════════════╪════════════╪════════════╡
│ Shared Nodes       │           │ 22869.17  │   14295.5  │            │            │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 64 CPUs/Node       │           │ 17348.00  │   14962.7  │ 11962.67   │ 11152.67   │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 128 CPUs/Node      │           │           │   22315.3  │ 19731.33   │ 19962.67   │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ 32 CPUs/Node       │ 19835.56  │ 13857.33  │   11006    │ 8196.00    │            │
├────────────────────┼───────────┼───────────┼────────────┼────────────┼────────────┤
│ Shared / 64 CPUs   

<IPython.core.display.Javascript object>

### MPI on Swift with Benchmark 2
We ran Benchmark 2 using Intel MPI and Open MPI and compared the performance. The two versions were compiled as described at the top of this document and run with the respective MPI. 

In [115]:
data_swift_2_full = data_swift_2[data_swift_2["node_fill"] == "full"]
data_swift_2_half = data_swift_2[data_swift_2["node_fill"] == "half"]

<IPython.core.display.Javascript object>

In [116]:
fig = px.scatter(
    data_swift_2_full,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: Intel MPI vs. Open MPI using Full Nodes (64 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [117]:
proportion = get_scaling(data_swift_2_full, "MPI", "intelmpi", "openmpi")
print(
    f"On average, calculations running on full nodes using Intel MPI run in {proportion:.2f} the amount of time as calculations using OpenMPI",
)

On average, calculations running on full nodes using Intel MPI run in 0.52 the amount of time as calculations using OpenMPI


<IPython.core.display.Javascript object>

In [118]:
fig = px.scatter(
    data_swift_2_half,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: Intel MPI vs. Open MPI using half-full nodes (32 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [119]:
proportion = get_scaling(data_swift_2_half, "MPI", "intelmpi", "openmpi")
print(
    f"On average, calculations running on half-filled nodes using Intel MPI run in {proportion:.2f} the amount of time as calculations using OpenMPI",
)

On average, calculations running on half-filled nodes using Intel MPI run in 0.72 the amount of time as calculations using OpenMPI


<IPython.core.display.Javascript object>

In [120]:
data_swift_2_virtual = data_swift_2[data_swift_2["node_fill"] == "virtual"]

fig = px.scatter(
    data_swift_2_virtual,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: Intel MPI vs. Open MPI using virtual cores (128 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [121]:
proportion = get_scaling(data_swift_2_virtual, "MPI", "intelmpi", "openmpi")
print(
    f"On average, calculations running on virtual cores using Intel MPI run in {proportion:.2f} the amount of time as calculations using OpenMPI",
)

On average, calculations running on virtual cores using Intel MPI run in 0.46 the amount of time as calculations using OpenMPI


<IPython.core.display.Javascript object>

In [122]:
data_swift_2_shared = data_swift_2[data_swift_2["node_fill"] == "shared"]

fig = px.scatter(
    data_swift_2_shared,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Bench. 2 on Swift: Intel MPI vs. Open MPI Using Shared Nodes (2 jobs on same nodes, 32 CPUs/node/job)",
)
fig.show()

<IPython.core.display.Javascript object>

In [123]:
proportion = get_scaling(data_swift_2_shared, "MPI", "intelmpi", "openmpi")
print(
    f"On average, calculations running on shared nodes using Intel MPI run in {proportion:.2f} the amount of time as calculations using OpenMPI",
)

On average, calculations running on shared nodes using Intel MPI run in 0.89 the amount of time as calculations using OpenMPI


<IPython.core.display.Javascript object>

### cpu-bind on Swift with Benchmark 2
The --cpu-bind flag changes how tasks are assigned to cores throughout the node. We ran Benchmark 2 on Swift with cpu-bind=cores and cpu-bind=rank to compare the performance to running without setting cpu-bind.

In [124]:
data_swift_2_full_intel = data_swift_2_full[data_swift_2_full["MPI"] != "openmpi"]

fig = px.scatter(
    data_swift_2_full_intel,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: cpu-bind using Intel MPI on Full Nodes (64 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [125]:
proportion_rank = get_scaling(data_swift_2_full_intel, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on full nodes with --cpu-bind=rank run in {proportion_rank:.2f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_swift_2_full_intel, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on full nodes with --cpu-bind=cores run in {proportion_cores:.2f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on full nodes with --cpu-bind=rank run in 1.03 the amount of time as calculations without --cpu-bind.

On average, calculations on full nodes with --cpu-bind=cores run in 1.00 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [126]:
data_swift_2_full_open = data_swift_2_full[data_swift_2_full["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_2_full_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: cpu-bind using Open MPI on Full Nodes (64 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [127]:
proportion_rank = get_scaling(data_swift_2_full_open, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on full nodes with --cpu-bind=rank run in {proportion_rank:.2f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_swift_2_full_open, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on full nodes with --cpu-bind=cores run in {proportion_cores:.2f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on full nodes with --cpu-bind=rank run in 1.01 the amount of time as calculations without --cpu-bind.

On average, calculations on full nodes with --cpu-bind=cores run in 1.03 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [210]:
data_swift_2_half_open = data_swift_2_half[data_swift_2_half["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_2_half_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: cpu-bind using Open MPI on Half-Full Nodes (32 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [129]:
proportion_rank = get_scaling(data_swift_2_half_open, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on half-filled nodes with --cpu-bind=rank run in {proportion_rank:.2f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_swift_2_half_open, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on half-filled nodes with --cpu-bind=cores run in {proportion_cores:.2f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on half-filled nodes with --cpu-bind=rank run in 1.40 the amount of time as calculations without --cpu-bind.

On average, calculations on half-filled nodes with --cpu-bind=cores run in 1.06 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [215]:
data_swift_2_half_intel = data_swift_2_half[data_swift_2_half["MPI"] == "intelmpi"]

fig = px.scatter(
    data_swift_2_half_intel,
    x="Nodes",
    # y="scaled_rate",
    y="Runtime(s)",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: cpu-bind using Intel MPI on Half-Full Nodes (32 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [131]:
proportion_rank = get_scaling(data_swift_2_half_intel, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on half-filled nodes with --cpu-bind=rank run in {proportion_rank:.2f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_swift_2_half_intel, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on half-filled nodes with --cpu-bind=cores run in {proportion_cores:.2f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on half-filled nodes with --cpu-bind=rank run in 1.35 the amount of time as calculations without --cpu-bind.

On average, calculations on half-filled nodes with --cpu-bind=cores run in 1.12 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [132]:
data_swift_2_shared_open = data_swift_2_shared[data_swift_2_shared["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_2_shared_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Bench. 2 on Swift: cpu-bind using Open MPI on Shared Nodes (2 jobs on same nodes, 32 CPUs/node/job)",
)
fig.show()

<IPython.core.display.Javascript object>

In [133]:
proportion_rank = get_scaling(data_swift_2_shared_open, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on shared nodes with --cpu-bind=rank run in {proportion_rank:.2f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_swift_2_shared_open, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on shared nodes with --cpu-bind=cores run in {proportion_cores:.2f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on shared nodes with --cpu-bind=rank run in 0.97 the amount of time as calculations without --cpu-bind.

On average, calculations on shared nodes with --cpu-bind=cores run in 0.97 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [134]:
data_swift_2_shared_intel = data_swift_2_shared[
    data_swift_2_shared["MPI"] == "intelmpi"
]

fig = px.scatter(
    data_swift_2_shared_intel,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Bench. 2 on Swift: cpu-bind using Intel MPI on Shared Nodes (2 jobs on same nodes, 32 CPUs/node/job)",
)
fig.show()

<IPython.core.display.Javascript object>

In [135]:
proportion_rank = get_scaling(data_swift_2_shared_intel, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on shared nodes with --cpu-bind=rank run in {proportion_rank:.2f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_swift_2_shared_intel, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on shared nodes with --cpu-bind=cores run in {proportion_cores:.2f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on shared nodes with --cpu-bind=rank run in 0.99 the amount of time as calculations without --cpu-bind.

On average, calculations on shared nodes with --cpu-bind=cores run in 0.99 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [136]:
data_swift_2_virtual_intel = data_swift_2_virtual[
    data_swift_2_virtual["MPI"] == "intelmpi"
]

fig = px.scatter(
    data_swift_2_virtual_intel,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: cpu-bind using Intel MPI on Virtual Cores (128 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [137]:
proportion_rank = get_scaling(data_swift_2_virtual_intel, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on shared nodes with --cpu-bind=rank run in {proportion_rank:.2f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_swift_2_virtual_intel, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on shared nodes with --cpu-bind=cores run in {proportion_cores:.2f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on shared nodes with --cpu-bind=rank run in 0.99 the amount of time as calculations without --cpu-bind.

On average, calculations on shared nodes with --cpu-bind=cores run in 1.02 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [138]:
data_swift_2_virtual_open = data_swift_2_virtual[
    data_swift_2_virtual["MPI"] == "openmpi"
]

fig = px.scatter(
    data_swift_2_virtual_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 2 on Swift: cpu-bind using Open MPI on Virtual Cores (128 CPUs/node)",
)
fig.show()

<IPython.core.display.Javascript object>

In [139]:
proportion_rank = get_scaling(data_swift_2_virtual_open, "cpu-bind", "rank", "none")
print(
    f"On average, calculations on shared nodes with --cpu-bind=rank run in {proportion_rank:.2f} the amount of time as calculations without --cpu-bind.",
)

proportion_cores = get_scaling(data_swift_2_virtual_open, "cpu-bind", "cores", "none")
print(
    f"\nOn average, calculations on shared nodes with --cpu-bind=cores run in {proportion_cores:.2f} the amount of time as calculations without --cpu-bind.",
)

On average, calculations on shared nodes with --cpu-bind=rank run in 0.99 the amount of time as calculations without --cpu-bind.

On average, calculations on shared nodes with --cpu-bind=cores run in 1.01 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

## Benchmark 1 on Swift
Benchmark 1 is a system of 16 atoms (Cu<sub>4</sub>In<sub>4</sub>Se<sub>8</sub>). Because it is smalled than Benchmark 2, Benchmark 1 was used to explore kpoints scaling as well as changes in performance due to the KPAR and NPAR tags.

In [140]:
data_swift_1 = data_swift[data_swift["Benchmark Code"] == "1"]

data_swift_1_4x4x2 = data_swift_1[data_swift_1["kpoints"] == "4x4x2"]
data_swift_1_10x10x5 = data_swift_1[data_swift_1["kpoints"] == "10x10x5"]

<IPython.core.display.Javascript object>

### KPOINTS Scaling on Swift

To get a good idea of scaling between calculations with 4x4x2 kpoints grids and 10x10x5 kpoints grids, look at the graphs in the following section. Each graph is followed by an average kpoints scaling for the given KPAR/NPAR configuration, which gives the average amount of time needed to run a 4x4x2 calculation expressed as a percentage of the time needed to run a 10x10x5 calculation on the same number of cores. The averages are compiled in tables at the end of the next section to compare kpoints scaling across KPAR/NPAR combinations. 

### Changing KPAR and NPAR on Swift

KPAR and NPAR are both tags that determine the parallelization scheme for VASP. KPAR determines the number of groups across which kpoint calculations are divided, and one kpoint from each group is worked on by the machine at a time. NPAR determines the number of bands that are treated at the same time.

By default, VASP sets KPAR = 1 and NPAR = # of cores. This is represented in the first graph and is used as a baseline to determine runtime improvements from using other combinations of KPAR and NPAR. VASP ESIF Benchmark 1 is conigured to be run with KPAR = 9 and NPAR = 4, and in this analysis, we consider combinations of KPAR = 1,4,9 and NPAR = 1, # of cores, sqrt(# of cores).

The graphs in this section show the performance of each KPAR/NPAR combination at different node counts. Each graph is followed by values that give the average runtime for the given KPAR/NPAP configuration expressed as a percentage of the default KPAR/NPAR configuration (KPAR=1, NPAR=# of cores).

kpoints scaling for the given KPAR/NPAR configuration, which gives the average amount of time needed to run a 4x4x2 calculation expressed as a percentage of the time needed to run a 10x10x5 calculation on the same number of cores. The averages are compiled in tables at the end of the next section to compare kpoints scaling across KPAR/NPAR combinations.

In [141]:
data_swift_1_kpar = data_swift_1[data_swift_1["node_fill"] == "full"]

data_swift_1_K1 = data_swift_1_kpar[data_swift_1_kpar["KPAR"] == 1]
data_swift_1_K4 = data_swift_1_kpar[data_swift_1_kpar["KPAR"] == 4]
data_swift_1_K8 = data_swift_1_kpar[data_swift_1_kpar["KPAR"] == 8]
data_swift_1_K9 = data_swift_1_kpar[data_swift_1_kpar["KPAR"] == 9]

data_swift_1_K1_N4 = data_swift_1_K1[data_swift_1_K1["NPAR"] == "4"]
data_swift_1_K1_ND = data_swift_1_K1[data_swift_1_K1["NPAR"] == "D"]
data_swift_1_K1_Nsqrt = data_swift_1_K1[data_swift_1_K1["NPAR"] == "sqrt"]
data_swift_1_K4_ND = data_swift_1_K4[data_swift_1_K4["NPAR"] == "D"]
data_swift_1_K8_N4 = data_swift_1_K8[data_swift_1_K8["NPAR"] == "4"]
data_swift_1_K9_N4 = data_swift_1_K9[data_swift_1_K9["NPAR"] == "4"]
data_swift_1_K4_Nsqrt = data_swift_1_K4[data_swift_1_K4["NPAR"] == "sqrt"]
data_swift_1_K4_N4 = data_swift_1_K4[data_swift_1_K4["NPAR"] == "4"]
data_swift_1_K9_Nsqrt = data_swift_1_K9[data_swift_1_K9["NPAR"] == "sqrt"]

<IPython.core.display.Javascript object>

In [142]:
data_swift_1_K1_ND_intel = data_swift_1_K1_ND[data_swift_1_K1_ND["MPI"] == "intelmpi"]

fig = px.scatter(
    data_swift_1_K1_ND_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Nodes",
        "Partition",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=1, NPAR=# of cores with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [143]:
data_swift_1_K1_ND_intel = data_swift_1_K1_ND[data_swift_1_K1_ND["MPI"] == "intelmpi"]

proportion_swift_1_K1_ND_intel = get_scaling(
    data_swift_1_K1_ND_intel, "kpoints", "4x4x2", "10x10x5"
)
proportion_swift_1_K1_ND_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K1_ND_intel, data_swift_1_K1_ND_intel, "4x4x2"
)
proportion_swift_1_K1_ND_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K1_ND_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K1_ND_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2020 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [144]:
data_swift_1_K1_ND_open = data_swift_1_K1_ND[data_swift_1_K1_ND["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_1_K1_ND_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Nodes",
        "Partition",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=1, NPAR=# of cores with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [145]:
data_swift_1_K1_ND_open = data_swift_1_K1_ND[data_swift_1_K1_ND["MPI"] == "openmpi"]

proportion_swift_1_K1_ND_open = get_scaling(
    data_swift_1_K1_ND_open, "kpoints", "4x4x2", "10x10x5"
)
proportion_swift_1_K1_ND_open_4x4x2 = get_scaling_KN(
    data_swift_1_K1_ND_open, data_swift_1_K1_ND_open, "4x4x2"
)
proportion_swift_1_K1_ND_open_10x10x5 = get_scaling_KN(
    data_swift_1_K1_ND_open, data_swift_1_K1_ND_open, "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K1_ND_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.1893 the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.


<IPython.core.display.Javascript object>

In [146]:
data_swift_1_K1_N4_intel = data_swift_1_K1_N4[data_swift_1_K1_N4["MPI"] == "intelmpi"]

fig = px.scatter(
    data_swift_1_K1_N4_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=1, NPAR=4 with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [147]:
proportion_swift_1_K1_N4_intel = get_scaling(
    data_swift_1_K1_N4_intel, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K1_N4_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2013 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [148]:
proportion_swift_1_K1_N4_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K1_N4_intel, data_swift_1_K1_ND_intel, "4x4x2"
)
proportion_swift_1_K1_N4_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K1_N4_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 {proportion_swift_1_K1_N4_intel_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 {proportion_swift_1_K1_N4_intel_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 0.2822 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 0.3203 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [149]:
data_swift_1_K1_N4_open = data_swift_1_K1_N4[data_swift_1_K1_N4["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_1_K1_N4_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=1, NPAR=4 with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [150]:
proportion_swift_1_K1_N4_open = get_scaling(
    data_swift_1_K1_N4_open, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K1_N4_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.1984 the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.


<IPython.core.display.Javascript object>

In [151]:
proportion_swift_1_K1_N4_open_4x4x2 = get_scaling_KN(
    data_swift_1_K1_N4_open, data_swift_1_K1_ND_open, "4x4x2"
)
proportion_swift_1_K1_N4_open_10x10x5 = get_scaling_KN(
    data_swift_1_K1_N4_open, data_swift_1_K1_ND_open, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 {proportion_swift_1_K1_N4_open_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 {proportion_swift_1_K1_N4_open_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 0.4405 the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 0.4250 the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.


<IPython.core.display.Javascript object>

In [152]:
data_swift_1_K1_Nsqrt_intel = data_swift_1_K1_Nsqrt[
    data_swift_1_K1_Nsqrt["MPI"] == "intelmpi"
]

fig = px.scatter(
    data_swift_1_K1_Nsqrt_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=1, NPAR=sqrt with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [153]:
proportion_swift_1_K1_Nsqrt_intel = get_scaling(
    data_swift_1_K1_Nsqrt_intel, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K1_Nsqrt_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations",
)

On average, 4x4x2 kpoints grid calculations run in 0.2009 the amount of time as the 10x10x5 kpoints grid calculations


<IPython.core.display.Javascript object>

In [154]:
proportion_swift_1_K1_Nsqrt_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K1_Nsqrt_intel, data_swift_1_K1_ND_intel, "4x4x2"
)
proportion_swift_1_K1_Nsqrt_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K1_Nsqrt_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) {proportion_swift_1_K1_Nsqrt_intel_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) {proportion_swift_1_K1_Nsqrt_intel_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) 0.8569 the amount of time needed for calculations with the default KPAR/NPAR settings.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) 1.0067 the amount of time needed for calculations with the default KPAR/NPAR settings.


<IPython.core.display.Javascript object>

In [155]:
data_swift_1_K1_Nsqrt_open = data_swift_1_K1_Nsqrt[
    data_swift_1_K1_Nsqrt["MPI"] == "openmpi"
]

fig = px.scatter(
    data_swift_1_K1_Nsqrt_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=1, NPAR=sqrt with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [156]:
proportion_swift_1_K1_Nsqrt_open = get_scaling(
    data_swift_1_K1_Nsqrt_open, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K1_Nsqrt_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations",
)

On average, 4x4x2 kpoints grid calculations run in 0.1921 the amount of time as the 10x10x5 kpoints grid calculations


<IPython.core.display.Javascript object>

In [157]:
proportion_swift_1_K1_Nsqrt_open_4x4x2 = get_scaling_KN(
    data_swift_1_K1_Nsqrt_open, data_swift_1_K1_ND_open, "4x4x2"
)
proportion_swift_1_K1_Nsqrt_open_10x10x5 = get_scaling_KN(
    data_swift_1_K1_Nsqrt_open, data_swift_1_K1_ND_open, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=# of cores {proportion_swift_1_K1_Nsqrt_open_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=sqrt# of cores {proportion_swift_1_K1_Nsqrt_open_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=# of cores 0.8713 the amount of time needed for calculations with the default KPAR/NPAR settings.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=sqrt# of cores 0.8207 the amount of time needed for calculations with the default KPAR/NPAR settings.


<IPython.core.display.Javascript object>

In [158]:
data_swift_1_K9_N4_intel = data_swift_1_K9_N4[data_swift_1_K9_N4["MPI"] == "intelmpi"]

fig = px.scatter(
    data_swift_1_K9_N4_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=9, NPAR=4 with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [159]:
proportion_swift_1_K9_N4_intel = get_scaling(
    data_swift_1_K9_N4_intel, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K9_N4_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations",
)

On average, 4x4x2 kpoints grid calculations run in 0.4100 the amount of time as the 10x10x5 kpoints grid calculations


<IPython.core.display.Javascript object>

In [160]:
proportion_swift_1_K9_N4_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K9_N4_intel, data_swift_1_K1_ND_intel, "4x4x2"
)
proportion_swift_1_K9_N4_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K9_N4_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K9_N4_intel_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K9_N4_intel_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.2384 the amount of time needed for calculations with the default KPAR/NPAR settings.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.0592 the amount of time needed for calculations with the default KPAR/NPAR settings.


<IPython.core.display.Javascript object>

In [161]:
data_swift_1_K9_N4_open = data_swift_1_K9_N4[data_swift_1_K9_N4["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_1_K9_N4_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=9, NPAR=4 with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [162]:
proportion_swift_1_K9_N4_open = get_scaling(
    data_swift_1_K9_N4_open, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K9_N4_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.3866 the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.


<IPython.core.display.Javascript object>

In [163]:
proportion_swift_1_K9_N4_open_4x4x2 = get_scaling_KN(
    data_swift_1_K9_N4_open, data_swift_1_K1_ND_open, "4x4x2"
)
proportion_swift_1_K9_N4_open_10x10x5 = get_scaling_KN(
    data_swift_1_K9_N4_open, data_swift_1_K1_ND_open, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K9_N4_open_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K9_N4_open_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.1485 the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.0915 the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.


<IPython.core.display.Javascript object>

In [164]:
data_swift_1_K8_N4_intel = data_swift_1_K8_N4[data_swift_1_K8_N4["MPI"] == "intelmpi"]

# sort data frame to get correct colors
def get_order(row):
    if row["kpoints"] == "4x4x2":
        return "a"
    if row["kpoints"] == "10x10x5":
        return "b"
    if row["kpoints"] == "8x8x4":
        return "c"


data_swift_1_K8_N4_intel["color_order"] = data_swift_1_K8_N4_intel.apply(
    get_order, axis=1
)
data_swift_1_K8_N4_intel.sort_values(by=["color_order"], inplace=True)

fig = px.scatter(
    data_swift_1_K8_N4_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=8, NPAR=4 with Intel MPI",
)
fig.show()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



<IPython.core.display.Javascript object>

In [166]:
proportion_swift_1_K8_N4_intel = get_scaling(
    data_swift_1_K8_N4_intel, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K8_N4_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations",
)

proportion_swift_1_K8_N4_intel_8x8x4 = get_scaling(
    data_swift_1_K8_N4_intel, "kpoints", "4x4x2", "8x8x4"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K8_N4_intel_8x8x4:.4f} the amount of time as the 8x8x4 kpoints grid calculations",
)

print(f"16/500={16/500}")
print(f"16/256={16/256}")

On average, 4x4x2 kpoints grid calculations run in 0.2892 the amount of time as the 10x10x5 kpoints grid calculations
On average, 4x4x2 kpoints grid calculations run in 0.3809 the amount of time as the 8x8x4 kpoints grid calculations
16/500=0.032
16/256=0.0625


<IPython.core.display.Javascript object>

In [167]:
proportion_swift_1_K8_N4_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K8_N4_intel, data_swift_1_K1_ND_intel, "4x4x2"
)
proportion_swift_1_K8_N4_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K8_N4_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=8, NPAR=4 {proportion_swift_1_K8_N4_intel_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=8, NPAR=4 {proportion_swift_1_K8_N4_intel_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=8, NPAR=4 0.1818 the amount of time needed for calculations with the default KPAR/NPAR settings.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=8, NPAR=4 0.0782 the amount of time needed for calculations with the default KPAR/NPAR settings.


<IPython.core.display.Javascript object>

In [168]:
data_swift_1_K8_N4_open = data_swift_1_K8_N4[data_swift_1_K8_N4["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_1_K8_N4_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=8, NPAR=4 with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [171]:
proportion_swift_1_K8_N4_open = get_scaling(
    data_swift_1_K8_N4_open, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K8_N4_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations",
)

proportion_swift_1_K8_N4_open_8x8x4 = get_scaling(
    data_swift_1_K8_N4_open, "kpoints", "4x4x2", "8x8x4"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K8_N4_open_8x8x4:.4f} the amount of time as the 8x8x4 kpoints grid calculations",
)

print(f"16/500={16/500}")
print(f"16/256={16/256}")

On average, 4x4x2 kpoints grid calculations run in 0.2868 the amount of time as the 10x10x5 kpoints grid calculations
On average, 4x4x2 kpoints grid calculations run in 0.3763 the amount of time as the 8x8x4 kpoints grid calculations
16/500=0.032
16/256=0.0625


<IPython.core.display.Javascript object>

In [172]:
proportion_swift_1_K8_N4_open_4x4x2 = get_scaling_KN(
    data_swift_1_K8_N4_open, data_swift_1_K1_ND_open, "4x4x2"
)
proportion_swift_1_K8_N4_open_10x10x5 = get_scaling_KN(
    data_swift_1_K8_N4_open, data_swift_1_K1_ND_open, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=8, NPAR=4 {proportion_swift_1_K8_N4_open_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=8, NPAR=4 {proportion_swift_1_K8_N4_open_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=8, NPAR=4 0.1198 the amount of time needed for calculations with the default KPAR/NPAR settings.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=8, NPAR=4 0.1256 the amount of time needed for calculations with the default KPAR/NPAR settings.


<IPython.core.display.Javascript object>

In [173]:
data_swift_1_K4_N4_intel = data_swift_1_K4_N4[data_swift_1_K4_N4["MPI"] == "intelmpi"]

fig = px.scatter(
    data_swift_1_K4_N4_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=4, NPAR=4 with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [174]:
proportion_swift_1_K4_N4_intel = get_scaling(
    data_swift_1_K4_N4_intel, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K4_N4_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2185 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [175]:
proportion_swift_1_K4_N4_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K4_N4_intel, data_swift_1_K1_ND_intel, "4x4x2"
)
proportion_swift_1_K4_N4_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K4_N4_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_N4_intel_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_N4_intel_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.1180 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.0927 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [176]:
data_swift_1_K4_N4_open = data_swift_1_K4_N4[data_swift_1_K4_N4["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_1_K4_N4_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=4, NPAR=4 with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [177]:
proportion_swift_1_K4_N4_open = get_scaling(
    data_swift_1_K4_N4_open, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K4_N4_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2205 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [178]:
proportion_swift_1_K4_N4_open_4x4x2 = get_scaling_KN(
    data_swift_1_K4_N4_open, data_swift_1_K1_ND_open, "4x4x2"
)
proportion_swift_1_K4_N4_open_10x10x5 = get_scaling_KN(
    data_swift_1_K4_N4_open, data_swift_1_K1_ND_open, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_N4_open_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_N4_open_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.0876 the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.1018 the amount of time needed for calculations with the default KPAR/NPAR settings with Open MPI.


<IPython.core.display.Javascript object>

In [179]:
data_swift_1_K4_ND_intel = data_swift_1_K4_ND[data_swift_1_K4_ND["MPI"] == "intelmpi"]

fig = px.scatter(
    data_swift_1_K4_ND_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=4, NPAR=# of cores with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [180]:
proportion_swift_1_K4_ND_intel = get_scaling(
    data_swift_1_K4_ND_intel, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K4_ND_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2127 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [181]:
proportion_swift_1_K4_ND_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K4_ND_intel, data_swift_1_K1_ND_intel, "4x4x2"
)
proportion_swift_1_K4_ND_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K4_ND_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_ND_intel_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_ND_intel_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.1470 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.0940 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [182]:
data_swift_1_K4_ND_open = data_swift_1_K4_ND[data_swift_1_K4_ND["MPI"] == "openmpi"]

fig = px.scatter(
    data_swift_1_K4_ND_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=4, NPAR=# of cores with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [183]:
proportion_swift_1_K4_ND_open = get_scaling(
    data_swift_1_K4_ND_open, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K4_ND_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2122 the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.


<IPython.core.display.Javascript object>

In [184]:
proportion_swift_1_K4_ND_open_4x4x2 = get_scaling_KN(
    data_swift_1_K4_ND_open, data_swift_1_K1_ND_open, "4x4x2"
)
proportion_swift_1_K4_ND_open_10x10x5 = get_scaling_KN(
    data_swift_1_K4_ND_open, data_swift_1_K1_ND_open, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_ND_open_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_ND_open_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.0977 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.1136 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [185]:
data_swift_1_K4_Nsqrt_intel = data_swift_1_K4_Nsqrt[
    data_swift_1_K4_Nsqrt["MPI"] == "intelmpi"
]

fig = px.scatter(
    data_swift_1_K4_Nsqrt_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=4, NPAR=sqrt(# of cores) with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [186]:
proportion_swift_1_K4_Nsqrt_intel = get_scaling(
    data_swift_1_K4_Nsqrt_intel, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K4_Nsqrt_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2092 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [187]:
proportion_swift_1_K4_Nsqrt_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K4_Nsqrt_intel, data_swift_1_K1_Nsqrt_intel, "4x4x2"
)
proportion_swift_1_K4_Nsqrt_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K4_Nsqrt_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_Nsqrt_intel_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_Nsqrt_intel_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.2451 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.1379 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [188]:
data_swift_1_K4_Nsqrt_open = data_swift_1_K4_Nsqrt[
    data_swift_1_K4_Nsqrt["MPI"] == "openmpi"
]

fig = px.scatter(
    data_swift_1_K4_Nsqrt_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=4, NPAR= sqrt(# of cores) with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [189]:
proportion_swift_1_K4_Nsqrt_open = get_scaling(
    data_swift_1_K4_Nsqrt_open, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K4_Nsqrt_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2110 the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.


<IPython.core.display.Javascript object>

In [190]:
proportion_swift_1_K4_Nsqrt_open_4x4x2 = get_scaling_KN(
    data_swift_1_K4_Nsqrt_open, data_swift_1_K1_Nsqrt_open, "4x4x2"
)
proportion_swift_1_K4_Nsqrt_open_10x10x5 = get_scaling_KN(
    data_swift_1_K4_Nsqrt_open, data_swift_1_K1_Nsqrt_open, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_ND_open_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K4_ND_open_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.0977 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.1136 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [191]:
data_swift_1_K9_Nsqrt_intel = data_swift_1_K9_Nsqrt[
    data_swift_1_K9_Nsqrt["MPI"] == "intelmpi"
]

fig = px.scatter(
    data_swift_1_K9_Nsqrt_intel,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=9, NPAR=sqrt(# of cores) with Intel MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [192]:
proportion_swift_1_K9_Nsqrt_intel = get_scaling(
    data_swift_1_K9_Nsqrt_intel, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K9_Nsqrt_intel:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.2939 the amount of time as the 10x10x5 kpoints grid calculations with Intel MPI.


<IPython.core.display.Javascript object>

In [193]:
proportion_swift_1_K9_Nsqrt_intel_4x4x2 = get_scaling_KN(
    data_swift_1_K9_Nsqrt_intel, data_swift_1_K1_Nsqrt_intel, "4x4x2"
)
proportion_swift_1_K9_Nsqrt_intel_10x10x5 = get_scaling_KN(
    data_swift_1_K9_Nsqrt_intel, data_swift_1_K1_ND_intel, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K9_Nsqrt_intel_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K9_Nsqrt_intel_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.0336 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.0249 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [194]:
data_swift_1_K9_Nsqrt_open = data_swift_1_K9_Nsqrt[
    data_swift_1_K9_Nsqrt["MPI"] == "openmpi"
]

fig = px.scatter(
    data_swift_1_K9_Nsqrt_open,
    x="Cores",
    y="scaled_rate",
    color="kpoints",
    symbol="HPC System",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: KPAR=9, NPAR= sqrt(# of cores) with Open MPI",
)
fig.show()

<IPython.core.display.Javascript object>

In [195]:
proportion_swift_1_K9_Nsqrt_open = get_scaling(
    data_swift_1_K9_Nsqrt_open, "kpoints", "4x4x2", "10x10x5"
)
print(
    f"On average, 4x4x2 kpoints grid calculations run in {proportion_swift_1_K9_Nsqrt_open:.4f} the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.",
)

On average, 4x4x2 kpoints grid calculations run in 0.3860 the amount of time as the 10x10x5 kpoints grid calculations with Open MPI.


<IPython.core.display.Javascript object>

In [196]:
proportion_swift_1_K9_Nsqrt_open_4x4x2 = get_scaling_KN(
    data_swift_1_K9_Nsqrt_open, data_swift_1_K1_Nsqrt_open, "4x4x2"
)
proportion_swift_1_K9_Nsqrt_open_10x10x5 = get_scaling_KN(
    data_swift_1_K9_Nsqrt_open, data_swift_1_K1_Nsqrt_open, "10x10x5"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K9_Nsqrt_open_4x4x2:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 {proportion_swift_1_K9_Nsqrt_open_10x10x5:.4f} the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.",
)

On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 0.0176 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.

On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 0.0122 the amount of time needed for calculations with the default KPAR/NPAR settings with Intel MPI.


<IPython.core.display.Javascript object>

In [197]:
table_KPAR_NPAR_CPU = [
    [
        " ",
        "KPOINTS Scaling",
        "4x4x2 % of Default Runtime",
        "10x10x5 % of Default Runtime",
    ],
    [
        "KPAR=1, NPAR=# of cores",
        f"{100*proportion_swift_1_K1_ND_open:.2f}%",
        f"{100*proportion_swift_1_K1_ND_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K1_ND_open_10x10x5:.2f}%",
    ],
    [
        "KPAR=1, NPAR=4",
        f"{100*proportion_swift_1_K1_N4_open:.2f}%",
        f"{100*proportion_swift_1_K1_N4_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K1_N4_open_10x10x5:.2f}%",
    ],
    [
        "KPAR=1, NPAR=sqrt",
        f"{100*proportion_swift_1_K1_Nsqrt_open:.2f}%",
        f"{100*proportion_swift_1_K1_Nsqrt_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K1_Nsqrt_open_10x10x5:.2f}%",
    ],
    [
        "KPAR=4, NPAR=# of cores",
        f"{100*proportion_swift_1_K4_ND_open:.2f}%",
        f"{100*proportion_swift_1_K4_ND_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K4_ND_open_10x10x5:.2f}",
    ],
    [
        "KPAR=4, NPAR=sqrt",
        f"{100*proportion_swift_1_K4_Nsqrt_open:.2f}%",
        f"{100*proportion_swift_1_K4_Nsqrt_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K4_Nsqrt_open_10x10x5:.2f}%",
    ],
    [
        "KPAR=4, NPAR=4",
        f"{100*proportion_swift_1_K4_N4_open:.2f}%",
        f"{100*proportion_swift_1_K4_N4_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K4_N4_open_10x10x5:.2f}%",
    ],
    [
        "KPAR=8, NPAR=4",
        f"{100*proportion_swift_1_K8_N4_open:.2f}%",
        f"{100*proportion_swift_1_K8_N4_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K8_N4_open_10x10x5:.2f}%",
    ],
    [
        "KPAR=9, NPAR=4",
        f"{100*proportion_swift_1_K9_N4_open:.2f}%",
        f"{100*proportion_swift_1_K9_N4_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K9_N4_open_10x10x5:.2f}%",
    ],
    [
        "KPAR=9, NPAR=sqrt*",
        f"{100*proportion_swift_1_K9_Nsqrt_open:.2f}%",
        f"{100*proportion_swift_1_K9_Nsqrt_open_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K9_Nsqrt_open_10x10x5:.2f}%",
    ],
]

print("KPOINTS Scaling and KPAR/NPAR Runtimes for Benchmark 1 on CPUs with Open MPI")
print(tabulate(table_KPAR_NPAR_CPU, headers="firstrow", tablefmt="fancy_grid"))
print(
    "The 'KPOINTS Scaling' column gives the average 4x4x2 runtime as a percentage of the 10x10x5 runtime. The '4x4x2 % of Default Runtime' gives the average runtime for the given KPAR/NPAR configuration with a 4x4x2 grid as a percentage of the default (KPAR=1, NPAR=# of cores) runtime with a 4x4x2 grid. The '10x10x5 % of Default Runtime' shows the equivalent for calculations using a 10x10x5 grid."
)
print(
    "\n*doesn't include data for lower node counts, so is artifically skewed down because KPAR=9 performs the best at high node counts but poorly and low node counts."
)

KPOINTS Scaling and KPAR/NPAR Runtimes for Benchmark 1 on CPUs with Open MPI
╒═════════════════════════╤═══════════════════╤══════════════════════════════╤════════════════════════════════╕
│                         │ KPOINTS Scaling   │ 4x4x2 % of Default Runtime   │ 10x10x5 % of Default Runtime   │
╞═════════════════════════╪═══════════════════╪══════════════════════════════╪════════════════════════════════╡
│ KPAR=1, NPAR=# of cores │ 18.93%            │ 100.00%                      │ 100.00%                        │
├─────────────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┤
│ KPAR=1, NPAR=4          │ 19.84%            │ 44.05%                       │ 42.50%                         │
├─────────────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┤
│ KPAR=1, NPAR=sqrt       │ 19.21%            │ 87.13%                       │ 82.07%                         │
├─────────────────────────┼

<IPython.core.display.Javascript object>

In [198]:
table_KPAR_NPAR_CPU = [
    [
        " ",
        "KPOINTS Scaling",
        "4x4x2 % of Default Runtime",
        "10x10x5 % of Default Runtime",
    ],
    [
        "KPAR=1, NPAR=# of cores",
        f"{100*proportion_swift_1_K1_ND_intel:.2f}%",
        f"{100*proportion_swift_1_K1_ND_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K1_ND_intel_10x10x5:.2f}%",
    ],
    [
        "KPAR=1,NPAR=sqrt(# of cores)",
        f"{100*proportion_swift_1_K1_Nsqrt_intel:.2f}%",
        f"{100*proportion_swift_1_K1_Nsqrt_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K1_Nsqrt_intel_10x10x5:.2f}%",
    ],
    [
        "KPAR=1, NPAR=4",
        f"{100*proportion_swift_1_K1_N4_intel:.2f}%",
        f"{100*proportion_swift_1_K1_N4_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K1_N4_intel_10x10x5:.2f}%",
    ],
    [
        "KPAR=4, NPAR=# of cores",
        f"{100*proportion_swift_1_K4_ND_intel:.2f}%",
        f"{100*proportion_swift_1_K4_ND_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K4_ND_intel_10x10x5:.2f}",
    ],
    [
        "KPAR=4,NPAR=sqrt(# of cores)",
        f"{100*proportion_swift_1_K4_Nsqrt_intel:.2f}%",
        f"{100*proportion_swift_1_K4_Nsqrt_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K4_Nsqrt_intel_10x10x5:.2f}%",
    ],
    [
        "KPAR=4, NPAR=4",
        f"{100*proportion_swift_1_K4_N4_intel:.2f}%",
        f"{100*proportion_swift_1_K4_N4_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K4_N4_intel_10x10x5:.2f}%",
    ],
    [
        "KPAR=8, NPAR=4",
        f"{100*proportion_swift_1_K8_N4_intel:.2f}%",
        f"{100*proportion_swift_1_K8_N4_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K8_N4_intel_10x10x5:.2f}%",
    ],
    [
        "KPAR=9, NPAR=4",
        f"{100*proportion_swift_1_K9_N4_intel:.2f}%",
        f"{100*proportion_swift_1_K9_N4_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K9_N4_intel_10x10x5:.2f}%",
    ],
    [
        "KPAR=9,NPAR=sqrt(# of cores)*",
        f"{100*proportion_swift_1_K9_Nsqrt_intel:.2f}%",
        f"{100*proportion_swift_1_K9_Nsqrt_intel_4x4x2:.2f}%",
        f"{100*proportion_swift_1_K9_Nsqrt_intel_10x10x5:.2f}%",
    ],
]

print("KPOINTS Scaling and KPAR/NPAR Runtimes for Benchmark 1 on CPUs with Intel MPI")
print(tabulate(table_KPAR_NPAR_CPU, headers="firstrow", tablefmt="fancy_grid"))
print(
    "The 'KPOINTS Scaling' column gives the average 4x4x2 runtime as a percentage of the 10x10x5 runtime. The '4x4x2 % of Default Runtime' gives the average runtime for the given KPAR/NPAR configuration with a 4x4x2 grid as a percentage of the default (KPAR=1, NPAR=# of cores) runtime with a 4x4x2 grid. The '10x10x5 % of Default Runtime' shows the equivalent for calculations using a 10x10x5 grid."
)
print(
    "\n*doesn't include data for lower node counts, so is artifically skewed down because KPAR=9 performs the best at high node counts but poorly and low node counts."
)

KPOINTS Scaling and KPAR/NPAR Runtimes for Benchmark 1 on CPUs with Intel MPI
╒═══════════════════════════════╤═══════════════════╤══════════════════════════════╤════════════════════════════════╕
│                               │ KPOINTS Scaling   │ 4x4x2 % of Default Runtime   │ 10x10x5 % of Default Runtime   │
╞═══════════════════════════════╪═══════════════════╪══════════════════════════════╪════════════════════════════════╡
│ KPAR=1, NPAR=# of cores       │ 20.20%            │ 100.00%                      │ 100.00%                        │
├───────────────────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┤
│ KPAR=1,NPAR=sqrt(# of cores)  │ 20.09%            │ 85.69%                       │ 100.67%                        │
├───────────────────────────────┼───────────────────┼──────────────────────────────┼────────────────────────────────┤
│ KPAR=1, NPAR=4                │ 20.13%            │ 28.22%                       │ 32.03%     

<IPython.core.display.Javascript object>

Now let's take the best performing combination of KPAR and NPAR to continue the analysis: KPAR=9, NPAR=4. All data going forward is for runs on full nodes (64 CPUs/node).

### MPI on Swift with Benchmark 1
We ran Benchmark 1 using Intel MPI and Open MPI and compared the performance. The two versions were compiled as described at the top of this document and run with the respective MPI. 

In [199]:
data_swift_1_K9_N4_4x4x2 = data_swift_1_K9_N4[data_swift_1_K9_N4["kpoints"] == "4x4x2"]

fig = px.scatter(
    data_swift_1_K9_N4_4x4x2,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: Intel MPI vs. Open MPI with 4x4x2 kpoints grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [200]:
data_swift_1_K9_N4_10x10x5 = data_swift_1_K9_N4[
    data_swift_1_K9_N4["kpoints"] == "10x10x5"
]

fig = px.scatter(
    data_swift_1_K9_N4_10x10x5,
    x="Cores",
    y="scaled_rate",
    color="MPI",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: Intel MPI vs. Open MPI with 10x10x5 kpoints grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [201]:
data_swift_1_K9_N4_4x4x2 = data_swift_1_K9_N4[data_swift_1_K9_N4["kpoints"] == "4x4x2"]
data_swift_1_K9_N4_10x10x5 = data_swift_1_K9_N4[
    data_swift_1_K9_N4["kpoints"] == "10x10x5"
]

proportion_4x4x2 = get_scaling(data_swift_1_K9_N4_4x4x2, "MPI", "intelmpi", "openmpi")
proportion_10x10x5 = get_scaling(
    data_swift_1_K9_N4_10x10x5, "MPI", "intelmpi", "openmpi"
)

print(
    f"On average, using a 4x4x2 kpoints grid, calculations using Intel MPI ran in {proportion_4x4x2:.4f} the amount of time as calculations using OpenMPI",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations using Intel MPI ran in {proportion_10x10x5:.4f} the amount of time as calculations using OpenMPI",
)

On average, using a 4x4x2 kpoints grid, calculations using Intel MPI ran in 0.7689 the amount of time as calculations using OpenMPI

On average, using a 10x10x5 kpoints grid, calculations using Intel MPI ran in 0.7351 the amount of time as calculations using OpenMPI


<IPython.core.display.Javascript object>

### cpu-bind on Swift with Benchmark 1
The --cpu-bind flag changes how tasks are assigned to cores throughout the node. We ran Benchmark 2 on Swift with cpu-bind=cores and cpu-bind=rank to compare the performance to running without setting cpu-bind.

In [202]:
data_swift_1_K9_N4_4x4x2_intel = data_swift_1_K9_N4_4x4x2[
    data_swift_1_K9_N4_4x4x2["MPI"] == "intelmpi"
]

fig = px.scatter(
    data_swift_1_K9_N4_4x4x2_intel,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: cpu-bind for Intel MPI with a 4x4x2 kpoints grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [203]:
data_swift_1_K9_N4_4x4x2_open = data_swift_1_K9_N4_4x4x2[
    data_swift_1_K9_N4_4x4x2["MPI"] == "openmpi"
]

fig = px.scatter(
    data_swift_1_K9_N4_4x4x2_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: cpu-bind for Open MPI with a 4x4x2 kpoints grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [204]:
data_swift_1_K9_N4_10x10x5_intel = data_swift_1_K9_N4_10x10x5[
    data_swift_1_K9_N4_10x10x5["MPI"] == "intelmpi"
]

fig = px.scatter(
    data_swift_1_K9_N4_10x10x5_intel,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: cpu-bind for Intel MPI with a 10x10x5 kpoints grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [205]:
data_swift_1_K9_N4_10x10x5_open = data_swift_1_K9_N4_10x10x5[
    data_swift_1_K9_N4_10x10x5["MPI"] == "openmpi"
]

fig = px.scatter(
    data_swift_1_K9_N4_10x10x5_open,
    x="Cores",
    y="scaled_rate",
    color="cpu-bind",
    symbol="kpoints",
    hover_data=[
        "HPC System",
        "Partition",
        "Nodes",
        "cpu-bind",
        "MPI",
        "Benchmark Code",
    ],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
    title="Benchmark 1 on Swift: cpu-bind for Open MPI with a 10x10x5 kpoints grid",
)
fig.show()

<IPython.core.display.Javascript object>

In [206]:
proportion_4x4x2_intel_rank = get_scaling(
    data_swift_1_K9_N4_4x4x2_intel, "cpu-bind", "rank", "none"
)
proportion_4x4x2_intel_cores = get_scaling(
    data_swift_1_K9_N4_4x4x2_intel, "cpu-bind", "cores", "none"
)
proportion_10x10x5_intel_rank = get_scaling(
    data_swift_1_K9_N4_10x10x5_intel, "cpu-bind", "rank", "none"
)
proportion_10x10x5_intel_cores = get_scaling(
    data_swift_1_K9_N4_10x10x5_intel, "cpu-bind", "cores", "none"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank run in {proportion_4x4x2_intel_rank:.4f} the amount of time as calculations without --cpu-bind.",
)
print(
    f"\nOn average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores run in {proportion_4x4x2_intel_cores:.4f} the amount of time as calculations without --cpu-bind.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank run in {proportion_10x10x5_intel_rank:.4f} the amount of time as calculations without --cpu-bind.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores run in {proportion_10x10x5_intel_cores:.4f} the amount of time as calculations without --cpu-bind.",
)

On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank run in 0.9733 the amount of time as calculations without --cpu-bind.

On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores run in 1.0282 the amount of time as calculations without --cpu-bind.

On average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank run in 1.0039 the amount of time as calculations without --cpu-bind.

On average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores run in 1.0208 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [207]:
proportion_4x4x2_open_rank = get_scaling(
    data_swift_1_K9_N4_4x4x2_open, "cpu-bind", "rank", "none"
)
proportion_4x4x2_open_cores = get_scaling(
    data_swift_1_K9_N4_4x4x2_open, "cpu-bind", "cores", "none"
)
proportion_10x10x5_open_rank = get_scaling(
    data_swift_1_K9_N4_10x10x5_open, "cpu-bind", "rank", "none"
)
proportion_10x10x5_open_cores = get_scaling(
    data_swift_1_K9_N4_10x10x5_open, "cpu-bind", "cores", "none"
)
print(
    f"On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank run in {proportion_4x4x2_open_rank:.4f} the amount of time as calculations without --cpu-bind.",
)
print(
    f"\nOn average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores run in {proportion_4x4x2_open_cores:.4f} the amount of time as calculations without --cpu-bind.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank run in {proportion_10x10x5_open_rank:.4f} the amount of time as calculations without --cpu-bind.",
)
print(
    f"\nOn average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores run in {proportion_10x10x5_open_cores:.4f} the amount of time as calculations without --cpu-bind.",
)

On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank run in 0.9813 the amount of time as calculations without --cpu-bind.

On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores run in 1.0174 the amount of time as calculations without --cpu-bind.

On average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank run in 0.9287 the amount of time as calculations without --cpu-bind.

On average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores run in 0.9807 the amount of time as calculations without --cpu-bind.


<IPython.core.display.Javascript object>

In [177]:
os.system("jupyter nbconvert --execute --to html VASP_Recommendations.ipynb")

65280

<IPython.core.display.Javascript object>