# Results

Here, we demonstrate the new version of `qhist` on a test dataset. First, we remove any data and `qhist` installation left over from previous runs of this notebook. We also unset any existing `qhist` configuration so that a system-wide setup cannot cause side effects.

In [1]:
# First, define the root path for our experiment directory
exp_root=$(pwd)

# Remove any existing files from prior notebook runs
rm -rf sample_logs local qhist
mkdir sample_logs

# Let's also make sure a system config does not interfere with our tests
unset QHIST_SERVER_CONFIG

Now, we can create the data:

In [2]:
# Create two "days" of data
cat > sample_logs/20250331 << "EOF"
03/31/2025 10:59:25;R;4215033.casper-pbs;user=vanderwb group=csgteam account="SCSG0001" project=_pbs_project_default jobname=STDIN queue=htc ctime=1743440355 qtime=1743440355 etime=1743440355 start=1743440360 exec_host=crhtc65/34 exec_vnode=(crhtc65:ncpus=1:mem=1kb) Resource_List.mem=30b Resource_List.mps=0 Resource_List.ncpus=1 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mem=30:ompthreads=1 Resource_List.walltime=06:00:00 session=0 end=1743440365 Exit_status=-3 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0b resources_used.ncpus=1 resources_used.vmem=0kb resources_used.walltime=00:00:00 eligible_time=00:00:08 run_count=1
03/31/2025 10:59:35;E;4215033.casper-pbs;user=vanderwb group=csgteam account="SCSG0001" project=_pbs_project_default jobname=STDIN queue=htc ctime=1743440355 qtime=1743440355 etime=1743440355 start=1743440371 exec_host=crhtc65/34 exec_vnode=(crhtc65:ncpus=1:mem=1kb) Resource_List.mem=30b Resource_List.mps=0 Resource_List.ncpus=1 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mem=30:ompthreads=1 Resource_List.walltime=06:00:00 session=0 end=1743440375 Exit_status=-1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0b resources_used.ncpus=1 resources_used.vmem=0kb resources_used.walltime=00:00:00 eligible_time=00:00:08 run_count=2
03/31/2025 11:03:12;E;4215034.casper-pbs;user=vanderwb group=csgteam account="SCSG0001" project=_pbs_project_default jobname=STDIN queue=htc ctime=1743440363 qtime=1743440363 etime=1743440363 start=1743440374 exec_host=crhtc86/12 exec_vnode=(crhtc86:ncpus=1:mem=31457280kb) Resource_List.mem=30gb Resource_List.mps=0 Resource_List.ncpus=1 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mem=30GB:ompthreads=1 Resource_List.walltime=06:00:00 session=108694 end=1743440592 Exit_status=0 resources_used.cpupercent=2 resources_used.cput=00:00:09 resources_used.mem=756960kb resources_used.ncpus=1 resources_used.vmem=8653400kb resources_used.walltime=00:03:36 eligible_time=00:00:13 run_count=1
03/31/2025 11:34:35;E;4215265.casper-pbs;user=bneuman group=ncar account="SCSG0001" project=_pbs_project_default jobname=bneuman_matlab queue=htc ctime=1743442446 qtime=1743442446 etime=1743442446 start=1743442452 exec_host=crhtc72/0*5 exec_vnode=(crhtc72:ncpus=5:mem=10485760kb) Resource_List.mem=10gb Resource_List.mps=0 Resource_List.ncpus=5 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=5:ompthreads=5 Resource_List.walltime=00:25:00 session=95643 end=1743442475 Exit_status=0 resources_used.cpupercent=156 resources_used.cput=00:00:30 resources_used.mem=3163632kb resources_used.ncpus=5 resources_used.vmem=23445720kb resources_used.walltime=00:00:19 eligible_time=00:00:08 run_count=1
EOF

cat > sample_logs/20250401 << "EOF"
04/01/2025 13:07:35;E;4220853.casper-pbs;user=negins group=ncar account="P93300606" project=_pbs_project_default jobname=cr-jhub-batch-stable queue=htc ctime=1743527223 qtime=1743527223 etime=1743527223 start=1743527229 exec_host=crhtc65/20 exec_vnode=(crhtc65:ncpus=1:mem=4194304kb) Resource_List.mem=4gb Resource_List.mpiprocs=1 Resource_List.mps=0 Resource_List.ncpus=1 Resource_List.ngpus=0 Resource_List.nodect=1 Resource_List.nvpus=0 Resource_List.place=scatter Resource_List.select=1:ncpus=1:mpiprocs=1:ompthreads=1:mem=4GB Resource_List.walltime=02:00:00 session=104352 end=1743534455 Exit_status=-29 resources_used.cpupercent=8 resources_used.cput=00:03:52 resources_used.mem=261056kb resources_used.ncpus=1 resources_used.vmem=667248kb resources_used.walltime=02:00:22 eligible_time=00:00:09 run_count=1
EOF

## Installing `qhist` with the `Makefile`

Since we are running `qhist` as a command-line utility within a *bash* kernel, we will use the `Makefile` approach to install the latest version of `qhist`.

In [3]:
# First, let's clone the qhist repository
git clone --depth 1 --branch v1.0 https://github.com/NCAR/qhist.git

# Now we install into a specified prefix, which will also install pbsparse
cd qhist
make install PREFIX="$exp_root/local"
cd "$exp_root"

Cloning into 'qhist'...
remote: Enumerating objects: 27, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 27 (delta 0), reused 22 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (27/27), 14.29 KiB | 975.00 KiB/s, done.
Note: switching to 'b93fef768c43812bf622621dc1f8548a27f043f2'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

git submodule init
Submodule 'lib/pbsparse' (git@github.com:NCAR/pbsparse.git) registered for

### Configuring qhist

`qhist` relies on a configuration file, which will tell the tool where to find accounting logs. There are multiple ways to specify this config file:

1. Set the environment variable `QHIST_SERVER_CONFIG` to the path of your configuration file.
2. Put your configuration into `server.json` within the `cfg` subdirectory of your `qhist` installation.
3. Create a configuration file at `/etc/qhist/server.json`.

For this demonstration, we will use the second approach.

In [4]:
cat > "$exp_root/local/lib/qhist/qhist/cfg/server.json" << EOF
{
    "pbs_log_path"      : "$exp_root/sample_logs"
}
EOF
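
To confirm which file `qhist` would pick up, the three locations listed above can be probed in turn. The sketch below is only illustrative: `find_qhist_config` is a hypothetical helper, and checking the locations in the listed order is an assumption, since the actual precedence is not stated here.

```shell
# Hypothetical helper: print the first qhist config file that exists.
# ASSUMPTION: candidates are checked in the order the options are listed above;
# the real precedence inside qhist may differ.
find_qhist_config() {
    local install_dir=$1    # e.g. "$exp_root/local/lib/qhist/qhist"
    local candidate
    for candidate in \
        "${QHIST_SERVER_CONFIG:-}" \
        "$install_dir/cfg/server.json" \
        "/etc/qhist/server.json"
    do
        # Skip the environment variable slot when it is unset or empty
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

find_qhist_config "${exp_root:-$PWD}/local/lib/qhist/qhist" || echo "no qhist config found"
```

In this notebook, the call should print the `server.json` path written in the previous cell, because `QHIST_SERVER_CONFIG` was unset at the start.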

In [5]:
# Finally, let's add qhist to our PATH
export PATH="$exp_root/local/bin:$PATH"

We should now be able to use qhist!

## Basic queries

Because our sample data is not from the current date, we need to specify the date explicitly using the `-p/--period` argument to `qhist`.

In [6]:
qhist --period 20250401

Job ID       User       Queue    Nodes  NCPUs NGPUs     End  Mem(GB)   CPU(%)  Elap(h)
------------ ---------- -------- ----- ------ ----- ------- -------- -------- --------
4220853      negins     htc          1      1     0 01-1307     0.25     8.00     2.01
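
The `--period` value matches the name of the daily log file created earlier (`sample_logs/20250401`). As a rough sketch of how a `START-END` period could map onto daily file names, the hypothetical helper `expand_period` below walks the dates with GNU `date`. The one-file-per-day `YYYYMMDD` naming is taken from the sample data; the expansion logic is an assumption, not `qhist`'s actual implementation.

```shell
# Hypothetical helper: list the daily log names covered by a period.
# Accepts either a single day (YYYYMMDD) or a range (YYYYMMDD-YYYYMMDD).
# ASSUMPTION: one log file per day, named YYYYMMDD as in sample_logs/.
# Requires GNU date for the -d option.
expand_period() {
    local start=${1%-*}
    local end=${1#*-}
    local day=$start
    while [ "$day" -le "$end" ]; do
        printf '%s\n' "$day"
        day=$(date -u -d "$day +1 day" +%Y%m%d)
    done
}

expand_period 20250331-20250401
```

For the two sample days, this prints `20250331` and `20250401`, one per line.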


By default, `qhist` prints a normal-width table and displays only jobs that have ended (`E` records). We can change this behavior via command-line arguments. For example, suppose we want to display `R` ("requeue") records in long-form list format.

Let's run this over both days to ensure we capture the requeue record.

In [7]:
qhist -p 20250331-20250401 --event R --list

4215033.casper-pbs
   User          = vanderwb
   Queue         = htc
   Job Submit    = 2025-03-31 10:59:15
   Eligible Time = 2025-03-31 10:59:15
   Job Start     = 2025-03-31 10:59:20
   Job End       = 2025-03-31 10:59:25
   Used Mem(GB)  = 0
   Avg CPU (%)   = 0.00
   Waittime (h)  = 0.00
   Walltime (h)  = 6.00
   Elapsed (h)   = 0.00
   Job Name      = STDIN
   Exit Status   = -3
   Account       = SCSG0001
   Resources     = 1:ncpus=1:mem=30:ompthreads=1
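
The timestamps in this listing correspond to the epoch-second fields of the raw record (for example, `ctime=1743440355`). A minimal sketch of that conversion follows, assuming GNU `date`. The helper name `epoch_to_local` and the US Mountain time rule in `TZ` are assumptions, chosen because they reproduce the times shown above.

```shell
# Hypothetical helper: convert a PBS epoch timestamp to local time.
# ASSUMPTION: the POSIX TZ rule below (US Mountain time with DST) matches the
# time zone of the system that produced the output above. Requires GNU date.
epoch_to_local() {
    TZ='MST7MDT,M3.2.0,M11.1.0' date -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_local 1743440355   # ctime from the R record
epoch_to_local 1743440360   # start from the R record
epoch_to_local 1743440365   # end from the R record
```

The three calls print `2025-03-31 10:59:15`, `2025-03-31 10:59:20`, and `2025-03-31 10:59:25`, which agree with the Job Submit, Job Start, and Job End lines.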



## Advanced queries

We can also ask `qhist` to compute averages for numerical fields. This is most useful when we specify a custom format containing the fields of interest. Most of the available fields, which can be referenced by convenient short names, are listed by `--format=help`:

In [8]:
qhist --format=help


This option allows you to specify a custom format. This setting's behavior
depends on which mode you are using:

For default and wide behavior, enter a string containing Python's format syntax
(modern version). For list and csv modes, a comma-delimited string with field
names is the expected input.

Examples:
    qhist --format="{id:9.9} {account:9.9} {reqmem:8.2f} {memory:8.2f}"
    qhist --list --format="account,reqmem,memory"

The following variables are available:

    account
    avgcpu
    cputype
    elapsed
    eligible
    end
    gputype
    memory
    mpiprocs
    name
    nodelist
    numcpus
    numgpus
    numnodes
    ompthreads
    placement
    queue
    reqmem
    resources
    start
    status
    submit
    user
    walltime



Now let's specify our custom format and compute averages. We can also filter jobs by `user` and `name` to get only **STDIN** (interactive) jobs by **vanderwb**. Once we layer on options, we see how `qhist` makes querying the PBS Pro accounting logs much easier than with the raw data.

In [9]:
my_format="{numcpus:5d} {memory:8.2f} {reqmem:8.2f} {elapsed:8.2f}"

In [10]:
qhist -p 20250331 --user vanderwb --name STDIN --format "$my_format" --average

NCPUs Mem(GB)  RMem(GB) Elap(h) 
----- -------- -------- --------
    1     0.00     0.00     0.00
    1     0.72    30.00     0.06

Averages across 2 jobs:

NCPUs Mem(GB)  RMem(GB) Elap(h) 
----- -------- -------- --------
 1.00     0.36    15.00     0.03
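
These numbers can be cross-checked against the raw records. The sketch below is not how `qhist` computes them; it is a hypothetical `awk` helper (`summarize_jobs`) that assumes 1 GB = 1024^3 bytes and takes elapsed time from `resources_used.walltime`, both inferred from the table above. The `printf` widths mirror the `{numcpus:5d} {memory:8.2f} ...` format string.

```shell
# Hypothetical helper: read "ncpus used_mem_kb requested_mem_bytes walltime_s"
# rows on stdin, print one formatted line per job, then the column averages.
# ASSUMPTION: 1 GB = 1024^3 bytes, inferred from the qhist table above.
summarize_jobs() {
    awk '{
        mem = $2 / (1024 * 1024)            # kb      -> GB
        req = $3 / (1024 * 1024 * 1024)     # bytes   -> GB
        hrs = $4 / 3600                     # seconds -> hours
        printf "%5d %8.2f %8.2f %8.2f\n", $1, mem, req, hrs
        cpus += $1; mem_sum += mem; req_sum += req; hrs_sum += hrs
    }
    END {
        if (NR > 0)
            printf "%5.2f %8.2f %8.2f %8.2f\n", cpus / NR, mem_sum / NR, req_sum / NR, hrs_sum / NR
    }'
}

# Values from the two vanderwb "E" records in sample_logs/20250331
# (30gb written out as 30 * 1024^3 bytes; 00:03:36 as 216 seconds).
printf '%s\n' '1 0 30 0' '1 756960 32212254720 216' | summarize_jobs
```

The last line printed is the average row, which should match the `qhist` output above.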


We can use free-form filtering to perform more complex searches. In the following example, we include both end and requeue records and then search for all jobs using more than 1 CPU core.

In [11]:
qhist -p 20250331-20250401 -e ER --filter "numcpus>1"

Job ID       User       Queue    Nodes  NCPUs NGPUs     End  Mem(GB)   CPU(%)  Elap(h)
------------ ---------- -------- ----- ------ ----- ------- -------- -------- --------
4215265      bneuman    htc          1      5     0 31-1134     3.02    31.20     0.01
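
For comparison, here is a rough equivalent of the same search run directly against the raw logs. It is only a sketch: `raw_multicore_jobs` is a hypothetical helper, and it assumes that `numcpus` corresponds to the `Resource_List.ncpus` attribute of each record.

```shell
# Hypothetical raw-log counterpart of: qhist -e ER --filter "numcpus>1"
# Prints "jobid user ncpus" for each matching record in the given log files.
# ASSUMPTION: numcpus maps to the Resource_List.ncpus attribute.
raw_multicore_jobs() {
    awk -F';' '
        $2 == "E" || $2 == "R" {
            user = ""; ncpus = 0
            # Field 4 holds the space-delimited key=value attributes
            n = split($4, attrs, " ")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^user=/)
                    user = substr(attrs[i], 6)
                if (attrs[i] ~ /^Resource_List\.ncpus=/)
                    ncpus = substr(attrs[i], 21) + 0
            }
            if (ncpus > 1)
                print $3, user, ncpus
        }' "$@"
}

# Only run against the sample data when it is present
if [ -d sample_logs ]; then
    raw_multicore_jobs sample_logs/20250331 sample_logs/20250401
fi
```

With the sample data, this should report only job `4215265.casper-pbs`, without any of the unit conversion or formatting that `qhist` adds.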


## Other output modes

Finally, let's demonstrate other output modes. We have already seen the `-l/--list` output; we can also display jobs in two other modes - **csv** and **json**.

Here, we examine just a single job: `4220853.casper-pbs`.

In [12]:
# We also disable the data label header, to facilitate machine readability
qhist -p 20250401 4220853.casper-pbs --csv --noheader

4220853.casper-pbs,negins,htc,2025-04-01 11:07:03,2025-04-01 11:07:03,2025-04-01 11:07:09,2025-04-01 13:07:35,0.24896240234375,8.0,0.0016666666666666668,2.0,2.006111111111111,cr-jhub-batch-stable,-29,P93300606,1:ncpus=1:mpiprocs=1:ompthreads=1:mem=4GB


Although this output resembles the raw log, all fields are now consistent across records and comma-separated (*whereas the raw records are semicolon- or space-delimited, depending on the record element*).
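
A short sketch of what this difference means for downstream tools: the CSV record needs one `cut` call per field, while the raw record needs a semicolon split followed by a space split. The column positions used here are an assumption inferred from the field order of the `--list` output, and the raw record is trimmed to a few attributes for brevity.

```shell
# One CSV record as printed by qhist above: a single delimiter for every field.
csv='4220853.casper-pbs,negins,htc,2025-04-01 11:07:03,2025-04-01 11:07:03,2025-04-01 11:07:09,2025-04-01 13:07:35,0.24896240234375,8.0,0.0016666666666666668,2.0,2.006111111111111,cr-jhub-batch-stable,-29,P93300606,1:ncpus=1:mpiprocs=1:ompthreads=1:mem=4GB'

# ASSUMPTION: column 2 is the user and column 12 the elapsed hours, inferred
# from the field order of the --list output (the header was disabled here).
csv_user=$(printf '%s\n' "$csv" | cut -d, -f2)
csv_elapsed=$(printf '%s\n' "$csv" | cut -d, -f12)

# The same job in the raw log (attributes trimmed for brevity): semicolons
# separate the leading fields, spaces separate the key=value attributes.
raw='04/01/2025 13:07:35;E;4220853.casper-pbs;user=negins group=ncar queue=htc resources_used.walltime=02:00:22'
raw_user=$(printf '%s\n' "$raw" | cut -d';' -f4 | tr ' ' '\n' | sed -n 's/^user=//p')

echo "csv: user=$csv_user elapsed=$csv_elapsed h"
echo "raw: user=$raw_user"
```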

In [13]:
qhist -p 20250401 4220853.casper-pbs --json

{
    "4220853.casper-pbs": {
        "time": "2025-04-01 13:07:35",
        "type": "E",
        "short_id": "4220853",
        "user": "negins",
        "group": "ncar",
        "account": "P93300606",
        "project": "_pbs_project_default",
        "jobname": "cr-jhub-batch-stable",
        "queue": "htc",
        "ctime": "2025-04-01 11:07:03",
        "qtime": "1743527223",
        "etime": "2025-04-01 11:07:03",
        "start": "2025-04-01 11:07:09",
        "exec_host": "crhtc65/20",
        "exec_vnode": "(crhtc65:ncpus=1:mem=4194304kb)",
        "Resource_List": {
            "mem": 4.0,
            "mpiprocs": "1",
            "mps": "0",
            "ncpus": 1,
            "ngpus": 0,
            "nodect": 1,
            "nvpus": "0",
            "place": "scatter",
            "select": "1:ncpus=1:mpiprocs=1:ompthreads=1:mem=4GB",
            "walltime": 2.0
        },
        "session": "104352",
        "end": "2025-04-01 13:07:35",
        "Exit_status": "-29",
     

Although we show only a single job in these two modes for readability, they support multiple-job queries, as do all other output modes.