
Add bitwise comparison step for FLUXNET test suite #76

Merged
SeanBryan51 merged 10 commits into master from 32-add-regression-test-to-benchcab
Jun 8, 2023
Conversation

@SeanBryan51 (Collaborator) commented May 10, 2023

This change adds the ability for benchcab to run bitwise comparisons between NetCDF output files using the nccmp command (https://gitlab.com/remikz/nccmp). Comparisons are made between outputs that differ in their realisation and match in all other configurations (science configuration and meteorological forcing). On failure, standard output from comparison tasks is written to the runs/site/analysis/bitwise-comparisons directory.

Since multiple realisations can be specified, comparisons are made between all pairwise combinations of realisations.
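
The pairwise enumeration described above can be sketched with `itertools.combinations`; the realisation labels here are hypothetical stand-ins, not names benchcab itself uses:

```python
from itertools import combinations

# Hypothetical realisation labels; benchcab derives its realisations
# from the repository branches listed in the config file.
realisations = ["trunk", "my-branch", "another-branch"]

# Every unordered pair of realisations is compared exactly once,
# giving n * (n - 1) / 2 comparison tasks for n realisations.
pairs = list(combinations(realisations, 2))

for a, b in pairs:
    print(f"compare {a} <-> {b}")
```

For three realisations this yields three comparison tasks; adding a fourth realisation would raise the count to six.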

This change removes the --no-submit optional argument from benchcab fluxnet-run-tasks and instead submits a PBS job only when running benchcab run or benchcab fluxnet. We do this so that the bitwise comparison step can run in the same job script used to run CABLE.

The comparison step can be run in isolation by executing benchcab fluxnet-bitwise-cmp. However, this should ideally be executed on a compute node (inside a PBS job, for example).

This change also refactors the parallelisation scheme used for running CABLE tasks and comparison tasks so that workers fetch tasks from a multiprocessing.Queue object. Previously, a process that had completed its allocated CABLE tasks would remain idle until all other processes had completed theirs. This change prevents processes from idling while tasks are still pending.
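
A minimal sketch of this queue-based worker pattern (not benchcab's actual implementation; the squaring stands in for running a CABLE or comparison task, and sentinel values signal shutdown):

```python
import multiprocessing as mp


def worker(tasks: "mp.Queue", results: "mp.Queue") -> None:
    # Keep pulling tasks until the sentinel arrives, so a fast worker
    # picks up remaining work instead of idling while others finish.
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no work left
            break
        results.put(task * task)  # stand-in for running a task


def run_all(items, nworkers=4):
    tasks, results = mp.Queue(), mp.Queue()
    items = list(items)
    for item in items:
        tasks.put(item)
    for _ in range(nworkers):
        tasks.put(None)  # one sentinel per worker
    procs = [
        mp.Process(target=worker, args=(tasks, results))
        for _ in range(nworkers)
    ]
    for p in procs:
        p.start()
    # Drain results before joining to avoid blocking on a full pipe.
    out = [results.get() for _ in items]
    for p in procs:
        p.join()
    return sorted(out)


if __name__ == "__main__":
    print(run_all(range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Compared with statically splitting the task list into per-process chunks, the shared queue naturally load-balances: whichever worker finishes first simply takes the next task.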

The comparison step can be skipped by specifying --skip fluxnet-bitwise-cmp to composite commands such as benchcab run and benchcab fluxnet.
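
A sketch of how such a `--skip` option might be wired up with argparse; the flag and step name come from the PR, but the implementation details here are assumptions, not benchcab's actual CLI code:

```python
import argparse

parser = argparse.ArgumentParser(prog="benchcab")
subparsers = parser.add_subparsers(dest="command")

run_parser = subparsers.add_parser("run")
# --skip may be given multiple times; each value names an optional step.
run_parser.add_argument(
    "--skip",
    action="append",
    default=[],
    choices=["fluxnet-bitwise-cmp"],
    help="skip an optional step in the workflow",
)

args = parser.parse_args(["run", "--skip", "fluxnet-bitwise-cmp"])
if "fluxnet-bitwise-cmp" in args.skip:
    print("skipping bitwise comparisons")
else:
    print("running bitwise comparisons")
```

Using `action="append"` keeps the door open for skipping several optional steps in one invocation, each with its own `--skip <cmd>` occurrence.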

Fixes #32
Fixes #77

@SeanBryan51 SeanBryan51 linked an issue May 10, 2023 that may be closed by this pull request
@SeanBryan51 SeanBryan51 force-pushed the 32-add-regression-test-to-benchcab branch from c7a7876 to 7e39302 on May 10, 2023 02:23
@SeanBryan51 (Collaborator, Author) commented May 10, 2023

TODO

  • measure the time it takes to run benchcab fluxnet-run-bitwise-cmp in a PBS job for the five-site and forty-two-site cases with default science configurations
  • make benchcab fluxnet-run-bitwise-cmp an automated step when running benchcab run. Submit in a single job script?
  • use diffn instead of diff
  • run-bitwise-cmp -> bitwise-cmp
  • write standard output from cdo diff to file on failure

@SeanBryan51 SeanBryan51 force-pushed the 32-add-regression-test-to-benchcab branch 2 times, most recently from 3ab85b0 to d49c329 on May 22, 2023 01:32
@SeanBryan51 SeanBryan51 force-pushed the 32-add-regression-test-to-benchcab branch from d49c329 to 865f41c on May 31, 2023 00:46
@SeanBryan51 (Collaborator, Author) commented

Now using nccmp instead of cdo diffn. This is because cdo diffn was using an unreasonable amount of memory per comparison, causing processes to be killed. The memory usage of cdo diffn increases linearly over time, which may indicate a memory leak in the tool.

Tested with cdo versions 1.7.2, 1.9.8, 1.9.10 and 2.0.5.

@SeanBryan51 SeanBryan51 force-pushed the 32-add-regression-test-to-benchcab branch 3 times, most recently from 7d82ad7 to eafacd3 on May 31, 2023 05:07
@SeanBryan51 SeanBryan51 marked this pull request as ready for review May 31, 2023 05:11
@SeanBryan51 SeanBryan51 requested a review from ccarouge May 31, 2023 05:12
@SeanBryan51 SeanBryan51 force-pushed the 32-add-regression-test-to-benchcab branch from eafacd3 to ed2cad2 on May 31, 2023 08:34
@SeanBryan51 SeanBryan51 force-pushed the 32-add-regression-test-to-benchcab branch from e806471 to 7dc2dca on May 31, 2023 13:48
@ccarouge (Member) left a comment

It looks great. Just one comment about checking for a patch before running the comparison.

Comment thread benchcab/benchcab.py
run_comparisons(comparisons, verbose=self.args.verbose)
print("Successfully ran comparison tasks")

def fluxnet(self):
@ccarouge (Member)

Good idea to move the no-submit option to the fluxnet command.

Comment thread benchcab/task.py
for task_b in tasks
if task_a.met_forcing_file == task_b.met_forcing_file
and task_a.sci_conf_id == task_b.sci_conf_id
and task_a.branch_id < task_b.branch_id
@ccarouge (Member)

Should we add a test on branch_patch so we don't run the test if only one of the tasks has a patch applied? Since in this case, we are fairly certain the science configurations are different.

If both branches have a patch, we should keep the comparison as the patches could be identical.

@SeanBryan51 (Collaborator, Author) commented Jun 1, 2023

Yep, I agree there are cases where a user does not need to run a regression test. However, I think the best approach would be for the user to manually switch off the regression test (we have an issue for this: #77).

It might be counterproductive to try to guess when the user wants to switch the regression test on or off. For example, we might want to run a regression test as a sanity check.

@SeanBryan51 SeanBryan51 requested a review from ccarouge June 7, 2023 00:53
This change allows optional steps in the workflow to be skipped by
specifying `--skip <cmd>` to composite commands such as `benchcab run`
and `benchcab fluxnet`.

Fixes #77
@SeanBryan51 SeanBryan51 force-pushed the 32-add-regression-test-to-benchcab branch from 045325a to bfda7c6 on June 7, 2023 01:08
@SeanBryan51 SeanBryan51 merged commit 02ec7fb into master Jun 8, 2023
@SeanBryan51 SeanBryan51 deleted the 32-add-regression-test-to-benchcab branch June 8, 2023 00:05

Development

Successfully merging this pull request may close these issues.

  • disable analyses parts of the workflow on demand
  • Add regression test to benchcab

2 participants