Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace DataFile/Dataformat with OptionsNetCDF #2209

Merged
merged 116 commits into from
Sep 14, 2022
Merged

Conversation

ZedThree
Copy link
Member

@ZedThree ZedThree commented Jan 13, 2021

Replace all the existing DataFile/Dataformat infrastructure with OptionsNetCDF

Closes #1254

Currently:

  • restart files moved to OptionsNetCDF
  • additional BOUT.dmp.experimental.*.nc files using OptionsNetCDF -- separate files for now just for testing!
  • time-evolving variables added to Solver are written
  • build config is written
  • method for checking all time-evolving variables have the same sized time dimension

Missing:

  • user-added variables, using e.g. SAVE_ONCE or SAVE_REPEAT
  • upgrade tool -- EDIT: this might not happen, as it's seems quite tricky to automatically upgrade
  • direct read/write method -- direct read not yet implemented, but direct write can be done with e.g. OptionsNetCDF{}.write({{"key": value}})
  • single-method for adding a time-evolving variable to Options

One possibly blocking issue at the moment is that OptionsNetCDF writes all the keys in lowercase. This causes problems with e.g. collect, which expects BOUT_VERSION to be in uppercase, so thinks the files are pre-v0.2! That's a simple fix for that one variable in collect, but I am a little bit worried more generally about what tools expect certain variables in the files to be written in uppercase. @johnomotani what does xBOUT do?

We also need to decide on exactly what the interface will be for users to add variables. In the current form, there's now an Options member of PhysicsModel, output_options, which gets written to file in PhysicsModel::PhysicsModelMonitor::call.
My feeling is we probably want a PhysicsModel::outputVars(Options&) method which gets called by PhysicsModelMonitor.

Also make one-argument constructor explicit
NetCDF doesn't have this combination, only:
  - file exists, open read-only
  - file exists, open read-write
  - create new file, overwrite existing
  - create new file, fail if exists
Overloads for `Coordinates`, `Mesh`, `Solver`, `ParallelTransform`

Assignment needs to be `force`d in order to ensure any existing values
are overwritten.
This is a breaking change, as it requires changes to user code when
adding variables to the restart files

Removes support for HDF5 restart files
This is to keep the latest per-variable record consistent between
opening/closing the file, runs, etc. as netCDF doesn't (and won't)
provide functionality to do this natively.

Also add OptionsNetCDF::verifyTimesteps() to check that all variables
with a given time dimension all have the same size.
Creating an `OptionsNetCDF` with `FileMode::replace` overwrites the
file on _every_ call to `OptionsNetCDF::write`, which is probably not
what you want. One way around this would be to have the `NcFile`
object as a member variable and keep it open between calls to `write`.
This has the complication of us then needing to make sure to flush the
file correctly.

Instead, just change the mode after writing to file.
Currently saves files as "BOUT.dmp.experimental.*.nc"

Has main outputs
@ZedThree ZedThree added this to the BOUT-5.0 milestone Jan 13, 2021
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

include/bout/physicsmodel.hxx Outdated Show resolved Hide resolved
include/bout/physicsmodel.hxx Outdated Show resolved Hide resolved
include/bout/physicsmodel.hxx Outdated Show resolved Hide resolved
include/bout/physicsmodel.hxx Outdated Show resolved Hide resolved
include/bout/physicsmodel.hxx Outdated Show resolved Hide resolved
return;
}

std::string error_string = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: redundant string initialization [readability-redundant-string-init]

std::string error_string = "";
            ^~~~~~~~~~~~~~~~~
            error_string

tests/unit/sys/test_options_netcdf.cxx Show resolved Hide resolved
tests/unit/sys/test_options_netcdf.cxx Show resolved Hide resolved
tests/unit/sys/test_options_netcdf.cxx Show resolved Hide resolved
tests/unit/sys/test_options_netcdf.cxx Show resolved Hide resolved
@johnomotani
Copy link
Contributor

what tools expect certain variables in the files to be written in uppercase. @johnomotani what does xBOUT do?

xarray/xBOUT are case sensitive at the moment. I guess we could work around by renaming everything to lower-case when we load. I'd prefer to keep the output case-sensitive though (otherwise for example variable names will be converted to lower-case when printed, which I would find surprising as a user).

Actually I would vote for making Options case-sensitive (I find that behaviour less surprising myself, and it makes the code slightly simpler), although it's not backward compatible. If we make warnings about unused options more prominent (e.g. print at the beginning of the run, after the first rhs call, and also at the end of the run), I think that would mitigate the worry about getting unexpected behaviour because an option was typed with the wrong case.

@ZedThree
Copy link
Member Author

I do think case-sensitive Options should be considered, though I would be more keen on it if we had a better way of dealing with unused/unknown/mistyped options.

If we didn't want to do that, we could probably modify Options to keep the original case but reads were case-insensitive by changing the option name from std::string to something like:

struct CaseInsensitiveKey {
  std::string original_case; // Store the original case as well as the lower case version
  std::string lower_case;
  
  CaseInsensitiveKey(const std::string& key) : original_case(key), lower_case(lowercase(key)) {};

  // Implicitly converts to std::string
  operator const std::string& () const { return original_case; }

  // Order by the original case -- needed for std::map
  bool operator< (const std::string& other) { return other < original_case; }

  // When checking for equality, check the original case, then lowercase
  bool operator== (const std::string& other) {
    if (other == original_case) {
      return true;
    }
    return lowercase(other) == lower_case;
  }
};

Also needs assignment, copy, move operators, but that would probably allow us to keep the case-insensitivity when reading the options file, but have case sensitivity when writing. But more code, more chance for things to go wrong.

@bendudson
Copy link
Contributor

bendudson commented Jan 14, 2021

I agree it's tricky, and case sensitivity simplifies lots of things in the code. On the other hand I would not want to have keys which differed by case, or users thinking they had changed an input where they hadn't (because they got the case wrong).

I worry that ordering by original case, and then comparing against both strings would break the behavior of maps e.g. a red-black tree might not match the case insensitive case if they've followed the ordering using the original case.

Perhaps a variation on your suggestion:

struct CaseInsensitiveKey {
  std::string original_case; // Store the original case as well as the lower case version
  std::string lower_case;
  
  CaseInsensitiveKey(const std::string& key) : original_case(key), lower_case(lowercase(key)) {};

  // Implicitly converts to std::string
  operator const std::string& () const { return original_case; }

  // Order by the lower case -- needed for std::map
  bool operator< (const std::string& other) { return lowercase(other) < lower_case; }

  // When checking for equality, check lowercase
  bool operator== (const std::string& other) {
    return lowercase(other) == lower_case;
  }
};

So we do all comparisons in lower case, but retain the original case

@bendudson
Copy link
Contributor

Quick question: If the keys to the map are CaseInsensitiveKey do we need comparisons to strings, or just between CaseInsensitiveKeys? That would eliminate conversion to lower case, apart from in the constructor

@johnomotani
Copy link
Contributor

If we went case-sensitive, how about something like this to help avoid errors:

  • Have an option like strict (true or false, default to false), that makes it an error to run with unused options - that should catch any typos or attempts to use the wrong case. I think Options already has some method to flag unused options - we could check whether there are any after the first rhs call and raise an error if there are, printing the unused options.
  • An option like validate_input. To start with this could be equivalent to just nout=0 strict, so that a user can check that their input file is OK by running something like
    ./mymodel validate_input
    
    Over time we could maybe add some extras, like if there is an unused option searching for similar options (e.g. different case, one-letter typos) and ask if those were intended.

I agree keys that differ by case are horrible, but I'd be inclined to leave it as just bad practice rather than make it an error. If you'd rather be certain, we could add a check like if (new_key.lower() == existing_key.lower()) throw BoutException(...)?

@bendudson
Copy link
Contributor

I think the strict check @johnomotani suggests would be useful whether or not keys are case sensitive. I'd support making that true by default. In which case that removes one of the arguments for case insensitivity.
I quite like your idea about checking when creating keys if they match existing keys case-insensitively, but currently keys are created automatically if they don't already exist, so this might need to be done every time an Options operator[] is called, with a scan and comparison against all the keys in the dictionary. Perhaps this is not needed if we're catching unused keys.
Perhaps this is something as you say that should be a good-practice thing rather than implemented in the code. We'd need to decide some convention on when to use upper case (or always lower case?)

@johnomotani
Copy link
Contributor

this might need to be done every time an Options operator[] is called

I think it'd only be needed when (possibly) creating a new key - if the requested key is already in the map, then we'd return before getting to the check.

@ZedThree
Copy link
Member Author

Ok, I will make a separate PR with modifications to case sensitivity:

  • remove case insensitivity (breaking change)
  • add strict (or better: input:strict?) option, defaulting to true, which checks for unused options
    • this could be backported, with default false
    • where should check be? In Solver::setModel, after call to PhysicsModel::initialise? Or in Solver::solve before call to Solver::run?
  • add validate_input (or better: input:validate?), which just stops after initial strict check (and crucially, doesn't write to file!)

Adding a simple Levenshtein distance would allow us to have more helpful feedback: "tpyo_option not found: did you mean typo_option?"

@bendudson
Copy link
Contributor

Thanks @ZedThree this sounds good thanks!

  • I vote for input:strict and input:validate to keep things namespaced, and gradually reduce the number of switches outside sections
  • It needs to be checked after the RHS has been called once. Perhaps here: https://github.com/boutproject/BOUT-dev/blob/master/src/solver/solver.cxx#L476
    but we would need to move the run_rhs call to before the dump_on_restart check, so that it is always run.
  • Definitely yes to making the inputs more helpful. For each unused key, we could loop through the keys which were set (presumably to default values) and suggest the closest match.

@johnomotani
Copy link
Contributor

input:strict sounds good, and I guess input:validate is the same number of characters as validate_input, so is better. To make validate more discoverable, could we maybe add a command-line option that comes up when you do ./mymodel --help?

@ZedThree
Copy link
Member Author

Yes, good idea! I'd really love for ./mymodel --help to list all the possible options, but I think that would require a massive overhaul of how we used options. It could probably be done using the init-time registration similar to how the factories work.

@bendudson
Copy link
Contributor

We could have input:validate write the BOUT.settings file before exiting. That should then contain all the settings, types and documentation.

@ZedThree
Copy link
Member Author

Oh yes, that's probably good enough. Maybe just print it to screen, to avoid clobbering an existing BOUT.settings file?

* case-sensitive-options: (406 commits)
  Suppress unneeded output in getPossibleBoundaries
  Move `WithQuietOutput` to `Output` header
  Fix performance hit due to creating meshes in getPossibleBoundaries
  Fix capitalised input variables in test-beuler
  CMake: Add test-beuler
  Remove duplicated unit test from merge conflict
  remove unused options
  move guards definition to mesh
  Remove manual ignoring of boundary options in test-vec
  Remove boundary inputs for `f` in PETSc 3D laplace test
  Mark possible boundary condition inputs as used
  Report enabled sanitizers in final config summary
  Fix clang-tidy suggestions: move-assignment should be noexcept
  Don't std::move trivially-movable objects
  Avoid calls to virtual functions in constructors
  Handle self-assignment in copy-assignment
  Make move-assignment operator noexcept
  Use explicit nullptr check in conditional
  Remove else-after-return
  Fix use-after-move in Field3D move-assignment
  ...
* fix-delp2-performance-regression:
  Return reference from `Coordinates::zlength`
  Fix 3D metrics branch of zlength
  [skip ci] Apply black changes
  Cache `Coordinates::zlength()` result
- Don't access static member through instance
- Pass string by const-ref
- Name function parameters
This avoids heavy performance penalty from reopening/closing file each
timestep
@ZedThree
Copy link
Member Author

ZedThree commented Dec 6, 2021

This now has #2471 merged in, as I was testing the performance with it.

I think I've now eliminated the performance hit this PR was introducing -- it seems to have been due to re-creating the NcFile with every call to OptionsNetCDF::write (i.e. every timestep). Now, the NcFile is a member of OptionsNetCDF and the file is opened on demand and synced every timestep.

OptionsNetCDF::read still creates a new NcFile object (i.e. immediately opens a file on disk, reads the contents, and then closes it).

I think the interaction between reading and writing here should be fine, but is a potential cause for concern. It would probably only affect restart files, and we shouldn't have them open for both reading and writing at the same time, so probably fine.

We now don't have a direct dependence on netCDF, avoid including
the netCDF headers everywhere, and also avoid some linking issues with
the make-script example
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

include/options_netcdf.hxx Outdated Show resolved Hide resolved
include/options_netcdf.hxx Outdated Show resolved Hide resolved
src/sys/options/options_netcdf.cxx Outdated Show resolved Hide resolved
src/sys/options/options_netcdf.cxx Outdated Show resolved Hide resolved
ZedThree and others added 8 commits June 29, 2022 17:05
* next: (548 commits)
  Apply clang-format changes
  Don't compile LaplaceXY2 with 3D metrics
  Apply clang-format changes
  Apply clang-format changes
  Laplacexy2 modifications
  Apply clang-format changes
  Apply clang-format changes
  nvcc needs toString(HYPRE_SOLVER_TYPE)
  nvcc now seems to need the factory flags
  Update fmt to latest version
  Apply clang-format changes
  Output BoutException.what() in PhysicsModel main()
  use hasParallelSlices
  Apply clang-format changes
  Only destroy if it exists
  Backgrounds don't work with parallel BCs
  Ensure that parallel slices exist
  CI: Remove unused stage from clang-tidy-review job
  CI: Bump clang-tidy-review to 0.8.3
  Prefer non-const return value
  ...
* next: (50 commits)
  Remove some useless forward declarations
  Revert "Remove some unused headers from vecops"
  Remove some unused headers from vecops
  Fix bad merge: missed removing extraneous `const` on return value
  Remove `Up/DownXSplitIndex`
  Remove last remaining uses of `deprecated.hxx`
  Apply clang-format
  Apply clang-format changes
  CI: Don't update submodules (already updated)
  CI: Workaround for git ownership checks in container
  CI: Bump actions/checkout version
  CI: Use latest version of cmake in clang-tidy-review
  Use MPI_COMM_NULL rather than nullptr
  Add 'u' format flag to BOUT.settings output
  Adds a 'u' format, commenting unused options
  Apply clang-format changes
  Print wether an option was marked as conditionally used
  Clang-format branch
  Remove `nout`, `timestep` arguments from `Solver::init`
  Use output step/timestep getters/setters in SlepcSolver
  ...
GCC seemed to be fine with the previous code, but clang didn't like it
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 31. Check the log or trigger a new build to see more.

include/bout/physicsmodel.hxx Show resolved Hide resolved
include/bout/physicsmodel.hxx Show resolved Hide resolved
include/bout/physicsmodel.hxx Show resolved Hide resolved
include/bout/physicsmodel.hxx Show resolved Hide resolved
include/bout/physicsmodel.hxx Show resolved Hide resolved
src/solver/impls/slepc/slepc.cxx Outdated Show resolved Hide resolved
src/solver/impls/slepc/slepc.cxx Outdated Show resolved Hide resolved
_set_no_check(std::move(val), std::move(source));
}
template <>
void Options::assign<>(Matrix<BoutReal> val, std::string source) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: the parameter 'val' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]

void Options::assign<>(Matrix<BoutReal> val, std::string source) {
                                        ^

}
template <>
void Options::assign<>(Matrix<BoutReal> val, std::string source) {
_set_no_check(std::move(val), std::move(source));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: passing result of std::move() as a const reference argument; no move will actually happen [performance-move-const-arg]

Suggested change
_set_no_check(std::move(val), std::move(source));
_set_no_check(val, std::move(source));

_set_no_check(std::move(val), std::move(source));
}
template <>
void Options::assign<>(Tensor<BoutReal> val, std::string source) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: the parameter 'val' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]

void Options::assign<>(Tensor<BoutReal> val, std::string source) {
                                        ^

* next: (33 commits)
  Add requires for FFTW to tests
  Add cross-ref in input expressions docs to defn of x,y,z
  Replace slightly-incorrect comment in BoutMesh::GlobalY()
  Documentation on value of `y` in input file expressions
  Correct docs about value of `x` in input file expressions
  Change unbalanced brackets check
  Remove `LaplacePDD` implementation
  Apply clang-format
  Fix some clang-tidy suggestions in PCR Thomas
  Remove duplicated header in Laplace PCR Thomas
  Fix type of `Coordinates::zlength/dz` in Laplace PCR Thomas
  Fix `LaplacePCR_THOMAS` constructor for new Laplace parameters
  Fully remove deprecated `Field3D.background`
  Docs: Fix some formatting issues
  Docs: Fix some broken tables
  Docs: Fix some xBout links
  Docs: Remove outdated note
  Docs: Fix some whitespace
  Docs: Update Ubuntu instructions
  Docs: Explain some common CMake options
  ...
@ZedThree
Copy link
Member Author

Fixed conflicts and an actual bug that clang-tidy caught. This is now ready to go in

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

}
template <>
void Options::assign<>(Tensor<BoutReal> val, std::string source) {
_set_no_check(std::move(val), std::move(source));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: passing result of std::move() as a const reference argument; no move will actually happen [performance-move-const-arg]

Suggested change
_set_no_check(std::move(val), std::move(source));
_set_no_check(val, std::move(source));

// choice for this
return 0;
}
int current_time_index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'current_time_index' is not initialized [cppcoreguidelines-init-variables]

Suggested change
int current_time_index;
int current_time_index = 0;

src/sys/options/options_netcdf.cxx Show resolved Hide resolved
src/sys/options/options_netcdf.cxx Show resolved Hide resolved
src/sys/options/options_netcdf.cxx Show resolved Hide resolved
src/sys/options/options_netcdf.cxx Show resolved Hide resolved
@ZedThree
Copy link
Member Author

I'm merging this now so that we can get some actual experience with it in next before the imminent 5.0 release

@ZedThree ZedThree merged commit ae92b73 into next Sep 14, 2022
@ZedThree ZedThree deleted the use-options-netcdf branch January 10, 2023 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants