Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make file-reading operations of input data abstractable #481

Merged
merged 24 commits into from
Jun 29, 2022

Conversation

giacomofiorin
Copy link
Member

This change is similar to what done earlier for std::ofstream (whose implementation was not working correctly years ago for some platform that NAMD was running on at the time).

This PR removes explicit uses of std::ifstream throughout most of the code, using instead its base class. Initialization and maintenance of a std::istream object is done by 2-3 additional proxy functions, which also take care of error checking, leading to significant consolidation of duplicated code. The only remaining exceptions are for use cases where it is clear that the input is read from files that was written by Colvars itself as the output of a simulation.

In the current design, stream objects are proxy members, and therefore access to them is restricted to the first thread only. There may be a need to expand this in the future, but I do not see it at the moment.

@HanatoK In the neural network class I removed a couple of exceptions, since the corresponding error check is done by code that should support VMD, but it should be okay to reintroduce them later.

src/colvar_UIestimator.h Outdated Show resolved Hide resolved
@giacomofiorin giacomofiorin marked this pull request as draft May 16, 2022 16:44
Comment on lines 167 to 169
std::istream *ifs_biases = proxy->input_stream(biases_file, "biases file");
if (ifs_biases == NULL) {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that here the return value is a pointer solely to be able to track errors through null pointers. Instead of a pointer, could input_stream() return a reference to an istream that is marked as bad (stream.clear(ios::badbit)) in case of error? This would preserve more of the syntax on the caller side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be able to distinguish when the file is missing vs. corrupted/unreadable. In most of the current use cases, the distinction is not made, but I think that a few could use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned only one, but, there are 3 error bits that can be used: eofbit, failbit and badbit.
https://www.cplusplus.com/reference/ios/ios_base/iostate/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but that means that the entire framework always needs to rely on ifstream after all... You can't explicitly allocate an istream object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think istream also supports being converted to bool for checking these bits (https://en.cppreference.com/w/cpp/io/basic_ios/operator_bool).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but that means that the entire framework always needs to rely on ifstream after all... You can't explicitly allocate an istream object.

Can't you allocate an istringstream and return an istream reference to it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhenin You came back with this request three times, which means that you feel very strongly about this design choice. I took a stab at it for a bit, but stumbled with two additional complications.

  1. Returning a reference to an invalid object means that I could not initialize it right in the return statement, and I had to allocate a specific object just to return an invalid one. This object is return to threads > 0 if there is a bug, which I hope means that the code crashes before multiple threads write to the same memory. In the back of my mind, I'm still concerned about deadlocks.
  2. References cannot be reassigned, so those cases where you are reusing the same stream object become more complex, not simpler.

I will be busy for several days, so I put the work in progress in this branch and putting it back on draft status. At least the memory leaks should be addressed.

@giacomofiorin giacomofiorin marked this pull request as ready for review May 16, 2022 18:39
Copy link
Member

@HanatoK HanatoK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I test the code path in colvarproxy_io.cpp?

/// \brief Identifiers for output_stream objects: by default, these are the names of the files
std::list<std::string> output_stream_names;

std::map<std::string, std::istream *> input_streams_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are only input streams stored in std::map, while output streams are stored separately in two std::lists?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter is much older, when we couldn't be sure that std::map would be available on so many exotic platforms. I plan on converting the latter to std::map as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK if you do not have time to unify the handling of std::istream* and std::ostream* at this time.

src/colvarproxy_io.cpp Show resolved Hide resolved
src/colvarproxy_io.cpp Outdated Show resolved Hide resolved
@HanatoK
Copy link
Member

HanatoK commented May 17, 2022

@giacomofiorin By the way, I see that VMD latest source code enables C++11 in the Makefile. Is it safe to use C++11 now? If so, we can use smart pointers to avoid the possible memory leaks.

@giacomofiorin
Copy link
Member Author

@giacomofiorin By the way, I see that VMD latest source code enables C++11 in the Makefile. Is it safe to use C++11 now? If so, we can use smart pointers to avoid the possible memory leaks.

That would be really nice! Probably also possible in some future. But not yet!!

The Lord of the VMD did add a CXX11 flag to appease us, but that flag is not on by default and he has not finished implementing it... You need something like these patches for it to work.

Sadly, until the final "official" VMD 1.9.4 builds are finally published, you should not take anything for granted regarding what VMD will support or not support.

@giacomofiorin giacomofiorin marked this pull request as ready for review June 21, 2022 19:12
@giacomofiorin
Copy link
Member Author

giacomofiorin commented Jun 22, 2022

How can I test the code path in colvarproxy_io.cpp?

@HanatoK As far as I understand your question, the code path is already exercised every time input files are being read, which indeed turned out a small inconsistency with how the output prefix was extracted in GROMACS vs. other codes (fixed in a08ccda).

Regarding using std::map for output streams, a different PR will follow later (this one is about bringing the input streams up to speed).

Do you see anything else that looks worth changing/fixing?

@HanatoK
Copy link
Member

HanatoK commented Jun 23, 2022

How can I test the code path in colvarproxy_io.cpp?

@HanatoK As far as I understand your question, the code path is already exercised every time input files are being read, which indeed turned out a small inconsistency with how the output prefix was extracted in GROMACS vs. other codes (fixed in a08ccda).

Regarding using std::map for output streams, a different PR will follow later (this one is about bringing the input streams up to speed).

Do you see anything else that looks worth changing/fixing?

Sorry for my late reply, but currently I really have no time to review the code before the end of June. I vaguely remember thant I cannot test the code in colvarproxy_io.cpp via NAMD, but I will confirm it if I have time.

@giacomofiorin
Copy link
Member Author

Sorry for my late reply, but currently I really have no time to review the code before the end of June. I vaguely remember thant I cannot test the code in colvarproxy_io.cpp via NAMD, but I will confirm it if I have time.

The input-streams code is currently used by all backends. It's the output streams that differ if you use NAMD, but then one can just run the LAMMPS or GROMACS tests.

If there is a specific input that you would like to be tested and is not tied to a specific engine, please send it along so that it can be added to the existing tests.

Comment on lines +718 to +734
error_code |= samples->read_multicol(samples_in_name,
"ABF samples file",
true);

is.open(gradients_in_name.c_str());
if (!is.is_open()) {
cvm::error("Error opening ABF gradient file " +
gradients_in_name + " for reading", COLVARS_INPUT_ERROR);
} else {
gradients->read_multicol(is, true);
is.close();
}
error_code |= gradients->read_multicol(gradients_in_name,
"ABF gradient file",
true);

if (b_CZAR_estimator) {
// Read eABF z-averaged data for CZAR
cvm::log("Reading z-histogram from " + z_samples_in_name + " and z-gradient from " + z_gradients_in_name);

is.clear();
is.open(z_samples_in_name.c_str());
if (!is.is_open()) cvm::error("Error opening eABF z-histogram file " + z_samples_in_name + " for reading");
z_samples->read_multicol(is, true);
is.close();
is.clear();

is.open(z_gradients_in_name.c_str());
if (!is.is_open()) cvm::error("Error opening eABF z-gradient file " + z_gradients_in_name + " for reading");
z_gradients->read_multicol(is, true);
is.close();
error_code |= z_samples->read_multicol(z_samples_in_name,
"eABF z-histogram file",
true);
error_code |= z_gradients->read_multicol(z_gradients_in_name,
"eABF z-gradient file",
true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splendid.

Copy link
Member

@jhenin jhenin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent changes! This abstraction really brought benefits already.

Copy link
Member

@HanatoK HanatoK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

/// \brief Identifiers for output_stream objects: by default, these are the names of the files
std::list<std::string> output_stream_names;

std::map<std::string, std::istream *> input_streams_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK if you do not have time to unify the handling of std::istream* and std::ostream* at this time.

src/colvarproxy_io.cpp Show resolved Hide resolved
@giacomofiorin giacomofiorin added the maintenance No user-visible effects label Jun 29, 2022
@giacomofiorin giacomofiorin merged commit 914a4ee into master Jun 29, 2022
@giacomofiorin giacomofiorin deleted the input_streams branch September 13, 2022 14:59
giacomofiorin added a commit to Colvars/lammps that referenced this pull request May 17, 2023
This update consists exclusively of bugfixes or maintenance-related changes.

The following is a list of pull requests in the Colvars repository since the previous update to LAMMPS:

- 532 Add XYZ trajectory reading feature
  Colvars/colvars#532 (@jhenin, @giacomofiorin)

- 531 Delete objects quietly, unless explicitly requested via script (including VMD)
  Colvars/colvars#531 (@giacomofiorin)

- 530 Append newline to log and error messages if not already present
  Colvars/colvars#530 (@giacomofiorin)

- 528 Forward-declare OpenMP lock
  Colvars/colvars#528 (@giacomofiorin)

- 527 Remove unneeded STL container
  Colvars/colvars#527 (@giacomofiorin)

- 526 Allow collecting configuration files and strings before setting up interface
  Colvars/colvars#526 (@giacomofiorin, @jhenin)

- 523 Fallback to linearCombination when customFunction is missing in customColvar
  Colvars/colvars#523 (@HanatoK, @giacomofiorin)

- 522 Use iostream::fail() to check for I/O error
  Colvars/colvars#522 (@jhenin)

- 520 Fix ref count
  Colvars/colvars#520 (@giacomofiorin)

- 513 Set target temperature through a common code path
  Colvars/colvars#513 (@giacomofiorin, @jhenin)

- 509 Safer detection of Windows with recent Microsoft Visual Studio versions
  Colvars/colvars#509 (@akohlmey)

- 508 Update LAMMPS patching method to reflect Lepton availability
  Colvars/colvars#508 (@giacomofiorin)

- 497 Increase the precision of write_multicol
  Colvars/colvars#497 (@HanatoK)

- 496 Only perform MTS automatic enable/disable for timeStepFactor > 1
  Colvars/colvars#496 (@giacomofiorin)

- 493 Remove unused branch of quaternion input function
  Colvars/colvars#493 (@giacomofiorin)

- 489 Ensure there are spaces between the fields in the header
  Colvars/colvars#489 (@HanatoK)

- 487 Use map of output streams, and return references to its elements
  Colvars/colvars#487 (@giacomofiorin, @jhenin)

- 486 Remember first step of moving restraint
  Colvars/colvars#486 (@jhenin)

- 485 Add decoupling option for moving restraints
  Colvars/colvars#485 (@jhenin)

- 483 Update Lepton via patching procedure
  Colvars/colvars#483 (@giacomofiorin)

- 481 Make file-reading operations of input data abstractable
  Colvars/colvars#481 (@giacomofiorin)

Authors: @akohlmey, @giacomofiorin, @HanatoK, @jhenin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance No user-visible effects
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants