Make file-reading operations of input data abstractable #481

giacomofiorin · 2022-05-16T16:36:27Z

This change is similar to what was done earlier for std::ofstream (whose implementation was not working correctly years ago on some platform that NAMD was running on at the time).

This PR removes explicit uses of std::ifstream throughout most of the code, using instead its base class, std::istream. Initialization and maintenance of a std::istream object is done by 2-3 additional proxy functions, which also take care of error checking, leading to significant consolidation of duplicated code. The only remaining exceptions are use cases where it is clear that the input is read from files that were written by Colvars itself as the output of a simulation.

In the current design, stream objects are proxy members, and therefore access to them is restricted to the first thread only. There may be a need to expand this in the future, but I do not see it at the moment.
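As a rough illustration of the pattern described above (not the actual Colvars code), a proxy-like class can own the file streams and hand out only the base-class type, with the error check done in one place. The class name `input_proxy_sketch` and the `last_error()` accessor are hypothetical; the `input_stream(name, description)` shape and the binary open mode follow the diff excerpt and commit titles in this PR.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <map>
#include <string>

// Hypothetical stand-in for a proxy class; NOT the actual Colvars class.
class input_proxy_sketch {
public:
  ~input_proxy_sketch() {
    // The proxy owns every stream it opened, so it deletes them here
    for (std::map<std::string, std::istream *>::iterator it = streams_.begin();
         it != streams_.end(); ++it) {
      delete it->second;
    }
  }

  // Open (or reuse) the stream for this file name.
  // Returns NULL and records an error message if the file cannot be opened.
  std::istream *input_stream(std::string const &name,
                             std::string const &description) {
    std::map<std::string, std::istream *>::iterator it = streams_.find(name);
    if (it != streams_.end()) {
      return it->second;
    }
    std::ifstream *ifs = new std::ifstream(name.c_str(), std::ios::binary);
    if (!ifs->is_open()) {
      delete ifs;
      last_error_ = "Error: cannot open " + description + " \"" + name + "\"";
      return NULL;
    }
    streams_[name] = ifs;
    return ifs;
  }

  // Hypothetical accessor, used here only to make the error visible
  std::string const &last_error() const { return last_error_; }

private:
  std::map<std::string, std::istream *> streams_;
  std::string last_error_;
};
```

Callers only ever see `std::istream *`, so the concrete stream type can be swapped inside the class without touching them.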

@HanatoK In the neural network class I removed a couple of exceptions, since the corresponding error check is done by code that should support VMD, but it should be okay to reintroduce them later.

src/colvar_UIestimator.h

jhenin · 2022-05-16T16:49:56Z

src/colvar_neuralnetworkcompute.cpp

+    std::istream *ifs_biases = proxy->input_stream(biases_file, "biases file");
+    if (ifs_biases == NULL) {
+        return;


My understanding is that here the return value is a pointer solely to be able to track errors through null pointers. Instead of a pointer, could input_stream() return a reference to an istream that is marked as bad (stream.clear(ios::badbit)) in case of error? This would preserve more of the syntax on the caller side.

I'd like to be able to distinguish when the file is missing vs. corrupted/unreadable. In most of the current use cases, the distinction is not made, but I think that a few could use it.

I mentioned only one, but, there are 3 error bits that can be used: eofbit, failbit and badbit.
https://www.cplusplus.com/reference/ios/ios_base/iostate/

Ok, but that means that the entire framework always needs to rely on ifstream after all... You can't explicitly allocate an istream object.

I think istream also supports being converted to bool for checking these bits (https://en.cppreference.com/w/cpp/io/basic_ios/operator_bool).
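For reference, a small sketch of how the three state bits and the boolean conversion behave on a plain `std::istringstream`. The helper names `describe_state` and `read_int_state` are made up for this example. Note that `clear(state)` replaces the whole state with `state`, whereas `setstate(state)` adds the given bits to the current state.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: summarize which error bits are set on a stream
std::string describe_state(std::istream const &is) {
  if (is.bad()) return "bad";    // badbit is set
  if (is.fail()) return "fail";  // failbit without badbit, e.g. a parsing error
  if (is.eof()) return "eof";    // only eofbit: end of input was reached
  return "good";
}

// Hypothetical helper: try to extract one integer and report the resulting state
std::string read_int_state(std::string const &text) {
  std::istringstream is(text);
  int value = 0;
  is >> value;
  return describe_state(is);
}
```

With these helpers, `read_int_state("abc")` reports a failed extraction, while `read_int_state("12")` succeeds but reaches the end of the input. A stream converts to `false` in a boolean context whenever `fail()` is true, that is, when failbit or badbit is set; eofbit alone does not make it `false`.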

Can't you allocate an istringstream and return an istream reference to it?
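A minimal sketch of that suggestion, assuming a hypothetical class `input_ref_sketch` (this is not the code that was eventually merged): the class keeps one `std::istringstream` member permanently flagged with badbit, and returns a reference to it whenever a file cannot be opened.

```cpp
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical class; illustrates returning std::istream & instead of a pointer
class input_ref_sketch {
public:
  input_ref_sketch() {
    // Dedicated object that only exists to signal an error to the caller
    invalid_.setstate(std::ios::badbit);
  }

  ~input_ref_sketch() {
    for (std::map<std::string, std::istream *>::iterator it = streams_.begin();
         it != streams_.end(); ++it) {
      delete it->second;
    }
  }

  // Returns a usable stream, or a reference to the invalid one on failure
  std::istream &input_stream(std::string const &name) {
    std::map<std::string, std::istream *>::iterator it = streams_.find(name);
    if (it != streams_.end()) {
      return *(it->second);
    }
    std::ifstream *ifs = new std::ifstream(name.c_str(), std::ios::binary);
    if (!ifs->is_open()) {
      delete ifs;
      return invalid_;
    }
    streams_[name] = ifs;
    return *ifs;
  }

private:
  std::istringstream invalid_;
  std::map<std::string, std::istream *> streams_;
};
```

The caller can then test the result with `if (!is)` or `is.bad()`. One caveat of this design is that the invalid object is shared: any caller that invokes `clear()` on it would reset the flag for everyone.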

@jhenin You came back with this request three times, which means that you feel very strongly about this design choice. I took a stab at it for a bit, but stumbled on two additional complications.

Returning a reference to an invalid object means that I could not initialize it right in the return statement, and I had to allocate a specific object just to return an invalid one. This object is returned to threads > 0 if there is a bug, which I hope means that the code crashes before multiple threads write to the same memory. In the back of my mind, I'm still concerned about deadlocks.

References cannot be reassigned, so those cases where you are reusing the same stream object become more complex, not simpler.
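The second complication can be seen in a small standalone sketch (function names are made up for this example): a pointer variable can be pointed at a second stream, while a reference stays bound to the stream it was initialized with, so each reuse needs its own reference, for instance in its own scope.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Pointer version: one variable is reseated from stream a to stream b
std::string first_words_via_pointer(std::istream &a, std::istream &b) {
  std::string word, result;
  std::istream *is = &a;
  (*is) >> word;
  result += word;
  is = &b;  // the same pointer now refers to the second stream
  (*is) >> word;
  result += word;
  return result;
}

// Reference version: a reference cannot be rebound, so each stream
// gets a separate reference declared in a separate scope
std::string first_words_via_reference(std::istream &a, std::istream &b) {
  std::string word, result;
  {
    std::istream &is = a;
    is >> word;
    result += word;
  }
  {
    std::istream &is = b;
    is >> word;
    result += word;
  }
  return result;
}
```

Writing `is = b;` on an existing `std::istream &` would not rebind it; it would attempt to assign to the referenced stream object, which does not compile because `std::istream` has no public copy assignment.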

I will be busy for several days, so I have put the work in progress in this branch and am putting it back on draft status. At least the memory leaks should be addressed.

HanatoK

How can I test the code path in colvarproxy_io.cpp?

HanatoK · 2022-05-16T19:03:19Z

src/colvarproxy_io.h

+  /// \brief Identifiers for output_stream objects: by default, these are the names of the files
+  std::list<std::string>    output_stream_names;
+
+  std::map<std::string, std::istream *> input_streams_;


Why are only input streams stored in std::map, while output streams are stored separately in two std::lists?

The latter is much older, from a time when we couldn't be sure that std::map would be available on so many exotic platforms. I plan on converting the latter to std::map as well.

I am OK if you do not have time to unify the handling of std::istream* and std::ostream* at this time.

src/colvarproxy_io.cpp

HanatoK · 2022-05-17T19:01:08Z

@giacomofiorin By the way, I see that the latest VMD source code enables C++11 in the Makefile. Is it safe to use C++11 now? If so, we can use smart pointers to avoid the possible memory leaks.

giacomofiorin · 2022-05-17T19:42:00Z

That would be really nice! Probably also possible at some point in the future. But not yet!!

The Lord of the VMD did add a CXX11 flag to appease us, but that flag is not on by default and he has not finished implementing it... You need something like these patches for it to work.

Sadly, until the final "official" VMD 1.9.4 builds are finally published, you should not take anything for granted regarding what VMD will support or not support.
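For when C++11 can be relied upon, here is a sketch of what the smart-pointer variant might look like. It assumes a C++11 compiler (which, per the reply above, could not yet be taken for granted) and uses a hypothetical class name, `owning_streams_sketch`. The map holds `std::unique_ptr<std::istream>`, so no explicit `delete` is needed on either the error path or at destruction.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class; requires C++11
class owning_streams_sketch {
public:
  // Returns a non-owning pointer to the opened stream, or nullptr on failure
  std::istream *open(std::string const &name) {
    std::unique_ptr<std::ifstream> ifs(
        new std::ifstream(name.c_str(), std::ios::binary));
    if (!ifs->is_open()) {
      return nullptr;  // ifs is destroyed here, releasing the object
    }
    std::istream *raw = ifs.get();
    streams_[name] = std::move(ifs);  // the map takes over ownership
    return raw;
  }

  std::size_t num_open() const { return streams_.size(); }

private:
  // Streams are destroyed automatically together with the map
  std::map<std::string, std::unique_ptr<std::istream>> streams_;
};
```

The raw pointer returned by `open()` is only an observer; ownership stays with the map.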

giacomofiorin · 2022-06-22T11:19:14Z

How can I test the code path in colvarproxy_io.cpp?

@HanatoK As far as I understand your question, the code path is already exercised every time input files are being read, which indeed turned up a small inconsistency with how the output prefix was extracted in GROMACS vs. other codes (fixed in a08ccda).

Regarding using std::map for output streams, a different PR will follow later (this one is about bringing the input streams up to speed).

Do you see anything else that looks worth changing/fixing?

HanatoK · 2022-06-23T18:57:29Z

Sorry for my late reply, but currently I really have no time to review the code before the end of June. I vaguely remember that I cannot test the code in colvarproxy_io.cpp via NAMD, but I will confirm it if I have time.

giacomofiorin · 2022-06-23T19:01:11Z

The input-streams code is currently used by all backends. It's the output streams that differ if you use NAMD, but then one can just run the LAMMPS or GROMACS tests.

If there is a specific input that you would like to be tested and is not tied to a specific engine, please send it along so that it can be added to the existing tests.

jhenin · 2022-06-29T09:03:33Z

src/colvarbias_abf.cpp

+    error_code |= samples->read_multicol(samples_in_name,
+                                         "ABF samples file",
+                                         true);

-    is.open(gradients_in_name.c_str());
-    if (!is.is_open()) {
-      cvm::error("Error opening ABF gradient file " +
-                 gradients_in_name + " for reading", COLVARS_INPUT_ERROR);
-    } else {
-      gradients->read_multicol(is, true);
-      is.close();
-    }
+    error_code |= gradients->read_multicol(gradients_in_name,
+                                           "ABF gradient file",
+                                           true);

    if (b_CZAR_estimator) {
      // Read eABF z-averaged data for CZAR
      cvm::log("Reading z-histogram from " + z_samples_in_name + " and z-gradient from " + z_gradients_in_name);
-
-      is.clear();
-      is.open(z_samples_in_name.c_str());
-      if (!is.is_open())  cvm::error("Error opening eABF z-histogram file " + z_samples_in_name + " for reading");
-      z_samples->read_multicol(is, true);
-      is.close();
-      is.clear();
-
-      is.open(z_gradients_in_name.c_str());
-      if (!is.is_open())  cvm::error("Error opening eABF z-gradient file " + z_gradients_in_name + " for reading");
-      z_gradients->read_multicol(is, true);
-      is.close();
+      error_code |= z_samples->read_multicol(z_samples_in_name,
+                                             "eABF z-histogram file",
+                                             true);
+      error_code |= z_gradients->read_multicol(z_gradients_in_name,
+                                               "eABF z-gradient file",
+                                               true);


jhenin

Excellent changes! This abstraction really brought benefits already.
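The consolidation in the diff above relies on each read call returning an error code that is accumulated with `|=`, instead of every call site opening the file and reporting failures itself. A self-contained sketch of that pattern follows; the constants and function names are hypothetical and the parsing step is left out.

```cpp
#include <fstream>
#include <string>

// Hypothetical error flags for this sketch only
int const SKETCH_OK = 0;
int const SKETCH_INPUT_ERROR = 1 << 0;

// Opens the file, reports a failure into `log`, and returns an error code
int read_file_sketch(std::string const &name,
                     std::string const &description,
                     std::string &log) {
  std::ifstream is(name.c_str(), std::ios::binary);
  if (!is.is_open()) {
    log += "Error opening " + description + " " + name + " for reading\n";
    return SKETCH_INPUT_ERROR;
  }
  // Actual parsing of the file contents is omitted in this sketch
  return SKETCH_OK;
}

// Call sites shrink to one statement per file; errors accumulate bitwise
int read_all_sketch(std::string const &samples_name,
                    std::string const &gradients_name,
                    std::string &log) {
  int error_code = SKETCH_OK;
  error_code |= read_file_sketch(samples_name, "ABF samples file", log);
  error_code |= read_file_sketch(gradients_name, "ABF gradient file", log);
  return error_code;
}
```

Because the codes are combined with a bitwise OR, a failure in the first file does not prevent the second one from being attempted, and the caller still sees a nonzero result.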

HanatoK

Looks good to me!

HanatoK · 2022-06-29T16:00:36Z

I am OK if you do not have time to unify the handling of std::istream* and std::ostream* at this time.

src/colvarproxy_io.cpp

@jhenin

This update consists exclusively of bugfixes or maintenance-related changes. The following is a list of pull requests in the Colvars repository since the previous update to LAMMPS:

- 532 Add XYZ trajectory reading feature Colvars/colvars#532 (@jhenin, @giacomofiorin)
- 531 Delete objects quietly, unless explicitly requested via script (including VMD) Colvars/colvars#531 (@giacomofiorin)
- 530 Append newline to log and error messages if not already present Colvars/colvars#530 (@giacomofiorin)
- 528 Forward-declare OpenMP lock Colvars/colvars#528 (@giacomofiorin)
- 527 Remove unneeded STL container Colvars/colvars#527 (@giacomofiorin)
- 526 Allow collecting configuration files and strings before setting up interface Colvars/colvars#526 (@giacomofiorin, @jhenin)
- 523 Fallback to linearCombination when customFunction is missing in customColvar Colvars/colvars#523 (@HanatoK, @giacomofiorin)
- 522 Use iostream::fail() to check for I/O error Colvars/colvars#522 (@jhenin)
- 520 Fix ref count Colvars/colvars#520 (@giacomofiorin)
- 513 Set target temperature through a common code path Colvars/colvars#513 (@giacomofiorin, @jhenin)
- 509 Safer detection of Windows with recent Microsoft Visual Studio versions Colvars/colvars#509 (@akohlmey)
- 508 Update LAMMPS patching method to reflect Lepton availability Colvars/colvars#508 (@giacomofiorin)
- 497 Increase the precision of write_multicol Colvars/colvars#497 (@HanatoK)
- 496 Only perform MTS automatic enable/disable for timeStepFactor > 1 Colvars/colvars#496 (@giacomofiorin)
- 493 Remove unused branch of quaternion input function Colvars/colvars#493 (@giacomofiorin)
- 489 Ensure there are spaces between the fields in the header Colvars/colvars#489 (@HanatoK)
- 487 Use map of output streams, and return references to its elements Colvars/colvars#487 (@giacomofiorin, @jhenin)
- 486 Remember first step of moving restraint Colvars/colvars#486 (@jhenin)
- 485 Add decoupling option for moving restraints Colvars/colvars#485 (@jhenin)
- 483 Update Lepton via patching procedure Colvars/colvars#483 (@giacomofiorin)
- 481 Make file-reading operations of input data abstractable Colvars/colvars#481 (@giacomofiorin)

Authors: @akohlmey, @giacomofiorin, @HanatoK, @jhenin

jhenin reviewed May 16, 2022

giacomofiorin marked this pull request as draft May 16, 2022 16:44

jhenin reviewed May 16, 2022

giacomofiorin force-pushed the input_streams branch from 81a4841 to 9a73b74 Compare May 16, 2022 18:39

giacomofiorin marked this pull request as ready for review May 16, 2022 18:39

HanatoK requested changes May 16, 2022

giacomofiorin force-pushed the input_streams branch from 4737c91 to 67427ce Compare May 16, 2022 21:02

giacomofiorin marked this pull request as draft May 17, 2022 14:39

giacomofiorin added 19 commits June 21, 2022 12:33

Remove unneeded include

02241b9

Remove two instances of "master" with clearly non-neutral meaning

0766e83

In these two instances the "master" was clearly an entity that *controls* others, not one that *shares knowledge* with them. Removing both.

Move colvarproxy_io to separate file

291cd9c

Use more sensible default values

55bea14

Use stubs to allow putting class members together more logically

72e5657

Remove unused function

2eb74cb

Only run Tcl unittest when building with Tcl

1d359a3

Small include tidyup

706ed10

Update (legacy) build recipes

04f9311

Leverage iosfwd whenever possible

0697e19

Initial implementation of input streams map

4929292

Allow simplified units in functional tests

2be9795

Read input files in binary mode

aa00881

Remove explicit use of std::ifstream from core classes

0a958d5

Define colvar_grid::read_multicol() wrapper to simplify I/O handling

6e75612

Adding a new header file to include the definition of the most complex template member functions; part of a planned tidy up of colvargrid.h that would eventually lead to including fewer headers throughout the code.

Remove explicit use of std::ifstream from derived classes

0550655

Fix wrong order of ops leading to restarts being read twice; explain …

97ab904

…in comment

Delete allocated object

2371b2f

Track exit codes at intermediate steps

5d52ac7

giacomofiorin force-pushed the input_streams branch from e3ea620 to 457a072 Compare June 21, 2022 19:04

giacomofiorin marked this pull request as ready for review June 21, 2022 19:12

Return references to streams instead of pointers

9149052

giacomofiorin force-pushed the input_streams branch from 457a072 to 9149052 Compare June 21, 2022 19:13

giacomofiorin added 3 commits June 21, 2022 18:50

More sensible position for parenthesis

80686bb

Simplify conditionals

82da8b7

Strip suffixes of input state files in GROMACS

a08ccda

giacomofiorin mentioned this pull request Jun 23, 2022

Finally remove support for appending to files? #484

Closed

giacomofiorin requested a review from jhenin June 28, 2022 15:53

jhenin reviewed Jun 29, 2022

jhenin approved these changes Jun 29, 2022

HanatoK approved these changes Jun 29, 2022

Close remaning open input streams

bb39565

giacomofiorin added the maintenance No user-visible effects label Jun 29, 2022

giacomofiorin merged commit 914a4ee into master Jun 29, 2022

giacomofiorin mentioned this pull request Jul 11, 2022

Use map of output streams, and return references to its elements #487

Merged

giacomofiorin deleted the input_streams branch September 13, 2022 14:59

giacomofiorin mentioned this pull request Apr 7, 2023

cv load with non-existent file fails silently #521

Closed

giacomofiorin mentioned this pull request May 17, 2023

Update Colvars library to version 2023-05-01 lammps/lammps#3783

Merged

giacomofiorin mentioned this pull request Jun 7, 2023

Close, but do not delete input streams #538

Merged

giacomofiorin mentioned this pull request Jun 15, 2023

Initialize input string streams alongside input file streams #543

Merged
