Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing containers of bools failes to compile #490

Closed
tomeichlersmith opened this issue Dec 30, 2021 · 8 comments · Fixed by #654
Closed

Writing containers of bools failes to compile #490

tomeichlersmith opened this issue Dec 30, 2021 · 8 comments · Fixed by #654
Assignees

Comments

@tomeichlersmith
Copy link

Describe the bug
Suppose I generate a data set of bools that I store in std::vector<bool> because the program does not know at compile time how many entries there will be. I want to save this buffer in the same way that I store other one-dimensional buffers into the output file.

To Reproduce
I have written a small main file (copied in full below) based off of the simple create dataset example.

  1. Attempt to compile with g++ main.cxx -lhdf5
  2. See the compile error copied below. This compile error is generated because of the STL specialization of std::vector<bool> which modifies how std::vector::data behaves.

Expected behavior
A perfect solution for this issue would be a modification of the template deduction within HighFive to handle the specialization of vectros of bools. A satisfactory solution would be suggestions on how to accomplish my goal of writing a dataset of boolean values whose size is not known at compile time.

Desktop (please complete the following information):
I am doing my developments within a container. The full docker build context can be found here.

  • OS: ubuntu:18.04
  • g++ v7.5.0
  • HDF5 v1.12.1
  • HighFive v2.3.1

Additional context
The reason I want to do this is because I would like to develop a buffering mechanism for the type of data my program works with. My data is very highly chunked and so it is much more efficient to only make occasional writes even long after a specific piece of data is chosen to be stored into the output file. This buffering mechanism would need to handle all so-called "atomic" types.

Program Code

// main.cxx
#include <iostream>
#include <string>
#include <vector>

#include <highfive/H5DataSet.hpp>
#include <highfive/H5DataSpace.hpp>
#include <highfive/H5File.hpp>

const std::string FILE_NAME("create_dataset_example.h5");
const std::string DATASET_NAME("dset");

// Create a dataset name "dset" of double 4x6
int main(void) {
  using namespace HighFive;
  try {
    // Create a new file using the default property lists.
    File file(FILE_NAME, File::ReadWrite | File::Create | File::Truncate);

    // Define the size of our dataset: 2x6
    std::vector<size_t> dims{3};

    // Create the dataset
    DataSet dataset =
      file.createDataSet<bool>(DATASET_NAME, DataSpace(dims));

   /**
    * Using the C-style array instead of std::vector writes appropriately
    */
    std::vector<bool> data = {true, false, true};
    //bool data[3] = {true, false, true};

    // write it
    dataset.write(data);

  } catch (Exception& err) {
    // catch and print any HDF5 error
    std::cerr << err.what() << std::endl;
  }

  return 0; // successfully terminated
}

Compile Error Printout

In file included from /usr/include/highfive/bits/H5Attribute_misc.hpp:25:0,
                 from /usr/include/highfive/bits/H5Annotate_traits_misc.hpp:18,
                 from /usr/include/highfive/H5File.hpp:89,
                 from main.cxx:7:
/usr/include/highfive/bits/H5Converter_misc.hpp: In instantiation of 'const value_type* HighFive::details::container_converter<Container, T>::transform_write(const Container&) const [with Container = std::vector<bool>; T = bool; HighFive::details::container_converter<Container, T>::value_type = bool]':
/usr/include/highfive/bits/H5Slice_traits_misc.hpp:214:14:   required from 'void HighFive::SliceTraits<Derivate>::write(const T&) [with T = std::vector<bool>; Derivate = HighFive::DataSet]'
main.cxx:30:23:   required from here
/usr/include/highfive/bits/H5Converter_misc.hpp:181:25: error: passing 'const std::vector<bool>' as 'this' argument discards qualifiers [-fpermissive]
         return vec.data();
                         ^
In file included from /usr/include/c++/7/vector:65:0,
                 from main.cxx:3:
/usr/include/c++/7/bits/stl_bvector.h:920:5: note:   in call to 'void std::vector<bool, _Alloc>::data() [with _Alloc = std::allocator<bool>]'
     data() _GLIBCXX_NOEXCEPT { }
     ^~~~
In file included from /usr/include/highfive/bits/H5Attribute_misc.hpp:25:0,
                 from /usr/include/highfive/bits/H5Annotate_traits_misc.hpp:18,
                 from /usr/include/highfive/H5File.hpp:89,
                 from main.cxx:7:
/usr/include/highfive/bits/H5Converter_misc.hpp:181:25: error: void value not ignored as it ought to be
         return vec.data();
@tdegeus
Copy link
Collaborator

tdegeus commented Jan 17, 2022

If I'm not mistaken this is the limitation of std::vector<bool> that is a limited specialisation of std::vector : https://en.cppreference.com/w/cpp/container/vector_bool

Other bool containers (Eigen or xtensor) will probably work. As to if std::vector<bool> could be make to work, I don't really know, I spent to few time around the std::vector<bool> limitation.

@tomeichlersmith
Copy link
Author

tomeichlersmith commented Jan 17, 2022

I was able to get a std::vector<bool> operational by doing a translation step. In C++17, this is as simple as using a if constepxr within my read/writing code. I could imagine something similar occurring within HighFive; however, it is a pretty crappy solution.

My program is working for now, so this isn't a bit deal; however, it could be a small enhancement to HighFive if y'all are interested.

/**
 * this->set_ is the HighFive::DataSet that has already been created.
 * i_file_ keeps track of entry of the DataSet that should be read/written the next time the buffer is filled.
 */

// writing
      if constexpr (std::is_same_v<AtomicType, bool>) {
        // handle bool specialization
        std::vector<short> buff;
        buff.reserve(buffer_.size());
        for (const auto& v : buffer_) buff.push_back(v);
        this->set_.select({i_file_}, {buff.size()}).write(buff);
      } else {
        this->set_.select({i_file_}, {buffer_.size()}).write(buffer_);
      }

// reading
      if constexpr (std::is_same_v<AtomicType,bool>) {
        // get around std::vector<bool> specialization
        std::vector<short> buff;
        this->set_.select({i_file_}, {request_len}).read(buff);
        buffer_.reserve(buff.size());
        for (const auto& v : buff) buffer_.push_back(v);
      } else {
        this->set_.select({i_file_}, {request_len}).read(buffer_);
      }

https://github.com/LDMX-Software/fire/blob/8ac44b8e027eb3ec5888022663a8875b9b3fde6c/include/fire/h5/Writer.h#L117-L125
https://github.com/LDMX-Software/fire/blob/8ac44b8e027eb3ec5888022663a8875b9b3fde6c/include/fire/h5/Reader.h#L143-L151

@tdegeus
Copy link
Collaborator

tdegeus commented Jan 17, 2022

Thanks. It seems that you copy to a temporary right?
I was ambitiously going to say that there might be one-liners, but that I cannot figure out.

I'm not against having some SFINAE to catch this case (rather in favour: it simplifies things for the user).
The 'issue' with providing it is, in my opinion, that it suggests to do something that it does not do: directly saving the bools. So the user might be a bit surprised by the data-type of the DataSet, and by the fact that a temp. copy is used.

@tomeichlersmith
Copy link
Author

temporary right?

Yes, that is the hacky solution I've used and it works fine for our purposes.

The other solution that I've played around with is to implement a translation to a boolean enum that is what is stored in the file (this is what h5py does); however, this still doesn't resolve the issue of "surprising" the user with a difference between on-disk and in-memory data types. I think from a HighFive point of view, it may make sense to leave out this auto translation and just document how to handle boolean vectors in order to make sure the user knows what is happening with their data.

@tomeichlersmith
Copy link
Author

Update I've found a better method which is still a mostly hacky solution, but it persists the type that the user intends.

The basic idea is to copy the std::vector<bool> buffer into a std::unique_ptr<bool[]> buffer, then use the read/write "raw" methods which accept an array pointer and the data type. Array-type unique pointers have been available since C++11 so I think this is actually a viable solution for introduction into HighFive.

Again, here are code snippets showing the relevant translation steps during read/write.

/**
 * this->set_ is the HighFive::DataSet that has already been created.
 * i_file_ keeps track of entry of the DataSet that should be read/written the next time the buffer is filled.
 */

// writing
      if constexpr (std::is_same_v<AtomicType, bool>) {
        // handle bool specialization
        auto buff = std::make_unique<bool[]>(buffer_.size());
        for (std::size_t i{0}; i < buffer_.size(); i++)
          buff[i] = buffer_.at(i);
        this->set_.select({i_file_}, {buff.size()})
          .write(buff.get(), HighFive::AtomicType<bool>());
      } else {
        this->set_.select({i_file_}, {buffer_.size()}).write(buffer_);
      }

// reading
      if constexpr (std::is_same_v<AtomicType,bool>) {
        // get around std::vector<bool> specialization
        auto buff = std::make_unique<bool[]>(request_len);
        this->set_.select({i_file_}, {request_len}).read(buff.get(),HighFive::AtomicType<bool>());
        buffer_.reserve(request_len);
        for (std::size_t i{0}; i < request_len; i++)
          buffer_.push_back(buff[i]);
      } else {
        this->set_.select({i_file_}, {request_len}).read(buffer_);
      }

@tdegeus
Copy link
Collaborator

tdegeus commented Jan 27, 2022

I see. This is indeed better I think as it writes bools as bools (unlike you previous solution).

When I was just searching, I found lots of complaints about std::vector<bool> but very few standard workarounds. I'm just wondering: they should exist right? Many people have presumable encountered this?

@tomeichlersmith
Copy link
Author

tomeichlersmith commented Jan 27, 2022

My understanding is that each compiler is free to implement a memory-efficient specialization of std::vector<bool> and so the standard sets a lot fewer requirements upon it. I think our use case for std::vector<bool> as an interface to eventual serialization is a special one - many use cases desire a much more memory efficient implementation so that you can treat std::vector<bool> as effectively a huge, resizable array of bits.

All of my research basically points to using std::array<bool,N> if the size is known at compile time and std::unique_ptr<bool[]> as a last resort for if the size can only be known at run time. Both of these are just C++-style, safe wrappers around C-style static bool arr[N] and dynamic bool* arr = new bool[size] arrays and so neither of them have a bool specialization that would interfere with eventual serialization.

@alkino
Copy link
Member

alkino commented Jan 20, 2023

Should be fixed by #654

@alkino alkino linked a pull request Jan 20, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants