Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New attempt for HDF5 support using HighFive #41

Merged
merged 1 commit into from Nov 24, 2018

Conversation

tdegeus
Copy link
Member

@tdegeus tdegeus commented Nov 9, 2018

Hereby my attempt for a cleaner HighFive support. It allows the following syntax:

#include <xtensor-io/xhighfive.hpp>

int main()
{
  HighFive::File file("example.h5", HighFive::File::Overwrite);

  xt::xtensor<double,2> A = xt::ones<double>({10,5});

  xt::dump(file, "/path/to/data", A);

  xt::xtensor<double,2> B = xt::cast<double,2>(file, "/path/to/data");

  return 0;
}

What do you guys think?

Note that:

  • I still want to add some overwrite function to overwrite an existing data-set (of the correct shape/type) with new data.

  • Ideally I want to add some user friendliness, like having dump also work for example for std::string, that I miss in HighFive, but I could also construct my own wrapper for that.

  • I believe that the recursive createGroup function should really be in the HighFive library. I have openend an issue there: Recursively create groups BlueBrain/HighFive#139

@tdegeus
Copy link
Member Author

tdegeus commented Nov 10, 2018

P.S. As I missed some functions to smoothen HighFive, I have made some wrapper functions:

@wolfv
Copy link
Member

wolfv commented Nov 10, 2018

do you want to integrate these functions in xtensor-io or rather upstream in HighFive?

Nice name, btw!

@tdegeus
Copy link
Member Author

tdegeus commented Nov 11, 2018

Well I don't know.

On first sight, I find the one-line syntax very much in the line of xtensor, and a bit less with HighFive, as HighFive appears to focus on keeping the full flexibility of HDF5. On the other hand, I believe that h5py features both. In any case, I find the one-line syntax vital to get readable main-files. So I strongly believe that such syntax should be available.

Maybe you can share your thoughts?

Note that the namespaces and the syntax is not in set in stone, neither of the PR nor of LowFive. We can discuss about everything.

@tdegeus
Copy link
Member Author

tdegeus commented Nov 17, 2018

I have upgraded the PR significantly. I will carefully review the code again tomorrow or so, but it would be great to have general feedback. The syntax is now:

#include <xtensor-io/xhighfive.hpp>

int main()
{
  HighFive::File file("example.h5", HighFive::File::Overwrite);

  xt::xtensor<double,2> A = xt::ones<double>({10,5});

  xt::dump(file, "/path/to/data", A);

  xt::overwrite(file, "/path/to/data", A);

  xt::xtensor<double,2> B = xt::load<xt::xtensor<double,2>>(file, "/path/to/data");

  return 0;
}

whereby dump, overwrite, and load feature many overloads.

As discussed with @wolfv in private, I will also submit these functions to be integrated upstream. They will stay also in xtensor-io to:

  1. Have full control over rapid upgrades.
  2. Allow for a unified syntax.
  3. Allow more dedicated xtensor support. Therefore your help is wanted! dump could feature more overloads, such as that it would be awesome to be able to write xt::views.

The more dedicated xtensor support will be exclusive to xtensor-io. I will only try to integrate basic xt::xtensor and xt::xarray support.

Beyond code review, to-do are:

  1. Documentation.
  2. (To be discussed) Tests. Here help is wanted, in so far to handle the HighFive header-only library and linking against hdf5 in CI.

@wolfv
Copy link
Member

wolfv commented Nov 18, 2018

Regarding testing: We ship + download all dependencies on conda. I can make a HighFive package for conda and we can first put it on the quantstack channel, but in the long term it should go to conda-forge !

@JohanMabille
Copy link
Member

Nice PR!

Regarding the interface, as discussed with @wolfv, and to summarize what have been told on gitter channel, there should be a clean separation between pure xtensor API and lower level API making use of HighFive:

namespace xt
{
    // Not specific to HDF5, can be used by functions writing other file formats,
    // so that should be in a separate header
    enum class dump_mode
    {
        create,
        overwrite
    }

    template <class T>
    void dump_hdf5(const std::string& file_path,
                   const std::string& data_path, 
                   const T& data,
                   dump_mode mode);

    template <class T>
    T load_hdf5(cons std::string& file_path, const std::string& data_path);
}

These functions forward to the dump / overwrite / load functions that accept an HighFive::File format that you have coded. @wolfv suggested these functions could be in their own namespace (xt::hdf5) to reinforce the separation between pure xtensor API and mixed API, I am not really opiniated about that.

@tdegeus
Copy link
Member Author

tdegeus commented Nov 19, 2018

The dump_hdf5 function should have one more argument I think: the mode with which to open the file. What would be the desired syntax for that?

I'm not strongly opinionated about the xt::hdf5 syntax neither, though I don't think you win anything as the HighFive functions are always distinguishable through their arguments. If you introduce a namespace though I feel that it should be HighFive as the primary interaction is through that library.

@JohanMabille @wolf

@JohanMabille
Copy link
Member

The dump_hdf5 function should have one more argument I think: the mode with which to open the file. What would be the desired syntax for that?

You mean so one can replace the entire file and not only the path to the data inside? That's what dump_mode is supposed to do, I forgot the mode to overwrite the data inside. That can be another enum specific to hdf5 (like hdf5_mode with the same values as the ones of dump_mode).

@tdegeus
Copy link
Member Author

tdegeus commented Nov 20, 2018

@JohanMabille Overwriting a DataSet may not be specific to HDF5, e.g. the same operation is perceivable for an NPZ-file (though I'm not an expert, so I don't know if it is actually possible). So I would argue for

  • file_mode: how to open the file.
  • dump_mode: how to handle existing DataSet.

@JohanMabille
Copy link
Member

Sounds perfect for me.

@tdegeus
Copy link
Member Author

tdegeus commented Nov 20, 2018

@JohanMabille @wolfv

Please find as I believe all discussed updates + documentation + tests.

I do still need help to complete docs/source/api_reference.rst with all the overloads.

@wolfv
Copy link
Member

wolfv commented Nov 20, 2018

Did you already add highfive to the conda install directive in .travis.yml?

We could try to make tests pass on Travis/Linux by adding highfive here: https://github.com/QuantStack/xtensor-io/blob/master/.travis.yml#L39

@wolfv
Copy link
Member

wolfv commented Nov 20, 2018

Ok, now i am hopeful that the tests will be passed.

Copy link
Member

@wolfv wolfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff in this PR :)

Some nitpicking follows:

For function arguments we usually write references like so: HighFive::File& file (e.g. & right after type, then space).
We put spaces around operators (e.g. i + j)

And I just asked for a bit of clarification with the scalar load interface.

Let's merge this soon!

CMakeLists.txt Outdated Show resolved Hide resolved
docs/source/api_reference.rst Show resolved Hide resolved
docs/source/xhighfive.rst Outdated Show resolved Hide resolved
docs/source/xhighfive.rst Outdated Show resolved Hide resolved
include/xtensor-io/xhighfive.hpp Outdated Show resolved Hide resolved
include/xtensor-io/xhighfive.hpp Show resolved Hide resolved
include/xtensor-io/xhighfive.hpp Show resolved Hide resolved
include/xtensor-io/xhighfive.hpp Outdated Show resolved Hide resolved
test/test_xhighfive.cpp Show resolved Hide resolved
@wolfv
Copy link
Member

wolfv commented Nov 22, 2018

btw feel free to squash my commits

@tdegeus
Copy link
Member Author

tdegeus commented Nov 22, 2018 via email

@wolfv
Copy link
Member

wolfv commented Nov 22, 2018

yep, that would be nice. Or at least reduce the number of commits a bit so we have a clean history.

@wolfv
Copy link
Member

wolfv commented Nov 23, 2018 via email

@tdegeus
Copy link
Member Author

tdegeus commented Nov 23, 2018

I'll push a complete version for you to double check. I think I have everything now.

@tdegeus
Copy link
Member Author

tdegeus commented Nov 23, 2018

It's there for you to check!

@tdegeus
Copy link
Member Author

tdegeus commented Nov 23, 2018

@wolfv Maybe you can add a conda install/compile strategy to the documentation?

@wolfv
Copy link
Member

wolfv commented Nov 24, 2018

Let's merge for now!

I'll try to find some time next week to make this interface also take an xexpression (as discussed).

@wolfv wolfv merged commit 5a19dc9 into xtensor-stack:master Nov 24, 2018
@wolfv
Copy link
Member

wolfv commented Nov 24, 2018

And then we should cut a new release soon

@tdegeus tdegeus deleted the highfive branch November 24, 2018 16:42
@tdegeus
Copy link
Member Author

tdegeus commented Nov 24, 2018

Awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants