Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for efficient serialization #36

Merged
merged 5 commits into from Aug 23, 2020
Merged

Add support for efficient serialization #36

merged 5 commits into from Aug 23, 2020

Conversation

Tessil
Copy link
Owner

@Tessil Tessil commented Aug 16, 2020

This PR adds two new methods template<class Serializer> void serialize(Serializer& serializer) const; and template<class Deserializer> static sparse_map deserialize(Deserializer& deserializer, bool hash_compatible = false); to tsl::robin_(pg)_map/set.

They take advantage of the internals of the data structure to provide an efficient way to serialize/deserialize the structure.

Fix feature request #28.

@wjakob
Copy link
Contributor

wjakob commented Aug 16, 2020

Just a minor suggestion (feel free to ignore, this is your project after all): this adds quite a bit of code to the header files to add a feature that may not be that relevant to most users. If there was a way to implement this an opt-in header #include <tsl/serialization.h> or similar, that would be even better.

@Tessil
Copy link
Owner Author

Tessil commented Aug 16, 2020

To avoid the extra code in the main header file I could either implement the (de)serialization as a free-function in a separate header or add an ifdef flag.

As the serialization process is implemented as a member function in all the others tsl libraries, I'd like to keep the API consistent among the libraries. I'm also not too keen to add an extra flag.

I have to see a bit how I could reduce the compilation speed and binary size of the library as it effectively starts to get a bit bloated.

@wjakob
Copy link
Contributor

wjakob commented Aug 16, 2020

One option could also be to forward-declare the member function and provide the implementation outside of the class definition in an extra header.

@wjakob
Copy link
Contributor

wjakob commented Aug 16, 2020

Thanks for this project btw, I love it!

@Tessil
Copy link
Owner Author

Tessil commented Aug 16, 2020

Forward declaring the member function is effectively a possibility, but it may lead to some confusion for the users as they will get linking errors when using serialize if they forget to include the extra header. I'd prefer that the headers include all what they need when calling function inside them, any other way would be intuitive and non-conventional.

@Tessil Tessil merged commit dc2023b into master Aug 23, 2020
@Tessil
Copy link
Owner Author

Tessil commented Aug 23, 2020

In the end I merged the changes as they are. Using free functions to serialize/deserialize would be a good option too, but I want to stay coherent with the other tsl data-structures which implement these functions inside the class (and I don't want to break the interface of the data-structures). The added code is not too much in itself (~150 lines).

I'll see if I can reduce a bit the bloat of the header file in other ways.

@Tessil Tessil deleted the serialization branch August 23, 2020 19:14
@Sonolil
Copy link

Sonolil commented May 5, 2021

I'm afraid it doesn't match quite well with cereal++'s interface. There is no particular deserialize method in cereal. I attempted to make a custom wrapper but I stopped once I realized there's no easy way to work with nested serialization.

On top of that, the method name 'serialize' (with all its template and parameters) clashes with cereal's intended usage. It's reserved, and inteferes with writing cereal's 'save/load' methods since cereal does not allow duplicate serialize methods. I ultimately had to rewrite the source code concerning the interface while keeping your serialization implementation intact.

reference: https://uscilab.github.io/cereal/serialization_functions.html

It's not urgent, it's solved as far as I'm concerned, but I thought you should still know about this,

@Tessil
Copy link
Owner Author

Tessil commented May 6, 2021

Hi,

You can resolve the name clash with CEREAL_SPECIALIZE_FOR_ALL_ARCHIVES or cereal::specialize, you don't need to modify the source code of the hash map. When I designed the interface for the serialization I mainly tested with Boost Serialization while trying to be as flexible as possible for any other other serializer.

Code example similar to https://github.com/Tessil/robin-map#serialization-with-boost-serialization-and-compression-with-zlib, don't forget to include cereal/types/utility.hpp for std::pair serialization:

#include <tsl/robin_map.h>

#include <cassert>
#include <cereal/archives/binary.hpp>
#include <cereal/types/utility.hpp>
#include <fstream>

namespace cereal {

template <class Archive, class Key, class T>
struct specialize<Archive, tsl::robin_map<Key, T>,
                  cereal::specialization::non_member_load_save> {};

template <class Archive, class Key, class T>
void save(Archive &ar, const tsl::robin_map<Key, T> &map) {
  auto serializer = [&ar](const auto &v) { ar &v; };
  map.serialize(serializer);
}

template <class Archive, class Key, class T>
void load(Archive &ar, tsl::robin_map<Key, T> &map) {
  auto deserializer = [&ar]<typename U>() {
    U u;
    ar &u;
    return u;
  };
  map = tsl::robin_map<Key, T>::deserialize(deserializer);
}
}  // namespace cereal

int main() {
  tsl::robin_map<std::int64_t, std::int64_t> map = {
      {1, -1}, {2, -2}, {3, -3}, {4, -4}};

  const char *file_name = "robin_map.data";
  {
    std::ofstream ofs(file_name, std::ios::binary);
    cereal::BinaryOutputArchive ar(ofs);
    ar << map;
  }

  {
    std::ifstream ifs(file_name, std::ios::binary);
    cereal::BinaryInputArchive ar(ifs);
    
    tsl::robin_map<std::int64_t, std::int64_t> map_deserialized;
    ar >> map_deserialized;

    assert(map == map_deserialized);
  }
}

@Sonolil
Copy link

Sonolil commented May 16, 2021

Oh right, I can use the external load/save functions. Thank you, this didn't occur to me, not to mention the lambda parts go over my head a little ( what does ar &v do? ). Perhaps you should update this to readme as an example.

--edit--
I just tested it out and it works really well. I assume ar& v is just another serialize operator. The templated lambda seems to require c++20 though so I made functors instead. Thank you for the prompt response, I did not expect it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants