Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reflection_table std::string messagepack writing is completely broken #1858

Closed
ndevenish opened this issue Aug 19, 2021 · 6 comments · Fixed by cctbx/cctbx_project#822
Closed
Assignees

Comments

@ndevenish
Copy link
Member

It supports writing of std::string arrays

} else if (name == "std::string") {
v = extract<std::string>(o.via.array.ptr[1]);

but does so in a broken way; It uses the same "ref/versa" writing as all the standard objects:
std::size_t binary_size = num_elements * element_size;
o.pack_bin(binary_size);
o.pack_bin_body(reinterpret_cast<const char*>(&v[0]), binary_size);

This means that it writes to the messagepack file, an area of memory n_items * sizeof(std::string) starting from the &const_ref<std::string>[0]; which is not how std::strings work; e.g. this will definitely contain the first string (but not read it), and if you are lucky this will contain some of the subsequent strings, but basically you are reading random memory.

I don't know if this functionality was ever used for writing strings in the old .pickle format, but it's broken here and we've obviously not used it. Are there any examples of us wanting/needing to write string columns in the reflection tables?

If not, my suggestion is that we just drop the versa<std::string> support.

@phyy-nx
Copy link
Member

phyy-nx commented Aug 23, 2021

We use strings in reflection tables in cctbx.xfel.merge:
https://github.com/cctbx/cctbx_project/blob/master/xfel/merging/application/input/file_loader.py#L121
https://github.com/cctbx/cctbx_project/blob/master/xfel/merging/application/input/file_loader.py#L146

This is mostly done in memory, but when we write the data back to disc, we currently have an old as_pickle call. Oddly enough, it's on my list to swap that out to an as_file call to start using msgpack, but I hadn't tested it yet.

@ndevenish
Copy link
Member Author

I think it's relatively easy to fix, just have a custom packer (like Shoeboxes) do. It's good to have an example of it being used, it means it's worth fixing vs throwing out.

@ndevenish
Copy link
Member Author

Some re-discussion of this issue happened in cctbx/dxtbx#465

@ndevenish ndevenish changed the title reflection_table std::string messagepack writing appears to be completely broken reflection_table std::string messagepack writing ~appears to be~ is completely broken Jan 19, 2022
@ndevenish ndevenish changed the title reflection_table std::string messagepack writing ~appears to be~ is completely broken reflection_table std::string messagepack writing ~~appears to be~~ is completely broken Jan 19, 2022
@ndevenish ndevenish changed the title reflection_table std::string messagepack writing ~~appears to be~~ is completely broken reflection_table std::string messagepack writing is completely broken Jan 19, 2022
graeme-winter added a commit to graeme-winter/dials that referenced this issue Jan 19, 2022
Working on dials#1858

This could never have worked. Now trying to make something which does work.
This does not work. This is at least some stubs / breadcrumbs on how it
could work. Turns out to be hard though
@graeme-winter
Copy link
Contributor

Thanks @graeme-winter for looking at this. Good set of clues. At the moment I'm inclined to look into whether we can do away with the exp_id column in the cctbx.xfel.merge reflection tables, and instead use the usual size_t id column alongside the experiment id<->experiment identifier map. The exp_id column was developed more or less at the same time as the map, and both solved similar problems for tracking experiment identifiers. I'll look into how big of a change set that would be. Wouldn't fix this or dials/dials#1858, but it would solve our merging problem.

@phyy-nx that is probably a better solution in the long game - I was wondering why you would keep strings in a reflection table.

With that in mind: I am tempted to simply remove support for that (in the messagepack file, at least) rather than having the appearance of support. A "fix" for this is possible but annoying.

@graeme-winter
Copy link
Contributor

😱

in trying to remove this I find that the packing of strings is part of how the general packing works -> also means if we touch this it is likely to break everything

Now working out of I can simply unlink...

@stale
Copy link

stale bot commented Jul 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. The label will be removed automatically if any activity occurs. Thank you for your contributions.

@stale stale bot added the stale issues that have not seen activity for a while label Jul 31, 2022
@phyy-nx phyy-nx self-assigned this Oct 14, 2022
@stale stale bot removed the stale issues that have not seen activity for a while label Oct 14, 2022
phyy-nx added a commit to cctbx/cctbx_project that referenced this issue Nov 17, 2022
Closes dials/dials#1858.

Background: exp_id is a column of experiment identifiers, originally developed in parallel with the experiment_identifier map in DIALS. Now that the experiment_identifier map is standard in DIALS, exp_id is redundant, so this commit removes it.

Note this commit makes heavy use of two convenience methods in DIALS:

- concat: takes two or more reflection tables and concatenates them together, renumbering the 'id' column and taking care of the experiment_identifer map
- reset_ids: resets the 'id' column, accounting for any gaps. Useful after using select to remove reflections from a table.

We can now use message_pack instead of pickle when saving refl tables.
Oeffner pushed a commit to cctbx/cctbx_project that referenced this issue Dec 1, 2022
Closes dials/dials#1858.

Background: exp_id is a column of experiment identifiers, originally developed in parallel with the experiment_identifier map in DIALS. Now that the experiment_identifier map is standard in DIALS, exp_id is redundant, so this commit removes it.

Note this commit makes heavy use of two convenience methods in DIALS:

- concat: takes two or more reflection tables and concatenates them together, renumbering the 'id' column and taking care of the experiment_identifer map
- reset_ids: resets the 'id' column, accounting for any gaps. Useful after using select to remove reflections from a table.

We can now use message_pack instead of pickle when saving refl tables.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants