Skip to content

Conversation

@dagewa
Copy link
Member

@dagewa dagewa commented May 13, 2025

Third time's a charm?

Following @phyy-nx's suggestion, History is now attached to Experiment with a shared pointer, in the same way that the experimental models like Beam, Detector and so on are. This avoids the problem of losing history when it was part of the experiment list only (#811) and avoids excessive duplication of history data when there are lots of experiments (#814).

When an ExperimentList is saved, the histories of its component Experiments are consolidated. At the moment, that just means combined into a single History object and sorted by timestamp. If necessary, this consolidation could be made more aggressive, for example keeping only one entry per program. I'm not sure if this is necessary though.

Some DIALS changes are needed following this PR (but not too many). One issue is that the dials.import history line is thrown away by dials.index. Another thing to do is that ExperimentList.as_json now has a couple of extra options: history_as_integrated=False and history_as_scaled=False. Setting these to True adds a flag to the history line written on that file save to indicate that this was an integration job, or scaling, or both. Being explicit about that will make it much easier to extract the relevant lines for the MTZ history, for dials/dials#2861

With just this PR, a standard DIALS processing run for a single rotation data set ends up with scaled.expt containing a block like this

  "history": [
    "2025-05-13T13:32:38Z|dials.index|v3.22.dev0",
    "2025-05-13T13:34:35Z|dials.refine|v3.22.dev0",
    "2025-05-13T13:49:37Z|dials.integrate|v3.22.dev0",
    "2025-05-13T13:52:32Z|dials.scale|v3.22.dev0"
  ],

I will investigate what happens with more complex multi-experiment cases, like dials.stills_process and xia2.multiplex.

@dagewa
Copy link
Member Author

dagewa commented May 13, 2025

Results of @phyy-nx's timing test on my laptop. For this branch:

real	7m35.355s
user	7m53.642s
sys	0m14.482s

on main:

real	7m52.741s
user	8m10.124s
sys	0m14.804s

so, actually a bit quicker on this branch, but maybe that's a fluke. Anyway, the branch is not slow.

I think what would be interesting is to actually run a dials.stills_process / cctbx.xfel.merge processing job using this branch to see what you get. How many history lines are there in the final files? Is there a performance penalty?

I want to do similar tests for other multi-experiment use cases, such as xia2.multiplex

@phyy-nx
Copy link
Contributor

phyy-nx commented May 13, 2025

Thanks for running those tests.

I think what would be interesting is to actually run a dials.stills_process / cctbx.xfel.merge processing job using this branch to see what you get. How many history lines are there in the final files? Is there a performance penalty?

Are the history lines created automatically? I would assume the individual programs would need to add them.

As for how many lines in the final file, dials.stills_process writes its output by process number, so if you run using 20 processes you get 20x4 files (refined.expt, indexed.refl, integrated.expt/.refl). Each of those would have a single history object, likely created in the last step before writing the data (which isn't robust to a crash, but that's likely ok). So 20 history objects, one per file, likely each with a slightly different time stamp due to finishing independently.

So, question for the cctbx.xfel group, how does cctbx.xfel.merge reconcile these histories? The goal is in the final mtz file to have just two lines, the dials.stills_process line and the cctbx.xfel.merge line.

@dagewa
Copy link
Member Author

dagewa commented May 13, 2025

Are the history lines created automatically?

Yes, it's automatic and will work for other packages that use dxtbx, not just DIALS. The only thing that is not is adding the flag to indicate an integrated or scaled file. This flag is to make it easier to extract the interesting lines for MTZ export.

@dagewa
Copy link
Member Author

dagewa commented May 13, 2025

Regarding your other question, I was thinking about this too and I reckon just select the final (latest timestamp) dials.still_process line as the "integrated" entry and have cctbx.xfel.merge write its own "scaled" line (I appreciate the terminology integrated/scaled is not ideal here). I would probably do something similar with multiplex.

@phyy-nx
Copy link
Contributor

phyy-nx commented May 13, 2025

Final time stamp makes sense. This looks awesome. @yangha7 is gonna test this on the cctbx.xfel side. Thanks!

This is better as a method of the ExperimentList class, providing
an easier interface to history management. When you want to append a
new history line to each experiment, just call el.consolidate_histories(),
which gives you the now unique History object attached to each experiment
and append the line to that object.
@phyy-nx
Copy link
Contributor

phyy-nx commented May 14, 2025

Hey @dagewa, @yangha7 and I are getting this error:

$ libtbx.python -c "from dxtbx.model.beam import BeamFactory"
Traceback (most recent call last):
  File "/dev/shm/aaron/modules/dxtbx/src/dxtbx/model/beam.py", line 10, in <module>
    from ..dxtbx_model_ext import Beam, PolychromaticBeam, Probe
ModuleNotFoundError: No module named 'dxtbx.dxtbx_model_ext'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/dev/shm/aaron/modules/dxtbx/src/dxtbx/model/__init__.py", line 21, in <module>
    from dxtbx.model.beam import BeamFactory
  File "/dev/shm/aaron/modules/dxtbx/src/dxtbx/model/beam.py", line 12, in <module>
    from dxtbx_model_ext import Beam, PolychromaticBeam, Probe  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: /dev/shm/aaron/build/lib/dxtbx_model_ext.so: undefined symbol: _ZN5dxtbx5model12boost_python14export_historyEv

Anything you've seen before?

@dagewa
Copy link
Member Author

dagewa commented May 15, 2025

Hmm interesting. I tried to check the libtbx build but it actually failed for me much earlier (that is, the build failed), so I've only checked the cmake build. I'll have another look

@dagewa
Copy link
Member Author

dagewa commented May 15, 2025

My bad, I failed to put the line in SConscript to export the boost python stuff 🙄 Because of dials/dials#2926 I'm flying blind, but I think the last commit should have fixed it?

@phyy-nx
Copy link
Contributor

phyy-nx commented Jul 21, 2025

Hi @dagewa, this is working for my now. I ran test_stills_process and got dials.stills_process in the history. Very nice.

I also ran an xfel_regression test and the history for cctbx.xfel.merge looks like this:

"2025-07-20T16:16:34Z|__main__|v?"

I believe this is because cctbx.xfel.merge doesn't use entry points based on my read of your history generation code. What do you think about this diff?

$ git diff __init__.py
diff --git a/src/dxtbx/model/__init__.py b/src/dxtbx/model/__init__.py
index 8c87bdd7..403a08ca 100644
--- a/src/dxtbx/model/__init__.py
+++ b/src/dxtbx/model/__init__.py
@@ -817,6 +817,12 @@ class _experimentlist:
         except importlib.metadata.PackageNotFoundError:
             version = "v?"

+        if dispatcher in ['Unknown', '__main__']:
+            import libtbx.load_env
+            dispatcher = libtbx.env.dispatcher_name
+        if dispatcher in ['dials.python', 'libtbx.python']:
+            dispatcher = os.path.splitext(os.path.basename(stack[0].filename))[0]
+
         # Set the flags string for the history
         flags = []
         if history_as_integrated:

With that I get this history:

"2025-07-21T14:18:15Z|cctbx.xfel.merge|v?"

Which seems better, though the version string isn't great. The diff also accounts for running a custom script by putting the script name in.

@dagewa
Copy link
Member Author

dagewa commented Jul 22, 2025

Thanks for checking @phyy-nx. I thought I had handled that case, but clearly no. I'm surprised that __main__ appeared - I thought I was falling back to the python import path for the module at least. I will do some more testing. Definitely happy to use libtbx.env.dispatcher_name in environments where that is appropriate.

Is there a standard way to get a version number for cctbx.xfel? I think I should also have a fallback to module.__version__ in cases where that is appropriate (someone's external my_data_proc.py might have this).

- correctly get dispatcher names for libtbx builds
- get script name when run as main
@phyy-nx
Copy link
Contributor

phyy-nx commented Jul 22, 2025

No standard version for a dev build of any kind that I know about. cctbx.xfel (at present) isn't versioned regardless, something I've been thinking about in context of setting up a conda build for it.

@dagewa
Copy link
Member Author

dagewa commented Aug 11, 2025

@phyy-nx are you happy for this to go in? It gets the basic structure in place even if we tweak things like version number capturing a bit more later

@phyy-nx
Copy link
Contributor

phyy-nx commented Aug 11, 2025

Yep!

@dagewa dagewa enabled auto-merge (squash) August 11, 2025 21:37
@dagewa dagewa merged commit 43dfd36 into main Aug 11, 2025
12 of 13 checks passed
@dagewa dagewa deleted the history-as-experiment-object branch August 11, 2025 21:48
aaronfinke pushed a commit to aaronfinke/dxtbx that referenced this pull request Oct 7, 2025
…ctbx#816)

* Working towards History as an object that can be shared between experiments

* Make History pickleable

* Add History to Experiment as a shareable object

* Function to get unique set of history objects in the experiment list

* Add append_history_item function that controls the format of the history
string, and includes a UTC timestamp

* bug fix

* Add functions to (de)serialize and consolidate history.

At the moment, consolidation is just sorting by timestamp. We could
start combining history items though for cases including parallel
file writes (dials.stills_process)

* Change constructor to require history lines

* Test for history

* tidying

* Add a type annotation and a docstring

* News

* Rename newsfragments/xxx.feature to newsfragments/816.feature

* Bugfix for experiment lists with zero length history

* Tidy up consolidation of histories

This is better as a method of the ExperimentList class, providing
an easier interface to history management. When you want to append a
new history line to each experiment, just call el.consolidate_histories(),
which gives you the now unique History object attached to each experiment
and append the line to that object.

* Missed line in SConscript

* Fix issue when an ExperimentList is saved in an interactive session

* Fix idiotic error in 7f1f770

* Changes based on @phyy-nx's suggestion to:

- correctly get dispatcher names for libtbx builds
- get script name when run as main

---------

Co-authored-by: DiamondLightSource-build-server <DiamondLightSource-build-server@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants