Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle image templates that don't use zero padding #705

Merged

Conversation

dagewa
Copy link
Member

@dagewa dagewa commented Mar 8, 2024

Following the discussion in #646, a template with a single # character is used to expand to all image files with non-zero-padded sequential numbers.

@dagewa dagewa linked an issue Mar 8, 2024 that may be closed by this pull request
Copy link

codecov bot commented Mar 8, 2024

Codecov Report

Attention: Patch coverage is 86.11111% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 42.26%. Comparing base (6818056) to head (fc5f8f4).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #705      +/-   ##
==========================================
+ Coverage   42.23%   42.26%   +0.02%     
==========================================
  Files         187      187              
  Lines       16752    16784      +32     
  Branches     3193     3201       +8     
==========================================
+ Hits         7075     7093      +18     
- Misses       9031     9045      +14     
  Partials      646      646              

@dagewa dagewa changed the title Handle image templates that dont use zero padding Handle image templates that don't use zero padding Mar 15, 2024
@dagewa
Copy link
Member Author

dagewa commented Mar 16, 2024

Sorry, there's a bunch of extra commits in here. I think that's because I merged main from the wrong remote at some point 😞

@dagewa
Copy link
Member Author

dagewa commented Mar 17, 2024

Improved specificity further by ensuring the new pattern is only matched when the image serial number does not start with zero. All tests are passing for me, but this code is knotty because not only are we using regexes, but were using reversed regexes 😱

@ndevenish ndevenish force-pushed the 646-handling-image-templates-that-dont-use-zero-padding branch from e4026c5 to 45e88b2 Compare March 21, 2024 10:05
dagewa and others added 7 commits March 21, 2024 10:11
for the case of non-zero padded indices that extend to multiple digits.
Other use cases should still work the same way.
make a new template that includes the underscore character, as that
appears to be part of the standard Rigaku CAP filename format.
The non-zero-padded template must not start with `0` in the image number,
so add that to the template. Then there are 4 groups, so add a condition
to handle this case separately.
@ndevenish ndevenish force-pushed the 646-handling-image-templates-that-dont-use-zero-padding branch from 45e88b2 to 7dee593 Compare March 21, 2024 10:14
@dagewa
Copy link
Member Author

dagewa commented Mar 22, 2024

Finally got the CI all happy. Locally, dxtbx and DIALS tests all pass, as do xia2 tests run with pytest --regression -n auto.

Has anybody got any comments on this PR?

Copy link
Collaborator

@graeme-winter graeme-winter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sensible

I feel like we should have some test code where we pass a list of filenames and we get back a list which matches the behaviour, more as documentation for future developers than for actual testing.

I would ask that we comment in the change log how you would handle wanting to only open single digit files. I specifically wonder about wanting to dials.import template=foo_001#.cbf say which may not do what you expect (though, probably would)

src/dxtbx/imageset.py Outdated Show resolved Hide resolved
src/dxtbx/imageset.py Show resolved Hide resolved
src/dxtbx/imageset.py Outdated Show resolved Hide resolved
newsfragments/705.feature Outdated Show resolved Hide resolved
src/dxtbx/model/scan_helpers.py Show resolved Hide resolved
src/dxtbx/sequence_filenames.py Show resolved Hide resolved
@dagewa
Copy link
Member Author

dagewa commented Mar 27, 2024

I would ask that we comment in the change log how you would handle wanting to only open single digit files. I specifically wonder about wanting to dials.import template=foo_001#.cbf say which may not do what you expect (though, probably would)

I believe this case does do the right thing:

$ dials.import template=$(dials.data get -q insulin)/insulin_1_01#.img
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.dev.1103-gea4fc09a8
The following parameters have been modified:

input {
  template = "/home/fcx32934/sw/cctbx/build/dials_data/insulin/insulin_1_01#.img"
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatSMVADSCSN.FormatSMVADSCSN'>
  template: /home/fcx32934/sw/cctbx/build/dials_data/insulin/insulin_1_01#.img:0:9
  num images: 10
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------
Writing experiments to imported.expt

The dials.show imported.expt output shows that

Scan:
    number of images:   10
    image range:   {0,9}

Checking the image viewer shows that the first image is indeed the expected image.
image

@dagewa
Copy link
Member Author

dagewa commented Mar 27, 2024

Note to self: one thing to check is what happens with non-zero-padded series that do not start with a single digit. For example:

image_98.cbf
image_99.cbf
image_100.cbf

@dagewa
Copy link
Member Author

dagewa commented Mar 27, 2024

Ok, seems to work fine. With files named insulin_1_{98..101}.img, implicitly:

$ dials.import insulin_1_*
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.dev.1103-gea4fc09a8
The following parameters have been modified:

input {
  experiments = <image files>
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatSMVADSCSN.FormatSMVADSCSN'>
  template: /tmp/insulin_1_##.img:98:101
  num images: 4
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------

and with an explicit template:

$ dials.import template=insulin_1_#.img
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.dev.1103-gea4fc09a8
The following parameters have been modified:

input {
  template = "insulin_1_#.img"
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatSMVADSCSN.FormatSMVADSCSN'>
  template: insulin_1_#.img:98:101
  num images: 4
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------

both produce a 4-image sweep.

Don't include the directory path in the regex, because a pattern
starting with "C:\\Users" will result in:

re.error: incomplete escape \U at position 2

Instead, only do the regex on the filename part.
@dagewa
Copy link
Member Author

dagewa commented Apr 18, 2024

Added test was useful: unearthed an error on Windows, now fixed 👍 I believe this is ready to go in.

@dagewa dagewa merged commit 24a50c7 into cctbx:main Apr 18, 2024
13 checks passed
jbeilstenedmands added a commit to xia2/xia2 that referenced this pull request Apr 19, 2024
jbeilstenedmands added a commit to xia2/xia2 that referenced this pull request Apr 19, 2024
toastisme pushed a commit to toastisme/dxtbx that referenced this pull request Jul 18, 2024
* Use a single `#` character in a filename template to represent a non-zero-padded
incremental number.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handling image templates that don't use zero-padding
3 participants