guess_format doesn't work with h5py.File objects #2884
Comments
Try adding a function like this: https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/chemfiles.py#L121 |
It looks like the problem is that it's a false positive for ChainReader, because the format hint for that one checks if the file is iterable. Not sure the best way around this; maybe a special case for h5py in the ChainReader format hint? Alternatively, the |
The format hint thing takes priority over chainreader I think. It’s also
how numpy arrays don’t fall into it
…On Tue, Aug 4, 2020 at 12:27, Lily Wang ***@***.***> wrote:
It looks like the problem is that it's a false positive for ChainReader,
because the format hint for that one checks if the file is iterable. Not
sure the best way around this; maybe a special case for h5py in the
ChainReader format hint?
|
Not currently, and you wouldn't want it to anyway (although this would have been an easy solution), because what if you have an iterable of chemfiles objects? ChainReader ignores ndarrays specifically:

    @staticmethod
    def _format_hint(thing):
        """Can ChainReader read the object *thing*

        .. versionadded:: 1.0.0
        """
        return (not isinstance(thing, np.ndarray) and
                util.iterable(thing) and
                not util.isstream(thing))
|
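To make the false positive concrete, here is a minimal sketch of the same three-part check. The `iterable` and `isstream` functions below are simplified stand-ins written for this example (assumptions, not the MDAnalysis implementations), `FakeH5File` is a hypothetical dict-based substitute for an open `h5py.File`, and the `excluded_types` parameter is invented here to play the role of the `np.ndarray` exclusion.

```python
import io


def iterable(obj):
    """Simplified stand-in: anything with a length, except strings."""
    if isinstance(obj, str):
        return False
    try:
        len(obj)
    except TypeError:
        return False
    return True


def isstream(obj):
    """Simplified stand-in: needs close() plus a read or write method."""
    return hasattr(obj, "close") and (
        hasattr(obj, "read") or hasattr(obj, "write"))


def chain_format_hint(thing, excluded_types=()):
    """Same shape as the hint quoted above; excluded_types is hypothetical."""
    return (not isinstance(thing, excluded_types)
            and iterable(thing)
            and not isstream(thing))


class FakeH5File(dict):
    """Hypothetical stand-in: has a length and close(), but no read()."""

    def close(self):
        pass


fake = FakeH5File(h5md={}, particles={})

accepts_list = chain_format_hint(["a.xtc", "b.xtc"])   # intended use
accepts_fake = chain_format_hint(fake)                  # the false positive
accepts_stream = chain_format_hint(io.StringIO("x"))    # streams are rejected
accepts_excluded = chain_format_hint(fake, excluded_types=(FakeH5File,))
```

Under these assumptions the file-like mapping is accepted for the same reason a list of filenames is: it has a length and is not recognized as a stream. Excluding its type explicitly, as is done for ndarrays, is one way to stop that.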
Ah my mistake then, I thought chainreader was in the guess_format thing that happens once all the hints are done. Hmm. I think ChainReader should only happen once all other formats have had a chance to have a go. So maybe ChainReader shouldn't use a |
@richardjgowers @lilyminium thanks for the replies. |
Yeah I think chainreader is being a little greedy and we probably should
tweak it to be less eager to read anything we pass it.
…On Tue, Aug 4, 2020 at 18:27, edisj ***@***.***> wrote:
@richardjgowers <https://github.com/richardjgowers> @lilyminium
<https://github.com/lilyminium> thanks for the replies.
I do have a _format_hint defined here
<https://github.com/MDAnalysis/mdanalysis/blob/840a977294311555232114fe998163ce53a88716/package/MDAnalysis/coordinates/H5MD.py#L504-L509>
in H5MDReader. So is the issue that the h5py.File is being picked up by
the ChainReader format hint, which selects ChainReader before it has a
chance to be selected by H5MDReader?
|
Should we make ChainReader a special case in the format analysis? Or should we add a "priority" attribute to each format hint along the lines that everything with the same priority can be tested in any order but low priority are guaranteed to be tested after higher priorities? We then just have to agree that ChainReader is, say, 0, and others are higher, say 10. Not sure if other special cases such as numpy arrays need to be higher/lower than the others. |
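The priority idea could look roughly like the sketch below. None of this is existing MDAnalysis API: `HintEntry`, `guess_by_priority`, `FakeH5File`, and the priority values 0 and 10 are all hypothetical names and numbers chosen to mirror the proposal.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class HintEntry:
    """Hypothetical registry entry: a format name, its hint, and a priority."""
    name: str
    hint: Callable[[Any], bool]
    priority: int


def guess_by_priority(thing: Any, entries: List[HintEntry]) -> Optional[str]:
    """Return the first matching format, testing higher priorities first.

    sorted() is stable, so entries with equal priority keep their
    registration order.
    """
    for entry in sorted(entries, key=lambda e: e.priority, reverse=True):
        if entry.hint(thing):
            return entry.name
    return None


class FakeH5File(dict):
    """Hypothetical stand-in for an open h5py.File."""


entries = [
    # Greedy catch-all, registered first but given the lowest priority.
    HintEntry("CHAIN",
              lambda t: not isinstance(t, str) and hasattr(t, "__len__"),
              priority=0),
    HintEntry("H5MD", lambda t: isinstance(t, FakeH5File), priority=10),
]

guess_for_file = guess_by_priority(FakeH5File(h5md={}), entries)
guess_for_list = guess_by_priority(["a.xtc", "b.xtc"], entries)
guess_for_int = guess_by_priority(42, entries)
```

With this ordering the specific H5MD hint claims the file object before the catch-all is consulted, while a plain list of filenames still falls through to the chain entry.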
@ljwoods2 is this issue related to the guessing issue you're having for zarrtraj? |
Is your feature request related to a problem?
I'm trying to pass an h5py.File object into mda.Universe using H5MDReader (#2787), but have been running into funky errors. I think the issue is that h5py.File attributes aren't suited for MDAnalysis.lib.util.isstream and MDAnalysis.lib.util.iterable: h5py files don't pass the isstream check, and they are classified as "iterable" because len(h5py.File) is defined.
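The behaviour described above can be illustrated without h5py. This is a sketch under stated assumptions: `FakeH5File` is a hypothetical mapping-based stand-in for an open file, and the group names `h5md` and `particles` are example data only.

```python
from collections.abc import Mapping


class FakeH5File(Mapping):
    """Hypothetical stand-in for an open HDF5 file: a mapping of group names."""

    def __init__(self, groups):
        self._groups = dict(groups)

    def __getitem__(self, key):
        return self._groups[key]

    def __iter__(self):
        return iter(self._groups)

    def __len__(self):
        return len(self._groups)

    def close(self):
        pass


fake = FakeH5File({"h5md": {}, "particles": {}})

# len() succeeds, so a length-based "is it iterable?" test says yes.
number_of_groups = len(fake)

# Iterating yields group *names*, which a chained reader would then
# treat as if they were trajectory filenames.
members = list(fake)

# There is a close() method, but nothing to read lines from, so a
# stream check that also requires read methods fails.
has_read_methods = all(hasattr(fake, m) for m in ("read", "readline"))
```

Here `members` comes out as `['h5md', 'particles']`: a list of names, not of trajectories.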
If I don't pass format="H5MD":
MDAnalysis.lib.util.guess_format() looks at isstream(h5py.File) and iterable(h5py.File); however, these return unexpected values for an h5py.File, which makes guess_format think there are multiple trajectories. These get passed to MDAnalysis.coordinates.chain.py, which THEN passes them to MDAnalysis.coordinates.core.py, which fails with ValueError: Unknown coordinate trajectory format '' for 'h5md'. So it looks like it did try to iterate through the groups in the h5py.File.
If I do pass format="H5MD":
Again, it ends up calling MDAnalysis.coordinates.chain.py, but then passes it to H5MDReader, where it fails with OSError: Unable to open file (unable to open file: name = 'h5md', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0). So, again, it looks like it tried to load the first group 'h5md' from the file as an individual trajectory.
Describe the solution you'd like
I'd like h5py files to also pass the isstream check, even though they don't fit the criteria.
Describe alternatives you've considered
I've tried adding an if clause at the start of MDAnalysis.lib.util.guess_format, but the same issues occur, so I'm not sure what to do to fix it.
Additional context