Metadata / validation not caught before attempting to upload #1270
It looks like nwbinspector is not catching the validation issue? |
Can you try upgrading? They've been adding more content beyond the Inspector lately; also, could you share the log file? |
I am not sure that works as intended
|
Here is the content of the log:
|
My last idea: try upgrading. Otherwise I defer to @yarikoptic on what look to be bugs on the DANDI CLI side of things |
Ah yes, I upgraded after that run |
so the heart of the problem is the message(s) from organize,
correct? I guess we might improve the message there. What it means is that the file contains no metadata fields of interest. You could use a short script
to go through your .nwb files and print their metadata. Then you can see which metadata fields are used by organize. |
I ran the script you suggested:
|
To note, these are NWB files that I created by merging the output of suite2p+NeuroConv with data from Allen Institute Visual coding NWB 1.0 files. It looks like maybe some metadata needs to move around. |
I am not entirely sure what is missing. Here is the output of pynwb on this file:
timestamps_reference_time: 2020-01-01 12:30:00-08:00 |
We are not sure which metadata is missing. Ahad and I were wondering if something else was crashing organize. See here for an example of these files : https://www.dropbox.com/s/qwv4i2zh0un4v9d/Rorb-IRES2-Cre_590168381_590168385.nwb?dl=0 |
From the printout of your NWB file, it looks like you ought to have everything DANDI currently requires (at least to my knowledge). Thanks for including that
That is my best guess now as well |
In case it is helpful, I am comparing the content of this NWB file with another file that dandi organize actually likes and that is already on DANDI. WORKS:
DOES NOT WORK
|
Could it be fields that should NOT be there? |
Doubtful; when it comes to metadata, the more information that can be included the better, so I don't believe there are any 'forbidden' contents. Something I did just notice is the underscores in the identifier. |
I just removed most of the identifier:
|
same error |
@jwodder and @yarikoptic - this section of the reader is resulting in an error - perhaps that causes the issue @jeromelecoq is seeing: using
results in:
whereas this works just fine:
|
Ah that seems like it. Yes. I tested the io.read() but not the thing above. We just need to find the key it crashes on? |
Thanks for digging! hm, I have tried to reproduce while incrementally building up how I open it
and they all worked out
|
So I used a variant of this code https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L138 to port visual stimulus objects from an NWB 1 file to a newly created NWB 2.0 file. What exactly is the sub-object that crashes? |
I think those index_timeseries were provided here : https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L184 |
which is due to https://github.com/dandi/dandi-cli/blob/HEAD/dandi/metadata.py#L110 which was added in #843 to "address" #840 . If my analysis is right, the "solution" here might be
|
Can you clarify how I can address the error? Should I remove external links? |
@yarikoptic - perhaps it's a version thing. In a fresh mamba environment on my M1 tin can:

and then:

```python
from dandi.metadata import _get_pynwb_metadata
_get_pynwb_metadata("Rorb-IRES2-Cre_590168381_590168385.nwb")
```

the error (which points to the links as well, I think):
some relevant bits:
|
@jeromelecoq - this may help: https://www.dandiarchive.org/2022/03/03/external-links-organize.html (perhaps @CodyCBakerPhD could say if its still up to date) |
I am not sure why there are external links with the movies. I can access the raw data directly. It looks like the raw movie is in the template. |
I can't seem to replicate
|
|
jerome.lecoq@OSXLTCYGQCV to_upload % python |
So I am not sure how it happened so far but I have an external link to dataset in the same file ...
|
thanks @jeromelecoq - that suggests something else on my machine. Still trying to get a clean read. |
It does seem that this is related to ExternalLinks. This link is between datasets in the same file. this link was created by this code line : https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L184 To connect a template with a presentation. Is that the wrong way to do this? |
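To illustrate the distinction being discussed, here is a small h5py sketch (file and dataset names are made up): a link between datasets in the same file can be a SoftLink, whereas an ExternalLink records a separate filename, which is the kind of link the dandi-cli check reacts to.

```python
import h5py

with h5py.File("demo_links.h5", "w") as f:
    f.create_dataset("templates/movie", data=[1, 2, 3])
    grp = f.require_group("presentation")
    # Internal reference: resolves within this same file
    grp["soft_target"] = h5py.SoftLink("/templates/movie")
    # External reference: depends on another file existing on disk
    grp["external_target"] = h5py.ExternalLink("other_file.nwb", "/templates/movie")

with h5py.File("demo_links.h5", "r") as f:
    for name in ("soft_target", "external_target"):
        # getlink=True returns the link object itself without dereferencing it
        link = f["presentation"].get(name, getlink=True)
        print(name, "->", type(link).__name__)
```

Note that reading the link objects with `getlink=True` does not dereference them, so the external target file does not need to exist for this inspection.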
Is it possible that the error is because the template is stored as an OpticalSeries? Ryan discusses it in the comment above: |
@jeromelecoq - when did ryan make that suggestion? perhaps the pynwb bug is fixed now and you can go to addressing the best practice violation suggested in your original post? @yarikoptic and @jeromelecoq - i can't reproduce the contexterror on a separate linux machine, but i can on my m1 mac both natively and using a docker container. and it's interesting that the error points to the same relevant section of code. all coincidence perhaps. |
I completely changed the way the natural_movie template is added and used an Images object, per Satra's suggestion. The same error occurs, so this is ruled out. Here is the newer file. Here is a copy of the cmd:
I am very unclear as to what is going on. Should we loop in Ryan here? |
Hi all, I am having the same type of error as jerome
with different nwb files. These files are regenerations of files that were already on DANDI and have passed dandi validation in the past, with the only difference being that the subject_id in the subject field has changed. |
@Ahad-Allen following the above discussion -- do you know if the files include external links? edit: ignore -- as I showed below, they do not. You can possibly get to the original exception and warnings (which might warn about external links) by running it with --devel-debug |
using this script -- those module versions seem to be the same:

```python
from pynwb import NWBHDF5IO
from dandi.consts import metadata_nwb_file_fields
from dandi.pynwb_utils import open_readable
from dandi.pynwb_utils import nwb_has_external_links
import sys

def load(io):
    nwb = io.read()
    for key in metadata_nwb_file_fields:
        value = getattr(nwb, key)

import pkg_resources
import dandi, h5py, hdmf, pynwb

for m in dandi, h5py, hdmf, pynwb:
    print(pkg_resources.get_distribution(m.__name__))

for fname in sys.argv[1:]:
    print(f"{fname} has links: {nwb_has_external_links(fname)}")
    with NWBHDF5IO(fname, load_namespaces=True) as io:
        load(io)
    print("way 1 worked")
    with open(fname, 'rb') as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
        load(io)
    print("way 2 worked")
    with open_readable(fname) as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
        load(io)
    print("way 3 worked")
    from dandi.metadata import _get_pynwb_metadata
    print(_get_pynwb_metadata(fname))
```
and on the file from @Ahad-Allen it also works -- I guess the difference is in some other version detail. edit: on that box I use a simple virtualenv with system-wide Python 3.9 |
and running organize on the file from @Ahad-Allen worked for me:

```shell
smaug:~/proj/dandi/nwb-files/000027
$> DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug ../1193675750raw_data.nwb
...
2023-04-07 15:55:45,114 [ INFO] Symlink support autodetected; setting files_mode='symlink'
2023-04-07 15:55:45,118 [ DEBUG] Assigned 1 session_id's based on the date
2023-04-07 15:55:45,119 [ INFO] Organized 1 paths. Visit /home/yoh/proj/dandi/nwb-files/000027/
2023-04-07 15:55:45,119 [ INFO] Logs saved in /home/yoh/.cache/dandi-cli/log/20230407195534Z-3648741.log
DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug  11.94s user 0.83s system 108% cpu 11.736 total
(dev3) 1 10233.....................................:Fri 07 Apr 2023 03:55:45 PM EDT:.
smaug:~/proj/dandi/nwb-files/000027
$> ls -l /home/yoh/proj/dandi/nwb-files/000027/sub-621602/sub-621602_ophys.nwb
lrwxrwxrwx 1 yoh yoh 53 Apr 7 15:55 /home/yoh/proj/dandi/nwb-files/000027/sub-621602/sub-621602_ophys.nwb -> /home/yoh/proj/dandi/nwb-files/1193675750raw_data.nwb
```
|
It looks like these files were created with non-standard means. Without more detailed reporting from dandi-cli, it's going to be difficult to know how to resolve. |
Hi @bendichter, well, I am not entirely sure to what extent this is out of a normal workflow. 1/ I used suite2p to segment a movie. |
I was able to nail down the issue further. The problem is the IndexSeries object when it receives an indexed_timeseries as a parameter to register the associated template. This ends up creating an NWB file with an external file link. |
I believe this code here: https://pynwb.readthedocs.io/en/stable/tutorials/domain/brain_observatory.html would not work as a result.
The problem is the indexed_timeseries link, which causes dandi to have issues. |
i'm going to bring @rly into this conversation. the summary of this issue is that certain operations lead to external links being created, which are not really external links (as in, we think they don't point to outside files), and that's triggering dandi cli to complain. @jeromelecoq - just a random thought: is it possible that some part of the step is still pointing to a data array in the nwb 1 file? i.e. it still maintains a reference, hence is treated as an external link? |
Yes, I think a link is created causing the issue. I played around with the dataset properties and it seems like the link is to the current file itself, like a self-referenced link. I can explore more tonight.
|
Using TimeSeries allowed me to move forward and upload a draft of Visual Coding NWB 2.0 to DANDI. This supports that the issue is related to links. I am still working on little things here and there, but I will go back to this later on. Obviously my files do not have the template images, just the underlying stimulus structure. |
@jeromelecoq please try installing this branch of HDMF referenced in hdmf-dev/hdmf#847:
And let me know if that resolves the error. |
I am getting a sequence of errors when some metadata is missing:
(nwb) jerome.lecoq@OSXLTCYGQCV upload % nwbinspector ./to_upload --config dandi
NWBInspector Report Summary
Timestamp: 2023-04-05 13:50:51.651946-07:00
Platform: macOS-12.6.3-arm64-arm-64bit
NWBInspector version: 0.4.26
Found 17 issues over 1 files:
2 - BEST_PRACTICE_VIOLATION
15 - BEST_PRACTICE_SUGGESTION
0 BEST_PRACTICE_VIOLATION
0.0 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_index_series_points_to_image - 'IndexSeries' object at location '/stimulus/presentation/natural_movie_three_stimulus'
Message: Pointing an IndexSeries to a TimeSeries will be deprecated. Please point to an Images container instead.
0.1 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_index_series_points_to_image - 'IndexSeries' object at location '/stimulus/presentation/natural_movie_one_stimulus'
Message: Pointing an IndexSeries to a TimeSeries will be deprecated. Please point to an Images container instead.
1 BEST_PRACTICE_SUGGESTION
1.2 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/eye-tracking camera'
Message: Description is missing.
1.3 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/display monitor'
Message: Description is missing.
1.4 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/Microscope'
Message: Description is missing.
1.5 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/2-photon microscope'
Message: Description is missing.
1.6 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Images' object with name 'SegmentationImages'
Message: Description ('no description') is a placeholder.
1.7 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'GrayscaleImage' object with name 'mean'
Message: Description is missing.
1.8 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'GrayscaleImage' object with name 'correlation'
Message: Description is missing.
1.9 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_small_dataset_compression - 'OpticalSeries' object at location '/stimulus/templates/natural_movie_three_image_stack'
Message: data is not compressed. Consider enabling compression when writing a dataset.
1.10 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_small_dataset_compression - 'OpticalSeries' object at location '/stimulus/templates/natural_movie_one_image_stack'
Message: data is not compressed. Consider enabling compression when writing a dataset.
1.11 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_experimenter_exists - 'NWBFile' object at location '/'
Message: Experimenter is missing.
1.12 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_experiment_description - 'NWBFile' object at location '/'
Message: Experiment description is missing.
1.13 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_keywords - 'NWBFile' object at location '/'
Message: Metadata /general/keywords is missing.
1.14 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'TimeIntervals' object with name 'trials'
Message: Column 'blank_sweep' uses 'float32' but has binary values [0. 1.]. Consider making it boolean instead and renaming the column to start with 'is_'; doing so will save 1.88KB.
1.15 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'PlaneSegmentation' object with name 'PlaneSegmentation'
Message: Column 'Accepted' uses 'integers' but has binary values [0 1]. Consider making it boolean instead and renaming the column to start with 'is_'; doing so will save 13.02KB.
1.16 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'PlaneSegmentation' object with name 'PlaneSegmentation'
Message: Column 'Rejected' uses 'integers' but has binary values [0 1]. Consider making it boolean instead and renaming the column to start with 'is_'; doing so will save 13.02KB.
(nwb) jerome.lecoq@OSXLTCYGQCV upload % cd 000459
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi organize ../to_upload
2023-04-05 13:51:11,061 [ WARNING] A newer version (0.52.0) of dandi/dandi-cli is available. You are using 0.51.0
2023-04-05 13:51:11,490 [ INFO] NumExpr defaulting to 8 threads.
2023-04-05 13:51:12,251 [ INFO] Loading metadata from 1 files
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 2.6s
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 2.6s finished
2023-04-05 13:51:14,851 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05 13:51:14,851 [ INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log
Error: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb