detect external links #840

Closed
bendichter opened this issue Nov 23, 2021 · 2 comments · Fixed by #843
bendichter commented Nov 23, 2021

DANDI does not currently have a good way of handling external links. Eventually, we should find a good way to handle them. For now, we at least need a way to check whether they exist. Here is how you would do that. I don't know how you would want to integrate this into dandi-cli, but you can copy/paste this code:

Create an NWB file with an external link. Here, file4 has an external link to file1:

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
from pynwb import TimeSeries
from pynwb import NWBHDF5IO
import numpy as np
import h5py

# Create the base data
start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
create_date = datetime(2017, 4, 15, 12, tzinfo=tzlocal())
data = np.arange(1000).reshape((100, 10))
timestamps = np.arange(100)
filename1 = 'external1_example.nwb'
filename4 = 'external_linkdataset_example.nwb'

# Create the first file
nwbfile1 = NWBFile(session_description='demonstrate external files',
                   identifier='NWBE1',
                   session_start_time=start_time,
                   file_create_date=create_date)
# Create a timeseries and add it to the first file
test_ts1 = TimeSeries(name='test_timeseries1',
                      data=data,
                      unit='SIunit',
                      timestamps=timestamps)
nwbfile1.add_acquisition(test_ts1)
# Write the first file
io = NWBHDF5IO(filename1, 'w')
io.write(nwbfile1)
io.close()

# Create the second file (not used in the rest of this example)
nwbfile2 = NWBFile(session_description='demonstrate external files',
                   identifier='NWBE2',
                   session_start_time=start_time,
                   file_create_date=create_date)

# Create the file that will hold the external link
nwbfile4 = NWBFile(
    session_description='demonstrate external files',
    identifier='NWBE4',
    session_start_time=start_time,
    file_create_date=create_date
)

# Get the first timeseries
io1 = NWBHDF5IO(filename1, 'r')
nwbfile1 = io1.read()
timeseries_1 = nwbfile1.get_acquisition('test_timeseries1')
timeseries_1_data = timeseries_1.data

# Create a new timeseries that links to our data
test_ts4 = TimeSeries(
    name='test_timeseries4',
    data=timeseries_1_data,   # <-------
    unit='SIunit',
    timestamps=timestamps
)
nwbfile4.add_acquisition(test_ts4)


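# Write the linking file; link_data=True links to the dataset in file 1 instead of copying it
io4 = NWBHDF5IO(filename4, 'w')
io4.write(nwbfile4, link_data=True)
io4.close()
io1.close()  # file 1 had to stay open until the link was written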

Test whether files have external links:

def has_external_links(file):
    """Return True if the open h5py.File `file` contains any external link."""

    external_links = set()
    visited = set()

    # Cannot use `file.visititems` because it skips external links
    # (https://github.com/h5py/h5py/issues/671), so walk the tree manually.
    def my_visit(node, func, path='/'):
        if isinstance(node[path], h5py.Group):
            for key in node[path].keys():
                # rstrip avoids a double slash when path is the root '/'
                key_path = path.rstrip('/') + '/' + key
                if key_path not in visited:
                    func(key_path, node)
                    visited.add(key_path)
                    my_visit(node, func, key_path)
        else:
            func(path, node)

    def func(path, node):
        # getlink=True returns the link object itself instead of following it.
        # Use `node` here: the original referenced a global `f` by mistake.
        if isinstance(node.get(path, getlink=True), h5py.ExternalLink):
            external_links.add(path)

    my_visit(file, func)

    return bool(external_links)

    
with h5py.File(filename1, 'r') as f:
    print(has_external_links(f))

with h5py.File(filename4, 'r') as f:
    print(has_external_links(f))

yarikoptic commented Nov 23, 2021

Dear @jwodder, please adopt @bendichter's has_external_links so that it is fscached and shipped along with the other nwb helpers. It could exit as soon as it can say True ;)
Then IMHO it should be used in the organize and validate functionality to error out on a file if it has an external link. Or, @bendichter and @satra, do you think we should support them, and thus add validation that they point to existing files? (In the context of a dandiset, those external links should not leave the boundary of the dandiset.)
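
For illustration only, here is a minimal sketch of what an early-exit variant could look like. It is not the dandi-cli implementation: the function takes a file path rather than an open `h5py.File`, caching with fscacher is left out, and the `seen` set guarding against hard-link cycles is an added assumption.

```python
import h5py


def has_external_links(path):
    """Return True as soon as one external link is found in the HDF5 file at `path`."""
    with h5py.File(path, "r") as f:
        seen = set()   # groups already queued, guards against hard-link cycles
        stack = [f]    # groups still to be scanned
        while stack:
            group = stack.pop()
            for key in group.keys():
                # getlink=True returns the link object without following it
                link = group.get(key, getlink=True)
                if isinstance(link, h5py.ExternalLink):
                    return True  # early exit: one external link is enough
                # Descend only through hard links; soft links are not followed
                if isinstance(link, h5py.HardLink):
                    child = group[key]
                    if isinstance(child, h5py.Group) and child not in seen:
                        seen.add(child)
                        stack.append(child)
    return False
```

Because the link type is checked before anything is dereferenced, this sketch never tries to open the file an external link points to.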

@bendichter

I think we should error for now; then we can relax that once we decide how we want to handle external links.
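
As a rough sketch of the "error for now" behaviour, the check could collect every external link together with its target and raise if any are found. The names `find_external_links` and `check_no_external_links` are hypothetical and not part of dandi-cli; how the error would be surfaced by organize or validate is also an assumption.

```python
import h5py


def find_external_links(path):
    """Return (link path, target file, target path) for each external link in `path`."""
    found = []
    with h5py.File(path, "r") as f:
        seen = set()   # groups already queued, guards against hard-link cycles
        stack = [f]
        while stack:
            group = stack.pop()
            for key in group.keys():
                # Inspect the link itself rather than the object it points to
                link = group.get(key, getlink=True)
                if isinstance(link, h5py.ExternalLink):
                    link_path = group.name.rstrip("/") + "/" + key
                    found.append((link_path, link.filename, link.path))
                elif isinstance(link, h5py.HardLink):
                    child = group[key]
                    if isinstance(child, h5py.Group) and child not in seen:
                        seen.add(child)
                        stack.append(child)
    return found


def check_no_external_links(path):
    """Hypothetical validation hook: raise if the file has any external link."""
    links = find_external_links(path)
    if links:
        details = ", ".join(f"{src} -> {fname}:{dst}" for src, fname, dst in links)
        raise ValueError(f"{path} contains external links: {details}")
```

Returning the target file names, not just a boolean, would also leave room for the later option of checking that the targets exist inside the dandiset.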

yarikoptic added a commit that referenced this issue Dec 13, 2021
Error on NWB files with external links