
add custom html field generation method for TimeSeries #1831

Merged: 9 commits merged into dev on Feb 9, 2024

Conversation

stephprince (Contributor) commented Jan 24, 2024

Motivation

Fix #1830.

Adds a method to TimeSeries that checks for linked timestamps or linked data to avoid recursion errors when generating an HTML representation in Jupyter notebooks.

See also the related hdmf issue (hdmf-dev/hdmf#1010) and PR (hdmf-dev/hdmf#1038).
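
For illustration, a minimal sketch of the approach, assuming hdmf exposes a per-field HTML generation hook that a subclass can override; the hook name _generate_field_html and its signature below are assumptions for illustration, not necessarily the exact merged API.

from pynwb import TimeSeries  # real class; the subclass below is purely illustrative

class LinkAwareTimeSeries(TimeSeries):
    """Sketch: summarize linked series instead of recursing into their HTML repr."""

    def _generate_field_html(self, key, value, level, access_code):
        # Assumed hook name/signature. For link fields, render only the name and
        # neurodata type of each linked object so that mutually linked TimeSeries
        # cannot recurse into one another while the HTML repr is built.
        if key in ("timestamp_link", "data_link"):
            value = {v.name: v.neurodata_type for v in value}
        return super()._generate_field_html(key, value, level, access_code)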

How to test the behavior?

See below for an example from the related hdmf issue in which timestamps are linked across multiple spatial series.

from pynwb import NWBHDF5IO
from pynwb.testing.mock.file import mock_NWBFile
from pynwb.testing.mock.behavior import mock_SpatialSeries, mock_Position
import numpy as np

nwbfile = mock_NWBFile()

# Several spatial series share the timestamps of the first series via a timestamps link.
test_timestamps = np.zeros((1000,))
test_data = np.random.rand(1000, 3)
spatial_series1 = mock_SpatialSeries(name="test1", rate=None, data=test_data, timestamps=test_timestamps)
spatial_series2 = mock_SpatialSeries(name="test2", rate=None, data=test_data, timestamps=spatial_series1)
spatial_series3 = mock_SpatialSeries(name="test3", rate=None, data=test_data, timestamps=spatial_series1)
spatial_series4 = mock_SpatialSeries(name="test4", rate=None, data=test_data, timestamps=spatial_series1)
position = mock_Position(spatial_series=[spatial_series1, spatial_series2, spatial_series3, spatial_series4])

nwbfile.create_processing_module("behavior", description="contains processed behavior data")
nwbfile.processing["behavior"].add(position)

with NWBHDF5IO("test.nwb", "w") as io:
    io.write(nwbfile)

read_io = NWBHDF5IO("test.nwb", "r")
nwbfile_in = read_io.read()

# Displaying the file in a Jupyter cell builds the HTML repr; before this fix,
# the linked series recursed into each other and raised a RecursionError.
nwbfile_in

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison: base (f77f33c) 92.19% vs. head (88c0d44) 91.95%.

Files               Patch %   Lines
src/pynwb/base.py   63.63%    6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1831      +/-   ##
==========================================
- Coverage   92.19%   91.95%   -0.24%     
==========================================
  Files          27       27              
  Lines        2639     2661      +22     
  Branches      690      699       +9     
==========================================
+ Hits         2433     2447      +14     
- Misses        136      142       +6     
- Partials       70       72       +2     
Flag          Coverage Δ
integration   70.80% <4.54%> (-0.56%) ⬇️
unit          84.14% <63.63%> (-0.18%) ⬇️


stephprince (Contributor Author)

@rly I tested this potential fix with a file from dandiset 000053 and on the original example. But I don't have access to dandiset 000336, so I'm not positive that it actually fixes the original issue.

rly (Contributor) commented Jan 24, 2024

@stephprince thanks! Using this branch and the corresponding one in hdmf fixes the issue for the file I have from dandiset 000336.

Comment on lines 292 to 293
if key in ['timestamp_link', 'data_link']:
    value = {v.name: v.neurodata_type for v in value}
rly (Contributor)

Rather than link to the linked object, can we link to just the timestamps (or data) of the corresponding object?

stephprince (Contributor Author)

Do you mean instead of displaying the neurodata_type, show just the timestamps/data of the corresponding object? Or do you mean changing how the timestamp_link and data_link properties get set up?

rly (Contributor) Jan 24, 2024

The former. In practice, we do not show arrays yet in this HTML repr, so nothing will show up, but it would be less confusing as to why timestamps = another NWB object.

It would, however, be nice to say that this is a linked object, so maybe we could alter the key to say something like:

timestamps (link to <path to timestamps of the other NWB object -- use container.get_ancestors() and make a nice string path out of it>)
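
For illustration, a small sketch of that idea, assuming each container exposes .name and .parent (get_ancestors(), as mentioned above, would serve the same purpose); this yields a name-based path, e.g. something like root/behavior/Position/test1/timestamps for the example in the PR description. As later comments note, such a name-based path is not the same as the PyNWB access path.

def name_path(linked_series, field_name="timestamps"):
    # Walk up the chain of parents and join the container names into a readable,
    # slash-separated path to the linked field.
    parts = [field_name]
    obj = linked_series
    while obj is not None:
        parts.append(obj.name)
        obj = obj.parent
    return "/".join(reversed(parts))

# Possible key label: f"timestamps (link to {name_path(spatial_series1)})"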

stephprince (Contributor Author) Jan 24, 2024

Got it, I was just looking at the items in the 'timestamp_link' subheading and didn't realize the timestamps for the other timeseries were still showing the linked object. Yes, I can try to change that as well as alter the key to indicate it is a linked object.

rly (Contributor)

Ah right. Yes I was looking at the timestamps field and not the timestamps_link field.

rly (Contributor) commented Jan 25, 2024

This looks good. However, I misremembered how get_ancestor worked: things that are in nwbfile.acquisition["EyeTracking"].eye_tracking have get_ancestor() = "root/EyeTracking/eye_tracking", which is not an accurate pynwb-based path to the object...

But in testing this, I saw that the tooltip contains the pynwb path! Could you put that as the path in the (link to <path>) instead? I think this is the access_code variable but I don't remember. Sorry for the hassle! :\

[screenshot: HTML repr showing the tooltip with the pynwb access path]

rly (Contributor) commented Jan 25, 2024

@oruebel What do you think? What looks better here?

oruebel (Contributor) commented Jan 25, 2024

Using the Python code path shown in the tooltip seems reasonable here, since this is how a user would access it in PyNWB and since we are printing the Container here (not the raw file structure).

stephprince (Contributor Author)

I think it makes sense to use the Python code path. However, I believe the access_code that shows up in the tooltip is for the current object (not the linked one).

[screenshot: tooltip showing the access_code of the current object]

I'm trying to write something that generates an access code for the linked object by going up through the parent objects, but I'm not sure how to identify when an object belongs to acquisition, processing, etc. Are there any attributes/methods that you usually use for that, or is there a better way to do this?

rly (Contributor) commented Jan 26, 2024

Good point. No, the child object does not know where in the parent object it lives, e.g., in acquisition or processing. It's ugly, but the only ways I can see to determine the path to the object are:

  1. Go backwards up the parent-child hierarchy, and in the parent, iterate through all the objects until you find the child, compute the access code in a similar way as it is done in hdmf (https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/container.py#L626-L633), and then repeat with the parent of that parent (a rough sketch follows below). This seems difficult and bug-prone.
  2. Adjust Container._generate_html_repr so that it caches the access code of every field in a dictionary that you can then query in this code in pynwb, or create a new function in Container that generates the access code of every child/grandchild within it that you can then query.

I think option 2 might be better and could be useful in other ways.
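
A rough, heavily simplified sketch of option 1, assuming each child can be located inside its parent's fields dict (the real field-matching logic lives in the hdmf code linked above); actual code would need to handle more container layouts.

def access_path(container, root_name="nwbfile"):
    # Walk up .parent links; at each level, find which field of the parent holds the
    # child so the resulting string approximates PyNWB attribute/item access.
    parts = []
    child, parent = container, container.parent
    while parent is not None:
        for field_name, field_value in parent.fields.items():
            if field_value is child:
                parts.append(f".{field_name}")
                break
            if isinstance(field_value, dict) and field_value.get(child.name) is child:
                parts.append(f'.{field_name}["{child.name}"]')
                break
        else:
            parts.append(f'["{child.name}"]')  # fallback: item access by name
        child, parent = parent, parent.parent
    return root_name + "".join(reversed(parts))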

oruebel (Contributor) commented Jan 26, 2024

> I think option 2 might be better and could be useful in other ways.

If the file was loaded from disk, then this should be an h5py dataset, so we could also ask it for the path in the file directly in that case.
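
For illustration of that suggestion: a dataset read from an HDF5-backed NWB file is an h5py.Dataset, and its .name attribute holds the dataset's path inside the file (which, as the next comment notes, is not the PyNWB access path).

import h5py

def hdf5_path_of(data):
    # Return the dataset's path inside the HDF5 file, or None for in-memory arrays.
    return data.name if isinstance(data, h5py.Dataset) else None

# For the repro above, something like
# hdf5_path_of(nwbfile_in.processing["behavior"]["Position"]["test1"].timestamps)
# might return "/processing/behavior/Position/test1/timestamps".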

rly (Contributor) commented Jan 26, 2024

> I think option 2 might be better and could be useful in other ways.

> If the file was loaded from disk, then this should be an h5py dataset, so we could also ask it for the path in the file directly in that case.

Right, but that's not the pynwb access path.

CodyCBakerPhD (Collaborator)

Hey all

> go backwards up the parent-child hierarchy, and in the parent, iterate through all the objects until you find the child, compute the access code in a similar way as it is done in hdmf (https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/container.py#L626-L633), and then repeat with the parent of that parent. this seems difficult and bug-prone.

This is how we did this recently for our backend reports of dataset IOs to be configured by the user: https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L19-L35

In particular, it's built for the in-memory case, before anything has hit disk.

> If the file was loaded from disk, then this should be an h5py dataset, so we could also ask it for the path in the file directly in that case.

This is what we do to assign 'location' in the NWB Inspector, but it may not always agree with the other method; common exceptions include the trials and epochs tables, which can be found attached in-memory to the NWB file container but on disk show up in the general/intervals group or something like that.

CodyCBakerPhD (Collaborator)

So if this is just for user-friendly, text-based display of contents for the HTML repr, I say go with the ancestral recursion method, since it will get 'close enough' to how they would access it in the API.

rly (Contributor) commented Jan 26, 2024

Thanks for the code pointer, @CodyCBakerPhD!

Yeah, I think that's close enough.

stephprince (Contributor Author)

Great, thanks for the help @CodyCBakerPhD! I will go ahead with the ancestral recursion method then.

rly (Contributor) commented Feb 8, 2024

@stephprince is this good to go? If so, please mark as ready for review

rly previously approved these changes on Feb 8, 2024.
stephprince marked this pull request as ready for review on February 9, 2024 at 00:34.
stephprince merged commit 0a70990 into dev on Feb 9, 2024; 21 of 24 checks passed.
stephprince deleted the jupyter-recursion-error branch on February 9, 2024 at 19:57.
Development

Successfully merging this pull request may close these issues.

[Bug]: "Maximum Recursion Depth Exceeded" displaying NWB when testing Jupyter notebook
4 participants