
add custom html field generation method for TimeSeries #1831

Merged: 9 commits merged into dev on Feb 9, 2024

Conversation

stephprince (Contributor) commented Jan 24, 2024

Motivation

Fix #1830.

Adds a method to TimeSeries that checks for linked timestamps or linked data to avoid recursion errors when generating an HTML representation in Jupyter notebooks.

See also the related hdmf issue (hdmf-dev/hdmf#1010) and PR (hdmf-dev/hdmf#1038).
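
For illustration, a minimal sketch of the approach, assuming hdmf exposes a per-field HTML generation hook that a subclass can override; the hook name _generate_field_html and its signature below are assumptions for illustration, not necessarily the exact merged API.

from pynwb import TimeSeries  # real class; the subclass below is purely illustrative

class LinkAwareTimeSeries(TimeSeries):
    """Sketch: summarize linked series instead of recursing into their HTML repr."""

    def _generate_field_html(self, key, value, level, access_code):
        # Assumed hook name/signature. For link fields, render only the name and
        # neurodata type of each linked object so that mutually linked TimeSeries
        # cannot recurse into one another while the HTML repr is built.
        if key in ("timestamp_link", "data_link"):
            value = {v.name: v.neurodata_type for v in value}
        return super()._generate_field_html(key, value, level, access_code)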

How to test the behavior?

See below for an example from the related hdmf issue in which timestamps are linked across multiple spatial series.

from pynwb import NWBHDF5IO
from pynwb.testing.mock.file import mock_NWBFile
from pynwb.testing.mock.behavior import mock_SpatialSeries, mock_Position
import numpy as np

nwbfile = mock_NWBFile()

# Several spatial series share the timestamps of the first series via a timestamps link.
test_timestamps = np.zeros((1000,))
test_data = np.random.rand(1000, 3)
spatial_series1 = mock_SpatialSeries(name="test1", rate=None, data=test_data, timestamps=test_timestamps)
spatial_series2 = mock_SpatialSeries(name="test2", rate=None, data=test_data, timestamps=spatial_series1)
spatial_series3 = mock_SpatialSeries(name="test3", rate=None, data=test_data, timestamps=spatial_series1)
spatial_series4 = mock_SpatialSeries(name="test4", rate=None, data=test_data, timestamps=spatial_series1)
position = mock_Position(spatial_series=[spatial_series1, spatial_series2, spatial_series3, spatial_series4])

nwbfile.create_processing_module("behavior", description="contains processed behavior data")
nwbfile.processing["behavior"].add(position)

with NWBHDF5IO("test.nwb", "w") as io:
    io.write(nwbfile)

read_io = NWBHDF5IO("test.nwb", "r")
nwbfile_in = read_io.read()

# Displaying the file in a Jupyter cell builds the HTML repr; before this fix,
# the linked series recursed into each other and raised a RecursionError.
nwbfile_in

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison: base (f77f33c) 92.19% vs. head (88c0d44) 91.95%.

Files               Patch %   Lines
src/pynwb/base.py   63.63%    6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1831      +/-   ##
==========================================
- Coverage   92.19%   91.95%   -0.24%     
==========================================
  Files          27       27              
  Lines        2639     2661      +22     
  Branches      690      699       +9     
==========================================
+ Hits         2433     2447      +14     
- Misses        136      142       +6     
- Partials       70       72       +2     
Flag          Coverage Δ
integration   70.80% <4.54%> (-0.56%) ⬇️
unit          84.14% <63.63%> (-0.18%) ⬇️


stephprince (Contributor Author)

@rly I tested this potential fix with a file from dandiset 000053 and on the original example. But I don't have access to dandiset 000336, so I'm not positive that it actually fixes the original issue.

rly (Contributor) commented Jan 24, 2024

@stephprince thanks! Using this branch and the corresponding one in hdmf fixes the issue for the file I have from dandiset 000336.

Comment on lines 292 to 293
if key in ['timestamp_link', 'data_link']:
    value = {v.name: v.neurodata_type for v in value}
rly (Contributor)

Rather than link to the linked object, can we link to just the timestamps (or data) of the corresponding object?

stephprince (Contributor Author)

Do you mean instead of displaying the neurodata_type, show just the timestamps/data of the corresponding object? Or do you mean changing how the timestamp_link and data_link properties get set up?

rly (Contributor) Jan 24, 2024

The former. In practice, we do not show arrays yet in this HTML repr, so nothing will show up, but it would be less confusing as to why timestamps = another NWB object.

It would, however, be nice to say that this is a linked object, so maybe we could alter the key to say something like:

timestamps (link to <path to timestamps of the other NWB object -- use container.get_ancestors() and make a nice string path out of it>)
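
For illustration, a small sketch of that idea, assuming each container exposes .name and .parent (get_ancestors(), as mentioned above, would serve the same purpose); this yields a name-based path, e.g. something like root/behavior/Position/test1/timestamps for the example in the PR description. As later comments note, such a name-based path is not the same as the PyNWB access path.

def name_path(linked_series, field_name="timestamps"):
    # Walk up the chain of parents and join the container names into a readable,
    # slash-separated path to the linked field.
    parts = [field_name]
    obj = linked_series
    while obj is not None:
        parts.append(obj.name)
        obj = obj.parent
    return "/".join(reversed(parts))

# Possible key label: f"timestamps (link to {name_path(spatial_series1)})"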

stephprince (Contributor Author) Jan 24, 2024

Got it, I was just looking at the items in the 'timestamp_link' subheading and didn't realize the timestamps for the other timeseries were still showing the linked object. Yes, I can try to change that as well as alter the key to indicate it is a linked object.

rly (Contributor)

Ah right. Yes I was looking at the timestamps field and not the timestamps_link field.

rly (Contributor) commented Jan 25, 2024

This looks good. However, I misremembered how get_ancestor worked: things that are in nwbfile.acquisition["EyeTracking"].eye_tracking have get_ancestor() = "root/EyeTracking/eye_tracking", which is not an accurate pynwb-based path to the object...

But in testing this, I saw that the tooltip contains the pynwb path! Could you put that as the path in the (link to <path>) instead? I think this is the access_code variable but I don't remember. Sorry for the hassle! :\

[screenshot: HTML repr showing the tooltip with the pynwb access path]

rly (Contributor) commented Jan 25, 2024

@oruebel What do you think? What looks better here?

oruebel (Contributor) commented Jan 25, 2024

Using the Python code path shown in the tooltip seems reasonable here, since this is how a user would access it in PyNWB and since we are printing the Container here (not the raw file structure).

stephprince (Contributor Author)

I think it makes sense to use the Python code path. However, I believe the access_code that shows up in the tooltip is for the current object (not the linked one).

[screenshot: tooltip showing the access_code of the current object]

I'm trying to write something that generates an access code for the linked object by going up through the parent objects, but I'm not sure how to identify when an object belongs to acquisition, processing, etc. Are there any attributes/methods that you usually use for that, or is there a better way to do this?

rly (Contributor) commented Jan 26, 2024

Good point. No, the child object does not know where in the parent object it lives, e.g., in acquisition or processing. It's ugly, but the only ways I can see to determine the path to the object are:

  1. Go backwards up the parent-child hierarchy, and in the parent, iterate through all the objects until you find the child, compute the access code in a similar way as it is done in hdmf (https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/container.py#L626-L633), and then repeat with the parent of that parent (a rough sketch follows below). This seems difficult and bug-prone.
  2. Adjust Container._generate_html_repr so that it caches the access code of every field in a dictionary that you can then query in this code in pynwb, or create a new function in Container that generates the access code of every child/grandchild within it that you can then query.

I think option 2 might be better and could be useful in other ways.
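
A rough, heavily simplified sketch of option 1, assuming each child can be located inside its parent's fields dict (the real field-matching logic lives in the hdmf code linked above); actual code would need to handle more container layouts.

def access_path(container, root_name="nwbfile"):
    # Walk up .parent links; at each level, find which field of the parent holds the
    # child so the resulting string approximates PyNWB attribute/item access.
    parts = []
    child, parent = container, container.parent
    while parent is not None:
        for field_name, field_value in parent.fields.items():
            if field_value is child:
                parts.append(f".{field_name}")
                break
            if isinstance(field_value, dict) and field_value.get(child.name) is child:
                parts.append(f'.{field_name}["{child.name}"]')
                break
        else:
            parts.append(f'["{child.name}"]')  # fallback: item access by name
        child, parent = parent, parent.parent
    return root_name + "".join(reversed(parts))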

oruebel (Contributor) commented Jan 26, 2024

> I think option 2 might be better and could be useful in other ways.

If the file was loaded from disk, then this should be an h5py dataset, so we could also ask it for the path in the file directly in that case.
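
For illustration of that suggestion: a dataset read from an HDF5-backed NWB file is an h5py.Dataset, and its .name attribute holds the dataset's path inside the file (which, as the next comment notes, is not the PyNWB access path).

import h5py

def hdf5_path_of(data):
    # Return the dataset's path inside the HDF5 file, or None for in-memory arrays.
    return data.name if isinstance(data, h5py.Dataset) else None

# For the repro above, something like
# hdf5_path_of(nwbfile_in.processing["behavior"]["Position"]["test1"].timestamps)
# might return "/processing/behavior/Position/test1/timestamps".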

rly (Contributor) commented Jan 26, 2024

> I think option 2 might be better and could be useful in other ways.

> If the file was loaded from disk, then this should be an h5py dataset, so we could also ask it for the path in the file directly in that case.

Right, but that's not the pynwb access path.

CodyCBakerPhD (Collaborator)

Hey all

> go backwards up the parent-child hierarchy, and in the parent, iterate through all the objects until you find the child, compute the access code in a similar way as it is done in hdmf (https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/container.py#L626-L633), and then repeat with the parent of that parent. this seems difficult and bug-prone.

This is how we did this recently for our backend reports of dataset IOs to be configured by the user: https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L19-L35

In particular, it's built for the in-memory case, before anything has hit disk.

> If the file was loaded from disk, then this should be an h5py dataset, so we could also ask it for the path in the file directly in that case.

This is what we do to assign 'location' in the NWB Inspector, but it may not always agree with the other method; common exceptions include the trials and epochs tables, which can be found attached in-memory to the NWB file container but on disk show up in the general/intervals group or something like that.

CodyCBakerPhD (Collaborator)

So if this is just for user-friendly, text-based display of contents for the HTML repr, I say go with the ancestral recursion method, since it will get 'close enough' to how they would access it in the API.

rly (Contributor) commented Jan 26, 2024

Thanks for the code pointer, @CodyCBakerPhD!

Yeah, I think that's close enough.

stephprince (Contributor Author)

Great, thanks for the help @CodyCBakerPhD! I will go ahead with the ancestral recursion method then.

rly (Contributor) commented Feb 8, 2024

@stephprince is this good to go? If so, please mark as ready for review

rly previously approved these changes on Feb 8, 2024.
stephprince marked this pull request as ready for review on February 9, 2024 at 00:34.
stephprince merged commit 0a70990 into dev on Feb 9, 2024; 21 of 24 checks passed.
stephprince deleted the jupyter-recursion-error branch on February 9, 2024 at 19:57.
Development

Successfully merging this pull request may close these issues.

[Bug]: "Maximum Recursion Depth Exceeded" displaying NWB when testing Jupyter notebook
4 participants