Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tutorial covering DynamicTable basics #320

Merged
merged 30 commits into from
Feb 17, 2022

Conversation

cechava
Copy link
Collaborator

@cechava cechava commented Oct 14, 2021

hdmf_common objects covered:

  • VectorData
  • VectorIndex
  • DynamicTable
  • DynamicTableRegion

resolves issue #318

@codecov
Copy link

codecov bot commented Oct 14, 2021

Codecov Report

Merging #320 (ceb23ec) into master (1647dc4) will not change coverage.
The diff coverage is n/a.

❗ Current head ceb23ec differs from pull request most recent head c83da30. Consider uploading reports for the commit c83da30 to get more accurate results

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #320   +/-   ##
=======================================
  Coverage   85.66%   85.66%           
=======================================
  Files         125      125           
  Lines        4709     4709           
=======================================
  Hits         4034     4034           
  Misses        675      675           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1647dc4...c83da30. Read the comment docs.

@bendichter bendichter self-requested a review October 14, 2021 15:12
Copy link
Contributor

@bendichter bendichter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, @cechava ! You are definitely on the right track.

If no value is passed to id, the corresponding field within the DynamicTable object is an empty array.

When you nwbExport, does this create a DynamicTable that contains an ElementIdentifiers with a different length from the columns in the table? That is not allowed

You may also supply an optional row ID.

What happens if an ID is not provided?

Ragged array columns

Could you add info about util.create_indexed_column?

Now, you can write the file, read it back, and run:

Can you demonstrate this?

@cechava
Copy link
Collaborator Author

cechava commented Oct 14, 2021

@bendichter is there a way to write a bare-bones DynamicTable object to file?
In Python HDMF one can do this with HDF5IO from the hdmf.backends.hdf5 module.

In the DataPipe tutorial a numeric matrix is first put into a TimeSeries object (which is itself a DynamicTable) before putting into a NWBFile and exported. A bare-boned DynamicTable object might have non-numeric data so this is not the best.
Do you know of any other way?

@bendichter
Copy link
Contributor

bendichter commented Oct 14, 2021 via email

@cechava
Copy link
Collaborator Author

cechava commented Oct 14, 2021

Tagging @ln-vidrio, in case they know
The question is about DynamicTables. Is there a straightforward way to write a bare-bones DynamicTable of the type below to file?

col1_exp = types.hdmf_common.VectorData( ...
    'description', 'column #1', ...
    'data', types.untyped.DataPipe('data', [1; 2], ...  # data must be numerical
        'maxSize', [Inf, 1], ...
        'axis', 1 ...
    ) ...
);

col2_exp = types.hdmf_common.VectorData('description', 'column #2',...
    'data',types.untyped.DataPipe('data', [10; 20], ...  #data must be numerical
        'maxSize', [Inf, 1], ...
        'axis', 1 ...
    ) ...
);

ids_exp = types.hdmf_common.ElementIdentifiers('data', ...
    types.untyped.DataPipe('data', [0;1], ...  # data must be numerical
        'maxSize', [1, Inf], ...
        'axis', 1 ...
    ) ...
);

table_exp = types.hdmf_common.DynamicTable('description','an expandable table',...
    'colnames', {'col1','col2'},...
    'col1',col1_exp,'col2',col2_exp,...
    'id', ids_exp);

@cechava
Copy link
Collaborator Author

cechava commented Oct 14, 2021

@bendichter
Another issue is the behavior of ObjectView. You mentioned that it is possible to pass in the object itself rather than the full hdf5 path to the object. However, it doesn't seem to be the case

For example, when I run the following code to create a DynamicTable with ragged array columns:

col1 = types.hdmf_common.VectorData( ...
    'description', 'column #1', ...
    'data', {'1a'; '1b'; '1c'; '2a'} ...
);
col1_index = types.hdmf_common.VectorIndex( ...
    'description', 'column #1 index',...
    'target',types.untyped.ObjectView('/col1'), ...  % object view of target column
    'data', [3; 4] ...
);
table_ragged_col = types.hdmf_common.DynamicTable( ...
    'description', 'an example table', ...
    'colnames', {'col1','col1_index'}, ...
    'col1', col1, ...
    'col1_index', col1_index, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ...  % 0-indexed, for compatibility with Python
);
table_ragged_col.getRow(1)

The output is, as expected:

       col1       col1_index
    __________    __________

    {3×1 cell}        3    

However, using the suggested method of casting the object itself as an object view:

col1_index = types.hdmf_common.VectorIndex( ...
    'description', 'column #1 index',...
    'target',types.untyped.ObjectView(col1), ...  % object view of target column
    'data', [3; 4] ...
);

table_ragged_col = types.hdmf_common.DynamicTable( ...
    'description', 'an example table', ...
    'colnames', {'col1','col1_index'}, ...
    'col1', col1, ...
    'col1_index', col1_index, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ...  % 0-indexed, for compatibility with Python
);

table_ragged_col.getRow(1)

Return the clearly wrong output.

     col1     col1_index
    ______    __________

    {'1a'}        3     

Clearly, VectorIndex column is not formatting the ragged array column properly in the second scenario.
Let me know if there's something I'm not doing right.

For now, I will keep the HDF5 path in the tutorials since that produces the expected behavior.

@cechava
Copy link
Collaborator Author

cechava commented Oct 14, 2021

I can open an issue with the above, if preferable

@cechava cechava marked this pull request as draft October 14, 2021 21:59
@bendichter
Copy link
Contributor

@cechava good catch, but I think you are identifying an issue with getRow, not with ObjectView. Yes, I think this should be its own issue.

@cechava cechava marked this pull request as ready for review December 2, 2021 05:28
Copy link
Contributor

@bendichter bendichter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cechava, this is really excellent! You've already put a lot of work into this, and it's almost done! I made a few edits, for small typos and cleaned up spacing a bit. I wanted to bring up two points though:

Lines 60-73: Are you sure that should be a cell array? It seems like it should be a standard tensor.

Lines 108-129: Does this do the same thing as table_ragged_numeric.getRow(1, 'columns', {'col_num'})? If so, maybe we should just have that one and remove this, since I think getRow is easier to understand.

@cechava
Copy link
Collaborator Author

cechava commented Dec 6, 2021

@bendichter thanks for feedback.

Lines 60-73: I have corrected this in latest commit. The multidimensional table is properly formatted.

Lines 108-129: To answer your question first, the following would produce the same output:
table_ragged_numeric.getRow(1, 'columns', {'col_num'}).col_num{1}
The purpose of including this bit is to give an example of how to use the read_indexed_column utility function. I agree that getRow is easier to understand, but it returns a table rather than the data itself. Because this is a utility function that we use in other NWB tutorials, I think this bit is worth keeping. Let me know if you agree.

@bendichter
Copy link
Contributor

bendichter commented Dec 6, 2021 via email

@cechava
Copy link
Collaborator Author

cechava commented Dec 6, 2021

yes, it's easy to get the matrix from the table object. I agree with deprecating and updating tutorials.

@bendichter
Copy link
Contributor

image

start and stop time are the wrong shape

@bendichter
Copy link
Contributor

Can you please test writing for these? Lines 60-73 was still incorrect, and throws an error when writing:

col5 = types.hdmf_common.VectorData( ...
    'description', 'column #5',...
    'data', ...
    { ...
        [ ...
            [1, 2, 3]; ...
            [4, 5, 6] ...
        ]; ...
        [ ...
            [11,12,13]; ...
            [14,15,16] ...
        ] ...
    } ...
);

multi_dim_table = types.core.TimeIntervals( ...
    'description','test table', ...
    'colnames', {'col5', 'start_time', 'stop_time'}, ...
    'start_time', types.hdmf_common.VectorData('description', 'hi', 'data', [1.; 2.]), ...
    'stop_time', types.hdmf_common.VectorData('description', 'hi', 'data', [1.; 2.]), ...
    'col5', col5, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ...  % 0-indexed, for compatibility with Python
);
>> file.intervals.set('multi', multi_dim_table);
>> nwbExport(file, 'test_multi.nwb');

it needs to be e.g.

data = [[1, 2, 3]; [4, 5, 6]; [7, 8, 9]];
data(:, :, 2) = [ [11, 12, 13]; [14, 15, 16]; [17, 18, 19]];

col5 = types.hdmf_common.VectorData( ...
    'description', 'column #5',...
    'data', data ...
);

multi_dim_table = types.core.TimeIntervals( ...
    'description','test table', ...
    'colnames', {'col5', 'start_time', 'stop_time'}, ...
    'start_time', types.hdmf_common.VectorData('description', 'hi', 'data', [1.; 2.]), ...
    'stop_time', types.hdmf_common.VectorData('description', 'hi', 'data', [1.; 2.]), ...
    'col5', col5, ...
    'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ...  % 0-indexed, for compatibility with Python
);

@cechava
Copy link
Collaborator Author

cechava commented Dec 7, 2021

The reason for using cell arrays is to follow the lead of this snippet which adds multi-dimensional rows to a DynamicTable object:

'randomvalues', {rand(5,2);rand(3,2)},...

Below I test defining a multi-dimensional column by either : 1) adding rows to a table or 2) defining the column from the beginning

% -------- Build row by row
file= NwbFile( ...
    'session_start_time', '2021-01-01 00:00:00', ...
    'identifier', 'ident1', ...
    'session_description', 'test' ...
    );

colnames = {'start_time', 'stop_time', 'randomvalues'};
multi_dim_table1 = types.core.TimeIntervals(...
    'description', 'test dynamic table column',...
    'colnames', colnames);
            
for i = 1:2
    multi_dim_table1.addRow(...
        'start_time', i,...
        'stop_time', i+1,...
        'randomvalues', {rand(5,2)},...
        'id', i-1);
end
file.intervals.set('multi', multi_dim_table1);
nwbExport(file, 'test_multi_addRow.nwb');

% -------- Define in one go

start_times = types.hdmf_common.VectorData( ...
    'description', 'start_times', ...
    'data', [1;2] ...
);

stop_times = types.hdmf_common.VectorData( ...
    'description', 'stop_times', ...
    'data', [2;3] ...
);

randomvalues = types.hdmf_common.VectorData( ...
    'description', 'random values',...
    'data', ...
    [ ...
        {rand(5,2)}; ...
        {rand(5,2)} ...
    ] ...
);

multi_dim_table2 = types.core.TimeIntervals( ...
    'description','test table', ...
    'colnames', {'randomvalues', 'start_time', 'stop_time'}, ...
    'start_time', start_times, ...
    'stop_time', stop_times, ...
    'randomvalues', randomvalues, ...
     'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ...
 );
file.intervals.set('multi', multi_dim_table2);
nwbExport(file, 'test_multi_oneGo.nwb');

The second approach leads to a write error. The output of the toTable utility function is the same for both

>> multi_dim_table1.toTable()

ans =

  2×4 table

    id    start_time    stop_time    randomvalues
    __    __________    _________    ____________

    0         1             2        {5×2 double}
    1         2             3        {5×2 double}

>> multi_dim_table2.toTable()

ans =

  2×4 table

    id    randomvalues    start_time    stop_time
    __    ____________    __________    _________

    0     {5×2 double}        1             2    
    1     {5×2 double}        2             3    

The difference between both these tables is in the fact that the table that was built row-by-row has an index for the multidimensional column.

>> multi_dim_table1.vectordata

ans = 

  2×1 Set array with properties:

          randomvalues: [types.hdmf_common.VectorData]
    randomvalues_index: [types.hdmf_common.VectorIndex]

>> multi_dim_table2.vectordata

ans = 

  Set with properties:

    randomvalues: [types.hdmf_common.VectorData]
    

We probably want to do something similar for any table that is created with multidimensional columns from the beginning. Do you agree @ln-vidrio?

Including the error traceback below for reference:

Error using unicode2native
At least one dimension must be equal to 1.

Error in io.mapData2H5 (line 62)
            data{i} = char(unicode2native(char(data{i})));

Error in io.writeDataset (line 5)
[tid, sid, data] = io.mapData2H5(fid, data, varargin{:});

Error in types.untyped.MetaClass/write_base (line 26)
                    io.writeDataset(fid, fullpath, obj.data, 'forceArray');

Error in types.untyped.MetaClass/export (line 66)
            refs = obj.write_base(fid, fullpath, refs);

Error in types.hdmf_common.Data/export (line 39)
        refs = export@types.untyped.MetaClass(obj, fid, fullpath, refs);

Error in types.hdmf_common.VectorData/export (line 53)
        refs = export@types.hdmf_common.Data(obj, fid, fullpath, refs);

Error in types.untyped.Set/export (line 181)
                    refs = v.export(fid, propfp, refs);

Error in types.hdmf_common.DynamicTable/export (line 114)
            refs = obj.vectordata.export(fid, fullpath, refs);

Error in types.core.TimeIntervals/export (line 91)
        refs = export@types.hdmf_common.DynamicTable(obj, fid, fullpath, refs);

Error in types.untyped.Set/export (line 181)
                    refs = v.export(fid, propfp, refs);

Error in types.core.NWBFile/export (line 1016)
            refs = obj.intervals.export(fid, [fullpath '/intervals'], refs);

Error in NwbFile/export (line 61)
                refs = export@types.core.NWBFile(obj, output_file_id, '/', {});

Error in nwbExport (line 35)
    export(nwb(i), filename);

Error in hdmf_tutorial_test (line 227)
nwbExport(file, 'test_multi_oneGo.nwb');

@cechava
Copy link
Collaborator Author

cechava commented Dec 7, 2021

though looking at how things are done in PyNWB, I see @bendichter's point. I generated an analogous table with multi-dimensional columns in pynwb:

from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
import numpy as np
from pynwb import NWBHDF5IO

start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())

nwbfile = NWBFile(session_description='demonstrate NWBFile basics',  # required
                 identifier='NWB123',  # required
                 session_start_time=start_time)

nwbfile.add_trial_column(name='randomvalues', description='')

for i in range(2):
   nwbfile.add_trial(start_time=float(i+1), stop_time=float(i+2), randomvalues=np.random.rand(3,2),id = i)
with NWBHDF5IO('pynwb_multi_test.nwb', 'w') as io:
   io.write(nwbfile)

The following is the format of that multi-dimensional column:
Screen Shot 2021-12-07 at 2 00 25 PM

The different rows are stacked along the third dimension. In contrast, the way it is done in matnwb now produces this:
Screen Shot 2021-12-07 at 2 01 36 PM

I think it's worth trying to match the way multi-dimensional columns are written to file.

@lawrence-mbf
Copy link
Collaborator

So I'm not too clear on the precise issue. I think the misunderstanding is that addRow is intended to literally only add one "row" (defined as only one ID per call of the function). The cell arrays in addRow are used solely to create and test nested ragged arrays and has nothing to do with multidimensional rows nor are used with any types.hdmf_common.* classes.

In fact, calling this snippet twice:

for i = 1:2
  DynamicTable.addRow('randomvalues', rand(1, 3,2));
end

Should produce something similar as in pynwb as concatenation is automatically done on the first dimension (in MATLAB).

@cechava
Copy link
Collaborator Author

cechava commented Dec 7, 2021

I see. Thanks for clarifying. I misunderstood the intention behind the cell array

@cechava
Copy link
Collaborator Author

cechava commented Dec 7, 2021

ok so properly defining multidimensional columns encounters an error with the new checkConfig utility function. I have opened an issue and will fix the function so it can handle multi-dimensional columns. I will also add to the DynamicTable tests so that multidimensional columns are covered in case something breaks them in the future.

@cechava
Copy link
Collaborator Author

cechava commented Jan 13, 2022

depends on PR #387 and #388 to work properly

@cechava cechava requested a review from bendichter February 2, 2022 10:18
@cechava
Copy link
Collaborator Author

cechava commented Feb 2, 2022

updated and tested after merging of PR #387 and #388

@bendichter bendichter merged commit 412290d into NeurodataWithoutBorders:master Feb 17, 2022
lawrence-mbf pushed a commit that referenced this pull request Feb 17, 2022
* tutorial covering DynamicTable basics

* style changes

* added warnings about empty id fields; added demo of how to add rows to a ragged array column

* added warnings about empty id fields; added demo of how to add rows to a ragged array column

* aded demo of utils.read_indexed_column to get row elements from a ragged array column

* ObjectView takes object instead of full HDF5 path

* added expandable table demo to tutorial

* added info related to addColumns, toTable, and configCheck utility functions

* fixed typo

* fixed a few typos

* fixed multidimensional table

* uncommented setup

* removal of read_indexed_column utility function

* corrected and tested multi-dimensional column tables

* updated hdmf tutorial to work with rows-last convention for multidimensional columns; wont work until PR 367 is merged

* progress save

* updated tutorials for expected behavior of addrow

* updated tutorial for multidimensional columns

* uncomment setup

* change name to dynamic_tables.mlx

* add html
change title
add link to README

Co-authored-by: cechava <cesar@Cesars-MacBook-Pro-2.local>
Co-authored-by: bendichter <ben.dichter@gmail.com>
Co-authored-by: cechava <cesar@cesars-mbp-2.attlocal.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants