-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tutorial covering DynamicTable basics #320
tutorial covering DynamicTable basics #320
Conversation
Codecov Report
@@ Coverage Diff @@
## master #320 +/- ##
=======================================
Coverage 85.66% 85.66%
=======================================
Files 125 125
Lines 4709 4709
=======================================
Hits 4034 4034
Misses 675 675 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, @cechava ! You are definitely on the right track.
If no value is passed to id, the corresponding field within the DynamicTable object is an empty array.
When you nwbExport
, does this create a DynamicTable
that contains an ElementIdentifiers
with a different length from the columns in the table? That is not allowed
You may also supply an optional row ID.
What happens if an ID is not provided?
Ragged array columns
Could you add info about util.create_indexed_column
?
Now, you can write the file, read it back, and run:
Can you demonstrate this?
@bendichter is there a way to write a bare-bones DynamicTable object to file? In the DataPipe tutorial a numeric matrix is first put into a TimeSeries object (which is itself a DynamicTable) before putting into a NWBFile and exported. A bare-boned DynamicTable object might have non-numeric data so this is not the best. |
Sorry, I don’t understand the question. Is this about DynamicTables or
DataPipes?
On Thu, Oct 14, 2021 at 12:19 PM cechava ***@***.***> wrote:
@bendichter <https://github.com/bendichter> is there a way to write a
bare-bones DynamicTable object to file?
In Python HDMF one can do this with HDF5IO from the hdmf.backends.hdf5
module.
In the DataPipe tutorial a numeric matrix is first put into a TimeSeries
object (which is itself a DynamicTable) before putting into a NWBFile and
exported. A bare-boned DynamicTable object might have non-numeric data so
this is not the best.
Do you know of any other way?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#320 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGOEEQBOSDTLOJWFBJRTHTUG37HZANCNFSM5F7X26TA>
.
--
Ben Dichter, PhD
Data Science Consultant
personal website <http://bendichter.com/>
|
Tagging @ln-vidrio, in case they know
|
@bendichter For example, when I run the following code to create a DynamicTable with ragged array columns: col1 = types.hdmf_common.VectorData( ...
'description', 'column #1', ...
'data', {'1a'; '1b'; '1c'; '2a'} ...
);
col1_index = types.hdmf_common.VectorIndex( ...
'description', 'column #1 index',...
'target',types.untyped.ObjectView('/col1'), ... % object view of target column
'data', [3; 4] ...
);
table_ragged_col = types.hdmf_common.DynamicTable( ...
'description', 'an example table', ...
'colnames', {'col1','col1_index'}, ...
'col1', col1, ...
'col1_index', col1_index, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ... % 0-indexed, for compatibility with Python
);
table_ragged_col.getRow(1) The output is, as expected:
However, using the suggested method of casting the object itself as an object view: col1_index = types.hdmf_common.VectorIndex( ...
'description', 'column #1 index',...
'target',types.untyped.ObjectView(col1), ... % object view of target column
'data', [3; 4] ...
);
table_ragged_col = types.hdmf_common.DynamicTable( ...
'description', 'an example table', ...
'colnames', {'col1','col1_index'}, ...
'col1', col1, ...
'col1_index', col1_index, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ... % 0-indexed, for compatibility with Python
);
table_ragged_col.getRow(1) Return the clearly wrong output.
Clearly, VectorIndex column is not formatting the ragged array column properly in the second scenario. For now, I will keep the HDF5 path in the tutorials since that produces the expected behavior. |
I can open an issue with the above, if preferable |
…o a ragged array column
…o a ragged array column
@cechava good catch, but I think you are identifying an issue with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cechava, this is really excellent! You've already put a lot of work into this, and it's almost done! I made a few edits, for small typos and cleaned up spacing a bit. I wanted to bring up two points though:
Lines 60-73: Are you sure that should be a cell array? It seems like it should be a standard tensor.
Lines 108-129: Does this do the same thing as table_ragged_numeric.getRow(1, 'columns', {'col_num'})
? If so, maybe we should just have that one and remove this, since I think getRow
is easier to understand.
@bendichter thanks for feedback. Lines 60-73: I have corrected this in latest commit. The multidimensional table is properly formatted. Lines 108-129: To answer your question first, the following would produce the same output: |
It was a utility function that I wrote before `getRow` was available. I'm
thinking maybe we should deprecate it in favor of `getRow`, and adjust the
tutorial accordingly (we need to update them to use `addRow` and
`addColumn` anyway). What do you think? It's easy to get the matrix from
the MATLAB Table object, right?
…On Mon, Dec 6, 2021 at 9:39 AM cechava ***@***.***> wrote:
@bendichter <https://github.com/bendichter> thanks for feedback.
Lines 60-73: I have corrected this in latest commit. The multidimensional
table is properly formatted.
Lines 108-129: To answer your question first, the following would produce
the same output:
table_ragged_numeric.getRow(1, 'columns', {'col_num'}).col_num{1}
The purpose of including this bit is to give an example of how to use the
read_indexed_column utility function. I agree that getRow is easier to
understand, but it returns a table rather than the data itself. Because
this is a utility function that we use in other NWB tutorials, I think this
bit is worth keeping. Let me know if you agree.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#320 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGOEEQUGMQVANVRDT4PEZLUPTYO3ANCNFSM5F7X26TA>
.
--
Ben Dichter, PhD
Data Science Consultant
personal website <http://bendichter.com/>
|
yes, it's easy to get the matrix from the table object. I agree with deprecating and updating tutorials. |
Can you please test writing for these? Lines 60-73 was still incorrect, and throws an error when writing: col5 = types.hdmf_common.VectorData( ...
'description', 'column #5',...
'data', ...
{ ...
[ ...
[1, 2, 3]; ...
[4, 5, 6] ...
]; ...
[ ...
[11,12,13]; ...
[14,15,16] ...
] ...
} ...
);
multi_dim_table = types.core.TimeIntervals( ...
'description','test table', ...
'colnames', {'col5', 'start_time', 'stop_time'}, ...
'start_time', types.hdmf_common.VectorData('description', 'hi', 'data', [1.; 2.]), ...
'stop_time', types.hdmf_common.VectorData('description', 'hi', 'data', [1.; 2.]), ...
'col5', col5, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ... % 0-indexed, for compatibility with Python
);
>> file.intervals.set('multi', multi_dim_table);
>> nwbExport(file, 'test_multi.nwb'); it needs to be e.g. data = [[1, 2, 3]; [4, 5, 6]; [7, 8, 9]];
data(:, :, 2) = [ [11, 12, 13]; [14, 15, 16]; [17, 18, 19]];
col5 = types.hdmf_common.VectorData( ...
'description', 'column #5',...
'data', data ...
);
multi_dim_table = types.core.TimeIntervals( ...
'description','test table', ...
'colnames', {'col5', 'start_time', 'stop_time'}, ...
'start_time', types.hdmf_common.VectorData('description', 'hi', 'data', [1.; 2.]), ...
'stop_time', types.hdmf_common.VectorData('description', 'hi', 'data', [1.; 2.]), ...
'col5', col5, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ... % 0-indexed, for compatibility with Python
); |
The reason for using cell arrays is to follow the lead of this snippet which adds multi-dimensional rows to a matnwb/+tests/+system/DynamicTableTest.m Line 14 in 131fdff
Below I test defining a multi-dimensional column by either : 1) adding rows to a table or 2) defining the column from the beginning % -------- Build row by row
file= NwbFile( ...
'session_start_time', '2021-01-01 00:00:00', ...
'identifier', 'ident1', ...
'session_description', 'test' ...
);
colnames = {'start_time', 'stop_time', 'randomvalues'};
multi_dim_table1 = types.core.TimeIntervals(...
'description', 'test dynamic table column',...
'colnames', colnames);
for i = 1:2
multi_dim_table1.addRow(...
'start_time', i,...
'stop_time', i+1,...
'randomvalues', {rand(5,2)},...
'id', i-1);
end
file.intervals.set('multi', multi_dim_table1);
nwbExport(file, 'test_multi_addRow.nwb');
% -------- Define in one go
start_times = types.hdmf_common.VectorData( ...
'description', 'start_times', ...
'data', [1;2] ...
);
stop_times = types.hdmf_common.VectorData( ...
'description', 'stop_times', ...
'data', [2;3] ...
);
randomvalues = types.hdmf_common.VectorData( ...
'description', 'random values',...
'data', ...
[ ...
{rand(5,2)}; ...
{rand(5,2)} ...
] ...
);
multi_dim_table2 = types.core.TimeIntervals( ...
'description','test table', ...
'colnames', {'randomvalues', 'start_time', 'stop_time'}, ...
'start_time', start_times, ...
'stop_time', stop_times, ...
'randomvalues', randomvalues, ...
'id', types.hdmf_common.ElementIdentifiers('data', [0; 1]) ...
);
file.intervals.set('multi', multi_dim_table2);
nwbExport(file, 'test_multi_oneGo.nwb');
The second approach leads to a write error. The output of the >> multi_dim_table1.toTable()
ans =
2×4 table
id start_time stop_time randomvalues
__ __________ _________ ____________
0 1 2 {5×2 double}
1 2 3 {5×2 double}
>> multi_dim_table2.toTable()
ans =
2×4 table
id randomvalues start_time stop_time
__ ____________ __________ _________
0 {5×2 double} 1 2
1 {5×2 double} 2 3 The difference between both these tables is in the fact that the table that was built row-by-row has an index for the multidimensional column. >> multi_dim_table1.vectordata
ans =
2×1 Set array with properties:
randomvalues: [types.hdmf_common.VectorData]
randomvalues_index: [types.hdmf_common.VectorIndex]
>> multi_dim_table2.vectordata
ans =
Set with properties:
randomvalues: [types.hdmf_common.VectorData]
We probably want to do something similar for any table that is created with multidimensional columns from the beginning. Do you agree @ln-vidrio? Including the error traceback below for reference: Error using unicode2native
At least one dimension must be equal to 1.
Error in io.mapData2H5 (line 62)
data{i} = char(unicode2native(char(data{i})));
Error in io.writeDataset (line 5)
[tid, sid, data] = io.mapData2H5(fid, data, varargin{:});
Error in types.untyped.MetaClass/write_base (line 26)
io.writeDataset(fid, fullpath, obj.data, 'forceArray');
Error in types.untyped.MetaClass/export (line 66)
refs = obj.write_base(fid, fullpath, refs);
Error in types.hdmf_common.Data/export (line 39)
refs = export@types.untyped.MetaClass(obj, fid, fullpath, refs);
Error in types.hdmf_common.VectorData/export (line 53)
refs = export@types.hdmf_common.Data(obj, fid, fullpath, refs);
Error in types.untyped.Set/export (line 181)
refs = v.export(fid, propfp, refs);
Error in types.hdmf_common.DynamicTable/export (line 114)
refs = obj.vectordata.export(fid, fullpath, refs);
Error in types.core.TimeIntervals/export (line 91)
refs = export@types.hdmf_common.DynamicTable(obj, fid, fullpath, refs);
Error in types.untyped.Set/export (line 181)
refs = v.export(fid, propfp, refs);
Error in types.core.NWBFile/export (line 1016)
refs = obj.intervals.export(fid, [fullpath '/intervals'], refs);
Error in NwbFile/export (line 61)
refs = export@types.core.NWBFile(obj, output_file_id, '/', {});
Error in nwbExport (line 35)
export(nwb(i), filename);
Error in hdmf_tutorial_test (line 227)
nwbExport(file, 'test_multi_oneGo.nwb'); |
though looking at how things are done in PyNWB, I see @bendichter's point. I generated an analogous table with multi-dimensional columns in pynwb: from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile
import numpy as np
from pynwb import NWBHDF5IO
start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
nwbfile = NWBFile(session_description='demonstrate NWBFile basics', # required
identifier='NWB123', # required
session_start_time=start_time)
nwbfile.add_trial_column(name='randomvalues', description='')
for i in range(2):
nwbfile.add_trial(start_time=float(i+1), stop_time=float(i+2), randomvalues=np.random.rand(3,2),id = i)
with NWBHDF5IO('pynwb_multi_test.nwb', 'w') as io:
io.write(nwbfile) The following is the format of that multi-dimensional column: The different rows are stacked along the third dimension. In contrast, the way it is done in matnwb now produces this: I think it's worth trying to match the way multi-dimensional columns are written to file. |
So I'm not too clear on the precise issue. I think the misunderstanding is that In fact, calling this snippet twice: for i = 1:2
DynamicTable.addRow('randomvalues', rand(1, 3,2));
end Should produce something similar as in pynwb as concatenation is automatically done on the first dimension (in MATLAB). |
I see. Thanks for clarifying. I misunderstood the intention behind the cell array |
ok so properly defining multidimensional columns encounters an error with the new checkConfig utility function. I have opened an issue and will fix the function so it can handle multi-dimensional columns. I will also add to the DynamicTable tests so that multidimensional columns are covered in case something breaks them in the future. |
…nsional columns; wont work until PR 367 is merged
* tutorial covering DynamicTable basics * style changes * added warnings about empty id fields; added demo of how to add rows to a ragged array column * added warnings about empty id fields; added demo of how to add rows to a ragged array column * aded demo of utils.read_indexed_column to get row elements from a ragged array column * ObjectView takes object instead of full HDF5 path * added expandable table demo to tutorial * added info related to addColumns, toTable, and configCheck utility functions * fixed typo * fixed a few typos * fixed multidimensional table * uncommented setup * removal of read_indexed_column utility function * corrected and tested multi-dimensional column tables * updated hdmf tutorial to work with rows-last convention for multidimensional columns; wont work until PR 367 is merged * progress save * updated tutorials for expected behavior of addrow * updated tutorial for multidimensional columns * uncomment setup * change name to dynamic_tables.mlx * add html change title add link to README Co-authored-by: cechava <cesar@Cesars-MacBook-Pro-2.local> Co-authored-by: bendichter <ben.dichter@gmail.com> Co-authored-by: cechava <cesar@cesars-mbp-2.attlocal.net>
hdmf_common objects covered:
resolves issue #318