NixIO: Major update and rewrite #476

achilleas-k · 2018-02-09T19:20:51Z

NIX IO MAJOR REWRITE

This has been a long time coming and I'm happy to announce that it's finally here.

This PR is pretty much a complete rewrite of the IO. I wrote the old version with a strong attempt to unify parts of the reading and writing that were common between objects. I realised this was a mistake, as the differences made for a very complicated execution path which was hard to read and maintain.

Now, almost every object's read and write methods are separate, with much fewer (self-contained) common methods in the mix. This does increase the amount of duplicate code, but I believe it makes the IO much easier to read and maintain.

I understand this is a big PR and no one has time to read through the whole thing. I tried to keep the commits concise in case someone wants to go through the changes. That said, it's probably better if the whole thing be squash-merged to avoid polluting the repo log with dozens of commits for this single IO.

Some details about the changes in the IO follow.

Feature changes

Lazy loading removed.
Cascade not optional (always cascades).
No partial rewrites: The old IO used to try to rewrite as little as possible, but this created more issues than it solved. The main issue was that determining if an object should be rewritten or not probably took as much processing or time as actually writing it, which doubled the effort if it did need overwriting. Now, on write, if a Block with the same name is found in the file, the entire tree (rooted at the Block) is removed before rewriting (more details on G-Node wiki for NixIO).
Unit handling: NIX now supports arbitrary strings for units so even compound units can now be stored and recovered (more details on G-Node wiki for NixIO).
Specific Block loading can be performed in one of three ways:
- Index (position) in the file
- Name of the NIX Block
- Name of the original Neo Block (not recommended unless names are unique)
The read_block method may also be called without arguments, in which case Blocks are returned in sequence on each invocation. After the last Block is returned, the method returns None.
A new method iter_blocks returns an iterator for reading Blocks in order.
Neo version stored in file: On file creation, the Neo version is stored in the NIX file's top-level metadata. When changes are made to the format or the API (e.g., the upcoming change in waveform dimension order), the file version will specify how data should be read.
- Currently, during read, the version is not used in any way since reading should still be compatible with older files.

Method names

It was never possible to save any object directly to NIX except a Block. All other objects had to be stored by cascading down from the top level Block. So now, all write_<neotype> methods (except write_block and write_all_blocks) have been renamed to _write_<neotype> to signify that they are private and shouldn't be used directly.
As with writing, it was never possible to read an object directly without loading from the root Block. All load methods (except read_block and read_all_blocks) have been renamed from read_<neotype> to _nix_to_neo_<neotype>

Minor notes

Added vim swap files to .gitignore. Hope that's OK!

For Event and Epoch

Delete top-level block and metadata, removing all child objects, when it already exists.

DataArrays created to hold the times and durations of Events, Epochs, and SpikeTrains should not be added to the Group which represents the Segment in which the parent object is located.

When the same data object (Signal, Event, Epoch, or SpikeTrain) is attached to multiple Segments, the IO should not attempt to recreate it, but instead take it from the root Block and append it to the Group.

MultiTag to DataArray references: - All MultiTags which represent Epochs or Events reference all DataArrays which represent Signals in the same Segment/Group. ChannelIndex and Unit references: - If a ChannelIndex references a Signal, the corresponding DataArray references the corresponding Source. - If a Unit references a SpikeTrain, the corresponding MultiTag references the corresponding Source as well as the corresponding Source of the parent ChannelIndex.

Blocks can be accessed directly using index or name. Block iteration also supported.

Separate methods for AnalogSignal and IrregularlySampledSignal objects

See NeuralEnsemble#410

samuelgarcia · 2018-03-06T09:28:40Z

@achilleas-k : have you seen error on circle that occur only for 2.7 and related to nixio module ?

InvalidUnit: InvalidUnit: yr is not SI or composite of SI units evoked at: DataArray.unit

and

InvalidUnit: InvalidUnit: 1/30000*V is not SI or composite of SI units evoked at: DataArray.unit

Does it means there different version of nixio woth py27 and py3X ?

achilleas-k · 2018-03-06T10:26:22Z

In the latest version of NIXPY (1.4.3) we allow any string to be used for units. The 2.7 tests on circleci are using an older version. I was wondering if that build didn't install the latest nixpy because it had an older version cached. From the logs, it seems to have skipped the install (already satisfied), while the 3.6 build downloaded the newest version (and passed).

I should probably put a version requirement in the .circleci/requirements_testing.txt, right?

samuelgarcia · 2018-03-06T11:36:11Z

@achilleas-k :Thank you.

@apdavison @JuliaSprenger : do you plan a deep review of this ? I have done a slight overview but the code is long and I don't nixio. I propose to merge soon.

JuliaSprenger · 2018-03-06T11:40:26Z

I want to have a look at it before merging. I will have a look at it this week.

JuliaSprenger · 2018-03-20T18:34:49Z

There is a number of corner cases not covered yet by the IO

annotations can not contain empty lists (eg bl = neo.Block(name='myblock', myannotation=[])
if an analogsignal is only connected to a channel_index, but not a segment the block can not be saved, because the analogsignal receives no 'nix_name' annotation. The same is true for spiketrains.

Additional remarks
Will future versions of the IO also support annotation values which are lists of iterables (eg annotations['comments'] = ['comment1','comment2']? Currently this yields a warning 'Multidimensional arrays and nested containers are not currently supported when writing to NIX.' and does not save this particular annotation pair.

Also saving files using nixpy takes a long time depending on the number of neo objects contained in the block structure. Loading files however works nice and fast!

samuelgarcia · 2018-03-21T09:00:17Z

Hi @achilleas-k and @JuliaSprenger
My comment:

annotations with empty list should be possible. I don't known how much work it represent.
saving a tree without neo.Segment is very hard in neo because this is the main hierachy.
Block>ChannelIndex>AnalogSIgnal is a secondary hierachy.
Making it in hdf5 is even harder because it is a tree. Making it in SQL is easier! NeoMatlabIO
and old NeoHdf5IO have the same issue. This is very tricky to implement it in hdf5. Since the neo
container could change soon, I would prefer to not implement the possibility to save a tree with
no Segment because it is a lot of work and maybe this will be depreciated.
yes, array_annotations must be anticipated.

I propose to merge this PR soon (if not today!!) and them to make a new issue with Julia cooments for next release than could be implemented in next release (0.7).
What do you think ?

achilleas-k · 2018-03-21T11:41:36Z

The IO assumes that AnalogSignals found under the ChannelIndex tree are already attached to a Segment. I could make it so that are created if they are not. In NIX, this would mean they're created on the Block but aren't attached to a Group (Segment).

I can make this change if you think it's a valid Neo structure.

Supporting annotations as lists of iterables might be trickier, but it should be doable. That it doesn't currently support a list of strings (it regards strings as iterables) is a bug, so I should fix this.

@samuelgarcia I'm OK with merging now and opening issues. It will make it easier to see review (and possibly discuss) the changes individually.

JuliaSprenger · 2018-03-21T12:21:07Z

So from the current description of Neo I thought having only the secondary hierarchy via the ChannelIndexes would also be a valid Neo structure. I used this for example to save SpikeTrains where I was only interested in the unit assignment and not in a relation to a Segment. But I agree that we should move this to Neo version 0.7.

@achilleas-k: Is it realistic to allow for annotations containing an empty list still this week or should we move it to neo 0.7?

samuelgarcia · 2018-03-21T12:24:24Z

Yes, it is a totally valid structure in the actual neo tree.
My purpose is that it is hard to cover all case when dumping to a file. Hdf5 for instance.

achilleas-k · 2018-03-21T12:34:41Z

I've opened issues for secondary structure, empty annotations, and multidimensional annotation support.

Tests writing (and reading back) empty lists, empty tuples, and list of strings. An empty tuple is read back as an empty list.

Empty lists and tuples: If an annotation's value is an empty iterable, the property value is set to empty string and the definition is used to signify that the original value was an EMPTY LIST. On read, this is used to set the annotation to []. Fixes NeuralEnsemble#500 Lists of strings: Multidimensional data are not supported. Lists of strings were being detected as multidimensional because strings are iterables. Items in an iterable are now checked against string types before checking if they're any other type of iterable.

achilleas-k · 2018-03-21T16:37:50Z

Empty list support for annotations is a bit of a hack right now. It might become nicer in the future if/when we add support for empty list property values in NIX.

For AnalogSignal, IrregularlySampledSignal, and SpikeTrain writing, the parent group argument is now optional. This is the first step towards supporting those objects without them being attached to a Segment (i.e., when they are only in the ChannelIndex substructure).

If new objects are found in the ChannelIndex substructure (determined by the lack of nix_name annotation), they are created on the parent Block without being attached to a Group. Fixes NeuralEnsemble#499

Tests for objects existing in the ChannelIndex substructure without being attached to a Segment.

achilleas-k · 2018-03-21T17:23:14Z

Fixed issue #499 as well. Give it a try and let me know how it feels.

Really fixes NeuralEnsemble#499. After reading Groups and creating Segments, the reader now runs through the Block and collects any AnalogSignal and IrregularlySamplesSignal DataArrays as well as SpikeTrain MultiTags that haven't been read yet and runs them through the reader conversion.

achilleas-k added 30 commits February 8, 2018 16:52

[NixIO] Fix for 0d iterable attributes

bb0ff39

[NixIO] Initial support for arbitrary and scaled units

a212558

[nixio] Major rewrite: Block and Segment writing

0c382aa

[nixio] Major rewrite: Signal writing

9c33b8e

[nixio] Major rewrite: ChannelIndex and Unit writing

c23abb1

[nixio] Major rewrite: Event writing

4f85a6e

[nixio] Major rewrite: Updating tests

41e933c

[nixio] Major rewrite: Epoch writing

4e97abc

[nixio] Major rewrite: Metadata location fix

b97c167

For Event and Epoch

[nixio] Major rewrite: Overwriting behaviour

e99fc17

Delete top-level block and metadata, removing all child objects, when it already exists.

[nixio] Major rewrite: Spiketrain writing

b37a0e6

[nixio] Major rewrite: Spiketrain waveforms writing

bc08660

[nixio] Major rewrite: Don't add time and dura DAs to group

cab76fa

DataArrays created to hold the times and durations of Events, Epochs, and SpikeTrains should not be added to the Group which represents the Segment in which the parent object is located.

[nixio] Major rewrite: Append existing multiref objects

cb07447

When the same data object (Signal, Event, Epoch, or SpikeTrain) is attached to multiple Segments, the IO should not attempt to recreate it, but instead take it from the root Block and append it to the Group.

[nixio] Major rewrite: Test multiref for all objects

3f10556

Add vim swap file extension(s) to gitignore

d1d0b6c

[nixio] Test: lazy cleanup, unused import, PEP8

a4197a4

[nixio] Major rewrite: Block read methods

cd7ec60

Blocks can be accessed directly using index or name. Block iteration also supported.

[nixio] Test all Block retrieval methods

b746546

[nixio] Major rewrite: Segment reading

ef35f2b

[nixio] Major rewrite: Recurse Segment children

f8b7368

[nixio] Change unsupported lazy keyword message

8b01a00

[nixio] Major rewrite: Signal reading (incomplete)

5e8bba7

[nixio] Major rewrite: Some doc strings and comments

b5f6732

[nixio] ISS t_start is read only and RangeDim doesn't need it

0c60929

[nixio] Major rewrite: Signal reading

42f16c2

Separate methods for AnalogSignal and IrregularlySampledSignal objects

[nixio] Tests: Partial rewrites no longer supported

39c92ad

[nixio] Test signal count against number of DAs

a2a2647

[nixio] Major rewrite: Private methods for non-Block

d9bc353

samuelgarcia mentioned this pull request Feb 16, 2018

NIXIO inconsistent reading and writing of empty ChannelIndex object #410

Closed

achilleas-k added 2 commits February 16, 2018 15:39

[nixio] Allow empty ChannelIndex objects on read

b78c7e3

See NeuralEnsemble#410

[nixio] Test empty ChannelIndex write and read

89aba08

samuelgarcia added this to the 0.6.0 milestone Feb 20, 2018

[circleci] Specify minimum nixio version

671e0dd

This was referenced Mar 21, 2018

[NixIO] Support signals on secondary structure only (ChannelIndex) #499

Closed

[NixIO] Support "empty" annotations #500

Open

[NixIO] Support multidimensional annotations #501

Closed

achilleas-k added 2 commits March 21, 2018 17:21

[nixio] Test special cases for annotations

d90f14d

Tests writing (and reading back) empty lists, empty tuples, and list of strings. An empty tuple is read back as an empty list.

achilleas-k added 3 commits March 21, 2018 18:06

[nixio] Write new objects found in ChannelIndex subtree

cd850cd

If new objects are found in the ChannelIndex substructure (determined by the lack of nix_name annotation), they are created on the parent Block without being attached to a Group. Fixes NeuralEnsemble#499

[nixio] Test Segmentless objects

8d60948

Tests for objects existing in the ChannelIndex substructure without being attached to a Segment.

JuliaSprenger merged commit ff6806e into NeuralEnsemble:master Mar 21, 2018

This was referenced Jul 16, 2018

Support scaled and compound units G-Node/python-neo#12

Closed

Store signal units in the DataArray.units property G-Node/python-neo#13

Closed

achilleas-k deleted the nixio-major-rewrite branch September 21, 2018 09:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NixIO: Major update and rewrite #476

NixIO: Major update and rewrite #476

achilleas-k commented Feb 9, 2018

samuelgarcia commented Mar 6, 2018

achilleas-k commented Mar 6, 2018

samuelgarcia commented Mar 6, 2018

JuliaSprenger commented Mar 6, 2018

JuliaSprenger commented Mar 20, 2018

samuelgarcia commented Mar 21, 2018

achilleas-k commented Mar 21, 2018

JuliaSprenger commented Mar 21, 2018

samuelgarcia commented Mar 21, 2018

achilleas-k commented Mar 21, 2018

achilleas-k commented Mar 21, 2018

achilleas-k commented Mar 21, 2018

NixIO: Major update and rewrite #476

NixIO: Major update and rewrite #476

Conversation

achilleas-k commented Feb 9, 2018

NIX IO MAJOR REWRITE

Feature changes

Method names

Minor notes

samuelgarcia commented Mar 6, 2018

achilleas-k commented Mar 6, 2018

samuelgarcia commented Mar 6, 2018

JuliaSprenger commented Mar 6, 2018

JuliaSprenger commented Mar 20, 2018

samuelgarcia commented Mar 21, 2018

achilleas-k commented Mar 21, 2018

JuliaSprenger commented Mar 21, 2018

samuelgarcia commented Mar 21, 2018

achilleas-k commented Mar 21, 2018

achilleas-k commented Mar 21, 2018

achilleas-k commented Mar 21, 2018