Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NixIO: Major update and rewrite #476

Merged
merged 55 commits into from Mar 21, 2018

Conversation

achilleas-k
Copy link
Contributor

NIX IO MAJOR REWRITE

This has been a long time coming and I'm happy to announce that it's finally here.

This PR is pretty much a complete rewrite of the IO. I wrote the old version with a strong attempt to unify parts of the reading and writing that were common between objects. I realised this was a mistake, as the differences made for a very complicated execution path which was hard to read and maintain.

Now, almost every object's read and write methods are separate, with much fewer (self-contained) common methods in the mix. This does increase the amount of duplicate code, but I believe it makes the IO much easier to read and maintain.

I understand this is a big PR and no one has time to read through the whole thing. I tried to keep the commits concise in case someone wants to go through the changes. That said, it's probably better if the whole thing be squash-merged to avoid polluting the repo log with dozens of commits for this single IO.

Some details about the changes in the IO follow.

Feature changes

  • Lazy loading removed.
  • Cascade not optional (always cascades).
  • No partial rewrites: The old IO used to try to rewrite as little as possible, but this created more issues than it solved. The main issue was that determining if an object should be rewritten or not probably took as much processing or time as actually writing it, which doubled the effort if it did need overwriting. Now, on write, if a Block with the same name is found in the file, the entire tree (rooted at the Block) is removed before rewriting (more details on G-Node wiki for NixIO).
  • Unit handling: NIX now supports arbitrary strings for units so even compound units can now be stored and recovered (more details on G-Node wiki for NixIO).
  • Specific Block loading can be performed in one of three ways:
    • Index (position) in the file
    • Name of the NIX Block
    • Name of the original Neo Block (not recommended unless names are unique)
  • The read_block method may also be called without arguments, in which case Blocks are returned in sequence on each invocation. After the last Block is returned, the method returns None.
  • A new method iter_blocks returns an iterator for reading Blocks in order.
  • Neo version stored in file: On file creation, the Neo version is stored in the NIX file's top-level metadata. When changes are made to the format or the API (e.g., the upcoming change in waveform dimension order), the file version will specify how data should be read.
    • Currently, during read, the version is not used in any way since reading should still be compatible with older files.

Method names

  • It was never possible to save any object directly to NIX except a Block. All other objects had to be stored by cascading down from the top level Block. So now, all write_<neotype> methods (except write_block and write_all_blocks) have been renamed to _write_<neotype> to signify that they are private and shouldn't be used directly.
  • As with writing, it was never possible to read an object directly without loading from the root Block. All load methods (except read_block and read_all_blocks) have been renamed from read_<neotype> to _nix_to_neo_<neotype>

Minor notes

  • Added vim swap files to .gitignore. Hope that's OK!

Delete top-level block and metadata, removing all child objects, when
it already exists.
DataArrays created to hold the times and durations of Events, Epochs,
and SpikeTrains should not be added to the Group which represents the
Segment in which the parent object is located.
When the same data object (Signal, Event, Epoch, or SpikeTrain) is
attached to multiple Segments, the IO should not attempt to recreate it,
but instead take it from the root Block and append it to the Group.
MultiTag to DataArray references:
- All MultiTags which represent Epochs or Events reference all
DataArrays which represent Signals in the same Segment/Group.

ChannelIndex and Unit references:
- If a ChannelIndex references a Signal, the corresponding DataArray
references the corresponding Source.
- If a Unit references a SpikeTrain, the corresponding MultiTag
references the corresponding Source as well as the corresponding Source
of the parent ChannelIndex.
Blocks can be accessed directly using index or name.
Block iteration also supported.
Separate methods for AnalogSignal and IrregularlySampledSignal objects
@samuelgarcia samuelgarcia added this to the 0.6.0 milestone Feb 20, 2018
@samuelgarcia
Copy link
Contributor

@achilleas-k : have you seen error on circle that occur only for 2.7 and related to nixio module ?

InvalidUnit: InvalidUnit: yr is not SI or composite of SI units evoked at: DataArray.unit

and

InvalidUnit: InvalidUnit: 1/30000*V is not SI or composite of SI units evoked at: DataArray.unit

Does it means there different version of nixio woth py27 and py3X ?

@achilleas-k
Copy link
Contributor Author

In the latest version of NIXPY (1.4.3) we allow any string to be used for units. The 2.7 tests on circleci are using an older version. I was wondering if that build didn't install the latest nixpy because it had an older version cached. From the logs, it seems to have skipped the install (already satisfied), while the 3.6 build downloaded the newest version (and passed).

I should probably put a version requirement in the .circleci/requirements_testing.txt, right?

@samuelgarcia
Copy link
Contributor

@achilleas-k :Thank you.

@apdavison @JuliaSprenger : do you plan a deep review of this ? I have done a slight overview but the code is long and I don't nixio. I propose to merge soon.

@JuliaSprenger
Copy link
Member

I want to have a look at it before merging. I will have a look at it this week.

@JuliaSprenger
Copy link
Member

There is a number of corner cases not covered yet by the IO

  • annotations can not contain empty lists (eg bl = neo.Block(name='myblock', myannotation=[])
  • if an analogsignal is only connected to a channel_index, but not a segment the block can not be saved, because the analogsignal receives no 'nix_name' annotation. The same is true for spiketrains.

Additional remarks
Will future versions of the IO also support annotation values which are lists of iterables (eg annotations['comments'] = ['comment1','comment2']? Currently this yields a warning 'Multidimensional arrays and nested containers are not currently supported when writing to NIX.' and does not save this particular annotation pair.

Also saving files using nixpy takes a long time depending on the number of neo objects contained in the block structure. Loading files however works nice and fast!

@samuelgarcia
Copy link
Contributor

Hi @achilleas-k and @JuliaSprenger
My comment:

  • annotations with empty list should be possible. I don't known how much work it represent.
  • saving a tree without neo.Segment is very hard in neo because this is the main hierachy.
    Block>ChannelIndex>AnalogSIgnal is a secondary hierachy.
    Making it in hdf5 is even harder because it is a tree. Making it in SQL is easier! NeoMatlabIO
    and old NeoHdf5IO have the same issue. This is very tricky to implement it in hdf5. Since the neo
    container could change soon, I would prefer to not implement the possibility to save a tree with
    no Segment because it is a lot of work and maybe this will be depreciated.
  • yes, array_annotations must be anticipated.

I propose to merge this PR soon (if not today!!) and them to make a new issue with Julia cooments for next release than could be implemented in next release (0.7).
What do you think ?

@achilleas-k
Copy link
Contributor Author

The IO assumes that AnalogSignals found under the ChannelIndex tree are already attached to a Segment. I could make it so that are created if they are not. In NIX, this would mean they're created on the Block but aren't attached to a Group (Segment).

I can make this change if you think it's a valid Neo structure.

Supporting annotations as lists of iterables might be trickier, but it should be doable. That it doesn't currently support a list of strings (it regards strings as iterables) is a bug, so I should fix this.

@samuelgarcia I'm OK with merging now and opening issues. It will make it easier to see review (and possibly discuss) the changes individually.

@JuliaSprenger
Copy link
Member

So from the current description of Neo I thought having only the secondary hierarchy via the ChannelIndexes would also be a valid Neo structure. I used this for example to save SpikeTrains where I was only interested in the unit assignment and not in a relation to a Segment. But I agree that we should move this to Neo version 0.7.

@achilleas-k: Is it realistic to allow for annotations containing an empty list still this week or should we move it to neo 0.7?

@samuelgarcia
Copy link
Contributor

Yes, it is a totally valid structure in the actual neo tree.
My purpose is that it is hard to cover all case when dumping to a file. Hdf5 for instance.

@achilleas-k
Copy link
Contributor Author

I've opened issues for secondary structure, empty annotations, and multidimensional annotation support.

Tests writing (and reading back) empty lists, empty tuples, and list of
strings. An empty tuple is read back as an empty list.
Empty lists and tuples: If an annotation's value is an empty iterable,
the property value is set to empty string and the definition is used to
signify that the original value was an EMPTY LIST. On read, this is used
to set the annotation to [].
Fixes NeuralEnsemble#500

Lists of strings: Multidimensional data are not supported. Lists of
strings were being detected as multidimensional because strings are
iterables. Items in an iterable are now checked against string types
before checking if they're any other type of iterable.
@achilleas-k
Copy link
Contributor Author

Empty list support for annotations is a bit of a hack right now. It might become nicer in the future if/when we add support for empty list property values in NIX.

For AnalogSignal, IrregularlySampledSignal, and SpikeTrain writing, the
parent group argument is now optional. This is the first step towards
supporting those objects without them being attached to a Segment (i.e.,
when they are only in the ChannelIndex substructure).
If new objects are found in the ChannelIndex substructure (determined by
the lack of nix_name annotation), they are created on the parent Block
without being attached to a Group.

Fixes NeuralEnsemble#499
Tests for objects existing in the ChannelIndex substructure without
being attached to a Segment.
@achilleas-k
Copy link
Contributor Author

Fixed issue #499 as well. Give it a try and let me know how it feels.

Really fixes NeuralEnsemble#499.
After reading Groups and creating Segments, the reader now runs through
the Block and collects any AnalogSignal and IrregularlySamplesSignal
DataArrays as well as SpikeTrain MultiTags that haven't been read yet
and runs them through the reader conversion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants