Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable writing (MNE-Python) annotations #77

Closed
sappelhoff opened this issue Aug 5, 2021 · 27 comments · Fixed by #86
Closed

enable writing (MNE-Python) annotations #77

sappelhoff opened this issue Aug 5, 2021 · 27 comments · Fixed by #86
Assignees
Labels
API influences the API of pybv effort:medium estimated medium effort task enhancement New feature or request impact:high estimated high impact task
Milestone

Comments

@sappelhoff
Copy link
Member

sappelhoff commented Aug 5, 2021

Currently pybv has limited functionality to write to the VMRK file:

Events to write in the marker file. This array has either two or three columns. The first column is always the zero-based index of each event (corresponding to the “time” dimension of the data array). The second column is a number associated with the “type” of event. The (optional) third column specifies the length of each event (default 1 sample). Currently all events are written as type “Stimulus” and must be numeric. Defaults to None (not writing any events).

However, the BrainVision VMRK file would support much more, see:

; Each entry: Mk<Marker number>=<Type>,<Description>,<Position in data points>,
; <Size in data points>, <Channel number (0 = marker is related to all channels)>

We could think about a feature that can write annotations as used for example in MNE-Python: Each annotation having an onset, offset duration, and description. We could even go beyond what MNE-Python currently (?) supports and support channel specific annotations (although that could be done later).

Overall I envision that users could pass both "events" and "annotations" to write_raw_brainvision and that these two inputs would be internally merged in a reasonable way and all written to the VMRK file.

Together with #44 that'd bring us one step closer to being able to properly export MNE-Python raw objects to BrainVision, see also: https://github.com/mne-tools/mne-python/blob/53d0f5de8a8a99e3f4211a8dc1361930e3cc3cd8/mne/export/_export.py#L43-L44

WDYT?

@sappelhoff sappelhoff added the enhancement New feature or request label Aug 5, 2021
@cbrnr
Copy link
Collaborator

cbrnr commented Aug 5, 2021

I didn't know that we don't support MNE annotations – we definitely should!

We could think about a feature that can write annotations as used for example in MNE-Python: Each annotation having an onset, offset, and description.

It should be onset, duration, and description.

We could even go beyond what MNE-Python currently (?) supports and support channel specific annotations (although that could be done later).

I think we actually support channel-specific annotations already, but it's not available in the plotting GUI yet (so users cannot add them other than programmatically).

@sappelhoff
Copy link
Member Author

Do you perhaps want to give it a shot @cbrnr? I think it'd also benefit MNELAB 😏

@sappelhoff sappelhoff added this to the 0.6.0 milestone Aug 6, 2021
@cbrnr
Copy link
Collaborator

cbrnr commented Aug 6, 2021

Do you perhaps want to give it a shot @cbrnr? I think it'd also benefit MNELAB 😏

Give me some time and I can come up with something. Could it be as easy as using mne.events_from_annotations() and passing that to the existing writer?

@sappelhoff
Copy link
Member Author

Could it be as easy as using mne.events_from_annotations() and passing that to the existing writer?

That is already supported as far as I can see: You'd just get events via mne.events_from_annotations() that you can then pass to the events parameter of write_brainvision.

In the VMRK these will then be written as such:

Mk3=Stimulus,S  1,6230,1,0
Mk4=Stimulus,S  3,6804,1,0
Mk5=Stimulus,S  4,7849,1,0
Mk6=Stimulus,S  6,8475,1,0

With the "Stimulus" event type, and the MNE event code (an integer) being transformed to a string S i, where i is some int.

In my example above, the duration of each of those events is 1 sample. With pybv we can currently also use the events parameter to write the actual duration in samples. However mne.events_from_annotations does not provide the duration of each event in samples, but only onset, "previous event value", and event value.

So indeed, (maybe what you had in mind) one way to improve the current state is to provide a convenience function that goes from MNE annotations to events as pybv can use them. Namely in the form onset, event value, duration in samples (see screenshot):

image


Having that said, I was interested in more advanced usage, because the way above still loses the actual annotation in favor of some integer code. And the actual annotation is not written to the BrainVision file then (only the integer code), and you'd have to keep the map somewhere else.

I thought about going beyond the Stimulus,S i style events that pybv can currently write. All is still in line with the BrainVision spec.

Something like this:

Mk2=Comment,ControlBox is not connected via USB,1,1,0

Starting from

; Each entry: Mk<Marker number>=<Type>,<Description>,<Position in data points>,
; <Size in data points>, <Channel number (0 = marker is related to all channels)>

We could have:

  • Mk --> This stays the marker number ( = event number in the file, so events should be temporally ordered)
  • Type --> Could be always set to "Comment" for writing annotations
  • Description --> This is the actual free form string annotation ... for example BAD_eyeblink, to stay close to the MNE usage ... but could be The assistant spilled coffee over the participant
  • position in data points --> onset in samples
  • size in data points --> duration in samples
  • channel number --> which channel the annotation refers to

And I imagine that participants could pass this information to pybv via a new parameter annotations. We could think about what form this parameter should take (dict?, pandas DF?, string array, ...), but overall I think this would be a great way to conserve MNE annotations in BrainVision files while staying true to the file format standard.

When both the events and annotations parameters are passed, both would be written to VMRK.

@sappelhoff sappelhoff added effort:medium estimated medium effort task impact:high estimated high impact task labels Aug 6, 2021
@sappelhoff
Copy link
Member Author

sappelhoff commented Aug 7, 2021

Having slept over it, I think we should NOT introduce a new parameter "annotations" (or the like) and instead make a breaking API change to our parameter "events".

events should accept a dictionary of the following form:

{
    "onset": [],
    "duration": [],
    "description": [],
    "type": [],
    "channel": [],
}

With the following rules:

  • "onset" and "description" are REQUIRED, all other keys are OPTIONAL
  • "onset" and "duration" must be integers, reflecting sample indices
  • "onset" must be in ascending order
  • "channel" must be integers in the range range(0, n_chs+1), where 0 indicates that an event refers to all channels
  • "type" must be a string. This is commonly one of:
    • Stimulus
    • Response
    • Comment
    • New Segment
  • "description" may be a string, or an integer, but MUST be an integer for events of type Stimulus and Response, because here we will cast that integer to the form S i, and R i respectively

In the simplest scenario, supplying a dict as below works:

{
    "onset": [6230, 6804, 7849, 8475],
    "description": [1, 3, 4, 6],
}

and would result in:

Mk1=Stimulus,S  1,6230,1,0
Mk2=Stimulus,S  3,6804,1,0
Mk3=Stimulus,S  4,7849,1,0
Mk4=Stimulus,S  6,8475,1,0

That also shows that users can fix their pipelines easily to accommodate the new API:

# "old" events was a np.ndarray of shape (n, 2) or (n, 3)
new_events = {}
new_events["onset"] = events[:, 0]
new_events["description"] = events[:, 1]
if events.shape[1] == 3:
    new_events["duration"] = events[:, 2]

Here an example of a more complex scenario, which shows that users can specify much more information with the new API, facilitating export from MNE annotations:

{
    "onset": [1000, 2000, 3000],
    "duration": [25, 50, 75],
    "description": [1, "BAD signal dropout", 555],
    "type": ["Stimulus", "Comment", "Response"],
    "channel": [0, 64, 0],
}

which would result in:

Mk1=Stimulus,S  1,1000,25,0
Mk2=Comment,BAD signal dropout,2000,50,64
Mk3=Response,R555,3000,75,0

@sappelhoff sappelhoff added the API influences the API of pybv label Aug 7, 2021
@sappelhoff sappelhoff self-assigned this Aug 7, 2021
@sappelhoff
Copy link
Member Author

Also, I can take this over @cbrnr before asking you in #77 (comment) I hadn't known that I feel strongly about this one 😆

@cbrnr
Copy link
Collaborator

cbrnr commented Aug 7, 2021

Please go ahead!

@hoechenberger
Copy link
Collaborator

hoechenberger commented Aug 7, 2021

In JS and JSON, one would opt to have an array of objects instead; in Python terms: a list of dicts. This is what I would like to see here too. "channels" should be pluralized.

[
    {
        "onset": 1,
        "duration": 3,
        "description": "foo",
        "type": "bar",
        "channels": ["ch1", "ch2"],
    },
    {
        "onset": 5,
        "duration": 3,
        "description": "foo2",
        "type": "baz",
        "channels": ["ch1"],
    }
]

This makes the properties of each individual annotation very clear.

@hoechenberger
Copy link
Collaborator

hoechenberger commented Aug 7, 2021

I'd like to emphasize that I don't think "0" is a good pick to mean "all channels". I think

  • all affected channels should be listed explicitly
  • channels should be referred to by their names, not indices
  • if we figure listing all channels every time we wish tk add an annotation for all channels is not feasible, we should allow for the string "all" or even "None" instead.

@sappelhoff
Copy link
Member Author

sappelhoff commented Aug 7, 2021

This makes the properties of each individual annotation very clear.

I like that notation, but it would make specifying an input for pybv much harder as a user (unwieldy), wouldn't it?

"channels" should be pluralized.

I agree in principle, but I don't think the BrainVision spec allows for that:

; Each entry: Mk<Marker number>=<Type>,<Description>,<Position in data points>,
; <Size in data points>, <Channel number (0 = marker is related to all channels)>

One could think about allowing it, and then internally in pybv write an event as below.

Example input:

{
    "onset": [1000],
    "description": ["BAD signal dropout"],
    "type": ["Comment"],
    "channel": [[1, 23, 55]],
}

Corresponding output:

Mk1=Comment,BAD signal dropout,1000,1,1
Mk2=Comment,BAD signal dropout,1000,1,23
Mk3=Comment,BAD signal dropout,1000,1,55

Note how the same event is written 3 times with only the channel that it refers to differing.

It's ugly, but I don't see another spec-conformant way (apart from not supporting it at all), do you?

@sappelhoff
Copy link
Member Author

In response to #77 (comment): We need to follow the BrainVision spec, and your proposal unfortunately does not work within those constraints 😞

@hoechenberger
Copy link
Collaborator

hoechenberger commented Aug 7, 2021

One could think about allowing it, and then internally in pybv write an event

👍

Example input:

{
    "onset": [1000],
    "description": ["BAD signal dropout"],
    "type": ["Comment"],
    "channel": [[1, 23, 55]],
}

[...]

It's ugly

It is suuuper ugly and nobody could possibly want that. But the JSON-inspired notation I proposed above makes it trivial to resolve this issue, while staying very readable and trivial to parse:

[
    {
        'onset': 1000,
        'description': 'BAD signal dropout',
        'type': 'Comment',
        'channel': 1
    },
    {
        'onset': 1000,
        'description': 'BAD signal dropout',
        'type': 'Comment',
        'channel': 23
    },
    {
        'onset': 1000,
        'description': 'BAD signal dropout',
        'type': 'Comment',
        'channel': 55
    }
]

@hoechenberger
Copy link
Collaborator

hoechenberger commented Aug 7, 2021

@sappelhoff

Since MNE's Annotations don't immediately allow for an output that would suit this API, I propose we add Annotations.to_json() to MNE-Python. This will resolve the issue.

@sappelhoff
Copy link
Member Author

@hoechenberger I feel like you are not reading my messages 😂

This thread is about how to make the most out of the BrainVision format for annotations, I like your proposals and they make sense --> but they don't fit with the BrainVision spec

@hoechenberger
Copy link
Collaborator

My proposals were for the "annotations" parameter you suggested to introduce... I'm confused now! 😬🙈

@sappelhoff
Copy link
Member Author

sappelhoff commented Aug 8, 2021

Okay, I think I get part of it now - I think I was sometimes also talking about how to write it to VMRK, whereas you were always talking about the input into "write_brainvision". :-)

I like that notation, but it would make specifying an input for pybv much harder as a user (unwieldy), wouldn't it?

I am reconsidering my argument here: Let's go with your proposed "list of dicts" structure. It is not significantly harder to prepare, to go from my initial structure to yours it'd just be:

old_annots = {
    "onset": [1000, 2000, 3000],
    "duration": [25, 50, 75],
    "description": [1, "BAD signal dropout", 555],
    "type": ["Stimulus", "Comment", "Response"],
    "channel": [0, 64, 0],
}

# go from old_annots (dict of lists) to new_annots (list of dicts)
new_annots = []
n_annot = len(old_annots["onset"])
for i_annot in range(n_annot):
    annot = {}
    for key in old_annots.keys():
        annot[key] = old_annots[key][i_annot]
    new_annots.append(annot)

# new_annots
[{'onset': 1000,
  'duration': 25,
  'description': 1,
  'type': 'Stimulus',
  'channel': 0},
 {'onset': 2000,
  'duration': 50,
  'description': 'BAD signal dropout',
  'type': 'Comment',
  'channel': 64},
 {'onset': 3000,
  'duration': 75,
  'description': 555,
  'type': 'Response',
  'channel': 0}]

and the gain in readability and clarity of the "list of dicts" beats the simplicity of the other structure.

We can also use that list of dicts structure for coordinates in #44 and have a more consistent API overall

@sappelhoff
Copy link
Member Author

I'd like to emphasize that I don't think "0" is a good pick to mean "all channels". I think

  • all affected channels should be listed explicitly
  • channels should be referred to by their names, not indices
  • if we figure listing all channels every time we wish to add an annotation for all channels is not feasible, we should allow for the string "all" or even "None" instead.

Here we misunderstood each other again (you talking ONLY about input, me talking ALSO about writing to VMRK)

Now that that's cleared up, I agree with you:

That just leaves us with a single problem: How to encode in VMRK if one event pertains to several (more than 1, less than "all") channels?

My proposal was:

Mk1=Comment,BAD signal dropout,1000,1,1
Mk2=Comment,BAD signal dropout,1000,1,23
Mk3=Comment,BAD signal dropout,1000,1,55

Do you see a BrainVision spec compliant alternative?

If there is no alternative, there are two options:

  1. do not support it (i.e., only support lists of len 0 and len 1 for channels in annots)
  2. write it as I proposed (but make sure first, that MNE-Python would be able to read that, and that it wouldn't break FT, EEGLAB, Brainstorm)

@hoechenberger
Copy link
Collaborator

Haha so many misunderstandings!! So sorry about the confusion!!

On a somewhat unrelated side note, I'm currently in Berlin and happy to meet up for a coffee or beer if you feel like it 👍 Just drop me a message on Discord

@cbrnr
Copy link
Collaborator

cbrnr commented Aug 9, 2021

If repeating a marker for each affected channel is the only compliant solution then I'd go for it. Just to make sure I understand the format, a trailing zero means "all channels", correct? In MNE-Python, all channels corresponds to an empty list (https://mne.tools/dev/generated/mne.Annotations.html#mne-annotations), so how would the API handle this (I lost track so maybe @sappelhoff can you summarize the current state)?

@sappelhoff
Copy link
Member Author

sappelhoff commented Aug 9, 2021

a trailing zero means "all channels", correct?

correct

In MNE-Python, all channels corresponds to an empty list

yes

so how would the API handle this (I lost track so maybe @sappelhoff can you summarize the current state)?

A user passes the following input to write_brainvision's event parameter (which we redesign, breaking API change EDIT: we can remain backward compatible and still accept ndarray next to list of dict):

[{'onset': 1000,
  'duration': 25,
  'description': 1,
  'type': 'Stimulus',
  'channels': []},
 {'onset': 2000,
  'duration': 50,
  'description': 'BAD signal dropout',
  'type': 'Comment',
  'channels': [62, 63, 64]}]

pybv will then write the following to VMRK:

Mk1=Stimulus,S  1,1000,25,0
Mk2=Comment,BAD signal dropout,2000,50,62
Mk3=Comment,BAD signal dropout,2000,50,63
Mk4=Comment,BAD signal dropout,2000,50,64

I think logically these events in VMRK make sense and they are compliant. It's still not pretty, but we have our constraints to work with here. We still need to check that MNE-Python, EEGLAB, Fieldtrip, Brainstorm would properly read this, or at least not produce a bunch of nonsense.

the input for the events parameter of write_brainvision can be easily derived from MNE-Python annotations (if sfreq is available, too) ... and also designed from scratch for non MNE users.

@cbrnr
Copy link
Collaborator

cbrnr commented Aug 9, 2021

Thanks! This looks good. We could also directly accept an mne.Annotations object in addition to a list of dicts; I think nobody would want to create this manually just to write some annotations.

In addition to the readers you mentioned, we should first test if BrainVision Analyzer reads such files.

@sappelhoff sappelhoff modified the milestones: 0.6.0, 0.7.0 Sep 27, 2021
@cbrnr
Copy link
Collaborator

cbrnr commented Feb 10, 2022

I just revisited an old MNELAB issue and found out that being able to pass Annotations to write_brainvision() would be really useful. @sappelhoff would you be able to summarize the status and who agreed to do what? I think we haven't even decided on a solution yet.

@sappelhoff
Copy link
Member Author

sappelhoff commented Feb 10, 2022

@cbrnr I think we did find a solution, and a consistent API for writing events to VMRK (this issue) and coordinates to VHDR (#44 (comment)).

The basic API is described in #77 (comment) -> input to events becomes a list of dict (or for backward compatibility, can still be a numpy array with 2 or 3 columns; but that won't unlock all features).

Talking about both events (this issue) and coordinates (#44), we "only" need to:

  1. check whether the (i) "duplicate events" (when an annotation is with respect to channels other than "all" or a single one), or (ii) the coordinates section raise any issue in: FieldTrip, EEGLAB, BrainStorm, MNE-Python, BrainVision Analyzer
    1. If they do - try to fix them in those readers (and write to Brain Products to perhaps fix their BrainVision Analyzer, if it causes problems, or to improve the BV spec)
  2. implement it
  3. hopefully ship it to MNE Python's export module

I am currently fully blocked due to having to submit my dissertation soon, so if you want to give it a shot - be my guest! Else, I hope to be able to work on this some time this year, as I consider it a major improvement that I definitely want for pybv.

@cbrnr
Copy link
Collaborator

cbrnr commented Feb 10, 2022

Thanks @sappelhoff! I don't have time right now either, but I'm happy to help. I think it would be nice if the events parameter accepted an ndarray (the MNE events structure) as well as an mne.Annotations object as well. Or at least we should provide helper functions that convert events and annotations into this new format. Otherwise people will have to perform gymnastics like [{"onset": a["onset"], "duration": a["duration"], "description": a["description"]} for a in raw.annotations] to export their existing annotations.

@sappelhoff
Copy link
Member Author

agreed to support ndarray (the "custom" type we currently support, which can be easily indexed from the mne type), mne.annotations, and a standard list of dict

I think it's important to offer non-mne options as well

@cbrnr
Copy link
Collaborator

cbrnr commented Feb 10, 2022

Since pybv does not depend on MNE, how would you implement a parameter accepting also an mne.Annotations object? An extra wrapper function just in MNE (raw.export())? That's probably the best option.

@sappelhoff
Copy link
Member Author

Since pybv does not depend on MNE, how would you implement a parameter accepting also an mne.Annotations object? An extra wrapper function just in MNE (raw.export())? That's probably the best option

mmh and in "pure pybv" we'd only have ndarray or list of dicts? - Yes, that sounds good.

So far I had thought to have a check of parameter types, and raise if the then optional mne dependency is not present, but I like your suggestion better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API influences the API of pybv effort:medium estimated medium effort task enhancement New feature or request impact:high estimated high impact task
Projects
None yet
3 participants