Exporting/importing Brian data from/to other formats #306

Closed
mstimberg opened this Issue Aug 8, 2014 · 10 comments

mstimberg commented Aug 8, 2014

[Following up on my comment on #298.] I think it would be nice to have a general mechanism for exporting data from Brian to other formats, so that users can easily connect Brian with existing analysis tools or store results. I'm not thinking of anything complicated here: data would still be stored internally as numpy arrays (or dynamic arrays). For example, each Group (i.e. every object that has state variables, including monitors) could have an export method:

syn = Synapses(...)
s_mon = StateMonitor(...)
spike_mon = SpikeMonitor(...)
...
s_mon_data = s_mon.export()  # export all state variables to a format set by a preference
synaptic_weights = syn.export(['w'])  # export a subset of state variables
spikes_neo = spike_mon.export(format='neo')  # export to a specific format

I think a general method with a format keyword, as above, would be a nice approach. We could implement it as a plugin mechanism, allowing a user (or an external library) to register a new format with Brian without having to change the Brian code itself.

romainbrette commented Aug 8, 2014

Good idea!

thesamovar commented Aug 8, 2014

I think this is a good idea, but we shouldn't try to handle all these data formats in generated code because, for example, reading HDF5 files in C++ is an absolute nightmare that we really don't want to get into. So we should handle all this stuff in Python only.

Also, we might not want to put it in core Brian but rather in the cookbook or a separate package because some of these other packages/standards are not very stable.

mstimberg commented Aug 9, 2014

> I think this is a good idea, but we shouldn't try to handle all these data formats in generated code because, for example, reading HDF5 files in C++ is an absolute nightmare that we really don't want to get into. So we should handle all this stuff in Python only.

That is what I meant: internally, the data is stored in numpy arrays (or on disk for standalone, and transparently loaded). When export is called, this data is transformed into the output format, but this will be general Python code, not specific to a codegen target.

> Also, we might not want to put it in core Brian but rather in the cookbook or a separate package because some of these other packages/standards are not very stable.

Agreed, this issue is mostly about setting up the general infrastructure in Python. I think core Brian should have (at least) one simple example, e.g. just exporting all the state variables to a dictionary that can then be easily pickled.
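
As a minimal sketch of that simplest case: assuming the export returns a plain dictionary mapping variable names to values, pickling and restoring it needs nothing beyond the standard library. The `states` dictionary below is a made-up stand-in that uses plain lists instead of numpy arrays; it is not real Brian output.

```python
import pickle

# Hypothetical stand-in for the dictionary an export would return;
# real Brian output would hold numpy arrays (possibly with units).
states = {
    "v": [-0.0540, -0.0545, -0.0558],
    "g_ampa": [1.9e-09, 2.3e-09, 1.7e-09],
}

# Serialize to bytes (pickle.dump would write to an open file instead).
blob = pickle.dumps(states)

# Restoring gives back an equal dictionary.
restored = pickle.loads(blob)
```

Whether arrays with physical units survive pickling unchanged would depend on how those objects are implemented, which this sketch does not cover.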

thesamovar commented Aug 10, 2014

Great!

romainbrette commented Aug 13, 2014

We should probably keep an eye on what comes out of this project:
https://crcns.org/NWB

mstimberg added a commit that referenced this issue Sep 16, 2014

mstimberg added a commit that referenced this issue Sep 22, 2014

mstimberg added this to the 2.0beta2 milestone Nov 5, 2014

mstimberg modified the milestones: 2.0beta3, 2.0beta2 Feb 2, 2015

mstimberg changed the title from Exporting Brian data to other formats to Exporting/importing Brian data from/to other formats Feb 12, 2015

mstimberg commented Feb 12, 2015

To add some more details about what I have in mind:

  • There should be a simple registration mechanism that allows you to register a new import/export class (look for functions named register... in the current Brian2 code base for examples of such mechanisms).
  • The Group.get_states and Group.set_states methods should delegate to a registered class for the given format (or raise an error if none exists).
  • The current implementation of these methods, which has hard-coded support for the 'dict' target, should mostly move into a DictImportExport class.
  • ImportExport classes should raise appropriate errors if they can't fulfill the request, e.g. if they can't deal with physical units but the units argument was set to True, or if they need an additional library that isn't available.
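
The points above could be sketched roughly as follows. All names here (`register_import_export`, `ToyDictImportExport`, the module-level `get_states`/`set_states` functions) are hypothetical and only illustrate the idea; this is not Brian's actual API. A plain dictionary stands in for a Group, and the toy exporter deliberately refuses `units=True` just to show the error path.

```python
# Hypothetical names throughout: this is not Brian's actual API.
_import_export = {}  # maps a format name to a registered class


def register_import_export(name, cls):
    """Register an ImportExport class under a format name."""
    if name in _import_export:
        raise ValueError("Format '%s' is already registered" % name)
    _import_export[name] = cls


class ToyDictImportExport(object):
    """Toy exporter/importer working on plain dictionaries of lists."""

    @staticmethod
    def export_data(group, variables, units=False):
        if units:
            # This toy class has no notion of physical units.
            raise NotImplementedError("The toy 'dict' format cannot handle units")
        return {name: list(group[name]) for name in variables}

    @staticmethod
    def import_data(group, data):
        for name, values in data.items():
            group[name] = list(values)


register_import_export('dict', ToyDictImportExport)


def get_states(group, variables, format='dict', units=False):
    """Delegate the export to the class registered for ``format``."""
    if format not in _import_export:
        raise NotImplementedError("Format '%s' is not supported" % format)
    return _import_export[format].export_data(group, variables, units=units)


def set_states(group, data, format='dict'):
    """Delegate the import to the class registered for ``format``."""
    if format not in _import_export:
        raise NotImplementedError("Format '%s' is not supported" % format)
    _import_export[format].import_data(group, data)


# A plain dict stands in for a Group's state variables.
group = {'v': [-0.054, -0.055], 'w': [0.1, 0.2]}
exported = get_states(group, ['v'])
set_states(group, {'w': [0.3, 0.4]})
```

An external library would only need to call the registration function with its own class; nothing in the delegating functions changes when a format is added.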

It would be nice to have at least one additional target in Brian itself; a pandas DataFrame, for example, should be both useful and straightforward to implement. Other targets should probably go into the brian2 cookbook instead.
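
A pandas target might look something like the sketch below. The function name `export_to_pandas` and the `states` dictionary are assumptions made for illustration (with units already stripped and plain lists instead of numpy arrays). pandas is imported lazily so that a missing library produces a clear error instead of breaking the import of the whole module.

```python
def export_to_pandas(states):
    """Turn a {name: values} dictionary into a pandas DataFrame.

    ``states`` is a hypothetical stand-in for what a dictionary export
    returns, with physical units already removed.
    """
    try:
        import pandas as pd  # imported lazily: pandas is optional
    except ImportError:
        raise ImportError("The 'pandas' format requires the pandas library")
    # One column per state variable, one row per neuron/synapse.
    return pd.DataFrame(states)


states = {'v': [-0.054, -0.055], 'g_ampa': [1.9e-09, 2.3e-09]}
try:
    df = export_to_pandas(states)
except ImportError as err:
    df = None  # pandas is not installed in this environment
    print(err)
```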

Finally, it would be nice to have the same for a complete network (at least the export part). This could be a simple wrapper around the above mechanism, creating a dictionary with the object names (e.g. neurongroup, statemonitor, etc.) as keys and the result of Group.get_states (i.e., depending on the format argument, a dictionary, a pandas DataFrame, etc.) as the values.
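
A rough sketch of such a network-level wrapper, under the assumption that every object has a `name` attribute and an export method that accepts a `format` argument. `FakeGroup` and `get_network_states` are hypothetical names used only for this illustration.

```python
class FakeGroup(object):
    """Hypothetical stand-in for a Brian Group with a name and states."""

    def __init__(self, name, states):
        self.name = name
        self._states = states

    def get_states(self, format='dict'):
        # Only the dictionary format exists in this toy example.
        if format != 'dict':
            raise NotImplementedError("Format '%s' is not supported" % format)
        return dict(self._states)


def get_network_states(objects, format='dict'):
    """Collect the exported states of all objects, keyed by object name."""
    return {obj.name: obj.get_states(format=format) for obj in objects}


objects = [
    FakeGroup('neurongroup', {'v': [-0.054, -0.055]}),
    FakeGroup('statemonitor', {'t': [0.0, 0.0001]}),
]
network_states = get_network_states(objects)
```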

thesamovar commented Feb 12, 2015

This all sounds great to me.

ankitkmr commented Mar 14, 2015

Hi Marcel

Sorry for the delay. I just got back to college from spring break. I read the research paper and I must say it sounds even more interesting now. I have started writing code for the export functionality (you can have a look at it here: https://github.com/ankitkmr/brian2/blob/master/%23306_enhancement.py).

I am moving on to working on the ideas you presented in your last comment on #306. I just have some doubts.
First, can you be specific about what you mean by format? Just have a look at my code above and let me know. In the meantime, I'll try coding the ideas presented later on this page.

Let me know what you think of the code and if I am getting what you expect me to do.

Best

ankitkmr commented Mar 14, 2015

Can you be more explicit about what you mean by an import/export class? Could you give examples? Do you mean a class of Neurons or Synapses? I do not understand what "class" represents here, or what "register" implies.

mstimberg commented Mar 15, 2015

Ok, here are some more explanations:
The data we want to export are the state variables (e.g. membrane potential, currents, etc.) of neurons, synapses, etc. All objects in Brian that have state variables inherit from the Group class. Therefore, the export is implemented in Group.get_states (and the import in Group.set_states). Currently, get_states always exports the data in a dictionary with the names of the variables as keys and the values as numpy arrays (or our Brian variant of numpy arrays that uses physical units):

>>> # after running the Vogels_et_al_2011.py example
>>> neurons.get_states(['v', 'g_ampa', 'g_gaba'])
{'g_ampa': array([ 1.91945271,  2.26156716,  1.74057656, ...,  2.47371202,
         1.21143219,  2.55672377]) * nsiemens,
 'g_gaba': array([  8.8592959 ,   8.07008752,   9.93361541, ...,  11.23948898,
          6.40375463,  13.74270921]) * nsiemens,
 'v': array([-53.99293672, -54.51540112, -55.75590128, ..., -55.25534778,
        -55.64736371, -58.96560936]) * mvolt}

Now, this has all the information users will need, but for convenience it would be nice if the data could be exported in other formats, for example as a pandas DataFrame. It should then work like this:

>>> neurons.get_states(['v', 'g_ampa', 'g_gaba'], format='pandas')
            g_ampa        g_gaba         v
0     1.919453e-09  8.859296e-09 -0.053993
1     2.261567e-09  8.070088e-09 -0.054515
...            ...           ...       ...
9998  1.211432e-09  6.403755e-09 -0.055647
9999  2.556724e-09  1.374271e-08 -0.058966

[10000 rows x 3 columns]

Finally, if dictionaries and pandas data frames were the only formats we'd ever want to support, we could obviously just put the code directly into the function. However, we want to support other formats in the future and, most importantly, we don't necessarily want to include all such formats in the Brian code base. Instead, external libraries or the user should be able to add a new export/import format to Brian, i.e. we want some form of "plugin" architecture. This is pretty straightforward in Python; it could for example look like this:

# Let's say that every export function gets a reference to the group
# and the variables to export as arguments:
def export_func1(group, variables):
    # ... do something
    return data

def export_func2(group, variables):
    # ... do something else
    return data

exporters = {'format1': export_func1, 'format2': export_func2}

Now Group.get_states will not have code to directly export the data, but instead refer to a function for the given format (this is what I meant with "delegate" above). Something like this:

def get_states(self, vars, format):
    if format not in exporters:
        raise NotImplementedError("Format '%s' is not supported" % format)
    return exporters[format](self, vars)  # call the registered function

Now, we could do this and register exporter and importer functions, but since we'll have both an exporter and an importer for each format, it would be more elegant to register a class instead, e.g. by specifying that ImportExport classes have to follow this pattern:

class ImportExport(object):
    @staticmethod
    def export_data(group, variables):
        # ... do something
        return data

    @staticmethod
    def import_data(group, data):
        # ...
        pass

Instead of registering one export function and one import function, one would then register a single ImportExport class; Group.get_states would call the export_data function of this class, while Group.set_states would use import_data.

Hope that makes things a bit clearer...
