remove `save()` methods from analysis classes #1745

Open
orbeckst opened this Issue Dec 18, 2017 · 12 comments

5 participants
Member

orbeckst commented Dec 18, 2017

Many classes in MDAnalysis.analysis have a save() method as a convenience to persist calculated data such as a timeseries to disk. However, storing data structures to disk in a portable manner is difficult and typical approaches such as using Python pickle are not portable (not even between Python 2 and Python 3). Instead of having to worry about the details of storing data, we should just let the user decide how to store data (e.g., convert to pandas dataframes and let pandas handle all this, or write as a custom text format).
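To illustrate what "let the user decide" can mean in practice, here is a minimal sketch that writes a per-frame timeseries to a plain CSV text file using only the standard library, then reads it back. The column names (`frame`, `time`, `rmsd`), the `rows` data, and the helper names are hypothetical placeholders, not attributes or functions of any MDAnalysis class.

```python
import csv
import io


def write_timeseries(rows, fh, header=("frame", "time", "rmsd")):
    """Write a sequence of per-frame tuples as CSV, preceded by a header row."""
    writer = csv.writer(fh)
    writer.writerow(header)
    writer.writerows(rows)


def read_timeseries(fh):
    """Read the CSV back; assumes exactly three columns (int, float, float)."""
    reader = csv.reader(fh)
    header = next(reader)
    rows = [(int(r[0]), float(r[1]), float(r[2])) for r in reader]
    return header, rows


# Hypothetical results: (frame, time in ps, RMSD in Å) for each frame.
rows = [(0, 0.0, 0.0), (1, 10.0, 1.25), (2, 20.0, 1.5)]

buf = io.StringIO()  # stands in for open("rmsd.csv", "w", newline="")
write_timeseries(rows, buf)
buf.seek(0)
header, loaded = read_timeseries(buf)
```

Unlike a pickle, a file like this can be opened by any tool that understands CSV; if the data are already in a pandas DataFrame, `DataFrame.to_csv` does the same job.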

(This issue came up in discussion on PR #1744 (for issue #1743).)

Proposal

Remove save() methods from analysis classes if the method is only a convenience function. If the saved files can be read in again by the class, e.g., for re-analysis, then a more detailed case-by-case evaluation should be performed.

EDIT: Generally accepted, I rewrote the list of cases below.

approach

  1. survey the list of save methods below: decide which ones should be removed and note here
  2. add a deprecation notice for 0.17.1 (once this is done, update the Milestone to 1.0 instead of closing)
    • docs (add a .. deprecated:: 0.17.1 note saying that the save() functionality will be removed in 1.0 because users generally know best how they want to persist the data)
    • DeprecationWarning
  3. add docs that explain which attributes might be worthwhile to save to disk and show an example for how to do this (basically, adding the code from the save() method to the docs)
  4. remove in 1.0
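Step 2 could look roughly like this for a single method: a `.. deprecated::` note in the docstring plus a `DeprecationWarning` raised at call time. `ExampleAnalysis`, its `results` attribute, and the body of `save()` are hypothetical stand-ins, not code from MDAnalysis.

```python
import warnings


class ExampleAnalysis(object):
    """Hypothetical analysis class holding a computed timeseries."""

    def __init__(self, results):
        self.results = results

    def save(self, filename):
        """Save the results to a text file, one value per line.

        .. deprecated:: 0.17.1
           ``save()`` will be removed in 1.0. Users generally know best how
           they want to persist the data; save ``self.results`` directly.
        """
        warnings.warn(
            "save() is deprecated in 0.17.1 and will be removed in 1.0; "
            "save the results attribute directly instead.",
            DeprecationWarning,
            stacklevel=2,  # report the warning at the caller's line
        )
        with open(filename, "w") as fh:
            for value in self.results:
                fh.write("{}\n".format(value))
```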

cases

Found with git grep 'save\w*(self'. Add a checkmark for any method that should be removed.
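For anyone repeating the survey without git, the same search can be approximated in Python. In the grep pattern the parenthesis is a literal character (basic regex syntax), whereas in a Python regular expression it starts a group, so it has to be escaped as `\(`. The helper name `find_save_methods` is hypothetical; the sample lines are taken from the lists in this issue.

```python
import re

# Python equivalent of: git grep 'save\w*(self'
# "(" is literal in grep's basic regex syntax but a group in Python, so escape it.
SAVE_METHOD = re.compile(r"save\w*\(self")


def find_save_methods(lines):
    """Return (line_number, line) pairs whose text matches the pattern."""
    return [(n, line) for n, line in enumerate(lines, start=1)
            if SAVE_METHOD.search(line)]


sample = [
    "    def save(self, rmsdfile):",
    "    def save_table(self, filename=\"hbond_table.pickle\"):",
    "    def run(self):",
    "    def _savetxt(self, filename):",
]
matches = find_save_methods(sample)
```

As with the grep, private helpers such as `_savetxt` also match, which is why they show up as sub-items in the list.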

To be removed

Check methods once they have been removed.

  • MDAnalysis/analysis/align.py: def save(self, rmsdfile):
  • MDAnalysis/analysis/contacts.py: def save(self, outfile):
  • MDAnalysis/analysis/diffusionmap.py: def save(self, filename):
  • MDAnalysis/analysis/hbonds/hbond_analysis.py: def save_table(self, filename="hbond_table.pickle"):
  • MDAnalysis/analysis/hbonds/hbond_autocorrel.py: def save_results(self, filename='hbond_autocorrel'):
  • MDAnalysis/analysis/hbonds/wbridge_analysis.py: def save_table(self, filename="wbridge_table.pickle"):
  • MDAnalysis/analysis/hole.py: def save(self, filename="hole.pickle"):
  • MDAnalysis/analysis/lineardensity.py: def save(self, description='', form='txt'):
    • MDAnalysis/analysis/lineardensity.py: def _savetxt(self, filename):
    • MDAnalysis/analysis/lineardensity.py: def _savez(self, filename):
  • MDAnalysis/analysis/psa.py: def save_result(self, filename=None):
  • MDAnalysis/analysis/rms.py: def save(self, filename=None):

To stay

Check if they are covered by tests

  • MDAnalysis/analysis/encore/utils.py: def savez(self, fname):
  • MDAnalysis/analysis/legacy/x3dna.py: def save(self, filename="x3dna.pickle"):
  • MDAnalysis/analysis/psa.py: def save_paths(self, filename=None):
Member

orbeckst commented Dec 18, 2017

@richardjgowers @kain88-de have we got a nice decorator for deprecating methods?

Can we add a code snippet in the comments that anyone can quickly apply to any method they come across?

Member

orbeckst commented Dec 19, 2017

@sseyler can you please comment on the necessity to keep the save_paths() and save_result() methods in the PSA module?

Are they commonly used?

Are they used in the published tutorials and docs?

Member

orbeckst commented Dec 19, 2017

We are not going to touch anything in legacy, i.e., we will keep x3dna.X3DNA.save().

Member

kain88-de commented Dec 19, 2017

numpy has a decorator for deprecating functions that we already use in several places. That should be good enough here as well.
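As a dependency-free illustration of the "snippet anyone can apply" asked for above, a hand-rolled decorator with the same idea as numpy's might look like this. The name `deprecated_method`, its parameters, and `ExampleContacts` are hypothetical; this is neither numpy's nor MDAnalysis's API.

```python
import functools
import inspect
import warnings


def deprecated_method(release, remove, message=""):
    """Decorator factory marking a function or method as deprecated.

    The wrapped callable emits a DeprecationWarning on every call, and a
    Sphinx ``.. deprecated::`` note is appended to its docstring.
    """
    def decorator(func):
        text = "{}() is deprecated in {} and will be removed in {}. {}".format(
            func.__name__, release, remove, message).strip()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not to wrapper
            warnings.warn(text, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        # cleandoc strips the common indentation so the appended
        # directive lines up with the rest of the docstring
        doc = inspect.cleandoc(func.__doc__ or "")
        wrapper.__doc__ = "{}\n\n.. deprecated:: {}\n   {}\n".format(
            doc, release, text)
        return wrapper
    return decorator


class ExampleContacts(object):
    """Hypothetical stand-in for an analysis class with a save() method."""

    @deprecated_method("0.17.1", "1.0",
                       "Users generally know best how to persist the data.")
    def save(self, outfile):
        """Save the computed timeseries to *outfile*."""
        return outfile  # actual file writing omitted in this sketch
```

One decorator line above a `save()` method then covers both the docs note and the `DeprecationWarning`.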

Member

kain88-de commented Dec 19, 2017

@mtiberti can you comment about the need for savez in encore?

Contributor

sseyler commented Dec 19, 2017

For psa.py, the save_result() method can probably be removed, since it's just a convenience function for saving a final result (i.e., the distance matrix), and that can easily be done manually. The save_paths() method might be worth keeping, since it stores intermediate results that can take some time to re-compute (i.e., the paths that have been transformed/fitted prior to measuring path distances).

The paths are saved in their original or (optionally) npz format, so I wouldn't think compatibility is much of an issue, but what do I know? The save_paths() method is also used (by default) in the published tutorial (via the store keyword in generate_paths()).
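The distinction drawn here (a cheap final result can be re-saved by hand, but expensive intermediates such as fitted paths are worth caching in numbered files for reuse) can be sketched as a generic compute-or-load pattern. `load_or_compute`, the `path_NNN.json` naming, and JSON storage are hypothetical choices for illustration only; PSA's save_paths() writes trajectory or npz files, not JSON.

```python
import json
import os
import tempfile


def load_or_compute(index, compute, directory):
    """Return intermediate result number *index*, computing it at most once.

    The result is cached as path_<index>.json in *directory*; numbered
    files make it easy to reload the whole set later in a fixed order.
    """
    filename = os.path.join(directory, "path_{:03d}.json".format(index))
    if os.path.exists(filename):
        with open(filename) as fh:
            return json.load(fh)
    result = compute(index)  # the expensive step, e.g. fitting a trajectory
    with open(filename, "w") as fh:
        json.dump(result, fh)
    return result


calls = []


def fake_fit(i):
    """Stand-in for an expensive fit; records each call."""
    calls.append(i)
    return [[float(i), 0.0, 0.0], [float(i), 1.0, 0.0]]


cache_dir = tempfile.mkdtemp()
first = [load_or_compute(i, fake_fit, cache_dir) for i in range(3)]
second = [load_or_compute(i, fake_fit, cache_dir) for i in range(3)]  # from disk
```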

Contributor

mtiberti commented Dec 20, 2017

The idea behind savez is to have an easy way to write the matrix computed by get_distance_matrix to disk, so that the user can load it from disk when they want to try runs of ces or dres. Together with the array containing the data, it writes some metadata that is used for a consistency check when the file is loaded while initialising a TriangularMatrix with the loadfile option. It is not meant to be a user-friendly output format (even though it can be opened with numpy, as it's a standard numpy npz file); it is more a way to store an intermediate result of the calculation for later reuse. We could always get rid of it, specify a format that users need to follow when saving arrays for TriangularMatrix, and expect them to conform; however, that sounds needlessly complicated and more error-prone. What do you think?
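The pattern described here, storing an intermediate matrix together with metadata that is checked when the file is loaded, can be sketched with the standard library. This is not encore's TriangularMatrix.savez() (which writes a numpy npz file); the JSON layout, the `size` key, and the function names are hypothetical. A symmetric N×N matrix only needs its N(N+1)/2 upper-triangle elements, and the stored size lets the loader reject a file whose element count does not match.

```python
import io
import json


def tri_index(i, j, size):
    """Flat index of (i, j) in a row-major upper triangle, diagonal included."""
    if i > j:
        i, j = j, i  # symmetric: (i, j) and (j, i) share storage
    return i * size - i * (i - 1) // 2 + (j - i)


def save_triangular(fh, elements, size):
    """Write the upper-triangle elements plus metadata used for validation."""
    json.dump({"size": size, "elements": list(elements)}, fh)


def load_triangular(fh):
    """Read a stored matrix and check that it is internally consistent."""
    data = json.load(fh)
    size, elements = data["size"], data["elements"]
    expected = size * (size + 1) // 2
    if len(elements) != expected:
        raise ValueError("corrupt matrix file: expected {} elements for size {}, "
                         "found {}".format(expected, size, len(elements)))
    return size, elements


# 3x3 symmetric distance matrix stored as its upper triangle:
# rows (0,0) (0,1) (0,2) | (1,1) (1,2) | (2,2)
elements = [0.0, 1.0, 2.0, 0.0, 1.5, 0.0]
buf = io.StringIO()  # stands in for a file on disk
save_triangular(buf, elements, 3)
buf.seek(0)
size, loaded = load_triangular(buf)
d_21 = loaded[tri_index(2, 1, size)]  # same storage slot as (1, 2)
```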

Member

orbeckst commented Dec 21, 2017

Member

orbeckst commented Dec 21, 2017

Contributor

sseyler commented Jan 1, 2018

Can the files from save_paths() be read into any of the functions in the PSA module?

Yes, in that save_paths() is responsible for generating the (fitted) DCDs that can be reused directly since they're numbered and can be dumped into a PSA object easily.

Do we have a test that covers it?

Not specifically.

Member

orbeckst commented Jan 3, 2018

richardjgowers modified the milestones: 0.17.0, 1.0 Jan 25, 2018

Member

orbeckst commented Jan 26, 2018

@richardjgowers I think the deprecations need to go into 0.17.x.

orbeckst referenced this issue Jan 26, 2018

Merged

deprecating save() methods in analysis classes #1763

4 of 4 tasks complete

orbeckst referenced this issue Jul 6, 2018

Closed

deprecate save() methods #1972

12 of 12 tasks complete

orbeckst added this to To do in release 1.0 Jul 10, 2018

orbeckst referenced this issue Oct 19, 2018

Open

more PSA tests #2049

2 of 4 tasks complete