RFC: Handle classes without `save_results` #17

PicoCentauri · 2020-10-20T16:52:34Z

If an analysis class has no save_results function we simply dump a
pickle file with the name of the class.

What do you think about this @joaomcteixeira?
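A minimal sketch of the fallback described above. The helper name `save_or_pickle` and the `.pickle` file suffix are assumptions for illustration, not the code in this PR; it only assumes that an analysis object either exposes a callable `save_results` or can be pickled.

```python
import pickle
from pathlib import Path


def save_or_pickle(analysis, directory="."):
    """Call ``save_results`` if the object has one, else pickle the object.

    Hypothetical helper for illustration; not the MDACLI implementation.
    Returns the path of the pickle file, or None if ``save_results`` was used.
    """
    saver = getattr(analysis, "save_results", None)
    if callable(saver):
        saver()
        return None

    # Fall back to a pickle file named after the class, e.g. "RMSD.pickle".
    path = Path(directory) / f"{type(analysis).__name__}.pickle"
    with open(path, "wb") as handle:
        pickle.dump(analysis, handle)
    return path
```

The file name is derived from `type(analysis).__name__`, so two instances of the same class written to the same directory would overwrite each other.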

joaomcteixeira · 2020-10-20T18:24:14Z

Honestly, I don't see it this way. I think pickling objects will disrupt the whole purpose of the project in the first place.

In my view, MDACLI aims to bridge MDAnalysis to less programmatic users, or facilitate operations for everyone, or to allow piping sequential workflows.

Hence, once the pickle object is created, what is the user to do with it? The user will need to go to the Python prompt again, import pickle, unpickle, go to the MDAnalysis documentation, look for the needed method, extract the data, and finally use it. A user who can do all of this labyrinthine process will actually script MDAnalysis the whole way and won't use MDACLI. Pickle does serve piping, though.

This has been a recurrent thought in my mind for the last months. I see two possibilities:

The first possibility is that the MDA community agrees on a common method interface for all classes, from which we can extract the results. Note that this interface need not be public! If the MDA community does not want a common public interface, we, from the MDACLI project, can try to add the needed format-converting and duck-typing functionality through private methods. For example, we would look for the most common way MDA Analysis classes provide results and emulate it in all other classes through private methods, in an adaptor-like approach. But this has two problems:
1. We would need to manually inspect each and every new Analysis class to ensure format compatibility.
2. Consequently, new analyses won't be available on-the-fly in MDACLI, only after revision and a PR. Hence we couldn't make use of __all__.
The second possibility is to do the above-mentioned job directly in MDACLI, instead of PRing to the main MDAnalysis. The benefit here is that progress in MDACLI won't need to pass through the PR process of the bigger MDA project, which is inevitably slower.
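One way to picture the adaptor-like approach is a small registry on the MDACLI side. This is only a sketch under assumptions: `EXTRACTORS`, `register`, `extract_results`, and the stand-in class name `ExampleAnalysis` are all hypothetical and do not exist in MDACLI or MDAnalysis.

```python
# Hypothetical registry mapping an analysis class name to a function that
# pulls its results into a common shape (here: a dict of name -> data).
EXTRACTORS = {}


def register(class_name):
    """Decorator that registers an extractor for one analysis class."""
    def decorator(func):
        EXTRACTORS[class_name] = func
        return func
    return decorator


@register("ExampleAnalysis")
def _extract_example(analysis):
    # ExampleAnalysis is a stand-in; each real class would need manual review.
    return {"values": analysis.values}


def extract_results(analysis):
    """Return results in the common shape, or raise if no adaptor exists."""
    name = type(analysis).__name__
    try:
        extractor = EXTRACTORS[name]
    except KeyError:
        raise NotImplementedError(f"no adaptor registered for {name}") from None
    return extractor(analysis)
```

The `NotImplementedError` branch is where the two problems listed above show up: a class without a reviewed adaptor is simply not usable until someone writes one.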

All in all, I truly believe that MDACLI's output should be a CSV file or plots. However, it can indeed also output the pickled analysis. That might be useful for some users. Yet, it shouldn't be a final goal.

So answering your question: For the long term, No. For the time being and as an additional output format, Yes.

So, I believe your PR is very valuable. But we should maintain this discussion active.

I will wait for your thoughts before merging. Let me know what you think.

orbeckst · 2020-10-20T18:48:40Z

Do you have a specification document of what you would like the MDA AnalysisBase API to look like, from MDACLI's perspective? This would help focus the discussion.

Here are some possible approaches that I can think of:

Reserved attribute names

For example, should .result always be the data structure containing computed data?

Trailing underscore attributes (sklearn-style)

In scikit-learn, computed variables get an underscore suffix (e.g. .results_). We could do that, although I dislike using variable name alterations to indicate types etc., and I think it looks ugly. But there's precedent in the Python ecosystem and I can be convinced otherwise.
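As a rough illustration of how a tool could discover such attributes, the sketch below collects public instance attributes whose names end in an underscore. The function name `computed_attributes` is hypothetical, and the approach assumes results are stored in the instance `__dict__`.

```python
def computed_attributes(obj):
    """Return {name: value} for public instance attributes ending in "_"."""
    return {
        name: value
        for name, value in vars(obj).items()
        if name.endswith("_") and not name.startswith("_")
    }
```

Private names such as `_cache_` are skipped because they start with an underscore, and properties defined on the class are not found because `vars()` only looks at instance attributes.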

Flexible annotation

Or should there be a more flexible approach where we might have an attribute that marks up important "Result attributes" e.g. "OUTPUT" for anything that is always computed after run() and possibly other tags such as "OPTIONAL" for something that might be computed by an auxiliary method (e.g., in HydrogenBondAnalysis):

self._annotation = {'results': 'OUTPUT', 'times': 'OUTPUT', 'count_by_type': 'OPTIONAL'}

Then MDACLI can find anything that's 'OUTPUT' and use out = getattr(analysis_instance, key) to process it. If out is a numpy array, use np.savetxt() to write CSV; if it's a pd.DataFrame, use to_csv(); etc. MDACLI's contribution would be to make the decision for the user that the output is always CSV and then find a way to make it work. In MDA we decided (when we cut the save() methods) that the user knows best how to save data.

There's certainly a better way to do this but you get the idea.
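The annotation idea could be consumed roughly as follows. This is a sketch, not MDACLI code: `save_outputs` and `_as_row` are hypothetical names, the `<prefix>_<name>.csv` naming is an assumption, and the standard-library `csv` module stands in for the numpy route so the example has no third-party dependencies. Objects that provide `to_csv` (as pandas DataFrames do) are handled by duck typing.

```python
import csv


def _as_row(item):
    """Turn one item into a CSV row; scalars become a single-column row."""
    if hasattr(item, "__iter__") and not isinstance(item, (str, bytes)):
        return list(item)
    return [item]


def save_outputs(analysis, prefix):
    """Write every attribute tagged 'OUTPUT' to '<prefix>_<name>.csv'."""
    written = []
    for name, tag in analysis._annotation.items():
        if tag != "OUTPUT":
            continue  # e.g. 'OPTIONAL' attributes may not exist after run()
        out = getattr(analysis, name)
        path = f"{prefix}_{name}.csv"
        if hasattr(out, "to_csv"):
            # Duck-typed: pandas DataFrames provide to_csv(path).
            out.to_csv(path)
        else:
            # Treat anything else as an iterable of rows.
            with open(path, "w", newline="") as handle:
                writer = csv.writer(handle)
                for item in out:
                    writer.writerow(_as_row(item))
        written.append(path)
    return written
```

Attributes tagged 'OPTIONAL' are never touched, so an auxiliary result that was not computed cannot raise an `AttributeError` here.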

PicoCentauri · 2021-05-16T21:38:36Z

I added a saving routine that uses the results of the new MDAnalysis Results class.

Currently the tests will fail and one cannot test the code. I will add additional test cases and streamline the code when #23 is merged.

Please comment if the code is readable and the output structure seems reasonable.


pep8speaks · 2021-05-18T17:50:05Z

Hello @PicoCentauri! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-20 22:51:30 UTC

PicoCentauri · 2021-05-19T11:12:58Z

I added a test for the cli.
We should also add the header we put to the csv files to the json output file!

joaomcteixeira · 2021-05-19T15:38:41Z

@PicoCentauri
Currently, arrays above 3 dims are ignored and non-JSON-serializable results are also ignored. What do you think about saving higher-dimensional arrays to .npz and trying to pickle non-serializable objects?
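The pickle half of that suggestion could look like the sketch below: try JSON first and fall back to pickle only when serialization fails. The helper name `save_result` and the file naming are assumptions for illustration; the `.npz` branch for higher-dimensional arrays would need numpy and is left out here.

```python
import json
import pickle
from pathlib import Path


def save_result(name, value, directory="."):
    """Save one result as JSON if possible, otherwise as a pickle file.

    Hypothetical helper for illustration. Returns the path that was written.
    """
    try:
        text = json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference.
        path = Path(directory) / f"{name}.pickle"
        with open(path, "wb") as handle:
            pickle.dump(value, handle)
        return path

    path = Path(directory) / f"{name}.json"
    path.write_text(text)
    return path
```

Serializing to a string before opening the file means a failed JSON attempt leaves no partial `.json` file behind.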


joaomcteixeira mentioned this pull request Oct 20, 2020

Visit all Analysis run/result API #7

Closed

PicoCentauri force-pushed the save branch from 1fc4a85 to 58646cd Compare January 29, 2021 10:43

PicoCentauri force-pushed the save branch 2 times, most recently from c5718a9 to 8401823 Compare May 16, 2021 21:35

PicoCentauri mentioned this pull request May 17, 2021

move repo to MDAnalysis? #8

Closed

3 tasks

PicoCentauri added 4 commits May 18, 2021 11:47

Handle classes without save_results

616f24b

If an analysis class has no save_results function we simply dump a pickle file with the name of the class.

Use new function for saving results

5ce6546

Styling

6ce6c92

PEP8

7b8ffc1

PicoCentauri force-pushed the save branch from 8401823 to 7b8ffc1 Compare May 18, 2021 13:46

PicoCentauri and others added 3 commits May 18, 2021 15:52

Added MDAnalysisTests to requirements

e92501f

Merge branch 'main' into save

874563a

Changes place of functions

6738d36

Also adds execution command to save CSV. Only saves JSON if dict has length. All new code was provided by Philip. Co-authored-by: Philip Loche <ploche@physik.fu-berlin.de>

joaomcteixeira linked an issue May 18, 2021 that may be closed by this pull request

Visit all Analysis run/result API #7

Closed

Reworked tests

59cf692

joaomcteixeira added 4 commits May 19, 2021 18:03

Improves save_results

5de2101

Improves docstring General improvements cleaning Saves ndim > 3 to npz TODO: - some refact on save_results - some testing

Functional refactor of save.py

a8529f8

working version

9e66e6a

TODO: * lint * tests

mostly lint

7db32ae

PicoCentauri mentioned this pull request May 20, 2021

Keep logical order when saving results #32

Open

joaomcteixeira and others added 3 commits May 20, 2021 23:23

adds command to json

30ac627

Adds option to use Analysis class save method

b68bd18

before attempting mdacli saving engine. Co-authored-by: Philip Loche <ploche@physik.fu-berlin.de>

Improves array saving functions

fe6b28a

correct json saving command

joaomcteixeira mentioned this pull request May 20, 2021

Alter MDAnalsysi.base.AnanlysisBase #35

Open

joaomcteixeira mentioned this pull request May 20, 2021

Simplify analyze_data method #33

Closed

Merge pull request #29 from MDAnalysis/saverefact

fc988a2

Functional refactoring of save branch

PicoCentauri merged commit 6334ac4 into main May 20, 2021

PicoCentauri deleted the save branch May 20, 2021 22:56
