Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent storage of times and frames attributes in AnalysisBase classes #4617

Open
edisj opened this issue Jun 17, 2024 · 6 comments
Open
Labels
Component-Analysis deprecation Deprecated functionality to give advance warning for API changes.
Milestone

Comments

@edisj
Copy link
Contributor

edisj commented Jun 17, 2024

I'm pretty new to using AnalysisBase to write my own custom analysis classes, and I found it confusing that frames and times are handled differently in different classes.

Using the NewAnalysis example in the docs (https://docs.mdanalysis.org/stable/documentation_pages/analysis/base.html#MDAnalysis.analysis.base.AnalysisBase),

if you run NewAnalysis.run(), frames and times will be in NewAnalysis.frames and NewAnalysis.times.

If you use AnalysisFromFunction, then frames and times will be in results.frames and results.times, even though they both use AnalysisBase

Is this intended?

Expected behavior

I expected to find frames and times in results.frames and results.times in both cases

Actual behavior

results.frames throws AttributeError: 'Results' object has no attribute 'frames' when inheriting from AnalysisBase, but not when using AnalysisFromFunction

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)")
    2.7.0
  • Which version of Python (python -V)?
    3.11.9
  • Which operating system?
    ubuntu 22.04
@RMeli
Copy link
Member

RMeli commented Jun 17, 2024

Hi @edisj, times and frames are not strictly speaking results, so they are not part of the results attribute in AnalysisBase, but they are their own attribute.

In the AnalysisFromFunction they are also their own attribute, that you can access as usual:

rog = AnalysisFromFunction(radgyr, u.trajectory,
                           protein, protein.masses,
                           total_mass=np.sum(protein.masses))
rog.run()

print(rog.times, rog.frames)

So the .times and .frames attributes are consistent between the two classes.

However, in the AnalysisFromFunction, these attributes appear to be added to the .results attribute as well:

self.results.frames = self.frames
self.results.times = self.times

I agree this inconsistency might be confusing.

I expected to find frames and times in results.frames and results.times in both cases

Since this is your expectation, would you find it better to have them in results also for AnalysisBase?

@edisj
Copy link
Contributor Author

edisj commented Jun 19, 2024

Hi @RMeli, thanks for the clarification! I ended up just adding them in my conclude method like this and it works exactly how I wanted

def _conclude(self):
    self.results.frames = self.frames 
    self.results.times = self.times 

Since this is your expectation, would you find it better to have them in results also for AnalysisBase?

Yes, personally I think .times and .frames should be part of the results

what I naturally think "times" and "frames" mean in the context of an analysis are the time values and frame numbers that correspond to the results of the analysis, so in some sense they are kind of like implicit results

The documentation from AnalysisFromFrunction:

Attributes
----------
results.frames : numpy.ndarray
    simulation frames used in analysis
results.times : numpy.ndarray
    simulation times used in analysis

is more in line with how I think it should behave

versus AnalysisBase:

Attributes
----------
times: numpy.ndarray
    array of Timestep times. Only exists after calling
    :meth:`AnalysisBase.run`
frames: numpy.ndarray
    array of Timestep frame indices. Only exists after calling
    :meth:`AnalysisBase.run`

What do you think?

@orbeckst
Copy link
Member

orbeckst commented Jul 9, 2024

I found it confusing that AnalysisBase AnalysisFromFunction added times and frames to results. I am very much used to the idea that any analysis class has times and frames as these are "not results" (even though it may not make a ton of sense for any class that aggregates such as RMSF, Density or similar).

Removing times and frames from AnalysisBase is a huge break. Adding them to AnalysisBase.results is easy but the redundancy feels dirty.

It would be interesting to hear other opinions.

@PicoCentauri you have looked at AnalysisBase a lot – any opinions?

@marinegor are there any implications for parallel analysis?

EDIT: I had confused AnalysisBase and AnalysisFromFunction

@IAlibay
Copy link
Member

IAlibay commented Jul 10, 2024

found it confusing that AnalysisBase added times and frames to results.

@orbeckst from my understanding it's the other way around? AnalysisBase doesn't have times and frames is results but AnalysisFromFunction does?

I think my 2 cents here is that AnalysisFromFunction is wrong - times and frames isn't a results

@marinegor
Copy link
Contributor

@marinegor are there any implications for parallel analysis?

Probably not, though if frames aren't numpy ndarrays, it could influence serialization.

But in general, I also oppose adding them to results since they aren't results, and were there even before the run started.

@orbeckst
Copy link
Member

The majority opinion appears to be to deprecate results.times and results.frames as produced by AnalysisFromFunction and then remove for 3.0 for consistency.

@orbeckst orbeckst added the deprecation Deprecated functionality to give advance warning for API changes. label Jul 10, 2024
@IAlibay IAlibay added this to the Release 3.0 milestone Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component-Analysis deprecation Deprecated functionality to give advance warning for API changes.
Projects
None yet
Development

No branches or pull requests

5 participants