Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistence of class membership #2

Closed
wlandau opened this issue May 11, 2020 · 6 comments
Closed

Persistence of class membership #2

wlandau opened this issue May 11, 2020 · 6 comments

Comments

@wlandau
Copy link
Member

wlandau commented May 11, 2020

While I mostly agree that extending S3 is reasonable, I am a bit concerned about the fluidity of the class attribute for our purposes. Specifically,

  1. Ad hoc assignment is possible with class<-() etc.
  2. Attributes tend to get dropped from objects at surprising times.

Do we want safeguards and formalization around class membership? Maybe formal construction/casting/coercion in addition to the already proposed validation?

@lawremi
Copy link
Collaborator

lawremi commented May 12, 2020

Yes, I think we should have formal construction and coercion, and direct class assignment should be discouraged if not disallowed. Would you be willing to make a PR adding this as a requirement?

@hadley
Copy link
Member

hadley commented May 13, 2020

I do think we need to be a little cautious here — while class<- is generally best avoided, I don't think it actually causes many problems in practice, and it's v occasionally useful in the hands of an expert.

@wlandau I think you need to be more specific about "attributes being dropped at surprising times". I think most problems are caused by the specific design of the generic/method, but it's hard to know without knowing exactly what you consider to be surprising.

@hadley
Copy link
Member

hadley commented May 13, 2020

However, there is a general principle to consider: what should methods do with attributes that they don't know about? These attributes could come from a subclass (and has not defined a more specific method), or could be set by the analyst to store some additional metadata. I think there are only two main choices:

  • Should you preserve all unknown attributes? The primary downside is that if the attribute is vectorised (i.e. its size depends on of the size of the underlying vector), you may create a corrupt object.

  • Should you drop all unknown attributes? This ensures that you always create valid objects, but means that any user supplied attributes will be dropped.

Based on our experiences with the tidyverse, dropping unknown attributes appears to cause quite a lot of pain in the user community, and in practice it appears that relatively few attributes are vectorised so object corruption is relatively rare (and can be fixed by the class maintainer defining a new method).

Closely related is the problem of handling subclasses in methods, as discussed in https://adv-r.hadley.nz/s3.html#s3-subclassing — in essence, if you want your class to subclassable, then you have to ensure every method you write uses a generic object constructor. Otherwise the implementor of the subclass has to provide methods for every method that the class provides, meaning that inheritance doesn't save you any work.

@lawremi
Copy link
Collaborator

lawremi commented May 13, 2020

It's not clear where class<-() would be needed if we had a robust coercion mechanism.

With respect to attribute dropping, in a formal class-based framework, the question does boil down to subclass preservation.

In Bioconductor, we've long promoted calling initialize() when performing simple transformations. By default (how we use it), initialize() will take an existing object and only replace the specified slots, so the class is preserved. The Vector class in S4Vectors allows subclasses to register vectorized slots that are handled appropriately across stuff like subsetting. In contrast, destructive operations like aggregation do not attempt to preserve the subclass.

This is a problem for every system that allows for inheritance. Inheritance is a strong form of coupling; in general the subclass must be aware of every super class operation and whether it would violate the contract of the subclass. Perhaps the requirement here is that we should provide tools like initialize() that make it easier to implement the appropriate behavior, which depends on the context.

@wlandau
Copy link
Member Author

wlandau commented May 17, 2020

Yes, I think we should have formal construction and coercion, and direct class assignment should be discouraged if not disallowed. Would you be willing to make a PR adding this as a requirement?

Sure, thanks @lawremi. I will try to get to it soon. Last week was super busy.

I think you need to be more specific about "attributes being dropped at surprising times". I think most problems are caused by the specific design of the generic/method...

@hadley, I was originally thinking of the behavior of c() mentioned here, as well as gotchas like this one. Indeed, the generic is responsible for attribute dropping in both cases, as the docs explicitly mention. (And as an aside, I love how vctrs helps us avoid some of these pitfalls.) But it concerns me that class is thrown out like the rest, mainly because it silently allows forms of coercion that the author of the class did not declare and may not have considered. I feel that the type of an object should be less brittle than an attribute.

@wlandau
Copy link
Member Author

wlandau commented May 17, 2020

However, there is a general principle to consider: what should methods do with attributes that they don't know about?

I wonder, could the new system allow the class author to decide on a case by case basis? I agree, sometimes corrupt objects result from keeping unknown attributes. At other times, all the user wants to do with the subclass is to add new plot() and summary() methods without interfering with the object itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants