Skip to content

Latest commit

 

History

History
432 lines (352 loc) · 23.1 KB

DIP1042.md

File metadata and controls

432 lines (352 loc) · 23.1 KB

ProtoObject

Field Value
DIP: 1042
Review Count: 1
Authors: Robert Aron, Eduard Staniloiu, Razvan Nitu
Implementation:
Status: Withdrawn

Abstract

Every class defined in the D language has Object as the root ancestor. Object defines four methods: toString, toHash, opCmp, and opEquals. At a first glance, their presence might not appear problematic, but they do more harm than good. Their signatures predate the introduction of the @nogc, nothrow, pure, and @safe function attributes, and also of the const, immutable, and shared type qualifiers. As a consequence, these methods make it difficult to use Object with qualifiers or in code with properties such as @nogc, pure, or @safe. We propose the introduction of a new class, ProtoObject, as the root class and ancestor of Object. ProtoObject defines no methods and requires the user to implement the desired behaviour through interfaces. This approach enables users to opt-in for the behavior that makes sense for their classes, and the design is flexible enough to allow the use of future attributes and language improvements without breaking code.

Contents

Rationale

The current definition of Object is:

class Object
{
    private __ImplementationDefined __mutex;
    string toString();
    nothrow @trusted size_t toHash();
    int opCmp(Object o);
    bool opEquals(Object o);
    static Object factory(string classname);
}

This definition is missing a number of desirable attributes and qualifiers. It is possible to define an improved class that inherits Object and overrides these primitives as follows:

class ImprovedObject
{
const override pure @safe:
    string toString();
    nothrow size_t toHash();
    int opCmp(const Object);
    bool opEquals(const Object);
}

ImprovedObject may be defined in user code (or even in the core runtime library) and inherited by all user-defined classes in a project for a better experience. However, ImprovedObject still has a number of issues:

  • The hidden member __mutex, needed for synchronized sections of code, is still present whether it is used or not. The standard library uses synchronized for six class types out of the over seventy classes it introduces. At best, the mutex should be opt-in.
  • The toString method cannot be implemented meaningfully if it requires @nogc. This is because of its signature; constructing a string and returning it will often create garbage by necessity. A better implementation would accept an output range in the form of a delegate(scope const(char)[]) that accepts, in successive calls, the rendering of the object as a string.
  • The opCmp and opEquals objects need to take const Object parameters, not const ImprovedObject. This is because overriding with covariant parameters would be unsound and is therefore not allowed. Using the weaker type const Object in the signature defers checks to runtime that should be done during compilation.
  • Overriding opEquals requires the user to also override toHash accordingly; two objects that are equal must have the same hash value.
  • opCmp reveals an outdated design and implementation. Its presence was historically required by built-in associative arrays, which used binary trees dependent upon ordering. The current implementation of associative arrays uses hashtables that lift the requirement. In addition, not all objects can be meaningfully ordered, so the best approach is to make comparison opt-in. Ordering comparisons in other class-based languages are done by means of interfaces, e.g., Comparable<T> in Java or IComparable<T> in C#.
  • The static method factory is a global dependency sink because it allows creating an instance of any class in the application from a string containing its name. Currently, there is no way for a class to opt out. This feature creates code bloat in the generated executable. At best, class factory registration should be opt-in because only a small number of classes (none in the standard library) require the feature.
  • The current approach doesn't give the user the chance to opt in or out of certain functions (behaviors). There can be cases where the imposed methods don't make sense for the class typem, e.g., not all abstractions are of comparable types.
  • Because of the hidden __mutex member, and the fact that the D programming language supports function attributes, the design of Object is susceptible to the Fragile Base Class Problem. This states that a small and seemingly unrelated change in a base class can lead to bugs and breakages in derived classes.

To provide a real example of the problem, the following code compiles:

class C { int a; this(int) @safe {} }

void main()
{
    C c = new C(1);
    C[] a = [c, c, c];
    assert(a == [c, c, c]);
}

whereas the next section of code fails to compile with the message "incompatible types for array comparison: C[] and C[3]":

class C { int a; this(int) @safe {} }

@safe void main()
{
    C c = new C(1);
    C[] a = [c, c, c];
    assert(a == [c, c, c]);
}

It fails because the non-safe Object.opEquals method is called in a safe function. In fact, just comparing two classes with no user-defined opEquals, e.g., assert (c == c), will issue an error in @safe code: "@safe function D main cannot call @system function object.opEquals".

To make it work, a new root of all classes (in our case ProtoObject) and the Equals interface are needed, as well as a mixin template that provides the implementation for opEquals. The C class must inherit from ProtoObject and Equals, and it must instantiate the mixin template as a field:

class C : ProtoObject, Equals
{
    int a;
    this(int) @safe {}
    mixin ImplementEquals;
}

In druntime/object.d:

class ProtoObject {  }

interface Equals
{
    const @nogc nothrow pure @safe scope
    int equals(scope const ProtoObject rhs);
}

mixin template ImplementEquals(M...)
{
    // Equals mixin template implementation
}
...

The final points are of crucial importance:

  • Backward compatibility is a must. The introduction of ProtoObject must not break code.
  • The user must be allowed to chose which methods to implement.
  • Root objects must work in attributed code without issues. Since we can't predict the future and know which, if any, attributes and qualifiers will be available in the language, this is yet another argument to have a ProtoObject with no methods.

Prior Work

Both Java and C# have an Object class as the superclass of all classes. For both languages, Object defines a set of default methods, with Java baking into Object much more than C# does.

Java

Java's java.lang.Object defines the following methods:

Method name Description
boolean equals(Object o); Provides a generic way to compare objects.
Class getClass(); The Class class provides more information about the object.
int hashCode(); Returns a hash value that is used to search objects in a collection.
void notify(); Used in synchronizing threads.
void notifyAll(); Used in synchronizing threads.
String toString(); Can be used to convert the object to String.
void wait(); Used in synchronizing threads.
protected Object clone() throws CloneNotSupportedException; Return a new object that is identical to the current object.
protected void finalize() throws Throwable; Called just before an object is garbage collected.

Each object also has an Object monitor that is used in Java synchronized sections. Basically, it is a semaphore indicating if a thread has entered a critical section. Before a critical section can be entered, the thread must obtain an Object monitor. Only one thread at a time can own that object's monitor.

We can see that there are quite a few similarites between D and Java:

  • both define opEquals and getHash;
  • getClass is similar to typeid(obj);
  • clone gives us access to a Prototype like creation pattern; in D we have the Factory Method;
  • both have an Object monitor.

C#

C#'s System.Object defines the following methods that will be inherited by every C# class:

Method name Description
GetHashCode() Retrieves a number unique to that object.
GetType() Retrieves information about the object like method names, the objects name, etc.
ToString() Converts the object to a textual representation, usually for outputting to the screen or file.

C# stripped a lot from its Object, comparint to Java and comes much closer to what we desire:

  • Firstly, there is no builtin monitor. The Monitor object is implemented as a separate class that inherits from Object in the System.Threading namespace.
  • Secondly, C# has a smaller number of imposed methods, but they are still imposed, and toString will continue to be GC-dependent.

Rust

Rust has a completely different approach: data aggregates (structs, enums, and tuples) are all unrelated types. A developer can define methods on them and make the data itself private, all the usual tactics of encapsulation, but there is no subtyping and no inheritance of data.

The relationships between various data types in Rust are established using traits. Rust's Traits are like interfaces in OOP languages, and they must be implemented, using impl, for the desired type.

trait Quack {
    fn quack(&self);
}

struct Duck ();

impl Quack for Duck {
    fn quack(&self) {
        println!("quack!");
    }
}

After looking at how Java, C#, and Rust have tackled the same problem, we are confident that our proposed solution is a good one: define an empty root object and define interfaces that expose the desired behavior of the classes that implement them. It can be argued that one is not really interested in what type an object is, but rather if it can perform a certain action (has a certain behavior). A key requirement in the design is that existing code must continue to work; new code should be able to use ProtoObject and reap the benefits.

Description

We propose a simple, economic, and backward-compatible approach for code to avoid Object: define simpler types as supertypes of Object as follows:

class ProtoObject
{
    // no monitor, no primitives
}
class SynchronizedProtoObject : ProtoObject
{
    private __ImplementationDefined __mutex;
}
class Object : SynchronizedProtoObject
{
    ... definition unchanged ...
}

This reconfiguration makes ProtoObject, not Object, the ultimate root of all classes. Classes that don't specify a base still inherit Object by default, thus preserving backward compatibility. But new code has the option (and will be well advised) to use ProtoObject as an explicit base class.

This proposal is based on the following key insight. Currently, Object has two roles: (a) the root of all classes, and (b) the default supertype of class definitions that don't specify one. There is no requirement that these two roles are fulfilled by the same type. This proposal maintains Object as the default supertype in class definitions, thereby preserving the behavior of existing code. The additional supertypes of Object only affect newly-introduced code that is aware of them and uses them.

The recommended approach to go forward with types that inherit from ProtoObject is to write and implement interfaces that expose the desired behavior for the types that it supports. On the basis that one is not really interested in what type an object is, but rather which actions it can perform (what types can it act like?), an object of type T can be treated as a Collection, a Range, or a Key in a map, provided that it implements the correct interfaces.

The GoF Design Patterns book talks at length about preferring implementing interfaces to inheriting from concrete classes and why we should favor object composition over class inheritance. Extending from concrete classes is usually seen as a form of code reuse, but this is easly overused by programmers and can lead to the fragile base class problem. The same code reuse can be achieved through composition and delegation schemes; we already do this with structs and Design by Introspection.

Users will be required to implement specific interfaces that define methods corresponding to those in Object:

Interface Method name
Stringify string toString();
Hash size_t toHash();
Ordered int opCmp(const Object);
Equals bool opEquals(const Object);

Ordering

To enable comparison of two objects, we define the Ordered interface.

interface Ordered
{
    const @nogc nothrow pure @safe scope
    int cmp(scope const ProtoObject rhs);
}

The cmp function takes a ProtoObject argument. This is required so we can compare two instances of ProtoObject. Let's see an example:

int __cmp(ProtoObject p1, ProtoObject p2)
{
    if (p1 is p2) return 0;
    Ordered o1 = cast(Ordered) p1;
    if (o1 is null) return -1; // we can't compare
    return o1.cmp(p2);
}

As the example shows, if we can't dynamic cast to Ordered, then we can't compare the instances. Otherwise, we can safely call the cmp function and let dynamic dispatch do the rest.

Any other class, class T, that desires to be comparable, needs to extend ProtoObject and implement Ordered.

class Book : ProtoObject, Ordered
{
    enum BookFormat { pdf, epub, paperback, hardcover }

    ulong isbn;
    BookFormat format;

    int cmp(scope const ProtoObject rhs)
    {
        auto rhsBook = cast(Book) rhs;
        if (rhs is null) return 1;
        return this.isbn - rhs.isbn;
    }
}

Ordered implies that implementing types form a total order. An order is a total order if it is (for all a, b and c):

  • total and antisymmetric: exactly one of a < b, a == b, or a > b is true; and
  • transitive, a < b and b < c implies a < c. The same must hold for both == and >.

As one can expect, most of the classes that implement Ordered will require some boilerplate code. Inspired by Rust's derive, we have the ImplementOrdered(M...) template mixin. This provides the basic implementation for Ordered and more.

In it's most basic form, ImplementOrdered will go through all the members of the implementing type and compare them with rhs:

@safe unittest
{
    class Book : ProtoObject, Ordered
    {
        mixin ImplementOrdered;
        enum BookFormat { pdf, epub, paperback, hardcover }

        ulong isbn;
        BookFormat format;

        this(ulong isbn, BookFormat format)
        {
            this.isbn = isbn;
            this.format = format;
        }
    }

    auto b1 = new Book(12345, Book.BookFormat.pdf);
    auto b2 = new Book(12345, Book.BookFormat.paperback);
    assert(b1.cmp(b2) != 0);
}

In the case above, comparing the two books, b1 and b2, won't result in an equivalence order (cmp -> 0), even if they should, as a book is identified by isbn regardless of its format. This is because ImplementOrdered, by default, will go through all the members of the class and compare them. This is why ImplementOrdered allows you to specify only the members that you wish to compare.

@safe unittest
{
    class Book : ProtoObject, Ordered
    {
        mixin ImplementOrdered!("x");
        enum BookFormat { pdf, epub, paperback, hardcover }

        ulong isbn;
        BookFormat format;

        this(ulong isbn, BookFormat format)
        {
            this.isbn = isbn;
            this.format = format;
        }
    }

    auto b1 = new Book(12345, Book.BookFormat.pdf);
    auto b2 = new Book(12345, Book.BookFormat.paperback);
    assert(b1.cmp(b2) == 0);
}

To maximize flexibility, ImplementOrdered also accepts a comparator function to apply to each member:

@safe unittest
{
    class Widget : ProtoObject, Ordered
    {
        mixin ImplementOrdered!("x", "y", (int a, int b) => a - b, "z", (int a, int b) => a - b + 1, "t");
        int x, y, z, t;
        /* code */
    }
}

There are cases where it is not desirable to implement the default comparison operation every field of the aggregate. In such cases, especially if there are a lot of member fields, it would be unpleasant to pass ImplementOrdered the name of all the fields we want to take into account while comparing; it would be cleaner and easier to inform it about the exceptions. This is why we define ImplementOrderedExcept.

@safe unittest
{
    class Widget : ProtoObject, Ordered
    {
        mixin ImplementOrderedExcept!("unorderableField");
        int x, y, z, t;
        size_t unorderableField;
        /* code */
    }
}

Equality

Checking for equality follows in the footsteps of ordering and requires the user to implement the Equals interface.

interface Equals
{
    const @nogc nothrow pure @safe scope
    int equals(scope const ProtoObject rhs);
}

Again, as is the case with ordering, rhs is a ProtoObject, so we can treat the worst-case scenario: compare two objects and all we know is that they are ProtoObjects.

bool __equals(ProtoObject p1, ProtoObject p2)
{
    if (p1 is p2) return true;
    Equals o1 = cast(Equals) p1;
    Equals o2 = cast(Equals) p2;
    if (o1 is null || o2 is null) return false;
    return o1.equals(o2) && o2.equals(o1);
}

An important note: it isn't sufficient for o1 to say it's equal to o2; o2 must also agree. Think about comparing a Base and a Derived instance. The base could be equal to the derived, but the derived might have some extra data so it won't be equal to the base instance.

As is the case with Ordered, to aid the user with boilerplate, we provide ImplementEquals and ImplementEqualsExcept that respect the same design. "Consistency, consistency, consistency" (Scott Meyers).

Hashing

The user must implement the Hash interface if to provide hashable behavior.

interface Hash
{
    const @nogc nothrow pure @safe scope
    size_t toHash();
}

Again, we provide the user with default implementations for a hashing function in the form of ImplementHash and ImplementHashExcept.

Implementations of Ordered, Equals, and Hash must agree with each other. That is, a.cmp(b) == 0 if and only if (a == b) && (a.toHash == b.toHash). It's easy to accidentally make them disagree by mixing in some of the interface implementations and manually implementing others.

Breaking Changes and Deprecations

No breaking changes are anticipated because this proposal provides an alternative to, not a complete redesign of, the default class hierarchy.

Acknowledgments

This DIP is based upon the idea proposed by Andrei Alexandrescu and previously worked on by Eduard Staniloiu.

Reference

Eduard Staniloiu on the default class hierarchy

Copyright & License

Copyright (c) 2022 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

Reviews

Community Review Round 1

Reviewed Version

Discussion

Feedback

The following items were raised in the feedback thread:

  • Because Object is no longer guaranteed to be a supertype of all class instances, this does not maintain backwards compatibility and can cause silent breakage when a library maintainer switches an API returning Object to return ProtoObject. It would be better to change Object to be the empty top type and the default superclass, as there would then be less silent breakage. The DIP author replied that it's up to the library maintainer to advertise the breaking change in the library's API and to release a new major version. Another commented suggested this potential breakage in third-party libraries should be noted in the DIP. The DIP author believes this case is exaggerated.
  • cmp should not return -1 when it can't compare class instances.
  • The DIP implies comparison order is total, but it really isn't. As an example, consider the comparison of two ProtoObject instances that do not implement Ordered, and at least one of which is not null.
  • @nogc and nothrow are overly strong for a comparator.
  • toHash should take a seed value.
  • An alternative approach is provided in a pull request for @safe class comparisons. The DIP author responded that the ProtoObject approach is cleaner and more flexible since it is opt-in, requires no deprecations, and causes no breakage.
  • The DIP does not mentions Stringify but does not include it.
  • The Implement* mixins are orthogonal to this proposal.
  • The DIP does not address attributes on ~this.
  • The DIP should discuss the alternative of putting the burden on the compiler rather than on the programmer, e.g., the compiler can remove the monitor from an object instance if it is unused, rather than requiring the programmer to opt-in to using it.
  • ProtoObject as the root class would deviate from the norm of Object root in other OO languages, which will confuse and annoy some programmers.
  • More motivation is required for having more than one root.
  • Interfaces have an overhead of 8-bytes per interface per instance and should be avoided.
  • The list of popular languages in the Prior Work section should include academic languages.
  • The DIP should include a discussion of adding a reference count and how it can be merged with a monitor in a single field.