Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement private variables, getter/setters and methods using Symbols #684

Closed
Arnavion opened this issue Sep 17, 2014 · 8 comments
Closed
Labels
Declined The issue was declined as something which matches the TypeScript vision Suggestion An idea for TypeScript

Comments

@Arnavion
Copy link
Contributor

Compiler interface changes

  • New compiler flag: -symbolForPrivates

Codegen example

TS source (unchanged)

class Clazz {
    // Private member var
    private _var1: number;

    // Private member var with default value assigned in constructor
    private _var2: string = "defaultValue";

    // Constructor with two more private member var declarations, one with a default value
    constructor(private _var3: number, private _var4: string = "defaultValue") {
    }

    // Private method
    private _method_p(): number {
        return 5;
    }

    method(): number {
        // Accessing and calling private method
        return this._method_p();
    }

    // Private getter
    private get property_p(): number {
        // Access private member var
        return this._var1;
    }

    get var1(): number {
        // Access private getter
        return this._property_p;
    }

    static method(): number {
        return Clazz._method_p();
    }

    // Private static method
    private static _method_p(): number {
        return 6;
    }
}

JS output

// One per file. See footnote 1 below.
var __symbol =
    this.__symbol ||
    (1, eval)("this").Symbol ||
    function(name) { return name; };

var Clazz = (function () {
    // One Symbol for each private variable / property name / method
    // Symbol name is "__symbol_" + memberName and the original member name is passed as the parameter to Symbol() for easier debugging.
    var __symbol__var1 = __symbol("_var1");
    var __symbol__var2 = __symbol("_var2");
    var __symbol__var3 = __symbol("_var3");
    var __symbol__var4 = __symbol("_var4");
    var __symbol__method_p = __symbol("_method_p");
    var __symbol__property_p = __symbol("_property_p");
    var __symbolstatic__method_p = __symbol("_method_p"); // Statics prepend "__symbolstatic_" to the name but still have the original name as the parameter to Symbol().

    function Clazz(_var3, _var4) {
        if (_var4 === void 0) {
            _var4 = "defaultValue";
        }

        this[__symbol__var2] = "defaultValue";
        this[__symbol__var3] = _var3;
        this[__symbol__var4] = _var4;
    }

    Clazz.prototype[__symbol__method_p] = function () {
        return 5;
    };

    Clazz.prototype.method = function () {
        return this[__symbol__method_p]();
    };

    Object.defineProperty(Clazz.prototype, __symbol__property_p, {
        get: function () {
            return this[__symbol__var1];
        },
        enumerable: true,
        configurable: true
    });

    Object.defineProperty(Clazz.prototype, "var1", {
        get: function () {
            return this[__symbol__property_p];
        },
        enumerable: true,
        configurable: true
    });

    Clazz.method = function () {
        return Clazz[__symbolstatic__method_p]();
    };

    Clazz[__symbolstatic__method_p] = function () {
        return 5;
    };

    return Clazz;
})();

Footnotes

  1. __symbol

    This is a wrapper around runtime's implementation of Symbol if it has one (global.Symbol, where global is found using eval), or a function that just returns the name string if it doesn't. For example, in a runtime that doesn't have Symbol, __symbol("_var1") will become "_var1", which is functionally identical to the current TS codegen.

    I based the definition of __symbol around __extends, but it has these differences:

    • __extends is emitted once per file if there are any classes in the current file that have an extends clause. __symbol is emitted once per file if it has any classes that have a private member var / getter / setter / method. Perhaps emit it once per file if there are any classes at all, regardless of whether they have private members or not.
    • Unlike __extends, __symbol delegates to global.Symbol if available.
  2. The variables declared to hold the Symbols are named as "__symbol_" + originalMemberName for instance members, and "__symbolstatic_" + originalMemberName for static members. They are defined in the class closure, so their names are not visible in an outer scope. They are also only used at the class closure scope, so their names are visible but do not collide with any same-named variables inside constructor / method / getter / setter bodies.

  3. Of inheritance and same names:

    Derived
    private x: numberprivate get x(): numberprivate x(): numberpublic x: numberpublic get x(): numberpublic x(): number
    Baseprivate x: numberPrevented by the type-checker but can be allowed with symbols. this[__symbol_x] in a base class scope will still refer to Base.x and in a derived class scope will refer to Derived.x
    private get x(): number
    private x(): number
    public x: numberPrevented by the type-checker but can be allowed with symbols. this[__symbol_x] in a derived class scope will refer to Derived.x However it's probably too confusing to allow this.N/A
    public get x(): number
    public x(): number
    Derived
    private static x: numberprivate static get x(): numberprivate static x(): numberpublic static x: numberpublic static get x(): numberpublic static x(): number
    Baseprivate static x: numberPrevented by the type-checker but can be allowed with symbols. Base[__symbol_x] in a base class scope will refer to Base.x and Derived[__symbol_x] in a derived class scope will refer to Derived.x
    private static get x(): number
    private static x(): number
    public static x: numberPrevented by the type-checker but can be allowed with symbols. Derived[__symbol_x] will refer to Derived.x However it's probably too confusing to allow this.N/A
    public static get x(): number
    public static x(): number

Changes in interop with existing TS code / with existing JS libraries with .d.ts files

  • This is a whole program change, so everything that has private members or private member access will need to be recompiled. This doesn't affect third-party JS libraries that provide .d.ts files because .d.ts don't contain private members. But any library that's provided as TS source for development but deployed as separately-compiled JS source will break unless it's recompiled with the same switch.

    For example, suppose a hypothetical library L that provides an L.js file and an L.d.ts file. In this case there's no problem, because L.d.ts will not contain any private member declarations. Only classes defined in foo.ts will have private members and private member access using Symbol.

    But suppose the library provided an L.ts source instead of L.d.ts, and also a pre-compiled L.js that they compiled themselves (for example, they might provide a minified L.min.js, expecting the user to develop their code against L.ts but deploy the official L.min.js to their website). This L.ts source would contain private members. Then the user must make sure they compile L.js themselves and use that, not use the L.js provided by the official source. On second thought, since privates cannot escape the class scope and classes can't be split across files as of now, this isn't a problem. This will however be a problem if open classes are implemented in the future (i.e., all files that contain parts of an open class must be compiled with the same value of the switch).

  • As a result of this change, the following will have different behavior:

    • The results of Object's introspection functions, like getOwnPropertyNames, keys, etc.
    • Private statics will not be available on derived classes, because the static-copying code in __extends (for key in base, derived[key] = base[key]) will not enumerate symbol properties.

Open questions

  • Is it needed to test for this.__symbol first in the declaration of __symbol?
  • Is there a better way to get the global Symbol object for the definition of __symbol ? The presence of eval will cause the runtime to disable optimization for the whole scope, which is the whole file.
  • This will make accessing private members by bypassing the type-checker impossible difficult - (<any>obj)._privateMember will no longer work. One can try to hack around it by using Object.getOwnPropertySymbols - (<any>obj)[Object.getOwnPropertySymbols(obj).filter(function (symbol) { return symbol.toString() === "Symbol(_privateMember)"; })[0]] (obj.__proto__ for methods, getter and setters) but relying on Symbol.toString() this way is undefined behavior. (Relying on the output of Symbol.toString() is defined behavior.) Should there be an escape hatch? A way to disable this rewrite for some files or some classes?
  • Are there any concerns for inheritance? I don't think there are any right now, because sub-classes can't access private members. But what about if protected is implemented in the future? Would it use named access like publics / current privates? Or would there be some way to share symbols outside of the base class closure to derived class closures? If the latter, how will they be exposed in a way that does not collide with user-defined names (unlike footnote 2).
  • I'm not aware of any other TS features that change otherwise syntactivally valid JS code (this._method()) to something completely different (this[__symbol__method]()) based on the result of a type evaluation. In that respect, perhaps it'd be better to have the user explicitly write out the code that creates Symbols and indexes using them. However that would then require the compiler to understand Symbol specially so that it can provide type information.
@RyanCavanaugh
Copy link
Member

🌟 🏆 for the exceptionally complete write-up and accompanying PR.

Showing emit for static members might be useful. Note that instance and static members of the same name don't conflict -- should they re-use the same symbol variable if that happens?

Re: making access of private members impossible, that seems more like a feature than a deficit. It's certainly what a lot of people have been asking for.

@Arnavion
Copy link
Contributor Author

D'oh. I did test this on code that has private statics to make sure they were handled, but I didn't think about a static and non-static with the same name.

Technically, re-using the same name won't hurt - the TS source still has its own type-check pass that can distinguish between a reference to the static member and a reference to the instance member, so even if the final JS ends up using the same symbol it won't hurt. But since tsc is in complete control of the codegen anyway, it might as well generate a new name for it. I've updated the OP.

@Arnavion
Copy link
Contributor Author

I struck out the part about interop breaking if parts of the final JS were compiled with different values of the switch. This isn't a problem right now because all access to privates must be within the same class, and a class cannot be split across files (as of now).

If open classes are implemented in the future, then this change will require that all files that contain parts of the same class be compiled with the same value of the switch.

@NoelAbrahams
Copy link

👍 for clarity!

A few practical considerations:

  • Class instantiation is about 60% slower when using symbols.
  • Property and method access is about 90% slower when using symbols.
  • The codegen when using symbols is not minification-friendly; even advanced optimisation will not minify string literals such as __symbol("_property_p");

So in a runtime that doesn't have a native Symbol implementation, this is rather costly for no gain (because the codegen has the same behaviour as the non-symbol case). Even if a browser does support Symbol it's not clear that the problems listed above will disappear. In short, I'm not sure that TypeScript should implement private using symbols.

@Arnavion
Copy link
Contributor Author

Actually symbols will minify better. With the exception of Google Closure Compiler, I believe all minifiers do not rename private properties because there is no way to know when a property is private or being accessed via a string indexer. (GCC is able to do so in advanced mode because it is a whole program optimizer and so treats everything as rename-able.)

With symbols, the indexer is a variable var __symbol__property_p = __symbol("_property_p"); which would be minified to, say, var a = __symbol("_property_p");. Then all instances of this[__symbol__property_p] will become this[a], etc. Without symbols, this._property_p would remain this._property_p everywhere it's used.

Points about the perf impact are completely valid.

Edit: To clarify, the slow-down comes not from the use of Symbols but from the use of a variable indexer to access properties instead of dot-notation or a literal indexer. I edited @NoelAbrahams 's instantiation test into http://jsperf.com/privates-using-symbols-vs-current-codegen/5 to strip out everything except the constructor and the symbols it uses. Perf impact was unchanged.

I then added two more classes, ClazzWithIndexer that uses string indexers to set the properties, and ClazzWithIndirection that uses variable indexers to set the properties. This is similar to @NoelAbrahams 's test except that it doesn't even involve the call to __symbol to generate the name. Perf for ClazzCurrent and ClazzWithIndexer is identically excellent. Perf for ClazzWithIndirection and ClazzWithSymbols is identically terrible.

I did verify in V8 profiler (using %HasFastProperties and --trace-opt --trace-deopt) that the objects created from the Indirection and Symbols classes were not being put in slow-mode and that the constructor functions were optimized, so that's not the reason for the slow-down. It looks like the prop access becomes a table lookup as opposed to a pointer offset.

I've opened a thread on v8 mailing list about it.

@Arnavion
Copy link
Contributor Author

  • Added a footnote that the hack to work around the type-checker using Object.getOwnPropertySymbols is actually not undefined, because Symbol.toString()'s output is specified by the spec.
  • Added a footnote and tables about the behavior with inheritance and members of the same name in the base and derived class.
  • Added a point under interop about statics and inheritance. Before symbols, the inheritance shim would copy over all static members to the derived class. With symbols, it will not copy private static members because symbol properties are not enumerated by for-in.

@NoelAbrahams
Copy link

@Arnavion ,

I believe all minifiers do not rename private properties because there is no way to know when a property is private or being accessed via a string indexer.

Yes, that's true, but that's what we're hoping that TypeScript will fix. See #8.

@danquirk
Copy link
Member

Closing this given previous discussion and where we are with Symbols in general. @Arnavion had a nice PR for this linked earlier where upon further inspection we all agreed:

The general consensus is to wait and see how classes and symbols get used together.

Depending on how the future shapes up we can revisit this or start a new proposal based on what we learn.

@danquirk danquirk added Declined The issue was declined as something which matches the TypeScript vision and removed In Discussion Not yet reached consensus labels Apr 21, 2015
@microsoft microsoft locked and limited conversation to collaborators Jun 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Declined The issue was declined as something which matches the TypeScript vision Suggestion An idea for TypeScript
Projects
None yet
Development

No branches or pull requests

4 participants