Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created a branded type for identifier-escaped strings #16915

Merged
merged 26 commits into from
Jul 6, 2017

Conversation

weswigham
Copy link
Member

@weswigham weswigham commented Jul 3, 2017

Then flowed it throughout the compiler, finding and fixing a handful of bugs relating to underscore-prefixed identifiers in the process. Includes a test for two (new) cases noticed - diagnostics from conflicting symbols from export *'s, and enums with underscore prefixed member emit.

I mentioned this while talking among the team last week. This can be done in place of/alongside #16868.

Additionally, this fixes #15334, #14268, #11902, and #3268. (Each should also now exist as a test case in this PR, too)

The shape of this brand is also rather unique compared to others we've used. Instead of just an intersection of a string and an object, it is that union-ed with an intersection of void and an object. This makes it wholly incompatible with a normal string (which is good, it cannot be misused on assignment or on usage), while still being comparable with a normal string via === (also good) and castable from a string.

Summary:

__String is used to represent a string whose leading underscore have been
escaped (if present) or a string representing an internal compiler symbol name,
such as "__call". To escape a string in this way, call escapeLeadingUnderscores.
To unescape such a string, call unescapeLeadingUnderscores. Strings of this kind
are used to represent symbol names and identifier text, to allow for internal symbol
names to cohabitate the same symbol table as normal members without being in conflict.
The brand on this type is structured such that it is castable and comparable with
normal strings, but a normal string cannot be used in its place and it can not
be used in place of a normal string (to prevent escaped text from leaking out via
diagnostic messages and the like). The union member enabling this is the member intersected
with void, which makes it incompatible with anything asking for a string.

Then flowed it throughout the compiler, finding and fixing a handful of
bugs relating to underscore-prefixed identifiers in the process.
Includes a test for two cases noticed - diagnostics from conflicting
symbols from export *'s, and enum with underscore prefixed member emit.
@weswigham weswigham changed the title Created a branded type for escaped strings Created a branded type for identifier-escaped strings Jul 3, 2017
@weswigham
Copy link
Member Author

cc @sandersn and @andy-ms - you're both assigned to some of the issues this PR fixes.

@@ -147,7 +147,7 @@ namespace ts {
options = opts;
languageVersion = getEmitScriptTarget(options);
inStrictMode = bindInStrictMode(file, opts);
classifiableNames = createMap<string>();
classifiableNames = createMap<EscapedIdentifier>() as EscapedIdentifierMap<EscapedIdentifier>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use createEscapedMap as this occurs many times.

@@ -3005,7 +3005,24 @@ namespace ts {
isRestParameter?: boolean;
}

export type SymbolTable = Map<Symbol>;
export type EscapedIdentifier = (string & { __escapedIdentifier: void }) | (void & { __escapedIdentifier: void });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the union for?
Also, would name this EscapedString since it's not an escaped version of Identifier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm, the name's actually a little hard, since a plain escaped string I would expect to be a string where " has been replaced with \" to escape it (which we do! elsewhere!). I called it EscapedIdentifier at first since Identifier was the only documented place where we stored strings escaped this way - except that was by no means true. From identifiers, it leaked into symbol names (which was desirable, based on use), and from there, even string literals were (sometimes and inconsistently) escaped to be compatible with symbol names. All the
inconsistency is gone now, but the name definitely no longer makes sense. What about UnderscoreEscapedString?

Copy link
Member Author

@weswigham weswigham Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, re: the shape:

The shape of this brand is also rather unique compared to others we've used. Instead of just an intersection of a string and an object, it is that union-ed with an intersection of void and an object. This makes it wholly incompatible with a normal string (which is good, it cannot be misused on assignment or on usage), while still being comparable with a normal string via === (also good) and castable from a string.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move that to a doc comment?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnderscoreEscapedString is a bit verbose for such a common type. EscapedName might be better.

const name = getNameOfDeclaration(node);
if (name) {
if (isAmbientModule(node)) {
return isGlobalScopeAugmentation(<ModuleDeclaration>node) ? "__global" : `"${(<LiteralExpression>name).text}"`;
const moduleName = name.kind === SyntaxKind.Identifier ? unescapeIdentifier((<Identifier>name).text) : (<LiteralExpression>name).text;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If isAmbientModule, the name should always be a string literal.

}
if (name.kind === SyntaxKind.ComputedPropertyName) {
const nameExpression = (<ComputedPropertyName>name).expression;
// treat computed property names where expression is string/numeric literal as just string/numeric literal
if (isStringOrNumericLiteral(nameExpression)) {
return nameExpression.text;
return nameExpression.text && escapeIdentifier(nameExpression.text);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

text shouldn't be optional on a literal, right?

@@ -1028,7 +1029,7 @@ namespace ts {
function bindBreakOrContinueStatement(node: BreakOrContinueStatement): void {
bind(node.label);
if (node.label) {
const activeLabel = findActiveLabel(node.label.text);
const activeLabel = findActiveLabel(unescapeIdentifier(node.label.text));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to just always use an escaped name in an ActiveLabel.

@@ -85,11 +85,13 @@ namespace ts.NavigateTo {

function getTextOfIdentifierOrLiteral(node: Node) {
if (node) {
if (node.kind === SyntaxKind.Identifier ||
node.kind === SyntaxKind.StringLiteral ||
if (node.kind === SyntaxKind.Identifier) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern has occurred a few times, could use a helper function.

@@ -580,12 +580,12 @@ namespace ts.NavigationBar {
// Otherwise, we need to aggregate each identifier to build up the qualified name.
const result: string[] = [];

result.push(moduleDeclaration.name.text);
result.push(moduleDeclaration.name.kind === SyntaxKind.Identifier ? unescapeIdentifier(moduleDeclaration.name.text) : moduleDeclaration.name.text);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern has occurred a few times, could use a helper function.

@@ -609,11 +609,13 @@ namespace ts {

function getTextOfIdentifierOrLiteral(node: Node) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, here is a helper function!

@@ -2111,7 +2113,7 @@ namespace ts {
node.parent.kind === SyntaxKind.ExternalModuleReference ||
isArgumentOfElementAccessExpression(node) ||
isLiteralComputedPropertyDeclarationName(node)) {
setNameTable((<LiteralExpression>node).text, node);
setNameTable((<LiteralExpression>node).text as EscapedIdentifier, node); // Literal expressions are escaped
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseLiteralLikeNode doesn't seem to escape?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! When I started it did (as it does right now), and that leaked escaped strings to wholly too many places. This probably surfaces as a bug.

@@ -1272,7 +1272,7 @@ namespace ts {
// If this is an export or import specifier it could have been renamed using the 'as' syntax.
// If so we want to search for whatever is under the cursor.
if (isImportOrExportSpecifierName(location) || isStringOrNumericLiteral(location) && location.parent.kind === SyntaxKind.ComputedPropertyName) {
return location.text;
return location.kind === SyntaxKind.Identifier ? unescapeIdentifier((location as Identifier).text) : (location as LiteralLikeNode).text;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a helper function

@ghost
Copy link

ghost commented Jul 5, 2017

  • Could use a string enum for special identifiers such as "__call", and include it in the EscapedIdentifier type. Would cut down on casting and help us document what things escaping is used for.
  • It might make more sense to only escape symbol tables, not every identifier. But that could be left until later.

@weswigham
Copy link
Member Author

@andy-ms

Could use a string enum for special identifiers such as "__call", and include it in the EscapedIdentifier type. Would cut down on casting and help us document what things escaping is used for.

Would "default" and "export=" and "args" be included in this list? Should all the global symbols we look up for our diagnostics be included (Iterator, Iterable, etc)? Or should it just be the ones with leading underscores?

@ghost
Copy link

ghost commented Jul 5, 2017

I think if it's a case where a string literal just happens to not need escaping (e.g. Iterator) it shouldn't be in the enum.

default would fall under that since the module namespace object would actually have a property named default.

export= is an unusual case, since it's a special name, not meant to be accessed by a property, but doesn't begin with underscores. (Maybe it should.) I think it should be in the enum because it's a special name and not accessible as a normal name.

Don't know about "args" since this is the first I've seen it.

@weswigham
Copy link
Member Author

"args" is a synthetic symbol we make when checking using jsdoc. I don't think it's supposed to actually exist inside any function bodies or other lexical scopes (but it might be simulated so it exists within the type annotation).

@weswigham
Copy link
Member Author

@andy-ms I think I've responded to all your feedback and I've implemented the list of internal symbol names as part of the brand type declaration (which, IMO, is almost more useful as a source of documentation than anything else).

@@ -1227,7 +1226,9 @@ namespace ts {

function parsePropertyNameWorker(allowComputedPropertyNames: boolean): PropertyName {
if (token() === SyntaxKind.StringLiteral || token() === SyntaxKind.NumericLiteral) {
return <StringLiteral | NumericLiteral>parseLiteralNode(/*internName*/ true);
const node = <StringLiteral | NumericLiteral>parseLiteralNode();
internIdentifier(node.text);
Copy link

@ghost ghost Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internIdentifier won't have the desired effect if used as a statement; it returns the interned string, but it won't set node.text to it. I like the way it was before.

Copy link
Member Author

@weswigham weswigham Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it, but according to my knowledge of strings in V8 (which, admitted, is gleaned from stackoverflow answers), reassignment of the value from the map isn't required to make the runtime intern the string.

It does this:

let identifier = identifiers.get(text);
if (identifier === undefined) {
    identifiers.set(text, identifier = text);
}
return identifier;

Its primary use (now that it is no longer handling escaping things) in our code is setting up the identifiers list which is used in the emitter and declaration emitter for knowing what identifiers it needs to avoid to generate names. Secondarily, this will cause the string to be interned in V8, since it's used as a property name (or map key in ES6) - a reassignment of the version from the map should not be required (since the same string primitive that's passed in is what gets interned (thus is already assigned to the object) or is already interned if already in the map).

TBQH I feel like the function should be renamed, since interning is not its primary use. Could probably reduce the space used by it a little bit by making it a Map<true>, too.

@@ -4098,7 +4099,7 @@ namespace ts {
indexedAccess.argumentExpression = allowInAnd(parseExpression);
if (indexedAccess.argumentExpression.kind === SyntaxKind.StringLiteral || indexedAccess.argumentExpression.kind === SyntaxKind.NumericLiteral) {
const literal = <LiteralExpression>indexedAccess.argumentExpression;
literal.text = internIdentifier(literal.text);
internIdentifier(literal.text);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, don't use this as a statement

@@ -5624,7 +5625,8 @@ namespace ts {
node.flags |= NodeFlags.GlobalAugmentation;
}
else {
node.name = <StringLiteral>parseLiteralNode(/*internName*/ true);
node.name = <StringLiteral>parseLiteralNode();
internIdentifier(node.name.text);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

@@ -3005,7 +3005,42 @@ namespace ts {
isRestParameter?: boolean;
}

export type SymbolTable = Map<Symbol>;
export type InternalSymbolName =
Copy link

@ghost ghost Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be better as a real string enum. Also, could you add the comment to the code about why (void & { __escapedIdentifier: void }) is present?

Copy link
Member

@sandersn sandersn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of comments. Still looking.

export type UnderscoreEscapedString = (string & { __escapedIdentifier: void }) | (void & { __escapedIdentifier: void }) | InternalSymbolName;

/** EscapedStringMap based on ES6 Map interface. */
export interface UnderscoreEscapedMap<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stupid question, but why can't this just be type EscapedNameMap<T> = Map<T, EscapedName> ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We originally didn't make the key of Map generic, so it's hardcoded as string. Although maybe a better way to do this would be to define ESMap<K, V> and Map<V> = ESMap<string, V> and UnderscoreEscapedMap<V> = ESMap<UnderscoreEscapedString, V>.

switch (name.kind) {
case SyntaxKind.Identifier:
return (<Identifier>name).text;
case SyntaxKind.StringLiteral:
case SyntaxKind.NumericLiteral:
return (<LiteralExpression>name).text;
return (<LiteralExpression>name).text as UnderscoreEscapedString;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't text always UnderscoreEscapedString, at least for LiteralExpression?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, this might just be an unsafe cast. Should it be escaped?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, this is probably unsafe, given where this is used. I'll add the escapes, rather than casts.

export function isIntrinsicJsxName(name: UnderscoreEscapedString | string) {
// An escaped identifier had a leading underscore prior to being escaped, which would return true
// The escape adds an extra underscore which does not change the result
const ch = (name as string).substr(0, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: How does this code tell us that a string is an intrinsic JSX name? Its first character is lowercase? notAnInstrinsic seems like an obvious counterexample.

Actually, this is probably a fine predicate, just one that shouldn't be promoted to utilities.ts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already here? It's used within the checker and the jsx transform. Also, a JSX intrinsic is just any tag that starts with a lowercase letter (ie, body), indicating that its type needs to be looked up globally rather than locally, right?

export function unescapeIdentifier(identifier: string): string {
return identifier.length >= 3 && identifier.charCodeAt(0) === CharacterCodes._ && identifier.charCodeAt(1) === CharacterCodes._ && identifier.charCodeAt(2) === CharacterCodes._ ? identifier.substr(1) : identifier;
export function unescapeLeadingUnderscores(identifier: UnderscoreEscapedString): string {
return (identifier as string).length >= 3 && (identifier as string).charCodeAt(0) === CharacterCodes._ && (identifier as string).charCodeAt(1) === CharacterCodes._ && (identifier as string).charCodeAt(2) === CharacterCodes._ ? (identifier as string).substr(1) : identifier as string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting: could you create a temp here to make the line shorter? (Maybe name it id. :)

@@ -1203,7 +1202,7 @@ namespace ts {
if (token() !== SyntaxKind.Identifier) {
node.originalKeywordKind = token();
}
node.text = internIdentifier(scanner.getTokenValue());
node.text = escapeLeadingUnderscores(internIdentifier(scanner.getTokenValue()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internIdentifier used to escape before interning. Now the code interns, then escapes. Is this correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct for how we use the identifiers map the internIdentifier function uses (namely, looking for collisions in generated identifiers).

@@ -0,0 +1,60 @@
tests/cases/compiler/index.tsx(33,1): error TS2308: Module "./b" has already exported a member named '__foo'. Consider explicitly re-exporting to resolve the ambiguity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't expect a test named doubleUnderscoreSymbolsAndEmit to have errors. Is this expected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I wanted the js output of the enum (which was bugged) and this diagnostic (which used to have an extra underscore) - I could split it into two tests if you'd prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just change the name to omit 'Symbols'

Copy link
Member Author

@weswigham weswigham Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too late - already split this file into about 4 unique tests which test different things with double underscore prefixes. 🐱

@@ -3974,8 +4009,8 @@ namespace ts {
* @param identifier The escaped identifier text.
* @returns The unescaped identifier text.
*/
export function unescapeIdentifier(identifier: string): string {
return identifier.length >= 3 && identifier.charCodeAt(0) === CharacterCodes._ && identifier.charCodeAt(1) === CharacterCodes._ && identifier.charCodeAt(2) === CharacterCodes._ ? identifier.substr(1) : identifier;
export function unescapeLeadingUnderscores(identifier: UnderscoreEscapedString): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should definitely be documented as breaking change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add an alias to the old name with an @deprecated jsdoc.

Copy link

@ghost ghost Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fact that identifier.text is no longer a string is a breaking change and should have an entry here. See #17020.

Copy link
Member

@sandersn sandersn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  1. It's really scary that you fixed tens of bugs, especially in the language service, and didn't see any changes in the tests.
  2. I don't have a good understanding of which type to use when I'm writing new code in the checker or binder.

To fix (2), please add to the PR summary. And maybe to the wiki or to https://sandersn.github.io/manual/Typescript-compiler-implementation.html for future reference.

To fix (1), we'd need to write tons of tests for code that's now checked by the compiler anyway. So it's probably not needed.

@@ -1505,7 +1506,8 @@ namespace ts.Completions {
// TODO(jfreeman): Account for computed property name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments look old and probably out of date. Can you update them? (Just remove Jason's name from the TODO if it still applies, I guess.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed Jason's name, but I think computed property names might still not be handled specially - if a computed property is returned from getNameOfDeclaration, getEscapedTextOfIdentifierOrLiteral will quietly return undefined.

| "export=" // Export assignment symbol
| "default"; // Default export symbol (technically not wholly internal, but included here for usability)

export type UnderscoreEscapedString = (string & { __escapedIdentifier: void }) | (void & { __escapedIdentifier: void }) | InternalSymbolName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A short name is __String. But it's pretty (too?) cute.

@@ -258,7 +259,7 @@ namespace ts {
case SyntaxKind.ExportDeclaration:
return "__export";
case SyntaxKind.ExportAssignment:
return (<ExportAssignment>node).isExportEquals ? "export=" : "default";
return ((<ExportAssignment>node).isExportEquals ? "export=" : "default");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like some extra parens got left in after their associated casts were removed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Properties prefixed with double-underscore cannot be accessed in mapped type mappings
4 participants