New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: Regex-validated string type #6579

Open
tenry92 opened this Issue Jan 22, 2016 · 42 comments

Comments

Projects
None yet
@tenry92

tenry92 commented Jan 22, 2016

There are cases, where a property can not just be any string (or a set of strings), but needs to match a pattern.

let fontStyle: 'normal' | 'italic' = 'normal'; // already available in master
let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i = '#000'; // my suggestion

It's common practice in JavaScript to store color values in css notation, such as in the css style reflection of DOM nodes or various 3rd party libraries.

What do you think?

@DanielRosenwasser

This comment has been minimized.

Show comment
Hide comment
@DanielRosenwasser

DanielRosenwasser Jan 24, 2016

Member

Yeah, I've seen this combing through DefinitelyTyped, . Even we could use something like this with ScriptElementKind in the services layer, where we'd ideally be able to describe these as a comma-separated list of specific strings.

The main problems are:

  • It's not clear how to compose these well. If I want a comma-separated list of "cat", "dog", and "fish", then I need to write something like /dog|cat|fish(,(dog|cat|fish))*/.
    • If I already have types describing string literal types for "cat", "dog", and "fish", how do I integrate them into this regex?
    • Clearly there's repetition here, which is undesirable. Perhaps fixing the previous issue would make this easier.
  • Non-standard extensions make this sort of iffy.
Member

DanielRosenwasser commented Jan 24, 2016

Yeah, I've seen this combing through DefinitelyTyped, . Even we could use something like this with ScriptElementKind in the services layer, where we'd ideally be able to describe these as a comma-separated list of specific strings.

The main problems are:

  • It's not clear how to compose these well. If I want a comma-separated list of "cat", "dog", and "fish", then I need to write something like /dog|cat|fish(,(dog|cat|fish))*/.
    • If I already have types describing string literal types for "cat", "dog", and "fish", how do I integrate them into this regex?
    • Clearly there's repetition here, which is undesirable. Perhaps fixing the previous issue would make this easier.
  • Non-standard extensions make this sort of iffy.
@zackarychapple

This comment has been minimized.

Show comment
Hide comment
@zackarychapple

zackarychapple Apr 13, 2016

Huge +1 on this, ZipCode, SSN, ONet, many other use cases for this.

zackarychapple commented Apr 13, 2016

Huge +1 on this, ZipCode, SSN, ONet, many other use cases for this.

@radziksh

This comment has been minimized.

Show comment
Hide comment
@radziksh

radziksh May 11, 2016

I faced the same problem, and I see that it is not implemented yet, maybe this workaround will be helpful:
http://stackoverflow.com/questions/37144672/guid-uuid-type-in-typescript

radziksh commented May 11, 2016

I faced the same problem, and I see that it is not implemented yet, maybe this workaround will be helpful:
http://stackoverflow.com/questions/37144672/guid-uuid-type-in-typescript

@rylphs

This comment has been minimized.

Show comment
Hide comment
@rylphs

rylphs May 18, 2016

As @mhegazy suggested I will put my sugggestion (#8665) here. What about allow simple validation functions in type declarations? Something like that:

type Integer(n:number) => String(n).macth(/^[0-9]+$/)
let x:Integer = 3 //OK
let y:Integer = 3.6 //wrong

type ColorLevel(n:number) => n>0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

type Hex(n:string) => n.match(/^([0-9]|[A-F])+$/)
let hexValue:Hex = "F6A5" //OK
let wrongHexValue:Hex = "F6AZ5" //wrong

The value that the type can accept would be determined by the function parameter type and by the function evaluation itself. That would solve #7982 also.

rylphs commented May 18, 2016

As @mhegazy suggested I will put my sugggestion (#8665) here. What about allow simple validation functions in type declarations? Something like that:

type Integer(n:number) => String(n).macth(/^[0-9]+$/)
let x:Integer = 3 //OK
let y:Integer = 3.6 //wrong

type ColorLevel(n:number) => n>0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

type Hex(n:string) => n.match(/^([0-9]|[A-F])+$/)
let hexValue:Hex = "F6A5" //OK
let wrongHexValue:Hex = "F6AZ5" //wrong

The value that the type can accept would be determined by the function parameter type and by the function evaluation itself. That would solve #7982 also.

@AlexLeung

This comment has been minimized.

Show comment
Hide comment
@AlexLeung

AlexLeung Jul 23, 2016

@rylphs +1 this would make TypeScript extremely powerful

AlexLeung commented Jul 23, 2016

@rylphs +1 this would make TypeScript extremely powerful

@maiermic

This comment has been minimized.

Show comment
Hide comment
@maiermic

maiermic Aug 30, 2016

How does subtyping work with regex-validated string types?

let a: RegExType_1
let b: RegExType_2

a = b // Is this allowed? Is RegExType_2 subtype of RegExType_1?
b = a // Is this allowed? Is RegExType_1 subtype of RegExType_2?

where RegExType_1 and RegExType_2 are regex-validated string types.

Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions).

maiermic commented Aug 30, 2016

How does subtyping work with regex-validated string types?

let a: RegExType_1
let b: RegExType_2

a = b // Is this allowed? Is RegExType_2 subtype of RegExType_1?
b = a // Is this allowed? Is RegExType_1 subtype of RegExType_2?

where RegExType_1 and RegExType_2 are regex-validated string types.

Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions).

@basarat

This comment has been minimized.

Show comment
Hide comment
@basarat

basarat Oct 17, 2016

Contributor

Would also help with TypeStyle : typestyle/typestyle#5 🌹

Contributor

basarat commented Oct 17, 2016

Would also help with TypeStyle : typestyle/typestyle#5 🌹

@DanielRosenwasser

This comment has been minimized.

Show comment
Hide comment
@DanielRosenwasser

DanielRosenwasser Oct 17, 2016

Member

In JSX, @RyanCavanaugh and I've seen people add aria- (and potentially data-) attributes. Someone actually added a string index signature in DefinitelyTyped as a catch-all. A new index signature for this would have be helpful.

interface IntrinsicElements {
    // ....
    [attributeName: /aria-\w+/]: number | string | boolean;
}
Member

DanielRosenwasser commented Oct 17, 2016

In JSX, @RyanCavanaugh and I've seen people add aria- (and potentially data-) attributes. Someone actually added a string index signature in DefinitelyTyped as a catch-all. A new index signature for this would have be helpful.

interface IntrinsicElements {
    // ....
    [attributeName: /aria-\w+/]: number | string | boolean;
}
@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Nov 18, 2016

Design Proposal

There are a lot of cases when developers need more specified value then just a string, but can't enumerate them as union of simple string literals e.g. css colors, emails, phone numbers, ZipCode, swagger extensions etc. Even json schema specification which commonly used for describing schema of JSON object has pattern and patternProperties that in terms of TS type system could be called regex-validated string type and regex-validated string type of index.

Goals

Provide developers with type system that is one step closer to JSON Schema, that commonly used by them and also prevent them from forgetting about string validation checks when needed.

Syntactic overview

Implementation of this feature consists of 4 parts:

Regex validated type

type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;

Regex-validated variable type

let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;

and the same, but more readable

let fontColor: CssColor;

Regex-validated variable type of index

interface UsersCollection {
    [email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i]: User;
}

and the same, but more readable

interface UsersCollection {
    [email: Email]: User;
}

Type guard for variable type

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color)) {
        fontColor = color;// correct
    }
}

and same

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (!(/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color))) return;
    fontColor = color;// correct
}

and using defined type for better readability

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (CssColor.test(color)) {
        fontColor = color;// correct
    }
}

same as

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (!(CssColor.test(color))) return;
    fontColor = color;// correct
}

Type gurard for index type

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email)) {
        collection[email];// type is User
    }
}

same as

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (!(/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email))) return;
    collection[email];// type is User
}

and using defined type for better readability

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
    }
}

same as

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (!(Email.test(email))) return;
    collection[email];// type is User
}

Semantic overview

Assignments

let email: Email;
let gmail: Gmail;
email = 'test@example.com';// correct
email = 'test@gmail.com';// correct
gmail = 'test@example.com';// compile time error
gmail = 'test@gmail.com';// correct
gmail = email;// obviously compile time error
email = gmail;// unfortunately compile time error too

Unfortunately we can't check is one regex is subtype of another without hard performance impact due to this article. So it should be restricted. But there are next workarounds:

// explicit cast
gmail = <Gmail>email;// correct
// type guard
if (Gmail.test(email)) {
    gmail = email;// correct
}
// another regex subtype declaration
type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
gmail = email;// correct

Unfortunately assigning of string variable to regex-validated variable should also be restricted, because there is no guaranty in compile time that it will match regex.

let someEmail = 'test@example.com';
let someGmail = 'test@gmail.com';
email = someEmail;// compile time error
gmail = someGmail;// compile time error

But we are able to use explicit cast or type guards as shown here. Second is recommended.
Luckily it's not a case for string literals, because while using them we ARE able to check that its value matches regex.

let someEmail: 'test@example.com' = 'test@example.com';
let someGmail: 'test@gmail.com' = 'test@gmail.com';
email = someEmail;// correct
gmail = someGmail;// correct

Type narrowing for indexes

For simple cases of regex-validated type of index see Type gurard for index type.
But there could be more complicated cases:

type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
interface UsersCollection {
    [email: Email]: User;
    [gmail: Gmail]: GmailUser;
}
let collection: UsersCollection;
let someEmail = 'test@example.com';
let someGmail = 'test@gmail.com';
collection['test@example.com'];// type is User
collection['test@gmail.com'];// type is User & GmailUser
collection[someEmail];// unfortunately type is any
collection[someGmail];// unfortunately type is any
// explicit cast is still an unsafe workaround
collection[<Email> someEmail];// type is User
collection[<Gmail> someGmail];// type is GmailUser
collection[<Email & Gmail> someGmail];// type is User & GmailUser

Literals haven't such problem:

let collection: UsersCollection;
let someEmail: 'test@example.com' = 'test@example.com';
let someGmail: 'test@gmail.com' = 'test@gmail.com';
collection[someEmail];// type is User
collection[someGmail];// type is User & GmailUser

But for variables the best option is using type guards as in next more realistic examples:

getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
        if (Gmail.test(email)) {
            collection[email];// type is User & GmailUser
        }
    }
    if (Gmail.test(email)) {
        collection[email];// type is GmailUser
    }
}

But if we'll use better definition for Gmail type it would have another type narrowing:

type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
        if (Gmail.test(email)) {
            collection[email];// type is User & GmailUser
        }
    }
    if (Gmail.test(email)) {
        collection[email];// type is User & GmailUser
    }
}

Unions and intersections

Actually common types and regex-validated types are really different, so we need rules how correclty handle their unions and intersections.

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;// correct
type test_2 = Regex_1 & Regex_2;// correct
type test_3 = Regex_1 | NonRegex;// correct
type test_4 = Regex_1 & NonRegex;// compile time error
if (test_1.test(something)) {
    something;// type is test_1
    // something matches Regex_1 OR Regex_2
}
if (test_2.test(something)) {
    something;// type is test_2
    // something matches Regex_1 AND Regex_2
}
if (test_3.test(something)) {
    something;// type is Regex_1
} else {
    something;// type is NonRegex
}

Generics

There are no special cases for generics, so regex-validated type could be used with generics in same way as usual types.
For generics with constraints like below, regex-validated type behaves like string:

class Something<T extends String> { ... }
let something = new Something<Email>();// correct

Emit overview

Unlike usual types, regex-validated have some impact on emit:

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;
type test_2 = Regex_1 & Regex_2;
type test_3 = Regex_1 | NonRegex;
type test_4 = Regex_1 & NonRegex;
if (test_1.test(something)) {
    /* ... */
}
if (test_2.test(something)) {
    /* ... */
}
if (test_3.test(something)) {
    /* ... */
} else {
    /* ... */
}

will compile to:

var Regex_1 = / ... /;
var Regex_2 = / ... /;
if (Regex_1.test(something) || Regex_2.test(something)) {
    /* ... */
}
if (Regex_1.test(something) && Regex_2.test(something)) {
    /* ... */
}
if (Regex_1.test(something)) {
    /* ... */
} else {
    /* ... */
}

Compatibility overview

This feature has no issues with compatibility, because there only case that could break it and it is related to that regex-validated type has emit impact unlike usual type, so this is valid TS code:

type someType = { ... };
var someType = { ... };

when code below is not:

type someRegex = / ... /;
var someRegex = { ... };

But second already WAS invalid, but due to another reason (type declaration was wrong).
So now we have to restrict declaring of variable with name same to type, in case when this type is regex-validated.

P.S.

Feel free to point on things that I probably have missed. If you like this proposal, I could try to create tests that covers it and add them as PR.

Igmat commented Nov 18, 2016

Design Proposal

There are a lot of cases when developers need more specified value then just a string, but can't enumerate them as union of simple string literals e.g. css colors, emails, phone numbers, ZipCode, swagger extensions etc. Even json schema specification which commonly used for describing schema of JSON object has pattern and patternProperties that in terms of TS type system could be called regex-validated string type and regex-validated string type of index.

Goals

Provide developers with type system that is one step closer to JSON Schema, that commonly used by them and also prevent them from forgetting about string validation checks when needed.

Syntactic overview

Implementation of this feature consists of 4 parts:

Regex validated type

type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;

Regex-validated variable type

let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;

and the same, but more readable

let fontColor: CssColor;

Regex-validated variable type of index

interface UsersCollection {
    [email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i]: User;
}

and the same, but more readable

interface UsersCollection {
    [email: Email]: User;
}

Type guard for variable type

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color)) {
        fontColor = color;// correct
    }
}

and same

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (!(/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color))) return;
    fontColor = color;// correct
}

and using defined type for better readability

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (CssColor.test(color)) {
        fontColor = color;// correct
    }
}

same as

setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (!(CssColor.test(color))) return;
    fontColor = color;// correct
}

Type gurard for index type

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email)) {
        collection[email];// type is User
    }
}

same as

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (!(/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email))) return;
    collection[email];// type is User
}

and using defined type for better readability

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
    }
}

same as

let collection: UsersCollection;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (!(Email.test(email))) return;
    collection[email];// type is User
}

Semantic overview

Assignments

let email: Email;
let gmail: Gmail;
email = 'test@example.com';// correct
email = 'test@gmail.com';// correct
gmail = 'test@example.com';// compile time error
gmail = 'test@gmail.com';// correct
gmail = email;// obviously compile time error
email = gmail;// unfortunately compile time error too

Unfortunately we can't check is one regex is subtype of another without hard performance impact due to this article. So it should be restricted. But there are next workarounds:

// explicit cast
gmail = <Gmail>email;// correct
// type guard
if (Gmail.test(email)) {
    gmail = email;// correct
}
// another regex subtype declaration
type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
gmail = email;// correct

Unfortunately assigning of string variable to regex-validated variable should also be restricted, because there is no guaranty in compile time that it will match regex.

let someEmail = 'test@example.com';
let someGmail = 'test@gmail.com';
email = someEmail;// compile time error
gmail = someGmail;// compile time error

But we are able to use explicit cast or type guards as shown here. Second is recommended.
Luckily it's not a case for string literals, because while using them we ARE able to check that its value matches regex.

let someEmail: 'test@example.com' = 'test@example.com';
let someGmail: 'test@gmail.com' = 'test@gmail.com';
email = someEmail;// correct
gmail = someGmail;// correct

Type narrowing for indexes

For simple cases of regex-validated type of index see Type gurard for index type.
But there could be more complicated cases:

type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
interface UsersCollection {
    [email: Email]: User;
    [gmail: Gmail]: GmailUser;
}
let collection: UsersCollection;
let someEmail = 'test@example.com';
let someGmail = 'test@gmail.com';
collection['test@example.com'];// type is User
collection['test@gmail.com'];// type is User & GmailUser
collection[someEmail];// unfortunately type is any
collection[someGmail];// unfortunately type is any
// explicit cast is still an unsafe workaround
collection[<Email> someEmail];// type is User
collection[<Gmail> someGmail];// type is GmailUser
collection[<Email & Gmail> someGmail];// type is User & GmailUser

Literals haven't such problem:

let collection: UsersCollection;
let someEmail: 'test@example.com' = 'test@example.com';
let someGmail: 'test@gmail.com' = 'test@gmail.com';
collection[someEmail];// type is User
collection[someGmail];// type is User & GmailUser

But for variables the best option is using type guards as in next more realistic examples:

getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
        if (Gmail.test(email)) {
            collection[email];// type is User & GmailUser
        }
    }
    if (Gmail.test(email)) {
        collection[email];// type is GmailUser
    }
}

But if we'll use better definition for Gmail type it would have another type narrowing:

type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
        if (Gmail.test(email)) {
            collection[email];// type is User & GmailUser
        }
    }
    if (Gmail.test(email)) {
        collection[email];// type is User & GmailUser
    }
}

Unions and intersections

Actually common types and regex-validated types are really different, so we need rules how correclty handle their unions and intersections.

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;// correct
type test_2 = Regex_1 & Regex_2;// correct
type test_3 = Regex_1 | NonRegex;// correct
type test_4 = Regex_1 & NonRegex;// compile time error
if (test_1.test(something)) {
    something;// type is test_1
    // something matches Regex_1 OR Regex_2
}
if (test_2.test(something)) {
    something;// type is test_2
    // something matches Regex_1 AND Regex_2
}
if (test_3.test(something)) {
    something;// type is Regex_1
} else {
    something;// type is NonRegex
}

Generics

There are no special cases for generics, so regex-validated type could be used with generics in same way as usual types.
For generics with constraints like below, regex-validated type behaves like string:

class Something<T extends String> { ... }
let something = new Something<Email>();// correct

Emit overview

Unlike usual types, regex-validated have some impact on emit:

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;
type test_2 = Regex_1 & Regex_2;
type test_3 = Regex_1 | NonRegex;
type test_4 = Regex_1 & NonRegex;
if (test_1.test(something)) {
    /* ... */
}
if (test_2.test(something)) {
    /* ... */
}
if (test_3.test(something)) {
    /* ... */
} else {
    /* ... */
}

will compile to:

var Regex_1 = / ... /;
var Regex_2 = / ... /;
if (Regex_1.test(something) || Regex_2.test(something)) {
    /* ... */
}
if (Regex_1.test(something) && Regex_2.test(something)) {
    /* ... */
}
if (Regex_1.test(something)) {
    /* ... */
} else {
    /* ... */
}

Compatibility overview

This feature has no issues with compatibility, because there only case that could break it and it is related to that regex-validated type has emit impact unlike usual type, so this is valid TS code:

type someType = { ... };
var someType = { ... };

when code below is not:

type someRegex = / ... /;
var someRegex = { ... };

But second already WAS invalid, but due to another reason (type declaration was wrong).
So now we have to restrict declaring of variable with name same to type, in case when this type is regex-validated.

P.S.

Feel free to point on things that I probably have missed. If you like this proposal, I could try to create tests that covers it and add them as PR.

Igmat added a commit to Igmat/TypeScript that referenced this issue Nov 18, 2016

Igmat added a commit to Igmat/TypeScript that referenced this issue Nov 21, 2016

Igmat added a commit to Igmat/TypeScript that referenced this issue Nov 21, 2016

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Nov 21, 2016

I've forgotten to point to some cases for intersections and unions of regex-validated types, but I've described them in latest test case. Should I update Design proposal to reflect that minor change?

Igmat commented Nov 21, 2016

I've forgotten to point to some cases for intersections and unions of regex-validated types, but I've described them in latest test case. Should I update Design proposal to reflect that minor change?

Igmat added a commit to Igmat/TypeScript that referenced this issue Nov 22, 2016

@alexanderbird

This comment has been minimized.

Show comment
Hide comment
@alexanderbird

alexanderbird Nov 29, 2016

@Igmat, question about your design proposal: Could you elaborate on the emit overview? Why would regex-validated types need to be emitted? As far as I can tell, other types don't support runtime checks... am I missing something?

alexanderbird commented Nov 29, 2016

@Igmat, question about your design proposal: Could you elaborate on the emit overview? Why would regex-validated types need to be emitted? As far as I can tell, other types don't support runtime checks... am I missing something?

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Nov 30, 2016

@alexanderbird, yes, any other type have no impact on emit. At first, I thought that regex-validated will do so as well, so I've started creating the proposal and playing with proposed syntax.
First approach was like this:

let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
fontColor = "#000";

and this:

type CssColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let fontColor: CssColor;
fontColor = "#000";

It's ok and has no need for emit changes, because "#000" could be checked in compile time.
But we also have to handle narrowing from string to regex-validated type in order to make it useful. So I've thought about this for both previous setups:

let someString: string;
if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(someString)) {
    fontColor = someString; // Ok
}
fontColor = someString; // compile time error

So it also has no impact on emit and looks ok, except that regex isn't very readable and have to be copied in all places, so user could easily make a mistake. But in this particular case it still seems to be better than changing how type works.
But then I realized that this stuff:

let someString: string;
let email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/I;
if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(someString)) {
    email = someString; // Ok
}
email = someString; // compile time error

is a nightmare. And it's even without intersections and unions. So to avoid happening of stuff like this, we have to slightly change type emit as shown in proposal.

Igmat commented Nov 30, 2016

@alexanderbird, yes, any other type have no impact on emit. At first, I thought that regex-validated will do so as well, so I've started creating the proposal and playing with proposed syntax.
First approach was like this:

let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
fontColor = "#000";

and this:

type CssColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let fontColor: CssColor;
fontColor = "#000";

It's ok and has no need for emit changes, because "#000" could be checked in compile time.
But we also have to handle narrowing from string to regex-validated type in order to make it useful. So I've thought about this for both previous setups:

let someString: string;
if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(someString)) {
    fontColor = someString; // Ok
}
fontColor = someString; // compile time error

So it also has no impact on emit and looks ok, except that regex isn't very readable and have to be copied in all places, so user could easily make a mistake. But in this particular case it still seems to be better than changing how type works.
But then I realized that this stuff:

let someString: string;
let email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/I;
if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(someString)) {
    email = someString; // Ok
}
email = someString; // compile time error

is a nightmare. And it's even without intersections and unions. So to avoid happening of stuff like this, we have to slightly change type emit as shown in proposal.

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Nov 30, 2016

@DanielRosenwasser, could you, please, provide some feedback for this proposal? And also for tests referenced here, if possible?
I really want to help with implementing of this feature, but it requires a lot of time (tsc is really complicated project and I still have to work on understanding of how it works inside) and I don't know is this proposal is ready to implement or you will reject this feature implemented in this way due to another language design vision or any other reason.

Igmat commented Nov 30, 2016

@DanielRosenwasser, could you, please, provide some feedback for this proposal? And also for tests referenced here, if possible?
I really want to help with implementing of this feature, but it requires a lot of time (tsc is really complicated project and I still have to work on understanding of how it works inside) and I don't know is this proposal is ready to implement or you will reject this feature implemented in this way due to another language design vision or any other reason.

@DanielRosenwasser

This comment has been minimized.

Show comment
Hide comment
@DanielRosenwasser

DanielRosenwasser Nov 30, 2016

Member

Hey @Igmat, I think there are a few things I should have initially asked about

To start, I still don't understand why you need any sort of change to emit, and I don't think any sort of emit based on types would be acceptable. Check out our non-goals here.

Another issue I should have brought up is the problem of regular expressions that use backreferences. My understanding (and experience) is that backreferences in a regular expression can force a test to run in time exponential to its input. Is this a corner case? Perhaps, but it's something I'd prefer to avoid in general. This is especially important given that in editor scenarios, a type-check at a location should take a minimal amount of time.

Another issue is that we'd need to either rely on the engine that the TypeScript compiler runs on, or build a custom regular expression engine to execute these things. For instance, TC39 is moving to include a new s flag so that . can match newlines. There would be a discrepancy between ESXXXX and older runtimes that support this.

Member

DanielRosenwasser commented Nov 30, 2016

Hey @Igmat, I think there are a few things I should have initially asked about

To start, I still don't understand why you need any sort of change to emit, and I don't think any sort of emit based on types would be acceptable. Check out our non-goals here.

Another issue I should have brought up is the problem of regular expressions that use backreferences. My understanding (and experience) is that backreferences in a regular expression can force a test to run in time exponential to its input. Is this a corner case? Perhaps, but it's something I'd prefer to avoid in general. This is especially important given that in editor scenarios, a type-check at a location should take a minimal amount of time.

Another issue is that we'd need to either rely on the engine that the TypeScript compiler runs on, or build a custom regular expression engine to execute these things. For instance, TC39 is moving to include a new s flag so that . can match newlines. There would be a discrepancy between ESXXXX and older runtimes that support this.

@alexanderbird

This comment has been minimized.

Show comment
Hide comment
@alexanderbird

alexanderbird Nov 30, 2016

@Igmat - there is no question in my mind that having regexes emitted at runtime would be useful. However, I don't think they're necessary for this feature to be useful (and from the sounds of what @DanielRosenwasser has said, it probably wouldn't get approved anyway). You said

But we also have to handle narrowing from string to regex-validated type in order to make it useful

I think this is only the case if we are to narrow from a dynamic string to a regex-validated type. This gets very complicated. Even in this simple case:

function foo(bar: number) {
    let baz: /prefix:\d+/ = 'prefix:' + number;
}

We can't be sure that the types will match - what if the number is negative? And as the regexes get more complicated, it just gets messier and messier. If we really wanted this, maybe we allow "type interpolation: type Baz = /prefix:{number}/... but I don't know if it's worth going there.

Instead, we could get partway to the goal if we only allowed string literals to be assigned to regex-validated types.

Consider the following:

type Color = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let foo: Color = '#000000';
let bar: Color = '#0000'; // Error - string literal '#0000' is not assignable to type 'Color'; '#0000' does not match /^#([0-9a-f]{3}|[0-9a-f]{6})$/i
let baz: Color = '#' + config.userColorChoice; // Error - type 'string' is not assignable to type 'regex-validated-string'

Do you think that's a workable alternative?

alexanderbird commented Nov 30, 2016

@Igmat - there is no question in my mind that having regexes emitted at runtime would be useful. However, I don't think they're necessary for this feature to be useful (and from the sounds of what @DanielRosenwasser has said, it probably wouldn't get approved anyway). You said

But we also have to handle narrowing from string to regex-validated type in order to make it useful

I think this is only the case if we are to narrow from a dynamic string to a regex-validated type. This gets very complicated. Even in this simple case:

function foo(bar: number) {
    let baz: /prefix:\d+/ = 'prefix:' + number;
}

We can't be sure that the types will match - what if the number is negative? And as the regexes get more complicated, it just gets messier and messier. If we really wanted this, maybe we allow "type interpolation: type Baz = /prefix:{number}/... but I don't know if it's worth going there.

Instead, we could get partway to the goal if we only allowed string literals to be assigned to regex-validated types.

Consider the following:

type Color = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let foo: Color = '#000000';
let bar: Color = '#0000'; // Error - string literal '#0000' is not assignable to type 'Color'; '#0000' does not match /^#([0-9a-f]{3}|[0-9a-f]{6})$/i
let baz: Color = '#' + config.userColorChoice; // Error - type 'string' is not assignable to type 'regex-validated-string'

Do you think that's a workable alternative?

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Dec 1, 2016

@DanielRosenwasser, I've read Design Goals carefully and, if I understand you correctly, problem is violation of Non-goals#5.
But it doesn't seem to me as violation, but as syntax improvement. For example, previously we had:

const emailRegex = /.../;
/**
 * assign it only with values tested to emailRegex 
 */
let email: string;
let userInput: string;
// somehow get user input
if (emailRegex.test(userInput)) {
    email = userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

With this proposal implemented it would look like:

type Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
    email = userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

As you see, code is almost the same - it's a common simple usage of regex. But second case is much more expressive and will prevent user from accidental mistake, like forgetting to check string before assignment it to variable that meant to be regex-validated.
Second thing is that without such type narrowing we won't be able to normally use regex-validated type in indexes, because in most cases such index fields works with some variable that can't be checked in runtime as it could be done with literals.

Igmat commented Dec 1, 2016

@DanielRosenwasser, I've read Design Goals carefully and, if I understand you correctly, problem is violation of Non-goals#5.
But it doesn't seem to me as violation, but as syntax improvement. For example, previously we had:

const emailRegex = /.../;
/**
 * assign it only with values tested to emailRegex 
 */
let email: string;
let userInput: string;
// somehow get user input
if (emailRegex.test(userInput)) {
    email = userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

With this proposal implemented it would look like:

type Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
    email = userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

As you see, code is almost the same - it's a common simple usage of regex. But second case is much more expressive and will prevent user from accidental mistake, like forgetting to check string before assignment it to variable that meant to be regex-validated.
Second thing is that without such type narrowing we won't be able to normally use regex-validated type in indexes, because in most cases such index fields works with some variable that can't be checked in runtime as it could be done with literals.

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Dec 1, 2016

@alexanderbird, I don't suggest making this code valid or add some hidden checks in both runtime and compile time.

function foo(bar: number) {
    let baz: /prefix:\d+/ = 'prefix:' + number;
}

This code have to throw error due to my proposal. But this:

function foo(bar: number) {
    let baz: /prefix:\d+/ = ('prefix:' + number) as /prefix:\d+/;
}

or this:

function foo(bar: number) {
    let baz: /prefix:\d+/;
    let possibleBaz: string = 'prefix:' + number;
    if (/prefix:\d+/.test(possibleBaz)) {
        baz = possibleBaz;
    }
}

would be correct, and even have no impact to emitted code.

And as I showed in previous comment, literals would be definitely not enough even for common use cases, because we often have to work with stings from user input or other sources. Without implementing of this emit impact, users would have to work with this type in next way:

export type Email = /.../;
export const Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
    email = <Email>userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

or for intersections:

export type Email = /email-regex/;
export const Email = /email-regex/;
export type Gmail = Email & /gmail-regex/;
export const Gmail = {
    test: (input: string) => Email.test(input) && /gmail-regex/.test(input)
};
let gmail: Gmail;
let userInput: string;
// somehow get user input
if (Gmail.test(userInput)) {
    gmail = <Gmail>userInput;
} else {
    console.log('User provided invalid gmail. Showing validation error');
    // Some code for validation error
}

I don't think that forcing users to duplicate code and to use explicit cast, when it could be easily handled by compiler isn't a good way to go. Emit impact is really very small and predictable, I'm sure that it won't surprise users or lead to some feature misunderstood or hard to locate bugs, while implementing this feature without emit changes definitely WILL.

In conclusion I want to say that in simple terms regex-validated type is both a scoped variable and a compiler type.

Igmat commented Dec 1, 2016

@alexanderbird, I don't suggest making this code valid or add some hidden checks in both runtime and compile time.

function foo(bar: number) {
    let baz: /prefix:\d+/ = 'prefix:' + number;
}

This code have to throw error due to my proposal. But this:

function foo(bar: number) {
    let baz: /prefix:\d+/ = ('prefix:' + number) as /prefix:\d+/;
}

or this:

function foo(bar: number) {
    let baz: /prefix:\d+/;
    let possibleBaz: string = 'prefix:' + number;
    if (/prefix:\d+/.test(possibleBaz)) {
        baz = possibleBaz;
    }
}

would be correct, and even have no impact to emitted code.

And as I showed in previous comment, literals would be definitely not enough even for common use cases, because we often have to work with stings from user input or other sources. Without implementing of this emit impact, users would have to work with this type in next way:

export type Email = /.../;
export const Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
    email = <Email>userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

or for intersections:

export type Email = /email-regex/;
export const Email = /email-regex/;
export type Gmail = Email & /gmail-regex/;
export const Gmail = {
    test: (input: string) => Email.test(input) && /gmail-regex/.test(input)
};
let gmail: Gmail;
let userInput: string;
// somehow get user input
if (Gmail.test(userInput)) {
    gmail = <Gmail>userInput;
} else {
    console.log('User provided invalid gmail. Showing validation error');
    // Some code for validation error
}

I don't think that forcing users to duplicate code and to use explicit cast, when it could be easily handled by compiler isn't a good way to go. Emit impact is really very small and predictable, I'm sure that it won't surprise users or lead to some feature misunderstood or hard to locate bugs, while implementing this feature without emit changes definitely WILL.

In conclusion I want to say that in simple terms regex-validated type is both a scoped variable and a compiler type.

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Dec 1, 2016

@DanielRosenwasser and @alexanderbird ok, I have one more idea for that. What about syntax like this:

const type Email = /email-regex/;

In this case user have to explicitly define that he/she want this as both type and const, so actual type system has no emit changes unless it used with such modifier. But if it used with it we are still able to avoid a lot of mistakes, casts and duplication of code by adding same emit as for:

const Email = /email-regex/;

This seems to be even bigger than just improvement for this proposal, because this probably could allow something like this (example is from project with Redux):

export type SOME_ACTION = 'SOME_ACTION';
export const SOME_ACTION = 'SOME_ACTION' as SOME_ACTION;

being converted to

export const type SOME_ACTION = 'SOME_ACTION';

I've tried to found some similar suggestion but wasn't successful. If it could be a workaround and if you like such idea, I can prepare Design Proposal and tests for it.

Igmat commented Dec 1, 2016

@DanielRosenwasser and @alexanderbird ok, I have one more idea for that. What about syntax like this:

const type Email = /email-regex/;

In this case user have to explicitly define that he/she want this as both type and const, so actual type system has no emit changes unless it used with such modifier. But if it used with it we are still able to avoid a lot of mistakes, casts and duplication of code by adding same emit as for:

const Email = /email-regex/;

This seems to be even bigger than just improvement for this proposal, because this probably could allow something like this (example is from project with Redux):

export type SOME_ACTION = 'SOME_ACTION';
export const SOME_ACTION = 'SOME_ACTION' as SOME_ACTION;

being converted to

export const type SOME_ACTION = 'SOME_ACTION';

I've tried to found some similar suggestion but wasn't successful. If it could be a workaround and if you like such idea, I can prepare Design Proposal and tests for it.

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Dec 1, 2016

@DanielRosenwasser, about your second issue - I don't think that it would ever happen, because in my suggestion compiler runs regex only for literals and it doesn't seems that someone will do something like this:

let something: /some-regex-with-backreferences/ = `
long enough string to make regex.test significantly affect performance
`

Anyway we could test how long literal should be for affecting real-time performance and create some heuristic that will warn user if we are unable to check it while he faces this circumstances in some editor scenarios, but we would check it when he will compile the project. Or there could be some other workarounds.

About third question, I'm not sure that understand everything correctly, but it seems that regex engine should be selected depending on target from tsconfig if they have different implementations. Needs some more investigation.

Igmat commented Dec 1, 2016

@DanielRosenwasser, about your second issue - I don't think that it would ever happen, because in my suggestion compiler runs regex only for literals and it doesn't seems that someone will do something like this:

let something: /some-regex-with-backreferences/ = `
long enough string to make regex.test significantly affect performance
`

Anyway we could test how long literal should be for affecting real-time performance and create some heuristic that will warn user if we are unable to check it while he faces this circumstances in some editor scenarios, but we would check it when he will compile the project. Or there could be some other workarounds.

About third question, I'm not sure that understand everything correctly, but it seems that regex engine should be selected depending on target from tsconfig if they have different implementations. Needs some more investigation.

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Dec 6, 2016

@DanielRosenwasser are there any thoughts? 😄 About initial proposal and about last one. May be I have to make more detailed overview of second one, do I?

Igmat commented Dec 6, 2016

@DanielRosenwasser are there any thoughts? 😄 About initial proposal and about last one. May be I have to make more detailed overview of second one, do I?

@streamich

This comment has been minimized.

Show comment
Hide comment
@streamich

streamich Oct 11, 2017

Allow string types to be regular expressions /#[0-9]{6}/ and allow nesting types into regular expressions ${TColor}:

type TColor = 'red' | 'blue' | /#[0-9]{6}/;
type TBorderValue = /[0-9]+px (solid|dashed) ${TColor}/

Result:

let border1: TBorderValue = '1px solid red'; // OK
let border2: TBorderValue = '1px solid yellow'; // TSError: .....

Use case: there is a library dedicated to writing "type-safe" CSS styles in TypeScript typestyle. The above proposed functionality would help greatly, because the library has to expose methods to be used at runtime, the proposed string regex types would instead be able type check code at compile time and give developers great intellisense.

streamich commented Oct 11, 2017

Allow string types to be regular expressions /#[0-9]{6}/ and allow nesting types into regular expressions ${TColor}:

type TColor = 'red' | 'blue' | /#[0-9]{6}/;
type TBorderValue = /[0-9]+px (solid|dashed) ${TColor}/

Result:

let border1: TBorderValue = '1px solid red'; // OK
let border2: TBorderValue = '1px solid yellow'; // TSError: .....

Use case: there is a library dedicated to writing "type-safe" CSS styles in TypeScript typestyle. The above proposed functionality would help greatly, because the library has to expose methods to be used at runtime, the proposed string regex types would instead be able type check code at compile time and give developers great intellisense.

@skbergam

This comment has been minimized.

Show comment
Hide comment
@skbergam

skbergam Oct 24, 2017

@DanielRosenwasser @alexanderbird @Igmat: IMO this proposal would be game-changing for TypeScript and web development. What is currently stopping it from being implemented?

I agree that extendability and type emission should not get in the way of the rest of the feature. If there's not a clear path on those aspects, implement them later when there is.

skbergam commented Oct 24, 2017

@DanielRosenwasser @alexanderbird @Igmat: IMO this proposal would be game-changing for TypeScript and web development. What is currently stopping it from being implemented?

I agree that extendability and type emission should not get in the way of the rest of the feature. If there's not a clear path on those aspects, implement them later when there is.

@nolazybits

This comment has been minimized.

Show comment
Hide comment
@nolazybits

nolazybits Nov 2, 2017

I arrived here as I am looking to have a UUID type and not a string, hence having a regex defining the string would be awesome in this case + a way to check validity of the type (Email.test example) would be also helpful.

nolazybits commented Nov 2, 2017

I arrived here as I am looking to have a UUID type and not a string, hence having a regex defining the string would be awesome in this case + a way to check validity of the type (Email.test example) would be also helpful.

@Igmat

This comment has been minimized.

Show comment
Hide comment
@Igmat

Igmat Nov 3, 2017

@skbergam I'm trying to implement it by myself once more. But TS project is really huge and I also have a work, so there are nearly no progress (I've managed only to create tests for this new feature). If somebody has more experience with extending TS any help would be greatly appreciated...

Igmat commented Nov 3, 2017

@skbergam I'm trying to implement it by myself once more. But TS project is really huge and I also have a work, so there are nearly no progress (I've managed only to create tests for this new feature). If somebody has more experience with extending TS any help would be greatly appreciated...

@RyanCavanaugh

This comment has been minimized.

Show comment
Hide comment
@RyanCavanaugh

RyanCavanaugh Jan 5, 2018

Member

Interesting that this effectively creates a nominal type, since we'd be unable to establish any subtype/assignability relationships between any two non-identical regexps

Member

RyanCavanaugh commented Jan 5, 2018

Interesting that this effectively creates a nominal type, since we'd be unable to establish any subtype/assignability relationships between any two non-identical regexps

@simonbuchan

This comment has been minimized.

Show comment
Hide comment
@simonbuchan

simonbuchan Jan 18, 2018

@RyanCavanaugh Earlier @maiermic commented

Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions).

But that might not be good enough? One certainly hopes there is not to many regexp relationships, but you never know.

Regarding type checks, if we don't like duplicating regexps, and typeof a const isn't good enough (e.g. .d.ts files), how does TS feel about valueof e, which emits the literal value of e iff e is a literal, otherwise an error (and emits something like undefined)?

simonbuchan commented Jan 18, 2018

@RyanCavanaugh Earlier @maiermic commented

Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions).

But that might not be good enough? One certainly hopes there is not to many regexp relationships, but you never know.

Regarding type checks, if we don't like duplicating regexps, and typeof a const isn't good enough (e.g. .d.ts files), how does TS feel about valueof e, which emits the literal value of e iff e is a literal, otherwise an error (and emits something like undefined)?

@apm963

This comment has been minimized.

Show comment
Hide comment
@apm963

apm963 Mar 29, 2018

@maxlk Also off-topic but I took your regex and improved it to not match trailing commas on otherwise valid input: /^((dog|cat|fish)(,(?=\b)|$))+$/ with test https://regex101.com/r/AuyP3g/1. This uses a positive lookahead for a word character after the comma, forcing the prior to revalidate in a DRY way.

apm963 commented Mar 29, 2018

@maxlk Also off-topic but I took your regex and improved it to not match trailing commas on otherwise valid input: /^((dog|cat|fish)(,(?=\b)|$))+$/ with test https://regex101.com/r/AuyP3g/1. This uses a positive lookahead for a word character after the comma, forcing the prior to revalidate in a DRY way.

@gtamas

This comment has been minimized.

Show comment
Hide comment
@gtamas

gtamas May 19, 2018

Hi!
What's the status of this?
Will you add this feature in the near future? Can't find anything about this in the roadmap.

gtamas commented May 19, 2018

Hi!
What's the status of this?
Will you add this feature in the near future? Can't find anything about this in the roadmap.

@zspitz

This comment has been minimized.

Show comment
Hide comment
@zspitz

zspitz May 23, 2018

Contributor

@lgmat How about limiting the syntax to single-line arrow functions, using only definitions available in lib.d.ts?

Contributor

zspitz commented May 23, 2018

@lgmat How about limiting the syntax to single-line arrow functions, using only definitions available in lib.d.ts?

@SrZorro SrZorro referenced this issue Jun 25, 2018

Open

Class support? #3

@AndrewEastwood

This comment has been minimized.

Show comment
Hide comment
@AndrewEastwood

AndrewEastwood Jun 27, 2018

Are those awesome improvements available? Maybe in alpha release at least?

AndrewEastwood commented Jun 27, 2018

Are those awesome improvements available? Maybe in alpha release at least?

@jpike88

This comment has been minimized.

Show comment
Hide comment
@jpike88

jpike88 Jul 6, 2018

regex validated types are great for writing tests, validating hardcoded inputs would be great.

jpike88 commented Jul 6, 2018

regex validated types are great for writing tests, validating hardcoded inputs would be great.

@digeomel

This comment has been minimized.

Show comment
Hide comment
@digeomel

digeomel Aug 29, 2018

+1. Our use case is very common, we need to have a string date format like 'dd/mm/YYYY'.

digeomel commented Aug 29, 2018

+1. Our use case is very common, we need to have a string date format like 'dd/mm/YYYY'.

@aleksey-bykov

This comment has been minimized.

Show comment
Hide comment
@aleksey-bykov

aleksey-bykov Sep 8, 2018

although as proposed it would be an extremely cool feature, it lacks the potential:

  • the output is always a sting (although compile time checked), there is no way to get a structured object
  • grammar is limited to what regexp's can do, and they cannot do much, the problem of regular expressions is that they are regular, their result is a list, not a tree, they cannot express arbitrary long nested levels
  • the parser has to be expressed in terms of typescript grammar, which is limited and not extentable

better way would be outsource parsing and emitting to a plugin, like proposed in #21861, this way all of the above is not a problem at the price of steeper learning curve, but hey! the regexp checking can be implemented atop of that so that the original proposal still stands, coming up by more advanced machinery

so as i said, a more general way would be custom syntax providers for whatever literals: #21861

examples:

const uri: via URIParserAndEmitter = http://google.com; 
console.log(uri); // --> { protocol: 'http', host: 'google.com', path: undefined, query: undefined, hash: undefined }

const a: via PositiveNumberParser = 10; // --> 10
const b: via PositiveNumberParser = -10; // --> error

const date: via DateParser = 1/1/2019; // --> new Date(2019, 1, 1)

aleksey-bykov commented Sep 8, 2018

although as proposed it would be an extremely cool feature, it lacks the potential:

  • the output is always a sting (although compile time checked), there is no way to get a structured object
  • grammar is limited to what regexp's can do, and they cannot do much, the problem of regular expressions is that they are regular, their result is a list, not a tree, they cannot express arbitrary long nested levels
  • the parser has to be expressed in terms of typescript grammar, which is limited and not extentable

better way would be outsource parsing and emitting to a plugin, like proposed in #21861, this way all of the above is not a problem at the price of steeper learning curve, but hey! the regexp checking can be implemented atop of that so that the original proposal still stands, coming up by more advanced machinery

so as i said, a more general way would be custom syntax providers for whatever literals: #21861

examples:

const uri: via URIParserAndEmitter = http://google.com; 
console.log(uri); // --> { protocol: 'http', host: 'google.com', path: undefined, query: undefined, hash: undefined }

const a: via PositiveNumberParser = 10; // --> 10
const b: via PositiveNumberParser = -10; // --> error

const date: via DateParser = 1/1/2019; // --> new Date(2019, 1, 1)

@Akxe

This comment has been minimized.

Show comment
Hide comment
@Akxe

Akxe Sep 25, 2018

@lgmat How about limiting the syntax to single-line arrow functions, using only definitions available in lib.d.ts?

@zspitz that would make a lot of people unhappy, as they would see, that it is possible, but forbined for them, basically for their safety.

Are those awesome improvements available? Maybe in alpha release at least?

As far as I know this still need a proposal. @gtamas, @AndrewEastwood

Also I think #11152 would be affecting this.

Akxe commented Sep 25, 2018

@lgmat How about limiting the syntax to single-line arrow functions, using only definitions available in lib.d.ts?

@zspitz that would make a lot of people unhappy, as they would see, that it is possible, but forbined for them, basically for their safety.

Are those awesome improvements available? Maybe in alpha release at least?

As far as I know this still need a proposal. @gtamas, @AndrewEastwood

Also I think #11152 would be affecting this.

@Akxe

This comment has been minimized.

Show comment
Hide comment
@Akxe

Akxe Sep 25, 2018

@Igmat Your proposal limits the validation to only be useful with string types. What are your thoughts on @rylphs proposal? This would allow a more generic validation for all primitive types:

type ColorLevel = (n:number) => n>0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

I suspect however that extending this mechanism beyond primitives to non-primitive types would be too much.

The main problem I see with this is security concerns, imagine some malicious code, that would use buffers to grab the user's memory while checking for type. We would have to implement a lot of sandboxing around this. I would rather see 2 different solutions, one for strings and one for numbers.

RegExp is immune to that to some extends as the only way you can use this maliciously is to make some backtracking expression. That being said, some users might do it unintentionally, therefore, there should be some kind of protection. I would think the best way to do it would be a timer.

One point, the issue that @DanielRosenwasser raised -- about varying regex engine implementations -- would be magnified: depending on the Javascript engine under which the Typescript compiler is running, the validation function might work differently.

That is true, this is bad, but we can solve it by specifying what "modern" part of regExp we need for our codebase. It would default to normal (is it ES3?) regexp, that works in every node. And option to enable new flags and lookbehind assertions.

const unicodeMatcher = /\u{1d306}/u;
let value: typeof unicodeMatcher;
function(input: string) {
  value = input;  // Invalid
  if (input.match(unicodeMatcher)) {
    value = input;  // OK
  }
}

If a user has disabled flag with advanced flags.

let value: typeof unicodeMatcher = '𝌆';  // Warning, string literal isn't checked, because `variable` is of type `/\u{1d306}/u`.

TypeScript would not evaluate advanced RegExp, if not told to. But I would suggest that is should give warning, explaining what is happening and how to enable advanced RegExp checking.

If user has enabled flag with advanced flags and his node supports it.

let value: typeof unicodeMatcher = '𝌆';  // OK

If a user has enabled flag with advanced flags and his node supports it.

let value: typeof unicodeMatcher = '𝌆';  
// Error, NodeJS does not support advanced RegExp, upgrade NodeJS to version X.Y.Z, or disable advanced RegExp checking.

I think this is a reasonable way to go about it.
Teams of programmers have usually the same version of NodeJS or are easily able to upgrade since all their codebase is working for someone with a newer version.
Solo programmers can adapt easily on the fly,

Akxe commented Sep 25, 2018

@Igmat Your proposal limits the validation to only be useful with string types. What are your thoughts on @rylphs proposal? This would allow a more generic validation for all primitive types:

type ColorLevel = (n:number) => n>0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

I suspect however that extending this mechanism beyond primitives to non-primitive types would be too much.

The main problem I see with this is security concerns, imagine some malicious code, that would use buffers to grab the user's memory while checking for type. We would have to implement a lot of sandboxing around this. I would rather see 2 different solutions, one for strings and one for numbers.

RegExp is immune to that to some extends as the only way you can use this maliciously is to make some backtracking expression. That being said, some users might do it unintentionally, therefore, there should be some kind of protection. I would think the best way to do it would be a timer.

One point, the issue that @DanielRosenwasser raised -- about varying regex engine implementations -- would be magnified: depending on the Javascript engine under which the Typescript compiler is running, the validation function might work differently.

That is true, this is bad, but we can solve it by specifying what "modern" part of regExp we need for our codebase. It would default to normal (is it ES3?) regexp, that works in every node. And option to enable new flags and lookbehind assertions.

const unicodeMatcher = /\u{1d306}/u;
let value: typeof unicodeMatcher;
function(input: string) {
  value = input;  // Invalid
  if (input.match(unicodeMatcher)) {
    value = input;  // OK
  }
}

If a user has disabled flag with advanced flags.

let value: typeof unicodeMatcher = '𝌆';  // Warning, string literal isn't checked, because `variable` is of type `/\u{1d306}/u`.

TypeScript would not evaluate advanced RegExp, if not told to. But I would suggest that is should give warning, explaining what is happening and how to enable advanced RegExp checking.

If user has enabled flag with advanced flags and his node supports it.

let value: typeof unicodeMatcher = '𝌆';  // OK

If a user has enabled flag with advanced flags and his node supports it.

let value: typeof unicodeMatcher = '𝌆';  
// Error, NodeJS does not support advanced RegExp, upgrade NodeJS to version X.Y.Z, or disable advanced RegExp checking.

I think this is a reasonable way to go about it.
Teams of programmers have usually the same version of NodeJS or are easily able to upgrade since all their codebase is working for someone with a newer version.
Solo programmers can adapt easily on the fly,

amrdraz added a commit to amrdraz/bitwise that referenced this issue Oct 3, 2018

Not much can be done
So generally speaking other than using the slightly shorter array on the Bits and BinaryBits types

Not much can be done regarding the unsigned int types.
Your salvation may come with this proposal should the day come Microsoft/TypeScript#15480

But Typescript will probably not support base uintx types as this discussion ended 
Microsoft/TypeScript#4639
essentially JS is expecting BigInt and other types may be added someday and so no extra primitives will be added until T39 adds them

on a side note this regex proposal should it get implemented would be interesting 
Microsoft/TypeScript#6579

@weswigham weswigham referenced this issue Oct 16, 2018

Open

JSON type #27930

4 of 4 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment