Add support for user defined literal expressions. #263

Closed
mburbea opened this issue Feb 5, 2015 · 35 comments

Comments

@mburbea

mburbea commented Feb 5, 2015

A user defined literal expression would be defined as a new operator on a class or value type. The literal would be typed without quotes and must end with _TypeName. This should help disambiguate it from existing literals.
Only one literal operator can be declared per type and it must return the type in question.
e.g.

public static TypeName operator Suffix(string input)

A literal will be considered a constant expression, and as a result will also be allowed as a const and as a default parameter value. Since IL will not view it as such, the compiler will emit an attribute containing the string literal when it is declared as a const or a defaulted parameter, and on usage will emit a call to TypeName.op_suffix(string).

Open questions:
How can the compiler not get confused by the literal? Should spaces and operators be disallowed?
(e.g. is 2015-12-31_MyDate going to translate to MyDate.op_suffix("2015-12-31"), or will it be confused and thought of as 2015 - 12 - MyDate("31")?)
The literals might become very unwieldy if generics are in the mix. Should we allow custom names? If so how to handle disambiguation?
While this is cool, what if I'm feeling very jealous of the VB guys and want a constant datetime? How would I go about doing that? Will we be allowed to add them to existing types?
void Scheduler(DateTime start,DateTime cutOff= 2015-12-31T23:59:59Z_DateTime)
That could be really nice.

@gafter
Member

gafter commented Feb 5, 2015

This proposal requires a lexical grammar. Please provide one.

@mikedn

mikedn commented Feb 5, 2015

Eh, grammar. Grammar is overrated 😄

Now seriously, if you want user defined literals you'd better restrict them to being built from existing literals. As in "2015-12-13"_MyDate. Even if you can somehow sort out the grammar for something like 2015-12-31_MyDate such code would be unnecessarily confusing.

@BrannonKing

What is the use case for this proposal? What problem is it trying to solve?

@mirhagk

mirhagk commented Feb 5, 2015

I believe the use case is to make DateTime.Parse("2015-02-05") a compile time constant. Then you could use it as default parameters etc.

I think that use case is valid and nice, but I don't think this proposal is the right way to do that. I'd rather see DateTime.Parse("2015-02-05") allowed as a compile time constant. I think a much more useful proposal would be a way to mark DateTime.Parse as a constant method - a method that called with constant parameters would return a constant result. (Such a method would be a pure method, so I think it's probably better to say that pure methods allow propagation of compile time constants)

@mburbea
Author

mburbea commented Feb 5, 2015

My main motivation is for things like defaulting parameters, where you want to allow a defaulted value to be used but default(T) is still a plausible and reasonable parameter choice (e.g. 0.0 is pretty reasonable for a lot of float math). If my type is a struct I could make it nullable, so I could use T? for it. However, I then have to have some code in my method like

if(!myDefaultableParam.HasValue){ myDefaultableParam = new T(....)}

I could also create overloads to emulate defaulted parameters, but that leads to code bloat, all because I can't specify a nicer default.
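
For readers who want to see the nullable workaround spelled out, here is a minimal sketch in today's C#. The names Scheduler, DefaultCutOff and EffectiveCutOff are made up for illustration (loosely following the Scheduler example in the opening post); this is one possible shape of the pattern, not code from any library.

```csharp
using System;

public static class Scheduler
{
    // Hypothetical default value. A static readonly field stands in for
    // the DateTime constant that C# cannot express as a default parameter.
    private static readonly DateTime DefaultCutOff =
        new DateTime(2015, 12, 31, 23, 59, 59, DateTimeKind.Utc);

    // null means "the caller did not supply a cut-off", so default(DateTime)
    // remains available as a legitimate explicit argument.
    public static DateTime EffectiveCutOff(DateTime start, DateTime? cutOff = null)
    {
        return cutOff ?? DefaultCutOff;
    }
}
```

The cost is the one described above: the real default lives in the method body instead of the signature, so callers and tooling only ever see null.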

@mikedn, Actually, cribbing more from C++ perhaps we can consider using a different marker and use a different bracketing to support it. Maybe, <>_TypeName and instead of a string literal, the compiler will take any number of existing constant-expressions delimited by comma, that match the signature of the method. Params will not be supported. Consider this <2.0f,4.7f>_Vector2f the compiler can verify that the arguments are floats and that they match the specification. Perhaps then with this format <2015,02,05>_DateTime isn't so bad.

@mikedn

mikedn commented Feb 5, 2015

@mburbea

Actually, cribbing more from C++ perhaps we can consider using a different marker and use a different bracketing to support it

But what has this to do with C++? What you're describing is not what C++ does, C++ does what I said in my previous post. And anyway, why write something like <2015,02,05>_DateTime when you can write new DateTime(2015,02,05), just to replace new with _?

@mirhagk

I believe the use case is to make DateTime.Parse("2015-02-05") a compile time constant. Then you could use it as default parameters etc.

That will never work, the result of DateTime.Parse depends on the current culture. Even if it's "pure" the amount of work required to implement such methods is likely to be very high. Ask the VC++ compiler guys who even today don't have full support for C++'s constexpr.

@mirhagk

mirhagk commented Feb 5, 2015

the result of DateTime.Parse depends on the current culture.

True in which case the correct overload should be used, but that should be used anyways.

Even if it's "pure" the amount of work required to implement such methods is likely to be very high.

It definitely is more work than this proposal, but it's orders of magnitude nicer than this method. This proposed feature would just be a temporary fix that would be ditched if pure/const methods did get supported.

The language doesn't need to do the big jump that C++ did (which also added tons of other crazy features at the same time). We can slowly get there, and the work for marking a method as pure is something the language should add anyways.
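
As a point of reference for the culture issue raised above, this is roughly what a culture-independent parse looks like today. It is only a sketch: the class name Dates and the field name Cutoff are invented for the example, and the value is still computed at run time in a static initializer, not at compile time.

```csharp
using System;
using System.Globalization;

public static class Dates
{
    // ParseExact with an explicit format and the invariant culture does not
    // depend on the current thread culture, unlike DateTime.Parse(string).
    public static readonly DateTime Cutoff =
        DateTime.ParseExact("2015-02-05", "yyyy-MM-dd", CultureInfo.InvariantCulture);
}
```

A static readonly field like this cannot be used as a default parameter value or attribute argument, which is the gap both proposals in this thread are trying to close.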

@mikedn

mikedn commented Feb 5, 2015

Hrm, there seems to be some serious confusion here.

  • Even if user defined literals are added to the language that doesn't mean that you can use them to specify default parameter values. Such values are limited to primitive types because that's the only thing that can be encoded in metadata. User defined literals are entirely a language feature, syntactic sugar for method calls that doesn't have anything to do with what can be stored in metadata.
  • There's no way that const methods are nicer than user defined literals because they're different features with different goals, you're comparing apples and oranges. However, it is possible to combine the 2 features by having a const literal operator.
  • Maybe C++ did add "tons of other crazy features", but I was specifically referring to constexpr. When you say "make DateTime.Parse("2015-02-05") a compile time constant" you're really asking the compiler to evaluate that piece of code at compile time to produce the constant. And that's exactly what constexpr does in C++. I hope I don't need to explain why this is difficult or even impossible to implement. And even if you implement it the result is still not a constant, not according to the runtime definition of a constant.

@mirhagk

mirhagk commented Feb 5, 2015

constexpr is more complicated since it must include constructors as well. It's still difficult to do so in C# but not to the point where the feature can't be considered (especially with the modern roslyn compiler where it would be possible for earlier parts of the compilation to use components from later parts. Not sure if this is the best design or not, but it wouldn't require rewriting an expression evaluator or anything)

However, it is possible to combine the 2 features by having a const literal operator.

This is what I mean. Doing so would solve the underlying use case without introducing new literal syntax, and would be closer to how developers think it works. (Very often I hear how frustrating it is that the compiler is too "dumb" to know that something is "clearly" constant).

Such values are limited to primitive types because that's the only thing that can be encoded in metadata.

Hmm so either way it can't be expressed easily in the CLR. If either proposal wants to move forward there's two choices then, either the CLR could be expanded to understand other defaults, or the compiler could fake the optional parameters with overloads (perhaps using some attributes to mark to consuming compilers that this is actually an optional parameter).

I vote for holding off on this until the team decides on doing something with pure functions, which has been in discussion for a while.

@svick
Contributor

svick commented Feb 6, 2015

@mikedn

Even if user defined literals are added to the language that doesn't mean that you can use them to specify default parameter values. Such values are limited to primitive types because that's the only thing that can be encoded in metadata.

You could encode the string value in the metadata. So, code like this:

void M(string s = "foo", MyDate date = "2015-12-31"_MyDate)

…

M();

would compile into:

M("foo", MyDate.op_suffix("2015-12-31"));

Though this means that the value would be parsed every time the method is called, which is not ideal.

@mburbea
Author

mburbea commented Feb 6, 2015

The VB compiler and C# both already have some mechanism for this behavior. C# allows Decimal, and VB also allows datetime literals to be treated as constants. This data is encoded in an attribute when you use it as a default or as a constant field, and when you use it in code the compiler replaces it with a call to the constructor.
This is actually why somewhere in the roslyn codebase they decided not to use 0m as a constant and replaced it with Decimal.Zero, as calling the ctor over and over again in a critical path could be expensive.

@mikedn

mikedn commented Feb 6, 2015

Yes, you can use decimal as a default value but that's because there's a special attribute - System.Runtime.CompilerServices.DecimalConstantAttribute. A similar one exists for DateTime.

Does this mean that when you add a user defined literal operator to a type the compiler should automatically generate such an attribute? Sounds kind of ugly because the public surface of the library containing the type will end up containing another type that's really just an implementation detail.

And assuming we go the custom attribute way, what has this to do with user defined literals? It seems to me that one could simply mark a constructor as "const/literal/pure/etc." to make this happen, similar to what @mirhagk suggested but only for constructors which have only primitive types as parameters.

Or, if you only care about DateTime, the C# compiler could simply treat some of the DateTime constructors specially and generate the DateTimeConstantAttribute. The language doesn't need to support DateTime literals to solve this problem.

@gafter
Member

gafter commented Feb 10, 2015

This proposal is not actionable because it does not propose a lexical grammar. Please reopen if you make the proposal much more specific.

@HaloFour

@mburbea @mikedn

I know this is closed but this came up in #2401. The attributes that allow the compilers to "fake" literals for System.Decimal and System.DateTime (for VB.NET) don't produce the embedded metadata necessary to allow either of those types to be used as either default values or custom attribute parameters. The compiler will allow you to declare a const but they're not an IL literal, instead they're converted into static readonly fields and a static constructor is emitted to initialize the value of that field.

@dsaf

dsaf commented May 23, 2015

@gafter

Please find my take at requested lexical grammar below (includes portions of existing grammar for context, three dots indicate omitted text). Could you please re-open (unless it's better to create a new one)?


...
operator-declaration:
     attributesopt   operator-modifiers   operator-declarator   operator-body

operator-declarator:
...
     literal-operator-declarator

...
literal-operator-declarator:
     type   literal   operator   identifier   (   type   identifier   )
...


literal:
...
     user-string-literal

...
user-string-literal:
     regular-string-literal   identifier
     verbatim-string-literal   identifier

...


Used sources:

https://msdn.microsoft.com/en-us/library/aa664812(v=vs.71).aspx
http://en.cppreference.com/w/cpp/language/user_literal

@dsaf

dsaf commented May 23, 2015

Maybe this could be a more general feature of C# 7 subsuming the #215?

@gafter
Member

gafter commented May 24, 2015

@dsaf Can you please add some motivating examples and a description of the semantics?

I don't see how that lexical grammar supports the use case described in the original post.

@amcasey amcasey added Resolution-Duplicate The described behavior is tracked in another issue Verified and removed Resolution-Duplicate The described behavior is tracked in another issue Verified labels Jun 2, 2015
@amcasey
Member

amcasey commented Jun 2, 2015

I misread, #215 is not a dup.

@dsaf

dsaf commented Aug 5, 2015

@gafter Thank you for reviewing. In terms of motivating examples to me personally it would help with custom attributes:

public class Sample
{
    [DefaultValueInt(123)]
    public int Index { get; set; }

    [DefaultValueDecimal(123m)]
    public decimal Price { get; set; }

    [DefaultValueVector2("15,20"_V2)]
    public Vector2 Position { get; set; }

    [DefaultValueDateTime("2015-05-02"_DateTime)]
    public DateTime Created { get; set; }

    [DefaultValueGuid("{9B37C7A8-63DA-4D8A-9FC6-DCBCF162D471}"_Guid)]
    public Guid UniqueId { get; set; }
}

public class DefaultValueIntAttribute : Attribute
{
    public DefaultValueIntAttribute(int defaultValue) { DefaultValue = defaultValue; }
    public int DefaultValue { get; private set; }
}

public class DefaultValueDecimalAttribute : Attribute
{
    public DefaultValueDecimalAttribute(decimal defaultValue) { DefaultValue = defaultValue; }
    public decimal DefaultValue { get; private set; }
}

public class DefaultValueVector2Attribute : Attribute
{
    public DefaultValueVector2Attribute(Vector2 defaultValue) { DefaultValue = defaultValue; }
    public Vector2 DefaultValue { get; private set; }
}

public class DefaultValueDateTimeAttribute : Attribute
{
    public DefaultValueDateTimeAttribute(DateTime defaultValue) { DefaultValue = defaultValue; }
    public DateTime DefaultValue { get; private set; }
}

public class DefaultValueGuidAttribute : Attribute
{
    public DefaultValueGuidAttribute(Guid defaultValue) { DefaultValue = defaultValue; }
    public Guid DefaultValue { get; private set; }
}

public struct Vector2
{
    public float X { get; set; }
    public float Y { get; set; }

    public static Vector2 literal operator _V2(string value)
    {
        //...
    }
}

In the above code only the int-based attribute can be implemented today without parsing or other hacks. The code would of course look even simpler once #953 is implemented.

I find it weird that even decimal would not just work as a literal.
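
To make the "parsing or other hacks" remark concrete, here is a sketch of the string-based workaround that compiles today. It reuses the DefaultValueGuidAttribute name from the example above but changes its constructor to take a string; GuidSample is a made-up type for the demonstration.

```csharp
using System;

[AttributeUsage(AttributeTargets.Property)]
public sealed class DefaultValueGuidAttribute : Attribute
{
    // Attribute arguments must be compile-time constants of a supported type,
    // so the Guid travels as a string and is parsed when the attribute is read.
    public DefaultValueGuidAttribute(string defaultValue)
    {
        DefaultValue = Guid.Parse(defaultValue);
    }

    public Guid DefaultValue { get; }
}

public class GuidSample
{
    [DefaultValueGuid("{9B37C7A8-63DA-4D8A-9FC6-DCBCF162D471}")]
    public Guid UniqueId { get; set; }
}
```

The drawback is that a malformed string is only detected when the attribute is instantiated through reflection, not when the code is compiled.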

@dsaf

dsaf commented Aug 5, 2015

@gafter Regarding semantics description - I am not too sure what you had meant. VB.NET already supports DateTime literals, so I guess it's the same semantics.

@HaloFour

HaloFour commented Aug 5, 2015

@dsaf I don't see how that would help with attributes. The limitation there is that only very specific types can be encoded into the attribute BLOB metadata that the CLR then uses to reconstruct the instance at run time through reflection. Your helper methods couldn't come into play either during compilation or at runtime.

@dsaf

dsaf commented Aug 5, 2015

@HaloFour do you mean it would involve CLR changes?

@HaloFour

HaloFour commented Aug 5, 2015

@dsaf

Yes. How the metadata is encoded is a part of the ECMA-335 CLI spec II.23.3. The only supported types are explicitly bool, char, float, double, sbyte, short, int, long, byte, ushort, uint, ulong, string (UTF8 encoded), enum (prefixed and then stored as the underlying value), Type (prefixed and stored as the qualified type name), object (prefixed boxed value) and single-dimensional arrays of these types.

Even though C# fakes decimal as being a literal type (and VB.NET does the same for DateTime) the CLR does not recognize these value types as being something that can be embedded in the blob stream for custom attributes.

When it comes to default parameter values things are a little more forgiving since the default value is actually embedded at the callsite by the compiler. When you define a default value for a decimal parameter the C# compiler marks the parameter with the [opt] flag as well as the System.Runtime.CompilerServices.DecimalConstantAttribute attribute, which breaks the decimal value down into a series of primitive values (two bytes and three ints) that the compiler reconstructs into a default decimal.
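
The encoding described above can be observed through reflection. The following is a small sketch, assuming the compiler behaves as described for optional decimal parameters; DecimalDefaultDemo, M and ReadEncodedDefault are names invented for the example.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class DecimalDefaultDemo
{
    // An ordinary optional decimal parameter.
    public static void M(decimal price = 1.5m) { }

    // Reads back the attribute that the C# compiler attaches to the parameter
    // in place of a true metadata constant.
    public static DecimalConstantAttribute ReadEncodedDefault()
    {
        ParameterInfo p = typeof(DecimalDefaultDemo).GetMethod(nameof(M)).GetParameters()[0];
        return p.GetCustomAttribute<DecimalConstantAttribute>();
    }
}
```

Calling DecimalDefaultDemo.ReadEncodedDefault().Value should give back the decimal that was written in the signature, reassembled from the attribute's primitive arguments.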

@Opiumtm

Opiumtm commented Oct 11, 2016

@mirhagk

I believe the use case is to make DateTime.Parse("2015-02-05") a compile time constant. Then you could use it as default parameters etc.

As C# supports a scripting mode, it should be possible to declare compile-time evaluated values in code.

public const DateTime MyDate = DateTime.Parse("2015-02-05");

The DateTime.Parse("2015-02-05") expression can be evaluated at compile time. The compiler certainly knows this part of the syntax tree at compile time and can execute it like a script, and then assign the value to the constant. So the const field qualifier would signal the compiler to execute the expression at compile time if possible. If it isn't possible, the compiler should convert the field to static readonly and output a warning: "expression NNN can not be evaluated at compile time, const field converted to static readonly".

If the default value for a method parameter can't be evaluated at compile time, it would result in an error: "expression NNN can not be evaluated at compile time".

@iam3yal

iam3yal commented Oct 23, 2016

@Opiumtm

expression can be evaluated at compile time. The compiler certainly knows this part of the syntax tree at compile time and can execute it like a script

The compiler doesn't really know whether the method is deterministic or whether it has side effects; this is why in C++ we need to mark our functions with constexpr, which comes with some restrictions.

Doing some heuristics takes time and it will be very slow, complex and finally not safe.

@Opiumtm

Opiumtm commented Oct 23, 2016

@eyalsk
There are two options:

  1. Requirement to explicitly or implicitly mark code as deterministic using attributes (compiler can infer it in many cases and add such attribute automatically). It would be beneficial for .NET in general, as the CLR JIT or .NET Native could use this information for code optimizations. If some code is known to be deterministic and has no side effects, constant arguments lead to results that are known at compile time. That is also usable by the C# compiler.
  2. Let the developer shoot himself in the foot if he wants to.

I'd prefer the first approach. .NET compilers, the runtime and the common base libraries absolutely need to mark code paths and type members as deterministic or not, and as having side effects or not. In many cases the compiler could automatically infer whether a code path is deterministic and free of side effects. If a method doesn't change any state outside of its body and doesn't invoke any nondeterministic or side-effecting code, the method body is inherently deterministic and could be marked as such by the compiler.
For a more explicit approach, it would be acceptable to mark a method as "deterministic" (and a class as immutable, and so on) explicitly with an attribute or keyword. Code explicitly marked as "immutable" or "deterministic" wouldn't compile if it doesn't meet expectations. If it's not marked explicitly, the compiler should still record it as "deterministic or not" and "mutable or not", but in that case it means you don't care about its status and it's acceptable for it to compile regardless of its actual "pureness".

@Opiumtm

Opiumtm commented Oct 23, 2016

@eyalsk

Doing some heuristics takes time and it will be very slow, complex and finally not safe.

No heuristics are needed. Let's take a conservative approach and mark code during compilation as deterministic, so those evaluations are "cached" and transferred in binary form. It would end up as some form of dependency graph (possibly having cycles). Change or fetch of some state at method other than from its arguments would immediately make method nondeterministic. Any invocation of members already known to be nondeterministic would also make the method nondeterministic, unless it is clearly known at compile time that this code path is never actually executed.

One thing to consider (if we talk about this proposal) is the recursion and infinite cycle as method with infinite recursion or infinite cycle can be heuristically "pure", but any attempt to execute it would result in a stack overflow, run out of memory or just never finish, so it would crash or hang the compiler. And even if a loop is formally finite, taking 100 billion iterations would effectively hang the compiler. In practice there is not much difference between a hang and 8 hours of execution.
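
The propagation rule described in this comment can be sketched as a fixed-point computation over a call graph. This is only an illustration of the idea, not how Roslyn or any existing analyzer works: PurityInference and FindNondeterministic are hypothetical names, methods are identified by plain strings, and the sketch assumes every callee either appears as a key in the call graph or is listed in the seed set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PurityInference
{
    // calls: maps each method to the methods it invokes.
    // touchesExternalState: methods that read or write state other than their arguments.
    // Returns every method that must be treated as nondeterministic.
    public static HashSet<string> FindNondeterministic(
        IReadOnlyDictionary<string, string[]> calls,
        IEnumerable<string> touchesExternalState)
    {
        var impure = new HashSet<string>(touchesExternalState);
        bool changed = true;

        // Repeat until no new method gets marked. Each pass can only add to
        // the set, so the loop terminates even if the call graph has cycles.
        while (changed)
        {
            changed = false;
            foreach (var pair in calls)
            {
                if (!impure.Contains(pair.Key) && pair.Value.Any(impure.Contains))
                {
                    impure.Add(pair.Key);
                    changed = true;
                }
            }
        }
        return impure;
    }
}
```

Note that this only answers the "is it deterministic" question; it says nothing about the termination concern raised in the second paragraph, which would need a separate evaluation budget.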

@iam3yal

iam3yal commented Oct 23, 2016

@Opiumtm

Requirement to explicitly or implicitly mark code as deterministic using attributes (compiler can infer it in many cases and add such attribute automatically).

How would the compiler infer it? and if it can why do we need an attribute? it can compute the result and emit the result directly.

Change or fetch of some state at method other than from its arguments would immediately make method nondeterministic.

Can you give an example? because I can't really see how you got to this conclusion. :)

One thing to consider (if we talk about this proposal) is the recursion and infinite cycle as method with infinite recursion or infinite cycle can be heuristically "pure"

Yeah, there's some recursion depth limit, can't remember what it is but it's fair.

@Opiumtm

Opiumtm commented Oct 23, 2016

@eyalsk

and if it can why do we need an attribute? it can compute the result and emit the result directly.

Because without it, it wouldn't be possible for the compiler to make decisions about 3rd-party or framework code. So, to expose this to consumers, an assembly's public API needs to be marked with attributes.

Can you give an example? because I can't really see how you got to this conclusion. :)

We just take a conservative approach and assume "nondeterministic unless proven otherwise". Interaction with state external to the method body (other than its parameters) doesn't always make a method nondeterministic, but it's reasonable to assume that a method which doesn't interact with external state is deterministic.

public class TestClass
{
    // This method is pure, as it doesn't interact with anything except its argument.
    public int Method1(int arg)
    {
        return arg + 10;
    }

    private int someVal;

    public void MutateState(int v)
    {
        someVal = v;
    }

    public int Method2(int arg)
    {
        return arg + someVal;  // OOPS, we fetch a mutable value here - the method isn't deterministic in any fashion
    }

    private readonly int immutableVal; // assigned in constructor and immutable

    public int Method3(int arg)
    {
        return arg + immutableVal;  // Deterministic only in the context of the object instance, as it fetches an immutable instance field
    }

    private static readonly int staticallyImmutableVal = 5;

    public int Method4(int arg)
    {
        return arg + staticallyImmutableVal;  // Strongly deterministic, as it references a static immutable deterministic field
    }

    private static readonly int randomValue = new Random().Next();

    public int Method5(int arg)
    {
        return arg + randomValue;  // Nondeterministic: randomValue is initialized from a nondeterministic member of Random, so the field itself is inherently nondeterministic
    }
}

@ghost

ghost commented Dec 8, 2017

Such literals would be quite useful in e.g such case.

Thread.Sleep(10s);

public static int operator s(int seconds) {
   return seconds * 1000;
}

@CyrusNajmabadi
Member

CyrusNajmabadi commented Dec 8, 2017

why is 10s equal to 10000? Seems to only make sense in contexts where you need milliseconds.

That's why things like TimeSpan are useful, as they make it completely clear what you're doing:

Thread.Sleep(TimeSpan.FromSeconds(10));

If you really wanted s, m, h, and whatnot to be operators, it would make much more sense for them to return TimeSpans.

@ghost

ghost commented Dec 8, 2017

Of course in the context of OOP TimeSpan is much better. I just wanted to show a slightly better usage example than parsing dates, at least in my opinion and from what I've seen of C++11 usage.

@julealgon

It's not clear to me exactly why this was closed. The initial request was for a grammar, but isn't that kind of an implementation detail at the end of the day?

Allowing user-created literal suffixes would be so cool with several different types of data:

  • temperatures (kelvin, celsius, Fahrenheit)
  • time (seconds, minutes, hours)
  • distances (meters, yards...)
  • angles...
  • electrical power...
  • so on and so forth

It would also allow to potentially combine multiple suffixes to have "multiplicative" suffixes (kilo, kibi, centi, nano...) with the units themselves.

Honestly, the examples in the OP and some of the follow-up discussions seem pretty weak to me and of very narrow usage. Representing real-world math and physics data seems so much more impactful.

Say I want "1 hour and 15 minutes".
Today I have to:

var value = TimeSpan.FromHours(1) + TimeSpan.FromMinutes(15);

With fluent extensions methods, it gets better:

var value = 1.Hour() + 15.Minutes();

But literals would be a vast improvement still:

var value = 1h + 15m;

And could even be expanded to something that accepts "multiple inline suffixes", like:

var value = 1h15m;

For bytes, you could do (preferably with a custom TimeSpan-like type, for example ByteSize, https://github.com/omar/ByteSize):

var totalMemory = 15kiB; // => 15 * 1024 bytes
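
For comparison, the fluent form shown earlier in this comment (1.Hour() + 15.Minutes()) is already possible with a couple of extension methods. A minimal sketch follows; the class name TimeSpanLiteralExtensions is made up, and the method names simply mirror the ones used above.

```csharp
using System;

public static class TimeSpanLiteralExtensions
{
    // Singular and plural forms so that both 1.Hour() and 2.Hours() read naturally.
    public static TimeSpan Hour(this int value) => TimeSpan.FromHours(value);
    public static TimeSpan Hours(this int value) => TimeSpan.FromHours(value);
    public static TimeSpan Minute(this int value) => TimeSpan.FromMinutes(value);
    public static TimeSpan Minutes(this int value) => TimeSpan.FromMinutes(value);
}
```

With these in scope, 1.Hour() + 15.Minutes() produces a TimeSpan of one hour and fifteen minutes. Unlike a literal, the result is an ordinary run-time value and cannot be used where a constant expression is required.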

@CyrusNajmabadi
Member

It's not clear to me exactly why this was closed.

Language change requests go through dotnet/csharplang now, not roslyn. Roslyn is the implementation of the language, not the place to design the language itself :-)

But literals would be a vast improvement still: var value = 1h + 15m;

m is already a legal suffix :-)

@julealgon

Language change requests go through dotnet/csharplang now, not roslyn. Roslyn is the implementation of the language, not the place to design the language itself :-)

Got it. Is there a similar proposal on csharplang that we could link with this for future readers/people interested like myself?

m is already a legal suffix :-)

Hadn't come to my mind while posting. Yeah that's tough... I guess such a system would have to have some sort of ambiguity resolution where if multiple suffixes are found in scope, the selection would be based on target type (not too dissimilar to "target-typed new"), so you'd have to do:

TimeSpan value = 1h + 15m;

or

decimal value = 15m;
