Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A case against positional or by keyword binding of ParamArray/ParamDictionary parameters #249

Open
BCSharp opened this issue Nov 13, 2020 · 5 comments

Comments

@BCSharp
Copy link
Member

BCSharp commented Nov 13, 2020

There is a certain inconsistency in handling parameters marked with ParamArrayAttribute or ParamDictionaryAttribute. It is reflected in both comments in the DLR code, as well as code itself. Since this is a fundamental concept in the overload resolution, I want to discuss it explicitly and agree on a principle, before changing the DLR behaviour in one way or another. It would be very useful to know which behaviour is desirable and which should be considered a bug. Once this principle is agreed on and implemented, it may be difficult to reverse on it in the future, since many parts of the overload resolution code depend on it.

Let's look at the current situation first. For instance, ParameterWrapper.cs reads:

// params arrays & dictionaries don't allow assignment by keyword

On the other hand, a comment on ParamDictionaryAttribute reads:

/// For languages which don't have language level support the user will be required to
/// create and populate the dictionary by hand.
/// 
/// This attribute is the dictionary equivalent of the System.ParamArrayAttribute.

This suggests that it should be possible to pass such param dictionary and/or param array somehow. Since passing by keyword should be forbidden, this leaves that passing by position should be supported. But this leads to logical inconsistencies, which I describe a little bit later below.

On the actual code side, some parts of the code assume that passing param dictionary/array by keyword is allowed, and some assume that it is forbidden. This leads to erroneous behaviour, ranging from unexpected results, through confusing error messages, to exceptions thrown. Example (from IronPython):

import clr
clr.AddReference("rowantest.methodargs")
from Merlin.Testing.Call import VariousParameters
obj = VariousParameters()
obj.M560(1, 2, 3, z=4, b=5, c=6)

Method M560 has signature public void M560(int x, [ParamDictionary] IDictionary<string, int> y, params int[] z) and the call should succeed. Instead, and IndexOutOfRangeException is thrown:

Traceback (most recent call last):   
  File "<stdin>", line 1, in <module>
IndexError: Index was outside the bounds of the array.

Interestingly, the exception is not thrown during overload resolution, but during the actual call itself, because the call site has been constructed incorrectly: it contains three keywords for keyword arguments (z, b, c) but only two values (5, 6). The value for parameter z is taken over by an actual param array to accommodate additional positional arguments (2 and 3).

Regarding passing by position, the current DLR code does accept param array passed by position:

import System
ar_int = System.Array[System.Int32](3)
# public void M202(params int[] arg)
obj.M202(ar_int) # OK
ar_obj = System.Array[System.Object](3)
obj.M202(ar_obj) # ar_obj passed directly to arg
obj.M202(ar_obj, ar_obj) # arg is an array of 2 references to ar_obj

The last two lines show inconsistency in handling matching type values. I personally think it is unfortunate for a dynamic language, but it matches C#.

Param dictionary, on the other hand, is not accepted when passed as a positional argument:

dict_obj = System.Collections.Generic.Dictionary[System.Object, System.Object]()
# public void M203([ParamDictionary] IDictionary<object, object> arg)
obj.M203(dict_obj)
TypeError: M203() takes no arguments (1 given)

Inconsistencies by allowing keyword binding

Regardless the error, method M560 raises a question what the desired behaviour should have been. For instance, one may say that z=1 should not be interpreted as providing an argument for params int[] z because 1 is not int[], and instead it should be put in dictionary y. But what if the argument (which could be just a variable) were of type int[]? Then we would have a strange behaviour that depending on the type of the argument, a named argument is being processed quite differently. This may be OK in statically-typed languages, but in dynamic languages the type of a variable may not be obvious or may be undetermined at the call site. Finally, such resolution would significantly complicate the current DLR algorithm, which does arity based resolution first, then type based (and determining which positional and keyword arguments go where is part of method arity determination).

Another option could be to always assume that z=some_value binds to the param array and report an error in the case above. Such error would be confusing at best (something like expected int[] got something), but also eliminate a possibility for passing arbitrary keyword arguments to be collected in the param dictionary. Traditionally (IronPython 2), this was solved by using Unicode character U+00F8 trick in the parameter name, which made the name a valid identifier in C# but not in Python. This trick is no longer working with IronPython3 because Python 3 identifiers can contain Unicode letters too. A better solution would be to use a dedicated parameter attribute (say PositionalOnlyAttribute) to explicitly prevent binding by name. This could work but it would mean that some methods with param arrays behave differently than other, and it may be confusing to the user especially if there are several methods in the method group to choose from.

Similar reasoning can be done for keyword binding of param dictionary. In a such case, the option to always assume that y=some_value (where y is a param dictionary) always binds by name to the whole param dictionary is even more untenable, because an often assumption is that a param dictionary as a sole parameter should be able to catch any keyword arguments.

So my conclusion is that the comment in the DLR that params arrays & dictionaries don't allow assignment by keyword is the best way to resolve the ambiguities and the code should be updated accordingly.

Inconsistencies by allowing positional binding

Let's assume the following signature:

public static void M([ParamDictionary] IDictionary<object, object> kwargs, params object[] args)

Arguably, such method should be able to take an arbitrary number of positional and keyword arguments thus never throw a type error.

Further let's assume that dynamic variables kw and ar contain values typed IDictionary<object, object> and object[] respectively, and a a value of any other type. What should be the result of the following calls:

M(a)

Expected: args[0] == a, kwargs empty, nothing strange yet.

M(kw)

kwargs == kw, args empty? (since kw in sin position 1 of kwargs). Or args[0] == kw?

M(kw, a)

kwargs == kw, args[0] == a? (since kw is in position 1 of kwargs). Or args[0] == kw, args[1] == a?

M(a, kw)

kwargs == kw, args[0] == a? By what rule? Or args[0] == a, args[1] == kw?

M(kw, ar, a)

kwargs == kw, args[0] == ar, args[1] == a? (since kw is in position 1 of kwargs, but ar is in position of 2 and not bound to args). Or args[0] == kw, args[1] == ar, args[2] == a?

M(ar)

args == ar? By what rule? ar is in position 1 so there is a type conflict between ar and kwargs. Or maybe args[0] == ar, like in the first example. At least that would be consistent. And what if ar had such a value that ar[0] == kw? Should kwargs == kw (positional binding) or args == ar?

M(ar, a)

I suppose this must be args[0] == ar and args[1] == a, but it is quite different handling of ar than in the previous example.

M(a, ar)

I suppose this must be args[0] == a and args[1] == ar. But a is in position of 1, so it does not bind positionally. Nor does ar, although the type matches.

M(kw, ar)

kwargs == kw, and args == ar, obviously, if positional binding is allowed. Otherwise args[0] == kw, args[1] == ar.

M(ar, kw)

args[0] == ar, args[1] == kw? This means that positional binding is suspended. Quite different outcome than above.

Things get way more complicated with more normal or optional parameters and mixing position and keyword arguments.

I see the only simple and consistent way of handling it by prohibiting positional binding to param arrays and param dictionaries. In the examples above, the types of kw and ar were known, but in a dynamic case, a call site may get its arguments passed on from some unknown source, of an unknown type. It would be really confusing that it may change how the overload resolution is performed.

Prohibiting positional binding means that param dictionaries and param arrays are always constructed implicitly from unassigned/superfluous keyword or positional arguments. If a language has a means (i.e. a special notation) for splatting (like Python), then there is an explicit way for the caller to provide args or kwargs in their array or dictionary form. If a language does not have such syntax, such calls are not possible. I see no reason why DLR should go out of its way to support such calls if the language itself does not.

I do realize that it is a radical call, not the least because the DLR for years already supported positional argument for param arrays (though not for param dictionaries). So a more compromising solution could be to keep it supported, but explicitly prohibit positional argument for param dictionaries. After all, while param arrays are being used by .NET, param dictionaries are specific to the DLR, so a dynamic language implementation is in full control when and if to use them. The price for this option is a more complicated overload resolution ruleset.


One more thing. Generalizing this principle of prohibiting positional binding to be available to other parameters would be very useful. Specifically I was thinking of providing a dedicated attribute (e.g. KeywordOnlyAttribute) to be applied in such cases. This would be a counterpart to PositionalOnlyAttribute.

For instance, a Python builtin function sorted(iterable, /, *, key=None, reverse=False) could be implemented by a method with the signature:

object sorted([PositionalOnly]object iterable, [KeywordOnly]object key = null, [KeywordOnly]bool reverse = false)

I am not sure about the names of the attributes, though. Technically, [PositionalOnly] would actually mean [ProhibitKeyword]. Also, [NonKeyword] would be shorter. Similarly, [KeywordOnly] is informative, but actually means [ProhibitPositional] which is a mouthful; another option [NonPositional] is not the shortest and less informative. [ProhibitPositional] could also well fit a case when applied on a param array implemented by a dynamic language, if the DLR principle turns out to stay that param arrays by default are allowed to be bound by position.

Putting both [PositionalOnly, KeywordOnly] on the same parameter sounds contradictory and should (?) be prohibited (IronPythonAnalyzer job), because it would made the parameter inaccessible to the language. In IronPython, there is already a way of making things inaccessible to the language: [PythonHidden] that is applicable to a wider range of entities but not parameters, so I doubt whether hiding parameters in this fashion would be useful.

@slozier
Copy link
Contributor

slozier commented Nov 15, 2020

Traditionally (IronPython 2), this was solved by using Unicode character U+00F8 trick in the parameter name, which made the name a valid identifier in C# but not in Python. This trick is no longer working with IronPython3 because Python 3 identifiers can contain Unicode letters too.

Actually IronPython 2 allows unicode identifiers as well so the trick just makes it harder to use.

I agree the whole thing is a confusing mess. It seems like not binding to the ParamArray/ParamDictionary parameters would be entirely reasonable. Here are a few questions that come to mind:

  • Could a language opt-in to such a behaviour or would it be stuck with whatever we choose?
  • Would PositionalOnly/KeywordOnly need to be attributes of the DLR or could the implementation be done at the language level (in IronPython)?
  • Do keyword-only attributes exist in languages other than Python? If not could it be implemented at the language level or would the DLR need to be adapted to allow this? Makes me wonder if the ParamDictionaryAttribute is used outside IronPython.
  • What happens if a method using KeywordOnly somehow makes its way back to C# and is used as a dynamic? I guess that's also a question if I define a method like def test(*, a=1): pass. Maybe the whole thing is already broken, but it would be unfortunate if a method couldn't be invoked when using dynamic.

@BCSharp
Copy link
Member Author

BCSharp commented Nov 16, 2020

  • Could a language opt-in to such a behaviour or would it be stuck with whatever we choose?

I suppose we could add a (few) new protected virtual method(s) to OverloadResolver to allow the language to override the default opt-out, but it still does not resolve the issue that currently the DLR does not handle such opt-ins well. I have just found another place in the DLR that assumes no binding to args/kwargs.

  • Would PositionalOnly/KeywordOnly need to be attributes of the DLR or could the implementation be done at the language level (in IronPython)?

I was thinking of adding it at the DLR level because handling of keyword arguments is done primarily by DefaultOverloadResolver but also partly in OverloadResolver. With the extra protected virtual method approach those attributes can be introduced at the IronPython level, though call failure reason propagation up to OverloadResolver may be tricky (I assume that the existing protected virtuals are part of the unchangeable official API, but that API is somewhat constrained with respect to handling comprehensive error reporting). So some support at the DLR will be needed; I doubt whether support for PositionalOnly/KeywordOnly could be done without any whatsoever change to the DLR.

  • Do keyword-only attributes exist in languages other than Python? If not could it be implemented at the language level or would the DLR need to be adapted to allow this? Makes me wonder if the ParamDictionaryAttribute is used outside IronPython.

Julia?

  • What happens if a method using KeywordOnly somehow makes its way back to C# and is used as a dynamic? I guess that's also a question if I define a method like def test(*, a=1): pass. Maybe the whole thing is already broken, but it would be unfortunate if a method couldn't be invoked when using dynamic.

Hm, I don't know how to provide keyword arguments to a dynamic method invocation in C#. Perhaps a way to invoke could be though ObjectOperations.Invoke? The existing Invoke overloads do not take keyword arguments either, but it could be added?

Since a builtin method after making its way back to C# will be a BuiltinFunction, maybe another way it can be invoked is though one of the Call overloads, made public for this purpose? Those guys do take kwargs.

@slozier
Copy link
Contributor

slozier commented Nov 16, 2020

Alright, I think we could go ahead with excluding binding of ParamArray/ParamDictionary parameters. I would probably make the PositionalOnly/KeywordOnly attributes a separate proposal (I'd hate to add public attributes and then later realize it wasn't what we really wanted).

Hm, I don't know how to provide keyword arguments to a dynamic method invocation in C#.

I was thinking of something along the lines of:

var engine = Python.CreateEngine();
var scope = engine.CreateScope();
engine.Execute("x = sorted", scope);
dynamic sorted = scope.GetVariable("x");
var iterable = new int[] { 1, 3, 2 };
Console.WriteLine(string.Join(", ", sorted(iterable, reverse:true)));

It looks like this currently works (which is nice) so we'd have to make sure not to break it.

Perhaps a way to invoke could be though ObjectOperations.Invoke?

Yes, it seems like some overloads in ObjectOperations would be useful:

public dynamic Invoke(object obj, object[] args, IDictionary<string, object> kwargs);
// InvokeMember
// CreateInstance

Seems to be one of the recommended way to call things (https://stackoverflow.com/questions/8970458/ironpython-call-method-by-name). I don't see how you can invoke the reversed sorted example via ObjectOperations right now so even if we don't go ahead with the proposal, some overloads may be worth adding.

Since a builtin method after making its way back to C# will be a BuiltinFunction, maybe another way it can be invoked is though one of the Call overloads, made public for this purpose?

Would be best to have something more agnostic and not rely on the fact that it's a BuiltinFunction. Invoking a user-function with keyword-only arguments should work the same way.

@BCSharp
Copy link
Member Author

BCSharp commented Nov 16, 2020

it seems like some overloads in ObjectOperations would be useful

This is my preferred approach too (over IronPython specific), but I am having second thoughts about simply overloading Invoke (and InvokeMember, CreateInstance). While the signature you suggest is exactly what I had in mind as the most convenient, it would create a cornercase incompatibility with the current behaviour. Since currently parameters is a param array, providing two parameters of type object[] and IDictionary<string, object> is now putting them in a 2-item parameters array representing two positional arguments, but with the overloads, they would be effectively splatted.

To retain existing behaviour, existing user code would have to change from

engine.Operations.Invoke(f, arr, dict)

to something like

engine.Operations.Invoke(f, new object[] { arr, dict})

Are we okay with that?

Otherwise, maybe a different method name, SplatInvoke etc.?

I would probably make the PositionalOnly/KeywordOnly attributes a separate proposal

Fair enough.

@slozier
Copy link
Contributor

slozier commented Nov 16, 2020

Are we okay with that?

Although it seems unlikely to occur it could indeed break people. I'd be fine with a separate name although I can't think of a good one. I think prefixing with the existing method name would be better for discoverability. I'm not sure I like Splat (I know it comes up in a few places, but mostly internal). These seem kind of verbose, but could be an option: InvokeWithKeywords or InvokeWithDictionary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants