New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsafe.Unbox should return "ref readonly T", not "ref T" #34258

Open
GrabYourPitchforks opened this Issue Dec 27, 2018 · 16 comments

Comments

Projects
None yet
4 participants
@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Dec 27, 2018

Edit 08 Jan 2019: I've updated the title to reflect that Unsafe.Unbox<T>(object) should return ref readonly T, not ref T. If we do this we should also move this API off of the Unsafe class and on to a different static class because it's no longer an unsafe operation.

The original issue text is preserved below for historical context.


This API was introduced in #30730.

The purpose of Unsafe.Unbox<T> is to allow mutating boxed instances of value types. However, there's an unintended side effect of this change: it forever locks us in to an implementation where boxing always allocates a new instance of an object.

It's possible to imagine a future runtime where certain well-known instances of boxed value types are cached singletons. For example, when boxing the Boolean values true or false, we could as an optimization return the appropriate cached singleton instead of allocating a new instance every time. We could do the same thing with the integers 0 - 10 for convenience, just as the async state machine infrastructure does today. This could even be extended: we could cache a boxed default(T) for any readonly struct T.

Technically returning a cached singleton in response to a box instruction is a violation of ECMA-335, which says that the box instruction must perform an allocation. But we already violate ECMA in similar places where the side effect shouldn't be observable. For example, ECMA uses similar language that the newobj instruction must perform an allocation, but new string(new char[0]) returns a cached singleton instance.

My recommendation is to remove Unsafe.Unbox<T> from the reference assemblies but to keep it in CoreLib. Since CoreLib and the VM are married, we can always evolve the managed and unmanaged code bases together so that if a VM change does come online we won't break any existing managed code.

To address the immediate scenario that led to the creation of #30730, I suggest those specific call sites create their own stand-in class that mimics a boxed integer, as below.

internal sealed class MyBoxed<T> : IFormattable where T : struct, IFormattable
{
  public T Value;
  public MyBoxed(T value) { Value = value; }
  public override int GetHashCode() => Value.GetHashCode();
  public override string ToString() => Value.ToString();
  public string ToString(string format, IFormatProvider formatProvider) => Value.ToString(format, formatProvider);
}

Using such a class would not require use of any unsafe code at all.

/cc @terrajobst @jkotas @stephentoub

@stephentoub

This comment has been minimized.

Copy link
Member

stephentoub commented Dec 27, 2018

My recommendation is to remove Unsafe.Unbox from the reference assemblies but to keep it in CoreLib.

Unbox isn't in Corelib. Your proposal is to just delete the new API entirely, no?

@stephentoub

This comment has been minimized.

Copy link
Member

stephentoub commented Dec 27, 2018

It's possible to imagine a future runtime where certain well-known instances of boxed value types are cached singletons.

In which case it would still be "Unsafe", and direct use of the IL instruction it's just exposing would also be broken for such use?

@benaadams

This comment has been minimized.

Copy link
Collaborator

benaadams commented Dec 27, 2018

Might make it pointless; but could change it to readonly

public static readonly ref T Unbox<T>(object box) where T : struct;

And introduce the MyBoxed as an actual type?

namespace System
{
    public sealed class MutableBox<T> : IFormattable where T : struct, IFormattable
    {
        public ref T Value { get; }
        public MutableBox(T value) { Value = value; }
        public override int GetHashCode() => Value.GetHashCode();
        public override string ToString() => Value.ToString();
        public string ToString(string format, IFormatProvider formatProvider) => Value.ToString(format, formatProvider);
    }
}
@stephentoub

This comment has been minimized.

Copy link
Member

stephentoub commented Dec 27, 2018

I'm not seeing how the Unbox method is any more problematic than the IL instruction it simply wraps and that has existed since the beginning of .NET. If we do something in the runtime that causes grief for Unbox, it would also cause grief for unbox and anything else using it beyond the Unbox method.

@benaadams

This comment has been minimized.

Copy link
Collaborator

benaadams commented Dec 27, 2018

For example, when boxing the Boolean values true or false, we could as an optimization return the appropriate cached singleton instead of allocating a new instance every time. We could do the same thing with the integers 0 - 10 for convenience

Can't find it now, but I do remember having/reading a conversation on github about this

aside: == only works in Java for Integers between -128 and 127 as it reuses them so reference equality works, but outside that range reference equality no longer works as they are then boxed to different Integer objects https://stackoverflow.com/a/3637978

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Dec 27, 2018

allow mutating boxed instances of value types

Other methods on Unsafe allow you to achieve equivalent effect with very little effort. For example, folks can do the following:

class Boxed<T> { public T Data; }
static ref T MyUnbox<T>(object o) => Unsafe.As<Boxed<T>>(o).Data;

Or you can use Reflection.Emit to create a dynamic method that does the exact same thing as Unsafe.Unbox.

Unsafe.Unbox is not changing the game much. The advantage of it is that you can easily do static scan for it to look for problematic patterns if you need to.

I strongly prefer folks having access to the official unsafe Unbox than to create their own local hacked up versions of it. This anti-pattern is found for ImmutableArray: We did not add official unsafe method to get access to underlying mutable array, and there are many different local hacked up versions of it today. It is worse situation than having official unsafe method to get access to underlying mutable array.

@GrabYourPitchforks

This comment has been minimized.

Copy link
Member

GrabYourPitchforks commented Jan 4, 2019

I'm not seeing how the Unbox method is any more problematic than the IL instruction it simply wraps and that has existed since the beginning of .NET. If we do something in the runtime that causes grief for Unbox, it would also cause grief for unbox and anything else using it beyond the Unbox method.

@stephentoub Per ECMA-335 (III.4.32, III.1.8.1.2.2), the unbox IL instruction returns something akin to ref readonly T, not ref T. In fact, III.1.8.1.2.2 explicitly lists a boxed Sytem.Int32 as an example where the unbox instruction is intended to return a read-only reference, not a mutable reference. Trying to mutate the reference would compile and execute, but it would then become unverifiable.

Part of this is that I'm trying to draw a distinction between unverifiable (sometimes called "unsafe") and dangerous. Unverifiable / unsafe code simply means that the runtime can't confirm that what you're doing is type-safe. The best example is reading an element from an array without performing a bounds check. As long as you know what you're doing and as long as you've performed all the checks manually up-front, what you're doing will not destabilize the VM and will be supported across all CLRs until the heat death of the universe.

Dangerous code, on the other hand, is unverifiable / unsafe code that takes a dependency on a particular runtime behavior. Mutating a string in-place is an example of such code, because the runtime may employ optimizations which assume that strings are immutable, and developers violating this invariant may cause runtime instability. I see mutating a boxed object via the Unsafe.Unbox API in the same bucket as this. Since the runtime really doesn't expect things like boxed Int32 instances to be mutated, we shouldn't provide in-box a mechanism to do this, even under the Unsafe class.

Summary: I'm ok with Unsafe APIs that allow power developers to write unverifiable / unsafe code. I'm really nervous about adding Unsafe APIs that encourage developers to write dangerous code.

Ninja edit to respond to @jkotas's point: yes, I know that Unsafe.As lets you accomplish the same thing, but at that point the developer is quite literally writing "I'm lying about the type of this object in order to get the runtime to do something it normally forbids." That's not the advertised purpose of the Unsafe.As API - the advertised purpose is the unverifiable (but not necessarily dangerous) operation of casting an object to a known type without incurring a runtime type check. Lying about the real type could break in a future runtime if the layout of a boxed value type changes from the layout of the Boxed<T> class in your example. This is different in spirit from the proposed Unsafe.Unbox API, whose advertised purpose is the unverifiable and dangerous operation of allowing mutation of boxed value types.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Jan 4, 2019

@GrabYourPitchforks Would changing the return type to readonly ref address your concerns?

We have been avoiding readonly ref in S.R.CS.Unsafe because of it is easy to end up with unintentional clones, but I think we can make an exception for this API.

@GrabYourPitchforks

This comment has been minimized.

Copy link
Member

GrabYourPitchforks commented Jan 4, 2019

@jkotas Yes, changing it to readonly ref T would address my concerns re: dangerousness, but then you could argue that the API isn't useful because it doesn't address the original scenario which led to its proposal (#30730). Assuming it's changed to return readonly ref T, is there some other scenario the API still addresses?

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Jan 4, 2019

Assuming it's changed to return readonly ref T, is there some other scenario the API still addresses?

You can use it to unbox types without creating a copy. I do not think C# allows you to do that today.

static void m(object boxedGuid)
{
   ref readonly Guid g = (Guid)boxedGuid; // does not compile
   ...
}
@stephentoub

This comment has been minimized.

Copy link
Member

stephentoub commented Jan 4, 2019

but then you could argue that the API isn't useful because it doesn't address the original scenario which led to its proposal

When used with Unsafe.AsRef to cast away the readonly-ness it still would, albeit more cumbersome.

That said, I'm still not convinced it's bad as-is.

@GrabYourPitchforks

This comment has been minimized.

Copy link
Member

GrabYourPitchforks commented Jan 4, 2019

When used with Unsafe.AsRef to cast away the readonly-ness it still would, albeit more cumbersome.

But that's the crux of the problem. That's taking an object that the runtime intended to be immutable (a boxed Int32) and mutating it in-place, which could lead to undefined behaviors in a future CLR. Just like how mutating immutable String instances in-place is a thing you can technically do, but it could result in undefined behavior.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Jan 4, 2019

I think the crux of the problem is that we cannot prevent dangerous code from being written. We can really only make it (1) more cumbersome to write and (2) easier to audit.

If your are targeting ECMA spec, the dangerous code can result in undefined behavior as you have said. If you are targeting a specific runtime and know what you are doing, the dangerous code often has pretty well defined behavior and you can reason about its correctness.

@benaadams

This comment has been minimized.

Copy link
Collaborator

benaadams commented Jan 4, 2019

Forcing things to go via ref T AsRef<T>(in T source) seems to both make it (1) more cumbersome and (2) easier to audit?

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Jan 4, 2019

Correct.

@GrabYourPitchforks

This comment has been minimized.

Copy link
Member

GrabYourPitchforks commented Jan 4, 2019

Yes. And if Unbox<T> is changed to return ref readonly T, you could even move it off the Unsafe class since it's a perfectly 100% type-safe and verifiable operation. (This assumes it keeps its current behavior of performing the type check.)

Forcing things to go via ref T AsRef(in T source) seems to both make it (1) more cumbersome and (2) easier to audit?

I don't really want to make things "cumbersome" per se. If unsafe / unverifiable operations are common, we should make them easy to perform. But the call sites should very clearly say "I'm not verifiable" - and I think the class name Unsafe conveys this.

Partly I'm pushing back against the original scenario. There's a separate conversation to be had regarding whether we (Microsoft) would ever want to ship a package that contains code which mutates boxed Int32 instances. Presumably any NuGet package we ship must run correctly on any compliant runtime - or at least fail gracefully with PlatformNotSupportedException if there's no viable fallback behavior for a necessary OS or runtime facility. Since the package can't guarantee that it's executing atop a runtime which allows mutation of boxed Int32 instances without incurring instability, I don't see how we could ever ship a package that contains such code.

tl;dr: readonly-returning Unbox is fine. AsRef is fine. The mutate-a-boxed-object-via-Unbox-followed-by-AsRef pattern I don't think we would want to ship in our own code. It's fine that we can't prevent developers from doing this, but I don't want to bless it.

@GrabYourPitchforks GrabYourPitchforks changed the title Remove Unsafe.Unbox from reference assemblies Unsafe.Unbox should return "ref readonly T", not "ref T" Jan 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment