New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BinaryFormatter to corefx #10088

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@stephentoub
Member

stephentoub commented Jul 15, 2016

This ports BinaryFormatter from desktop, along with all of the supporting types and general Formatter-related types. I've added a bunch of tests, bringing code coverage up to ~80%, though there's still more that can/should be done. I did not port remoting-specific functionality, though there are still some remnants I've left as they were enabled in the runtime even when remoting wasn't compiled in, as it wasn't clear if they were serving additional purposes. I did a bunch of cleanup, but there's still more that can be done in the fullness of time.

I already made several changes to CoreCLR to enable this:

  • dotnet/coreclr#6253 fixed the BclRewriter to stop removing the serialization metadata from types in coreclr. Without that, attempting to serialize even primitive types was failing.
  • dotnet/coreclr#6263 then removed [Serializable] from types that performed custom serialization (e.g. ISerializable), because currently the ISerializable in CoreFx is a different runtime type from the one in coreclr, which meant that formatters trying to serialize such types would use default serialization rather than picking up the custom implementation. Once we move ISerializable and friends into System.Runtime as type forwards rather than redefinitions of the interfaces, we can re-enable the serialization of those types.

(This PR is currently going against master. If that turns out to be the wrong branch for it, I'll resubmit against the correct branch prior to merging.)

cc: @danmosemsft, @jkotas, @KrzysztofCwalina, @ianhays, @bartonjs
Fixes #7938
Fixes #9582

@stephentoub stephentoub force-pushed the stephentoub:binary_formatter branch from d06d207 to 5e0cfd8 Jul 15, 2016

@@ -137,3 +143,199 @@ public sealed class SerializationInfoEnumerator : IEnumerator
public void Reset() { throw null; }
}
}
namespace System.Runtime.Serialization
{
[System.CLSCompliantAttribute(false)]

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

Are we not scrubbing these out?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

CLSCompliant? No. If something isn't compliant and isn't attributed to say so, it'll produce a build error.

@@ -1,5 +1,8 @@
{
"dependencies": {
"System.Collections.NonGeneric": "4.0.0",

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

(I don't actually mean this file, but it's hard to talk about files not in the change)

Have we already bumped the contract and framework version for this package?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

I believe @ericstj was telling me that this package is one of the few that we didn't actually ship and thus we don't need to rev the version number. Eric, is that correct, or did I misunderstand?

<value>Parameters 'members' and 'data' must have the same length.</value>
</data>
<data name="ArgumentNull_NullMember" xml:space="preserve">
<value>Parameters 'members' and 'data' must have the same length.</value>

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

NullMember has the same message as DataLengthDifferent?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

Good eyes. Likely a copy-and-paste error.

This comment has been minimized.

@justinvp

justinvp Jul 15, 2016

Collaborator

Or, just delete this resource string altogether and update the current uses...

Instead of:

throw new ArgumentNullException(nameof(members), SR.Format(SR.ArgumentNull_NullMember, i));

Something like:

throw new ArgumentNullException(nameof(members) + "[" + i + "]");

This comment has been minimized.

@stephentoub
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static void ThrowValueNullException()

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

Framework didn't throw from a helper. Do we care that the callstack is different?

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

(I would normally assume "no", but the NoInlining on the method has me thinking I'm missing something)

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

Do we care that the callstack is different?

I don't think so. We've done this elsewhere in corefx.

the NoInlining on the method has me thinking I'm missing something

That was me just being overly cautious about perf. The JIT currently will inline this throw method but then likely won't inline the caller, whereas with this it's more likely the caller will be inlined. Not a big deal though either way.

byte[] data = new byte[rand.Next(100, 200)];
rand.NextBytes(data);
yield return new object[] { data };
}

This comment has been minimized.

@danmosemsft

danmosemsft Jul 15, 2016

Member

} [](start = 12, length = 1)

suggestion: also take the serialized bytes for each of the valid test objects at the top of this file, and fuzz a random number of bytes in the object, then deserialize.
when fuzzing, often the approach of munging valid data is as or more productive than simply passing random data.

This comment has been minimized.

@danmosemsft

danmosemsft Jul 15, 2016

Member

for example, if there's normally a header indicating what follows is binaryformatter data, then this random fuzzing would achieve nothing except fuzzing the code reading the header.


In reply to: 70998937 [](ancestors = 70998937)

This comment has been minimized.

@danmosemsft

danmosemsft Jul 15, 2016

Member

it would be interesting to set a BP on every "throw" in deserializer and run the fuzzer test - with the change above, in theory it should hit every one.


In reply to: 70999069 [](ancestors = 70999069,70998937)

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

also take the serialized bytes for each of the valid test objects at the top of this file, and fuzz a random number of bytes in the object, then deserialize

I'll try it out. I've not looked to see if there's any checksum built into the data format; if there isn't, it's likely that some changes would not result in exceptions getting thrown, e.g. modifying one of the characters in a string.

This comment has been minimized.

@danmosemsft

danmosemsft Jul 15, 2016

Member

Good point. That kind of fuzzing then could be a separate test, the test would verify that either nothing happened, or there was a SerializationException. If there was eg a NullReferenceException, the test would fail.


In reply to: 71001854 [](ancestors = 71001854)

This comment has been minimized.

@stephentoub
if (!s_memberInfoTable.TryGetValue(mh, out members))
{
members = GetSerializableFields(type);
s_memberInfoTable.TryAdd(mh, members);

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

Why not use GetOrAdd? Since the parallel evaluators should come up with the same answer it shouldn't really matter, but I was confused as to why you weren't checking the result of TryAdd and then got differently confused.

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

Why not use GetOrAdd?

I should be able to. I was fixing things up line-by-line and didn't see the forest for the trees.

This comment has been minimized.

@stephentoub
throw new ArgumentNullException(nameof(type));
}

return s_getUninitializedObjectDelegate(type);

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

getUninitializedObjectDelegate used a lot of ?. evaluation. Is the potential nullref here "okay"? And if it's "totally not going to ever be null" why does the delegate use ?.?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

I'll remove the ?s. This is temporary anyway.

This comment has been minimized.

@stephentoub
@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Jul 15, 2016

how much should we be reviewing the binaryformatter itself - how much is essentially the desktoip code verbatim?


//If the id that we need to place is greater than our current length, and less
//than the maximum allowable size of the array. We need to double the size
//of the array. If the array has already reached it's maximum allowable size,

This comment has been minimized.

@danmosemsft

danmosemsft Jul 15, 2016

Member

If the array has already reached it's maximum allowable size, [](start = 28, length = 62)

is there a test for this?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

is there a test for this?

You mean if we try to serialize two billion objects? No.

@stephentoub

This comment has been minimized.

Member

stephentoub commented Jul 15, 2016

how much should we be reviewing the binaryformatter itself - how much is essentially the desktoip code verbatim?

From a logic perspective, it's the desktop code. I had to supply replacements where there were dependencies that aren't exposed, remove code that wasn't relevant, plus a lot of style stuff, but for the most part it's the same code and doesn't need to be reviewed in depth... honestly there are parts I'm not entirely sure what they're doing ;)

[Serializable]
internal sealed class IntSizedArray : ICloneable
{
internal int[] objects = new int[16];

This comment has been minimized.

@danmosemsft

danmosemsft Jul 15, 2016

Member

objects [](start = 23, length = 7)

leading underscores?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

Yup, somehow missed these two.

This comment has been minimized.

@stephentoub
[Serializable]
public class Version2ClassWithOptionalField
{
[OptionalField(VersionAdded = 2)]

This comment has been minimized.

@danmosemsft

danmosemsft Jul 15, 2016

Member

OptionalField [](start = 9, length = 13)

minor, could have a test that VersionAdded < 1 gets ArgumentException

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

Will add.

This comment has been minimized.

@stephentoub
private BinaryObjectWithMap _bowm;
private BinaryObjectWithMapTyped _bowmt;

internal BinaryObjectString objectString;

This comment has been minimized.

@danmosemsft

danmosemsft Jul 15, 2016

Member

objectString [](start = 36, length = 12)

more _'s ? more importantly -- is it worth making some of these internal fields into properties - some could be getters-only perhaps. no need to do it right now, though

This comment has been minimized.

@stephentoub
@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Jul 15, 2016

Thank you for doing this, great stuff.

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Jul 15, 2016

:shipit:

internal FormatterTypeStyle _FEtypeFormat;
internal FormatterAssemblyStyle _FEassemblyFormat;
internal TypeFilterLevel _FEsecurityLevel;
internal InternalSerializerTypeE _FEserializerTypeEnum;

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

The casing on the names here (and in NameInfo) seems off. In Desktop they had no underscore, presenting as property names. Now they are underscore-pascal; which I don't think is any part of our style rules. Bad CodeFormatter edit?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

Will fix.

This comment has been minimized.

@stephentoub
lock (s_typeNameCache)
{
TypeInformation typeInformation;
if (!s_typeNameCache.TryGetValue(type, out typeInformation))

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

I assume that the other conversion to ConcurrentDictionary was to avoid still depending on HashTable. But I saw this one and was going to mention GetOrAdd again, and noticed it was already a generic Dictionary.

And since I had the thought I feel compelled to share it: do we want to use a ConcurrentDictionary here?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

Dictionary is generally preferable over Hashtable, but Hashtable has one critical behavior/guarantee that Dictionary does not: it's safe for readers to use the Hashtable concurrently with a single writer (it's not safe for multiple writers to be writing concurrently). Some of the uses of Hashtable in the desktop codebase were taking advantage of that, with a pattern like:

if (hashtable.ContainsKey(key)) return hashtable[key];
lock (syncobj)
{
    if (hashtable.ContainsKey(key)) return hashtable[key];
    hashtable.Add(key, ProduceValue(key));
}

I converted those to use ConcurrentDictionary, since the original developer apparently felt it was important to make the reads lock-free. In other cases, like this one, all accesses were protected by the lock, so I just made those be Dictionary, as I didn't have any perf data to suggest that a ConcurrentDictionary would be beneficial. If we find it's a bottleneck, we can always replace it later.

if (!BitConverter.IsLittleEndian)
{
// we know that we are writing a primitive type, so just do a simple swap
for (int i = 0; i < bufferUsed; i += typeLength)

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

How much, if any, faith do we have in this block? Has desktop ever had a BE target? Do we currently on core?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

How much, if any, faith do we have in this block?

Probably not much. In desktop this is under an ifdef. I could delete the block entirely and just assert IsLittleEndian? Or leave the code but add a Debug.Fail to the block to alert someone to re-review it?

This comment has been minimized.

@jkotas

jkotas Jul 15, 2016

Member

Mono runs on big endian systems. If we want to be prepared to use this implementation with mono in future, we may want to be at least leaving bread-crumbs on what needs to be fixed.

This comment has been minimized.

@bartonjs

bartonjs Jul 15, 2016

Member

Yeah, my comment is more like "I feel like this is a never-hit path; and am wondering what we need to do to remember to validate it if we move to a BE system". An assert certainly would call it out.

This comment has been minimized.

@stephentoub
private static readonly Func<Type, object> s_getUninitializedObjectDelegate = (Func<Type, object>)
typeof(string).GetTypeInfo().Assembly.GetType("System.Runtime.Serialization.FormatterServices")
?.GetMethod("GetUninitializedObject", BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static)
?.CreateDelegate(typeof(Func<Type, object>));

This comment has been minimized.

@jkotas

jkotas Jul 15, 2016

Member

I assume that we switch this to call System.Runtime.CompilerServices.RuntimeHelpers.GetUninitializedObject to fix the reflection TODO. Do we want to be calling this method via reflection as well for now?

This comment has been minimized.

@stephentoub

stephentoub Jul 15, 2016

Member

I assume that we switch this to call System.Runtime.CompilerServices.RuntimeHelpers.GetUninitializedObject to fix the reflection TODO.

Yes, that's what I had in mind. Or, rather, I'd planned to go add a SerializationAugments ala the EnvironmentAugments we added, but in the process of doing so noticed you already added the RuntimeHelpers method. So we can just use that. I've postponed it as we'll need to change how this library builds to reference the platform components, at which point it should be a very easy change.

Do we want to be calling this method via reflection as well for now?

You mean instead of creating the delegate? Seemed like the delegate approach would provide for faster execution until we can remove the reflection. Is that not the case? I don't have a strong preference, though. This reflection stuff shouldn't be here very long.

This comment has been minimized.

@jkotas

jkotas Jul 15, 2016

Member

I meant to create the delegate on the RuntimeAugments method. It is minor nit - since this is temporary anyway.

<value>Object has never been assigned an objectID.</value>
</data>
<data name="ArgumentNull_Obj" xml:space="preserve">
<value>Object cannot be null.</value>

This comment has been minimized.

@justinvp

justinvp Jul 15, 2016

Collaborator

Nit: This resource string should probably be deleted. It doesn't really add enough value to justify itself (IMO), based on the few places it's used as a custom message to ArgumentNullException.

This comment has been minimized.

@stephentoub
<value>Header reflection error: number of value members: {0}.</value>
</data>
<data name="ArgumentNull_WithParamName" xml:space="preserve">
<value>Parameter '{0}' cannot be null.</value>

This comment has been minimized.

@justinvp

justinvp Jul 15, 2016

Collaborator

Nit: Same with this resource string. It should probably deleted. ArgumentNullException already includes the param name in the default message.

This comment has been minimized.

@stephentoub
<value>The input stream is not a valid binary format. The starting contents (in bytes) are: {0} ...</value>
</data>
<data name="ArgumentNull_Stream" xml:space="preserve">
<value>Stream cannot be null.</value>

This comment has been minimized.

@justinvp

justinvp Jul 15, 2016

Collaborator

Nit: This resource string could also probably be deleted. The one place it is used the param is named "stream" so this doesn't really add much more value to the default ArgumentNullException message, which includes the param name.

This comment has been minimized.

@stephentoub
<value>Type is missing for member of type Object '{0}'.</value>
</data>
<data name="ArgumentNull_Graph" xml:space="preserve">
<value>Object Graph cannot be null.</value>

This comment has been minimized.

@justinvp

justinvp Jul 15, 2016

Collaborator

Nit: This one can also probably be removed. The one place it is used the param is already named "graph" so it should be clear what the issue is with just the default ArgumentNullException message.

This comment has been minimized.

@stephentoub

stephentoub added some commits Jul 12, 2016

Add BinaryFormatter to corefx
This ports BinaryFormatter from desktop, along with all of the supporting types and general Formatter-related types.  I've added a bunch of tests, bringing code coverage up to ~80%, though there's still more that can/should be done.  I did not port remoting-specific functionality, though there are still some remnants I've left as they were enabled in the runtime even when remoting wasn't compiled in, and it wasn't clear if they were serving additional purposes.

@stephentoub stephentoub force-pushed the stephentoub:binary_formatter branch from 5e0cfd8 to 7896300 Jul 17, 2016

@stephentoub

This comment has been minimized.

Member

stephentoub commented Jul 17, 2016

Thanks for all the feedback. I believe I've addressed everything.

@jkotas

This comment has been minimized.

Member

jkotas commented Jul 17, 2016

👍

{
try
{
Assembly.Load(new AssemblyName(assemblyName));

This comment has been minimized.

@bartonjs

bartonjs Jul 18, 2016

Member

Are you missing an assignment to assm here?

This comment has been minimized.

@stephentoub
{
Assembly.Load(new AssemblyName(assemblyName));
}
catch { }

This comment has been minimized.

@bartonjs

bartonjs Jul 18, 2016

Member

The Framework code was catch(Exception) (and log+suppress). While I don't really expect the assembly loader to do something akin to "throw 5;" there is a semantic difference between catch {} and catch (Exception) {}.

I'm not sure what our normal rule is regarding untyped catch, but I wanted to merely mention it a) to see if I could learn a new rule and b) to ensure that this was a mindful change.

This comment has been minimized.

@jkotas

jkotas Jul 18, 2016

Member

there is a semantic difference between catch {} and catch (Exception) {}

C# compiler emits
[assembly:RuntimeCompatibilityAttribute(WrapNonExceptionThrows = true)]; attribute by default. This attribute causes all exceptions that do not inherit from Exception to be wrapped by RuntimeWrappedException. So there is no semantic difference between catch{} and catch(Exception) {}, unless you override the default for WrapNonExceptionThrows.

This comment has been minimized.

@bartonjs

bartonjs Jul 18, 2016

Member

So there is no semantic difference between catch{} and catch(Exception) {}, unless you override the default for WrapNonExceptionThrows.

Clearly a thing I didn't know until now. Thanks, @jkotas.

_previousName = name;
_previousType = objectType;
}
//Console.WriteLine("name "+name+" assembly "+assemblyInfo.assemblyString+" objectType "+objectType);

This comment has been minimized.

@bartonjs

bartonjs Jul 18, 2016

Member

It seems like you caught some of the old commented out debugging code, but not this line.

This comment has been minimized.

@stephentoub
BinaryHeaderEnum.ObjectWithMapTyped;
}


This comment has been minimized.

@bartonjs

bartonjs Jul 18, 2016

Member

Nit: excess blank lines

This comment has been minimized.

@stephentoub
@stephentoub

This comment has been minimized.

Member

stephentoub commented Jul 19, 2016

Replaced by #10144

@jarroda jarroda referenced this pull request Jul 21, 2016

Open

Port to .NET Core #1

0 of 4 tasks complete

@stephentoub stephentoub deleted the stephentoub:binary_formatter branch Jul 21, 2016

@karelz karelz modified the milestone: 1.1.0 Dec 3, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment