Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: .Net Interactive: UDF use deprecated BinaryFormatter, fail in VS Code #795

Open
Tracked by #1139
oefe opened this issue Dec 14, 2020 · 8 comments
Open
Tracked by #1139
Labels
bug Something isn't working

Comments

@oefe
Copy link
Contributor

oefe commented Dec 14, 2020

Describe the bug
UDFs are serialized using BinaryFormatter, which is deprecated for security reasons.

In environments where BinaryFormatter is already disabled, this causes UDFs to fail with a NotSupportedException.

To Reproduce

Steps to reproduce the behavior:

  1. Use Visual Studio Code with the .Net Interactive Notebooks extension
  2. Follow the steps in UDFs in .NET Interactive
  3. Executing cell [Doc] Tpch Benchmark Instructions #3 (which contains the definition of the UDF) fails with:
Error: System.NotSupportedException: BinaryFormatter serialization and deserialization are disabled within this application. See https://aka.ms/binaryformatter for more information.
at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(Stream serializationStream, Object graph)
at Microsoft.Spark.Utils.CommandSerDe.Serialize(Delegate func, SerializedMode deserializerMode, SerializedMode serializerMode)
at Microsoft.Spark.Sql.Functions.CreateUdf(String name, Delegate execute, PythonEvalType evalType, String returnType)
at Microsoft.Spark.Sql.Functions.CreateUdf[TResult](String name, Delegate execute, PythonEvalType evalType)
at Microsoft.Spark.Sql.Functions.CreateUdf[TResult](String name, Delegate execute)
at Microsoft.Spark.Sql.Functions.Udf[T,TResult](Func`2 udf)
at Submission#11.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)

Expected behavior

Screenshots
NA

Desktop (please complete the following information):

  • OS: macOS 10.15.7
  • .NET 5 (5.0.100)
  • Apache Spark 3.0.1
  • .NET for Apache Spark v1.0.0 (netcoreapp3.1, macOS)
  • VS Code:
  • Version: 1.53.0-insider
    • Commit: 96b426ef1da0f0a51ce9ef1931cd1fd2322547e4
    • Date: 2020-12-14T05:49:45.285Z
    • Electron: 11.0.3
    • Chrome: 87.0.4280.67
    • Node.js: 12.18.3
    • V8: 8.7.220.25-electron.0
    • OS: Darwin x64 19.6.0
  • .Net Interactive Notebooks v1.0.160901

Additional context
NA

@oefe oefe added the bug Something isn't working label Dec 14, 2020
@oefe
Copy link
Contributor Author

oefe commented Dec 14, 2020

The same example also fails in Jupyter, but for a different reason: #796

@imback82
Copy link
Contributor

Thanks @oefe for reporting this. We will move away from BinaryFormatter from upcoming releases.

@imback82
Copy link
Contributor

imback82 commented Feb 4, 2021

cc @suhsteve @jonsequitur (FYI)

@Frassle
Copy link

Frassle commented Mar 4, 2021

Not sure what the dotnet plan for replacing BinaryFormatter here is, but I was wanting to use net spark in F# notebooks (where UDFs have never been supported), plus I had a couple of other similar problems that all looked solvable if dotnet had something like pythons cloudpickle. So I wrote one, https://github.com/Ibasa/Pikala. Forked spark to use it and can now run F# UDFs from notebooks.

@GeorgeS2019
Copy link

We will move away from BinaryFormatter from upcoming releases.

@imback82
Is there a solution to this, now is 2023?

@dbeavon
Copy link

dbeavon commented Apr 16, 2024

@GeorgeS2019 @imback82

Interestingly if you target .net standard we don't see the compiler errors:
netstandard2.0;netstandard2.1
https://learn.microsoft.com/en-us/dotnet/standard/net-standard?tabs=net-standard-2-1#select-net-standard-version

I think there would still be runtime errors in a project built for .net 8.

The bigger issues is that binaryformatter is bad practice from a security standpoint (and moreover it won't allow AOT compilation of the worker in .net 8!)

UPDATE 4/16:

As I mentioned, you can get past compiler errors simply by building a project in a way that targets .net standard. This will still be compatible with .net 8. See a discussion about compatibility here:
dotnet/runtime#90829

And you can also get past any runtime errors on .Net 8 simply by using these settings in the project file:

    <NoWarn>$(NoWarn);SYSLIB0011</NoWarn>
    <EnableUnsafeBinaryFormatterSerialization>true</EnableUnsafeBinaryFormatterSerialization>

... they can be introduced in .net 8 project files - of our drivers, and of the Microsoft.Spark.Worker as well.

I understand that the binary deserialization is a security threat ... but it seems a bit overblown. It is mitigated when the spark cluster is hosted entirely on a private network and the deserialization is only happening between drivers and executors. In any case, these deserialization-related concerns must be weighed against the needs of spark. There are few other environments that do as much serialization/deserialization as spark does!

I see that the "kryo" serialization also has had well-known vulnerabilities that never stopped it from being used in the Spark ecosystem:

Links regarding kryo:
https://stackoverflow.com/questions/40261987/when-to-use-kryo-serialization-in-spark
https://www.contrastsecurity.com/security-influencers/serialization-must-die-act-1-kryo-serialization

... I think the ultimate answer is to respect the "vulnerabilities" of binary serialization, and to disable that type of serialization by default. In practice users should be allowed to re-enable the functionality for the sake of convenience and performance. It is a measured risk, that should be left to the application developers.

@GeorgeS2019
Copy link

@dbeavon

From ChatGPT4 =>

The BinaryFormatter in .NET has indeed been marked as obsolete due to security concerns. Here are the key points:

  1. Obsolete Methods:

    • The following methods are now obsolete and produce a compile-time warning with ID SYSLIB0011:
      • BinaryFormatter.Serialize
      • BinaryFormatter.Deserialize
      • Formatter.Serialize (Stream, Object)
      • Formatter.Deserialize (Stream)
      • IFormatter.Serialize (Stream, Object)
      • IFormatter.Deserialize (Stream)
    • In .NET 7, these methods were marked as error.
  2. Recommended Action:

    • Stop using BinaryFormatter in your code.
    • Instead, consider using JsonSerializer or XmlSerializer.
    • If you temporarily need to suppress the BinaryFormatter compile-time warning, you can use #pragma directives around the individual call site or suppress it in the project file¹.
  3. Project Types Affected:

    • In .NET 7, the methods still functioned properly in most project types (excluding ASP.NET, WASM, and MAUI) even after being marked obsolete⁴.
    • However, it's strongly recommended not to continue using BinaryFormatter due to security risks¹.

@dbeavon
Copy link

dbeavon commented Apr 17, 2024

ChatGPT seems pretty wrong on some points. The thing I wanted to point out is that BinaryFormatter seems to be allowed in .Net 8... if nothing else that is for the sake of compatibility with .NetStandard library dependencies. The approach to getting this working is very similar to .Net 7 (use EnableUnsafeBinaryFormatterSerialization, and disable SYSLIB0011 warnings).

The fact that we are dealing with a JVM-based (linux-based) platform means we can give ourselves more flexibility than a typical .Net solution. I would agree that we need to avoid BinaryFormatter by default, but it should probably allowed based on the discretion of users. In any case, I believe that the other serialization libraries which Spark/Jvm uses would also have the security concerns that apply to BinaryFormatter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants