[BUG]: UDF Serialization bug in .Net interactive #619

Open
Niharikadutta opened this issue Aug 11, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@Niharikadutta
Collaborator

Describe the bug
UDF serialization in .NET Interactive fails when a UDF uses an object of a custom class that was defined in a different cell from the one where the UDF is defined. It fails with the following error:

System.Runtime.Serialization.SerializationException: Type 'Submission#7' in Assembly 'ℛ*af5b28d4-e906-43c5-ae89-767507bdda8a#1-7, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' is not marked as serializable.

This happens because UDF serialization picks up the submission cell (Submission#7) as the target and expects it to be marked as serializable. It is not, because it is a compiler-generated class.

To Reproduce

Steps to reproduce the behavior:

  1. Start a .NET Interactive session (through JupyterLab, for example)
  2. Import the Microsoft.Spark and Microsoft.Spark.Extensions.DotNet.Interactive NuGet packages
  3. Run DotnetRunner in debug mode in a terminal
  4. In one cell, declare a custom class, mark it as serializable, and instantiate an object of it
  5. In a separate cell, define a UDF that uses the previously created object, then run that cell to see the error
Niharikadutta added the bug label Aug 11, 2020
@truveug123

Hi,

I encountered a similar issue when trying to use a broadcast variable with a custom object. While running the following, I got this exception:

System.Runtime.Serialization.SerializationException: 'Type 'MyTestClass' in Assembly 'MyLibraryName, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' is not marked as serializable.'

Here is the excerpt of the code that I tried to run.

SparkContext sc = SparkContext.GetOrCreate(new SparkConf());
Broadcast<MyTestClass> test = sc.Broadcast(new MyTestClass("test string")); // line where exception is thrown
Func<Column, Column> udf = Udf<string, string>(
  str =>
  {
    // Read the broadcast object on the executor and return its string
    return test.Value().GetStringVal();
  }
);
DataFrame udfResult = dataframe.Select(udf(dataframe["myColumn"]));
udfResult.Show();

public class MyTestClass
{
    string myString;

    public MyTestClass(string input)
    {
        myString = input;
    }

    public string GetStringVal()
    {
        return myString;
    }
}

I have not had any issues instantiating a class inside a UDF, even when the class is defined outside of the UDF. I don't understand why the result is different when I try to broadcast an instance of a custom class to the Spark executors running the UDF.

I saw that one of the suggestions in this issue is to make the class static, but that workaround doesn't apply here because I need to instantiate the object with private variables. #310
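
The exception message quoted in this comment says that MyTestClass is not marked as serializable, and the excerpt declares the class without any attribute. One thing that may be worth trying is adding [Serializable] to the class, which keeps the private field and the constructor intact. The sketch below only shows the class shape and runs without Spark. Whether this change resolves the broadcast failure is an assumption and is not confirmed anywhere in this thread.

```csharp
using System;

// Assumption: the serializer behind sc.Broadcast checks for this attribute,
// as the exception text suggests. Not confirmed in this thread.
[Serializable]
public class MyTestClass
{
    // Private state is still allowed; the attribute does not require a static class.
    private readonly string myString;

    public MyTestClass(string input)
    {
        myString = input;
    }

    public string GetStringVal()
    {
        return myString;
    }
}

public static class Program
{
    public static void Main()
    {
        // Plain local usage, no Spark involved: construct the object and read it back.
        var obj = new MyTestClass("test string");
        Console.WriteLine(obj.GetStringVal());
    }
}
```

This applies only to the broadcast case in this comment. For the original report, the type named in the error is the compiler-generated Submission#7 class, which user code cannot annotate, so the same change would not be expected to help there.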
