[BUG]: Serialization issue on Spark #310

Closed
v2355 opened this issue Oct 28, 2019 · 7 comments
Labels: question (Further information is requested)

Comments

v2355 commented Oct 28, 2019

I am trying to run a Spark .NET application on a cluster. The application loads a file from an HDFS location and runs some custom code on one of the columns of the input file, but I am getting the serialization exception below on Spark (both locally and on the cluster). The same logic works when run in a standalone C# application; it only throws when run through Spark. I tried making the class InjectionDataBuilderLib.EntityBondUtility serializable by adding the [Serializable] attribute, but I still get the same exception. Can you please look into this issue?

Submit command: spark-submit --conf spark.yarn.appMasterEnv.DOTNET_WORKER_DIR=.\worker\Microsoft.Spark.Worker-0.5.0 --conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=.\udfs\Debug  --archives hdfs://CO4-4/user/vaswamy/EntityDocumentSample/Microsoft.Spark.Worker.net461.win-x64-0.5.0.zip#worker,hdfs://CO4-4/user/vaswamy/EntityDocumentSample/Debug.zip#udfs --master yarn --deploy-mode cluster --queue default --class org.apache.spark.deploy.dotnet.DotnetRunner hdfs://CO4-4/user/vaswamy/EntityDocumentSample/microsoft-spark-2.3.x-0.5.0.jar hdfs://CO4-4/user/vaswamy/EntityDocumentSample/Debug.zip TesterApp.exe

Unhandled Exception: System.Runtime.Serialization.SerializationException: Type 'System.Threading.ThreadLocal`1[[System.Collections.Generic.Dictionary`2[[System.Tuple`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[InjectionDataBuilderLib.EntityBondUtility, InjectionDataBuilderLib, Version=7.7.0.0, Culture=neutral, PublicKeyToken=null]], mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]' in Assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' is not marked as serializable.
   at System.Runtime.Serialization.FormatterServices.InternalGetSerializableMembers(RuntimeType type)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at System.Runtime.Serialization.FormatterServices.GetSerializableMembers(Type type, StreamingContext context)
   at System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.InitMemberInfo()
   at System.Runtime.Serialization.Formatters.Binary.WriteObjectInfo.InitSerialize(Object obj, ISurrogateSelector surrogateSelector, StreamingContext context, SerObjectInfoInit serObjectInfoInit, IFormatterConverter converter, ObjectWriter objectWriter, SerializationBinder binder)
   at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Write(WriteObjectInfo objectInfo, NameInfo memberNameInfo, NameInfo typeNameInfo)
   at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Serialize(Object graph, Header[] inHeaders, __BinaryWriter serWriter, Boolean fCheck)
   at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(Stream serializationStream, Object graph, Header[] headers, Boolean fCheck)
   at Microsoft.Spark.Utils.CommandSerDe.Serialize(Delegate func, SerializedMode deserializerMode, SerializedMode serializerMode)
   at Microsoft.Spark.Sql.UdfRegistration.Register[TResult](String name, Delegate func, PythonEvalType evalType)
   at Microsoft.Spark.Sql.UdfRegistration.Register[TResult](String name, Delegate func)
   at TesterApp.Program.Main(String[] args) in D:\src\EntityExperience\private\knowledge\SuperFreshCDBSpark\TesterApp\Program.cs:line 33

Desktop:

  • OS: Windows
v2355 added the bug label Oct 28, 2019
imback82 added the question label and removed the bug label Oct 28, 2019
imback82 (Contributor) commented Oct 28, 2019

It looks like the type you are serializing (the ThreadLocal<Dictionary<...>> named in the exception) is not serializable by .NET. One workaround is to make the object static, if applicable. Sharing repro code would also help.
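To illustrate why making the object static can help, here is a minimal sketch that uses no Spark APIs. It assumes the likely mechanism suggested by the stack trace: registering a UDF serializes the delegate, and with it whatever object the delegate captured. All names here (Lookup, InstanceUdf, StaticUdf, TargetHoldsThreadLocal) are hypothetical and are not taken from the reporter's code.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Threading;

// Hypothetical stand-in for a helper class that keeps a per-thread cache.
public class Lookup
{
    // Instance field: a lambda that reads it has to capture `this`.
    private readonly ThreadLocal<Dictionary<string, string>> _cache =
        new ThreadLocal<Dictionary<string, string>>(() => new Dictionary<string, string>());

    // Static field: a lambda that reads it does not capture a Lookup instance.
    private static readonly ThreadLocal<Dictionary<string, string>> s_cache =
        new ThreadLocal<Dictionary<string, string>>(() => new Dictionary<string, string>());

    public Func<string, string> InstanceUdf() =>
        key => _cache.Value.ContainsKey(key) ? _cache.Value[key] : key;

    public static Func<string, string> StaticUdf() =>
        key => s_cache.Value.ContainsKey(key) ? s_cache.Value[key] : key;
}

public static class Demo
{
    // Returns true if the delegate's target object has an instance field of type
    // ThreadLocal<T>, i.e. a field a serializer walking the target would reach.
    public static bool TargetHoldsThreadLocal(Delegate d)
    {
        object target = d.Target;
        if (target == null)
        {
            return false;
        }

        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in target.GetType().GetFields(flags))
        {
            Type type = field.FieldType;
            if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(ThreadLocal<>))
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        // The instance lambda drags the whole Lookup object along with it.
        Console.WriteLine(TargetHoldsThreadLocal(new Lookup().InstanceUdf())); // True
        // The static lambda carries no ThreadLocal state in its target.
        Console.WriteLine(TargetHoldsThreadLocal(Lookup.StaticUdf()));         // False
    }
}
```

Static state is not part of the delegate's target, so it is not shipped with the UDF; under this assumption each worker process would initialize its own copy when the class is first used there.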

v2355 (Author) commented Oct 28, 2019

Hi,
The ProcessJsonContentToEntityContainerWrapper function in the file below is the entry point for my code.

I am attaching the debug project I use to invoke that function; in the attached file, TestApp is the Spark project.

I will sync up offline about the codebase.

imback82 (Contributor) commented

I don't see any file. Can you paste the "minimum" code to repro this behavior?

v2355 (Author) commented Oct 28, 2019

Shared with you over email.

suhsteve (Member) commented Nov 5, 2019

@v2355 Was your issue resolved?

imback82 (Contributor) commented Nov 5, 2019

Moving to static solved the serialization issue. Closing.

imback82 closed this as completed Nov 5, 2019

imback82 (Contributor) commented Nov 5, 2019

This will be captured in #147
