[FEATURE REQUEST]: .Net 6.0/7.0 Support #1149

Open
Vislesha opened this issue Apr 26, 2023 · 24 comments
Labels
enhancement New feature or request

Comments

@Vislesha

Vislesha commented Apr 26, 2023

Hi Team (@imback82 , @Niharikadutta , @dbeavon, @suhsteve, @AFFogarty, @bamurtaugh),

Is there ever going to be a new version of this library with .NET 6.0/7.0 support? There have been no updates or new versions for a long time, and many PRs are still pending. Could someone please provide guidance on the timeline or the future of this project?

Thanks

@Vislesha Vislesha added the enhancement New feature or request label Apr 26, 2023
@AFFogarty
Contributor

Hi @Vislesha, the main branch is already on .NET 6.0. However, it would not be safe to do an official release until #1131 is fixed.

@relcodedev

I have used the current version (2.1.1) to write Delta format from .NET 7.0.

I used SDKMAN to install Java and Spark. The .NET app (dotnetapp) was compiled to native code.

ubuntu 22.04
jdk 8
spark 3.2.1
dotnet 7.0
Microsoft.Spark.Worker-2.1.1

Run spark-submit:

spark-submit --packages io.delta:delta-core_2.12:2.0.2 --class org.apache.spark.deploy.dotnet.DotnetRunner --master local ./bin/Release/dotnetapp/microsoft-spark-3-2_2.12-2.1.1.jar ./bin/Release/dotnetapp/dotnetapp

The UDF bug is still present.

@GeorgeS2019

UDFs are not working in Polyglot Notebooks due to #1131.

@GeorgeS2019

GeorgeS2019 commented Apr 27, 2023

@AFFogarty

Could you work with the Polyglot team (@claudiaregio) to provide a working solution that makes UDFs work in Polyglot Notebooks?

Basically, it is a directory path problem associated with Polyglot.

@Vislesha
Author

Hi @AFFogarty & @GeorgeS2019, thank you for the quick reply!

We are heavily reliant on this library for our solution, which is now ready for production. The rest of our application is on .NET 6.0, and we would like this library to be upgraded as well. We are currently using the main branch and it all works fine on .NET 6.0, as we are not using UDFs or Polyglot Notebooks. However, as we are going to production, we would like an official version, and it appears #1131 is a security vulnerability that would fail some security checks.

Also, we are looking for a complete port of Spark along with MLlib. We would greatly appreciate a new version of this library with full compatibility with the latest version of Spark.

@Vislesha
Author

Vislesha commented May 2, 2023

Hi Team (@imback82 , @Niharikadutta , @dbeavon, @suhsteve, @AFFogarty, @bamurtaugh),
Any update on the possible new release of this library?
Thanks.

@bmazzarol

> Hi @Vislesha, the main branch is already on .NET 6.0. However, it would not be safe to do an official release until #1131 is fixed.

First off, this library is great and I want to commend all the hard work that has gone into it.

Just my two cents here, but I think it would be a good idea to consider a release with the Binary Serializer still in place, for the following reasons:

  • My reading of the deprecation was that it needs to be done, but projects can plan for it and start working towards it. It's not designed in such a way as to block all releases.
  • Although the Binary Serializer cannot be made safe, that's because any schema-less serializer is equally unsafe; swapping it out for another, like protobuf, without a formal contract in place does not solve the problem.
  • The Spark Connect gRPC bindings provide a base for integration (minus UDFs) and can also be considered as a future state.
  • Not having a release of this library on a supported version of .NET is far more damaging than the security concerns around the Binary Serializer, and will kill the community engagement required to implement a proper fix.

Hope my comments are clear. I look forward to hearing what others think.

@GeorgeS2019

GeorgeS2019 commented May 4, 2023

> Not having a release of this library on a supported version of .NET is far more damaging than the security concerns around the Binary Serializer, and will kill the community engagement required to implement a proper fix.

@bmazzarol Well communicated, very much appreciated 👏

We need to quickly find ways to keep iterating on and improving this project.

@Vislesha
Author

Hi Team,
Could someone clarify the future of this library? The PRs have been pending for so long!

Also, is there a chance this library could be merged with SynapseML (https://github.com/microsoft/SynapseML)? It appears to be actively developed, has better technology for generating Spark bindings without much delay, and has many other features integrated.

Thanks!

@Vislesha Vislesha reopened this May 12, 2023
@GeorgeS2019

@bmazzarol

> Although the Binary Serializer cannot be made safe, that's because any schema-less serializer is equally unsafe; swapping it out for another, like protobuf, without a formal contract in place does not solve the problem.

> The Spark Connect gRPC bindings provide a base for integration (minus UDFs) and can also be considered as a future state.

Could you provide more information?

UDFs are only an issue with Polyglot Notebooks.
That is a question for the Polyglot team.

Could you elaborate further, so others can continue adding information and we iteratively get closer to a suitable solution?

I wonder whether the blocker is legal issues rather than the software implementation.

Why is there no incentive to address this at the software level for the .NET community?

#AGAIN

@AFFogarty,

Leaving this without moving forward could have UNDESIRABLE consequences for Microsoft's entire big data analytics offering.

@bmazzarol

@GeorgeS2019 Will do my best!

Spark Connect is a built-in set of gRPC bindings included with Spark 3.4+.

This provides a low-level API that can be used to drive Spark in a very similar way to how this project works; in fact, the latest version of PySpark already supports this client mode.

This solves the serializer issue, as it uses protobuf behind a defined gRPC contract.
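To make the "defined contract" point concrete, here is a minimal Python sketch. It is illustrative only: it is not Spark Connect, protobuf or .NET code, and `RowBatch`, `CONTRACT`, `encode` and `decode` are hypothetical names. The decoder only constructs message types listed in a fixed contract and checks every field's type, so the incoming bytes can never choose which type gets instantiated. A format like BinaryFormatter lacks that property because the type names travel inside the payload.

```python
import json
from dataclasses import asdict, dataclass, fields


# Hypothetical message type. In a real contract-based protocol this would be
# generated from a schema (e.g. a .proto file); here it is hand-written.
@dataclass(frozen=True)
class RowBatch:
    table: str
    row_count: int


# The "contract": the only message types the decoder will ever construct.
CONTRACT = {"RowBatch": RowBatch}


def encode(msg) -> bytes:
    """Serialize a contract message (JSON is used only to stay dependency-free)."""
    name = type(msg).__name__
    if CONTRACT.get(name) is not type(msg):
        raise TypeError(f"{name} is not part of the contract")
    return json.dumps({"type": name, "fields": asdict(msg)}).encode("utf-8")


def decode(data: bytes):
    """Deserialize bytes, refusing anything the contract does not declare."""
    payload = json.loads(data)  # invalid JSON raises ValueError (JSONDecodeError)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    # Look the type up in the contract; the payload cannot name arbitrary types.
    type_name = payload.get("type")
    cls = CONTRACT.get(type_name) if isinstance(type_name, str) else None
    if cls is None:
        raise ValueError(f"unknown message type: {type_name!r}")
    values = payload.get("fields")
    expected = {f.name: f.type for f in fields(cls)}
    if not isinstance(values, dict) or set(values) != set(expected):
        raise ValueError("fields do not match the contract")
    # Exact type check, so e.g. the string "3" is not accepted as an int.
    for name, typ in expected.items():
        if type(values[name]) is not typ:
            raise ValueError(f"field {name!r} must be {typ.__name__}")
    return cls(**values)


if __name__ == "__main__":
    print(decode(encode(RowBatch("events", 3))))
    # prints RowBatch(table='events', row_count=3)
```

A real contract (for example .proto files compiled into generated classes) gives the same guarantee plus versioning and a compact binary encoding; the JSON here only keeps the sketch short.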

However, my understanding is that a UDF needs to run on the Spark workers and be written in one of the supported languages to work via Spark Connect.

However, it's not my intention to propose a solution; I just want to argue for a roadmap to be created and an "as is" release to be considered, so progress can be made incrementally.

At the very least, a counterargument against an "as is" release would be good, so the community can understand any issues that might not have been considered.

@GeorgeS2019
Copy link

GeorgeS2019 commented May 12, 2023

@bmazzarol

  • UDFs need to run on the Spark workers
  • Spark workers need to be one of the supported languages to work via Spark Connect.

[Image: diagram]

Is it feasible to make .NET one of the supported languages (e.g. Python, R, Go, according to the diagram)?

I am still unclear on this.

Are you familiar with IKVM?

It supports .NET 6, and it is possible to load Java code files and compile them into .NET within VS2022.

If IKVM is feasible, then keeping Spark.NET up to date is no longer an issue.

@GeorgeS2019
Copy link

GeorgeS2019 commented May 12, 2023

@bmazzarol

I wonder if it is potentially feasible to replace the JVM part of the diagram with IKVM.NET?

https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals

[Image: diagram]

@GeorgeS2019

@bmazzarol

> @bmazzarol
>
> I wonder if it is potentially feasible to replace the JVM part of the diagram with IKVM.NET?
>
> https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals

My understanding is that IKVM.NET allows Java programs to run on .NET, so it would not be equivalent to Py4J, which essentially lets a Python process access objects inside the JVM's memory space.

But again, my main point was that there are lots of ways to move forward, all of which take time and require planning; the bigger issue is that in the meantime there is no supported release of this library.

@Vislesha
Author

Hi @bmazzarol! It appears it's going to be a long time before any new version of this library is released. We'll explore alternatives. Thank you for the clarification!

@GeorgeS2019

@Vislesha

What alternative(s) are you expecting?

@GeorgeS2019

@Vislesha
WHY are you stopping the brainstorming?

@Vislesha
Author

@GeorgeS2019, we are moving to Java-based APIs for our analytics engine so we don't have to play catch-up with compatible libraries. It's going to be time-consuming, but that looks like the better alternative.

@GeorgeS2019

@Vislesha

You have abandoned it, but not everyone has YET. So do consider leaving this issue open even if you are no longer interested.

@Vislesha
Author

@GeorgeS2019 Sure!

@Vislesha Vislesha reopened this May 13, 2023
@mlafleur

This issue is certainly of interest to me. We are considering using Spark and Spark .NET, but this issue raises some obvious concerns.

@dbeavon

dbeavon commented Apr 17, 2024

I'm testing with .NET 8 on OSS and Azure HDInsight (HDI).

@AFFogarty It has been almost a year since you mentioned the concern related to #1131. Did you see that there is a PR for it (#1166)?

I'm eager to help get this merged. Let me know how we can help. I will start testing it on OSS and HDI as soon as possible.
I think MessagePack is as good a solution as any other possible replacement for BinaryFormatter. In my opinion, it could be used as the default serialization/deserialization strategy, but users should be able to revert to BinaryFormatter if desired.

Can we get this merged? After that, I will have follow-up changes to migrate to .NET 8. They are basically the same as your earlier changes to migrate to .NET 6.

@GeorgeS2019

@dbeavon

Thanks for helping to keep this project moving forward.
