
spark-submit : The term 'spark-submit' is not recognized as the name of a cmdlet, function, script file, or operable program. #949

Closed
Crowts opened this issue Jun 8, 2021 · 3 comments



Crowts commented Jun 8, 2021

Problem encountered on https://dotnet.microsoft.com/learn/data/spark-tutorial/run

Hi

I got the error below while executing the "Run your app" step in the .NET for Apache Spark tutorial. My environment:

Operating system: Windows 10
VSCode: 1.52.1
Apache Spark: spark-3.0.1-bin-hadoop3.2

Command and resulting error:

PS C:\Users\Theo\Documents\mySparkApp> spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local bin\Debug\netcoreapp3.1\microsoft-spark-3-0_2.12-1.0.0.jar dotnet bin\Debug\netcoreapp3.1\mySparkApp.dll

spark-submit : The term 'spark-submit' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --ma ...
+ ~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (spark-submit:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

I've set up all the environment variables as directed (i.e., I included the additional environment variable DOTNET_ASSEMBLY_SEARCH_PATHS mentioned at 6:13-7:14 of the video).

So I'm really not sure what's causing the error. I've looked at issues #532, #276, and #268, but still no joy.

Could it be that I'm running spark-3.0.1-bin-hadoop3.2 instead of spark-3.0.1-bin-hadoop2.7 as in the tutorial?
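The PowerShell error above usually means that %SPARK_HOME%\bin is not on the PATH that this particular terminal session sees. A quick diagnostic sketch (the install path shown is only an example; adjust it to wherever you extracted Spark):

```powershell
# Check what this PowerShell session actually sees (values are examples):
$env:SPARK_HOME              # should print e.g. C:\bin\spark-3.0.1-bin-hadoop3.2
Get-Command spark-submit     # throws CommandNotFoundException if Spark's bin folder is not on PATH

# Put Spark's bin folder on PATH for the current session only:
$env:PATH = "$env:SPARK_HOME\bin;$env:PATH"
```

Note that environment variables set through the System Properties dialog are only picked up by newly opened terminals, so VSCode and its integrated terminal need to be restarted after changing them.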


Crowts commented Jun 8, 2021

Some additional information on the above error.

I looked at issues #571 and #302 and did the following:

  1. Verified that the jar exists (it does; see the attached screenshot)
  2. Checked the NuGet version installed (i.e. I installed Microsoft.Spark 1.0.0)

I've also tried running the app with %SPARK_HOME%\bin\ prepended to the given command, as below:

C:\Windows\System32\mySparkApp>%SPARK_HOME%\bin\spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local bin\Debug\netcoreapp3.1\microsoft-spark-3-0_2.12-1.0.0.jar dotnet bin\Debug\netcoreapp3.1\mySparkApp.dll
21/06/08 14:36:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/06/08 14:36:35 WARN DependencyUtils: Local jar C:\Windows\System32\mySparkApp\bin\Debug\netcoreapp3.1\microsoft-spark-3-0_2.12-1.0.0.jar does not exist, skipping.
Error: Failed to load class org.apache.spark.deploy.dotnet.DotnetRunner.
21/06/08 14:36:35 INFO ShutdownHookManager: Shutdown hook called
21/06/08 14:36:35 INFO ShutdownHookManager: Deleting directory C:\Users\A241124\AppData\Local\Temp\spark-eb737269-81b9-49aa-aede-1807d48d843e

but the app still fails, now with a "Failed to load class" error.

(Screenshot attached showing that the jar exists.)
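One thing worth noting about the second attempt: the DependencyUtils warning shows that spark-submit resolved the relative jar path against the current directory (C:\Windows\System32\mySparkApp), where the jar is not present. A hedged sketch of the same command run from PowerShell in the original project folder, so the relative paths resolve against the built output (the paths are the ones from this thread; adjust them to your checkout):

```powershell
# Run from the project folder so the relative jar/dll paths resolve:
cd C:\Users\Theo\Documents\mySparkApp

# In PowerShell, use $env:SPARK_HOME rather than the cmd-style %SPARK_HOME%,
# which PowerShell does not expand:
& "$env:SPARK_HOME\bin\spark-submit" `
    --class org.apache.spark.deploy.dotnet.DotnetRunner `
    --master local `
    bin\Debug\netcoreapp3.1\microsoft-spark-3-0_2.12-1.0.0.jar `
    dotnet bin\Debug\netcoreapp3.1\mySparkApp.dll
```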


suhsteve (Member) commented Jun 8, 2021

Can you try to avoid creating project folders in C:\Windows\System32? Maybe somewhere safer, like your %USERPROFILE% folder or your D:\ drive?
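In case it helps, a sketch of that move (assuming the project currently lives in C:\Windows\System32\mySparkApp, as the output above suggests; adjust paths to your setup):

```powershell
# Move the project out of the protected system folder into the user profile:
Move-Item C:\Windows\System32\mySparkApp "$env:USERPROFILE\mySparkApp"
cd "$env:USERPROFILE\mySparkApp"

# Rebuild so the build output (including the microsoft-spark jar that the
# Microsoft.Spark NuGet package copies to bin\Debug) lands in the new location:
dotnet build

# Then run from the new location:
& "$env:SPARK_HOME\bin\spark-submit" `
    --class org.apache.spark.deploy.dotnet.DotnetRunner `
    --master local `
    bin\Debug\netcoreapp3.1\microsoft-spark-3-0_2.12-1.0.0.jar `
    dotnet bin\Debug\netcoreapp3.1\mySparkApp.dll
```

C:\Windows\System32 is a protected system folder, so builds and file access there can behave unexpectedly.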

suhsteve (Member) commented

I'm closing this issue. Please feel free to reopen if your issue hasn't been resolved.
