
Chapter 3 - "Saving to HDFS and Executing...." location 6:20mins #27

Closed

robbie70 opened this issue Feb 9, 2018 · 2 comments

robbie70 commented Feb 9, 2018

Hi Ahmad, I am not sure if this is the right place to raise issues. I have been following your Lambda Spark course on Pluralsight and I am stuck at the point shown in the title. When I try to execute the statement below,

cd /pluralsight/spark/
./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

I get an exception and the job fails at start-up.

18/02/09 09:16:44 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: c:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: c:
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.fs.Path.<init>(Path.java:93)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:211)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1674)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:912)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
    at batch.BatchJob$.main(BatchJob.scala:27)
    at batch.BatchJob.main(BatchJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 2: c:
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.failExpecting(URI.java:2854)
    at java.net.URI$Parser.parse(URI.java:3057)
    at java.net.URI.<init>(URI.java:746)
    at org.apache.hadoop.fs.Path.initialize(Path.java:202)
    ... 31 more
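For reference, the root-cause line can be reproduced with java.net.URI alone (my own sketch, not code from the course): the parser reads "c" as a URI scheme, finds nothing after the colon, and fails with exactly the message above. So it looks like some input path reached Hadoop as a bare Windows drive prefix.

import java.net.URI

// "c" is parsed as the URI scheme; nothing follows the colon, so this throws:
// java.net.URISyntaxException: Expected scheme-specific part at index 2: c:
new URI("c:")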

I have spent quite a bit of time trying to get to the bottom of it, but so far no luck. I have managed to pull the attached logs from the Hadoop web service running in my VM, at this URL:

http://lambda-pluralsight:8042/node/containerlogs/container_1518176371217_0003_01_000001/vagrant/stderr/?start=0

Logs for container_1518176371217_0003_01_000001.html.pdf

I've also tried to start the application in debug mode (on port 5005 or 7777, as I found in some online examples), but when I start IntelliJ in remote debug mode I get a Connection Refused error.
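For completeness, this is roughly what I was attempting (a sketch pieced together from those online examples, not from the course; the port number is my own assumption). The driver JVM has to be launched with the JDWP agent before IntelliJ's remote debugger can attach:

./bin/spark-submit --master yarn --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

Note that in cluster mode the driver runs inside a YARN container on the VM, so the debugger has to connect to that node rather than localhost, which may be why I saw Connection Refused.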

Any help or pointers would be much appreciated. My email is,
robbie70@hotmail.com
Kind Regards,
Robert.


robbie70 commented Feb 16, 2018

Hi Ahmad,
I put this tutorial to one side for a few days because I was stuck (and was perhaps hoping to hear from someone on this site! ;-) ), but today I came back to it with fresh eyes, and that did the trick: I spotted my mistake immediately.
I had renamed my Scala program from the name you show in the tutorial, "BatchJob", to "BatchJobEx4", but when I ran it I was still using the old program name from the example, i.e.,
./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar
instead of updating it to my new name,
./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJobEx4 /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

When I executed it with the renamed class it worked perfectly :) so I will now continue with the tutorial, and this issue can be closed.
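In hindsight, listing the classes inside the shaded jar would have shown the mismatch straight away (plain jar/grep, nothing course-specific; the entries noted below are what I'd expect for a Scala object named BatchJobEx4):

jar tf /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar | grep batch
# should list batch/BatchJobEx4.class (and batch/BatchJobEx4$.class),
# confirming the fully qualified name to pass to --class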

robbie70 changed the title from 'Problem in Chapter 3 - "Saving to HDFS and Executing...." location 6:20mins' to 'Chapter 3 - "Saving to HDFS and Executing...." location 6:20mins' on Feb 21, 2018
aalkilani (Owner) commented

@robbie70 , thanks for the feedback and glad you were able to move forward.
