Getting Spark to connect to S3 can take some trial and error; it would be good to have the process documented.
This recipe works for me at the moment on my local machine:
I start adam-shell like:
Reading from s3a:// paths appears to work.
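A common pattern is to pull in the `hadoop-aws` package and pass the S3A settings as `spark.hadoop.*` properties, which Spark copies into the Hadoop configuration. This is a minimal sketch of assembling such launch arguments, not the exact command used above. It assumes that `adam-shell` forwards its arguments to `spark-shell`, that the Hadoop version is 2.7.3, and that credentials come from the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables. The helper name `s3a_shell_args` is hypothetical.

```python
import os
import shlex


def s3a_shell_args(hadoop_version, access_key, secret_key):
    """Hypothetical helper: build spark-shell style arguments that enable s3a://.

    Assumes adam-shell passes these arguments through to spark-shell.
    """
    return [
        # hadoop-aws should match the Hadoop version Spark was built against;
        # --packages also resolves its transitive AWS SDK dependency.
        "--packages", f"org.apache.hadoop:hadoop-aws:{hadoop_version}",
        # Properties prefixed with spark.hadoop. are copied into the Hadoop Configuration.
        "--conf", "spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem",
        "--conf", f"spark.hadoop.fs.s3a.access.key={access_key}",
        "--conf", f"spark.hadoop.fs.s3a.secret.key={secret_key}",
    ]


if __name__ == "__main__":
    args = s3a_shell_args(
        "2.7.3",
        os.environ.get("AWS_ACCESS_KEY_ID", ""),
        os.environ.get("AWS_SECRET_ACCESS_KEY", ""),
    )
    # Print a command line that could be pasted into a terminal.
    print("adam-shell " + shlex.join(args))
```

Keys passed on the command line are visible in process listings, so on a shared machine it may be preferable to put these properties in a Spark configuration file instead.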
Attempts that didn't work for me:
For the record, I don't need to provide any extra jars when using Elastic MapReduce (EMR) release emr-5.6.0 on AWS (Spark 2.1.1 on Hadoop 2.7.3 YARN, with Ganglia 3.7.2 and Zeppelin 0.7.1). I build ADAM from source against the Hadoop 2.7.3 dependency version, and then