Added SparkGCE Script for Version 0.9.1 #681

Closed
wants to merge 2 commits into from

sigmoidanalytics
Contributor

This pull request adds the SparkGCE script. Like the spark_ec2 script, it reads command-line arguments such as the cluster name (see README.md for details), then starts the machines in Google Cloud, sets up the network, attaches an empty 500GB disk to every machine, generates SSH keys on the master and transfers them to all slaves, installs Java, and downloads and configures Spark-v0.9.1/Shark-v0.9.1/Hadoop-v0.9.1. It also starts the Shark server automatically. The version is currently 0.9.1, but I'm happy to add support for more versions.
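
As a rough illustration of the flow described above, here is a minimal sketch that parses a cluster name and a slave count and lists the provisioning steps in order. The flag name `--slaves`, the node naming scheme, and the functions `parse_args` and `launch_plan` are assumptions made for this example; the script's actual arguments are the ones documented in README.md.

```python
import argparse


def parse_args(argv):
    # Hypothetical arguments; the real script's flags are listed in README.md.
    parser = argparse.ArgumentParser(description="Spark cluster launch sketch")
    parser.add_argument("cluster_name")
    parser.add_argument("--slaves", type=int, default=1)
    return parser.parse_args(argv)


def launch_plan(args):
    """Return the provisioning steps in the order the description lists them."""
    # Node names are illustrative only.
    nodes = ["%s-master" % args.cluster_name] + [
        "%s-slave%d" % (args.cluster_name, i)
        for i in range(1, args.slaves + 1)
    ]
    steps = ["start instances: " + ", ".join(nodes), "set up network"]
    steps += ["attach empty 500GB disk to " + node for node in nodes]
    steps += [
        "generate SSH keys on master and copy them to slaves",
        "install Java",
        "download and configure Spark/Shark/Hadoop",
        "start Shark server",
    ]
    return steps


if __name__ == "__main__":
    for step in launch_plan(parse_args(["demo", "--slaves", "2"])):
        print(step)
```

The sketch only builds a plan and never touches a cloud API, so it can be run locally to check the ordering of the steps.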

@AmplabJenkins

Can one of the admins verify this patch?

@sigmoidanalytics
Contributor Author

Did any of the admins get a chance to check this out? Let me know if you want me to modify anything.

@mateiz
Contributor

mateiz commented Jul 29, 2014

@sigmoidanalytics sorry for the late reply on this, it looks like this got created during the 1.0 release period and reviewers missed it. Hopefully we can get this in 1.2. The main question I have is whether it's possible to share more code between this and the EC2 scripts. Is there any Python library that abstracts over the cloud provider?

@sigmoidanalytics
Contributor Author

https://libcloud.apache.org/ should mostly work to abstract away some of the
launch code. I can look more deeply to check its stability. Has anybody used it
in another project yet?

Whirr does some cloud-level abstraction but would probably not apply here,
as we are mostly deploying only Spark with this script.

Regards
Mayur

@mateiz
Contributor

mateiz commented Jul 29, 2014

Libcloud looks good actually, and it's nice that it's another Apache project. Would be worth a try if you guys want to investigate it. It would be awesome if we also get OpenStack as a result of it.
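
To make the code-sharing idea concrete, here is a minimal sketch of what a provider-agnostic launch path could look like. It does not use libcloud itself: `CloudProvider`, `FakeProvider`, and `launch_cluster` are hypothetical names invented for this illustration, and the in-memory provider stands in for a real EC2, GCE, or OpenStack backend. Libcloud's per-provider drivers would play the role of `CloudProvider` here, under the assumption that their node-creation calls can be wrapped behind one method.

```python
from typing import Dict, List, Protocol


class CloudProvider(Protocol):
    """Hypothetical minimal interface the shared launch code depends on."""

    def create_node(self, name: str) -> str:
        ...


class FakeProvider:
    """In-memory stand-in for a real backend; records the nodes it 'creates'."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.nodes: List[str] = []

    def create_node(self, name: str) -> str:
        node_id = "%s:%s" % (self.prefix, name)
        self.nodes.append(node_id)
        return node_id


def launch_cluster(provider: CloudProvider, cluster: str,
                   slaves: int) -> Dict[str, List[str]]:
    # Shared logic: identical no matter which provider is passed in.
    master = provider.create_node(cluster + "-master")
    workers = [
        provider.create_node("%s-slave%d" % (cluster, i))
        for i in range(1, slaves + 1)
    ]
    return {"master": [master], "slaves": workers}
```

With this shape, `launch_cluster(FakeProvider("gce"), "demo", 2)` and `launch_cluster(FakeProvider("ec2"), "demo", 2)` run the same code path; only the provider object differs.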

@SparkQA

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@ash211
Contributor

ash211 commented Nov 14, 2014

@sigmoidanalytics Did you ever take a look at libcloud to see if we could abstract out some commonality with the spark_ec2 scripts?

@ash211
Contributor

ash211 commented Nov 14, 2014

Oh, and also, all GitHub PRs should have a Jira SPARK-XYZ ID in the title. I created one for you -- can you please update this PR name to be "SPARK-4400 Added SparkGCE Script"?

@ash211
Contributor

ash211 commented Nov 14, 2014

Correction, please use SPARK-1422

@sigmoidanalytics
Contributor Author

LibCloud looks decent for launching machines across datacenter providers,
but we would have to modify the build scripts to either install everything
from scratch (which means longer install times than spark-ec2) or keep
slightly baked AMIs/equivalents in each datacenter of each provider. Both are
a sizable amount of effort.
If the purpose of the script is development support, then we can choose some
locations and start there before going all out. Another possibility is to
keep this all-datacenter script separate, with longer launch times on vanilla
AMIs, for those who want to use datacenters other than those supported
by spark-ec2. That way general launch times don't increase, and there is
still a solution for those who don't want to use AWS.
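
A minimal sketch of that trade-off, assuming a lookup table of pre-baked images per provider and region: use a baked image where one exists, otherwise fall back to a vanilla image and accept a full install. The image IDs, the `BAKED_IMAGES` table, and `choose_image` are placeholders invented for this example, not real AMIs or existing spark-ec2 code.

```python
from typing import Dict, Tuple

# Hypothetical catalog: (provider, region) -> pre-baked image ID.
BAKED_IMAGES: Dict[Tuple[str, str], str] = {
    ("ec2", "us-east-1"): "baked-image-example",
}

# Hypothetical vanilla image used wherever no baked image is maintained.
VANILLA_IMAGE = "vanilla-image-example"


def choose_image(provider: str, region: str) -> Tuple[str, bool]:
    """Return (image_id, needs_full_install) for the given location."""
    baked = BAKED_IMAGES.get((provider, region))
    if baked is not None:
        # Fast path: software is already on the image.
        return baked, False
    # Slow path: install everything from scratch after boot.
    return VANILLA_IMAGE, True
```

Starting with a few supported locations then amounts to adding entries to the table, while every other datacenter keeps working through the slower fallback.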

@pwendell
Contributor

This is being maintained in its own package now, so let's close this issue.

@asfgit asfgit closed this in e12b5b6 Jan 18, 2015