Added support for accessing secured HDFS #265

Closed
wants to merge 2 commits into from

liyinan926
Contributor

Also changed the way tasks run so that they always run as the user who submitted them. This replaces the old approach of using the SPARK_USER environment variable to specify the user, which is far less flexible. This eases security management, since users no longer need to grant the user who starts the Spark cluster access to HDFS files under their home directories.

Signed-off-by: Yinan Li <liyinan926@gmail.com>

@liyinan926
Contributor Author

This PR replaces https://github.com/apache/incubator-spark/pull/467.

@AmplabJenkins

Merged build triggered. Build is starting -or- tests failed to complete.

@AmplabJenkins

Merged build started. Build is starting -or- tests failed to complete.

@AmplabJenkins

Merged build finished. Build is starting -or- tests failed to complete.

@AmplabJenkins

Build is starting -or- tests failed to complete.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13560/

@pwendell
Contributor

This is failing because of a style error:
error file=/root/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/executor/Executor.scala message=File line length exceeds 100 characters line=192

Signed-off-by: Yinan Li <liyinan926@gmail.com>
@AmplabJenkins

Merged build triggered. Build is starting -or- tests failed to complete.

@AmplabJenkins

Merged build started. Build is starting -or- tests failed to complete.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13562/

 * @return Type of Hadoop security authentication
 */
private def getAuthenticationType: String = {
  sparkConf.get("spark.hadoop.security.authentication")

Should this not have a default value?
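
One way to address the review question above is sketched here. This is a minimal illustration, not the PR's actual code. `ConfSketch` is a hypothetical stand-in that mirrors the two lookup styles `SparkConf` offers: `get(key)`, which throws `NoSuchElementException` when the key is unset, and `get(key, defaultValue)`, which falls back. The fallback value `"simple"` is an assumption, chosen because it is Hadoop's own default for `hadoop.security.authentication`.

```scala
// Hypothetical stand-in for SparkConf, so the sketch runs without Spark on the classpath.
final class ConfSketch(settings: Map[String, String]) {
  // Mirrors SparkConf.get(key): throws if the key was never set.
  def get(key: String): String =
    settings.getOrElse(key, throw new NoSuchElementException(key))

  // Mirrors SparkConf.get(key, defaultValue): falls back when the key is unset.
  def get(key: String, defaultValue: String): String =
    settings.getOrElse(key, defaultValue)
}

object AuthTypeSketch {
  // Key name taken from the snippet under review.
  val AuthKey = "spark.hadoop.security.authentication"

  // Assumption: "simple" (no Kerberos) matches Hadoop's default authentication mode.
  val DefaultAuth = "simple"

  // Returns the configured authentication type, or the default when unset,
  // instead of failing on clusters that never set the key.
  def getAuthenticationType(conf: ConfSketch): String =
    conf.get(AuthKey, DefaultAuth)
}
```

With a default in place, an unsecured cluster that never sets the key keeps working, while a secured cluster opts in by setting the key to `kerberos`.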

@dkanoafry

Hi, whatever happened to this PR? I am interested in reading data from secure HDFS into Spark running on Mesos...

@huozhanfeng

I would like to know why this pull request was not merged. Does it go against the Spark roadmap?

@srowen
Member

srowen commented Sep 2, 2014

(I imagine part of the reason is that it doesn't merge into master, and failed tests)

@pwendell
Contributor

pwendell commented Sep 8, 2014

@dkanoafry with this patch, the main issue I see is that it distributes the delegation tokens insecurely (through sc.addFile), so anyone could just read the tokens over the network and impersonate the user who is running the Spark job. In fact, we start an HTTP file server, so you wouldn't even need to observe the traffic; you could just make a request against it. I'm guessing this is fine for the company submitting the patch, but it's too weak of a security model IMO to merge upstream.

Since we've more recently added support for securing the HTTP file server through a shared secret, I think this might be okay to pull in now. @tgravescs would you mind taking a quick look? I think the idea here is that in standalone mode a user would just log in with a keytab and send delegation tokens to the executors, with the main goal being to provide access to a secured HDFS deployment. Is there a way now for them to set a shared secret to authenticate this HTTP request? (I think it's fine to assume that they just set something in a conf file on all of the worker nodes, i.e. we don't need to disseminate that secret.)

@huozhanfeng

@pwendell @tgravescs I have made some improvements to it and created a new PR based on the latest master; you can work from that.

PR: #2320
JIRA: https://issues.apache.org/jira/browse/SPARK-3438

I am using this patch now (with spark-1.0.2), and I really hope it can be merged into master so that it can help others and I no longer need to maintain the code myself.

Thanks

@pwendell
Contributor

pwendell commented Sep 8, 2014

Hey @huozhanfeng - from what I can tell your PR also has the same issue with security I was mentioning above. I think it's worth seeing whether the addFile serving can be authenticated easily. I agree it would be great to get this patch merged in since I think a few different companies are maintaining somewhat-working versions of this.

@tgravescs
Contributor

I commented on the other PR.

@SparkQA

SparkQA commented Sep 23, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20720/

@JoshRosen
Contributor

I think we should close this issue for now, since there's another more-recent PR to add the same feature.

@asfgit asfgit closed this in 534f24b Dec 27, 2014
liancheng pushed a commit to liancheng/spark that referenced this pull request Mar 17, 2017
…tion uses NULL as its default value

This is a backport of apache@c4a6519

## What changes were proposed in this pull request?
This PR aims to fix the following two things:

1. `sql("SET -v").collect()` or `sql("SET -v").show()` throws a NullPointerException when a String configuration with default value `null` has been defined.
2. Currently, the `SET` and `SET -v` commands show unsorted results. This PR sorts the results.

## How was this patch tested?
Added a regression test to SQLQuerySuite.

Author: Herman van Hovell <hvanhovell@databricks.com>
Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#265 from hvanhovell/SPARK-19218.
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 12, 2017
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Move bosh-openstack-cpi-release job definitions in repo