Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16987] [None] Add spark-default.conf property to define https port for spark history server #15652

Closed
wants to merge 11 commits into from

Conversation

chie8842
Copy link
Contributor

I created a pull request, but there is a problem.
Port should be separated for each particular process such as history server and spark ui.
But user can configure spark.ssl.port with this code and when configure it, each process will try to create connections with same port and it doesn't work good.
Because of structure of existing function, it has impact to other functions to solve the problem.
So there are three answer about it.

  1. Use this fix code so user can configure a specific secure port, but spark.ssl.port doesn't work good.
  2. Don't change and close issue.
  3. We should be able to configure a specific secure port completely, so drastically change many functions.
    Please comments.

@@ -147,7 +147,10 @@ private[spark] abstract class WebUI(
}

/** Return the url of web interface. Only valid after bind(). */
def webUrl: String = s"http://$publicHostName:$boundPort"
def webUrl: String = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem unrelated; aren't these in master already? can you rebase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I rebased this change.

<td><code>spark.ssl.<particular protocol>.port</code></td>
<td>0</td>
<td>
A port number hen connecting with SSL. Default value 0 means to be determined automatically.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few typos here, and I don't think you need to repeat the point about multiple services. I'd suggest changing the prop name above to spark.ssl.port and then:

Port number to listen on for SSL connections. Default value of 0 means the port will be determined automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I fixed the description.

} else {
0
}
val scheme = "https"
// Create a connector on port securePort to listen for HTTPS requests
// Create a connector on port securePort to listen for HTTPS requestswork.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was an unintended change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I rebased this change.

@chie8842
Copy link
Contributor Author

Thank you very much. I fixed for your comment. Please check again.

@@ -1664,6 +1664,14 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
<td><code>spark.ssl.<particular protocol>.port</code></td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does <particular protocol> mean here ? Is this config name parameterized ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree; see #15652 (comment) This should just refer to spark.ssl.port because the rest is documented above.

Copy link
Contributor Author

@chie8842 chie8842 Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a reason for this.
If an user set spark.ssl.port, the port of history server, SparkUI, MasterWebUI, and WorkerWebUI will collapse as I mentioned at the beginning of this PR.
It is because WebUI.scala is inheried by HistoryServer.scala, SparkUI.scala, MasterWebUI.scala, and WorkerWebUI.scala.
How do you think about it?

I created a pull request, but there is a problem.
Port should be separated for each particular process such as history server and spark
ui.
But user can configure spark.ssl.port with this code and when configure it, each
process will try to create connections with same port and it doesn't work good.
Because of structure of existing function, it has impact to other functions to solve the
problem.
So there are three answer about it.

Use this fix code so user can configure a specific secure port, but spark.ssl.port doesn't > work good.
Don't change and close issue.
We should be able to configure a specific secure port completely, so drastically change > many functions.
Please comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, setting spark.ssl.port sets the SSL port for all services. That's how it's documented to work above, and you can override with spark.ui.ssl.port, etc. I'm just saying that this is already documented, and to be consistent with other props, the config key in this section can be spark.ssl.port -- or did I misunderstand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are saying about following documentation at "spark.ssl.enabled", aren't you?
<p>Use <code>spark.ssl.YYY.XXX</code> settings to overwrite the global configuration for particular protocol denoted by <code>YYY</code>. Example values for <code>YYY</code> include <code>fs</code>, <code>ui</code>, <code>standalone</code>, and <code>historyServer</code>. See <a href="security.html#ssl-configuration">SSL Configuration</a> for details on hierarchical SSL configuration for services.</p>
I think I should refer about particular process at spark.ssl.port again. because another ssl parameter doesn't clash processes with grobal configuration and only "spark.ssl.port" clash processes. But if it's not so important, I'll fix documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's what I'm referring to. I would still use spark.ssl.port as the key for this section for consistency. It's OK to maybe add some text reminding the reader that this port can and probably should be set separately for each service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understood. I refered about attention, and changed the key to spark.ssl.port.

@@ -1598,7 +1598,7 @@ Apart from these, the following properties are also available, and may be useful
<td>false</td>
<td>
<p>Whether to enable SSL connections on all supported protocols.</p>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can, remove this change

@SparkQA
Copy link

SparkQA commented Nov 3, 2016

Test build #3409 has finished for PR 15652 at commit 3964609.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

val securePort =
if (currentPort != 0) {
(currentPort + 400 - 1024) % (65536 - 1024) + 1024
if (1024 < sslOptions.port && sslOptions.port < 65536) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition should only happen when the port is 0 right? given the require above. It would be clearer to check against 0 only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the require() above you have 1024 <= sslOptions.port but in this if() you have 1024 < sslOptions.port. Just wanted to check if thats intentional.

<td>
Port number to listen on for SSL connections.
Default value of 0 means the port will be determined automatically.
Attention that the port should be separated for each particular protocols.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer to write something like "Note that the SSL port for individual services can be overridden by setting values like spark.ssl.YYY.port as described above."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code above you are ensuring that the SSL port is always between 1024 and 65536. You could document that here as "allowed range of values".

if (1024 < sslOptions.port && sslOptions.port < 65536) {
sslOptions.port
} else {
(currentPort + 400 - 1024) % (65536 - 1024) + 1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment in the code explaining the math done here ? It will help readability of the code.

@chie8842
Copy link
Contributor Author

chie8842 commented Nov 9, 2016

I reflected some comments.
And I handled about very trivial issues in ml library. Is it OK to contain them in this PR?

@@ -66,7 +66,7 @@ private[ml] trait VectorIndexerParams extends Params with HasInputCol with HasOu
* - This helps process a dataset of unknown vectors into a dataset with some continuous
* features and some categorical features. The choice between continuous and categorical
* is based upon a maxCategories parameter.
* - Set maxCategories to the maximum number of categorical any categorical feature should have.
* - Set maxCategories to the maximum number of categories which categorical feature should have.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look related? it should also be more like "number of categories that any categorical ..."

@@ -85,8 +85,10 @@ private[ml] trait DecisionTreeParams extends PredictorParams
* (default = 256 MB)
* @group expertParam
*/
final val maxMemoryInMB: IntParam = new IntParam(this, "maxMemoryInMB",
"Maximum memory in MB allocated to histogram aggregation.",
final val maxMemoryInMB: IntParam = new IntParam(this, "maxMemoryInMB", "Maximum memory in MB" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, looks unrelated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't related to SPARK-16987 as you say. But I think the documentation is not sufficient, and I'd like to fix it. I think it's very trivial so we can handle at this PR.
Is it better to create another issue at jira for this?
And treeParams.scala's comment is same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should not be in this pull request because it's not at all related. Sometimes it makes sense to touch up code while changing it for other reasons but that's not the case here.

I think these changes may be too insignificant for their own PR. If you can review a module for typos, that would be better to do it all at once.

For this particular change, I don't know if the parameter's help message needs to have all its documentation. It's just describing the basic nature of the parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased unrelated changes. Please confirm it.

Port number to listen on for SSL connections.
The SSL port should be between 1024 and 65535 (inclusive).
Default value of 0 means the port will be determined automatically.
Attention that the port should be separated for each particular protocols.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's clear what the last sentence means. I'd either remove it or say something like "the port can be specified for services individually, with properties like spark.ssl.YYYY.port, as described above."

val securePort =
if (currentPort != 0) {
(currentPort + 400 - 1024) % (65536 - 1024) + 1024
// If the new port wraps around, do not try a privileged port.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is duplicated and not correct here

val securePort =
if (currentPort != 0) {
(currentPort + 400 - 1024) % (65536 - 1024) + 1024
// If the new port wraps around, do not try a privileged port.
if (1024 <= sslOptions.port && sslOptions.port < 65536) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it seems easier to flip the if condition, and just check for the port being 0, rather than repeat this condition.

@srowen
Copy link
Member

srowen commented Nov 11, 2016

OK, I wouldn't mind getting @vanzin to glance at this as it involves SSL.

(Can you also replace [None] with [WEBUI] in the title?)

Looks fairly straightforward to me.

val securePort =
if (currentPort != 0) {
(currentPort + 400 - 1024) % (65536 - 1024) + 1024
if (sslOptions.port == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem with this logic. It ignores retries. Imagine that you're setting the base HTTP and SSL ports for the Spark UI (not for the history server), and you want multiple drivers on the same host. So you set (names may not be totally correct):

spark.ui.port=1234
spark.ssl.ui.port=5678

The first driver comes up and binds to 1234 and 5678. Then the second driver comes up and those two ports are used; startServiceOnPort will take care of retrying the HTTP port until maxRetries, but this code does not do the same for the SSL port: it will always try 5678. So the second driver will never run because it will fail to bind to the SSL port.

You should instead be using the port value passed to startServiceOnPort as the base to calculate the offset for the defined SSL port. That's not optimal, but it's probably the best you can do here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for comment. Yes, as you say,this logic will occur some conflicts.
But as I refered at the top of PR comment, startServiceOnPort is called from many unrelated methods.
So I conscious that this PR affects many unrelated components.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not unrelated. As I described, you code will break things in the normal web UI if you set that option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that some methods which call startServiceOnPort is related, and for example, NettyBlockTransferService.init or RestSubmissionServer.this is unrelated, I think.
But I'm trying to fix according to your comment. I'm in progress so please wait...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NettyBlockTransferService is unrelated, but as I mentioned, the Spark UI also supports SSL and would break with your previous version of the code.

conf: SparkConf,
serviceName: String = ""): (T, Int) = {
serviceName: String = "",
securePort: Int = 0): (T, Int) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like what you did here. As you yourself explained, not all consumers of this API have the concept of separate secure and non-secure ports. And as I explained in my previous comments, it's possible to fix the problem you introduced without changing this API.

@vanzin
Copy link
Contributor

vanzin commented Nov 20, 2016

Aside from me not liking the API change you're introducing, you need to add a unit test to make sure the feature is working. I suggest taking the scenario I described in a previous comment (#15652 (comment)) and turning it into a unit test.

@HyukjinKwon
Copy link
Member

Hi @chie8842, is it still active?

@chie8842
Copy link
Contributor Author

@HyukjinKwon I'm sorry, I haven't reply for a long time.
I'll check it.

@vanzin
Copy link
Contributor

vanzin commented May 11, 2017

I added this with #16625, this PR is not needed anymore. Please close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants