-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16987] [None] Add spark-default.conf property to define https port for spark history server #15652
Conversation
@@ -147,7 +147,10 @@ private[spark] abstract class WebUI( | |||
} | |||
|
|||
/** Return the url of web interface. Only valid after bind(). */ | |||
def webUrl: String = s"http://$publicHostName:$boundPort" | |||
def webUrl: String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These seem unrelated; aren't these in master already? can you rebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I rebased this change.
<td><code>spark.ssl.<particular protocol>.port</code></td> | ||
<td>0</td> | ||
<td> | ||
A port number hen connecting with SSL. Default value 0 means to be determined automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few typos here, and I don't think you need to repeat the point about multiple services. I'd suggest changing the prop name above to spark.ssl.port
and then:
Port number to listen on for SSL connections. Default value of 0 means the port will be determined automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I fixed the description.
} else { | ||
0 | ||
} | ||
val scheme = "https" | ||
// Create a connector on port securePort to listen for HTTPS requests | ||
// Create a connector on port securePort to listen for HTTPS requestswork. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was an unintended change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I rebased this change.
Thank you very much. I fixed for your comment. Please check again. |
@@ -1664,6 +1664,14 @@ Apart from these, the following properties are also available, and may be useful | |||
</td> | |||
</tr> | |||
<tr> | |||
<td><code>spark.ssl.<particular protocol>.port</code></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does <particular protocol>
mean here ? Is this config name parameterized ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree; see #15652 (comment) This should just refer to spark.ssl.port
because the rest is documented above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has a reason for this.
If an user set spark.ssl.port, the port of history server, SparkUI, MasterWebUI, and WorkerWebUI will collapse as I mentioned at the beginning of this PR.
It is because WebUI.scala is inheried by HistoryServer.scala, SparkUI.scala, MasterWebUI.scala, and WorkerWebUI.scala.
How do you think about it?
I created a pull request, but there is a problem.
Port should be separated for each particular process such as history server and spark
ui.
But user can configure spark.ssl.port with this code and when configure it, each
process will try to create connections with same port and it doesn't work good.
Because of structure of existing function, it has impact to other functions to solve the
problem.
So there are three answer about it.Use this fix code so user can configure a specific secure port, but spark.ssl.port doesn't > work good.
Don't change and close issue.
We should be able to configure a specific secure port completely, so drastically change > many functions.
Please comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, setting spark.ssl.port
sets the SSL port for all services. That's how it's documented to work above, and you can override with spark.ui.ssl.port
, etc. I'm just saying that this is already documented, and to be consistent with other props, the config key in this section can be spark.ssl.port
-- or did I misunderstand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are saying about following documentation at "spark.ssl.enabled", aren't you?
<p>Use <code>spark.ssl.YYY.XXX</code> settings to overwrite the global configuration for particular protocol denoted by <code>YYY</code>. Example values for <code>YYY</code> include <code>fs</code>, <code>ui</code>, <code>standalone</code>, and <code>historyServer</code>. See <a href="security.html#ssl-configuration">SSL Configuration</a> for details on hierarchical SSL configuration for services.</p>
I think I should refer about particular process at spark.ssl.port again. because another ssl parameter doesn't clash processes with grobal configuration and only "spark.ssl.port" clash processes. But if it's not so important, I'll fix documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's what I'm referring to. I would still use spark.ssl.port
as the key for this section for consistency. It's OK to maybe add some text reminding the reader that this port can and probably should be set separately for each service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I understood. I refered about attention, and changed the key to spark.ssl.port.
@@ -1598,7 +1598,7 @@ Apart from these, the following properties are also available, and may be useful | |||
<td>false</td> | |||
<td> | |||
<p>Whether to enable SSL connections on all supported protocols.</p> | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can, remove this change
Test build #3409 has finished for PR 15652 at commit
|
val securePort = | ||
if (currentPort != 0) { | ||
(currentPort + 400 - 1024) % (65536 - 1024) + 1024 | ||
if (1024 < sslOptions.port && sslOptions.port < 65536) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition should only happen when the port is 0 right? given the require
above. It would be clearer to check against 0 only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the require()
above you have 1024 <= sslOptions.port
but in this if()
you have 1024 < sslOptions.port
. Just wanted to check if thats intentional.
<td> | ||
Port number to listen on for SSL connections. | ||
Default value of 0 means the port will be determined automatically. | ||
Attention that the port should be separated for each particular protocols. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be clearer to write something like "Note that the SSL port for individual services can be overridden by setting values like spark.ssl.YYY.port
as described above."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code above you are ensuring that the SSL port is always between 1024 and 65536. You could document that here as "allowed range of values".
if (1024 < sslOptions.port && sslOptions.port < 65536) { | ||
sslOptions.port | ||
} else { | ||
(currentPort + 400 - 1024) % (65536 - 1024) + 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment in the code explaining the math done here ? It will help readability of the code.
I reflected some comments. |
@@ -66,7 +66,7 @@ private[ml] trait VectorIndexerParams extends Params with HasInputCol with HasOu | |||
* - This helps process a dataset of unknown vectors into a dataset with some continuous | |||
* features and some categorical features. The choice between continuous and categorical | |||
* is based upon a maxCategories parameter. | |||
* - Set maxCategories to the maximum number of categorical any categorical feature should have. | |||
* - Set maxCategories to the maximum number of categories which categorical feature should have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look related? it should also be more like "number of categories that any categorical ..."
@@ -85,8 +85,10 @@ private[ml] trait DecisionTreeParams extends PredictorParams | |||
* (default = 256 MB) | |||
* @group expertParam | |||
*/ | |||
final val maxMemoryInMB: IntParam = new IntParam(this, "maxMemoryInMB", | |||
"Maximum memory in MB allocated to histogram aggregation.", | |||
final val maxMemoryInMB: IntParam = new IntParam(this, "maxMemoryInMB", "Maximum memory in MB" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, looks unrelated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't related to SPARK-16987 as you say. But I think the documentation is not sufficient, and I'd like to fix it. I think it's very trivial so we can handle at this PR.
Is it better to create another issue at jira for this?
And treeParams.scala's comment is same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should not be in this pull request because it's not at all related. Sometimes it makes sense to touch up code while changing it for other reasons but that's not the case here.
I think these changes may be too insignificant for their own PR. If you can review a module for typos, that would be better to do it all at once.
For this particular change, I don't know if the parameter's help message needs to have all its documentation. It's just describing the basic nature of the parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebased unrelated changes. Please confirm it.
Port number to listen on for SSL connections. | ||
The SSL port should be between 1024 and 65535 (inclusive). | ||
Default value of 0 means the port will be determined automatically. | ||
Attention that the port should be separated for each particular protocols. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's clear what the last sentence means. I'd either remove it or say something like "the port can be specified for services individually, with properties like spark.ssl.YYYY.port
, as described above."
val securePort = | ||
if (currentPort != 0) { | ||
(currentPort + 400 - 1024) % (65536 - 1024) + 1024 | ||
// If the new port wraps around, do not try a privileged port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is duplicated and not correct here
val securePort = | ||
if (currentPort != 0) { | ||
(currentPort + 400 - 1024) % (65536 - 1024) + 1024 | ||
// If the new port wraps around, do not try a privileged port. | ||
if (1024 <= sslOptions.port && sslOptions.port < 65536) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it seems easier to flip the if condition, and just check for the port being 0, rather than repeat this condition.
OK, I wouldn't mind getting @vanzin to glance at this as it involves SSL. (Can you also replace [None] with [WEBUI] in the title?) Looks fairly straightforward to me. |
val securePort = | ||
if (currentPort != 0) { | ||
(currentPort + 400 - 1024) % (65536 - 1024) + 1024 | ||
if (sslOptions.port == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a problem with this logic. It ignores retries. Imagine that you're setting the base HTTP and SSL ports for the Spark UI (not for the history server), and you want multiple drivers on the same host. So you set (names may not be totally correct):
spark.ui.port=1234
spark.ssl.ui.port=5678
The first driver comes up and binds to 1234 and 5678. Then the second driver comes up and those two ports are used; startServiceOnPort
will take care of retrying the HTTP port until maxRetries
, but this code does not do the same for the SSL port: it will always try 5678
. So the second driver will never run because it will fail to bind to the SSL port.
You should instead be using the port
value passed to startServiceOnPort
as the base to calculate the offset for the defined SSL port. That's not optimal, but it's probably the best you can do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for comment. Yes, as you say,this logic will occur some conflicts.
But as I refered at the top of PR comment, startServiceOnPort
is called from many unrelated methods.
So I conscious that this PR affects many unrelated components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not unrelated. As I described, you code will break things in the normal web UI if you set that option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that some methods which call startServiceOnPort
is related, and for example, NettyBlockTransferService.init
or RestSubmissionServer.this
is unrelated, I think.
But I'm trying to fix according to your comment. I'm in progress so please wait...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NettyBlockTransferService
is unrelated, but as I mentioned, the Spark UI also supports SSL and would break with your previous version of the code.
conf: SparkConf, | ||
serviceName: String = ""): (T, Int) = { | ||
serviceName: String = "", | ||
securePort: Int = 0): (T, Int) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like what you did here. As you yourself explained, not all consumers of this API have the concept of separate secure and non-secure ports. And as I explained in my previous comments, it's possible to fix the problem you introduced without changing this API.
Aside from me not liking the API change you're introducing, you need to add a unit test to make sure the feature is working. I suggest taking the scenario I described in a previous comment (#15652 (comment)) and turning it into a unit test. |
Hi @chie8842, is it still active? |
@HyukjinKwon I'm sorry, I haven't reply for a long time. |
I added this with #16625, this PR is not needed anymore. Please close. |
I created a pull request, but there is a problem.
Port should be separated for each particular process such as history server and spark ui.
But user can configure spark.ssl.port with this code and when configure it, each process will try to create connections with same port and it doesn't work good.
Because of structure of existing function, it has impact to other functions to solve the problem.
So there are three answer about it.
Please comments.