Bug: add spark.history.fs.logDirectory to required keys #456
Conversation
node_scripts/install/spark.py
Outdated
@@ -88,12 +88,14 @@ def start_history_server():
    # configure the history server
    spark_event_log_enabled_key = 'spark.eventLog.enabled'
    spark_event_log_directory_key = 'spark.eventLog.dir'
    spark_history_fs_log_directory = 'spark.history.fs.logDirectory'
You should probably add this as a commented-out field at https://github.com/Azure/aztk/blob/master/aztk_cli/config/spark-defaults.conf
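The suggested commented-out entry could look something like this (the keys are the ones named in the diff; the alignment style mirrors typical `spark-defaults.conf` files, and the placeholder values are illustrative):

```
# spark.eventLog.enabled           true
# spark.eventLog.dir               <log directory>
# spark.history.fs.logDirectory    <log directory>
```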
node_scripts/install/spark.py
Outdated
path_to_spark_defaults_conf = os.path.join(spark_home, 'conf/spark-defaults.conf')
properties = parse_configuration_file(path_to_spark_defaults_conf)
required_keys = [spark_event_log_directory_key, spark_history_fs_log_directory]
Missing spark_event_log_enabled?
I think this broke between Spark versions... Different Spark versions have different requirements, which makes this a difficult dictionary to maintain.
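One way to sidestep a hard-to-maintain per-version key list is to report which keys are missing and skip the history server, rather than failing outright. A sketch, assuming a parsed `properties` dict; the key names come from the diff above, but `missing_history_keys` and `history_server_configured` are hypothetical helpers, not part of the PR:

```python
import warnings

# Keys referenced in the diff above; the exact required set may
# differ between Spark versions, so treat this list as advisory.
REQUIRED_HISTORY_KEYS = [
    'spark.eventLog.enabled',
    'spark.eventLog.dir',
    'spark.history.fs.logDirectory',
]

def missing_history_keys(properties):
    """Return the history-server keys absent from the parsed configuration."""
    return [key for key in REQUIRED_HISTORY_KEYS if key not in properties]

def history_server_configured(properties):
    """Warn and return False if any required key is missing, else True."""
    missing = missing_history_keys(properties)
    if missing:
        warnings.warn("history server not started; missing keys: " + ", ".join(missing))
        return False
    return True
```

This keeps the decision in one place, so adding or removing a key for a new Spark version is a one-line change.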
aztk/node_scripts/install/spark.py
Outdated
if properties:
    if all(key in properties for key in required_keys):
        configure_history_server_log_path(properties[spark_history_fs_log_directory])
exe = os.path.join(spark_home, "sbin", "start-history-server.sh")
Shouldn't lines 205-208 be part of the `if all(key in properties...)` block? Otherwise this will always execute even if the history server was not configured, right? Spark may just ignore it since the properties make it a no-op, but it's still worth not even starting the process.
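The restructuring this comment asks for can be sketched as follows. `configure_history_server_log_path` and the `start-history-server.sh` path are from the diff; `maybe_start_history_server` is a hypothetical wrapper that returns the launcher path only when the configuration is complete, so the caller never starts an unconfigured server:

```python
import os

def configure_history_server_log_path(path):
    # Placeholder for the real helper in node_scripts/install/spark.py.
    print("configuring history server log path:", path)

def maybe_start_history_server(spark_home, properties, required_keys):
    log_dir_key = 'spark.history.fs.logDirectory'
    # Skip entirely unless every required key is present.
    if not properties or not all(key in properties for key in required_keys):
        return None
    configure_history_server_log_path(properties[log_dir_key])
    exe = os.path.join(spark_home, "sbin", "start-history-server.sh")
    return exe  # the caller invokes this, e.g. via subprocess.call([exe])
```

Returning `None` on the incomplete-config path makes "don't launch the process" the default rather than an afterthought.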
docs/13-configuration.md
Outdated
# id: <id of the cluster to be created>
id: spark_cluster

# vm_size: <vm-size, see available options here: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/>
probably worth updating the link to batch pricing:
https://azure.microsoft.com/pricing/details/batch/
docs/13-configuration.md
Outdated
# web_ui_port: <local port where the spark master web ui is forwarded to>
web_ui_port: 8080

# jupyter_port: <local port where jupyter is forwarded to>
Should we get rid of the items below since they are now driven by plugins?
yeah -- we should probably get rid of ssh.yaml entirely. The only thing it does now is set a default username. It's been almost completely replaced by plugins.
docs/13-configuration.md
Outdated
connect: true
```

Running the command `aztk spark cluster ssh --id <cluster_id>` will ssh into the master node of the Spark cluster. It will also forward the Spark Job UI to localhost:4040, the Spark master's web UI to localhost:8080, and Jupyter to localhost:8888.
This is also out of date now.
docs/13-configuration.md
Outdated
Running the command `aztk spark cluster ssh --id <cluster_id>` will ssh into the master node of the Spark cluster. It will also forward the Spark Job UI to localhost:4040, the Spark master's web UI to localhost:8080, and Jupyter to localhost:8888.

Note that all of the settings in ssh.yaml will be overridden by parameters passed on the command line.
... or the plugin
docs/13-configuration.md
Outdated
### History Server
If you want to use Spark's history server, please set the following values in your `.aztk/spark-defaults.conf` file:
```
Worth linking off to the spark docs for this? (hard part would be to know which version of the docs though...)
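For reference, a minimal history-server setup in `spark-defaults.conf` typically pairs the event-log and history directories, along the lines of the following (the `file:/tmp/spark-events` path is only a placeholder; any URI both the applications and the history server can reach works):

```
spark.eventLog.enabled           true
spark.eventLog.dir               file:/tmp/spark-events
spark.history.fs.logDirectory    file:/tmp/spark-events
```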
Fix #452