
Bug: add spark.history.fs.logDirectory to required keys #456

Conversation

jafreck (Member) commented on Mar 22, 2018

Fix #452

@@ -88,12 +88,14 @@ def start_history_server():
    # configure the history server
    spark_event_log_enabled_key = 'spark.eventLog.enabled'
    spark_event_log_directory_key = 'spark.eventLog.dir'
    spark_history_fs_log_directory = 'spark.history.fs.logDirectory'
Contributor:
You should probably add this as a commented out field at https://github.com/Azure/aztk/blob/master/aztk_cli/config/spark-defaults.conf
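
For illustration, such a commented-out entry might look like `# spark.history.fs.logDirectory <log directory URI>` (the placeholder value here is hypothetical, not taken from the linked file).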

    path_to_spark_defaults_conf = os.path.join(spark_home, 'conf/spark-defaults.conf')
    properties = parse_configuration_file(path_to_spark_defaults_conf)
    required_keys = [spark_event_log_directory_key, spark_history_fs_log_directory]
Contributor:

Missing spark_event_log_enabled?

I think this broke between Spark versions... Different Spark versions have different requirements, which makes this a difficult dictionary to maintain.

jafreck added this to the v0.7.0 milestone on Mar 23, 2018
jafreck changed the title from "Bug: add spark.history.fs.logDirectory to requried keys" to "Bug: add spark.history.fs.logDirectory to required keys" on Mar 27, 2018
    if properties:
        if all(key in properties for key in required_keys):
            configure_history_server_log_path(properties[spark_history_fs_log_directory])
    exe = os.path.join(spark_home, "sbin", "start-history-server.sh")
Contributor:
Shouldn't lines 205-208 be part of the `if all(key in properties...)` block? Otherwise this will always execute even if the history server was not configured, right? Not sure if Spark will just ignore it since the properties say it's a no-op, but it's still worth not even starting the process.
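
A minimal sketch of the restructure this comment suggests, reusing the names from the diff above (`parse_configuration_file` and `configure_history_server_log_path` are the helpers shown there and are assumed to be defined; reading `SPARK_HOME` from the environment and launching via `subprocess` are assumptions, not the original implementation):

```python
import os
import subprocess

def start_history_server():
    # configure the history server
    spark_home = os.environ["SPARK_HOME"]  # assumption: spark_home may come from elsewhere
    spark_event_log_enabled_key = 'spark.eventLog.enabled'
    spark_event_log_directory_key = 'spark.eventLog.dir'
    spark_history_fs_log_directory = 'spark.history.fs.logDirectory'

    path_to_spark_defaults_conf = os.path.join(spark_home, 'conf/spark-defaults.conf')
    properties = parse_configuration_file(path_to_spark_defaults_conf)

    # include the 'enabled' key as well, per the review comment above
    required_keys = [
        spark_event_log_enabled_key,
        spark_event_log_directory_key,
        spark_history_fs_log_directory,
    ]

    # only configure and launch the history server when every required key
    # is present, so an unconfigured server process is never started
    if properties and all(key in properties for key in required_keys):
        configure_history_server_log_path(properties[spark_history_fs_log_directory])
        exe = os.path.join(spark_home, "sbin", "start-history-server.sh")
        subprocess.call([exe])  # assumption: the original script may launch this differently
```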

# id: <id of the cluster to be created>
id: spark_cluster

# vm_size: <vm-size, see available options here: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/>
Contributor:

probably worth updating the link to batch pricing:
https://azure.microsoft.com/pricing/details/batch/

# web_ui_port: <local port where the spark master web ui is forwarded to>
web_ui_port: 8080

# jupyter_port: <local port where jupyter is forwarded to>
Contributor:

Should we get rid of the items below since they are now driven by plugins?

jafreck (Member, Author):

yeah -- we should probably get rid of ssh.yaml entirely. The only thing it does now is set a default username. It's been almost completely replaced by plugins.

connect: true

Running the command `aztk spark cluster ssh --id <cluster_id>` will ssh into the master node of the Spark cluster. It will also forward the Spark Job UI to localhost:4040, the Spark master's web UI to localhost:8080, and Jupyter to localhost:8888.
Contributor:

This is also out of date now.

jafreck (Member, Author):

yeah -- most of this file is out of date. All of the plugin docs updates will come with #461 and #467.



Note that all of the settings in ssh.yaml will be overridden by parameters passed on the command line.
Contributor:

... or the plugin


### History Server
If you want to use Spark's history server, please set the following values in your `.aztk/spark-defaults.conf` file:
```
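# A sketch of the required keys -- they match the keys checked in the diff
# above; the directory value is an illustrative placeholder, not a value
# taken from the original docs.
spark.eventLog.enabled           true
spark.eventLog.dir               <path to an event log directory>
spark.history.fs.logDirectory    <path to the same event log directory>
```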
Contributor:

Worth linking off to the spark docs for this? (hard part would be to know which version of the docs though...)

jafreck merged commit 4ef3dd0 into Azure:master on Apr 5, 2018