
Bug: add spark.history.fs.logDirectory to required keys #456

Conversation

jafreck (Member) commented on Mar 22, 2018

Fix #452

@@ -88,12 +88,14 @@ def start_history_server():
    # configure the history server
    spark_event_log_enabled_key = 'spark.eventLog.enabled'
    spark_event_log_directory_key = 'spark.eventLog.dir'
    spark_history_fs_log_directory = 'spark.history.fs.logDirectory'
Contributor:
You should probably add this as a commented out field at https://github.com/Azure/aztk/blob/master/aztk_cli/config/spark-defaults.conf
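
For illustration, such a commented-out entry might look like `# spark.history.fs.logDirectory <log directory URI>` (the placeholder value here is hypothetical, not taken from the linked file).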

    path_to_spark_defaults_conf = os.path.join(spark_home, 'conf/spark-defaults.conf')
    properties = parse_configuration_file(path_to_spark_defaults_conf)
    required_keys = [spark_event_log_directory_key, spark_history_fs_log_directory]
Contributor:

Missing spark_event_log_enabled?

I think this broke between Spark versions... Different Spark versions have different requirements, which makes this a difficult dictionary to maintain.

jafreck added this to the v0.7.0 milestone on Mar 23, 2018
jafreck changed the title from "Bug: add spark.history.fs.logDirectory to requried keys" to "Bug: add spark.history.fs.logDirectory to required keys" on Mar 27, 2018
    if properties:
        if all(key in properties for key in required_keys):
            configure_history_server_log_path(properties[spark_history_fs_log_directory])
    exe = os.path.join(spark_home, "sbin", "start-history-server.sh")
Contributor:
Shouldn't lines 205-208 be part of the `if all(key in properties...)` block? Otherwise this will always execute even if the history server was not configured, right? Not sure if Spark will just ignore it since the properties say it's a no-op, but it's still worth not even starting the process.
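
A minimal sketch of the restructure this comment suggests, reusing the names from the diff above (`parse_configuration_file` and `configure_history_server_log_path` are the helpers shown there and are assumed to be defined; reading `SPARK_HOME` from the environment and launching via `subprocess` are assumptions, not the original implementation):

```python
import os
import subprocess

def start_history_server():
    # configure the history server
    spark_home = os.environ["SPARK_HOME"]  # assumption: spark_home may come from elsewhere
    spark_event_log_enabled_key = 'spark.eventLog.enabled'
    spark_event_log_directory_key = 'spark.eventLog.dir'
    spark_history_fs_log_directory = 'spark.history.fs.logDirectory'

    path_to_spark_defaults_conf = os.path.join(spark_home, 'conf/spark-defaults.conf')
    properties = parse_configuration_file(path_to_spark_defaults_conf)

    # include the 'enabled' key as well, per the review comment above
    required_keys = [
        spark_event_log_enabled_key,
        spark_event_log_directory_key,
        spark_history_fs_log_directory,
    ]

    # only configure and launch the history server when every required key
    # is present, so an unconfigured server process is never started
    if properties and all(key in properties for key in required_keys):
        configure_history_server_log_path(properties[spark_history_fs_log_directory])
        exe = os.path.join(spark_home, "sbin", "start-history-server.sh")
        subprocess.call([exe])  # assumption: the original script may launch this differently
```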

# id: <id of the cluster to be created>
id: spark_cluster

# vm_size: <vm-size, see available options here: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/>
Contributor:

probably worth updating the link to batch pricing:
https://azure.microsoft.com/pricing/details/batch/

# web_ui_port: <local port where the spark master web ui is forwarded to>
web_ui_port: 8080

# jupyter_port: <local port where jupyter is forwarded to>
Contributor:

Should we get rid of the items below since they are now driven by plugins?

jafreck (Member, Author):

yeah -- we should probably get rid of ssh.yaml entirely. The only thing it does now is set a default username. It's been almost completely replaced by plugins.

connect: true

Running the command `aztk spark cluster ssh --id <cluster_id>` will ssh into the master node of the Spark cluster. It will also forward the Spark Job UI to localhost:4040, the Spark master's web UI to localhost:8080, and Jupyter to localhost:8888.
Contributor:

This is also out of date now.

jafreck (Member, Author):

yeah -- most of this file is out of date. All of the plugin docs updates will come with #461 and #467.



Note that all of the settings in ssh.yaml will be overridden by parameters passed on the command line.
Contributor:

... or the plugin


### History Server
If you want to use Spark's history server, please set the following values in your `.aztk/spark-defaults.conf` file:
```
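# A sketch of the required keys -- they match the keys checked in the diff
# above; the directory value is an illustrative placeholder, not a value
# taken from the original docs.
spark.eventLog.enabled           true
spark.eventLog.dir               <path to an event log directory>
spark.history.fs.logDirectory    <path to the same event log directory>
```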
Contributor:

Worth linking off to the spark docs for this? (hard part would be to know which version of the docs though...)

jafreck merged commit 4ef3dd0 into Azure:master on Apr 5, 2018