improve heron submit error message #1462

objmagic opened this Issue Oct 3, 2016 · 6 comments

@objmagic objmagic added the bug label Oct 3, 2016

@objmagic objmagic self-assigned this Oct 3, 2016

@objmagic objmagic changed the title from "heron submit should bail out when config file does not exist" to "improve heron submit error message" Oct 3, 2016

objmagic commented Oct 3, 2016

Should use LOG.warning here.

objmagic commented Oct 3, 2016

Document how to use cluster.yaml.

objmagic commented Oct 18, 2016

If the topology already exists, the actual error message is buried in a huge chunk of stack trace, and at the very end we see:

ERROR: Failed to launch topology 'ExclamationTopology' because User main failed with status 1. Bailing out...
Traceback (most recent call last):
  File "heron/tools/cli/src/python/submit.py", line 149, in launch_topologies
    launch_a_topology(cl_args, tmp_dir, topology_file, defn_file)
  File "heron/tools/cli/src/python/submit.py", line 114, in launch_a_topology
    java_defines=[]
  File "heron/tools/cli/src/python/execute.py", line 73, in heron_class
    raise RuntimeError(err_str)
RuntimeError: User main failed with status 1. Bailing out...

This error message does not help the user understand what went wrong.
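One way the Python wrapper could avoid this opaque message is to capture the child process's stderr and attach its last few lines to the exception it raises. A minimal sketch, assuming the Java main is launched as a subprocess; run_user_main and tail_lines are hypothetical names, not the actual functions in execute.py:

```python
import subprocess
import sys


def run_user_main(cmd, tail_lines=5):
    """Run cmd; on failure, raise RuntimeError with the tail of its stderr attached.

    Hypothetical helper for illustration, not Heron's actual execute.py code.
    """
    # Capture both streams so the root cause is available to the wrapper.
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    if proc.returncode != 0:
        # Keep only the last few stderr lines, where the real reason usually is.
        tail = proc.stderr.strip().splitlines()[-tail_lines:]
        details = "\n".join(tail) if tail else "(no stderr output)"
        raise RuntimeError(
            "User main failed with status %d:\n%s" % (proc.returncode, details))
    return proc.stdout


if __name__ == "__main__":
    # Simulate a Java main that exits with status 1 and a specific reason on stderr.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('Topology already exists\\n'); sys.exit(1)"]
    try:
        run_user_main(failing)
    except RuntimeError as err:
        print(err)
```

Running the demo prints the status line followed by "Topology already exists" instead of only "Bailing out...". One trade-off of this approach: capturing output means the user no longer sees it stream live, so a real wrapper might also echo the captured output before raising.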

billonahill commented Oct 18, 2016

The following [painful] "log and return null" pattern is prevalent and deeply nested in the code:

if an error occurs:
  - log a message
  - return null

So when a failure occurs deep in the stack, the result is a buried error message and a Main class that receives null and exits with status code 1. Hence the poor user experience.
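To make the failure mode concrete, here is the pattern sketched in Python for brevity (the Heron code in question is Java). All function names are hypothetical; the point is that only the innermost call knows why it failed, that reason reaches the user only as a log line, and Main sees nothing but None:

```python
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("submitter")


def upload_package(path):
    # Innermost step: knows the real reason, but only logs it and returns None.
    LOG.error("Failed to upload package to packer: %s", path)
    return None


def submit_topology(path):
    uri = upload_package(path)
    if uri is None:
        # Intermediate step: logs a vaguer message and propagates None upward.
        LOG.error("Failed to upload package.")
        return None
    return uri


def main(path):
    # Main only sees None, so all it can report is a generic status code of 1.
    if submit_topology(path) is None:
        return 1
    return 0
```

Calling main("/tmp/topology.tar.gz") returns 1; the specific reason exists only in the log output, interleaved with whatever else was logged.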

We should change that pattern to either throw exceptions all the way up to the Main class, or return meaningful Response objects with error codes and messages. Main should then pass this information back to any Python wrappers via stderr while returning the correct status code.
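A minimal sketch of the Response-object option, written in Python for brevity: each step returns a hypothetical Response carrying a status code and message instead of None, and main() writes the message to stderr and returns the code. The class, its fields, and the UPLOAD_FAILED value are assumptions for illustration, not Heron's API:

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    # Hypothetical result type: status 0 means success.
    status: int
    message: str = ""
    value: Optional[str] = None

    @property
    def ok(self):
        return self.status == 0


UPLOAD_FAILED = 2  # hypothetical error code for illustration


def upload_package(path):
    # The innermost step reports *why* it failed instead of returning None.
    return Response(UPLOAD_FAILED, "Failed to upload package %s to packer" % path)


def submit_topology(path):
    resp = upload_package(path)
    if not resp.ok:
        # Propagate the original response unchanged so the root cause survives.
        return resp
    return Response(0, value=resp.value)


def main(path, stderr=sys.stderr):
    resp = submit_topology(path)
    if not resp.ok:
        # Hand the root cause to any Python wrapper via stderr...
        stderr.write("ERROR: %s\n" % resp.message)
    # ...and return the matching status code (e.g. via sys.exit(main(...))).
    return resp.status
```

The exception-based option would look similar: intermediate steps simply stop catching errors, and a single try/except in main() does the stderr write and picks the exit status.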

billonahill commented Nov 4, 2016

Here's another example, from a failed upload to packer. In this case the packer uploader's error log message is buried in the output, and the packer command's own failure output is suppressed. Ideally both would be shown at the tail of the output, without the SubmitterMain stack trace and without the "User main failed with status 1. Bailing out..." message.

[2016-11-04 21:11:27 +0000] com.twitter.heron.uploader.packer.PackerUploader INFO:  Uploading packer package heron-topology-xxxxxxx_heron-core-oss
[2016-11-04 21:12:14 +0000] com.twitter.heron.uploader.packer.PackerUploader SEVERE:  Failed to upload package to packer. Cmd: packer add_version --cluster xxxx billg heron-topology-xxxxxxx_heron-core-oss /tmp/tmpbHOAg1/topology.tar.gz --json
[2016-11-04 21:12:14 +0000] com.twitter.heron.scheduler.SubmitterMain SEVERE:  Failed to upload package.
[2016-11-04 21:12:14 +0000] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Closing the CuratorClient to: xxxxxxxxx:yyyy
[2016-11-04 21:12:14 +0000] org.apache.zookeeper.ClientCnxn INFO:  EventThread shut down
[2016-11-04 21:12:14 +0000] org.apache.zookeeper.ZooKeeper INFO:  Session: 0x1058000caf51655a closed
[2016-11-04 21:12:14 +0000] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Closing the tunnel processes
Exception in thread "main" java.lang.RuntimeException: Failed to submit topology xxxxxxx
        at com.twitter.heron.scheduler.SubmitterMain.main(SubmitterMain.java:326)
ERROR: Failed to launch topology 'xxxxxxx' because User main failed with status 1. Bailing out...
Traceback (most recent call last):
  File "heron/tools/cli/src/python/submit.py", line 149, in launch_topologies
    launch_a_topology(cl_args, tmp_dir, topology_file, defn_file)
  File "heron/tools/cli/src/python/submit.py", line 114, in launch_a_topology
    java_defines=[]
  File "heron/tools/cli/src/python/execute.py", line 73, in heron_class
    raise RuntimeError(err_str)
RuntimeError: User main failed with status 1. Bailing out...
INFO: Elapsed time: 117.040s.
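For the packer case, a wrapper that already captures the child's output could pull the SEVERE records out of it and reprint them at the very end. A minimal sketch, assuming the "[timestamp] logger LEVEL:  message" line layout seen in the log above; severe_summary is a hypothetical helper:

```python
import re

# Matches lines like:
# [2016-11-04 21:12:14 +0000] com.twitter.heron.scheduler.SubmitterMain SEVERE:  Failed to upload package.
SEVERE_LINE = re.compile(r"^\[[^\]]+\]\s+(\S+)\s+SEVERE:\s+(.*)$")


def severe_summary(output):
    """Return 'ShortLoggerName: message' for every SEVERE record in output, in order."""
    summary = []
    for line in output.splitlines():
        match = SEVERE_LINE.match(line)
        if match:
            # Keep only the short class name, e.g. "PackerUploader".
            logger = match.group(1).rsplit(".", 1)[-1]
            summary.append("%s: %s" % (logger, match.group(2).strip()))
    return summary
```

Printing these lines after the rest of the output would put "PackerUploader: Failed to upload package to packer. Cmd: ..." at the tail where the user looks first. This only recovers what was logged; the packer command's own suppressed output would still have to be logged by the uploader before a filter like this could surface it.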

objmagic commented Jan 25, 2017

Closed via #1571 and #1610.

@objmagic objmagic closed this Jan 25, 2017