Heron submit aurora error #883

Closed
wking1986 opened this Issue Jun 8, 2016 · 46 comments

wking1986 commented Jun 8, 2016

Hi guys,
I built mesos-0.25 and aurora-0.12, and both are running normally.
When I run "heron submit aurora --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology", it fails with an error about Aurora:

image

My config looks like this.

scheduler.yaml:

image

statemgr.yaml:

image

I do not know why this error happens. Please help, thanks a lot!

maosongfu Jun 8, 2016

Contributor

Hi,
According to the logs, it failed to invoke Aurora.onScheduler(...). Can you add the "--verbose" flag when submitting the job and share the verbose output?

wking1986 Jun 8, 2016

Got it!!
image

I changed the env to "devel" and successfully submitted to Aurora.

image

@maosongfu Thank you for your help

wking1986 Jun 8, 2016

@maosongfu, I can see the topology in Aurora, but I cannot find it in heron-ui. Why is that?

image

maosongfu Jun 8, 2016

Contributor
  1. Check whether the topology is running normally. You can do this by checking the log-files folder.
  2. Heron tracker feeds data to heron-ui. You need to start both of them with the correct state manager configuration: https://github.com/twitter/heron/blob/master/heron/config/src/yaml/tracker/heron_tracker.yaml

maosongfu Jun 8, 2016

Contributor

BTW, I added a pull request, #884, which logs the stderr of a spawned process even without the "--verbose" flag.

wking1986 Jun 8, 2016

OK, I will try again. Thank you very much!!

wking1986 Jun 8, 2016

@maosongfu, I have modified heron_tracker.yaml, and heron-ui now shows the topology.
But the topology is not activated, so I ran this command:
heron activate --verbose aurora/root/devel ExclamationTopology

image

I found that the ZooKeeper path /heron/pplans does not contain the topology name (ExclamationTopology), but other ZooKeeper directories do (e.g. /heron/topologies/ExclamationTopology).

Why does "/heron/pplans" have no ExclamationTopology? Which YAML config has the problem?

My statemgr.yaml looks like this:
image
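
One way to narrow this down is to list which state nodes exist for the topology. The sketch below is only an illustration, not Heron code: it accepts any `exists(path)` callable, so it can be wired to whichever ZooKeeper client is available. Only `/heron/topologies` and `/heron/pplans` come from this thread; the function name `missing_state_nodes` and its parameters are assumptions.

```python
def missing_state_nodes(exists, topology, root="/heron",
                        subdirs=("topologies", "pplans")):
    """Return the state paths for `topology` that `exists` reports as absent."""
    paths = ["%s/%s/%s" % (root.rstrip("/"), sub, topology) for sub in subdirs]
    return [p for p in paths if not exists(p)]


# Stand-in for a real ZooKeeper lookup, mirroring the situation described above.
present = {"/heron/topologies/ExclamationTopology"}
print(missing_state_nodes(present.__contains__, "ExclamationTopology"))
```

With a connected kazoo `KazooClient` named `zk`, `exists` could be `lambda p: zk.exists(p) is not None`.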

qiuyij Jun 8, 2016

Contributor

Perhaps these may help: #834, #822.
More troubleshooting guides will be published soon: #877.

aaronshan Jun 8, 2016

@maosongfu @qiuyij I get the same error when I use Aurora. In the local environment, I can find detailed error info in the log-files directory, but I can't find it in the Aurora environment. Where can I find the log-files directory?

wking1986 Jun 8, 2016

@maosongfu @qiuyij If a topology is submitted to Aurora, can I find out why a process failed to start from ~/.herondata/topologies/{cluster}/{role}/{topologyName}/heron-executor.stdout?

wking1986 Jun 8, 2016

@aaronshan If the topology is submitted to Aurora, you can find it in mesos/slaves/........./latest/sandbox/heron-executor.stdout

billonahill Jun 8, 2016

Contributor

@kartik894 I responded to your issue #888. Let's keep these two issues separate pls.

nlu90 Jun 8, 2016

Member

@wking1986

Could you check the logs to see if your topology is actually running? Sometimes a missing pplan is due to the topology not running correctly. If this is the case, you can kill the topology, submit it again, and see if the issue resolves.
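
The kill-and-resubmit step can be sketched as two CLI invocations. This is a hedged illustration: the `aurora/root/devel` string and the jar, class, and topology names are taken from earlier in this thread, the helper name `resubmit_commands` is hypothetical, and any extra flags your setup needs (for example `--config-path`) are left out.

```python
import os


def resubmit_commands(cluster_role_env, jar, main_class, topology):
    """Build argv lists for `heron kill` followed by `heron submit`."""
    kill = ["heron", "kill", cluster_role_env, topology]
    submit = ["heron", "submit", cluster_role_env,
              os.path.expanduser(jar), main_class, topology]
    return [kill, submit]


for argv in resubmit_commands(
        "aurora/root/devel",
        "~/.heron/examples/heron-examples.jar",
        "com.twitter.heron.examples.ExclamationTopology",
        "ExclamationTopology"):
    # Printed rather than executed; pass each list to subprocess.check_call to run it.
    print(" ".join(argv))
```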

maosongfu Jun 8, 2016

Contributor

@aaronshan @wking1986
All scheduler implementations share a similar working-directory (sandbox) structure. For Aurora, can you look at heron-executor.stdout and the log-files folder in the sandbox folder (not in ~/.herondata/topologies/{cluster}/{role}/{topologyName}/heron-executor.stdout)?

  1. You can use the Aurora page to navigate to the web page showing the sandbox content (http://aurora.apache.org/documentation/latest/getting-started/tutorial/, click "chroot browse").
  2. You can ssh to the target sandbox host and enter the sandbox folder.
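
When working over ssh, the sandbox path is long and contains generated IDs, so it can be easier to search for the log files than to type the path. A minimal sketch, assuming the Mesos work directory is `/tmp/mesos` (as in the logs posted later in this thread; your agent may use a different directory). The helper name `find_executor_logs` is hypothetical.

```python
import os


def find_executor_logs(root, names=("heron-executor.stdout", "heron-executor.stderr")):
    """Walk `root` and return sorted paths of Heron executor log files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in names:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # /tmp/mesos is an assumed Mesos work directory; adjust to your agent's setting.
    for path in find_executor_logs("/tmp/mesos"):
        print(path)
```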

aaronshan Jun 9, 2016

@wking1986 thanks.

@maosongfu
I found that the task failed to run on Mesos.
image

I got the stderr log from the sandbox:
image

Log content:

I0609 11:19:34.714751 41904 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/56dd9481-d4b1-4133-a258-51d5a538c46d-S0\/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"\/usr\/bin\/thermos_executor"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/56dd9481-d4b1-4133-a258-51d5a538c46d-S0\/frameworks\/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000\/executors\/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc\/runs\/44dc2a93-f7e6-4892-8bbc-668c3b845e17","user":"root"}
I0609 11:19:34.716109 41904 fetcher.cpp:369] Fetching URI '/usr/bin/thermos_executor'
I0609 11:19:34.716125 41904 fetcher.cpp:243] Fetching directly into the sandbox directory
I0609 11:19:34.716142 41904 fetcher.cpp:180] Fetching URI '/usr/bin/thermos_executor'
I0609 11:19:34.716159 41904 fetcher.cpp:160] Copying resource with command:cp '/usr/bin/thermos_executor' '/tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17/thermos_executor'
I0609 11:19:34.754954 41904 fetcher.cpp:446] Fetched '/usr/bin/thermos_executor' to '/tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17/thermos_executor'
twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
Writing log files to disk in /tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17
I0609 11:19:35.444795 41901 exec.cpp:134] Version: 0.25.0
I0609 11:19:35.452504 41913 exec.cpp:208] Executor registered on slave 56dd9481-d4b1-4133-a258-51d5a538c46d-S0
Writing log files to disk in /tmp/mesos/slaves/56dd9481-d4b1-4133-a258-51d5a538c46d-S0/frameworks/dc22c117-1cd9-43fa-bb2c-bee1f5e7500d-0000/executors/thermos-1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc/runs/44dc2a93-f7e6-4892-8bbc-668c3b845e17
ERROR] Regular plan unhealthy!
twitter.common.app debug: Shutting application down.
twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
twitter.common.app debug: Finishing up module teardown.
twitter.common.app debug:   Active thread: <_MainThread(MainThread, started 139986815493888)>
twitter.common.app debug:   Active thread (daemon): <Thread(Thread-6, started daemon 139986237478656)>
twitter.common.app debug:   Active thread (daemon): <Thread(Thread-7, started daemon 139986216498944)>
twitter.common.app debug:   Active thread (daemon): <TaskResourceMonitor(TaskResourceMonitor[1465442074668-datadev-devel-ExclamationTopology-1-8c0bb83f-301d-47f9-9e46-43c07f5c13bc] [TID=41953], started daemon 139986125973248)>
twitter.common.app debug:   Active thread (daemon): <WaitThread(Thread-12, started daemon 139986226988800)>
twitter.common.app debug:   Active thread (daemon): <Thread(Thread-8, started daemon 139986136463104)>
twitter.common.app debug:   Active thread (daemon): <WaitThread(Thread-15, started daemon 139986094503680)>
twitter.common.app debug:   Active thread (daemon): <_DummyThread(Dummy-2, started daemon 139986480895744)>
twitter.common.app debug:   Active thread (daemon): <WaitThread(Thread-14, started daemon 139986081892096)>
twitter.common.app debug: Exiting cleanly.

How can I solve the "ERROR] Regular plan unhealthy!" problem? Thank you.

maosongfu Jun 9, 2016

Contributor

@aaronshan Can you enter the sandbox folder, which is at the same level as the stderr file you opened and has the same structure as the working directory in LocalScheduler, and check the content of heron-executor.stdout?

aaronshan Jun 9, 2016

@maosongfu
I entered the sandbox folder:
image
and then the .logs folder:
image
In the fetch_heron_system folder, the stderr file contains:
image

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0 37.3M    0 16383    0     0  2674k      0  0:00:14 --:--:--  0:00:14 2674k
100 37.3M  100 37.3M    0     0   826M      0 --:--:-- --:--:-- --:--:--  956M
tar: ./release.yaml: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-executor: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-shell: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-stmgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin/heron-tmaster: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/bin: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-local-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler/heron-slurm-scheduler.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/scheduler: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/packing/heron-roundrobin-packing.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/packing: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/metricsmgr/heron-metricsmgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/metricsmgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr/heron-localfs-statemgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr/heron-zookeeper-statemgr.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/statemgr: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/instance/heron-instance.jar: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib/instance: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core/lib: implausibly old time stamp 1970-01-01 08:00:00
tar: ./heron-core: implausibly old time stamp 1970-01-01 08:00:00
tar: .: implausibly old time stamp 1970-01-01 08:00:00

I reported this error in #845.
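
The timestamp in these tar messages is what Unix time 0 looks like in a UTC+8 time zone, which suggests the files in the archive carry a zeroed modification time rather than a corrupted date. A quick check, assuming the host is at UTC+8 (inferred from the 08:00:00 in the output, not stated in the thread):

```python
from datetime import datetime, timedelta, timezone

# Assumed zone: UTC+8, inferred from the "08:00:00" in the tar output.
utc_plus_8 = timezone(timedelta(hours=8))

# Unix time 0 rendered in that zone.
epoch_local = datetime.fromtimestamp(0, tz=utc_plus_8)
print(epoch_local.strftime("%Y-%m-%d %H:%M:%S"))  # 1970-01-01 08:00:00
```

GNU tar normally prints this message as a warning, so it is worth checking separately whether the extraction itself succeeded.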
In the fetch_user_package folder, the stderr file contains:
image

curl: (6) Couldn't resolve host 'hdfs:'

I think this problem may be caused by a config error in the heron.aurora file. My heron.aurora file looks like this:

"""
Launch the topology as a single aurora job with multiple instances.
The heron-executor is responsible for starting a tmaster (container 0)
and regular stmgr/metricsmgr/instances (container index > 0).
"""

heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
core_release_file = "heron-core.tar.gz"
topology_package_file = "topology.tar.gz"

# --- processes ---
#fetch_heron_system = Process(
#  name = 'fetch_heron_system',
#  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)
#)

fetch_heron_system = Process(
  name = 'fetch_heron_system',
  cmdline = 'hadoop fs -get  hdfs:///tmp/heron/topologies/aurora/heron-core.tar.gz  . && tar zxf %s' % ( core_release_file)
)


#fetch_user_package = Process(
#  name = 'fetch_user_package',
#  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)
#)

fetch_user_package = Process(
  name = 'fetch_user_package',
  cmdline = 'hadoop fs -get  %s  .  && tar zxf %s' % (heron_topology_jar_uri, topology_package_file)
)
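
For comparison, here is a small standalone sketch of choosing the fetch command from the package URI's scheme. It is plain Python 3 rather than Aurora configuration, and `fetch_cmdline` is a hypothetical helper, not part of Heron or Aurora. It passes an explicit destination file name to `hadoop fs -get`, so that the file `tar` unpacks is the one that was downloaded; with a destination of `.`, the download keeps its remote name. The commands posted later in this thread pass the destination name explicitly.

```python
from urllib.parse import urlparse


def fetch_cmdline(uri, local_file):
    """Return a shell command that downloads `uri` to `local_file` and unpacks it."""
    if urlparse(uri).scheme == "hdfs":
        fetch = "hadoop fs -get %s %s" % (uri, local_file)
    else:
        # Same curl form as the commented-out lines in the config above.
        fetch = "curl %s -o %s" % (uri, local_file)
    return "%s && tar zxf %s" % (fetch, local_file)


print(fetch_cmdline("hdfs:///tmp/heron/topologies/aurora/heron-core.tar.gz",
                    "heron-core.tar.gz"))
```

The sketch does no shell quoting, so it assumes URIs and file names without spaces or shell metacharacters.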

maosongfu Jun 9, 2016

Contributor

@nlu90
Do you know why we get "curl: (6) Couldn't resolve host 'hdfs:'"?
According to the modified heron.aurora file, "curl" is commented out and not even used.

@aaronshan
Can you double-check the actual command that runs for "fetch_user_package"?

image

On the Aurora page, you can click the name of the process to see it.

wking1986 Jun 9, 2016

@maosongfu @nlu90 @qiuyij Thanks for your help, Heron on Aurora is running!!

maosongfu Jun 9, 2016

Contributor

@wking1986 Awesome! Also, the native Mesos scheduler and the YARN scheduler are coming soon too! Pull requests are being reviewed.

wking1986 Jun 9, 2016

@maosongfu Great!! Very much looking forward to Heron on Mesos

aaronshan Jun 9, 2016

@maosongfu
I revised the heron.aurora file, and now it works. I started two Mesos slaves, and I found that the task runs fine on one of them but still fails on the other.
image

When I click the hostname:

image

launch_heron_executor's stdout and stderr files are empty.

I ran these commands step by step:

hadoop fs -get  hdfs:///tmp/heron/topologies/main/heron-core.tar.gz  . && tar zxf heron-core.tar.gz
hadoop fs -get hdfs:///tmp/heron/topologies/main/ExclamationTopology-ruifeng.shan-tag-0--5954092425683288689  topology.tar.gz && tar zxf topology.tar.gz
./heron-core/bin/heron-executor 1 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe ExclamationTopology.defn 1:word:2:0:exclaim1:1:0 l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181 /heron ./heron-core/bin/heron-tmaster ./heron-core/bin/heron-stmgr "./heron-core/lib/metricsmgr/*" "LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;" "heron-examples.jar" 31749 31148 31006 ./heron-conf/heron_internals.yaml exclaim1:536870912,word:536870912 "" jar heron-examples.jar /home/q/java8/jdk1.8.0_91 31985 ./heron-core/bin/heron-shell 31984 main ruifeng.shan devel "./heron-core/lib/instance/*" ./heron-conf/metrics_sinks.yaml "./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*" "31347"
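
The quoted argument ending in `&equals;&equals;` looks like Base64 with its `=` padding escaped as `&equals;`. Assuming that encoding (it is inferred from the string itself, and `decode_escaped_b64` is a hypothetical helper), it can be decoded to see which JVM option is being passed:

```python
import base64


def decode_escaped_b64(value):
    """Undo the '&equals;' escaping, then Base64-decode to text."""
    return base64.b64decode(value.replace("&equals;", "=")).decode("utf-8")


arg = "LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;"
print(decode_escaped_b64(arg))  # -XX:+HeapDumpOnOutOfMemoryError
```

The decoded option also appears in the java command lines logged in heron-executor.stdout in this comment.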

The output is also empty, but heron-executor.stderr contains:

Traceback (most recent call last):
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 319, in execute
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 254, in _wrap_coverage
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 286, in _wrap_profiling
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 362, in _execute
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 420, in execute_entry
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/.bootstrap/_pex/pex.py", line 425, in execute_module
  File "/usr/local/lib/python2.7/runpy.py", line 180, in run_module
    fname, loader, pkg_name)
  File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 450, in <module>
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 417, in main
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 398, in launch
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 362, in do_run_and_wait
  File "/home/ruifeng.shan/heron-core/bin/heron-executor/heron/executor/src/python/heron-executor.py", line 352, in run_process
  File "/usr/local/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/local/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

heron-executor.stdout content:

2016-06-09 15:29:41: Set up process group; executor becomes leader
2016-06-09 15:29:41: Register the SIGTERM signal handler
2016-06-09 15:29:41: Register the atexit clean up
2016-06-09 15:29:41: Logging pid 40559 to file heron-executor-1.pid
2016-06-09 15:29:41: Running process as mkdir -p log-files
2016-06-09 15:29:41: Running process as chmod a+rx . && chmod a+x log-files && chmod +x ./heron-core/bin/heron-tmaster && chmod +x ./heron-core/bin/heron-stmgr && chmod +x ./heron-core/bin/heron-shell
word 536870912 512 64 128
exclaim1 536870912 512 64 128
2016-06-09 15:29:41: Running heron-shell-1 process as ./heron-core/bin/heron-shell --port=31782 --log_file_prefix=log-files/heron-shell.log
2016-06-09 15:29:41: Logging pid 40569 to file heron-shell-1.pid
2016-06-09 15:29:41: Running container_1_word_2 process as /home/q/java8/jdk1.8.0_91/bin/java -Xmx320M -Xms320M -Xmn160M -XX:MaxPermSize=128M -XX:PermSize=128M -XX:ReservedCodeCacheSize=64M -XX:+CMSScavengeBeforeRemark -XX:TargetSurvivorRatio=90 -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -Xloggc:log-files/gc.container_1_word_2.log -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/instance/*:heron-examples.jar com.twitter.heron.instance.HeronInstance ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe container_1_word_2 word 2 0 stmgr-1 31719 31300 ./heron-conf/heron_internals.yaml
2016-06-09 15:29:41: Executor terminated; exiting all process in executor.
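
`OSError: [Errno 2] No such file or directory` raised from `subprocess` usually means the program being launched does not exist at the given path on that machine. The last process logged before the executor terminated is the `java` binary, so one thing worth checking is whether that path exists on the failing slave. This is an inference from the logs, not a confirmed diagnosis. A minimal Python 3 sketch of the failure mode (the helper name `launch_error` and the example path are hypothetical):

```python
import errno
import os
import subprocess


def launch_error(argv):
    """Try to start argv; return the errno if the launch itself fails, else None."""
    try:
        proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError as exc:
        return exc.errno
    proc.wait()
    return None


missing = "/nonexistent/jdk/bin/java"  # hypothetical path that does not exist
print(os.path.isfile(missing))                              # False
print(launch_error([missing, "-version"]) == errno.ENOENT)  # True
```

Running `os.path.isfile` against the real java path on each slave would show whether the two machines differ.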

The other machine's heron-executor.stdout content:

2016-06-09 17:36:29: Set up process group; executor becomes leader
2016-06-09 17:36:29: Register the SIGTERM signal handler
2016-06-09 17:36:29: Register the atexit clean up
2016-06-09 17:36:29: Logging pid 7100 to file heron-executor-0.pid
2016-06-09 17:36:29: Running process as mkdir -p log-files
2016-06-09 17:36:29: Running process as chmod a+rx . && chmod a+x log-files && chmod +x ./heron-core/bin/heron-tmaster && chmod +x ./heron-core/bin/heron-stmgr && chmod +x ./heron-core/bin/heron-shell
2016-06-09 17:36:29: Running heron-shell-0 process as ./heron-core/bin/heron-shell --port=31101 --log_file_prefix=log-files/heron-shell.log
2016-06-09 17:36:29: Logging pid 7110 to file heron-shell-0.pid
2016-06-09 17:36:29: Running metricsmgr-0 process as /home/q/java8/jdk1.8.0_91/bin/java -Xmx1024M -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+PrintCommandLineFlags -Xloggc:log-files/gc.metricsmgr.log -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/metricsmgr/* com.twitter.heron.metricsmgr.MetricsManager metricsmgr-0 31132 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe ./heron-conf/heron_internals.yaml ./heron-conf/metrics_sinks.yaml
2016-06-09 17:36:29: Logging pid 7111 to file metricsmgr-0.pid
2016-06-09 17:36:29: Running heron-tmaster process as ./heron-core/bin/heron-tmaster 31481 31107 31866 ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181 /heron stmgr-1 ./heron-conf/heron_internals.yaml ./heron-conf/metrics_sinks.yaml 31132
2016-06-09 17:36:29: Logging pid 7112 to file heron-tmaster.pid

kartik894 Jun 9, 2016

Hi,

I am getting the following error:

Error loading configuration: Could not find job aurora/root/default/ExclamationTopology
Candidates are:
  aurora/root/devel/ExclamationTopology

@wking1986 Where exactly should I change the environment?

aaronshan Jun 9, 2016

@kartik894
As far as I know, you set the env (prod | devel | test | staging) when you submit the topology, as the last part of the cluster/[role]/[env] argument.

$ heron help submit
usage: heron submit [options] cluster/[role]/[env] topology-file-name topology-class-name [topology-args]

Required arguments:
  cluster/[role]/[env]  Cluster, role, and environment to run topology
  topology-file-name    Topology jar/tar/zip file
  topology-class-name   Topology class name

Optional arguments:
  --config-path (a string; path to cluster config; default: "/home/q/heron/heron-0.14.0/heron/conf")
  --config-property (key=value; a config key and its value; default: [])
  --deploy-deactivated (a boolean; default: "false")
  --topology-main-jvm-property (property=value; JVM system property for executing topology main; default: [])
  --verbose (a boolean; default: "false")
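
To make the `cluster/[role]/[env]` argument concrete, here is a minimal Python sketch that splits such a target and builds the job key Aurora is asked to resolve. The function names are hypothetical and this is not Heron's own parsing code. The fallback values are assumptions: the "Could not find job aurora/root/default/ExclamationTopology" error suggests a missing env falls back to `default`, and the fallback role is only a placeholder.

```python
def parse_target(target, default_role="root", default_env="default"):
    """Split 'cluster/[role]/[env]' into (cluster, role, env).

    default_role and default_env are assumptions for illustration only.
    """
    parts = [p for p in target.split("/") if p]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected cluster/[role]/[env], got %r" % target)
    cluster = parts[0]
    role = parts[1] if len(parts) > 1 else default_role
    env = parts[2] if len(parts) > 2 else default_env
    return cluster, role, env


def job_key(target, topology_name):
    """Build the 'cluster/role/env/name' key that the scheduler looks up."""
    return "/".join(parse_target(target) + (topology_name,))


if __name__ == "__main__":
    # Without an env, the key ends up under 'default' (assumed fallback).
    print(job_key("aurora/root", "ExclamationTopology"))
    # With the env spelled out, the key matches the 'devel' candidate.
    print(job_key("aurora/root/devel", "ExclamationTopology"))
```

Under these assumptions, passing the env explicitly (for example `aurora/root/devel`) yields a key that matches the candidate listed in the error message.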

maosongfu Jun 9, 2016

Contributor

@aaronshan
Hi,

According to the log, heron-executor failed to start a heron-instance process.
Can you try running the command directly:
/home/q/java8/jdk1.8.0_91/bin/java -Xmx320M -Xms320M -Xmn160M -XX:MaxPermSize=128M -XX:PermSize=128M -XX:ReservedCodeCacheSize=64M -XX:+CMSScavengeBeforeRemark -XX:TargetSurvivorRatio=90 -XX:+PrintCommandLineFlags -verbosegc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCCause -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=4 -Xloggc:log-files/gc.container_1_word_2.log -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -cp ./heron-core/lib/instance/*:heron-examples.jar com.twitter.heron.instance.HeronInstance ExclamationTopology ExclamationTopology603f5dd1-da30-46ac-8e6b-01650fd35cfe container_1_word_2 word 2 0 stmgr-1 31719 31300 ./heron-conf/heron_internals.yaml

and check the output?

aaronshan Jun 10, 2016

@maosongfu Thank you very much! Heron on Aurora is running OK now.

maosongfu Jun 10, 2016

Contributor

@aaronshan So what was the issue?

aaronshan Jun 10, 2016

@maosongfu
The problem was that the directory "/home/q/java8/jdk1.8.0_91" did not exist on that machine. I forgot to set it up there. 😂😂😂
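
For anyone who hits the same `OSError: [Errno 2] No such file or directory` from heron-executor: starting a process whose executable does not exist fails with errno 2 (ENOENT), which is what a missing Java home produces. Below is a minimal Python sketch of a pre-flight check you could run on each slave. The helper names are hypothetical and not part of Heron; the path is the one from the logs in this thread.

```python
import errno
import os
import subprocess


def find_java(java_home):
    """Return the java binary under java_home, or None if it is not runnable."""
    java_bin = os.path.join(java_home, "bin", "java")
    if os.path.isfile(java_bin) and os.access(java_bin, os.X_OK):
        return java_bin
    return None


def launch_errno(cmd):
    """Try to start cmd; return None on success or the OSError errno on failure."""
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError as exc:
        return exc.errno
    proc.wait()
    return None


if __name__ == "__main__":
    java_home = "/home/q/java8/jdk1.8.0_91"  # path taken from the logs in this thread
    if find_java(java_home) is None:
        print("no runnable java under %s on this machine" % java_home)
    # A missing executable is reported as ENOENT (errno 2).
    failed = launch_errno([os.path.join(java_home, "bin", "java"), "-version"])
    print("ENOENT" if failed == errno.ENOENT else "launch result: %r" % failed)
```

Running a check like this on every Mesos slave before submitting shows which machines are missing the configured Java home.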

aaronshan Jun 10, 2016

@maosongfu
When I submit a new topology:

heron submit main/ruifeng.shan/devel /home/q/ruifeng.shan/heron-learn-1.0-SNAPSHOT-shaded.jar com.qunar.data.WordCountTopology WordCountTopology

it keeps waiting:

[2016-06-10 01:50:54 +0000] com.twitter.heron.scheduler.aurora.AuroraLauncher INFO:  Launching topology in aurora
[2016-06-10 01:50:54 +0000] com.twitter.heron.spi.common.ShellUtils INFO:  $> [aurora, job, create, --wait-until, RUNNING, --bind, TOPOLOGY_NAME=WordCountTopology, --bind, SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, COMPONENT_RAMMAP=sentence-spout:1073741824,count-bolt:1073741824,report-bolt:1073741824,split-bolt:1073741824, --bind, SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, INSTANCE_JVM_OPTS_IN_BASE64="", --bind, ROLE=ruifeng.shan, --bind, ENVIRON=devel, --bind, SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, ISPRODUCTION=false, --bind, TOPOLOGY_CLASSPATH=heron-learn-1.0-SNAPSHOT-shaded.jar, --bind, CLUSTER=main, --bind, SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, STATEMGR_CONNECTION_STRING=l-hdps1.data.cn5:2181,l-hdps2.data.cn5:2181,l-hdps3.data.cn5:2181, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_ID=WordCountTopology1117b603-69c3-4096-b005-789fa81ea727, --bind, TOPOLOGY_PACKAGE_URI=hdfs:///tmp/heron/topologies/main/WordCountTopology-ruifeng.shan-tag-0--3163552258663319321, --bind, SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, CORE_PACKAGE_URI=file:///home/q/heron/heron-0.14.0/heron/dist/heron-core.tar.gz, --bind, SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, RAM_PER_CONTAINER=5368709120, --bind, SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, TOPOLOGY_DEFINITION_FILE=WordCountTopology.defn, --bind, INSTANCE_DISTRIBUTION=1:count-bolt:2:0:report-bolt:3:0:split-bolt:4:0:sentence-spout:1:0, --bind, NUM_CONTAINERS=2, --bind, CPUS_PER_CONTAINER=5.0, --bind, TOPOLOGY_JAR_FILE=heron-learn-1.0-SNAPSHOT-shaded.jar, --bind, SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell, --bind, DISK_PER_CONTAINER=17179869184, --bind, STATEMGR_ROOT_PATH=/heron, --bind, 
HERON_SANDBOX_JAVA_HOME=/home/q/java8/jdk1.8.0_91, main/ruifeng.shan/devel/WordCountTopology, /home/q/heron/heron-0.14.0/heron/conf/main/heron.aurora, --verbose]
[2016-06-10 01:51:04 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:14 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:24 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:34 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:44 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:51:54 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:04 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:14 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:24 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:34 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:44 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:52:54 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:04 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:14 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:24 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:34 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms
[2016-06-10 01:53:44 +0000] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x15515dbd90f005e after 0ms

Aurora shows "PENDING : Insufficient: disk":
image

and these are the Mesos resources:
image

If I kill ExclamationTopology and resubmit it, ExclamationTopology works.

maosongfu Jun 10, 2016

Contributor

@aaronshan
You can specify disk_per_container in Config to override the default: https://github.com/twitter/heron/blob/master/heron/api/src/java/com/twitter/heron/api/Config.java#L266

As Aurora shows, it failed to schedule containers with the requested disk. That depends on Aurora's resource management, so there is little we can do about it from the Heron side.
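
To see what is actually being requested, it can help to convert the `--bind` values from the submit log into readable units. The following is a rough Python sketch, not Heron or Aurora code: the helper names are hypothetical, and the "free" figures for a slave are made-up numbers to illustrate the comparison.

```python
GIB = 1024 ** 3


def parse_binds(argv):
    """Collect KEY=VALUE pairs that follow each --bind flag."""
    binds = {}
    for flag, value in zip(argv, argv[1:]):
        if flag == "--bind" and "=" in value:
            key, _, val = value.partition("=")
            binds[key] = val
    return binds


def per_container_request(binds):
    """Return the per-container request as (cpus, ram_gib, disk_gib)."""
    return (
        float(binds["CPUS_PER_CONTAINER"]),
        int(binds["RAM_PER_CONTAINER"]) / GIB,
        int(binds["DISK_PER_CONTAINER"]) / GIB,
    )


def fits(request, free):
    """True if one container fits into one slave's free (cpus, ram_gib, disk_gib)."""
    return all(need <= have for need, have in zip(request, free))


if __name__ == "__main__":
    # Values copied from the WordCountTopology submit log in this thread.
    argv = [
        "--bind", "RAM_PER_CONTAINER=5368709120",
        "--bind", "CPUS_PER_CONTAINER=5.0",
        "--bind", "DISK_PER_CONTAINER=17179869184",
    ]
    request = per_container_request(parse_binds(argv))
    print(request)
    # Hypothetical free resources on one slave: 8 CPUs, 16 GiB RAM, 10 GiB disk.
    print(fits(request, (8.0, 16.0, 10.0)))
```

For the WordCountTopology log, each container asks for 5 CPUs, 5 GiB of RAM and 16 GiB of disk. If no single slave offers that much free disk, the task stays PENDING with "Insufficient: disk".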

jiandongjia Jun 10, 2016

@maosongfu I have the same problem, but in my case the Aurora PENDING status does not show any reason.

[2016-06-10 12:09:21 +0800] com.twitter.heron.scheduler.aurora.AuroraLauncher INFO:  Launching topology in aurora  
[2016-06-10 12:09:21 +0800] com.twitter.heron.spi.common.ShellUtils INFO:  $> [aurora, job, create, --wait-until, RUNNING, --bind, SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_NAME=ExclamationTopology, --bind, ENVIRON=devel, --bind, ROLE=root, --bind, STATEMGR_ROOT_PATH=/heron, --bind, TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn, --bind, TOPOLOGY_ID=ExclamationTopology24ef552e-69d1-48ae-ade2-cb9cc932f47e, --bind, SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell, --bind, TOPOLOGY_PACKAGE_URI=/heron/topologies/main/ExclamationTopology-root-tag-0--7553500226791833473, --bind, STATEMGR_CONNECTION_STRING=192.168.1.108:2181, --bind, HERON_SANDBOX_JAVA_HOME=/usr/src/jdk1.7.0_79, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, DISK_PER_CONTAINER=1073741824, --bind, SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, NUM_CONTAINERS=2, --bind, TOPOLOGY_CLASSPATH=heron-examples.jar, --bind, SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, RAM_PER_CONTAINER=2147483648, --bind, SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;", --bind, COMPONENT_RAMMAP=exclaim1:536870912,word:536870912, --bind, CORE_PACKAGE_URI=file:///usr/local/heron/dist/heron-core.tar.gz, --bind, SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, ISPRODUCTION=false, --bind, SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, CLUSTER=main, --bind, CPUS_PER_CONTAINER=1.0, --bind, SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, INSTANCE_DISTRIBUTION=1:word:2:0:exclaim1:1:0, --bind, SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, TOPOLOGY_JAR_FILE=heron-examples.jar, main/root/devel/ExclamationTopology, /usr/local/heron/conf/main/heron.aurora, --verbose]  
[2016-06-10 12:09:31 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 1ms  
[2016-06-10 12:09:41 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 0ms  
[2016-06-10 12:09:51 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 1ms  
[2016-06-10 12:10:01 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 1ms  
[2016-06-10 12:10:11 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 0ms  
[2016-06-10 12:10:21 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 0ms  
[2016-06-10 12:10:31 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 1ms  
[2016-06-10 12:10:41 +0800] org.apache.zookeeper.ClientCnxn FINE:  Got ping response for sessionid: 0x1553877b4970008 after 0ms  

aaronshan Jun 10, 2016

@maosongfu
Thank you. I configured smaller values for the container disk, CPU and RAM, and now it runs OK! Do you know how I can increase the resources available to Aurora?

maosongfu Jun 10, 2016

Contributor

@jiandongjia @aaronshan
You might get more insight from the official Aurora website: http://aurora.apache.org/

kartik894 Jun 10, 2016

I am using HDFS uploader for the aurora cluster. I am getting the following error upon submitting the topology:

Caused by: java.lang.IllegalArgumentException: Invalid path string "/hdfs:///heron/topologies/foo" caused by empty node name specified @7

These are my config files:

scheduler.yaml

# scheduler class for distributing the topology for execution
heron.class.scheduler: com.twitter.heron.scheduler.aurora.AuroraScheduler

# launcher class for submitting and launching the topology
heron.class.launcher: com.twitter.heron.scheduler.aurora.AuroraLauncher

# location of the core package
heron.package.core.uri: hdfs:///tmp/.heron/dist/heron-core.tar.gz

# location of java - pick it up from shell environment
heron.directory.sandbox.java.home: /usr/lib/jvm/java-8-oracle

# Invoke the IScheduler as a library directly
heron.scheduler.is.service: False

statemgr.yaml

# local state manager class for managing state in a persistent fashion
heron.class.state.manager: com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager

# local state manager connection string
heron.statemgr.connection.string:  "masternode:2181"

# path of the root address to store the state in a local file system
heron.statemgr.root.path: "hdfs:///heron"

# create the zookeeper nodes, if they do not exist
heron.statemgr.zookeeper.is.initialize.tree: True

# timeout in ms to wait before considering zookeeper session is dead
heron.statemgr.zookeeper.session.timeout.ms: 30000

# timeout in ms to wait before considering zookeeper connection is dead
heron.statemgr.zookeeper.connection.timeout.ms: 30000

# number of times to retry
heron.statemgr.zookeeper.retry.count: 10

# duration of time to wait until the next retry
heron.statemgr.zookeeper.retry.interval.ms: 10000

uploader.yaml

# uploader class for transferring the topology jar/tar files to storage
heron.class.uploader: com.twitter.heron.uploader.hdfs.HdfsUploader

# Directory of config files for hadoop client to read from
heron.uploader.hdfs.config.directory: /usr/local/hadoop/etc/hadoop

# name of the directory to upload topologies for HDFS uploader
heron.uploader.hdfs.topologies.directory.uri: hdfs:///heron/topologies/${CLUSTER}

client.yaml

# location of the core package
heron.package.core.uri:                      "hdfs:///tmp/.heron/dist/heron-core.tar.gz"

# Whether role/env is required to submit a topology. Default value is False.
heron.config.is.role.required:               False
heron.config.is.env.required:               False

Is there anything wrong in the config files?

maosongfu Jun 10, 2016

Contributor

@kartik894
It is caused by an invalid config value in statemgr.yaml, which is used when connecting to ZooKeeper:
heron.statemgr.root.path: "hdfs:///heron"
Try /heron instead, or check the ZooKeeper documentation for the path format.
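
The `@7` in the error message is the character position of the second consecutive slash in "/hdfs:///heron/topologies/foo": ZooKeeper paths are slash-separated node names, so `//` reads as an empty name. The Python sketch below illustrates that kind of check. It is a simplified illustration, not ZooKeeper's actual validation code, and `find_znode_path_problem` is a hypothetical helper name.

```python
def find_znode_path_problem(path):
    """Return a description of the first problem found, or None if the path looks valid.

    Simplified sketch: it only checks the leading slash, a trailing slash,
    and empty node names (two slashes in a row).
    """
    if not path:
        return "path is empty"
    if not path.startswith("/"):
        return "path must start with '/'"
    if len(path) > 1 and path.endswith("/"):
        return "path must not end with '/'"
    previous = "/"
    for index in range(1, len(path)):
        current = path[index]
        if current == "/" and previous == "/":
            return "empty node name specified @%d" % index
        previous = current
    return None


for candidate in ["/heron", "/hdfs:///heron/topologies/foo"]:
    print(candidate, "->", find_znode_path_problem(candidate))
```

With the root path set to /heron, the resulting node paths contain no empty names, which matches the fix suggested above.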

kartik894 Jun 13, 2016

@maosongfu Thanks! It's running now.

mhajibaba Jul 2, 2016

@maosongfu I have the same problem as the error message in #883, but I get the following messages:

[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.common.ShellUtils INFO:    
[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.common.ShellUtils INFO:  DEBUG] Command=(['job', 'create', '--wait-until', 'RUNNING', '--bind', 'TOPOLOGY_NAME=ExclamationTopology', '--bind', 'SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml', '--bind', 'COMPONENT_RAMMAP=exclaim1:536870912,word:536870912', '--bind', 'SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml', '--bind', 'INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;"', '--bind', 'ROLE=root', '--bind', 'ENVIRON=devel', '--bind', 'SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*', '--bind', 'SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*', '--bind', 'ISPRODUCTION=false', '--bind', 'TOPOLOGY_CLASSPATH=heron-examples.jar', '--bind', 'CLUSTER=aurora', '--bind', 'SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor', '--bind', 'STATEMGR_CONNECTION_STRING=192.168.11.231:2181,192.168.11.232:2181,192.168.11.233:2181', '--bind', 'COMPONENT_JVM_OPTS_IN_BASE64=""', '--bind', 'TOPOLOGY_ID=ExclamationTopologyc2f53ad0-76be-4e83-8c63-2134faede687', '--bind', 'TOPOLOGY_PACKAGE_URI=file:///root/.herondata/repository/topologies/aurora/root/ExclamationTopology/ExclamationTopology-root-tag-0--3706733491519378097', '--bind', 'SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr', '--bind', 'CORE_PACKAGE_URI=file:///root/.heron/dist/heron-core.tar.gz', '--bind', 'SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*', '--bind', 'TOPOLOGY_PACKAGE_TYPE=jar', '--bind', 'RAM_PER_CONTAINER=2147483648', '--bind', 'SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster', '--bind', 'TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn', '--bind', 'INSTANCE_DISTRIBUTION=1:word:2:0:exclaim1:1:0', '--bind', 'NUM_CONTAINERS=2', '--bind', 'CPUS_PER_CONTAINER=1.0', '--bind', 'TOPOLOGY_JAR_FILE=heron-examples.jar', '--bind', 'SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell', '--bind', 'DISK_PER_CONTAINER=1073741824', 
'--bind', 'STATEMGR_ROOT_PATH=/heron', '--bind', 'HERON_SANDBOX_JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64', 'aurora/root/devel/ExclamationTopology', '/root/.heron/conf/aurora/heron.aurora', '--verbose'])
DEBUG] Config: ['"""\n', 'Launch the topology as a single aurora job with multiple instances.\n', 'The heron-executor is responsible for starting a tmaster (container 0)\n', 'and regular stmgr/metricsmgr/instances (container index > 0).\n', '"""\n', '\n', "heron_core_release_uri = '{{CORE_PACKAGE_URI}}'\n", "heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'\n", 'core_release_file = "heron-core.tar.gz"\n', 'topology_package_file = "topology.tar.gz"\n', '\n', '# --- processes ---\n', 'fetch_heron_system = Process(\n', "  name = 'fetch_heron_system',\n", "  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)\n", ')\n', '\n', 'fetch_user_package = Process(\n', "  name = 'fetch_user_package',\n", "  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)\n", ')\n', '\n', 'command_to_start_executor = \'{{SANDBOX_EXECUTOR_BINARY}} {{mesos.instance}} {{TOPOLOGY_NAME}} {{TOPOLOGY_ID}} {{TOPOLOGY_DEFINITION_FILE}} {{INSTANCE_DISTRIBUTION}} {{STATEMGR_CONNECTION_STRING}} {{STATEMGR_ROOT_PATH}} {{SANDBOX_TMASTER_BINARY}} {{SANDBOX_STMGR_BINARY}} "{{SANDBOX_METRICSMGR_CLASSPATH}}" {{INSTANCE_JVM_OPTS_IN_BASE64}} "{{TOPOLOGY_CLASSPATH}}" {{thermos.ports[port1]}} {{thermos.ports[port2]}} {{thermos.ports[port3]}} {{SANDBOX_SYSTEM_YAML}} {{COMPONENT_RAMMAP}} {{COMPONENT_JVM_OPTS_IN_BASE64}} {{TOPOLOGY_PACKAGE_TYPE}} {{TOPOLOGY_JAR_FILE}} {{HERON_SANDBOX_JAVA_HOME}} {{thermos.ports[http]}} {{SANDBOX_SHELL_BINARY}} {{thermos.ports[port4]}} {{CLUSTER}} {{ROLE}} {{ENVIRON}} "{{SANDBOX_INSTANCE_CLASSPATH}}" {{SANDBOX_METRICS_YAML}} "{{SANDBOX_SCHEDULER_CLASSPATH}}" "{{thermos.ports[scheduler]}}"\'\n', '\n', 'launch_heron_executor = Process(\n', "  name = 'launch_heron_executor',\n", '  cmdline = command_to_start_executor,\n', '  max_failures = 1\n', ')\n', '\n', 'discover_profiler_port = Process(\n', "  name = 'discover_profiler_port',\n", "  cmdline = 'echo 
{{thermos.ports[yourkit]}} > yourkit.port'\n", ')\n', '\n', '# --- tasks ---\n', 'heron_task = SequentialTask(\n', "  name = 'setup_and_run',\n", '  processes = [fetch_heron_system, fetch_user_package, launch_heron_executor, discover_profiler_port],\n', "  resources = Resources(cpu = '{{CPUS_PER_CONTAINER}}', ram = '{{RAM_PER_CONTAINER}}', disk = '{{DISK_PER_CONTAINER}}')\n", ')\n', '\n', '# -- jobs ---\n', 'jobs = [\n', '  Job(\n', "    name = '{{TOPOLOGY_NAME}}',\n", "    cluster = '{{CLUSTER}}',\n", "    role = '{{ROLE}}',\n", "    environment = '{{ENVIRON}}',\n", '    service = True,\n', '    task = heron_task,\n', "    instances = '{{NUM_CONTAINERS}}',\n", "    announce = Announcer(primary_port = 'http')\n", '  )\n', ']\n']
Unknown cluster: aurora

[2016-07-02 16:27:05 +0430] com.twitter.heron.spi.utils.SchedulerUtils SEVERE:  Failed to invoke IScheduler as library  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 19,2  replyHeader:: 19,4294967667,0  request:: '/heron/executionstate/ExclamationTopology,-1  response:: null  
[2016-07-02 16:27:05 +0430] org.apache.curator.utils.DefaultTracerDriver FINEST:  Trace: DeleteBuilderImpl-Foreground - 9 ms  
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Deleted node for path: /heron/executionstate/ExclamationTopology  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 20,2  replyHeader:: 20,4294967668,0  request:: '/heron/topologies/ExclamationTopology,-1  response:: null  
[2016-07-02 16:27:05 +0430] org.apache.curator.utils.DefaultTracerDriver FINEST:  Trace: DeleteBuilderImpl-Foreground - 7 ms  
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Deleted node for path: /heron/topologies/ExclamationTopology  
[2016-07-02 16:27:05 +0430] com.twitter.heron.scheduler.LaunchRunner SEVERE:  Failed to launch topology  
[2016-07-02 16:27:05 +0430] com.twitter.heron.scheduler.SubmitterMain SEVERE:  Failed to launch topology. Attempting to roll back upload.  
[2016-07-02 16:27:05 +0430] com.twitter.heron.uploader.localfs.LocalFileSystemUploader INFO:  Clean uploaded jar  
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Closing the CuratorClient to: 192.168.11.231:2181,192.168.11.232:2181,192.168.11.233:2181  
[2016-07-02 16:27:05 +0430] org.apache.curator.framework.imps.CuratorFrameworkImpl FINE:  Closing  
[2016-07-02 16:27:05 +0430] org.apache.curator.CuratorZookeeperClient FINE:  Closing  
[2016-07-02 16:27:05 +0430] org.apache.curator.ConnectionState FINE:  Closing  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ZooKeeper FINE:  Closing session: 0x255aabf8eaa0028  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Closing client for session: 0x255aabf8eaa0028  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Reading reply sessionid:0x255aabf8eaa0028, packet:: clientPath:null serverPath:null finished:false header:: 21,-11  replyHeader:: 21,4294967669,0  request:: null response:: null  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn FINE:  Disconnecting client for session: 0x255aabf8eaa0028  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ClientCnxn INFO:  EventThread shut down  
[2016-07-02 16:27:05 +0430] org.apache.zookeeper.ZooKeeper INFO:  Session: 0x255aabf8eaa0028 closed  
[2016-07-02 16:27:05 +0430] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager INFO:  Closing the tunnel processes  
Exception in thread "main" java.lang.RuntimeException: Failed to submit topology ExclamationTopology
    at com.twitter.heron.scheduler.SubmitterMain.main(SubmitterMain.java:319)
ERROR: Failed to launch topology 'ExclamationTopology' because User main failed with status 1. Bailing out...
INFO: Elapsed time: 3.951s.

I changed the env, role, and ..., but the issue was not solved.
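
For anyone reading the `--bind` arguments in the log above, the resource values are plain byte counts and the JVM options are Base64-encoded. The Python sketch below decodes them. It assumes that `&equals;` in the logged value stands in for the Base64 padding character `=`; the helper names are hypothetical and not part of Heron or Aurora.

```python
import base64


def bytes_to_gib(value):
    """Convert a byte count (string or int) to GiB."""
    return int(value) / (1024 ** 3)


def decode_jvm_opts(encoded):
    """Decode a logged *_JVM_OPTS_IN_BASE64 value.

    Assumption: "&equals;" stands in for the Base64 padding character "=".
    """
    cleaned = encoded.strip('"').replace("&equals;", "=")
    return base64.b64decode(cleaned).decode("utf-8")


print(bytes_to_gib("2147483648"))  # RAM_PER_CONTAINER
print(bytes_to_gib("1073741824"))  # DISK_PER_CONTAINER
print(bytes_to_gib("536870912"))   # per-component RAM in COMPONENT_RAMMAP
print(decode_jvm_opts('"LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;"'))
```

Decoded this way, the request is for 2 GiB of RAM and 1 GiB of disk per container. The line that actually reports the failure in this log is "Unknown cluster: aurora".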

kartik894 Jul 2, 2016

Check the /etc/aurora/clusters.json file and change the name of the cluster to 'aurora'.
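
A quick way to confirm this before resubmitting is to look for an entry named 'aurora' in the file. The Python sketch below assumes clusters.json is a JSON list of objects that each carry a "name" key; the sample content is hypothetical and omits the other fields a real file would have, and `has_cluster` is a hypothetical helper name.

```python
import json


def has_cluster(clusters_json_text, wanted_name):
    """Return True if any cluster entry in the JSON text has the wanted name."""
    clusters = json.loads(clusters_json_text)
    return any(entry.get("name") == wanted_name for entry in clusters)


# Hypothetical file contents before and after renaming the cluster.
before = '[{"name": "example"}]'
after = '[{"name": "aurora"}]'

print(has_cluster(before, "aurora"))
print(has_cluster(after, "aurora"))
```

To check the real file, pass its contents instead of the sample, for example `has_cluster(open("/etc/aurora/clusters.json").read(), "aurora")`.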

mhajibaba Jul 2, 2016

@kartik894 Thanks a lot! That resolved it.

harbby Jul 5, 2016

Thanks! I deployed successfully on Ubuntu 16.04 and CentOS 7, but on CentOS 6 an error occurred when submitting a job.

chatterjeesubarna Mar 29, 2017

Hello, I am new to Heron. I am submitting a topology as root.

Initially, I ran "heron submit aurora/ubuntu/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology --verbose"

Then I got the error "Failed to initialize sandbox: Could not create sandbox because user does not exist: ubuntu"

So I modified the command and ran this:

heron submit aurora/root/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology --verbose

I am getting this error:

E0329 14:34:14.479283 32970 runner.py:299] Regular plan unhealthy!

Can someone help? Thanks a lot!
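
The first argument to `heron submit` has the form cluster/role/env, and the sandbox error above names the role (`ubuntu`) as a missing user. The Python sketch below splits that argument and checks whether the role exists as a local user. It assumes a Unix host (the `pwd` module), and that the check is run on the machine where the sandbox is created; both helper names are hypothetical.

```python
import pwd  # Unix only


def split_submit_target(target):
    """Split 'cluster/role/env' into its three parts."""
    parts = target.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected cluster/role/env, got %r" % target)
    cluster, role, env = parts
    return {"cluster": cluster, "role": role, "env": env}


def local_user_exists(name):
    """Return True if the name is a known user on this host."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


target = split_submit_target("aurora/ubuntu/devel")
print(target)
print(local_user_exists(target["role"]))
```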

huijunwu Mar 29, 2017

Member

@chatterjeesubarna I guess your first submit created some metadata in ZooKeeper, which your second submit conflicted with. I suggest trying to submit with a different name:
heron submit aurora/root/devel --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopologyDifferent1 --verbose
@maosongfu to confirm
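
The logs elsewhere in this thread show the state manager creating nodes under /heron/topologies, /heron/packingplans, and /heron/executionstate for each topology name. A minimal Python sketch for listing which of those are left over from an earlier submit follows. The `exists` callable is an assumption: here it is a stand-in backed by a set, and with a real ZooKeeper it could wrap a client call such as kazoo's `KazooClient.exists`. The helper names are hypothetical.

```python
def topology_state_paths(topology_name, root="/heron"):
    """Build the per-topology node paths seen in the submit logs."""
    sections = ("topologies", "packingplans", "executionstate")
    return ["%s/%s/%s" % (root, section, topology_name) for section in sections]


def leftover_paths(topology_name, exists, root="/heron"):
    """Return the per-topology paths for which exists(path) is truthy."""
    return [path for path in topology_state_paths(topology_name, root) if exists(path)]


# Stand-in for a ZooKeeper tree: pretend one node survived a failed submit.
fake_tree = {"/heron/topologies/ExclamationTopology"}

print(leftover_paths("ExclamationTopology", fake_tree.__contains__))
print(leftover_paths("ExclamationTopologyDifferent1", fake_tree.__contains__))
```

An empty result for the new name is consistent with the suggestion above: a different topology name would not collide with leftover state from the first attempt.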

chatterjeesubarna Mar 30, 2017

billonahill Mar 30, 2017

Contributor

@chatterjeesubarna I've answered your question on the mailing list. Please don't double post questions on both the mailing list and git issues. Also, troubleshooting questions are best handled on the mailing list. Git issues should just be for bugs or feature requests.

bjmota May 25, 2017

Hello! I have a problem deploying a Heron cluster. When I submit the ExclamationTopology, I get:

b1@master_1:~$ heron submit aurora/b1/devel  --config-path ~/.heron/conf/ ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology
[2017-05-25 09:18:08 +0000] [INFO]: Using config file under /home/b1/.heron/conf/aurora
[2017-05-25 09:18:08 +0000] [INFO]: Launching topology: 'ExclamationTopology'
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: 192.168.57.163:2181  
[2017-05-25 09:18:09 -0700] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting  
[2017-05-25 09:18:09 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED  
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized.  
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /heron/topologies/ExclamationTopology  
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -test -e /heron/topologies/aurora''  
[2017-05-25 09:18:09 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):  
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Target topology file already exists at '/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz'. Overwriting it now  
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Uploading topology package at '/tmp/tmpvMxs3m/topology.tar.gz' to target HDFS at '/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz'  
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -copyFromLocal -f /tmp/tmpvMxs3m/topology.tar.gz /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz''  
[2017-05-25 09:18:13 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):  
[2017-05-25 09:18:17 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/topologies/ExclamationTopology  
[2017-05-25 09:18:17 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/executionstate/ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.scheduler.aurora.AuroraLauncher: Launching topology in aurora  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.scheduler.utils.SchedulerUtils: Updating scheduled-resource in packing plan: ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/ExclamationTopology  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``aurora job create --wait-until RUNNING --bind STMGR_BINARY=./heron-core/bin/heron-stmgr --bind RAM_PER_CONTAINER=11811160064 --bind TOPOLOGY_PACKAGE_TYPE=jar --bind SHELL_BINARY=./heron-core/bin/heron-shell --bind TMASTER_BINARY=./heron-core/bin/heron-tmaster --bind STATEMGR_ROOT_PATH=/heron --bind TOPOLOGY_PACKAGE_URI=/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz --bind JAVA_HOME=/usr/lib/jvm/java-8-oracle --bind CLUSTER=aurora --bind TOPOLOGY_BINARY_FILE=heron-examples.jar --bind SYSTEM_YAML=./heron-conf/heron_internals.yaml --bind EXECUTOR_BINARY=./heron-core/bin/heron-executor --bind CPUS_PER_CONTAINER=5.0 --bind IS_PRODUCTION=false --bind PYTHON_INSTANCE_BINARY=./heron-core/bin/heron-python-instance --bind METRICS_YAML=./heron-conf/metrics_sinks.yaml --bind CORE_PACKAGE_URI=/heron/dist/heron-core.tar.gz --bind TOPOLOGY_CLASSPATH=heron-examples.jar --bind TOPOLOGY_ID=ExclamationTopologyf7fa5898-9d12-4e3d-917c-e24be0fd9ef6 --bind ROLE=b1 --bind COMPONENT_JVM_OPTS_IN_BASE64="" --bind TOPOLOGY_NAME=ExclamationTopology --bind STATEMGR_CONNECTION_STRING=192.168.57.163:2181 --bind INSTANCE_CLASSPATH=./heron-core/lib/instance/* --bind DISK_PER_CONTAINER=5368709120 --bind COMPONENT_RAMMAP=exclaim1:3221225472,word:3221225472 --bind METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/* --bind ENVIRON=devel --bind SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/* --bind TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn --bind NUM_CONTAINERS=3 --bind INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;" aurora/b1/devel/ExclamationTopology /home/b1/.heron/conf/aurora/heron.aurora''  
[2017-05-25 09:18:18 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):  
Error loading configuration: Unknown cluster: aurora
[2017-05-25 09:18:20 -0700] [SEVERE] com.twitter.heron.scheduler.aurora.AuroraCLIController: Failed to run process. Command=[aurora, job, create, --wait-until, RUNNING, --bind, STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, RAM_PER_CONTAINER=11811160064, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, SHELL_BINARY=./heron-core/bin/heron-shell, --bind, TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, STATEMGR_ROOT_PATH=/heron, --bind, TOPOLOGY_PACKAGE_URI=/heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz, --bind, JAVA_HOME=/usr/lib/jvm/java-8-oracle, --bind, CLUSTER=aurora, --bind, TOPOLOGY_BINARY_FILE=heron-examples.jar, --bind, SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, CPUS_PER_CONTAINER=5.0, --bind, IS_PRODUCTION=false, --bind, PYTHON_INSTANCE_BINARY=./heron-core/bin/heron-python-instance, --bind, METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, CORE_PACKAGE_URI=/heron/dist/heron-core.tar.gz, --bind, TOPOLOGY_CLASSPATH=heron-examples.jar, --bind, TOPOLOGY_ID=ExclamationTopologyf7fa5898-9d12-4e3d-917c-e24be0fd9ef6, --bind, ROLE=b1, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_NAME=ExclamationTopology, --bind, STATEMGR_CONNECTION_STRING=192.168.57.163:2181, --bind, INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, DISK_PER_CONTAINER=5368709120, --bind, COMPONENT_RAMMAP=exclaim1:3221225472,word:3221225472, --bind, METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, ENVIRON=devel, --bind, SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, TOPOLOGY_DEFINITION_FILE=ExclamationTopology.defn, --bind, NUM_CONTAINERS=3, --bind, INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg&equals;&equals;", aurora/b1/devel/ExclamationTopology, /home/b1/.heron/conf/aurora/heron.aurora], STDOUT=, STDERR=Error loading configuration: Unknown cluster: aurora  
[2017-05-25 09:18:20 -0700] [SEVERE] com.twitter.heron.scheduler.utils.LauncherUtils: Failed to invoke IScheduler as library  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/executionstate/ExclamationTopology  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/ExclamationTopology  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/topologies/ExclamationTopology  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/lib/hadoop-2.8.0/etc/hadoop fs -rm /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz''  
[2017-05-25 09:18:20 -0700] [INFO] com.twitter.heron.spi.utils.ShellUtils: Process output (stdout+stderr):  
Deleted /heron/topologies/aurora/ExclamationTopology-b1-tag-0--7108568726115264257.tar.gz
[2017-05-25 09:18:23 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: 192.168.57.163:2181  
[2017-05-25 09:18:23 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes  
[2017-05-25 09:18:23 +0000] [ERROR]: Failed to launch topology 'ExclamationTopology'
[2017-05-25 09:18:23 +0000] [ERROR]: Failed to launch topology 'ExclamationTopology'
[2017-05-25 09:18:23 +0000] [INFO]: Elapsed time: 15.872s.


Also, in Aurora I only see the example at http://192.168.57.163:8081/scheduler....

Please help me!
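
For anyone hitting the same `Unknown cluster: aurora` message: it is printed by the `aurora` client itself, not by Heron. The first segment of the job key (`aurora/b1/devel/ExclamationTopology` in the log above) is the cluster name that the client tries to look up in its cluster list. The sketch below is only a rough way to check that lookup by hand. It assumes the client's cluster list is a JSON array of objects with a `name` field, as in a typical `clusters.json` (commonly found at `/etc/aurora/clusters.json` or `~/.aurora/clusters.json`, which you should verify for your install). The helper names and the sample entry `devcluster` are hypothetical and not taken from this thread.

```python
import json


def cluster_names(clusters_json_text):
    """Return the set of cluster names found in a clusters.json document.

    Assumes the document is a JSON array of objects with a "name" field.
    """
    entries = json.loads(clusters_json_text)
    return {
        entry["name"]
        for entry in entries
        if isinstance(entry, dict) and "name" in entry
    }


def job_key_cluster(job_key):
    """Extract the cluster segment from a job key of the form cluster/role/env/name."""
    parts = job_key.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError("expected cluster/role/env/name, got: %r" % job_key)
    return parts[0]


def job_key_is_resolvable(job_key, clusters_json_text):
    """True if the job key's cluster segment is defined in the cluster list."""
    return job_key_cluster(job_key) in cluster_names(clusters_json_text)


if __name__ == "__main__":
    # Hypothetical cluster list; paste the contents of your own clusters.json here.
    sample = '[{"name": "devcluster"}]'
    job_key = "aurora/b1/devel/ExclamationTopology"  # job key from the log above
    print(job_key_cluster(job_key))                # aurora
    print(job_key_is_resolvable(job_key, sample))  # False
```

If the check comes back false for your real cluster list, the cluster name in the job key and the names defined for the Aurora client do not match, and one of the two has to change so that they agree.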

@billonahill

billonahill May 25, 2017

Contributor

@bjmota Would you please ask troubleshooting questions on the mailing list? GitHub issues should be used for filing bugs and feature requests.
