This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Cannot submit to Aurora, it seems to hang. #3502

Closed
dttlgotv opened this issue Mar 28, 2020 · 24 comments

Comments

@dttlgotv

DEBUG] Using auth module: <apache.aurora.common.auth.auth_module.InsecureAuthModule object at 0x7f875d640510>
INFO] Creating job Test3Topology
DEBUG] Full configuration: JobConfiguration(instanceCount=2, cronSchedule=None, cronCollisionPolicy=0, key=JobKey(environment=u'devel', role=u'gxh', name=u'Test3Topology'), taskConfig=TaskConfig(isService=True, contactEmail=None, taskLinks={}, tier=u'preemptible', mesosFetcherUris=None, executorConfig=ExecutorConfig(data='{"environment": "devel", "health_check_config": {"health_checker": {"http": {"expected_response_code": 200, "endpoint": "/health", "expected_response": "ok"}}, "min_consecutive_successes": 1, "initial_interval_secs": 30.0, "max_consecutive_failures": 2, "timeout_secs": 5.0, "interval_secs": 10.0}, "name": "Test3Topology", "service": true, "max_task_failures": 1, "cron_collision_policy": "KILL_EXISTING", "enable_hooks": false, "cluster": "aurora", "task": {"processes": [{"daemon": false, "name": "fetch_heron_system", "ephemeral": false, "max_failures": 1, "min_duration": 5, "cmdline": "hdfs dfs -get /heron/dist/heron-core.tar.gz heron-core.tar.gz && tar zxf heron-core.tar.gz", "final": false}, {"daemon": false, "name": "fetch_user_package", "ephemeral": false, "max_failures": 1, "min_duration": 5, "cmdline": "hdfs dfs -get /heron/topologies/aurora/Test3Topology-gxh-tag-0-922516477660846776.tar.gz topology.tar.gz && tar zxf topology.tar.gz", "final": false}, {"daemon": false, "name": "launch_heron_executor", "ephemeral": false, "max_failures": 1, "min_duration": 5, "cmdline": "./heron-core/bin/heron-executor --shard={{mesos.instance}} --master-port={{thermos.ports[port1]}} --tmaster-controller-port={{thermos.ports[port2]}} --tmaster-stats-port={{thermos.ports[port3]}} --shell-port={{thermos.ports[http]}} --metrics-manager-port={{thermos.ports[port4]}} --scheduler-port={{thermos.ports[scheduler]}} --metricscache-manager-master-port={{thermos.ports[metricscachemgr_masterport]}} --metricscache-manager-stats-port={{thermos.ports[metricscachemgr_statsport]}} --checkpoint-manager-port={{thermos.ports[ckptmgr_port]}} --topology-name=Test3Topology --topology-id=Test3Topology3dd4ac0f-b248-4dd9-a91d-6bc53dafb8c2 --topology-defn-file=Test3Topology.defn --state-manager-connection=127.0.0.1:2181 --state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml --tmaster-binary=./heron-core/bin/heron-tmaster --stmgr-binary=./heron-core/bin/heron-stmgr --metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts=\"\" --classpath=heron-streamlet-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml --override-config-file=./heron-conf/override.yaml --component-ram-map=random-sentences-source:209715200 --component-jvm-opts=\"\" --pkg-type=jar --topology-binary-file=heron-streamlet-examples.jar --heron-java-home=/usr/lib/jvm/java-1.8.0-openjdk-amd64 --heron-shell-binary=./heron-core/bin/heron-shell --cluster=aurora --role=gxh --environment=devel --instance-classpath=./heron-core/lib/instance/* --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/:./heron-core/lib/packing/:./heron-core/lib/statemgr/* --python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled --is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/:./heron-core/lib/statefulstorage/: --stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824 --health-manager-mode=disabled 
--health-manager-classpath=./heron-core/lib/healthmgr/*", "final": false}, {"daemon": false, "name": "discover_profiler_port", "ephemeral": false, "max_failures": 1, "min_duration": 5, "cmdline": "echo {{thermos.ports[yourkit]}} > yourkit.port", "final": false}], "name": "setup_and_run", "finalization_wait": 30, "max_failures": 1, "max_concurrency": 0, "resources": {"gpu": 0, "disk": 13958643712, "ram": 2357198848, "cpu": 1.0}, "constraints": [{"order": ["fetch_heron_system", "fetch_user_package", "launch_heron_executor", "discover_profiler_port"]}]}, "production": false, "role": "gxh", "tier": "preemptible", "announce": {"primary_port": "http", "portmap": {"health": "http"}}, "lifecycle": {"http": {"graceful_shutdown_endpoint": "/quitquitquit", "port": "health", "shutdown_endpoint": "/abortabortabort"}}, "priority": 0}', name='AuroraExecutor'), requestedPorts=set([u'port4', u'http', u'metricscachemgr_masterport', u'yourkit', u'metricscachemgr_statsport', u'scheduler', u'ckptmgr_port', u'port2', u'port3', u'port1']), maxTaskFailures=1, priority=0, ramMb=2248, job=JobKey(environment=u'devel', role=u'gxh', name=u'Test3Topology'), production=False, diskMb=13312, resources=frozenset([]), owner=Identity(user='root'), container=Container(docker=None, mesos=MesosContainer(image=None, volumes=None)), metadata=frozenset([]), numCpus=1.0, constraints=set([])), owner=Identity(user='root'))
DEBUG] Querying instance statuses: None
DEBUG] Response from scheduler: OK (message: )
DEBUG] Querying instance statuses: None
DEBUG] Response from scheduler: OK (message: )
DEBUG] Querying instance statuses: None
DEBUG] Response from scheduler: OK (message: )
DEBUG] Querying instance statuses: None
DEBUG] Response from scheduler: OK (message: )
DEBUG] Querying instance statuses: None
DEBUG] Response from scheduler: OK (message: )
DEBUG] Querying instance statuses: None
DEBUG] Response from scheduler: OK (message: )
DEBUG] Querying instance statuses: None
DEBUG] Response from scheduler: OK (message: )

@thinker0
Member

thinker0 commented Mar 28, 2020

@dttlgotv
Check the Mesos or Aurora stderr/stdout logs.

Does

hdfs dfs -get /heron/dist/heron-core.tar.gz heron-core.tar.gz && tar zxf heron-core.tar.gz

work?

Please refer to this from my aurora file:

import textwrap

heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
core_release_file = "heron-centos.tar.gz"
topology_package_file = "topology.tar.gz"

# --- processes ---
fetch_heron_system = Process(
    name='fetch_heron_system',
    cmdline=textwrap.dedent('''
        set -x
        curl %s -o %s && tar zxf %s && {
            rm -f heron-centos.tar.gz
        } && tar xvfz dist/heron-core.tar.gz && {
            rm -f dist/heron-core.tar.gz
        }
    ''') % (heron_core_release_uri, core_release_file, core_release_file)
)

@dttlgotv
Author

About the command: hdfs dfs -get /heron/dist/heron-core.tar.gz heron-core.tar.gz && tar zxf heron-core.tar.gz

I tried this command on all three cluster machines and it works well.

My heron.aurora is below:

heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
core_release_file = "heron-core.tar.gz"
topology_package_file = "topology.tar.gz"

# --- processes ---

fetch_heron_system = Process(
    name = 'fetch_heron_system',
    cmdline = 'hdfs dfs -get %s %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)
)

fetch_user_package = Process(
    name = 'fetch_user_package',
    cmdline = 'hdfs dfs -get %s %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)
)

Result:

In the Aurora stderr log:
/bin/bash: hdfs: command not found

mesos stderr:
I0329 10:58:19.809157 22467 logging.cpp:201] INFO level logging started!
I0329 10:58:19.809937 22467 fetcher.cpp:562] Fetcher Info: {"cache_directory":"/tmp/mesos/fetch/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"/usr/bin/thermos_executor"}}],"sandbox_directory":"/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-Test3Topology-1-d2093f25-b263-4ada-9a33-823ae2ff5075/runs/bd2f2ba6-9be1-4243-b027-9ac8ddee447a","stall_timeout":{"nanoseconds":60000000000},"user":"root"}
I0329 10:58:19.836925 22467 fetcher.cpp:459] Fetching URI '/usr/bin/thermos_executor'
I0329 10:58:19.836985 22467 fetcher.cpp:290] Fetching '/usr/bin/thermos_executor' directly into the sandbox directory
I0329 10:58:19.849915 22467 fetcher.cpp:618] Fetched '/usr/bin/thermos_executor' to '/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-Test3Topology-1-d2093f25-b263-4ada-9a33-823ae2ff5075/runs/bd2f2ba6-9be1-4243-b027-9ac8ddee447a/thermos_executor'
I0329 10:58:19.850029 22467 fetcher.cpp:623] Successfully fetched all URIs into '/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-Test3Topology-1-d2093f25-b263-4ada-9a33-823ae2ff5075/runs/bd2f2ba6-9be1-4243-b027-9ac8ddee447a'
twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
Writing log files to disk in /root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-Test3Topology-1-d2093f25-b263-4ada-9a33-823ae2ff5075/runs/bd2f2ba6-9be1-4243-b027-9ac8ddee447a
I0329 10:58:21.379936 22469 exec.cpp:162] Version: 1.1.0
I0329 10:58:21.391768 22481 exec.cpp:237] Executor registered on agent e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1
Writing log files to disk in /root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-Test3Topology-1-d2093f25-b263-4ada-9a33-823ae2ff5075/runs/bd2f2ba6-9be1-4243-b027-9ac8ddee447a
ERROR] Regular plan unhealthy!
Traceback (most recent call last):
File "/root/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/init.py", line 126, in _excepting_run
self.__real_run(*args, **kw)
File "/root/.pex/install/twitter.common.concurrent-0.3.7-py2-none-any.whl.f1ab836a5554c86d07fa3f075905c95fb20c78dd/twitter.common.concurrent-0.3.7-py2-none-any.whl/twitter/common/concurrent/deferred.py", line 42, in run
self._closure()
File "/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-Test3Topology-1-d2093f25-b263-4ada-9a33-823ae2ff5075/runs/bd2f2ba6-9be1-4243-b027-9ac8ddee447a/thermos_executor/apache/aurora/executor/common/announcer.py", line 269, in stop
AttributeError: 'NoneType' object has no attribute 'stop'
twitter.common.app debug: Shutting application down.
twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
twitter.common.app debug: Finishing up module teardown.
twitter.common.app debug: Active thread: <_MainThread(MainThread, started 140676464191296)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-8, started daemon 140676227225344)>
twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-3, started daemon 140676277581568)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-14, started daemon 140675077109504)>
twitter.common.app debug: Active thread (daemon): <ThreadedHealthChecker(Thread-7 [TID=22502], started daemon 140675855664896)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-10, started daemon 140675864057600)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-20, started daemon 140675060324096)>
twitter.common.app debug: Active thread (daemon): <TaskResourceMonitor(TaskResourceMonitor[root-devel-Test3Topology-1-d2093f25-b263-4ada-9a33-823ae2ff5075] [TID=22503], started daemon 140675847272192)>
twitter.common.app debug: Active thread (daemon): <ServerSetJoinThread(Thread-21 [TID=22514], started daemon 140675882415872)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-19, started daemon 140675068716800)>
twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-2, started daemon 140676260796160)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-9, started daemon 140675874023168)>
twitter.common.app debug: Exiting cleanly.

@dttlgotv
Author

I used your reference:
heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
core_release_file = "heron-centos.tar.gz"
topology_package_file = "topology.tar.gz"

#heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
#heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
#core_release_file = "heron-core.tar.gz"
#topology_package_file = "topology.tar.gz"

# --- processes ---

fetch_heron_system = Process(
    name='fetch_heron_system',
    cmdline=textwrap.dedent('''
        set -x
        curl %s -o %s && tar zxf %s && {
            rm -f heron-centos.tar.gz
        } && tar xvfz dist/heron-core.tar.gz && {
            rm -f dist/heron-core.tar.gz
        }
    ''') % (heron_core_release_uri, core_release_file, core_release_file)
)

fetch_user_package = Process(
    name = 'fetch_user_package',
    cmdline = 'hdfs dfs -get %s %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)
)

Result: the Mesos and Aurora stderr logs cannot be seen; perhaps the task was never scheduled. But when I run the command, the error below can be seen:

Error loading configuration: name 'textwrap' is not defined
[2020-03-29 11:15:44 +0800] [SEVERE] org.apache.heron.scheduler.aurora.AuroraCLIController: Failed to run process. Command=[aurora, job, create, --wait-until, RUNNING, --bind, CPUS_PER_CONTAINER=1.0, --bind, EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, ROLE=root, --bind, TOPOLOGY_NAME=Test3Topology, --bind, TOPOLOGY_PACKAGE_URI=/heron/topologies/aurora/Test3Topology-root-tag-0--8888688933785017338.tar.gz, --bind, RAM_PER_CONTAINER=2357198848, --bind, CORE_PACKAGE_URI=/heron/dist/heron-core.tar.gz, --bind, TIER=preemptible, --bind, TOPOLOGY_ARGUMENTS=--topology-name=Test3Topology --topology-id=Test3Topologye1439966-dc4f-47b9-9749-136375ba4528 --topology-defn-file=Test3Topology.defn --state-manager-connection=127.0.0.1:2181 --state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml --tmaster-binary=./heron-core/bin/heron-tmaster --stmgr-binary=./heron-core/bin/heron-stmgr --metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="" --classpath=heron-streamlet-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml --override-config-file=./heron-conf/override.yaml --component-ram-map=random-sentences-source:209715200 --component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-streamlet-examples.jar --heron-java-home=/usr/lib/jvm/java-1.8.0-openjdk-amd64 --heron-shell-binary=./heron-core/bin/heron-shell --cluster=aurora --role=root --environment=devel --instance-classpath=./heron-core/lib/instance/* --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/:./heron-core/lib/packing/:./heron-core/lib/statemgr/* --python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled --is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/:./heron-core/lib/statefulstorage/: --stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824 --health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*, --bind, NUM_CONTAINERS=2, --bind, CLUSTER=aurora, --bind, ENVIRON=devel, --bind, DISK_PER_CONTAINER=13958643712, aurora/root/devel/Test3Topology, /root/.heron/conf/aurora/heron.aurora, --verbose], STDOUT=null, STDERR=null
[2020-03-29 11:15:44 +0800] [SEVERE] org.apache.heron.scheduler.utils.LauncherUtils: Failed to invoke IScheduler as library
[2020-03-29 11:15:44 +0800] [FINEST] org.apache.curator.utils.DefaultTracerDriver: Trace: DeleteBuilderImpl-Foreground - 20 ms
[2020-03-29 11:15:44 +0800] [INFO] org.apache.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/executionstate/Test3Topology
[2020-03-29 11:15:44 +0800] [FINEST] org.apache.curator.utils.DefaultTracerDriver: Trace: DeleteBuilderImpl-Foreground - 18 ms
[2020-03-29 11:15:44 +0800] [INFO] org.apache.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/Test3Topology
[2020-03-29 11:15:44 +0800] [FINEST] org.apache.curator.utils.DefaultTracerDriver: Trace: DeleteBuilderImpl-Foreground - 21 ms
[2020-03-29 11:15:44 +0800] [INFO] org.apache.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/topologies/Test3Topology
[2020-03-29 11:15:44 +0800] [FINE] org.apache.heron.spi.utils.ShellUtils: Running synced process: ``hadoop --config /usr/local/hadoop/etc/hadoop fs -rm /heron/topologies/aurora/Test3Topology-root-tag-0--8888688933785017338.tar.gz''
[2020-03-29 11:15:44 +0800] [FINE] org.apache.heron.spi.utils.ShellUtils: Process output (stdout+stderr):
Deleted /heron/topologies/aurora/Test3Topology-root-tag-0--8888688933785017338.tar.gz
[2020-03-29 11:15:47 +0800] [INFO] org.apache.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: 127.0.0.1:2181
[2020-03-29 11:15:47 +0800] [FINE] org.apache.curator.framework.imps.CuratorFrameworkImpl: Closing
[2020-03-29 11:15:47 +0800] [FINE] org.apache.curator.CuratorZookeeperClient: Closing
[2020-03-29 11:15:47 +0800] [FINE] org.apache.curator.ConnectionState: Closing
[2020-03-29 11:15:47 +0800] [INFO] org.apache.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes
[2020-03-29 11:15:47 +0800] [SEVERE] org.apache.heron.scheduler.SubmitterMain: Exception when submitting topology
org.apache.heron.spi.scheduler.LauncherException: Failed to launch topology 'Test3Topology'
at org.apache.heron.scheduler.LaunchRunner.call(LaunchRunner.java:177)
at org.apache.heron.scheduler.SubmitterMain.callLauncherRunner(SubmitterMain.java:556)
at org.apache.heron.scheduler.SubmitterMain.submitTopology(SubmitterMain.java:460)
at org.apache.heron.scheduler.SubmitterMain.main(SubmitterMain.java:334)

[2020-03-29 11:15:47 +0000] [ERROR]: Failed to launch topology 'Test3Topology'
[2020-03-29 11:15:47 +0000] [ERROR]: Failed to launch topology 'Test3Topology'
[2020-03-29 11:15:47 +0000] [DEBUG]: Elapsed time: 12.904s.

@dttlgotv
Author

@dttlgotv
Check the Mesos or Aurora stderr/stdout logs.

Does

hdfs dfs -get /heron/dist/heron-core.tar.gz heron-core.tar.gz && tar zxf heron-core.tar.gz

work?

Please refer to this from my aurora file:

heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
core_release_file = "heron-centos.tar.gz"
topology_package_file = "topology.tar.gz"

# --- processes ---
fetch_heron_system = Process(
    name='fetch_heron_system',
    cmdline=textwrap.dedent('''
        set -x
        curl %s -o %s && tar zxf %s && {
            rm -f heron-centos.tar.gz
        } && tar xvfz dist/heron-core.tar.gz && {
            rm -f dist/heron-core.tar.gz
        }
    ''') % (heron_core_release_uri, core_release_file, core_release_file)
)

I have posted the two results above (your reference and mine); please check my comments. Thanks a lot.

@thinker0
Member

Either add

import textwrap

at the top of the file, or remove the textwrap.dedent call.
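
For example, a minimal sketch of the same fetch process written without textwrap, so no import is needed; the bindings are the ones already defined at the top of heron.aurora:

fetch_heron_system = Process(
    name = 'fetch_heron_system',
    # Sketch only: a single-line cmdline, so textwrap.dedent is not required.
    cmdline = 'set -x; curl %s -o %s && tar zxf %s && tar xvfz dist/heron-core.tar.gz' % (
        heron_core_release_uri, core_release_file, core_release_file)
)

Alternatively, keep textwrap.dedent and put import textwrap on the first line of the file.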

@dttlgotv
Author

Either add

import textwrap

at the top of the file, or remove the textwrap.dedent call.

I had missed that line.
After adding it, the topology still does not run well.

aurora error:

  • curl /heron/dist/heron-core.tar.gz -o heron-centos.tar.gz
    curl: (3) malformed

mesos error:
I0329 12:12:03.846160 23213 logging.cpp:201] INFO level logging started!
I0329 12:12:03.846814 23213 fetcher.cpp:562] Fetcher Info: {"cache_directory":"/tmp/mesos/fetch/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"/usr/bin/thermos_executor"}}],"sandbox_directory":"/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-cebe9de4-50dc-4790-93e2-43bcd550ce6f/runs/7bd1c130-001e-452e-b2af-dc283269288f","stall_timeout":{"nanoseconds":60000000000},"user":"root"}
I0329 12:12:03.870548 23213 fetcher.cpp:459] Fetching URI '/usr/bin/thermos_executor'
I0329 12:12:03.870640 23213 fetcher.cpp:290] Fetching '/usr/bin/thermos_executor' directly into the sandbox directory
I0329 12:12:03.880676 23213 fetcher.cpp:618] Fetched '/usr/bin/thermos_executor' to '/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-cebe9de4-50dc-4790-93e2-43bcd550ce6f/runs/7bd1c130-001e-452e-b2af-dc283269288f/thermos_executor'
I0329 12:12:03.880781 23213 fetcher.cpp:623] Successfully fetched all URIs into '/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-cebe9de4-50dc-4790-93e2-43bcd550ce6f/runs/7bd1c130-001e-452e-b2af-dc283269288f'
twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
Writing log files to disk in /root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-cebe9de4-50dc-4790-93e2-43bcd550ce6f/runs/7bd1c130-001e-452e-b2af-dc283269288f
I0329 12:12:05.558152 23215 exec.cpp:162] Version: 1.1.0
I0329 12:12:05.566694 23227 exec.cpp:237] Executor registered on agent e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1
Writing log files to disk in /root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-cebe9de4-50dc-4790-93e2-43bcd550ce6f/runs/7bd1c130-001e-452e-b2af-dc283269288f
ERROR] Regular plan unhealthy!
Traceback (most recent call last):
File "/root/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/init.py", line 126, in _excepting_run
self.__real_run(*args, **kw)
File "/root/.pex/install/twitter.common.concurrent-0.3.7-py2-none-any.whl.f1ab836a5554c86d07fa3f075905c95fb20c78dd/twitter.common.concurrent-0.3.7-py2-none-any.whl/twitter/common/concurrent/deferred.py", line 42, in run
self._closure()
File "/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-cebe9de4-50dc-4790-93e2-43bcd550ce6f/runs/7bd1c130-001e-452e-b2af-dc283269288f/thermos_executor/apache/aurora/executor/common/announcer.py", line 269, in stop
AttributeError: 'NoneType' object has no attribute 'stop'
twitter.common.app debug: Shutting application down.
twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
twitter.common.app debug: Finishing up module teardown.
twitter.common.app debug: Active thread: <_MainThread(MainThread, started 140638577588032)>
twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-3, started daemon 140638357231360)>
twitter.common.app debug: Active thread (daemon): <ThreadedHealthChecker(Thread-7 [TID=23249], started daemon 140637496145664)>
twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-2, started daemon 140638374016768)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-10, started daemon 140638312908544)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-14, started daemon 140637462574848)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-8, started daemon 140638332053248)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-18, started daemon 140638340445952)>
twitter.common.app debug: Active thread (daemon): <TaskResourceMonitor(TaskResourceMonitor[root-devel-StreamletCloneTopology-0-cebe9de4-50dc-4790-93e2-43bcd550ce6f] [TID=23250], started daemon 140637487752960)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-19, started daemon 140637445789440)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-9, started daemon 140638321301248)>
twitter.common.app debug: Active thread (daemon): <ServerSetJoinThread(Thread-22 [TID=23262], started daemon 140637470967552)>
twitter.common.app debug: Exiting cleanly.

@dttlgotv
Author

Some of my config files:

upload.yaml file:
heron.class.uploader: "org.apache.heron.uploader.hdfs.HdfsUploader"

heron.uploader.hdfs.config.directory: "/usr/local/hadoop/etc/hadoop"

# heron.uploader.hdfs.topologies.directory.uri: hdfs://heron/topologies/${CLUSTER}

heron.uploader.hdfs.topologies.directory.uri: "/heron/topologies/${CLUSTER}"

client.yaml:

# location of the core package

heron.package.core.uri: "/heron/dist/heron-core.tar.gz"

# Whether role/env is required to submit a topology. Default value is False.

heron.config.is.role.required: True
heron.config.is.env.required: True

@thinker0
Member

Change the

curl /heron/...

to an

hdfs dfs -get ...
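
In other words, {{CORE_PACKAGE_URI}} here is an HDFS path rather than an http:// URL, so curl cannot download it; fetch it through the hdfs client instead. A rough sketch, assuming the hdfs command is available on the agent:

fetch_heron_system = Process(
    name = 'fetch_heron_system',
    # CORE_PACKAGE_URI is an HDFS path, so use hdfs dfs -get instead of curl.
    cmdline = 'hdfs dfs -get %s %s && tar zxf %s' % (
        heron_core_release_uri, core_release_file, core_release_file)
)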

@dttlgotv
Author

curl

Sorry, I cannot understand what you mean.

@dttlgotv
Author

curl /heron/

Do you mean that I should use hdfs to get it?

@dttlgotv
Author

@thinker0

Here is my latest modification.

My heron.aurora file is below, but the error can still be seen.
import textwrap

heron_core_release_uri = '{{CORE_PACKAGE_URI}}'
heron_topology_jar_uri = '{{TOPOLOGY_PACKAGE_URI}}'
core_release_file = "heron-centos.tar.gz"
topology_package_file = "topology.tar.gz"

# --- processes ---

fetch_heron_system = Process(
    name = 'fetch_heron_system',
    cmdline = 'hdfs dfs -get %s %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)
)

fetch_user_package = Process(
    name = 'fetch_user_package',
    cmdline = 'hdfs dfs -get %s %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)
)

Aurora error:
/bin/bash: hdfs: command not found

mesos error:
I0329 12:48:59.507381 23905 logging.cpp:201] INFO level logging started!
I0329 12:48:59.508378 23905 fetcher.cpp:562] Fetcher Info: {"cache_directory":"/tmp/mesos/fetch/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"/usr/bin/thermos_executor"}}],"sandbox_directory":"/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-e50143f2-87fd-45f5-bd6f-00ab46dcf48c/runs/752698d1-c0e5-4f5f-be41-efa49aee3a12","stall_timeout":{"nanoseconds":60000000000},"user":"root"}
I0329 12:48:59.547435 23905 fetcher.cpp:459] Fetching URI '/usr/bin/thermos_executor'
I0329 12:48:59.547525 23905 fetcher.cpp:290] Fetching '/usr/bin/thermos_executor' directly into the sandbox directory
I0329 12:48:59.566543 23905 fetcher.cpp:618] Fetched '/usr/bin/thermos_executor' to '/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-e50143f2-87fd-45f5-bd6f-00ab46dcf48c/runs/752698d1-c0e5-4f5f-be41-efa49aee3a12/thermos_executor'
I0329 12:48:59.566643 23905 fetcher.cpp:623] Successfully fetched all URIs into '/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-e50143f2-87fd-45f5-bd6f-00ab46dcf48c/runs/752698d1-c0e5-4f5f-be41-efa49aee3a12'
twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
Writing log files to disk in /root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-e50143f2-87fd-45f5-bd6f-00ab46dcf48c/runs/752698d1-c0e5-4f5f-be41-efa49aee3a12
I0329 12:49:01.113678 23907 exec.cpp:162] Version: 1.1.0
I0329 12:49:01.123690 23914 exec.cpp:237] Executor registered on agent e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1
Writing log files to disk in /root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-e50143f2-87fd-45f5-bd6f-00ab46dcf48c/runs/752698d1-c0e5-4f5f-be41-efa49aee3a12
ERROR] Regular plan unhealthy!
Traceback (most recent call last):
File "/root/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/init.py", line 126, in _excepting_run
self.__real_run(*args, **kw)
File "/root/.pex/install/twitter.common.concurrent-0.3.7-py2-none-any.whl.f1ab836a5554c86d07fa3f075905c95fb20c78dd/twitter.common.concurrent-0.3.7-py2-none-any.whl/twitter/common/concurrent/deferred.py", line 42, in run
self._closure()
File "/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-e50143f2-87fd-45f5-bd6f-00ab46dcf48c/runs/752698d1-c0e5-4f5f-be41-efa49aee3a12/thermos_executor/apache/aurora/executor/common/announcer.py", line 269, in stop
AttributeError: 'NoneType' object has no attribute 'stop'
twitter.common.app debug: Shutting application down.
twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
twitter.common.app debug: Finishing up module teardown.
twitter.common.app debug: Active thread: <_MainThread(MainThread, started 140524700247872)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-7, started daemon 140524463101696)>
twitter.common.app debug: Active thread (daemon): <TaskResourceMonitor(TaskResourceMonitor[root-devel-StreamletCloneTopology-0-e50143f2-87fd-45f5-bd6f-00ab46dcf48c] [TID=23941], started daemon 140523612403456)>
twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-2, started daemon 140524538636032)>
twitter.common.app debug: Active thread (daemon): <ThreadedHealthChecker(Thread-6 [TID=23940], started daemon 140524427171584)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-9, started daemon 140524435564288)>
twitter.common.app debug: Active thread (daemon): <Thread(Thread-8, started daemon 140524446316288)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-13, started daemon 140523576043264)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-16, started daemon 140523595618048)>
twitter.common.app debug: Active thread (daemon): <WaitThread(Thread-15, started daemon 140523567650560)>
twitter.common.app debug: Active thread (daemon): <ServerSetJoinThread(Thread-21 [TID=23953], started daemon 140523584435968)>
twitter.common.app debug: Exiting cleanly.

@thinker0
Member

thinker0 commented Mar 29, 2020

When the executor runs on the mesos-agent server, the hdfs command must also work there.

/bin/bash: hdfs: command not found

1. fetch_heron_system: fetches the Heron core package binary
2. fetch_user_package: fetches the topology package binary
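
As a quick check that the non-interactive shell Thermos starts on the agent can find hdfs at all, you could add a throwaway diagnostic process; this is only a sketch, and check_hdfs is a hypothetical name:

check_hdfs = Process(
    name = 'check_hdfs',
    # Prints the PATH the executor's shell sees and whether it can resolve the hdfs binary.
    cmdline = 'echo PATH=$PATH; which hdfs || echo "hdfs not found on PATH"'
)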

@dttlgotv
Author

When the executor runs on the mesos-agent server, the hdfs command must also work there.

/bin/bash: hdfs: command not found

1. fetch_heron_system: fetches the Heron core package binary
2. fetch_user_package: fetches the topology package binary

ps -e |grep mesos
1645 ? 00:29:27 mesos-agent


@thinker0
Member

Download the hdfs://...../heron-cento.tgz package from HDFS into the container.

heron_core_release_uri = '{{CORE_PACKAGE_URI}}'

@dttlgotv
Author

dttlgotv commented Mar 29, 2020 via email

@thinker0
Member

Make the hdfs command work on every master/agent (slave).

If you are using CentOS with CDH, you need to prepare the environment in advance so that hdfs works on your agents (slaves):

yum install hadoop-client hadoop-hdfs

Heron expects this environment to be set up in advance and adjusted to your own settings.
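
If installing the client system-wide is not an option, another workaround is to point the cmdline at the Hadoop installation directly. A sketch only, assuming a /usr/local/hadoop layout like the directory referenced in upload.yaml above; adjust the path to your installation:

fetch_heron_system = Process(
    name = 'fetch_heron_system',
    # Prepend the Hadoop bin directory (assumed path) so the non-login shell can find hdfs.
    cmdline = 'export PATH=/usr/local/hadoop/bin:$PATH && hdfs dfs -get %s %s && tar zxf %s' % (
        heron_core_release_uri, core_release_file, core_release_file)
)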

@dttlgotv
Author

dttlgotv commented Mar 29, 2020 via email

@thinker0
Member

thinker0 commented Mar 29, 2020

File "/root/mesosdata/run/slaves/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-S1/frameworks/e2da2e47-de60-4a1c-a81e-4f14ac3cf16f-0000/executors/thermos-root-devel-StreamletCloneTopology-0-e50143f2-87fd-45f5-bd6f-00ab46dcf48c/runs/752698d1-c0e5-4f5f-be41-efa49aee3a12/thermos_executor/apache/aurora/executor/common/announcer.py", line 269, in stop
AttributeError: 'NoneType' object has no attribute 'stop'

Is it this?

[screenshot: the task shown in a failed state]
This task seems to have failed.

It should look like the screenshot below when it is working well.
[screenshot: the task shown in a healthy state]

@thinker0
Member

thinker0 commented Mar 29, 2020

Shall we take a look at it together over Zoom.us?

@dttlgotv
Author

Shall we take a look at it together over Zoom.us?

Let me download Zoom, then I will contact you. Thanks a lot.

@thinker0
Member

Check the Slack DM!!

@dttlgotv
Author

Check the Slack DM!!

https://zoom.com.cn/j/604255874

Can you use this link? I cannot find your name...

@thinker0
Member

@dttlgotv Check the Slack DM!!

@thinker0
Member

Check the Slack DM!!

https://zoom.com.cn/j/604255874

Can you use this link? I cannot find your name...

Take a look at the Slack DM.
