METRON-1348 Metron Service Checks Use Wrong Hostname #864
# UI
metron_management_ui_port = config['configurations']['metron-management-ui-env']['metron_management_ui_port']
# Alerts UI
metron_alerts_ui_host = default("/clusterHostInfo/metron_alerts_ui_hosts", [hostname])[0]
I needed to somehow find the hosts running each service, and I knew that information was contained in the clusterHostInfo configuration. But it was really difficult to uncover what values Ambari keeps in clusterHostInfo. I have not been able to find any documentation on this.
I actually had to add some debug statements to a live instance of Ambari to find out what values are stored here and how they are named. Fun, fun.
For the record, here is what is stored in clusterHostInfo when spinning up the current state of Full Dev.
{
'snamenode_host':[
'node1'
],
'metron_alerts_ui_hosts':[
'node1'
],
'nm_hosts':[
'node1'
],
'drpc_server_hosts':[
'node1'
],
'ambari_server_use_ssl':[
'false'
],
'all_ping_ports':[
'8670'
],
'all_hosts':[
'node1'
],
'rm_host':[
'node1'
],
'kafka_broker_hosts':[
'node1'
],
'slave_hosts':[
'node1'
],
'metron_profiler_hosts':[
'node1'
],
'storm_ui_server_hosts':[
'node1'
],
'all_racks':[
'/default-rack'
],
'all_ipv4_ips':[
'127.0.0.1'
],
'app_timeline_server_hosts':[
'node1'
],
'hs_host':[
'node1'
],
'ambari_server_port':[
'8080'
],
'metron_rest_hosts':[
'node1'
],
'metron_management_ui_hosts':[
'node1'
],
'es_master_hosts':[
'node1'
],
'metron_parsers_hosts':[
'node1'
],
'kibana_master_hosts':[
'node1'
],
'metron_enrichment_master_hosts':[
'node1'
],
'hbase_rs_hosts':[
'node1'
],
'namenode_host':[
'node1'
],
'nimbus_hosts':[
'node1'
],
'hbase_master_hosts':[
'node1'
],
'metron_indexing_hosts':[
'node1'
],
'ambari_server_host':[
'node1'
],
'zookeeper_hosts':[
'node1'
],
'supervisor_hosts':[
'node1'
]
}
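To illustrate how a path lookup with a fallback behaves against a dictionary shaped like the one above, here is a minimal standalone sketch. The `lookup` function is a hypothetical stand-in written for this example, not Ambari's own `default` helper, and `node2` is an assumed hostname for the node where the check happens to run.

```python
def lookup(config, path, fallback):
    """Walk a '/'-separated path through nested dicts.

    Hypothetical stand-in for illustration only; returns `fallback`
    as soon as any key along the path is missing.
    """
    node = config
    for key in path.strip("/").split("/"):
        if not isinstance(node, dict) or key not in node:
            return fallback
        node = node[key]
    return node


# A trimmed-down copy of the structure shown above.
config = {
    "clusterHostInfo": {
        "metron_alerts_ui_hosts": ["node1"],
        "metron_rest_hosts": ["node1"],
    }
}

hostname = "node2"  # assumed: the host the service check happens to run on

# Key present: the first host running the Alerts UI is used.
alerts_ui_host = lookup(config, "/clusterHostInfo/metron_alerts_ui_hosts", [hostname])[0]

# Key absent: the lookup falls back to the local hostname.
missing_host = lookup(config, "/clusterHostInfo/not_a_real_key", [hostname])[0]
```

Wrapping the fallback in a list keeps the `[0]` index valid in both cases, which is presumably why the diff passes `[hostname]` rather than `hostname`.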
Ran up in full dev, works as described. +1
Hi @nickwallen , I tried this on a 12-node cluster. I validated that However, in my case it failed on the parser service check since the 'Metron Check' step landed on a host without Kafka broker installed. Here's the error excerpt:
I noticed that the I am perfectly fine if you think the kafka_broker fix should be a different PR.
Thanks for testing @anandsubbu . I did not try to fix the 'Kafka not installed' issue with the service check. I am not yet sure how to fix that. I focused this PR on just fixing the bad host names.
I did another 12-node deployment on CentOS 7 with this PR (bypassed the kafka issue by installing Kafka broker on all nodes). The fix worked just perfect. Thanks much @nickwallen ! +1 (non-binding)
I appreciate the reviews @ottobackwards and @anandsubbu .
The Metron service check can often use the incorrect hostname when checking the Alerts UI, Management UI, and REST services. This results in a failed service check, even when the services are running successfully.
Ambari can run the service check on any node in the cluster, not just the node the service is actually running on. The service check code currently uses the hostname on which the service check is running. If the service is not actually installed on that host, the service check will incorrectly fail.
The service check code was updated to find the hostname where the service is installed and use that hostname.
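As a rough sketch of the difference, assuming a three-node cluster where the REST service lives on `node3` and the check lands on `node7`: the helper name `service_check_url`, the hostnames, and port `8082` are all made up for this example and are not taken from the actual patch.

```python
def service_check_url(cluster_host_info, hosts_key, port, local_hostname):
    """Build the URL a check should probe.

    Hypothetical helper: prefers the first host listed under `hosts_key`
    and only falls back to the local hostname when no host is listed.
    """
    hosts = cluster_host_info.get(hosts_key) or [local_hostname]
    return "http://{0}:{1}".format(hosts[0], port)


# Assumed cluster layout for illustration.
cluster_host_info = {"metron_rest_hosts": ["node3"]}
local_hostname = "node7"  # where Ambari happened to schedule the check
rest_port = 8082          # assumed port, for illustration only

# Old behaviour: probe the host running the check, which has no REST service.
old_url = "http://{0}:{1}".format(local_hostname, rest_port)

# New behaviour: probe the host where the service is actually installed.
new_url = service_check_url(cluster_host_info, "metron_rest_hosts", rest_port, local_hostname)
```

On a single-node Full Dev deployment both URLs point at the same host, which is why the problem only shows up on multi-node clusters.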
Testing
This change was tested by deploying Metron on Full Dev and running Metron > Service Check in Ambari. The service check should complete successfully when the cluster is healthy. The fix has also been tested on a multi-node cluster in the same manner.