METRON-1249 Improve Metron MPack service checks #799
Conversation
```python
    :param env: Environment
    """
    Logger.info("Checking for Geo database")
    metron_service.check_hdfs_file_exists(self.__params, self.__params.geoip_hdfs_dir + "/GeoLite2-City.mmdb.gz")
```
You know, honestly, the better approach here, unfortunately, is to pull the filename from the global config and ensure that the file exists. What if the user renames the GeoLite database to something other than GeoLite2-City.mmdb.gz? Can we make that a follow-on JIRA at least?
I agree. I don't like hard-coding "GeoLite2-City.mmdb.gz" here. I just didn't see anywhere in the MPack where we had that value parameterized already.
Where in the code do we have it parameterized? I'd love to fix this.
Ok, I see it now under the global properties key "geo.hdfs.file". There is nothing in the Ambari MPack for it, which might complicate using it. I am just thinking through whether we would first need to introduce it as an Ambari-managed configuration value.
Yeah, we just have the HDFS directory, which is what you're using. For this PR, I think it's good. I'd hate to make the perfect the enemy of the good.
Going forward, what would be cool is if we could execute a Stellar script via the REPL and, if it fails, fail the service check. Since the REPL can interact with global config parameters and can validate HDFS paths, that would be a clean way to do this.
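To make the suggestion above concrete, here is a minimal sketch of resolving the GeoIP file name from the "geo.hdfs.file" global property instead of hard-coding it. The helper name and the fallback behavior are assumptions for illustration, not existing Metron code:

```python
import json

# Fallback mirrors the value currently hard-coded in the service check.
DEFAULT_GEO_FILE = "GeoLite2-City.mmdb.gz"

def resolve_geo_file(global_config_json, hdfs_dir):
    """Return the full HDFS path of the GeoIP database file.

    Reads the "geo.hdfs.file" key from Metron's global config (a JSON
    string); falls back to the default name if the key is absent.
    """
    config = json.loads(global_config_json)
    geo_file = config.get("geo.hdfs.file", DEFAULT_GEO_FILE)
    return "{0}/{1}".format(hdfs_dir.rstrip("/"), geo_file)
```

With that in place, a renamed database (e.g. `resolve_geo_file('{"geo.hdfs.file": "custom.mmdb.gz"}', "/apps/metron/geo")`) would still be found by the service check.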
```python
    for topic in topics:
        Logger.info("Checking existence of Kafka topic '{0}'".format(topic))
        try:
            Execute(
```
Any chance we could make a function rather than cutting and pasting? Something like:

```python
# Note: "exec" is a reserved word in Python 2, so the helper needs a different name.
def execute_cmd(cmd, user_as, fail_msg):
    try:
        Execute(cmd,
                tries=3,
                try_sleep=5,
                logoutput=True,
                path='/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin',
                user=user_as)
    except:
        raise Fail(fail_msg)
```
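Outside of Ambari's `resource_management` library, the same retry-and-fail pattern the comment above suggests can be sketched in plain Python. The helper name, the `Fail` stand-in, and the use of `subprocess` are assumptions for illustration (Ambari's `Execute` also handles `user` switching, which is omitted here):

```python
import subprocess
import time

class Fail(Exception):
    """Stand-in for resource_management's Fail; raised when a check fails."""
    pass

def execute_cmd(cmd, fail_msg, tries=3, try_sleep=5):
    """Run a shell command with retries; raise Fail with a clear message
    if every attempt fails."""
    for attempt in range(tries):
        try:
            subprocess.check_call(cmd, shell=True)
            return
        except subprocess.CalledProcessError:
            if attempt < tries - 1:
                time.sleep(try_sleep)
    raise Fail(fail_msg)
```

Each service check can then call `execute_cmd(...)` with a check-specific failure message instead of repeating the try/except block.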
I refactored this a bit and added some pydocs.
I really like this. Modulo a couple of very minor nits, I'm +1 by inspection.
If my understanding is correct, this will work for all parser topologies that are configured to start automatically through the Ambari configuration, but not for all parser topologies that may in fact be running, since we still have the disconnect where the Management UI does not work with Ambari. That is to say, topologies started through the Management UI will not be tracked by this.
Yes, that's right. In my opinion, though, we should probably stop managing parsers in Ambari and just focus on the Management UI, so there is one place to manage all parsers.
Untangling Ambari and ZooKeeper would be required if we want (as I think we need to) to be able to manage the parsers from the UI/REST while still having Ambari service management work (restart all affected services). I'm not sure whether it amounts to more than having Ambari read and write the all_parsers list in ZooKeeper.
But I am taking what you are saying about managing parsers to mean "managing configured sensors is not done in Ambari, but managing the services is".
Really, what I'd like to see is a status on the parsers in the Management UI indicating that those parsers should be running all the time, plus a REST call to validate whether they are running. I'd then expect Ambari to make that REST call and fail the service check if any installed sensor that is marked as running isn't actually running. What we have now is that list in Ambari, with Ambari managing starting and stopping them. I'd prefer to delegate the starting and stopping of sensors to the Management UI and have Ambari JUST interact with the REST API to indicate whether the current state is nominal. That, however, is really a topic for a discuss thread.
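The service-check logic described above could be sketched as follows. The status field names and the shape of the REST response are assumptions for illustration, not Metron's actual API:

```python
def find_stopped_sensors(sensor_statuses):
    """Given sensor status records (e.g. parsed from a hypothetical REST
    endpoint), return the names of sensors that are marked as
    expected-to-run but are not actually running. A non-empty result
    would fail the Ambari service check."""
    return [s["name"] for s in sensor_statuses
            if s.get("expectedRunning") and s.get("status") != "RUNNING"]
```

For example, given `[{"name": "bro", "expectedRunning": True, "status": "RUNNING"}, {"name": "snort", "expectedRunning": True, "status": "STOPPED"}]`, the check would report `snort` and fail.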
@ottobackwards Do you have any concerns that need to be addressed in this PR?
@nickwallen No, the problem already exists before this PR.
@ottobackwards Thanks. Your points definitely warrant a discussion. Just wanted to make sure we're green-lit on this PR.
This PR enhances the Metron 'Service Check' functionality in the MPack. The Service Check is an easy way for a user to know if their Metron cluster is healthy. The Service Check can be run manually from within Ambari. It is also automatically executed after kerberization.
In the current version of the Service Check, healthy means the Parser and Indexing topologies are running. This PR enhances that to validate all of the install actions that occur across each of the Metron services. These checks include the following.
I added considerable logging so that if a check does fail, a user will have a reasonable chance to understand why. Ambari doesn't give me an easy way to tell a user "hey, this is the problem!", so the user still has to go through the output of the Service Check in the Operations Panel to know why the Service Check failed.
Testing
Pull Request Checklist