Utilities for debugging intelmq bots. #973

Closed
wants to merge 123 commits into from

e3rd
Member

@e3rd e3rd commented May 12, 2017

BotDebugger is called via intelmqctl. It starts a live running bot instance,
raises the logging level to DEBUG, and lets even an inexperienced programmer
who may be puzzled by Python nuances and server deployment twists
see what is happening in the bot and where the error is.

Depending on the subcommand received, the class either

  • starts the bot as is (default)
  • processes a single message, either injected or taken from the default pipeline (process subcommand)
  • reads a message from the input pipeline or sends a message to the output pipeline (message subcommand)

Further help was added to the argparse help of intelmqctl:
Possible commands:

intelmqctl run bot-id (bot.start())
intelmqctl run bot-id message get (read the next message)
intelmqctl run bot-id message pop (read the next message and pop from queue)
intelmqctl run bot-id message send '{a:b}' (create message from string and send to output queue)
intelmqctl run bot-id process (process single message)
intelmqctl run bot-id process --msg '{a:b}' (process single message from string)
intelmqctl run bot-id process --dryrun (process single message from pipeline or --msg, but never really acknowledge nor send it to output pipeline)
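
The command layout above could be wired up in argparse roughly as follows. This is a minimal sketch, not the PR's actual code: `build_parser`, the `dest` names, and the `choices` list are assumptions made for illustration. Only the `run-subcommands` title, the `message` and `process` subcommands, and the `--msg`/`-m` and `--dryrun` options come from the PR.

```python
import argparse


def build_parser():
    """Build a toy parser mirroring `intelmqctl run bot-id [subcommand]`."""
    parser = argparse.ArgumentParser(prog="intelmqctl")
    commands = parser.add_subparsers(dest="command")

    run = commands.add_parser("run", help="Run a single bot in the foreground.")
    run.add_argument("bot_id")
    # The dest name is an assumption; the title matches the PR diff.
    run_sub = run.add_subparsers(title="run-subcommands", dest="run_subcommand")

    message = run_sub.add_parser("message", help="Inspect the bot's pipelines.")
    message.add_argument("message_action_kind", choices=["get", "pop", "send"])
    message.add_argument("msg", nargs="?", help="JSON string, used by 'send'.")

    process = run_sub.add_parser("process", help="Process a single message.")
    process.add_argument("--msg", "-m", help="JSON string to process.")
    process.add_argument("--dryrun", action="store_true",
                         help="Neither acknowledge nor send the message.")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["run", "my-bot", "process", "--dryrun"]))
```

With this layout, `intelmqctl run bot-id` leaves `run_subcommand` as `None`, which corresponds to starting the bot as is.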

These are commands I always wanted to have. I missed them when creating or quickly debugging bots. If you find them useful too, I'd be very glad to publish them in the main repository. I am open to any discussion concerning the new commands.

sebix and others added 30 commits September 22, 2015 16:44
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
add attachments
less psql queries

Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
incidents without known contact are grouped by ASN
New or modified contact can be saved to DB

Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
prettytable,
csv on python 2.x (2.6),
postgres 9.1

minor formatting issues
remove log_level from config, not used anymore

Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
not active, needs config option first

Signed-off-by: Sebastian Wagner <sebix@sebix.at>
Signed-off-by: Sebastian Wagner <sebix@sebix.at>
e3rd and others added 10 commits March 17, 2017 01:27
we don't have to wait 0.25 s * bot now
I removed the threading dependency added last time. This works better: first we start all the bots, then we wait once, and then we ask all the bots for their status.
@ghost ghost self-requested a review May 15, 2017 08:24
@ghost ghost added this to the v1.1 Feature release milestone May 15, 2017
@ghost ghost left a comment

The features are really great. I had something like this in mind too but never had the time to actually do it.

Sending messages does not work for me: Wrong parameter, sry

I am missing a detailed explanation including examples for the users in docs/intelmqctl.md
Please fix the code style issues: https://travis-ci.org/certtools/intelmq/jobs/231607021#L2790

    retval = 0
except KeyboardInterrupt:
    print('Keyboard interrupt.')
    retval = 1

A Keyboard interrupt is the usual stop method and thus it should be retval 0
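
A minimal sketch of the behaviour suggested here. The `run_bot` helper and the `start` callable are hypothetical stand-ins for the real bot start, not names from the PR.

```python
def run_bot(start):
    """Run a bot's start callable and return an exit code.

    A keyboard interrupt is treated as a normal stop, so it yields 0.
    """
    try:
        start()
        retval = 0
    except KeyboardInterrupt:
        print('Keyboard interrupt.')
        retval = 0
    return retval
```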

retval = 1
raise

As the print 2 lines above is meaningless, you can remove that block (140-143) altogether.

parser_run_subparsers = parser_run.add_subparsers(title='run-subcommands')
parser_run_message = parser_run_subparsers.add_parser(
    'message', help='Debug bot\'s pipelines. Get the message in the input pipeline, '
                    'pop it (cut it) and display it, or send the message directly to bot\' output pipeline.')

missing s after bot\'

if message_action_kind == "get":
    self.instance.logger.info("Trying to get the message...")
    msg = self.instance.receive_message()
    print(msg)

I'd use pprint here for a nicer output (also in the block below)

Member Author

Note that we couldn't use pprint because it renders strings with single quotes, which intelmqctl cannot parse back: the JSON standard requires double quotes.
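
The difference can be reproduced with the standard library alone. This is an illustrative sketch; the sample message and variable names are made up, and using `json.dumps` with `indent` is just one possible way to get readable output that stays valid JSON.

```python
import json
from pprint import pformat

# Made-up sample message for illustration.
message = {"source.ip": "192.0.2.1", "feed.name": "example"}

# pformat() uses Python's repr, so strings come out single-quoted.
as_repr = pformat(message)

# json.dumps() keeps double quotes and can still indent for readability.
as_json = json.dumps(message, indent=4, sort_keys=True)

try:
    json.loads(as_repr)
    repr_is_json = True
except ValueError:
    # Single-quoted keys and values are not valid JSON.
    repr_is_json = False

roundtrip = json.loads(as_json)

if __name__ == "__main__":
    print(as_repr)
    print(as_json)
    print("pprint output parses as JSON:", repr_is_json)
```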

help='Never really pop the message from the input pipeline '
'nor send to output pipeline.')
parser_run_process.add_argument('--msg', '-m',
help='Trick the bot to process this quoted dict '

Actually it's not a quoted dict, but JSON. JSON's syntax is much more strict.

    print("Wrong formatted msg.")
    return
self.instance.send_message(msg)
self.instance.logger.info("Message send to output pipelines.")

s/send/sent/

def __init__(self, module_path, bot_id, run_subcommand = None, message_kind = None, dryrun = None, msg = None):
    module = import_module(module_path)
    bot = getattr(module, 'BOT')
    self.instance = bot(bot_id)

This runs the bot's initialization. Thus, if the user only wants to get the message, this should not be run.

Member Author

@e3rd e3rd May 15, 2017

Is that really so bad? Why? Is it a bottleneck, or do you mind the initialisation messages in the console? ("deduplicator-expert-cz: DeduplicatorExpertBot initialized with id deduplicator-expert and version 3.5.2 (default, Nov 17 2016, 17:05:23) as process 5560.") I may just suppress the messages.
There are so many things I would have to implement if I wanted to connect to the bot's pipelines without actually calling bot.init. So much code would have to be duplicated, and I might make small errors that would make the connection differ between a debug session and the normal lifecycle of a bot: loading configuration from different files, manually calling PipelineFactory...


It's not about the log messages. Bots execute code in init(), e.g. connecting to, loading, and blocking resources.

For basic experts this is not really relevant, yes. But when they load and parse big files into memory, which is totally irrelevant for the message operations, that's annoying too.

To solve this we could add a second, optional parameter to Bot.__init__ which controls the call to Bot.init.
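
A minimal sketch of that idea, using a toy class rather than the real `intelmq.lib.bot.Bot`. The parameter name `run_init` and the `initialized` attribute are assumptions for illustration only.

```python
class Bot:
    """Toy stand-in for a bot base class (not the real intelmq Bot)."""

    def __init__(self, bot_id, run_init=True):
        self.bot_id = bot_id
        self.initialized = False
        # A real bot would load its configuration and connect to its
        # pipelines here, which the debugger still needs.
        if run_init:
            # Hypothetical switch: skip the bot-specific setup on request.
            self.init()

    def init(self):
        # Subclasses put potentially expensive setup here, for example
        # loading and parsing big files into memory.
        self.initialized = True


if __name__ == "__main__":
    print(Bot("example-bot").initialized)
    print(Bot("example-bot", run_init=False).initialized)
```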

Member Author

If Bot.init() is the only problem, it seems easier to me to just strip it out with bot.init = lambda self: None before initialization occurs.
We are doing monkeypatching everywhere in this pull request, and I find it nicer than adding another parameter just for this debug case.
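
The monkeypatching variant could look roughly like this. `ToyBot` and `instantiate_without_init` are hypothetical names, and restoring the original method afterwards is an addition of this sketch, not something stated in the thread.

```python
class ToyBot:
    """Toy stand-in for a bot class whose __init__ calls init()."""

    def __init__(self, bot_id):
        self.bot_id = bot_id
        self.loaded = False
        self.init()

    def init(self):
        # Placeholder for expensive, bot-specific setup.
        self.loaded = True


def instantiate_without_init(bot_class, bot_id):
    """Create an instance while init() is temporarily replaced by a no-op."""
    original = bot_class.init
    # The replacement must accept self, because it is set on the class.
    bot_class.init = lambda self: None
    try:
        return bot_class(bot_id)
    finally:
        # Put the real method back so later instances behave normally.
        bot_class.init = original


if __name__ == "__main__":
    print(instantiate_without_init(ToyBot, "debug").loaded)
    print(ToyBot("normal").loaded)
```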

from intelmq.lib.message import Event
from importlib import import_module
from intelmq.lib.utils import StreamHandler
from intelmq.lib.message import Event

duplicate import

try:
    msg = Event(json.loads(msg))
except:
    print("Wrong formatted msg.")

Could be invalid data (from validation) or invalid syntax (JSON). I'd print the error message (you can use lib.utils.error_message_from_exc for this)

Member Author

@e3rd e3rd May 15, 2017

And what do you think about this?

except (Exception, KeyError, TypeError, json.JSONDecodeError) as exc:
    print("Message can not be parsed from JSON: " + error_message_from_exc(exc))
    return

For the cmd intelmqctl run deduplicator-expert message send '1', the exception message will look like: Message can not be parsed from JSON: 'int' object is not subscriptable

For the cmd intelmqctl run deduplicator-expert-cz message send '{"fsd1": "test"}' we get: Message can not be parsed from JSON: '__type'

Shouldn't we use a default_type = "Message" or something in the MessageFactory.unserialize method? (Btw, thanks for letting me know about MessageFactory.unserialize, I missed that before.)


Looks better. json.JSONDecodeError only exists in >= 3.5, below it's ValueError. As JSONDecodeError is a subclass of ValueError, catching only the latter one is fine.
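
A short sketch of why catching ValueError alone is enough. The `parse_json` helper is a hypothetical name used only for this illustration.

```python
import json


def parse_json(text):
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except ValueError as exc:
        # On Python >= 3.5 this also catches json.JSONDecodeError,
        # because JSONDecodeError is a subclass of ValueError.
        return None, str(exc)


if __name__ == "__main__":
    print(parse_json('{"a": "b"}'))
    print(parse_json("{a:b}"))
```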

The default type would be either Report or Event, based on the bot's group (the parameter "group"). Then the user does not need to give the message type at all and it's always correct :)

Also, for bots without a destination pipeline this throws an exception:

...
file-output: Opening '/opt/intelmq/var/lib/bots/file-output/events.txt' file.
file-output: File '/opt/intelmq/var/lib/bots/file-output/events.txt' is open.
file-output: Loading source pipeline and queue 'file-output-queue'.
file-output: Connected to source queue.
file-output: No destination queues to load.
file-output: Pipeline ready.
Traceback (most recent call last):
  File "/usr/local/bin/intelmqctl", line 9, in <module>
    load_entry_point('intelmq==1.0.0.dev7', 'console_scripts', 'intelmqctl')()
  File "/home/sebastian/dev/intelmq/intelmq/bin/intelmqctl.py", line 885, in main
    return x.run()
  File "/home/sebastian/dev/intelmq/intelmq/bin/intelmqctl.py", line 531, in run
    results = args.func(**args_dict)
  File "/home/sebastian/dev/intelmq/intelmq/bin/intelmqctl.py", line 541, in bot_run
    return self.bot_process_manager.bot_run(bot_id, run_subcommand, message_action_kind, dryrun, msg)
  File "/home/sebastian/dev/intelmq/intelmq/bin/intelmqctl.py", line 135, in bot_run
    BotDebugger(self.__runtime_configuration[bot_id]['module'], bot_id, run_subcommand, message_action_kind, dryrun, msg)
  File "/home/sebastian/dev/intelmq/intelmq/lib/bot_debugger.py", line 45, in __init__
    self._message(message_kind, msg)
  File "/home/sebastian/dev/intelmq/intelmq/lib/bot_debugger.py", line 80, in _message
    self.instance.send_message(msg)
  File "/home/sebastian/dev/intelmq/intelmq/lib/bot.py", line 332, in send_message
    raise exceptions.ConfigurationError('pipeline', 'No destination pipeline given, '
intelmq.lib.exceptions.ConfigurationError: pipeline configuration failed - No destination pipeline given, but needed

Member Author

@e3rd e3rd May 17, 2017

json.JSONDecodeError only exists in >= 3.5, below it's ValueError

Thanks, I didn't know that. I think 3.5 is quite widespread now, so let's leave it like this.

The default type would be either Report or Event, based on the bot's group (the parameter "group"). Then the user does not need to give the message type at all and it's always correct ☺

Am I right that Parser gets Report and others get Event (aside from Collectors)? default_type = "Report" if self.runtime_configuration["group"] == "Parser" else "Event"
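
A sketch of how such a default could be applied to an incoming JSON string. The Parser/Event mapping is only the proposal asked about above and is not confirmed in this thread; `with_default_type` is a hypothetical helper that does not use the real MessageFactory. The `__type` key is the one named in the error message quoted earlier.

```python
import json


def with_default_type(raw, group):
    """Parse a JSON message and add "__type" if the user did not give one."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("Message must be a JSON object.")
    # Assumed mapping, taken from the question above: parsers consume
    # Reports, other bots consume Events.
    default_type = "Report" if group == "Parser" else "Event"
    data.setdefault("__type", default_type)
    return data


if __name__ == "__main__":
    print(with_default_type('{"feed.name": "test"}', "Parser"))
    print(with_default_type('{"feed.name": "test"}', "Expert"))
```

Note the comparison uses `==`; comparing strings with `is` checks object identity and is not reliable for this purpose.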

also, for bot's without destination pipeline this throws

Corrected, even for bots without input queue.

elif message_action_kind == "send":
    if msg:
        try:
            msg = Event(json.loads(msg))

Reports are not possible this way. Change it to MessageFactory.unserialize, which takes a string, to support them.

@e3rd
Member Author

e3rd commented May 15, 2017

Wow, thanks a lot for such thorough feedback. I'll be working on the suggestions now and will let you know in the thread!

@e3rd e3rd closed this May 15, 2017
@e3rd e3rd deleted the bot_debugger branch May 15, 2017 19:14
@e3rd e3rd mentioned this pull request May 15, 2017
6 tasks
@ghost

ghost commented May 17, 2017 via email

@ghost

ghost commented May 17, 2017 via email

@e3rd
Member Author

e3rd commented May 17, 2017

Okok, I misunderstood you before, catching ValueError is definitely better!

Everything seems implemented. Please take a look at it :)

@ghost ghost modified the milestones: v1.1 Feature release, v1.0 Stable Release Jul 5, 2017
6 participants