Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generating json export is very slow with lots of tags #6201

Closed
michael-kerscher opened this issue Aug 17, 2020 · 10 comments
Closed

generating json export is very slow with lots of tags #6201

michael-kerscher opened this issue Aug 17, 2020 · 10 comments
Labels
needs triage This issue has been automatically labelled and needs further triage T: bug Type: bug report: This issue describes unexpected behaviour T: potential bug Type: potential bug. This issue may be a bug

Comments

@michael-kerscher
Copy link

I noticed for a long time now that generating our cached json exports is really, really slow (in the order of several hours) and tried to find out the reason behind the issue.

I saw lots of requests to the MySQL database so I dumped some of the running queries with SHOW FULL PROCESSLIST;
During this "shotgun" debugging I saw this query happening all the time with lots of ids being the same over and over again:

SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`numerical_value` FROM <db>.`tags` AS `Tag`   WHERE `Tag`.`id` = <some_id> AND `exportable` = 1

The query itself is quite efficient on its own and fast but the amount of these queries is killing performance.

I wanted to get a rough overview on what is going on in the database so I tried to dump of all queries during an json export (via jobs) for just a few seconds.

Dumping happened with this:

set global general_log_file = "/var/log/mysql/queries.log";
set global general_log = "ON";
# let it work for a few seconds and turn it off to not burn your free disk space
set global general_log = "OFF";

this grew the log file very quick and I used this command line to analyze the query (and make some queries more generic to be able to count them better). The log below is ~5 secs long during an export

The last query basically kills the performance and makes the caching feature useless for us.
Even if you don't have very much tags this is queried over and over again in a separate query. The overhead of doing these queries slows down the entire caching process and makes it kind of unusable.

We might be using more tags than others (we're basically tagging each and every attribute with multiple tags…) so other's might not even notice but in our setup this is not working anymore.

I'm not sure where to point my finger at in the code as there is lots of magic and abstraction in it but maybe you have an idea on where to maybe cache the tags during an export and reuse these tags

Work environment

Questions Answers
Type of issue Bug
OS version (server) Ubuntu-18.04
OS version (client) not relevant
PHP version 7.2
MISP version / git hash below logs have been created with 2.4.124 but reproduced with 2.4.129
Browser not relevant

Expected behavior

don't query tags again and again

Actual behavior

same tags are queried again and again

Steps to reproduce the behavior

create events with lots of attributes and give each of the attributes a tag (basically lots of tags everywhere). Worst case, tag all the attributes with the same tag -> each tag is queried from the database

Logs, screenshots, configuration dump, ...

$ cat /var/log/mysql/queries.log | grep -v -e "authkey" -e Connect -e Quit -e INSERT -e UPDATE -e LIKE | sed -e "s/[0-9]\+ Query/<QueryID> Query/" -e "s/id\` = '\?[0-9a-f-]\+'\?/id\` = <some_id>/" -e "s/FROM \`[a-z_]\+\`/FROM <db>/" | sort | uniq -c | sort -h

<some random noise by clicking users with less occurence>
     36 		<QueryID> Query	SELECT `Attribute`.`id`, `Attribute`.`event_id`, `Attribute`.`object_id`, `Attribute`.`object_relation`, `Attribute`.`category`, `Attribute`.`type`, `Attribute`.`value1`, `Attribute`.`value2`, `Attribute`.`to_ids`, `Attribute`.`uuid`, `Attribute`.`timestamp`, `Attribute`.`distribution`, `Attribute`.`sharing_group_id`, `Attribute`.`comment`, `Attribute`.`deleted`, `Attribute`.`disable_correlation`, `Attribute`.`first_seen`, `Attribute`.`last_seen`, (CASE WHEN `Attribute`.`value2` = '' THEN `Attribute`.`value1` ELSE CONCAT(`Attribute`.`value1`, '|', `Attribute`.`value2`) END) AS  `Attribute__value` FROM <db>.`attributes` AS `Attribute`   WHERE `Attribute`.`id` = <some_id>   ORDER BY `Attribute`.`event_id` DESC  LIMIT 1
     36 		<QueryID> Query	SELECT COUNT(*) AS `count` FROM <db>.`attributes` AS `Attribute`   WHERE `Attribute`.`id` = <some_id>
     36 		<QueryID> Query	SELECT COUNT(*) AS `count` FROM <db>.`events` AS `Event`   WHERE `Event`.`id` = <some_id>
     41 		<QueryID> Query	COMMIT
     41 		<QueryID> Query	START TRANSACTION
     48 		<QueryID> Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`numerical_value` FROM <db>.`tags` AS `Tag`   WHERE `Tag`.`id` = <some_id> AND `exportable` = 1
  34266 		<QueryID> Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM <db>.`tags` AS `Tag`   WHERE `Tag`.`id` = <some_id> AND `exportable` = 1

To show the amount of redundant data which is fetched I attach a few lines from a testdb where a few tags (1,2,3) are set on some attributes - this is not generified, unsorted and not unique'ified (722 is a mysql id)

		   722 Query	SELECT `AttributeTag`.`id`, `AttributeTag`.`attribute_id`, `AttributeTag`.`event_id`, `AttributeTag`.`tag_id`, `AttributeTag`.`local` FROM `misp_db`.`attribute_tags` AS `AttributeTag`   WHERE `AttributeTag`.`attribute_id` IN (1, 2, 3, 4, 5)
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 2 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 3 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 1 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 2 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 3 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 2 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 3 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 1 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 2 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 3 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 1 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 2 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 3 AND `exportable` = 1
		   722 Query	SELECT `Tag`.`id`, `Tag`.`name`, `Tag`.`colour`, `Tag`.`exportable`, `Tag`.`org_id`, `Tag`.`user_id`, `Tag`.`hide_tag`, `Tag`.`numerical_value` FROM `misp_db`.`tags` AS `Tag`   WHERE `Tag`.`id` = 1 AND `exportable` = 1
		   722 Query	SELECT `ShadowAttribute`.`id`, `ShadowAttribute`.`type`, `ShadowAttribute`.`category`, `ShadowAttribute`.`to_ids`, `ShadowAttribute`.`uuid`, `ShadowAttribute`.`event_uuid`, `ShadowAttribute`.`event_id`, `ShadowAttribute`.`old_id`, `ShadowAttribute`.`comment`, `ShadowAttribute`.`org_id`, `ShadowAttribute`.`proposal_to_delete`, `ShadowAttribute`.`timestamp`, `ShadowAttribute`.`first_seen`, `ShadowAttribute`.`last_seen`, (CASE WHEN `ShadowAttribute`.`value2` = '' THEN `ShadowAttribute`.`value1` ELSE CONCAT(`ShadowAttribute`.`value1`, '|', `ShadowAttribute`.`value2`) END) AS  `ShadowAttribute__value` FROM `misp_db`.`shadow_attributes` AS `ShadowAttribute`   WHERE `ShadowAttribute`.`deleted` = '0' AND `ShadowAttribute`.`event_id` IN (1, 4)
@michael-kerscher michael-kerscher added T: bug Type: bug report: This issue describes unexpected behaviour needs triage This issue has been automatically labelled and needs further triage T: potential bug Type: potential bug. This issue may be a bug labels Aug 17, 2020
@michael-kerscher
Copy link
Author

michael-kerscher commented Aug 18, 2020

this might also affect viewing large events with lots of attribute tags in the webui in a negative way. With debug settings the same pattern is visible in the sql queries.

MISP-slow-export-screenshot

@iglocska
Copy link
Member

Thanks for the thorough investigation, I've added a caching mechanism for this internally so we should only grab each tag once per execution.

@michael-kerscher
Copy link
Author

Thank you for the very quick fix!
I will give it a quick test once the 2.4 branch is working again.

@iglocska
Copy link
Member

It's not working? o.O

@michael-kerscher
Copy link
Author

The Travis check is failing for recent commits and I get the following error in my Apache error log:

[Tue Aug 18 09:53:13.507847 2020] [php7:error] [pid 71] [client 127.0.0.1:41812] PHP Fatal error:  Uncaught InternalErrorException: Internal Server Error in /var/www/MISP/app/Lib/cakephp/lib/Cake/Error/ErrorHandler.php:261\nStack trace:\n#0 /var/www/MISP/app/Lib/cakephp/lib/Cake/Error/ErrorHandler.php(212): ErrorHandler::handleFatalError(256, '[MissingPluginE...', '/var/www/MISP/a...', 138)\n#1 [internal function]: ErrorHandler::handleError(256, '[MissingPluginE...', '/var/www/MISP/a...', 138, Array)\n#2 /var/www/MISP/app/Lib/cakephp/lib/Cake/Error/ErrorHandler.php(138): trigger_error('[MissingPluginE...', 256)\n#3 /var/www/MISP/app/Lib/cakephp/lib/Cake/Error/ErrorHandler.php(269): ErrorHandler::handleException(Object(InternalErrorException))\n#4 /var/www/MISP/app/Lib/cakephp/lib/Cake/Error/ErrorHandler.php(212): ErrorHandler::handleFatalError(1, 'Uncaught Intern...', '/var/www/MISP/a...', 261)\n#5 /var/www/MISP/app/Lib/cakephp/lib/Cake/Core/App.php(970): ErrorHandler::handleError(1, 'Uncaught Intern...', '/var/www/MISP/a...', 261, Array)\n#6 /var/www/MISP/app/Lib/cakephp/lib/Cake/Core/App.php(943): App::_checkFat in /var/www/MISP/app/Lib/cakephp/lib/Cake/Error/ErrorHandler.php on line 261

@iglocska
Copy link
Member

Woah, ok that's not nice. Will look into it

@iglocska
Copy link
Member

@JakubOnderka caught it already, should be fine now!

@michael-kerscher
Copy link
Author

thank you again!
I tested it in my test instance with the small example and it seems to work as expected. Each Tag is only fetched once from the DB 👍 (Small testcase went from 14 tag requests down to 3)
I'm eagerly waiting for this to be in a release so we can use it on our production instances - I expect a massive speed improvement :)

@michael-kerscher
Copy link
Author

To give a small feedback: Last week we updated and the time for generating the cache was reduced to 26% of the previous time. Additionally opening huge events in the webui is now at 50% of the previous time 👍

This was a great improvment!

@iglocska
Copy link
Member

Awesome news, cheers for the feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs triage This issue has been automatically labelled and needs further triage T: bug Type: bug report: This issue describes unexpected behaviour T: potential bug Type: potential bug. This issue may be a bug
Projects
None yet
Development

No branches or pull requests

2 participants