Slow CSV export - chunk window too small #5172

Closed
dhohengassner opened this Issue Oct 3, 2018 · 9 comments

@dhohengassner

dhohengassner commented Oct 3, 2018

Problem Description

On our system, search queries perform reasonably well, but CSV exports are very slow.

We enabled debug logging (it was not enabled before) and think our issue is caused by a scroll chunk size that is too small.

2018-10-03T07:46:28.828Z DEBUG [ScrollResult] [fcc06295aa85fe3fc77517a5f09ff229][399] New scroll id <ScrollID>, number of hits in chunk: 10
2018-10-03T07:46:28.828Z DEBUG [SearchResource] [fcc06295aa85fe3fc77517a5f09ff229] Writing scroll chunk with 10 messages
2018-10-03T07:46:28.828Z DEBUG [ScrollChunkWriter] [258] Writing chunk 399

Steps to Reproduce (for bugs)

Using the API, we see slow download speeds:

curl -H 'Accept: text/csv' -X GET 'localhost:9000/api/search/universal/relative/export?query=msdhcp&range=0&fields=message' --output test.csv

2018-10-03T08:29:03.181Z DEBUG [ScrollResult] [<ID>] Starting scroll request for query msdhcp
2018-10-03T08:29:03.181Z DEBUG [SearchResource] [<ID>] Scroll result contains a total of 4029 messages
2018-10-03T08:29:03.185Z DEBUG [ScrollResult] [<ID>][0] New scroll id <ScrollID>, number of hits in chunk: 10
2018-10-03T08:29:03.186Z DEBUG [SearchResource] [<ID>] Writing scroll chunk with 10 messages
2018-10-03T08:29:03.187Z DEBUG [ScrollChunkWriter] [261] Writing chunk 0
2018-10-03T08:29:03.207Z DEBUG [ScrollResult] [<ID>][1] New scroll id <ScrollID>, number of hits in chunk: 10
2018-10-03T08:29:03.207Z DEBUG [SearchResource] [<ID>] Writing scroll chunk with 10 messages
2018-10-03T08:29:03.207Z DEBUG [ScrollChunkWriter] [261] Writing chunk 1
.. reading the next chunks
2018-10-03T08:29:05.230Z DEBUG [ScrollResult] [<ID>][402] New scroll id <ScrollID>, number of hits in chunk: 9
2018-10-03T08:29:05.231Z DEBUG [SearchResource] [<ID>] Writing scroll chunk with 9 messages
2018-10-03T08:29:05.231Z DEBUG [ScrollChunkWriter] [261] Writing chunk 402
2018-10-03T08:29:05.234Z DEBUG [ScrollResult] [<ID>] Reached end of scroll results.
2018-10-03T08:29:05.234Z DEBUG [SearchResource] [<ID>] Reached end of scroll result.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4242    0  4242    0     0   7233      0 --:--:-- --:--:-- --:--:--  7238

If we set a limit, the chunk size increases to the limit value, but, as expected, we only get the limited number of messages. In this case that means we get just 1 chunk with 18 messages, although the query matched 4029 messages:

curl -H 'Accept: text/csv' -X GET 'localhost:9000/api/search/universal/relative/export?query=msdhcp&range=0&fields=message&limit=18' --output test.csv

2018-10-03T08:28:34.215Z DEBUG [ScrollResult] [<ID>] Starting scroll request for query msdhcp
2018-10-03T08:28:34.215Z DEBUG [SearchResource] [<ID>] Scroll result contains a total of 4029 messages
2018-10-03T08:28:34.217Z DEBUG [ScrollResult] [<ID>][0] New scroll id <ScrollID>, number of hits in chunk: 18
2018-10-03T08:28:34.217Z DEBUG [SearchResource] [<ID>] Writing scroll chunk with 18 messages
2018-10-03T08:28:34.217Z DEBUG [ScrollChunkWriter] [260] Writing chunk 0
2018-10-03T08:28:34.234Z DEBUG [ScrollResult] [<ID>] clearScroll for query successful: true

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  924k    0  924k    0     0   437k      0 --:--:--  0:00:02 --:--:--  438k

Further investigations

This looks like the place where the chunk size gets calculated:
  • build new chunk
  • get hits for new chunk
  • ES API method to get hits

It looks like the chunk size of 10 messages is an Elasticsearch default, and I found no way to override this value.

In older issues it seems there was a default chunk size of 500.

To me it looks like a bug that we cannot set the chunk size for the CSV export, and I do not understand its relationship to the limit parameter.
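
To illustrate where the 10 comes from, here is a minimal sketch of the underlying Elasticsearch scroll API (the index pattern and query body are placeholders, not what Graylog actually sends): the per-batch size is set by the "size" field of the initial search request, and when it is omitted Elasticsearch falls back to its default of 10 hits per batch.

curl -X POST 'localhost:9200/graylog_*/_search?scroll=1m' \
  -H 'Content-Type: application/json' \
  -d '{"size": 500, "query": {"query_string": {"query": "msdhcp"}}}'

# follow-up batches are fetched with the scroll id returned above and
# again contain up to "size" hits each
curl -X POST 'localhost:9200/_search/scroll' \
  -H 'Content-Type: application/json' \
  -d '{"scroll": "1m", "scroll_id": "<ScrollID>"}'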

  • Graylog Version: 2.4.4
  • Elasticsearch Version: 5.6.10
  • Operating System: Ubuntu 16.04
@universam1

universam1 commented Oct 3, 2018

Good catch, this looks like a bug. I was also searching for a way to define the scroll window, without success.

@dhohengassner dhohengassner changed the title from Slow CSV export to Slow CSV export - chunk window too small Oct 3, 2018

@kroepke kroepke added the bug label Oct 3, 2018

@kroepke

Member

kroepke commented Oct 3, 2018

@dhohengassner Indeed, this seems to have been broken when switching to the HTTP client.
The relevant change is 4213a22#diff-0535e1ddeae5e21d28fb695a2f63f1f8L215

Previously the 500 was hardcoded; since that change it has fallen back to the Elasticsearch default value, which is obviously way too low.

Thanks for all the work you put into tracking this down. I'll check with the team for a target version of the fix.

I'm thinking we should default back to 500 and at the same time allow the API to set the scroll chunk size explicitly.

@dhohengassner

dhohengassner commented Oct 3, 2018

Thanks for taking a look at this.

I would appreciate a parameter in the config file to adjust this setting. I think this would also allow a custom setting for UI users.
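
For illustration only, something like the following in server.conf; the setting name is hypothetical and does not exist in Graylog today:

# hypothetical setting, name made up for illustration only
csv_export_batch_size = 500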

Thanks for your work!

@jonattta

jonattta commented Oct 3, 2018

Looking forward to that fix as well

@keaoner

keaoner commented Oct 3, 2018

And me too 👍

@bernd bernd added this to the 2.5.0 milestone Oct 8, 2018

@jalogisch jalogisch added the triaged label Oct 15, 2018

@keaoner

keaoner commented Oct 22, 2018

Hi,
do you have a release date for Graylog 2.5? We are waiting for the patch for CSV downloads before going into production.

Best regards

@kroepke

Member

kroepke commented Oct 24, 2018

I'm sorry, we don't have a date set yet.

@universam1

universam1 commented Nov 5, 2018

Since this is a confirmed bug, does it really need to wait for a version upgrade, or could it be shipped earlier as part of a bugfix release?

bernd added a commit that referenced this issue Nov 17, 2018

Fix slow CSV export by using a bigger default batch size
The default batch size in Elasticsearch is 10. We now use 500 as default
for the newly introduced "batch_size" parameter which we pass to the
search request as limit. (which is the batch size for scroll requests)

Previously we used the "limit" parameter for the batch size AND the
result limit. That was wrong and resulted in truncated results.

Fixes #5172
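
For reference, the export call from the reproduction above with the new parameter; this is a sketch that assumes "batch_size" is exposed as a query parameter on the same export endpoint, and 500 is just an illustrative value:

curl -H 'Accept: text/csv' -X GET 'localhost:9000/api/search/universal/relative/export?query=msdhcp&range=0&fields=message&batch_size=500' --output test.csv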
@bernd

Member

bernd commented Nov 17, 2018

@keaoner @universam1 The Graylog 2.5 release including this bug fix will be available in the first week of December.

edmundoa added a commit that referenced this issue Nov 21, 2018

Fix slow CSV export by using a bigger default batch size (#5289)

bernd added a commit that referenced this issue Nov 21, 2018

Fix slow CSV export by using a bigger default batch size (#5289)

(cherry picked from commit a77756d)

edmundoa added a commit that referenced this issue Nov 21, 2018

Fix slow CSV export by using a bigger default batch size (#5289) (#5304)

(cherry picked from commit a77756d)