
CSV Export issue #4190

Closed
vladostp opened this issue Sep 29, 2017 · 9 comments


vladostp commented Sep 29, 2017

The CSV export fails when I try to export search results from more than 7 indices.

Expected Behavior

The exported CSV file contains the search result data.

Current Behavior

The exported CSV file contains only 10 lines.

Steps to Reproduce (for bugs)

  1. Run a search across more than 7 indices
  2. Try to export the result as CSV
  3. The exported CSV file will contain only 10 lines

Context

I am trying to export data to a CSV file.

Your Environment

  • Graylog Version: 2.3.1+9f2c6ef
  • Elasticsearch Version: 5.6.2
  • MongoDB Version: 2.4.10
  • Operating System: Debian 8.9
  • Browser version: Firefox 55.0.2
joschi (Contributor) commented Sep 29, 2017

@vladostp Please attach the contents of the "index_ranges" collection in MongoDB (e.g. using mongoexport) and some example requests, including their complete responses, which show the problem.

vladostp commented Sep 29, 2017

index_ranges.txt

I am really sorry, but I can't post any request responses because they contain sensitive data.

But I tried different types of searches across different sets of indices, and the behaviour was always the same:

  • Search in 7 indices or less => Success
  • Search in 8 or more indices => CSV File with 10 lines
joschi (Contributor) commented Sep 29, 2017

I am really sorry, but I can't post any request responses because they contain sensitive data.

You can of course redact the sensitive parts of the requests and responses in a sensible (and consistent) way, but without you providing any details we'll be unable to reproduce the issue and thus won't be able to help you.

vladostp commented Sep 29, 2017

  1. Search for an IP address in the last 2 days
    Result: Found 1,267 messages in 6 ms, searched in 5 indices. => CSV export succeeds
    resultgood.csv.zip
  2. Search for an IP address in the last 5 days
    Result: Found 2,120 messages in 41 ms, searched in 11 indices. => CSV export fails (11-line file)
    resultbad.csv.zip
joschi (Contributor) commented Sep 29, 2017

@vladostp Okay, that's just the literal CSV file.

Could you try posting the complete HTTP requests and responses (without the response body) when exporting these? You can find them in the Developer Console of your web browser (or just post the output of curl if you're using that).

Also take a look in the logs of your Graylog nodes and check them for useful information.

vladostp commented Sep 29, 2017

Good Request:

GET /api/search/universal/relative/export?query=X.X.X.X&range=172800&fields=timestamp HTTP/1.1
Host: graylog.domain.com
Authorization: Basic YwFjYzZzODQtNjhmYS00Z2I3LWE4AZEtnDbjNzUxNTgyMTk2OnNlc3Npb24=
User-Agent: curl/7.55.1
Accept: */*
Connection: close

Good Response:

HTTP/1.1 200 OK
Date: Fri, 29 Sep 2017 08:48:49 GMT
Server: Apache/2.4.10 (Debian)
Content-Disposition: attachment; filename=graylog-search-result-relative-172800.csv
X-Graylog-Node-ID: 6f4b7833-e1cc-48a4-b8d8-99d02888609a
X-Runtime-Microseconds: 11817
Content-Type: text/csv
Connection: close
Content-Length: 34896

"timestamp"
"2017-09-28T13:36:19.000Z" + 1,291 lines of result

Bad Request:

GET /api/search/universal/relative/export?query=X.X.X.X&range=432000&fields=timestamp HTTP/1.1
Host: graylog.domain.com
Authorization: Basic YwFjYzZzODQtNjhmYS00Z2I3LWE4AZEtnDbjNzUxNTgyMTk2OnNlc3Npb24=
User-Agent: curl/7.55.1
Accept: */*
Connection: close

Bad Response:

HTTP/1.1 200 OK
Date: Fri, 29 Sep 2017 08:44:29 GMT
Server: Apache/2.4.10 (Debian)
Content-Disposition: attachment; filename=graylog-search-result-relative-432000.csv
X-Graylog-Node-ID: 6f4b7833-e1cc-48a4-b8d8-99d02888609a
X-Runtime-Microseconds: 55360
Content-Type: text/csv
Connection: close
Content-Length: 282

"timestamp"
"2017-09-28T13:36:19.000Z" + 9 lines of result
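The good and bad requests above are identical except for the `range` parameter (172800 s = 2 days vs 432000 s = 5 days). As an illustrative sketch (the helper function is mine, not part of Graylog), this is how such an export URL can be assembled:

```python
from urllib.parse import urlencode

def export_url(host, query, range_seconds, fields):
    """Build a relative-time CSV export URL for Graylog's
    /api/search/universal/relative/export endpoint."""
    params = urlencode({
        "query": query,
        "range": range_seconds,      # look-back window in seconds
        "fields": ",".join(fields),  # columns to include in the CSV
    })
    return f"https://{host}/api/search/universal/relative/export?{params}"
```

For example, `export_url("graylog.domain.com", "X.X.X.X", 432000, ["timestamp"])` reproduces the "bad" request, and changing `range_seconds` to 172800 reproduces the "good" one; nothing else differs between the two.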
olideakin commented Sep 29, 2017

We're seeing something similar when requesting the CSV download through the web interface. For me, it's the transition from 15 to 16 indices in the query that causes a 10-record CSV to be returned rather than the tens of thousands expected.

jalogisch added the to-verify label Oct 2, 2017

nickpeshek commented Oct 10, 2017

There definitely seems to be a problem when the scroll_id gets too large and the client switches from GET to POST.

2017-10-10T18:43:02.777Z DEBUG [ScrollResult] [3389dae361af79b04c9c8e7057f60cc6][0] New scroll id DnF1ZXJ5VGhlbkZldGNoPAAAAAAAAAw3Fl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAC2RZqNHhnNHRqT1Q4YUhpaWJBM3l4MUp3AAAAAAAADDgWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAAw5Fl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAJcxZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACXQWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAl2FlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAJdRZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACXcWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAw6Fl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAMOxZfcjNhYjgxaVFHLWU2UkpTNnhoMjd3AAAAAAAADDwWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAAw9Fl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAMPhZfcjNhYjgxaVFHLWU2UkpTNnhoMjd3AAAAAAAADD8WX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAALaFmo0eGc0dGpPVDhhSGlpYkEzeXgxSncAAAAAAAAC2xZqNHhnNHRqT1Q4YUhpaWJBM3l4MUp3AAAAAAAADEAWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAAxBFl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAMQhZfcjNhYjgxaVFHLWU2UkpTNnhoMjd3AAAAAAAACcgWRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAnGFkZCWXp6Q25iU0RPanFuVWcyRHBrQXcAAAAAAAAJxxZGQll6ekNuYlNET2pxblVnMkRwa0F3AAAAAAAACckWRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAxEFl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAC3BZqNHhnNHRqT1Q4YUhpaWJBM3l4MUp3AAAAAAAADEMWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAALdFmo0eGc0dGpPVDhhSGlpYkEzeXgxSncAAAAAAAAMRRZfcjNhYjgxaVFHLWU2UkpTNnhoMjd3AAAAAAAACcsWRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAnEFkZCWXp6Q25iU0RPanFuVWcyRHBrQXcAAAAAAAAJxRZGQll6ekNuYlNET2pxblVnMkRwa0F3AAAAAAAACcoWRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAnMFkZCWXp6Q25iU0RPanFuVWcyRHBrQXcAAAAAAAAJzRZGQll6ekNuYlNET2pxblVnMkRwa0F3AAAAAAAADEYWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAAxHFl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAJzhZGQll6ekNuYlNET2pxblVnMkRwa0F3AAAAAAAACc8WRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAnQFkZCWXp6Q25iU0RPanFuVWcyRHBrQXcAAAAAAAAJeBZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACXkWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAl6FlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAC3hZqNHhnNHRqT1Q4YUhpaWJBM3l4MUp3AAAAAAAABYMWZW5xTXhzcDlRaENpWnowTEpWQWpZUQAAAAAAAAWEFmVucU14c3A5UWhDaVp6MExKVkFqWVEAAAAAAAAFhRZlbnFNeHNwOVFoQ2laejBMSlZBallRAAAAAAAABYYWZW5xTXhzcDlRaENpWnowTEpWQWpZUQAAAAAAAAWHFmVucU14c3A5UWhDaVp6MExKVkFqWVEAAAAAAAAJexZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACXwWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAl-FlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAJfRZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACX8WUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAmAFlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAJgRZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACYIWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAmDFlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAAHBZkbnJ3dkhEUlFiaUZSN1V5VUJSV0hnAAAAAAAAAt8WajR4ZzR0ak9UOGFIaWliQTN5eDFKdw==, number of hits in chunk: 10

So the scroll ID is longer than 1,900 characters, and Jest switches over to POST via: https://github.com/searchbox-io/Jest/blob/v5.3.3/jest-common/src/main/java/io/searchbox/core/SearchScroll.java#L15

2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "POST /_search/scroll?scroll=1m HTTP/1.1[\r][\n]"
2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "Content-Length: 2504[\r][\n]"
2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "Content-Type: application/json; charset=UTF-8[\r][\n]"
2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "Host: 127.0.0.1:9200[\r][\n]"
2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "Connection: Keep-Alive[\r][\n]"
2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_141)[\r][\n]"
2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "Accept-Encoding: gzip,deflate[\r][\n]"
2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "[\r][\n]"
2017-10-10T18:43:02.784Z DEBUG [wire] http-outgoing-196 >> "DnF1ZXJ5VGhlbkZldGNoPAAAAAAAAAw3Fl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAC2RZqNHhnNHRqT1Q4YUhpaWJBM3l4MUp3AAAAAAAADDgWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAAw5Fl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAJcxZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACXQWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAl2FlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAJdRZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACXcWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAw6Fl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAMOxZfcjNhYjgxaVFHLWU2UkpTNnhoMjd3AAAAAAAADDwWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAAw9Fl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAMPhZfcjNhYjgxaVFHLWU2UkpTNnhoMjd3AAAAAAAADD8WX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAALaFmo0eGc0dGpPVDhhSGlpYkEzeXgxSncAAAAAAAAC2xZqNHhnNHRqT1Q4YUhpaWJBM3l4MUp3AAAAAAAADEAWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAAxBFl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAMQhZfcjNhYjgxaVFHLWU2UkpTNnhoMjd3AAAAAAAACcgWRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAnGFkZCWXp6Q25iU0RPanFuVWcyRHBrQXcAAAAAAAAJxxZGQll6ekNuYlNET2pxblVnMkRwa0F3AAAAAAAACckWRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAxEFl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAC3BZqNHhnNHRqT1Q4YUhpaWJBM3l4MUp3AAAAAAAADEMWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAALdFmo0eGc0dGpPVDhhSGlpYkEzeXgxSncAAAAAAAAMRRZfcjNhYjgxaVFHLWU2UkpTNnhoMjd3AAAAAAAACcsWRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAnEFkZCWXp6Q25iU0RPanFuVWcyRHBrQXcAAAAAAAAJxRZGQll6ekNuYlNET2pxblVnMkRwa0F3AAAAAAAACcoWRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAnMFkZCWXp6Q25iU0RPanFuVWcyRHBrQXcAAAAAAAAJzRZGQll6ekNuYlNET2pxblVnMkRwa0F3AAAAAAAADEYWX3IzYWI4MWlRRy1lNlJKUzZ4aDI3dwAAAAAAAAxHFl9yM2FiODFpUUctZTZSSlM2eGgyN3cAAAAAAAAJzhZGQll6ekNuYlNET2pxblVnMkRwa0F3AAAAAAAACc8WRkJZenpDbmJTRE9qcW5VZzJEcGtBdwAAAAAAAAnQFkZCWXp6Q25iU0RPanFuVWcyRHBrQXcAAAAAAAAJeBZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACXkWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAl6FlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAC3hZqNHhnNHRqT1Q4YUhpaWJBM3l4MUp3AAAAAAAABYMWZW5xTXhzcDlRaENpWnowTEpWQWpZUQAAAAAAAAWEFmVucU14c3A5UWhDaVp6MExKVkFqWVEAAAAAAAAFhRZlbnFNeHNwOVFoQ2laejBMSlZBallRAAAAAAAABYYWZW5xTXhzcDlRaENpWnowTEpWQWpZUQAAAAAAAAWHFmVucU14c3A5UWhDaVp6MExKVkFqWVEAAAAAAAAJexZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACXwWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAl-FlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAJfRZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACX8WUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAmAFlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAJgRZROFVmb3lXWlJvbWUzSDFnZC1RS29nAAAAAAAACYIWUThVZm95V1pSb21lM0gxZ2QtUUtvZwAAAAAAAAmDFlE4VWZveVdaUm9tZTNIMWdkLVFLb2cAAAAAAAAAHBZkbnJ3dkhEUlFiaUZSN1V5VUJSV0hnAAAAAAAAAt8WajR4ZzR0ak9UOGFIaWliQTN5eDFKdw=="

That POST body needs to be a JSON object, as it is in newer versions of Jest.

My guess is that the code we're pulling in is this: https://github.com/searchbox-io/Jest/blob/v2.4.0/jest-common/src/main/java/io/searchbox/core/SearchScroll.java#L24

EDIT: I've confirmed that replacing the class with one that supports the JSON encoding does indeed fix this issue for us.
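The behaviour described in this comment can be modelled with a small Python sketch. This is not the actual code (the real logic lives in the Java `SearchScroll` class linked above); the 1,900-character threshold is taken from the comment, and the JSON body shape (`scroll` and `scroll_id` keys) is what Elasticsearch 5.x expects for a scroll continuation:

```python
import json

# Threshold above which the client switches the scroll request from
# GET to POST (per the SearchScroll.java link in the comment above).
MAX_SCROLL_ID_LENGTH = 1900

def build_scroll_request(scroll_id, scroll="1m", json_body=True):
    """Model of how a scroll continuation request is built.

    Returns (method, path, body). `json_body=False` models the buggy
    old behaviour: the raw scroll id is sent as the POST body, which
    Elasticsearch 5.x cannot parse as a scroll request.
    """
    if len(scroll_id) <= MAX_SCROLL_ID_LENGTH:
        # Short ids travel in the URL of a GET request and work fine.
        return ("GET", f"/_search/scroll?scroll={scroll}&scroll_id={scroll_id}", None)
    if json_body:
        # Correct ES 5.x body: a JSON object wrapping the scroll id.
        body = json.dumps({"scroll": scroll, "scroll_id": scroll_id})
    else:
        # Buggy behaviour: the bare base64 scroll id string as the body.
        body = scroll_id
    return ("POST", f"/_search/scroll?scroll={scroll}", body)
```

A search over many indices produces one scroll-id entry per shard, so the id blows past the threshold, the client switches to POST with a bare-string body, the continuation fails, and only the first batch of 10 hits ever reaches the CSV.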

NickMeves commented Oct 11, 2017

Can confirm -- we are seeing the same thing in our installation after the upgrade to 2.3.1. We haven't tested how many indices are in the query before this breaks down, but our exports only have 10 lines as well.

joschi self-assigned this Oct 16, 2017

joschi added a commit to graylog-labs/Jest that referenced this issue Oct 16, 2017

joschi added a commit to graylog-labs/Jest that referenced this issue Oct 16, 2017

joschi added this to the 2.4.0 milestone Oct 17, 2017

wafflebot bot added the in progress label Oct 17, 2017

joschi added a commit that referenced this issue Oct 17, 2017

Fix scroll queries using Searches#scroll()
Scroll queries were broken for a number of different reasons:

* The fork of Jest used by Graylog had a bug regarding scroll queries (searchbox-io/Jest#489, searchbox-io/Jest#491)
* The Elasticsearch Multi Search API doesn't support scroll queries (elastic/elasticsearch#18454)

Due to the default HTTP request length limit of Elasticsearch, `Search#scroll()` resolves the affected index sets
and uses the index wildcards instead of separate index names.

Fixes #4190
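The workaround described in the commit message (resolving the affected index sets and using index wildcards instead of separate index names, to stay under Elasticsearch's default HTTP request length limit) can be sketched as follows. The numbered-index naming scheme and the helper function are illustrative assumptions, not the actual Graylog code:

```python
def to_index_wildcards(index_names):
    """Collapse numbered index names (e.g. graylog_0, graylog_11) into
    prefix wildcards (graylog_*), so a query over many indices puts a
    handful of short wildcard patterns in the request URL instead of
    enumerating every index name."""
    prefixes = set()
    passthrough = []
    for name in index_names:
        prefix, sep, suffix = name.rpartition("_")
        if sep and suffix.isdigit():
            prefixes.add(prefix + "_*")
        else:
            # Names without a numeric suffix are kept as-is.
            passthrough.append(name)
    return sorted(prefixes) + sorted(passthrough)
```

The trade-off is that a wildcard may match more indices than the original list, but for scroll queries the time-range filter still restricts the results, while the request line stays short no matter how many indices the search spans.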

bernd closed this in #4262 Oct 19, 2017

bernd added a commit that referenced this issue Oct 19, 2017

Fix scroll queries using Searches#scroll() (#4262)

bernd added a commit that referenced this issue Oct 19, 2017

Fix scroll queries using Searches#scroll() (#4262)

(cherry picked from commit 7239f15)

joschi added a commit that referenced this issue Oct 19, 2017

Fix scroll queries using Searches#scroll() (#4262)

(cherry picked from commit 7239f15)

bernd added a commit that referenced this issue Oct 19, 2017

Fix scroll queries using Searches#scroll() (#4262) (#4269)

(cherry picked from commit 7239f15)