Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid using `Stream#peek` for counting items in paginated result set. #6256

Merged
merged 3 commits into from Aug 7, 2019

Conversation

@dennisoelkers
Copy link
Member

commented Aug 7, 2019

Description

Motivation and Context

Before this change, in PaginatedDbService#findPaginatedWithQueryFilterAndSort a single stream was used to filter (based on the query), paginate (based on page size and page number) and determine the number of total items (before pagination). The latter is performed using Stream#peek, which has the following characteristics:

  1. The docs say: This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline
  2. peek is an intermediate operation on a stream, which means that
    a) it's actual execution is driven by the terminal operation of the stream
    b) it has the benefit in this case that a single stream execution is able to perform multiple tasks

The code before this PR relies on peek being executed for each item on the stream before pagination (skip/limit) is performed. This is the case for OracleJDK up to and including 1.8.0_202.

Unfortunately OpenJDK (tested with 1.8.0_222) is performing an optimization, where the actual stream execution is determined by the terminal operation (see 2a) and peek only sees items after filtering and pagination (Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.).

Therefore, this change is replacing the usage of peek with a simpler variant, which is using two separate stream instantiations for counting the total items and the actual pagination.

Fixes #6254.

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
Avoid using `Stream#peek` for counting items in paginated result set.
Before this change, in
`PaginatedDbService#findPaginatedWithQueryFilterAndSort` a single stream
was used to filter (based on the query), paginate (based on page size
and page number) and determine the number of total items (before
pagination). The latter is performed using `Stream#peek`, which has the
following characteristics:

  1. The docs say: `This method exists mainly to support debugging, where
you want to see the elements as they flow past a certain point in a
pipeline`
  2. `peek` is an intermediate operation on a stream, which means that
    a) it's actual execution is driven by the terminal operation of the stream
    b) it has the benefit in this case that a single stream execution is
able to perform multiple tasks

The code before this PR relies on `peek` being executed for each item
on the stream before pagination (`skip`/`limit`) is performed. This is
the case for OracleJDK up and including 1.8.0_202.

Unfortunately OpenJDK is performing an optimization, where the actual
stream execution is determined by the terminal operation (see 2a) and
`peek` only sees items after filtering _and_ pagination.

Therefore, this change is replacing the usage of `peek` with a simpler
variant, which is using two separate stream instantiations for counting
the total items and the actual pagination.

Fixes #6254.

@dennisoelkers dennisoelkers added this to the 3.1.0 milestone Aug 7, 2019

@dennisoelkers dennisoelkers requested review from bernd, kmerz and mpfz0r Aug 7, 2019

@bernd bernd self-assigned this Aug 7, 2019

@bernd
Copy link
Member

left a comment

Good catch! 👍 A few comments inline.

@bernd
bernd approved these changes Aug 7, 2019
Copy link
Member

left a comment

Thank you! 👍

@bernd bernd merged commit 70dc8e7 into master Aug 7, 2019

4 checks passed

ci-web-linter Jenkins build graylog-pr-linter-check 4074 has succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
graylog-project/pr Jenkins build graylog-project-pr-snapshot 5095 has succeeded
Details
license/cla Contributor License Agreement is signed.
Details

@bernd bernd deleted the issue-6254 branch Aug 7, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.