Fix cases where projections are just buffering #1303

Merged
merged 1 commit into from May 15, 2017

Conversation

2 participants
@pgermishuys
Member

pgermishuys commented May 15, 2017

There are a couple of obscure cases where projections appear to just
buffer events instead of processing them.

The fixes are as follows

  • Ensure that everything in the Core Projection Queue is cleaned up,
    including the Staged Processing Queue. Otherwise, some items might
    never be processed and events would just buffer.
  • In the Checkpoint Manager, we might still receive progress updates
    for writes from before the restart that have since completed.
  • In the Core Projection, mark the existing checkpoint work item as
    complete. We have cleared everything else, and we cannot expect it to
    ever complete, since it failed and we restarted the projection because
    of that failure.
  • In the Core Projection, we might still observe messages from the
    message pump (CoreTick) that were scheduled before the projection
    restarted, possibly while the projection is in a loading state. We
    explicitly check for this case and mark the tick as processed, so that
    the projection is in a pristine state waiting for the next round of
    processing.
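The restart behaviour described in the Core Projection bullets above can be illustrated with a small sketch. This is a toy model in Python, not the EventStore code: `ToyProjection`, its fields, and its methods are hypothetical names, and the real Core Projection and its queues are considerably more involved.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    LOADING = auto()
    RUNNING = auto()


class ToyProjection:
    """Hypothetical stand-in for a projection; not the EventStore CoreProjection."""

    def __init__(self):
        self.state = State.LOADING
        self.queue = deque()            # main processing queue
        self.staged = deque()           # staged processing queue
        self.pending_checkpoint = None  # checkpoint work item awaiting completion
        self.tick_pending = False
        self.processed = []

    def start(self):
        self.state = State.RUNNING

    def schedule_tick(self):
        self.tick_pending = True

    def enqueue(self, event):
        self.queue.append(event)
        self.schedule_tick()

    def restart(self):
        # Clear every queue, including the staged one, so nothing stale lingers.
        self.queue.clear()
        self.staged.clear()
        # The old checkpoint write failed and will never complete, so drop it.
        self.pending_checkpoint = None
        self.state = State.LOADING

    def on_tick(self):
        # Always mark the tick as handled, even if it was scheduled before a restart.
        self.tick_pending = False
        if self.state is not State.RUNNING:
            return  # stale tick while loading: ignore instead of throwing
        if self.pending_checkpoint is not None:
            return  # processing waits for the checkpoint; events only buffer
        while self.queue:
            self.processed.append(self.queue.popleft())
```

In this model, a checkpoint work item that is never cleared keeps `on_tick` from draining the queue, which is the "just buffering" symptom. After `restart()` clears it, a stale tick is absorbed harmlessly and later events are processed again.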

1. Projection writes a checkpoint with a WrongExpectedVersion

Reproduction Steps

  • Manually write a valid checkpoint event to the projection's checkpoint
    stream
  • Write enough events for the projection to checkpoint and watch it
    attempt to restart itself because of the WrongExpectedVersion
  • The Projection will no longer process any events but merely buffer
    them.
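As a rough illustration of why the manual write in the first step leads to a WrongExpectedVersion, here is a minimal Python sketch of an expected-version check on an append-only stream. `InMemoryStream` and the event names are hypothetical and only model the idea; they are not the EventStore API.

```python
class WrongExpectedVersion(Exception):
    """Raised when the caller's expected version does not match the stream."""


class InMemoryStream:
    """Hypothetical append-only stream with an expected-version check."""

    def __init__(self):
        self.events = []

    @property
    def version(self):
        # Version of the last event; -1 for an empty stream.
        return len(self.events) - 1

    def append(self, event, expected_version):
        if expected_version != self.version:
            raise WrongExpectedVersion(
                f"expected {expected_version}, stream is at {self.version}"
            )
        self.events.append(event)
        return self.version


checkpoints = InMemoryStream()

# The projection writes a checkpoint and remembers the resulting version.
last_seen = checkpoints.append("checkpoint-0", expected_version=-1)

# A manual write lands in the checkpoint stream behind the projection's back.
checkpoints.append("manual-checkpoint", expected_version=checkpoints.version)

# The projection's next checkpoint still expects the version it last saw.
try:
    checkpoints.append("checkpoint-1", expected_version=last_seen)
    conflict = False
except WrongExpectedVersion:
    conflict = True  # in the scenario above, this is what triggers the restart
```

The projection only tracks the version it wrote last, so any write from outside makes its next checkpoint fail the check.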

2. After restart a projection still attempts to process writes coming back from before the restart

Reproduction Steps

  • Slow down writes to the projection's designated stream.
    e.g. $et-TakeSomeSpaceEvent in the StorageWriterService

  • Ensure that the EmittedStream for the Projection fails to write a set
    of events due to a WrongExpectedVersion

  • Watch the projection attempt to restart itself and note a couple of
    issues:

  • The Checkpoint Manager receives progress updates while it's still
    initializing.

  • On a restart, the Core Projection does not reset some of its state,
    such as the pending checkpoint work item, which cannot still be
    pending once the projection has restarted.

  • The core projection pump (CoreTick) could already have been
    scheduled, which will throw because the CoreProjection is still
    starting up.
  • Where are the tests?

I attempted to write a test, but the test setup does not replicate the
issues described above. This needs to be investigated and corrected,
as making these changes without tests is scary.

@hayley-jean

hayley-jean approved these changes May 15, 2017 edited

Tested this with the following:

  1. Change the projection checkpoint count to 10.
  2. Disable all projections but $by_event_type
  3. Add a line in StorageWriterService that will slow writes down by 500ms for the stream $et-TakeSomeSpaceEvent
  4. Start the event store with the projection running.
  5. Using the testclient, write 18 events - this causes one checkpoint to be written for $et
  6. Write a fake checkpoint event to the $et stream, using the same metadata as the previous event in that stream.
  7. Write a fake emitted event to the $et-TakeSomeSpaceEvent stream, also using the same metadata as the previous event.
  8. Write one more event using the test client.

Before this change, this would cause the projection to fault. After this change, the projection restarts and continues processing and emitting events, as well as writing checkpoints.

I've also verified that if an event is written with incorrect metadata, the projection still faults, saying that the stream has been written to from the outside.

@hayley-jean hayley-jean merged commit 3221601 into release-v4.0.2 May 15, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
wercker/build-mono4 Wercker pipeline passed

@hayley-jean hayley-jean deleted the projection-buffering-events branch May 15, 2017

pgermishuys added a commit that referenced this pull request May 18, 2017

Merge pull request #1303 from EventStore/projection-buffering-events
Fix cases where projections are just buffering

hayley-jean added a commit that referenced this pull request Jul 31, 2017

Merge pull request #1303 from EventStore/projection-buffering-events
Fix cases where projections are just buffering

pgermishuys added a commit that referenced this pull request Aug 4, 2017

Merge pull request #1303 from EventStore/projection-buffering-events
Fix cases where projections are just buffering