Apparently TagPidSequenceNr is reset to 1 when the application is restarted, causing duplicates #312

Closed
glammers1 opened this issue Feb 8, 2018 · 6 comments

@glammers1

I'm doing some tests with the default configuration and, in my humble opinion, there is some unexpected behavior.

I have an actor that receives Increment messages to increase a counter. When the application runs from scratch everything works fine: the Incremented events are persisted in the messages and tag_views tables, and tag_write_progress also looks correct.

(I've simplified the example for the sake of simplicity, just sending 4 Increment messages.)
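For reference, a minimal sketch of the kind of actor I'm describing (names such as CounterActor, Increment, Incremented and the tag "myTag" are my own placeholders; the journal is assumed to be configured to use akka-persistence-cassandra):

import akka.persistence.PersistentActor
import akka.persistence.journal.Tagged

case object Increment   // command (placeholder name)
case object Incremented // event (placeholder name)

class CounterActor extends PersistentActor {
  override def persistenceId: String = "myPid"

  private var counter = 0

  override def receiveCommand: Receive = {
    case Increment =>
      // Tagging the event is what makes it appear in tag_views under "myTag".
      persist(Tagged(Incremented, Set("myTag"))) { _ =>
        counter += 1
      }
  }

  override def receiveRecover: Receive = {
    // The journal replays the payload without the Tagged wrapper.
    case Incremented => counter += 1
  }
}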

messages:

+-------+-------------+-------+
|  pid  | sequence_nr | tags  |
+-------+-------------+-------+
| myPid |           1 | myTag |
| myPid |           2 | myTag |
| myPid |           3 | myTag |
| myPid |           4 | myTag |
+-------+-------------+-------+

tag_views:

+----------+-------------+---------------------+
| tag_name | sequence_nr | tag_pid_sequence_nr |
+----------+-------------+---------------------+
| myTag    |           1 |                   1 |
| myTag    |           2 |                   2 |
| myTag    |           3 |                   3 |
| myTag    |           4 |                   4 |
+----------+-------------+---------------------+

tag_write_progress:

+-------+-------+-------------+---------------------+
|  pid  |  tag  | sequence_nr | tag_pid_sequence_nr |
+-------+-------+-------------+---------------------+
| myPid | myTag |           4 |                   4 |
+-------+-------+-------------+---------------------+

Next steps: stop the application, re-run it, and send a new Increment message. A TagPidSequenceNr of 1 is assigned to the Incremented event. The tables change to:

messages:

+-------+-------------+-------+
|  pid  | sequence_nr | tags  |
+-------+-------------+-------+
| myPid |           1 | myTag |
| myPid |           2 | myTag |
| myPid |           3 | myTag |
| myPid |           4 | myTag |
| myPid |           5 | myTag |
+-------+-------------+-------+

tag_views:

+----------+-------------+---------------------+
| tag_name | sequence_nr | tag_pid_sequence_nr |
+----------+-------------+---------------------+
| myTag    |           1 |                   1 |
| myTag    |           2 |                   2 |
| myTag    |           3 |                   3 |
| myTag    |           4 |                   4 |
| myTag    |           5 |                   1 | <--- 1?
+----------+-------------+---------------------+

tag_write_progress:

+-------+-------+-------------+---------------------+
|  pid  |  tag  | sequence_nr | tag_pid_sequence_nr |
+-------+-------+-------------+---------------------+
| myPid | myTag |           5 |                   1 | <--- 1?
+-------+-------+-------------+---------------------+

Now, if we read the journal with NoOffset, the counter value is 4 (not 5) because the last event is dropped with a log message:

Duplicate sequence number. Persistence id: counter-myPid. Tag: myTag. Expected sequence nr: 5. Actual 1. This will be dropped.

The resulting state is not correct and, for example, the next 3 Increment messages will also be dropped until tag_pid_sequence_nr reaches 5.
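For completeness, this is roughly how the read side is queried (a sketch; the ActorSystem setup and the stream handling in my actual PoC differ):

import akka.actor.ActorSystem
import akka.persistence.cassandra.query.scaladsl.CassandraReadJournal
import akka.persistence.query.{NoOffset, PersistenceQuery}
import akka.stream.ActorMaterializer

implicit val system: ActorSystem = ActorSystem("example")
implicit val mat: ActorMaterializer = ActorMaterializer()

val readJournal = PersistenceQuery(system)
  .readJournalFor[CassandraReadJournal](CassandraReadJournal.Identifier)

// Reading "myTag" from the beginning; after the restart the event written with
// tag_pid_sequence_nr = 1 is detected as a duplicate and dropped, so the
// rebuilt counter stops at 4.
readJournal
  .eventsByTag("myTag", NoOffset)
  .runForeach(env => println(s"${env.persistenceId} seqNr=${env.sequenceNr} event=${env.event}"))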


chbatey commented Feb 9, 2018

Hi @glammers1. This looks broken. I just tried your scenario but it worked, and we have a test that covers this. Is this happening deterministically for you? If so, can you send me the logs at DEBUG level?


chbatey commented Feb 12, 2018

The logs I'd be interested in are (we left pretty verbose debug logging in for this initial version):

For the recovery of the persistent actor, what sequence numbers appear in:

  • Recovering pid {} from {} to {}
  • Starting recovery with tag progress: {}. From {} to {} (this tells us what tag progress managed to be saved; it is best effort, so it may be behind).

Then any of the missing tag writes being sent:

  • Tag write not in progress. Sending to TagWriter. Tag {} Sequence Nr {}.
  • Sequence nr > than write progress. Sending to TagWriter. Tag {} Sequence Nr {}.

And any tag progress updates:

  • Updating tag progress: {}

@chbatey chbatey added the bug label Feb 12, 2018

chbatey commented Feb 12, 2018

I think this happens when you have a snapshot for the latest event when you close down.

The journal currently uses the replayed messages to restore tag pid sequence numbers and to write any missed tag_writes caused by a hard/crash shutdown.

I would suggest not using 0.80 if you use snapshots, until we find a solution for this.


glammers1 commented Feb 12, 2018

Hi,

As you said, the issue is related to snapshotting. You can use this PoC to reproduce the issue (I think it is easier to run the PoC at DEBUG log level than to copy the logs here, so you have full control of the problem, but I don't mind copying them here if you need them).

The project has the following:

  • An API to send commands to CounterPersistentActor.
  • CounterPersistentActor:
    • Persists two tagged events per command received with tags "all" and "myTag".
    • One of the events updates the state of the actor, and the other one (MessageProcessed) is just informational.
    • Saves a snapshot for each persistAll when the MessageProcessed event is handled.

Note: if I move the saveSnapshot from line 33 to after line 30, the problem disappears, i.e.:

persistAll(
  List(Tagged(Evt(data + state.counter), Set("all", "myTag")),
       Tagged(MessageProcessed, Set("all", "myTag")))) {
  case Tagged(evt @ Evt(_), _) =>
    updateState(evt)
    saveSnapshot(state) // snapshot now saved when handling the state-changing event
  case Tagged(_, _) =>
    context.system.log.info("message processed")
}
  • A ProjectionActor with the CassandraReadJournal. It just logs the events read.

Steps to reproduce:

  1. Use docker-compose in the docker folder to bootstrap Cassandra.
  2. Execute the app: sbt akka-persistence-cassandra-poc/run.
  3. Send some Increment commands (one is enough) via POST to localhost:8888/counters/increment.
  4. Stop the application.
  5. Execute sbt akka-persistence-cassandra-poc/run again.
  6. Send another Increment command.
  7. You are done.


chbatey commented Feb 12, 2018

Thanks for the comprehensive reproducer.

The reason moving the saveSnapshot fixes it is that the bug only happens if the last event you persist before shutting down is covered by the latest snapshot.
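For contrast, my reading of the original (problematic) placement in the PoC, with saveSnapshot in the MessageProcessed branch so the snapshot covers the last persisted event (a reconstruction, not the exact original lines):

persistAll(
  List(Tagged(Evt(data + state.counter), Set("all", "myTag")),
       Tagged(MessageProcessed, Set("all", "myTag")))) {
  case Tagged(evt @ Evt(_), _) =>
    updateState(evt)
  case Tagged(_, _) =>
    // Snapshot taken when handling the last event of the batch: on restart,
    // recovery starts after it and no events are replayed, so the journal
    // cannot restore the tag pid sequence number.
    saveSnapshot(state)
    context.system.log.info("message processed")
}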

I've tested your project with the PR I raised (#316) and it fixes the issue.

@chbatey chbatey added this to the 0.81 milestone Feb 13, 2018

chbatey commented Feb 13, 2018

Fixed by #316

@chbatey chbatey closed this as completed Feb 13, 2018