Apparently TagPidSequenceNr is reset to 1 when the application is restarted, causing duplicates #312

Closed
glammers1 opened this issue Feb 8, 2018 · 6 comments

@glammers1

I'm doing some tests with the default configuration and, in my humble opinion, there is some unexpected behavior.

I have an actor that receives Increment messages to increase a counter. When the application runs from scratch everything works fine: the Incremented events are persisted in the messages and tag_views tables, and tag_write_progress also looks correct.

(I've simplified the example for the sake of simplicity, just sending 4 Increment messages.)
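For reference, a minimal sketch of the kind of actor I'm describing (names such as CounterActor, Increment, Incremented and the tag "myTag" are my own placeholders; the journal is assumed to be configured to use akka-persistence-cassandra):

import akka.persistence.PersistentActor
import akka.persistence.journal.Tagged

case object Increment   // command (placeholder name)
case object Incremented // event (placeholder name)

class CounterActor extends PersistentActor {
  override def persistenceId: String = "myPid"

  private var counter = 0

  override def receiveCommand: Receive = {
    case Increment =>
      // Tagging the event is what makes it appear in tag_views under "myTag".
      persist(Tagged(Incremented, Set("myTag"))) { _ =>
        counter += 1
      }
  }

  override def receiveRecover: Receive = {
    // The journal replays the payload without the Tagged wrapper.
    case Incremented => counter += 1
  }
}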

messages:

+-------+-------------+-------+
|  pid  | sequence_nr | tags  |
+-------+-------------+-------+
| myPid |           1 | myTag |
| myPid |           2 | myTag |
| myPid |           3 | myTag |
| myPid |           4 | myTag |
+-------+-------------+-------+

tag_views:

+----------+-------------+---------------------+
| tag_name | sequence_nr | tag_pid_sequence_nr |
+----------+-------------+---------------------+
| myTag    |           1 |                   1 |
| myTag    |           2 |                   2 |
| myTag    |           3 |                   3 |
| myTag    |           4 |                   4 |
+----------+-------------+---------------------+

tag_write_progress:

+-------+-------+-------------+---------------------+
|  pid  |  tag  | sequence_nr | tag_pid_sequence_nr |
+-------+-------+-------------+---------------------+
| myPid | myTag |           4 |                   4 |
+-------+-------+-------------+---------------------+

Next steps: stop the application, re-run it, and send a new Increment message. A TagPidSequenceNr of 1 is assigned to the Incremented event. The tables change to:

messages:

+-------+-------------+-------+
|  pid  | sequence_nr | tags  |
+-------+-------------+-------+
| myPid |           1 | myTag |
| myPid |           2 | myTag |
| myPid |           3 | myTag |
| myPid |           4 | myTag |
| myPid |           5 | myTag |
+-------+-------------+-------+

tag_views:

+----------+-------------+---------------------+
| tag_name | sequence_nr | tag_pid_sequence_nr |
+----------+-------------+---------------------+
| myTag    |           1 |                   1 |
| myTag    |           2 |                   2 |
| myTag    |           3 |                   3 |
| myTag    |           4 |                   4 |
| myTag    |           5 |                   1 | <--- 1?
+----------+-------------+---------------------+

tag_write_progress:

+-------+-------+-------------+---------------------+
|  pid  |  tag  | sequence_nr | tag_pid_sequence_nr |
+-------+-------+-------------+---------------------+
| myPid | myTag |           5 |                   1 | <--- 1?
+-------+-------+-------------+---------------------+

Now, if we read the journal with NoOffset, the counter value is 4 (not 5) because the last event is dropped with a log message:

Duplicate sequence number. Persistence id: counter-myPid. Tag: myTag. Expected sequence nr: 5. Actual 1. This will be dropped.

The resulting state is not correct and, for example, the next 3 Increment messages will also be dropped until tag_pid_sequence_nr reaches 5.
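For completeness, this is roughly how the read side is queried (a sketch; the ActorSystem setup and the stream handling in my actual PoC differ):

import akka.actor.ActorSystem
import akka.persistence.cassandra.query.scaladsl.CassandraReadJournal
import akka.persistence.query.{NoOffset, PersistenceQuery}
import akka.stream.ActorMaterializer

implicit val system: ActorSystem = ActorSystem("example")
implicit val mat: ActorMaterializer = ActorMaterializer()

val readJournal = PersistenceQuery(system)
  .readJournalFor[CassandraReadJournal](CassandraReadJournal.Identifier)

// Reading "myTag" from the beginning; after the restart the event written with
// tag_pid_sequence_nr = 1 is detected as a duplicate and dropped, so the
// rebuilt counter stops at 4.
readJournal
  .eventsByTag("myTag", NoOffset)
  .runForeach(env => println(s"${env.persistenceId} seqNr=${env.sequenceNr} event=${env.event}"))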


chbatey commented Feb 9, 2018

Hi @glammers1. This looks broken. I just tried your scenario but it worked, and we have a test that covers this. Is this happening deterministically for you? If so, can you send me the logs at DEBUG level?


chbatey commented Feb 12, 2018

The logs I'd be interested in are (we left pretty verbose debug logging in for this initial version):

For the recovery of the persistent actor, what sequence numbers appear in:

  • Recovering pid {} from {} to {}
  • Starting recovery with tag progress: {}. From {} to {} (this tells us what tag progress managed to be saved; it is best effort, so it may be behind).

Then any of the missing tag writes being sent:

  • Tag write not in progress. Sending to TagWriter. Tag {} Sequence Nr {}.
  • Sequence nr > than write progress. Sending to TagWriter. Tag {} Sequence Nr {}.

And any tag progress updates:

  • Updating tag progress: {}

@chbatey chbatey added the bug label Feb 12, 2018

chbatey commented Feb 12, 2018

I think this happens when you have a snapshot for the latest event when you close down.

The journal currently uses the replayed messages to restore tag pid sequence numbers and to write any missed tag_writes caused by a hard/crash shutdown.

I would suggest not using 0.80 if you use snapshots, until we find a solution for this.


glammers1 commented Feb 12, 2018

Hi,

As you said, the issue is related to snapshotting. You can use this PoC to reproduce the issue (I think it is easier to run the PoC at DEBUG log level than to copy the logs here, so you have full control of the problem, but I don't mind copying them here if you need them).

The project has the following:

  • An API to send commands to CounterPersistentActor.
  • CounterPersistentActor:
    • Persists two tagged events per command received with tags "all" and "myTag".
    • One of the events updates the state of the actor, and the other one (MessageProcessed) is just informational.
    • Saves a snapshot for each persistAll when the MessageProcessed event is handled.

Note: if I move the saveSnapshot from line 33 to after line 30, the problem disappears, i.e.:

persistAll(
  List(Tagged(Evt(data + state.counter), Set("all", "myTag")),
       Tagged(MessageProcessed, Set("all", "myTag")))) {
  case Tagged(evt @ Evt(_), _) =>
    updateState(evt)
    saveSnapshot(state) // snapshot now saved when handling the state-changing event
  case Tagged(_, _) =>
    context.system.log.info("message processed")
}
  • A ProjectionActor with the CassandraReadJournal. It just logs the events read.

Steps to reproduce:

  1. Use docker-compose in the docker folder to bootstrap Cassandra.
  2. Execute the app: sbt akka-persistence-cassandra-poc/run.
  3. Send some Increment commands (one is enough) via POST to localhost:8888/counters/increment.
  4. Stop the application.
  5. Execute sbt akka-persistence-cassandra-poc/run again.
  6. Send another Increment command.
  7. You are done.


chbatey commented Feb 12, 2018

Thanks for the comprehensive reproducer.

The reason moving the saveSnapshot fixes it is that the bug only happens if the last event you persist before shutting down is covered by the latest snapshot.
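For contrast, my reading of the original (problematic) placement in the PoC, with saveSnapshot in the MessageProcessed branch so the snapshot covers the last persisted event (a reconstruction, not the exact original lines):

persistAll(
  List(Tagged(Evt(data + state.counter), Set("all", "myTag")),
       Tagged(MessageProcessed, Set("all", "myTag")))) {
  case Tagged(evt @ Evt(_), _) =>
    updateState(evt)
  case Tagged(_, _) =>
    // Snapshot taken when handling the last event of the batch: on restart,
    // recovery starts after it and no events are replayed, so the journal
    // cannot restore the tag pid sequence number.
    saveSnapshot(state)
    context.system.log.info("message processed")
}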

I've tested your project with the PR I raised (#316) and it fixes the issue.

@chbatey chbatey added this to the 0.81 milestone Feb 13, 2018

chbatey commented Feb 13, 2018

Fixed by #316

@chbatey chbatey closed this as completed Feb 13, 2018