Failed to append block because it was to large causes next append to fail #6318

Closed
Zelldon opened this issue Feb 11, 2021 · 5 comments · Fixed by #6681
Labels
kind/bug: Categorizes an issue or PR as a bug
scope/broker: Marks an issue or PR to appear in the broker section of the changelog
severity/high: Marks a bug as having a noticeable impact on the user with no known workaround
support: Marks an issue as related to a customer support request

Comments

@Zelldon
Member

Zelldon commented Feb 11, 2021

Describe the bug

Appending a block in LeaderRole#appendEntry can fail, which in turn causes subsequent appends to fail because the positions no longer increase by exactly one. The position is generated by the dispatcher and keeps growing even when an append fails.

Note: related to a support issue

To Reproduce

Find a way to write a record that sits just below the size limit, so that it exceeds the limit once the Raft metadata is added.
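
The following is a minimal sketch of that reproduction idea, not the actual Zeebe or Atomix code. The limit (4194304) and the rejected entry size (4194325) are taken from the stack trace in this issue. The class name, both check methods, the 64-byte metadata overhead, and the assumption that the dispatcher applies the same limit to the payload alone are hypothetical and only serve to illustrate how a record can pass one check and fail the other.

```java
public class EntrySizeSketch {
  // Limit reported by StorageException$TooLarge in the stack trace of this issue.
  static final int MAX_ENTRY_SIZE = 4 * 1024 * 1024; // 4194304 bytes

  // Hypothetical upstream check: only the record payload is measured.
  public static boolean dispatcherAccepts(int payloadBytes) {
    return payloadBytes <= MAX_ENTRY_SIZE;
  }

  // Hypothetical downstream check: payload plus entry metadata is measured.
  public static boolean journalAccepts(int payloadBytes, int metadataBytes) {
    return (long) payloadBytes + metadataBytes <= MAX_ENTRY_SIZE;
  }

  public static void main(String[] args) {
    int metadataBytes = 64; // assumed overhead, not a real Atomix value
    int payloadBytes = 4194325 - metadataBytes; // total matches the logged entry size

    // The payload alone fits, so a position would already have been generated.
    System.out.println("dispatcher accepts: " + dispatcherAccepts(payloadBytes));
    // With metadata the entry is 21 bytes over the limit, so the append is rejected.
    System.out.println("journal accepts: " + journalAccepts(payloadBytes, metadataBytes));
  }
}
```

Under these assumptions the record is accepted upstream and rejected downstream, which is the situation described in this issue.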

Expected behavior

We either reject the write without generating a position, or we accept gaps in positions.
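
A rough sketch of the first option, assuming a single component that checks the size before it hands out a position. `RejectBeforeClaimSketch`, `tryClaim`, and the constructor parameters are hypothetical names for illustration; they are not part of the Zeebe dispatcher API.

```java
import java.util.OptionalLong;

public class RejectBeforeClaimSketch {
  private final int maxEntrySize;
  private final int metadataBytes; // assumed fixed overhead added downstream
  private long nextPosition = 1;

  public RejectBeforeClaimSketch(int maxEntrySize, int metadataBytes) {
    this.maxEntrySize = maxEntrySize;
    this.metadataBytes = metadataBytes;
  }

  // Returns a position only if the final entry, including metadata, fits.
  // A rejected write does not consume a position, so no gap is created.
  public OptionalLong tryClaim(int payloadBytes) {
    if ((long) payloadBytes + metadataBytes > maxEntrySize) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(nextPosition++);
  }

  public static void main(String[] args) {
    RejectBeforeClaimSketch writer = new RejectBeforeClaimSketch(4194304, 64);
    System.out.println(writer.tryClaim(1024));    // claims position 1
    System.out.println(writer.tryClaim(4194304)); // rejected, no position used
    System.out.println(writer.tryClaim(1024));    // claims position 2
  }
}
```

The second option, accepting gaps, would instead relax the check on the appending side so that a position larger than the previous one plus one is tolerated.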

Log/Stacktrace

Full Stacktrace

zeebe_broker0 | 2020-12-15 02:32:46.322 [] [raft-server-3-raft-partition-partition-6] ERROR io.atomix.raft.roles.LeaderRole - RaftServer{raft-partition-partition-6}{role=LEADER} - Failed to append entry ZeebeEntry{term=13, timestamp=1607999566266, lowestPosition=86985565, highestPosition=87001555}, because it was to large.
zeebe_broker0 | io.atomix.storage.StorageException$TooLarge: Entry size 4194325 exceeds maximum allowed bytes (4194304)
zeebe_broker0 | 	at io.atomix.storage.journal.MappedJournalSegmentWriter.append(MappedJournalSegmentWriter.java:124) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.storage.journal.SegmentedJournalWriter.append(SegmentedJournalWriter.java:54) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.storage.journal.DelegatingJournalWriter.append(DelegatingJournalWriter.java:46) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.tryToAppend(LeaderRole.java:642) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.append(LeaderRole.java:614) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.safeAppendEntry(LeaderRole.java:700) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.lambda$appendEntry$10(LeaderRole.java:674) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) [atomix-utils-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.lang.Thread.run(Unknown Source) [?:?]
zeebe_broker0 | 2020-12-15 02:32:46.325 [] [raft-server-3-raft-partition-partition-6] ERROR io.zeebe.logstreams - Failed to append block with last event position 87001555.
zeebe_broker0 | io.atomix.storage.StorageException$TooLarge: Entry size 4194325 exceeds maximum allowed bytes (4194304)
zeebe_broker0 | 	at io.atomix.storage.journal.MappedJournalSegmentWriter.append(MappedJournalSegmentWriter.java:124) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.storage.journal.SegmentedJournalWriter.append(SegmentedJournalWriter.java:54) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.storage.journal.DelegatingJournalWriter.append(DelegatingJournalWriter.java:46) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.tryToAppend(LeaderRole.java:642) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.append(LeaderRole.java:614) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.safeAppendEntry(LeaderRole.java:700) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.lambda$appendEntry$10(LeaderRole.java:674) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) [atomix-utils-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.lang.Thread.run(Unknown Source) [?:?]
zeebe_broker0 | 2020-12-15 02:32:46.331 [] [raft-server-3-raft-partition-partition-6] ERROR io.zeebe.logstreams - Failed to append block with last event position 87001556.
zeebe_broker0 | java.lang.IllegalStateException: Unexpected position 87001556 was encountered after position 86985564 when appending positions <87001556, 87001556>.
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.safeAppendEntry(LeaderRole.java:696) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.lambda$appendEntry$10(LeaderRole.java:674) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) [atomix-utils-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
zeebe_broker0 | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
zeebe_broker0 | 	at java.lang.Thread.run(Unknown Source) [?:?]
zeebe_broker0 | 2020-12-15 02:32:46.483 [Broker-3-LogStream-6] [Broker-3-zb-actors-3] INFO  io.zeebe.logstreams - Close appender for log stream logstream-raft-partition-partition-6
zeebe_broker0 | 2020-12-15 02:32:46.620 [Broker-3-LogAppender-6] [Broker-3-zb-actors-3] ERROR io.zeebe.logstreams - Actor Broker-3-LogAppender-6 failed in phase STARTED.
zeebe_broker0 | io.atomix.storage.StorageException$TooLarge: Entry size 4194325 exceeds maximum allowed bytes (4194304)
zeebe_broker0 | 	at io.atomix.storage.journal.MappedJournalSegmentWriter.append(MappedJournalSegmentWriter.java:124) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.storage.journal.SegmentedJournalWriter.append(SegmentedJournalWriter.java:54) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.storage.journal.DelegatingJournalWriter.append(DelegatingJournalWriter.java:46) ~[atomix-storage-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.tryToAppend(LeaderRole.java:642) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.append(LeaderRole.java:614) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.safeAppendEntry(LeaderRole.java:700) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.raft.roles.LeaderRole.lambda$appendEntry$10(LeaderRole.java:674) ~[atomix-cluster-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) ~[atomix-utils-0.26.0-SNAPSHOT.jar:0.26.0-SNAPSHOT]
zeebe_broker0 | 	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
zeebe_broker0 | 	at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
zeebe_broker0 | 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
zeebe_broker0 | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
zeebe_broker0 | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
zeebe_broker0 | 	at java.lang.Thread.run(Unknown Source) ~[?:?]
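
The sequence in the log can be modelled with a small sketch. It is a simplified stand-in for the check in LeaderRole#safeAppendEntry, not the real implementation: `PositionContinuitySketch` and its methods are hypothetical, and only the positions, sizes, and the order of the two checks (continuity check at line 696 before the journal append at line 700) are taken from the stack trace.

```java
public class PositionContinuitySketch {
  private final int maxEntrySize;
  private long lastAppendedPosition;

  public PositionContinuitySketch(long lastAppendedPosition, int maxEntrySize) {
    this.lastAppendedPosition = lastAppendedPosition;
    this.maxEntrySize = maxEntrySize;
  }

  // Appends a block covering [lowestPosition, highestPosition].
  public void append(long lowestPosition, long highestPosition, int entrySize) {
    // Continuity check: the block must start right after the last appended position.
    if (lowestPosition != lastAppendedPosition + 1) {
      throw new IllegalStateException(
          String.format(
              "Unexpected position %d was encountered after position %d",
              lowestPosition, lastAppendedPosition));
    }
    // Size check: stands in for the journal rejecting an oversized entry.
    if (entrySize > maxEntrySize) {
      throw new IllegalArgumentException(
          "Entry size " + entrySize + " exceeds maximum allowed bytes (" + maxEntrySize + ")");
    }
    // Only a successful append advances the last appended position.
    lastAppendedPosition = highestPosition;
  }

  public long lastAppendedPosition() {
    return lastAppendedPosition;
  }

  public static void main(String[] args) {
    PositionContinuitySketch leader = new PositionContinuitySketch(86985564L, 4194304);
    try {
      leader.append(86985565L, 87001555L, 4194325); // the oversized block
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
    try {
      leader.append(87001556L, 87001556L, 100); // the next block; 100 bytes is an assumed size
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because the failed block never advances the last appended position, every later block starts at a position the check considers a gap, so all following appends fail as well.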

Environment:
Customer Support

@Zelldon Zelldon added kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog support Marks an issue as related to a customer support request Status: Needs Priority severity/critical Marks a stop-the-world bug, with a high impact and no existing workaround labels Feb 11, 2021
@Zelldon Zelldon changed the title Failed to append block because it was to large causes inconsistent log Failed to append block because it was to large causes next append to fail Feb 11, 2021
@Zelldon Zelldon added severity/high Marks a bug as having a noticeable impact on the user with no known workaround and removed severity/critical Marks a stop-the-world bug, with a high impact and no existing workaround labels Feb 11, 2021
@npepinpe
Member

Let's drop the limit on the Raft/Journal side - we already have a limit in the dispatcher and it is well handled, and there's no reason anymore to limit it downstream.

@npepinpe
Member

Blocked by #6307

@npepinpe npepinpe added this to Planned in Zeebe Mar 24, 2021
@npepinpe npepinpe moved this from Planned to Ready in Zeebe Mar 25, 2021
@deepthidevaki deepthidevaki self-assigned this Mar 29, 2021
@npepinpe npepinpe moved this from Ready to In progress in Zeebe Mar 29, 2021
@npepinpe npepinpe moved this from In progress to Review in progress in Zeebe Mar 29, 2021
@ghost ghost closed this as completed in ed45c41 Mar 30, 2021
Zeebe automation moved this from Review in progress to Done Mar 30, 2021
@Zelldon
Member Author

Zelldon commented Apr 7, 2021

@deepthidevaki this was not part of alpha 4, right? I ask because we see an error in alpha4 that seems to cause this again: https://console.cloud.google.com/logs/viewer?expandAll=false&timestamp=2021-04-06T03:31:52.902000000Z&dateRangeStart=2021-04-06T02:31:52.902Z&dateRangeEnd=2021-04-06T04:31:52.902Z&project=camunda-cloud-240911&authuser=1&minLogLevel=0&customFacets=&limitCustomFacetWidth=true&advancedFilter=%0AlogName:%22stdout%22%0Aresource.type%3D%22k8s_container%22%0Aresource.labels.cluster_name%3D%22ultratest%22%0Aresource.labels.namespace_name%3D%2259d9bd73-099a-4605-b0f6-5d87d994be63-zeebe%22%0Aresource.labels.container_name%3D%22zeebe%22&interval=JUMP_TO_TIME&scrollTimestamp=2021-04-06T03:32:32.158671000Z&pinnedLogId=1wjesayg3vidfmg&pinnedLogTimestamp=2021-04-06T03:31:22.901757Z

We can see that an append fails because the entry is too big, which causes the following appends to fail as well:

E 2021-04-06T03:31:22.901757Z Failed to append block with last event position 41958805. 
E 2021-04-06T03:31:22.902398Z Actor Broker-0-LogAppender-1 failed in phase STARTED. 
D 2021-04-06T03:31:22.902543Z DELETE dir /usr/local/zeebe/data/raft-partition/partitions/1/pending/27339747-3-41932643-41948344 
W 2021-04-06T03:31:22.903153Z Failed to delete pending snapshot FileBasedTransientSnapshot{directory=/usr/local/zeebe/data/raft-partition/partitions/1/pending/27339747-3-41932643-41948344, snapshotStore=io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore@2d22025c, metadata=FileBasedSnapshotMetadata{index=27339747, term=3, processedPosition=41932643, exporterPosition=41948344}} 
D 2021-04-06T03:31:22.903496Z Discard job io.zeebe.logstreams.impl.log.LogStorageAppender$$Lambda$1398/0x0000000840801040 QUEUED from fastLane of Actor Broker-0-LogAppender-1. 
D 2021-04-06T03:31:22.935643Z Failed to append block with last event position 41958806. This can happen during a leader change. 
D 2021-04-06T03:31:22.936057Z Failed to append block with last event position 41958807. This can happen during a leader change. 
D 2021-04-06T03:31:22.936296Z Failed to append block with last event position 41958808. This can happen during a leader change. 
D 2021-04-06T03:31:22.936462Z Failed to append block with last event position 41958809. This can happen during a leader change. 
D 2021-04-06T03:31:22.936619Z Failed to append block with last event position 41958810. This can happen during a leader change. 

Error group https://console.cloud.google.com/errors/CLLVoIbfidP0dQ?service=zeebe&time=P7D&project=camunda-cloud-240911&authuser=1

@deepthidevaki
Contributor

Probably not part of alpha4. There is no release label.

@Zelldon
Member Author

Zelldon commented Apr 7, 2021

Yes, that would also be my assumption; I just wanted to be sure.

@KerstinHebel KerstinHebel removed this from Done in Zeebe Mar 23, 2022
github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2024
* fix: disable migrate button if xml could not be fetched

* test: add test for migrate button behaviour on process XML load error
This issue was closed.