
InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code #6504

Closed
pihme opened this issue Mar 8, 2021 · 10 comments · Fixed by #9731
Labels
area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) kind/bug Categorizes an issue or PR as a bug severity/mid Marks a bug as having a noticeable impact but with a known workaround version:1.3.13 version:8.1.0-alpha4 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

pihme (Contributor) commented Mar 8, 2021

Describe the bug
Observed in logs: https://console.cloud.google.com/errors/CMHr6Oj_nM3WDQ?service=zeebe&time=P7D&refresh=off&project=camunda-cloud-240911

Log/Stacktrace

Full Stacktrace

java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at com.esotericsoftware.kryo.pool.SoftReferenceQueue.offer(SoftReferenceQueue.java:53) ~[kryo-4.0.2.jar:?]
	at com.esotericsoftware.kryo.pool.SoftReferenceQueue.offer(SoftReferenceQueue.java:33) ~[kryo-4.0.2.jar:?]
	at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.release(KryoPoolQueueImpl.java:52) ~[kryo-4.0.2.jar:?]
	at io.atomix.utils.serializer.NamespaceImpl.release(NamespaceImpl.java:321) ~[atomix-utils-0.26.1.jar:0.26.1]
	at io.atomix.utils.serializer.NamespaceImpl.serialize(NamespaceImpl.java:154) ~[atomix-utils-0.26.1.jar:0.26.1]
	at io.atomix.utils.serializer.FallbackNamespace.serialize(FallbackNamespace.java:71) ~[atomix-utils-0.26.1.jar:0.26.1]
	at io.atomix.storage.journal.MappedJournalSegmentWriter.append(MappedJournalSegmentWriter.java:112) ~[atomix-storage-0.26.1.jar:0.26.1]
	at io.atomix.storage.journal.SegmentedJournalWriter.append(SegmentedJournalWriter.java:54) ~[atomix-storage-0.26.1.jar:0.26.1]
	at io.atomix.storage.journal.DelegatingJournalWriter.append(DelegatingJournalWriter.java:46) ~[atomix-storage-0.26.1.jar:0.26.1]
	at io.atomix.raft.roles.PassiveRole.appendEntry(PassiveRole.java:710) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.raft.roles.PassiveRole.appendEntry(PassiveRole.java:665) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.raft.roles.PassiveRole.tryToAppend(PassiveRole.java:643) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.raft.roles.PassiveRole.appendEntries(PassiveRole.java:587) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.raft.roles.PassiveRole.handleAppend(PassiveRole.java:440) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.raft.roles.ActiveRole.onAppend(ActiveRole.java:51) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.raft.roles.FollowerRole.onAppend(FollowerRole.java:195) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.raft.impl.RaftContext.lambda$registerHandlers$14(RaftContext.java:220) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.raft.impl.RaftContext.lambda$runOnContext$21(RaftContext.java:231) ~[atomix-cluster-0.26.1.jar:0.26.1]
	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) [atomix-utils-0.26.1.jar:0.26.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
	at java.lang.Thread.run(Unknown Source) [?:?]

Log

Error 2021-03-05 18:12:07.828 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:08.961 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:10.101 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:11.336 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:11.364 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:11.369 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:11.372 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:11.403 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:11.406 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:12.652 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:14.071 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:15.524 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:16.961 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:16.967 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:16.972 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:16.975 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:16.977 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:16.980 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:17.428 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:18.863 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:20.116 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:21.239 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:21.246 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:21.250 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:21.252 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:21.254 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:23.237 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:23.949 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:25.471 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:25.904 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:25.910 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:25.913 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:25.915 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:25.936 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job." 
Error 2021-03-05 18:12:25.939 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:25.953 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:25.955 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:27.248 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:28.136 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:29.911 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:29.917 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:30.442 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:30.647 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:30.650 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:30.652 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:30.656 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:30.658 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:30.659 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:32.106 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:33.215 CET "Uncaught exception in 'io.zeebe.snapshots.broker.impl.FileBasedSnapshotStore' in phase 'STARTED'. Continuing with next job."
Error 2021-03-05 18:12:34.031 CET "RaftServer{raft-partition-partition-1} - An uncaught exception occurred, transition to inactive role"
Error 2021-03-05 18:12:37.545 CET "Raft-1 failed, marking it as unhealthy"
Info 2021-03-05 18:12:37.626 CET "Disk space available again. Current available 10878291968 bytes" 

Environment:

  • OS: Linux
  • Zeebe Version: 0.26.1
  • Configuration: Camunda Cloud
@pihme pihme added the kind/bug Categorizes an issue or PR as a bug label Mar 8, 2021
pihme (Contributor, Author) commented Mar 8, 2021

No real insight into why this happened. One thing to note is that this happened after: #6505

Not sure if these two are related, but there is a correlation between the two symptoms.

How do memory mapped files work? Could it be that we cannot write to memory because we cannot allocate the disk space that is mapped to memory?

@pihme pihme added Impact: Availability severity/mid Marks a bug as having a noticeable impact but with a known workaround and removed Status: Needs Triage labels Mar 8, 2021
npepinpe (Member) commented Mar 8, 2021

That's possible: we don't pre-allocate the file, but we do already define the mapping length. AFAIK, as dirty pages are flushed to disk, the file is grown accordingly. It could be that we map the file with a mapping length of ~128MB while less than that remains on disk, which results in us running out of disk space. However, I would expect the disk watermarks to be breached before that and prevent the disk from actually running out of space (barring any bugs). Still, it's a likely explanation.

With network storage, this can also happen if you lose access to the volume where the file was mapped - now GKE PVCs aren't common network storage like an NFS mount, so I'm not sure if that's possible there, but if so then it could be that too.

Could it also happen if we deleted the file in the meantime? That also shouldn't happen barring a bug, but it's possible. I know you can get an InternalError if you truncate the file, but I'm not sure what happens if you just plain delete it. Would you get an IOException or an InternalError? 🤔
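To illustrate the mapping-vs-allocation point above, here is a minimal, self-contained sketch (not Zeebe code; the file name and sizes are arbitrary) showing that mapping a region beyond EOF immediately grows the file's *reported* size, even though on most filesystems actual disk blocks are only allocated as dirty pages are flushed — which is why a full disk can fault a later write:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.channels.FileChannel.MapMode.READ_WRITE;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.WRITE;

public final class SparseMappingDemo {
  public static void main(final String[] args) throws IOException {
    final Path file = Files.createTempFile("sparse-mapping", ".bin");
    final long mappedLength = 8 * 1024 * 1024; // 8MB mapping, far more than we write
    try (final FileChannel channel = FileChannel.open(file, READ, WRITE)) {
      // Mapping beyond EOF grows the file to the mapped length right away...
      final MappedByteBuffer buffer = channel.map(READ_WRITE, 0, mappedLength);
      buffer.putLong(0, 1L); // ...but we only dirty a single page
      buffer.force();
      if (Files.size(file) != mappedLength) {
        throw new AssertionError("expected reported size == mapped length");
      }
      // The reported size is metadata; blocks are allocated lazily on flush.
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```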

npepinpe (Member) commented Mar 9, 2021

@MiguelPires let's have a look at the status of this cluster - if it's not recovered we should treat it as an incident, otherwise we can treat it as a normal bug. Please let me know when you find out.

MiguelPires (Contributor) commented Mar 9, 2021

Status update: this cluster is gone, but there's another one, e521ec89-4c0d-49e8-b48a-e4c9668dcd4a in ultrachaos, that exhibits the same pattern. It runs out of disk space and then the unsafe access happens. The stack trace is not very useful because the error happens asynchronously, so it points to a log message where there is no direct memory access.
I checked with Immi and he was using both clusters (along with a few others) as test clusters.

Edit: Immi offered to recreate the cluster so we can reproduce this. Currently, I'm working on a support ticket but I'm documenting it here in case someone else picks this up.

npepinpe (Member) commented:
We need to decide what we want to do here. My understanding of the problem is that the InternalError is thrown where normally we would get an IOException. What I'd like to investigate is if this is the only possible case, as catching InternalError could be problematic if it's not equivalent to IOException in some cases here.

npepinpe (Member) commented:
This does not happen if you just delete the file, as the file descriptor still points to the old inode. However, I can confirm this will happen if you:

Write to a truncated mapped file

    // IoUtil is org.agrona.IoUtil; mapNewFile creates and maps a new file of the given length
    final File f = new File("/home/nicolas/tmp/sandbox/test");
    final MappedByteBuffer buffer = IoUtil.mapNewFile(f, 1024 * 1024, true);
    buffer.position(512 * 1024).putLong(1L);
    try (final FileChannel c = FileChannel.open(f.toPath(), StandardOpenOption.WRITE)) {
      c.truncate(4096);
      c.force(true);
      buffer.putLong(2L); // faults: the pages backing this offset are gone after truncation
    } finally {
      f.delete();
    }

Read from a truncated mapped file

    final File f = new File("/home/nicolas/tmp/sandbox/test");
    final MappedByteBuffer buffer = IoUtil.mapNewFile(f, 1024 * 1024, true);
    buffer.position(512 * 1024).putLong(1L);
    try (final FileChannel c = FileChannel.open(f.toPath(), StandardOpenOption.WRITE)) {
      c.truncate(4096);
      buffer.get(5000); // faults: reads beyond the truncated length
    } finally {
      f.delete();
    }

Insufficient disk space to grow the underlying file

To simulate this, we launch a Java container with a small test file that will try to write 2MB of data using a mapped byte buffer, but we only give it a 1MB disk.

  @Rule public final TemporaryFolder temporaryFolder = new TemporaryFolder();

  @Test
  public void shouldThrowInternalError() throws IOException, InterruptedException {
    final File file = temporaryFolder.newFile("test.java");
    final GenericContainer<?> c =
        new GenericContainer<>(DockerImageName.parse("azul/zulu-openjdk-alpine:11-jre-headless"));
    final Map<String, String> volumeOptions =
        Map.of("type", "tmpfs", "device", "tmpfs", "o", "size=1m");
    final ManagedVolume v =
        ManagedVolume.newVolume(cmd -> cmd.withDriver("local").withDriverOpts(volumeOptions));
    c.withCreateContainerCmdModifier(v::attachVolumeToContainer);
    Files.write(file.toPath(), List.of(
        "import java.io.IOException;",
        "import java.nio.MappedByteBuffer;",
        "import java.nio.channels.FileChannel;",
        "import java.nio.file.Path;",
        "import static java.nio.channels.FileChannel.MapMode.READ_ONLY;",
        "import static java.nio.channels.FileChannel.MapMode.READ_WRITE;",
        "import static java.nio.file.StandardOpenOption.*;",
        "public final class Test {",
          "public static void main(final String args[]) throws IOException {",
            "MappedByteBuffer mappedByteBuffer = null;",
            "try (FileChannel channel = FileChannel.open(Path.of(\"/usr/local/zeebe/data/test\"), CREATE, READ, WRITE)) {",
              "mappedByteBuffer = channel.map(READ_WRITE, 0, 2 * 1024 * 1024);",
              "int position = 0;",
              "while (position < mappedByteBuffer.capacity()) {",
                "mappedByteBuffer.put(position, (byte)0);",
                "mappedByteBuffer.force();",
                "position += 4096;",
              "}",
            "}",
          "}",
        "}"
    ));

    c.withCopyFileToContainer(MountableFile.forHostPath(file.toPath()), "/test.java")
        .withCreateContainerCmdModifier(cmd -> cmd.withEntrypoint("java"))
        .withCommand("/test.java")
        .withLogConsumer(new Slf4jLogConsumer(LoggerFactory.getLogger("test")));
    c.start();

    // give some time for it to compile and run the code, as well as print out the error
    Thread.sleep(10_000);
    c.stop();
  }

In my opinion we should handle these cases. I can't confirm that InternalError cannot be thrown for other reasons, but we can handle it specifically around the read and write calls, as close to the source as possible, to avoid swallowing unrelated errors. The out-of-disk case should be obvious to deal with; it's a bit of a pain, though, that we can't differentiate it from the truncated-file case. Out of disk is easy to recover from: just retry when there's more disk space. With a truncated file, you basically have to go back and restart from the last correct offset. We don't truncate files, however, so at the moment the only expected truncation would come from the outside. We can assume for now that if we get this error, it's because of running out of disk space, and handle it accordingly. If we ever do truncate files (maybe to save space or something), then we'll have to handle that too, but it will require more coordination (you basically have to detect that the underlying file descriptor is invalid, remap the file, pass the new mapping to existing readers/writers in a thread-safe way, etc.).

For now, let's focus on the OOD case, and add a comment that we assume it means OOD as we don't handle truncation.
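As a sketch of what "handle it close to the source" could look like — a hypothetical helper, not Zeebe's actual API, resting on exactly the assumption discussed above (that InternalError on a mapped write signals an I/O fault such as out-of-disk):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;

public final class MappedWrites {
  // Hypothetical helper: wraps a single mapped write and rethrows the JVM's
  // InternalError as an IOException, assuming the error signals an I/O fault
  // on the backing file (most likely out of disk space).
  public static void putLong(final MappedByteBuffer buffer, final int index, final long value)
      throws IOException {
    try {
      buffer.putLong(index, value);
    } catch (final InternalError e) {
      // We assume out-of-disk here; external truncation would surface the
      // same way but is not expected, as discussed above.
      throw new IOException(
          "Fault while writing to mapped file, likely out of disk space", e);
    }
  }
}
```

Callers can then retry the IOException once disk space is available, which matches the recovery strategy proposed for the out-of-disk case.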

@deepthidevaki deepthidevaki removed this from the Journal Refactor milestone Mar 29, 2021
@npepinpe npepinpe moved this from Planned to Ready in Zeebe Apr 23, 2021
npepinpe (Member) commented:
@deepthidevaki if you have time, could you have a look at this today or next week? Let me know if you have any questions.

@npepinpe npepinpe moved this from Ready to In progress in Zeebe Apr 26, 2021
@npepinpe npepinpe moved this from In progress to Planned in Zeebe Apr 26, 2021
npepinpe (Member) commented:
After discussion with @deepthidevaki, we think pre-allocating files is most likely a better option here.

npepinpe (Member) commented:
Options for pre-allocation:

  1. JNI bindings to fallocate (e.g. via jnr-ffi). Pros: most likely the fastest. Cons: only available on Linux.
  2. Copy a pre-allocated template file. Pros: OS/FS independent; can transfer between FileChannels to avoid going through the JVM. Cons: need to pre-allocate the template file on start up (e.g. if the segment size config changed), and need to copy the file anyway.
  3. "Fill" with zeroes the way Agrona does: write a single 0 byte at every page boundary and at the end, such that you've touched all expected pages, then force the file. Pros: OS/FS independent; flexible with different segment sizes. Cons: slowest (most likely); need to flush the whole file.

I would opt for 1, with a fallback when fallocate is not available to 3, primarily because our supported environment is Linux, so the fallback would only be for development/demo use cases, not for production.
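A minimal sketch of option 3 (zero-fill) — class and method names are made up for illustration, and a 4Kb block size is assumed; this variant writes whole zero blocks rather than Agrona's single byte per page, but the effect (touching every page, then forcing) is the same:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.WRITE;

public final class ZeroFillPreallocator {
  private static final int BLOCK_SIZE = 4 * 1024;

  // Write zeroes up to the requested length so every page is touched,
  // then force, so the blocks are actually reserved on disk up front.
  public static void preallocate(final Path file, final long length) throws IOException {
    final ByteBuffer zeroes = ByteBuffer.allocate(BLOCK_SIZE);
    try (final FileChannel channel = FileChannel.open(file, CREATE, WRITE)) {
      long position = 0;
      while (position < length) {
        zeroes.clear();
        if (length - position < BLOCK_SIZE) {
          zeroes.limit((int) (length - position)); // partial final block
        }
        position += channel.write(zeroes, position);
      }
      channel.force(true); // flush data and metadata so allocation hits the disk
    }
  }
}
```

If the disk is full, the write or force fails here, at segment-creation time, instead of surfacing later as an InternalError on a mapped write.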

npepinpe (Member) commented Sep 8, 2021

Should be fixed by #7607

@npepinpe npepinpe removed this from Planned in Zeebe Oct 14, 2021
@npepinpe npepinpe added area/reliability Marks an issue as related to improving the reliability of our software (i.e. it behaves as expected) and removed Impact: Availability labels Apr 11, 2022
This was referenced Jul 5, 2022
zeebe-bors-camunda bot added a commit that referenced this issue Jul 11, 2022
9731: Preallocate segment files r=npepinpe a=npepinpe

## Description

This PR introduces segment file pre-allocation in the journal. This is on by default, but can be disabled via an experimental configuration option.

At the moment, the pre-allocation is done in a "dumb" fashion - we allocate a 4Kb block of zeroes and write it repeatedly until we've reached the expected file length. Note that this means there may be one extra block allocated on disk.

One thing to note: to verify this, we used [jnr-posix](https://github.com/jnr/jnr-posix). The reason is that we want to know the actual number of blocks reserved on disk for the file. `Files#size`, or `File#length`, returns the reported file size, which is part of the file's metadata (on UNIX systems, anyway). If you mmap a file with a size of 1Mb, write one byte, then flush it, the reported size will be 1Mb, but the actual size on disk will be a single block (on most modern UNIX systems, anyway). By using [stat](https://linux.die.net/man/2/stat), we can get the actual file size in terms of allocated 512-byte blocks, so we get a fairly accurate measurement of the disk space actually used by the file.

I would've liked to capture this in a test utility, but since `test-util` depends on `util`, there wasn't an easy way to do it, so I just copied the method in two places. One possibility I thought of is moving the whole pre-allocation logic into `journal`, since we only use it there. The only downside I can see is around discovery and cohesion, but I'd like to hear your thoughts on this.

A follow-up PR will come which will optimize the pre-allocation by using [posix_fallocate](https://man7.org/linux/man-pages/man3/posix_fallocate.3.html) on POSIX systems.

Finally, I opted for an experimental configuration option instead of a feature flag. My reasoning is that this isn't a "new" feature; rather, we want the option of disabling it (potentially for performance reasons). So it's more of an advanced option. But I'd also like to hear your thoughts here.

## Related issues

closes #6504
closes #8099
related to #7607  



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this issue Jul 13, 2022
9777: [Backports stable/8.0] Preallocate segment files r=npepinpe a=npepinpe

## Description

Backports #9731 to 8.0.x.

## Related issues

closes #6504
closes #8099



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
zeebe-bors-camunda bot added a commit that referenced this issue Jul 13, 2022
9778: [Backports stable/1.3] Preallocate segment files r=npepinpe a=npepinpe

## Description

Backports #9731 to 1.3.x.

## Related issues

closes #6504
closes #8099



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2024
* feat(feature-flagged): add batch modification footer

* feat(feature-flagged): show batch modification notification