InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code #6504
No real insight into why this happened. One thing to note is that this happened after #6505. Not sure if these two are related, but there is a correlation between the two symptoms. How do memory-mapped files work? Could it be that we cannot write to memory because we cannot allocate the disk space that is mapped to memory?
That's possible - we don't pre-allocate the file, but we already define the mapping length. AFAIK, as dirty pages are flushed to disk, the file is then grown accordingly. It could be that we map the file with a mapping length of ~128MB with less than that left on disk, which results in us running out of disk space. However, I would expect the disk watermarks to be breached before that and prevent the disk from really running out of space (barring any bugs). Still, it's a likely explanation.

With network storage, this can also happen if you lose access to the volume where the file was mapped - now, GKE PVCs aren't common network storage like an NFS mount, so I'm not sure if that's possible there, but if so then it could be that too.

Could it also happen if we deleted the file in the meantime? That also shouldn't happen barring a bug, but it could be that. I know you can get an InternalError if you truncate the file, but I'm not sure what happens if you just plain delete it. Would you get an IOException or an InternalError? 🤔
@MiguelPires let's have a look at the status of this cluster - if it's not recovered, we should treat it as an incident; otherwise we can treat it as a normal bug. Please let me know when you find out.
Status update: this cluster is gone, but there's another one.

Edit: Immi offered to recreate the cluster so we can reproduce this. Currently, I'm working on a support ticket, but I'm documenting it here in case someone else picks this up.
We need to decide what we want to do here. My understanding of the problem is that the InternalError is thrown where normally we would get an IOException. What I'd like to investigate is whether this is the only possible case, as catching InternalError could be problematic if it's not equivalent to an IOException in some cases.
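To make the trade-off concrete, here is a hypothetical sketch (the class, method, and message are my own, not from the codebase) of what "catching InternalError and translating it to an IOException" would look like around a mapped write. Note the caveat from above baked into the comment: we would be assuming the error means out of disk space, which is not guaranteed.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;

/** Hypothetical helper; names and semantics are illustrative assumptions. */
final class MappedWrites {
  /**
   * Writes a byte at the given position, translating the JVM's InternalError
   * (thrown on a failed unsafe memory access in compiled code) into a checked
   * IOException. Caveat: InternalError does not necessarily mean out of disk
   * space - truncation of the mapped file produces the same error.
   */
  static void putByte(final MappedByteBuffer buffer, final int position, final byte value)
      throws IOException {
    try {
      buffer.put(position, value);
    } catch (final InternalError e) {
      // assumption: treat the failure as out of disk space; truncation is not handled
      throw new IOException("Mapped write failed; assuming out of disk space", e);
    }
  }
}
```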
This does not happen if you just delete the file, as the file descriptor still points to the old inode. However, I can confirm this will happen if you:

**Write to a truncated mapped file**

```java
final File f = new File("/home/nicolas/tmp/sandbox/test");
final MappedByteBuffer buffer = IoUtil.mapNewFile(f, 1024 * 1024, true);
buffer.position(512 * 1024).putLong(1L);

try (final FileChannel c = FileChannel.open(f.toPath(), StandardOpenOption.WRITE)) {
  c.truncate(4096);
  c.force(true);
  buffer.putLong(2L);
} finally {
  f.delete();
}
```

**Read from a truncated mapped file**

```java
final File f = new File("/home/nicolas/tmp/sandbox/test");
final MappedByteBuffer buffer = IoUtil.mapNewFile(f, 1024 * 1024, true);
buffer.position(512 * 1024).putLong(1L);

try (final FileChannel c = FileChannel.open(f.toPath(), StandardOpenOption.WRITE)) {
  c.truncate(4096);
  buffer.get(5000);
} finally {
  f.delete();
}
```

**Insufficient disk space to grow the underlying file**

To simulate this, we launch a Java container with a small test file that will try to write 2MB of data using a mapped byte buffer, but we only give it a 1MB disk.

```java
@Rule public final TemporaryFolder temporaryFolder = new TemporaryFolder();

@Test
public void shouldThrowInternalError() throws IOException, InterruptedException {
  final File file = temporaryFolder.newFile("test.java");
  final GenericContainer<?> c =
      new GenericContainer<>(DockerImageName.parse("azul/zulu-openjdk-alpine:11-jre-headless"));
  final Map<String, String> volumeOptions =
      Map.of("type", "tmpfs", "device", "tmpfs", "o", "size=1m");
  final ManagedVolume v =
      ManagedVolume.newVolume(cmd -> cmd.withDriver("local").withDriverOpts(volumeOptions));
  c.withCreateContainerCmdModifier(v::attachVolumeToContainer);

  Files.write(
      file.toPath(),
      List.of(
          "import java.io.IOException;",
          "import java.nio.MappedByteBuffer;",
          "import java.nio.channels.FileChannel;",
          "import java.nio.file.Path;",
          "import static java.nio.channels.FileChannel.MapMode.READ_ONLY;",
          "import static java.nio.channels.FileChannel.MapMode.READ_WRITE;",
          "import static java.nio.file.StandardOpenOption.*;",
          "public final class Test {",
          "public static void main(final String args[]) throws IOException {",
          "MappedByteBuffer mappedByteBuffer = null;",
          "try (FileChannel channel = FileChannel.open(Path.of(\"/usr/local/zeebe/data/test\"), CREATE, READ, WRITE)) {",
          "mappedByteBuffer = channel.map(READ_WRITE, 0, 2 * 1024 * 1024);",
          "int position = 0;",
          "while (position < mappedByteBuffer.capacity()) {",
          "mappedByteBuffer.put(position, (byte)0);",
          "mappedByteBuffer.force();",
          "position += 4096;",
          "}",
          "}",
          "}",
          "}"));

  c.withCopyFileToContainer(MountableFile.forHostPath(file.toPath()), "/test.java")
      .withCreateContainerCmdModifier(cmd -> cmd.withEntrypoint("java"))
      .withCommand("/test.java")
      .withLogConsumer(new Slf4jLogConsumer(LoggerFactory.getLogger("test")));
  c.start();

  // give some time for it to compile and run the code, as well as print out the error
  Thread.sleep(10_000);
  c.stop();
}
```

In my opinion we should handle these cases. I can't confirm that

For now, let's focus on the OOD (out-of-disk) case, and add a comment that we assume it means OOD as we don't handle truncation.
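As a complement to the truncation experiments above, the deletion case ("the file descriptor still points to the old inode") can be checked with a minimal, self-contained sketch. This is my own illustration, not from the codebase, and it assumes Linux semantics, where an unlinked file's inode stays alive while a mapping still references it:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DeleteWhileMapped {
  /** Maps a temp file, deletes it, then writes and reads through the mapping. */
  static long writeAfterDelete() throws Exception {
    final Path file = Files.createTempFile("mapped", ".bin");
    final MappedByteBuffer buffer;
    try (final FileChannel channel =
        FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024 * 1024);
    }
    // unlink while mapped: on Linux the inode survives until the mapping is gone
    Files.delete(file);
    // no InternalError here - the pages are still backed by the (unlinked) inode
    buffer.putLong(0, 1L);
    return buffer.getLong(0);
  }
}
```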
@deepthidevaki if you have time, could you have a look at this today or next week? Let me know if you have any questions.
After discussion with @deepthidevaki, we think pre-allocating files is most likely a better option here.
Options for pre-allocation:
I would opt for 1, with a fallback when
Should be fixed by #7607
9731: Preallocate segment files r=npepinpe a=npepinpe

## Description

This PR introduces segment file pre-allocation in the journal. This is on by default, but can be disabled via an experimental configuration option. At the moment, the pre-allocation is done in a "dumb" fashion - we allocate 4Kb blocks of zeroes and write these until we've reached the expected file length. Note that this means there may be one extra block allocated on disk.

One thing to note: to verify this, we used [jnr-posix](https://github.com/jnr/jnr-posix). The reason behind this is that we want to know the actual number of blocks on disk reserved for this file. `Files#size`, or `File#length`, returns the reported file size, which is part of the file's metadata (on UNIX systems anyway). If you mmap a file with a size of 1Mb, write one byte, then flush it, the reported size will be 1Mb, but the actual size on disk will be a single block (on most modern UNIX systems anyway). By using [stat](https://linux.die.net/man/2/stat), we can get the actual file size in terms of 512-byte allocated blocks, so we get a pretty accurate measurement of the actual disk space used by the file.

I would've liked to capture this in a test utility, but since `test-util` depends on `util`, there wasn't an easy way to do this, so I just copied the method in two places. One possibility I thought of is moving the whole pre-allocation stuff into `journal`, since we only use it there. The only downside I can see there is about discovery and cohesion, but I'd like to hear your thoughts on this.

A follow-up PR will come which will optimize the pre-allocation by using [posix_fallocate](https://man7.org/linux/man-pages/man3/posix_fallocate.3.html) on POSIX systems.

Finally, I opted for an experimental configuration option instead of a feature flag. My reasoning is that it isn't a "new" feature, but instead we want the option of disabling this (for performance reasons, potentially), so it's more of an advanced option. But I'd also like to hear your thoughts here.

## Related issues

closes #6504
closes #8099
related to #7607

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
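The "dumb" pre-allocation described above can be sketched in plain Java roughly as follows. The class and method names are my own for illustration, not the actual journal implementation, and like the PR it may allocate one extra block when the length isn't a multiple of the block size:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative sketch of block-wise pre-allocation; not the actual journal code. */
final class Preallocation {
  private static final int BLOCK_SIZE = 4 * 1024;

  /** Writes zeroed 4Kb blocks until the file has at least {@code length} bytes. */
  static void preallocate(final Path file, final long length) throws IOException {
    final ByteBuffer zeros = ByteBuffer.allocate(BLOCK_SIZE);
    try (final FileChannel channel =
        FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      long written = 0;
      while (written < length) {
        zeros.clear();
        // positional write of a full zero block; may overshoot by up to one block
        written += channel.write(zeros, written);
      }
      // flush so the blocks are actually reserved on disk, not just in the page cache
      channel.force(true);
    }
  }
}
```

As the PR notes, `Files#size` only reflects the reported size here; measuring the truly reserved blocks requires `stat`'s `st_blocks` (e.g. via jnr-posix), which plain Java does not expose.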
Describe the bug
Observed in logs: https://console.cloud.google.com/errors/CMHr6Oj_nM3WDQ?service=zeebe&time=P7D&refresh=off&project=camunda-cloud-240911
Log/Stacktrace
Full Stacktrace
Log
Environment: