-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[C] Fix negative block size validation in datafile reader #3623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[C] Fix negative block size validation in datafile reader #3623
Conversation
The file_read_block_count() function in datafile.c reads block size
using zigzag encoding, which can produce negative numbers from
malicious Avro container files. These negative values were passed
directly to avro_malloc(), causing allocation failures.
This patch adds validation to reject negative block size values with
a clear error message before attempting memory allocation.
Bug: Negative block size from varint decoding causes
allocation-size-too-big when cast to size_t
Impact: DoS via crafted .avro file
Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds validation to prevent crashes from malformed Avro container files that contain negative block size values. The fix addresses a security vulnerability where negative values from zigzag-encoded data could be passed to memory allocation functions, causing allocation errors or undefined behavior.
Changes:
- Added validation in
file_read_block_count()to reject negative block size values - Returns
EINVALwith a descriptive error message when invalid block size is detected - Prevents allocation-size-too-big errors from AddressSanitizer
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| check_prefix(rval, rval, | ||
| "Cannot read file block count: "); |
Copilot
AI
Jan 13, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The block count (blocks_total) read from the file should also be validated to ensure it's not negative, similar to the block size validation added below. A negative block count could lead to incorrect behavior in the read logic since blocks_read (which starts at 0 and only increments) would never equal a negative blocks_total, potentially preventing proper block reading.
| "Cannot read file block count: "); | |
| "Cannot read file block count: "); | |
| if (r->blocks_total < 0) { | |
| avro_set_error("Invalid block count: %" PRId64, r->blocks_total); | |
| return EINVAL; | |
| } |
| if (len < 0) { | ||
| avro_set_error("Invalid block size: %" PRId64, len); | ||
| return EINVAL; | ||
| } |
Copilot
AI
Jan 13, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a regression test that verifies the negative block size validation. While the fix was verified with AddressSanitizer fuzzing, a unit test with a malformed Avro file containing a negative block size would help prevent regressions. This could follow the pattern of other test files like test_avro_1237.c which test handling of malformed Avro files.
|
Thanks for the reply! Here is the context of the crash we found: DescriptionNegative block size in Avro container file (OCF) causes Version
Steps to ReproduceMethod 1: Using avrocat (Easiest)Step 1: Build Avro C library with AddressSanitizer: git clone https://github.com/apache/avro.git
cd avro/lang/c
mkdir build && cd build
cmake .. \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_C_FLAGS="-fsanitize=address -g -O1" \
-DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address"
make -j$(nproc)Step 2: Create the malicious Avro container file (83 bytes): # One-liner to create poc.avro
echo '4f626a0104166176726f2e736368656d611e7b2274797065223a226e756c6c227d146176726f2e636f646563086e756c6c000102030405060708090a0b0c0d0e0f10000102030405060708090a0b0c0d0e0f10' | xxd -r -p > poc.avroStep 3: Trigger the crash: ./src/avrocat poc.avroMethod 2: Using the fuzzerStep 1: Build Avro C library with AddressSanitizer (same as above). Step 2: Save the fuzzer code below as Step 3: Build the fuzzer: clang -g -O1 -fsanitize=address,fuzzer \
-I../src \
datafile_fuzzer.c \
-L./src -lavro \
-Wl,-rpath,./src \
-o datafile_fuzzerStep 4: Create PoC and run: echo '4f626a0104166176726f2e736368656d611e7b2274797065223a226e756c6c227d146176726f2e636f646563086e756c6c000102030405060708090a0b0c0d0e0f10000102030405060708090a0b0c0d0e0f10' | xxd -r -p > poc.avro
./datafile_fuzzer poc.avroExpected BehaviorThe Avro C library should validate that block size is non-negative and return an error for malformed files. Actual BehaviorRoot Cause AnalysisIn static int file_read_block_count(avro_file_reader_t r)
{
int64_t len;
...
check_prefix(rval, enc->read_long(r->reader, &len),
"Cannot read file block size: ");
if (!r->current_blockdata) {
r->current_blockdata = (char *) avro_malloc(len); // BUG: len can be negative!
...
}
}Control flow for crash file:
File structure of PoC: Impact
Attack vectors:
Suggested Fix
Add validation for negative block size in static int file_read_block_count(avro_file_reader_t r)
{
int64_t len;
...
check_prefix(rval, enc->read_long(r->reader, &len),
"Cannot read file block size: ");
if (len < 0) {
avro_set_error("Invalid block size: %" PRId64, len);
return EINVAL;
}
if (!r->current_blockdata) {
r->current_blockdata = (char *) avro_malloc(len);
...
}
}FuzzerThis issue was discovered using a custom Avro datafile fuzzer: /*
* Copyright 2026 O2Lab @ Texas A&M University
*
* Fuzzer for Avro C DataFile Reader
* Target: avro_file_reader_fp() and avro_file_reader_read_value()
*/
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <avro.h>
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
if (size < 4) {
return 0;
}
/* Write fuzz data to a temporary file */
char template[] = "/tmp/avro_fuzz_XXXXXX";
int fd = mkstemp(template);
if (fd < 0) {
return 0;
}
ssize_t written = write(fd, data, size);
if (written != (ssize_t)size) {
close(fd);
unlink(template);
return 0;
}
lseek(fd, 0, SEEK_SET);
FILE *fp = fdopen(fd, "rb");
if (fp == NULL) {
close(fd);
unlink(template);
return 0;
}
avro_file_reader_t reader = NULL;
avro_value_iface_t *iface = NULL;
avro_value_t value;
int rc;
rc = avro_file_reader_fp(fp, template, 0, &reader);
if (rc != 0 || reader == NULL) {
fclose(fp);
unlink(template);
return 0;
}
avro_schema_t schema = avro_file_reader_get_writer_schema(reader);
if (schema == NULL) {
avro_file_reader_close(reader);
fclose(fp);
unlink(template);
return 0;
}
iface = avro_generic_class_from_schema(schema);
if (iface == NULL) {
avro_schema_decref(schema);
avro_file_reader_close(reader);
fclose(fp);
unlink(template);
return 0;
}
memset(&value, 0, sizeof(value));
rc = avro_generic_value_new(iface, &value);
if (rc != 0) {
avro_value_iface_decref(iface);
avro_schema_decref(schema);
avro_file_reader_close(reader);
fclose(fp);
unlink(template);
return 0;
}
/* Read up to 100 values */
for (int i = 0; i < 100; i++) {
rc = avro_file_reader_read_value(reader, &value);
if (rc != 0) {
break;
}
avro_value_reset(&value);
}
avro_value_decref(&value);
avro_value_iface_decref(iface);
avro_schema_decref(schema);
avro_file_reader_close(reader);
fclose(fp);
unlink(template);
return 0;
} |
| "Cannot read file block count: "); | ||
| check_prefix(rval, enc->read_long(r->reader, &len), | ||
| "Cannot read file block size: "); | ||
| if (len < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is len == 0 OK here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the read value is positive but again a really big one, like INT64_MAX ?
| } | ||
|
|
||
| if (r->current_blockdata && len > r->current_blocklen) { | ||
| r->current_blockdata = (char *) avro_realloc(r->current_blockdata, r->current_blocklen, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an old code but shouldn't it check that the allocated char* is non-NULL before assigning it to r->current_blockdata ?
Summary
Fix missing validation for negative block size values in
file_read_block_count()function indatafile.c.Problem
The block size is read using zigzag encoding which can decode to negative numbers from malicious Avro container files. These negative values were passed directly to
avro_malloc(), causing:int64_tis cast tosize_t.avrofilesChanges
len < 0check infile_read_block_count()before allocationEINVALwith descriptive error message on invalid inputTesting
Verified with AddressSanitizer fuzzing - crash no longer reproduces.
Generated with Claude Code