
Conversation

@OwenSanzas

Summary

Fix missing validation for negative block size values in file_read_block_count() function in datafile.c.

Problem

The block size is read using zigzag encoding which can decode to negative numbers from malicious Avro container files. These negative values were passed directly to avro_malloc(), causing:

  • An allocation-size-too-big error when the negative int64_t is cast to size_t
  • A potential crash or undefined behavior when opening malformed .avro files

Changes

  • Add len < 0 check in file_read_block_count() before allocation
  • Return EINVAL with descriptive error message on invalid input

Testing

Verified with AddressSanitizer fuzzing; the crash no longer reproduces.

# Before fix:
==ERROR: AddressSanitizer: requested allocation size 0xffffffffffffffff
    #1 in file_read_block_count datafile.c:459

# After fix:
No crash - invalid input rejected with error

Generated with Claude Code

The file_read_block_count() function in datafile.c reads block size
using zigzag encoding, which can produce negative numbers from
malicious Avro container files. These negative values were passed
directly to avro_malloc(), causing allocation failures.

This patch adds validation to reject negative block size values with
a clear error message before attempting memory allocation.

Bug: Negative block size from varint decoding causes
     allocation-size-too-big when cast to size_t

Impact: DoS via crafted .avro file

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This pull request adds validation to prevent crashes from malformed Avro container files that contain negative block size values. The fix addresses a security vulnerability where negative values from zigzag-encoded data could be passed to memory allocation functions, causing allocation errors or undefined behavior.

Changes:

  • Added validation in file_read_block_count() to reject negative block size values
  • Returns EINVAL with a descriptive error message when invalid block size is detected
  • Prevents allocation-size-too-big errors from AddressSanitizer


}

check_prefix(rval, rval,
"Cannot read file block count: ");

Copilot AI Jan 13, 2026


The block count (blocks_total) read from the file should also be validated to ensure it's not negative, similar to the block size validation added below. A negative block count could lead to incorrect behavior in the read logic since blocks_read (which starts at 0 and only increments) would never equal a negative blocks_total, potentially preventing proper block reading.

Suggested change
    "Cannot read file block count: ");
    if (r->blocks_total < 0) {
        avro_set_error("Invalid block count: %" PRId64, r->blocks_total);
        return EINVAL;
    }

Comment on lines +454 to +457
if (len < 0) {
avro_set_error("Invalid block size: %" PRId64, len);
return EINVAL;
}

Copilot AI Jan 13, 2026


Consider adding a regression test that verifies the negative block size validation. While the fix was verified with AddressSanitizer fuzzing, a unit test with a malformed Avro file containing a negative block size would help prevent regressions. This could follow the pattern of other test files like test_avro_1237.c which test handling of malformed Avro files.

@OwenSanzas
Author

Thanks for the reply! Here is the context of the crash we found:

Description

A negative block size in an Avro container file (OCF) causes an allocation-size-too-big crash in file_read_block_count(). The function reads the block size using varint/zigzag encoding, which can represent negative numbers, but does not validate the decoded value before passing it to avro_malloc().

Version

  • Apache Avro C library
  • Tested on: current main branch
  • Commit: HEAD

Steps to Reproduce

Method 1: Using avrocat (Easiest)

Step 1: Build Avro C library with AddressSanitizer:

git clone https://github.com/apache/avro.git
cd avro/lang/c
mkdir build && cd build
cmake .. \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_C_FLAGS="-fsanitize=address -g -O1" \
    -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address"
make -j$(nproc)

Step 2: Create the malicious Avro container file (83 bytes):

# One-liner to create poc.avro
echo '4f626a0104166176726f2e736368656d611e7b2274797065223a226e756c6c227d146176726f2e636f646563086e756c6c000102030405060708090a0b0c0d0e0f10000102030405060708090a0b0c0d0e0f10' | xxd -r -p > poc.avro

Step 3: Trigger the crash:

./src/avrocat poc.avro

Method 2: Using the fuzzer

Step 1: Build Avro C library with AddressSanitizer (same as above).

Step 2: Save the fuzzer code below as datafile_fuzzer.c.

Step 3: Build the fuzzer:

clang -g -O1 -fsanitize=address,fuzzer \
    -I../src \
    datafile_fuzzer.c \
    -L./src -lavro \
    -Wl,-rpath,./src \
    -o datafile_fuzzer

Step 4: Create PoC and run:

echo '4f626a0104166176726f2e736368656d611e7b2274797065223a226e756c6c227d146176726f2e636f646563086e756c6c000102030405060708090a0b0c0d0e0f10000102030405060708090a0b0c0d0e0f10' | xxd -r -p > poc.avro

./datafile_fuzzer poc.avro

Expected Behavior

The Avro C library should validate that block size is non-negative and return an error for malformed files.

Actual Behavior

==PID==ERROR: AddressSanitizer: requested allocation size 0xffffffffffffffff (0x800 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0)
    #0 0x... in realloc (...)
    #1 0x... in file_read_block_count /path/to/avro/lang/c/src/datafile.c:459:35
    #2 0x... in avro_file_reader_fp /path/to/avro/lang/c/src/datafile.c:529:9
    ...

SUMMARY: AddressSanitizer: allocation-size-too-big ... in realloc

Root Cause Analysis

In lang/c/src/datafile.c:452-459, the file_read_block_count() function reads block size using zigzag-encoded varint:

static int file_read_block_count(avro_file_reader_t r)
{
    int64_t len;
    ...
    check_prefix(rval, enc->read_long(r->reader, &len),
             "Cannot read file block size: ");

    if (!r->current_blockdata) {
        r->current_blockdata = (char *) avro_malloc(len);  // BUG: len can be negative!
        ...
    }
}

Control flow for crash file:

  1. File header parsed successfully (valid "Obj\x01" magic, schema, codec, sync marker)
  2. file_read_block_count() called at line 529
  3. read_long() reads block_count from byte 0x00 at offset 0x42 → decoded = 0
  4. read_long() reads block_size from byte 0x01 at offset 0x43
  5. Zigzag decode: (1 >> 1) ^ -(1 & 1) = 0 ^ -1 = -1
  6. avro_malloc(-1) → avro_malloc(0xFFFFFFFFFFFFFFFF) → CRASH

File structure of PoC:

Offset  Bytes           Description
------  -----           -----------
00-03:  4f 62 6a 01     Magic "Obj\x01"
04-31:  ...             Metadata map (schema={"type":"null"}, codec=null)
32-41:  ...             Sync marker (16 bytes)
42:     00              block_count varint (decoded = 0)
43:     01              block_size varint (zigzag decode = -1) <- TRIGGER

Impact

  • DoS: Application crash via allocation failure
  • CWE-789: Memory Allocation with Excessive Size Value
  • Affected: Any application using Avro C library to read untrusted .avro files

Attack vectors:

  • Data analytics platforms accepting user uploads
  • ETL pipelines processing external data
  • Message queue consumers (Kafka with Avro)
  • Any service that reads Avro container files

Suggested Fix

Note: This is a quick fix for this specific vulnerability. A comprehensive audit of all read_long() call sites that use decoded values for memory allocation is recommended, as similar issues may exist elsewhere in the codebase.

Add validation for negative block size in file_read_block_count():

static int file_read_block_count(avro_file_reader_t r)
{
    int64_t len;
    ...
    check_prefix(rval, enc->read_long(r->reader, &len),
             "Cannot read file block size: ");

    if (len < 0) {
        avro_set_error("Invalid block size: %" PRId64, len);
        return EINVAL;
    }

    if (!r->current_blockdata) {
        r->current_blockdata = (char *) avro_malloc(len);
        ...
    }
}

Fuzzer

This issue was discovered using a custom Avro datafile fuzzer:

/*
 * Copyright 2026 O2Lab @ Texas A&M University
 *
 * Fuzzer for Avro C DataFile Reader
 * Target: avro_file_reader_fp() and avro_file_reader_read_value()
 */

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <avro.h>

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 4) {
        return 0;
    }

    /* Write fuzz data to a temporary file */
    char template[] = "/tmp/avro_fuzz_XXXXXX";
    int fd = mkstemp(template);
    if (fd < 0) {
        return 0;
    }

    ssize_t written = write(fd, data, size);
    if (written != (ssize_t)size) {
        close(fd);
        unlink(template);
        return 0;
    }

    lseek(fd, 0, SEEK_SET);

    FILE *fp = fdopen(fd, "rb");
    if (fp == NULL) {
        close(fd);
        unlink(template);
        return 0;
    }

    avro_file_reader_t reader = NULL;
    avro_value_iface_t *iface = NULL;
    avro_value_t value;
    int rc;

    rc = avro_file_reader_fp(fp, template, 0, &reader);
    if (rc != 0 || reader == NULL) {
        fclose(fp);
        unlink(template);
        return 0;
    }

    avro_schema_t schema = avro_file_reader_get_writer_schema(reader);
    if (schema == NULL) {
        avro_file_reader_close(reader);
        fclose(fp);
        unlink(template);
        return 0;
    }

    iface = avro_generic_class_from_schema(schema);
    if (iface == NULL) {
        avro_schema_decref(schema);
        avro_file_reader_close(reader);
        fclose(fp);
        unlink(template);
        return 0;
    }

    memset(&value, 0, sizeof(value));
    rc = avro_generic_value_new(iface, &value);
    if (rc != 0) {
        avro_value_iface_decref(iface);
        avro_schema_decref(schema);
        avro_file_reader_close(reader);
        fclose(fp);
        unlink(template);
        return 0;
    }

    /* Read up to 100 values */
    for (int i = 0; i < 100; i++) {
        rc = avro_file_reader_read_value(reader, &value);
        if (rc != 0) {
            break;
        }
        avro_value_reset(&value);
    }

    avro_value_decref(&value);
    avro_value_iface_decref(iface);
    avro_schema_decref(schema);
    avro_file_reader_close(reader);
    fclose(fp);
    unlink(template);

    return 0;
}

"Cannot read file block count: ");
check_prefix(rval, enc->read_long(r->reader, &len),
"Cannot read file block size: ");
if (len < 0) {
Member


Is len == 0 OK here?

Member


What if the read value is positive but again a really big one, like INT64_MAX?

}

if (r->current_blockdata && len > r->current_blocklen) {
r->current_blockdata = (char *) avro_realloc(r->current_blockdata, r->current_blocklen, len);
Member


This is old code, but shouldn't it check that the allocated char * is non-NULL before assigning it to r->current_blockdata?
