
arrow-csv: integer overflow panic in Reader::records::flush #9885


Description

@youichi-uda


The cargo-fuzz csv_reader harness being prototyped for #5332 found a clean panic in arrow-csv after a few minutes of running against main HEAD. Filing it here because it's a real bug surfaced by the proposed fuzzing infrastructure.

Repro

72 bytes:

0000  2e 22 3f 0a 31 0a 3f 3f  0a 3c 50 50 0a 3f 0a 31  |."?.1.??.<PP.?.1|
0010  0a 3f 38 0a 3c 0a 3f 0a  3c 50 50 0a 3f 0a 31 0a  |.?8.<.?.<PP.?.1.|
0020  3f 38 0a 0a 2e 22 3f 0a  31 0a 3f 3f 0a ce ce ce  |?8..."?.1.??....|
0030  b1 ce ce ce ce ce ce ce  ce 31 0a 3f 38 0a 3c 0a  |.........1.?8.<.|
0040  3f 0a 3c 0a 3f 0a 3f 69                            |?.<.?.?i|

(The same input feeds straight into ReaderBuilder::new(Arc::new(schema)).with_format(format).build(Cursor::new(data)), iterating the resulting batches; the panic happens inside the reader's flush, not in client code.)
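For reference, a minimal sketch of that driver, assuming the harness infers the schema via Format::infer_schema (the exact wiring lives in the harness on my fork; the function name here is illustrative):

use std::io::Cursor;
use std::sync::Arc;

use arrow_csv::reader::Format;
use arrow_csv::ReaderBuilder;

fn repro(data: &[u8]) {
    let format = Format::default().with_header(false);
    // Infer a schema from the same bytes; an inference failure is not
    // the bug under test, so just bail out.
    let (schema, _) = match format.infer_schema(Cursor::new(data), None) {
        Ok(v) => v,
        Err(_) => return,
    };
    let reader = ReaderBuilder::new(Arc::new(schema))
        .with_format(format)
        .build(Cursor::new(data))
        .expect("reader should build");
    // Draining the iterator reaches Reader::next(), which calls into
    // the records flush -- the overflow panics there.
    for batch in reader {
        let _ = batch;
    }
}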

Panic

thread '<unnamed>' panicked at arrow-csv/src/reader/records.rs:207:21:
attempt to add with overflow
   …
   3: <closure> at arrow-csv/src/reader/records.rs:207:21
   4: for_each over chunks_exact_mut row offsets
   5: <closure> at arrow-csv/src/reader/records.rs:206:32
   6: …
   9: arrow_csv::reader::records::*::flush
              at arrow-csv/src/reader/records.rs:204:14

The panic site is the row.iter_mut().for_each(|x| { *x += offset; row_offset = *x; }) accumulation:

// arrow-csv/src/reader/records.rs:201-210 (approx)
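// Each row's offsets are shifted by the running total (row_offset),
// turning per-row-relative offsets into absolute buffer positions.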
let mut row_offset = 0;
self.offsets[1..self.offsets_len]
    .chunks_exact_mut(self.num_columns)
    .for_each(|row| {
        let offset = row_offset;
        row.iter_mut().for_each(|x| {
            *x += offset;          // <-- panic here on overflow
            row_offset = *x;
        });
    });

*x and offset are both usize. With pathological CSV input the accumulated offsets grow without bound, and *x + offset eventually exceeds usize::MAX. The crash is reachable from any caller of Reader::next() on attacker-controlled bytes, so this is a denial-of-service surface for any service that accepts CSV through arrow-csv.

Suggested fix sketch

Either:

  • Bound the cumulative offset by the actual data length: no emitted offset can legitimately exceed self.data_len, so an explicit cap prior to the addition turns the overflow into a clean ParseError.
  • Use checked_add on the accumulator and surface the overflow as a ParseError("CSV record offsets overflowed") instead of panicking; in release builds (where overflow checks are off by default) the addition would otherwise wrap silently and produce bogus offsets. See the sketch after this list.
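
A minimal sketch of the checked_add variant, written against the approximate snippet above (self.offsets, self.offsets_len, and self.num_columns are copied from that snippet, and it assumes flush already returns Result<_, ArrowError>; the error text is just a suggestion):

let mut row_offset = 0usize;
for row in self.offsets[1..self.offsets_len].chunks_exact_mut(self.num_columns) {
    let offset = row_offset;
    for x in row.iter_mut() {
        // checked_add turns the wrap into a recoverable error instead of
        // a debug-mode panic or a release-mode silent wrap.
        *x = x.checked_add(offset).ok_or_else(|| {
            ArrowError::ParseError("CSV record offsets overflowed".to_string())
        })?;
        row_offset = *x;
    }
}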

I haven't sent a PR for this since I have two related parquet allocation-bound PRs (#9883, #9884) already in flight and didn't want to flood the queue. Happy to send a fix as a follow-up once those land if no one else picks it up first.

The 72-byte repro and the arrow-csv/fuzz harness that produced it live on my fork at fuzz/initial-harnesses (per #5332). The harness reproduces this in single-digit minutes from an empty corpus.

xref #5332 #9883 #9884
