Protecting against bomb-like inputs #129

@paulkernfeld

It is possible to use a simple streaming input to make rust-csv allocate an unbounded amount of memory while parsing a single CSV line:

extern crate csv;
extern crate serde;

#[cfg(test)]
mod tests {
    use csv::Reader;
    use std::io::repeat;

    #[test]
    fn test_csv_bomb() {
        let mut rdr = Reader::from_reader(repeat(b','));

        // This line runs forever and keeps allocating memory
        rdr.deserialize::<f64>().next();
    }
}

As a user, I expected rust-csv to know that each line is supposed to be deserialized into a single f64, and therefore to parse only a single field before returning an error.

This makes it risky to use rust-csv to parse data from untrusted streaming sources, even if the user is expecting a finite number of records where each record has a finite size. I can imagine someone writing a server that parses a CSV file as it is uploaded, which would be vulnerable to this kind of issue.

I originally thought of this while working on ndarray-csv; it's pretty easy to deal with arrays that have too many rows, but I'm not sure what to do about arrays with too many columns.
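
One possible caller-side stopgap, sketched below using only the standard library, is to cap the total number of bytes the parser can ever pull from the source with `std::io::Read::take`. The helper names `cap_input` and `drain_len` are hypothetical and are not part of rust-csv, and the 1 KiB limit is an arbitrary illustration. This limits the whole stream rather than a single record, so it is a blunt instrument and not a fix for the underlying behavior.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of rust-csv): wraps `source` so that at most
/// `max_bytes` bytes can ever be read from it. After that, reads report EOF.
fn cap_input<R: Read>(source: R, max_bytes: u64) -> io::Take<R> {
    source.take(max_bytes)
}

/// Hypothetical helper: drains a reader into memory and returns the number of
/// bytes it produced. It stands in for a parser consuming the stream.
fn drain_len<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf.len())
}

fn main() -> io::Result<()> {
    // The same endless stream of commas as in the test above, capped at 1 KiB
    // (an arbitrary limit chosen for illustration).
    let capped = cap_input(io::repeat(b','), 1024);

    // In real use, `capped` would be handed to csv::Reader::from_reader,
    // which accepts any io::Read; here it is simply drained.
    let n = drain_len(capped)?;
    println!("read {} bytes before hitting the cap", n);
    Ok(())
}
```

With a cap in place the reader sees end-of-input after `max_bytes`, so the parse should terminate instead of running forever. A legitimately large upload would be truncated at the same limit, though, and the truncation may surface as a malformed final record rather than as a distinct error.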

By the way, the performance of this is awesome! I am able to fill up about a gigabyte of memory per second on my laptop! 🤣
