-
-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protecting against bomb-like inputs #129
Comments
Interesting bug report! Unfortunately, I don't think I agree with your framing/analysis of this issue. Firstly, without setting some kind of limit, I don't think you have any reasonable expectation that something consuming an unbounded input stream would itself not also use unbounded resources, unless it's explicitly documented. Secondly, the framing of this issue is somewhat off. Namely, if you really want to feed untrusted input data to this library, then this case is just the tip of the iceberg. I note that this program will not terminate either, and there is no reasonable expectation for the deserializer to "know" there is only one field because it never gets done parsing a single field: extern crate csv;
extern crate serde;
use std::io;
use csv::Reader;
fn main() {
let mut rdr = Reader::from_reader(io::repeat(b'0'));
rdr.deserialize::<f64>().next();
} A csv library that can execute safely on untrusted streams of data (including unbounded streams) would have a fairly different design IMO. I'm not keen on bolting something like that on to this library. In particular, how do other CSV libraries deal with untrusted data, if at all? If you want to use this library on untrusted data that can be unbounded and you don't want to use memory proportional to that stream, then I think you should be able to write your own |
Okay, that makes sense! I like your I have not done research on untrusted CSV data; I'm guessing that most CSV libraries don't even try to deal with untrusted data. Your suggestion of wrapping Would it be possible to describe what kinds of inputs are and are not safe to parse with rust-csv? E.g. is it safe to accept untrusted inputs if the line length is limited and you parse a limited number of lines? |
I do think it might be worthwhile to document this. Before doing so, I think it would be worth an audit of the code, but I am fairly certain that if you limit both the line length and the number of lines, then you should be guaranteed to only use resources proportional to the size of the limits. That is, there shouldn't be an quadratic blow up (or worse) in parsing time or memory use. |
It is possible to use a simple streaming input to make rust-csv allocate an infinite amount of memory parsing a single CSV line:
As a user, I had expected that rust-csv would know that each line is supposed to be deserialized into a single
f64
and therefore only parse a single field before returning an error.This would make it risky to use rust-csv to parse data from untrusted streaming sources, even if the user is expecting a finite number of records where each record has a finite size. I can imagine someone writing a server that parses a CSV as it is uploaded, which would be vulnerable to this kind of issue.
I originally thought of this while working on ndarray-csv; it's pretty easy to deal with arrays that have too many rows, but I'm not sure what to do about arrays with too many columns.
By the way, the performance of this is awesome! I am able to fill up about a gigabyte of memory per second on my laptop! 🤣
The text was updated successfully, but these errors were encountered: