
Are the size limits necessary? #11

Closed · bitonic opened this issue Jan 7, 2015 · 22 comments

bitonic (Contributor) commented Jan 7, 2015

I was about to update the library to the latest version of rustc-serialize (now with associated types), and I noticed the last commit.

Is it necessary, considering that the size limit can easily be enforced through the writer -- e.g. with a BufWriter?

bitonic (Contributor, Author) commented Jan 7, 2015

And this is not really an issue, but I thought it was the best place to discuss it :)

TyOverby (Collaborator) commented Jan 7, 2015

This is totally the best place to discuss it. I think that having size limits built into encoding and decoding is important mostly to preserve writing and reading speed.

While you can use BufWriter, I don't want people to have to go through that when they want size limits (which should be almost always).

TyOverby (Collaborator) commented Jan 7, 2015

The reason that size limits are important is that, without them, you are open to a denial-of-service attack if you aren't in complete control of both endpoints.

Imagine a game where players send a Vec of movements to the server. A hacked client could claim that the Vec contains 4,000,000,000 elements and then just flood the server with data; the server has to hold that data during deserialization, and it will run out of memory and explode.
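To make that failure mode concrete, here is a minimal sketch (in today's std::io terms, and not bincode's actual decoder) of the kind of naive length-prefixed decode that gets you into trouble:

```rust
use std::io::{self, Read};

// A naive length-prefixed decode: trust the length that came off the wire,
// allocate for it up front, then try to read that many bytes. A hostile
// peer can claim 4,000,000,000 elements, and the allocation alone can take
// the process down long before any "real" data arrives.
fn naive_decode_bytes<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 8];
    reader.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf) as usize;

    let mut data = vec![0u8; len]; // `len` is attacker-controlled
    reader.read_exact(&mut data)?;
    Ok(data)
}
```

A bounded decoder gets to reject `len` (or the running byte count) before that allocation ever happens.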

There needs to be a way for the server to bail out if it reads bad data, and the way I think is best for this project is to have limits on the max size of messages. This is why DecoderReader has max-size limits. The reason that EncoderWriter has the limit as well is so that you can catch yourself if you try to write messages that are too big, and hopefully catch that mistake earlier.

TyOverby (Collaborator) commented Jan 7, 2015

I hope that reason makes sense, and thanks for your effort in keeping the library up to date!

bitonic (Contributor, Author) commented Jan 7, 2015

Size limits are important, no doubt. But IMO the natural way to enforce them is to create a static buffer for your encoder and work with that -- e.g. with a BufWriter.

Although thinking about it, I can very well imagine a streaming reader with hard limits. So maybe it's not a bad idea :).

bitonic closed this as completed Jan 7, 2015
TyOverby (Collaborator) commented Jan 7, 2015

Yeah, most of the places where I use bincode are with TCP and UDP sockets, where I can't afford to write to a temporary buffer. Also, since you should really be using size limits in something like 90% of use cases, I think this is a better API than requiring the user to explicitly wrap the reader or writer.

bitonic (Contributor, Author) commented Jan 7, 2015

Well, I'm using it with UDP as well, and I just have a static buffer -- which is reasonably small, since UDP packets can't be big anyhow. I use it to encode and then send, and to recv and then decode, with the minimum amount of copying involved.
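As an illustration of that pattern, here is a sketch in today's std::io terms; the actual send_to/recv_from calls and the serializer are left out, since they depend on the socket setup and the encoding in use:

```rust
use std::io::{Cursor, Write};

fn main() {
    // One fixed buffer, sized for a UDP payload, reused for every packet.
    let mut buf = [0u8; 1200];

    // "Encode" into the buffer through a Cursor: writing past the end of
    // the slice fails instead of allocating, which is the hard limit.
    let mut cursor = Cursor::new(&mut buf[..]);
    cursor
        .write_all(b"pretend this is an encoded message")
        .unwrap();
    let len = cursor.position() as usize;

    // &buf[..len] is what would be handed to send_to(); the same buffer is
    // what recv_from() would fill before decoding &buf[..n].
    assert!(len <= buf.len());
}
```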

But I agree that you almost always want hard limits.

bitonic (Contributor, Author) commented Jan 8, 2015

I thought a bit more about this, and I still think it's wrong to pollute the interface and the implementation with the limits.

I think the right way to go is to define two new types:

```rust
struct BoundedReader<R> { /* ... */ }

impl<R> BoundedReader<R> {
  fn new(r: R, limit: uint) -> BoundedReader<R> { /* ... */ }
}

impl<R: Reader> Reader for BoundedReader<R> { /* ... */ }

struct BoundedWriter<W> { /* ... */ }

impl<W> BoundedWriter<W> {
  fn new(w: W, limit: uint) -> BoundedWriter<W> { /* ... */ }
}

impl<W: Writer> Writer for BoundedWriter<W> { /* ... */ }
```

And provide them to the user to achieve the hard limits. We have an abstraction that lets us do this, and we should use it instead of mixing concerns.

bitonic reopened this Jan 8, 2015
bitonic (Contributor, Author) commented Jan 8, 2015

Actually, this already exists: http://doc.rust-lang.org/std/io/util/struct.LimitReader.html . Puzzling that it exists just for Reader!

bitonic (Contributor, Author) commented Jan 8, 2015

Thinking about it a bit more, it's not that surprising: since the size of the thing to encode comes from the application itself, there is no need to bound it. In most HTTP servers or frameworks, for example, the size of the incoming body is usually bounded, but the size of the outgoing response body is simply whatever gets calculated.

That said, using LimitReader is a bit annoying because it takes the wrapped reader by value into the struct. In any case, I think the best option is to revert the bincode API to what it was, and advise the user to use LimitReader if the size of the thing to decode is a concern.

bitonic added a commit that referenced this issue Jan 8, 2015
TyOverby (Collaborator) commented Jan 8, 2015

I thought about doing this too, but LimitReader doesn't distinguish between an actual EOF and an early termination due to a limit being reached, which I think is important for error reporting. How awful would it be for a user to think that their TCP code was broken, when in fact they were just sending large messages?
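For what it's worth, the modern descendant of LimitReader, Read::take, behaves the same way: once the limit is reached, reads simply return Ok(0), which is indistinguishable from a genuine end of stream. A minimal sketch:

```rust
use std::io::{Cursor, Read};

fn main() {
    // Ten bytes of real data, but the reader is capped at four.
    let data = Cursor::new(vec![1u8; 10]);
    let mut limited = data.take(4);

    let mut buf = Vec::new();
    limited.read_to_end(&mut buf).unwrap();

    // Only four bytes come back, and further reads report "no more data"
    // (Ok(0)) -- exactly what a genuine EOF looks like, so the caller
    // cannot tell "hit the limit" apart from "stream ended".
    assert_eq!(buf.len(), 4);
}
```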

I also still think that having the limits in the basic bincode methods is a good idea.

bitonic (Contributor, Author) commented Jan 8, 2015

I don't think it's that awful -- it's the same effect you'd get if you were using a BufWriter. But I can see that having it as a decoding error is nicer.

What I really don't like about baking that functionality in is that the decoder does not need to concern itself with it, since the Reader abstraction is already capable of doing so.

In any case, do you agree that we do not need it for encoding?

bitonic (Contributor, Author) commented Jan 8, 2015

To expand on why it is not awful: in practice, a user of the decoding interface is going to have to deal with EndOfFile errors anyway, and too-big messages are just another instance of that.

If anything, having another error type forces the user to write the same code twice.

TyOverby (Collaborator) commented Jan 8, 2015

Yeah, having it as a DecoderError makes it obvious that the error is not an IoError, which would indicate to me that there is a problem with the underlying IO target (file, network, etc.).

While it would be nice just to plug in a LimitReader and not have to duplicate the functionality, using it wouldn't let us have the limit error be a DecoderError. As far as I'm concerned, having the limit be an IoError is a bad choice for the user.

> To expand on why it is not awful: in practice, a user of the decoding interface is going to have to deal with EndOfFile errors anyway, and too-big messages are just another instance of that.
> If anything, having another error type forces the user to write the same code twice.

Good, they should be writing that part twice. The IO device returning EOF and the limit being hit are two completely different errors and should mean completely different things to the user of this library.

> In any case, do you agree that we do not need it for encoding?

Having it for encoding isn't strictly necessary, but for the person doing the encoding, knowing that their messages will not hit the limits that are in place on the decoding side is a good thing ™️

bitonic (Contributor, Author) commented Jan 8, 2015

I guess it comes down to the fact that I think the simpler interface and implementation is vastly preferable to the (one) improvement you get from baking the limit in: being able to distinguish the provenance of EOFs.

As for static limits when writing, I don't think they're needed, but that's a pretty orthogonal issue.

TyOverby (Collaborator) commented Jan 8, 2015

I don't consider a simpler implementation to be a benefit in itself; if this library were 100k LOC but served its users better, I'd consider that a plus. I also don't think the difference between a 2-arg method and a 1-arg method is worth worrying about when we can make safety an easy default and get better errors to the user.

If Rust had optional arguments or function overloading, I'd use them in the writer function. As it is, I don't think that passing SizeLimit::Infinite to the writer to signify that you don't care is much of a hassle.
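As a sketch of the design being argued for (names paraphrased rather than bincode's exact signatures, which have shifted between releases), the "extra argument" amounts to this:

```rust
use std::io::Write;

// A paraphrase of the proposed shape, not bincode's exact API: every
// encode call states a limit, or explicitly opts out with Infinite.
enum SizeLimit {
    Infinite,
    Bounded(u64),
}

fn encode_into<T, W: Write>(_value: &T, _writer: &mut W, limit: SizeLimit) {
    // Serialization would go here, checking the running byte count
    // against `limit` as it writes.
    let _ = limit;
}

fn main() {
    let mut out = Vec::new();
    // Opting out is one extra, very explicit argument...
    encode_into(&42u32, &mut out, SizeLimit::Infinite);
    // ...and opting in is just as short.
    encode_into(&42u32, &mut out, SizeLimit::Bounded(1024));
}
```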

bitonic (Contributor, Author) commented Jan 8, 2015

It's not a matter of size; it's a matter of going from a code base that is blindingly obvious (which is great) to one that isn't -- one that uses subtle tricks to get the size of things, and does so in every single place instead of only in the read method, as LimitReader does.

TyOverby (Collaborator) commented Jan 8, 2015

Using the sizeof function is a subtle trick?

bitonic (Contributor, Author) commented Jan 8, 2015

Yeah, kind of: you need to make sure you're passing the right type parameter to the function (it won't complain if you don't), you're using an unsafe intrinsic, and you're invoking it at every call site.

This is opposed to a simple check in a single place, that will automatically make sure that the limit is respected: https://github.com/rust-lang/rust/blob/master/src/libstd/io/util.rs#L44
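For comparison, the single-place check being referred to looks roughly like this in today's std::io terms (a sketch in the spirit of the linked LimitReader, not the linked source verbatim):

```rust
use std::cmp;
use std::io::{self, Read};

/// A reader that refuses to hand out more than `limit` bytes in total.
struct BoundedReader<R> {
    inner: R,
    limit: usize,
}

impl<R: Read> Read for BoundedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.limit == 0 {
            // Limit exhausted: report end-of-stream.
            return Ok(0);
        }
        // The one place where the bound is enforced.
        let cap = cmp::min(self.limit, buf.len());
        let n = self.inner.read(&mut buf[..cap])?;
        self.limit -= n;
        Ok(n)
    }
}

fn main() {
    let mut r = BoundedReader { inner: &[1u8, 2, 3, 4, 5][..], limit: 3 };
    let mut out = Vec::new();
    r.read_to_end(&mut out).unwrap();
    assert_eq!(out, vec![1, 2, 3]);
}
```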

TyOverby (Collaborator) commented Jan 8, 2015

Making sure that I'm passing the right arguments to functions is basically 100% of what this library is. I've made mistakes in other parts of this lib (example1, example2), and I'm already writing a commit that fixes a bug in the size check for char reading. I'm much more worried about making an API that is easy to use correctly.

I'll change std::intrinsics::size_of to std::mem::size_of. mem::size_of is not unsafe.
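For the record, std::mem::size_of is an ordinary safe function, so the call sites lose the unsafe block entirely:

```rust
fn main() {
    // mem::size_of is safe to call; no unsafe block required.
    assert_eq!(std::mem::size_of::<u32>(), 4);
    // A Rust char is four bytes -- the size that matters for the
    // char-reading fix mentioned above.
    assert_eq!(std::mem::size_of::<char>(), 4);
}
```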

TyOverby (Collaborator) commented Jan 8, 2015

To be more clear, my main points are these:

  1. Having the size limit as part of the API of the library is a good thing because it forces users to think about their safety and make an informed decision early on. If they don't care about the sizes, then SizeLimit::Infinite isn't much typing and is also very explicit.
  2. Implementing the size limits by internally wrapping the reader in a LimitReader is bad because the user should know whether the error originates with the Reader that they passed in, or with the size limiting.

bitonic (Contributor, Author) commented Jan 9, 2015

I'm sympathetic to point 1, but it's not really related to the issue here because it does not require the size checking to be in the decoding code -- we could simply provide a utility function that wraps the reader.

Point 2 is the point of debate, and I guess we'll have to agree to disagree -- you think it's worth it, I think it isn't :).

bitonic closed this as completed Jan 9, 2015