
Report consumed amount of bytes in on_parse_complete (or limit the JSON max size) #47

Open
ibc opened this issue Mar 4, 2011 · 4 comments


ibc commented Mar 4, 2011

I'm coding a JSON-RPC server in EventMachine using yajl-ruby. It already works (similar to the example provided in the README documentation). However, there is something I cannot figure out how to achieve: avoiding DoS attacks.

That is, imagine an attacker initially sends:

{ "method": "attack

This is processed in EM's receive_data(data), in which we run @parser << data.
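
For reference, the handler looks roughly like the EventMachine example in the README; this is only a minimal sketch, and JSONRPCServer, object_parsed and the port are illustrative names, not part of yajl-ruby:

require 'eventmachine'
require 'yajl'

# Minimal sketch of the server described above. Only Yajl::Parser,
# on_parse_complete and << are real yajl-ruby API here.
module JSONRPCServer
  def post_init
    @parser = Yajl::Parser.new
    @parser.on_parse_complete = method(:object_parsed)
  end

  def object_parsed(obj)
    # dispatch the JSON-RPC request here
  end

  def receive_data(data)
    @parser << data   # every chunk the client sends ends up here
  end
end

EM.run { EM.start_server('0.0.0.0', 8080, JSONRPCServer) }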

After a while (using the same connection) the attacker sends:

lalalalala

It's again processed and appended to the parser. The attacker keeps sending lalala continually (in different TCP packets), which would cause the server's memory to grow without bound.

A half-solution could be counting data.bytesize and closing the connection if it reaches some limit. But this is the wrong solution: if the same TCP data contains a complete, valid JSON document followed by valid but incomplete JSON data, we have no way to know when to reset the data size counter.

Two possible solutions I suggest:

Option :maxsize in Yajl::Parser:

@parser = Yajl::Parser.new(:maxsize => 5000)

If the accumulated buffer in the Parser object exceeds that limit, the parser would raise an exception such as Yajl::MaxSizeExceeded.
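
A hypothetical sketch of how that could look in the handler (neither :maxsize nor Yajl::MaxSizeExceeded exists today; this is only the proposed behaviour):

def post_init
  @parser = Yajl::Parser.new(:maxsize => 5000)   # proposed option, not yet implemented
  @parser.on_parse_complete = method(:object_parsed)
end

def receive_data(data)
  @parser << data
rescue Yajl::MaxSizeExceeded                     # proposed exception, not yet implemented
  close_connection                               # drop clients that keep an object open past the limit
end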

Report consumed bytes when calling the on_parse_complete method:

@parser.on_parse_complete = method(:data_parsed)

def data_parsed(obj, consumed_bytes)
  [...]
end

Having this value, we can subtract it from the data size counter above, so the counter would hold just the size of the incomplete data the parser still contains. Of course, the first solution seems much easier.
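
A hypothetical sketch of this second option (the consumed_bytes argument does not exist in yajl-ruby today, and MAX_PENDING is just an illustrative constant):

MAX_PENDING = 5000   # illustrative limit on buffered-but-incomplete bytes

def post_init
  @parser  = Yajl::Parser.new
  @parser.on_parse_complete = method(:data_parsed)
  @pending = 0
end

def receive_data(data)
  @pending += data.bytesize
  return close_connection if @pending > MAX_PENDING
  @parser << data
end

def data_parsed(obj, consumed_bytes)   # second argument is the proposed addition
  @pending -= consumed_bytes           # only the still-incomplete tail stays counted
  # handle obj ...
end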

@brianmario
Owner

Yajl doesn't actually buffer anything while parsing; it just fires callbacks with pointers into the actual stream being parsed. Once the entire length of the input buffer has been parsed, it just waits for the next round of parsing (keeping track of its parsing state), but none of the actual input buffer is stored anywhere inside Yajl itself.
EventMachine will give you packets that are a max of 16KB at a time (hard-coded in EM itself), so after each round of parsing the Ruby string EM gave you would be thrown away on the next GC run.

That being said, someone could just keep sending data indefinitely and parsing would never complete, though this shouldn't eat up any more RAM than normal on the server.

I'm not completely opposed to the idea, but I think being a streaming parser sort of gets around these types of things by design? I've had cases of people parsing massive JSON documents from disk, so if an option like this were to be added I'm not sure it should be enabled by default.

@ibc
Author

ibc commented Mar 4, 2011

Thanks for your fast response, but I don't quite follow:

In my example above, an attacker first sends a TCP packet with data = {"method": "attack. That data is passed to the Yajl parser, which doesn't fire the on_parse_complete method since parsing is not complete.

Obviously this data must be stored somewhere within yajl-ruby or the yajl C library. The data variable is of course GC'd, but its content must be kept somewhere in yajl. If not, how would yajl create the Hash later when the remaining data arrives?

So the attacker sends a new TCP packet (or UDP, or a Unix socket...) containing the string lalala, which is given to the Yajl parser. Obviously Yajl must store that data, as it's part of an incomplete string (the method name). If the attacker sends more and more data without ever finishing the JSON object, the server's memory would keep growing.

Please correct me if I'm wrong, but surely such a situation would exhaust the server's memory, am I right?
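
For example, a small standalone script with the current API shows that the callback only fires once the document finally completes, so the partial content has to live somewhere in the meantime:

require 'yajl'

parser = Yajl::Parser.new
parser.on_parse_complete = lambda { |obj| puts "complete: #{obj.inspect}" }

parser << '{ "method": "attack'   # nothing fires yet
parser << 'lalalala'              # still nothing; the string literal is still open
parser << '" }'                   # now on_parse_complete fires with the full Hash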

On the other hand, if the :maxsize option is implemented it could be optional (only taken into account when passed to new). So by default it would be disabled (IMHO).

Thanks a lot.

@brianmario
Owner

Oh right I get what you're saying now, sorry :P (been a long week)

I think I like the :maxsize solution the best, and it should be fairly easy to implement. I may not have time to work on it until I finish up another project I'm on, but will try to soon.

@ibc
Author

ibc commented Mar 4, 2011

Really, thanks a lot for your very good work ;)
