No size limit #22

Open
MattMenke2 opened this issue Feb 22, 2021 · 10 comments

Comments

@MattMenke2

The algorithm doesn't seem to put a size limit on how much of the body is loaded into memory to try and parse as JSON. Caching arbitrary amounts of data in memory to run through a parser, particularly outside of the renderer process, seems not great, especially when we're not sure whether the data is JSON or JS.

Admittedly, I don't have full context here, so I may be missing something.

@annevk (Owner) commented Feb 22, 2021

If we can agree on a limit I agree that would be good.

cc @chihweitung

@MattMenke2 (Author)

Also, relatedly, it would be good to decide what to do if the limit is exceeded (just pass the body along anyway, I assume, rather than fail the request / eat the body?).

@annevk (Owner) commented Feb 22, 2021

We would need some telemetry, but ideally we would fail in that case. Sites can specify Content-Type: text/javascript correctly if they need to fetch JavaScript that big. Otherwise we would only protect smaller resources from entering the attacker process.

@chihweitung

If we want to collect telemetry about response sizes, we need to be careful, since it may cause fingerprinting issues.

@anforowicz (Collaborator)

I am not sure if it helps with fingerprinting concerns, but I want to observe that we don't necessarily need to track sizes accurate to every single byte. Counting responses that fall into broad, logarithmic buckets should be sufficient (e.g. 256-511 kB, 512-1023 kB, 1024-2047 kB, 2048-4095 kB, etc.).
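
As a rough illustration of that kind of bucketing (the bucket boundaries and the helper name below are illustrative, not a proposal):

```ts
// Map a response size (in bytes) to a coarse power-of-two bucket label,
// e.g. 300 000 bytes -> "256-511 kB". Recording only the bucket, rather than
// the exact size, is what keeps the telemetry coarse.
function sizeBucket(sizeInBytes: number): string {
  const kb = Math.floor(sizeInBytes / 1000);
  if (kb < 256) return "<256 kB";
  const exponent = Math.floor(Math.log2(kb)); // 256-511 -> 8, 512-1023 -> 9, ...
  const lower = 2 ** exponent;
  const upper = 2 ** (exponent + 1) - 1;
  return `${lower}-${upper} kB`;
}

console.log(sizeBucket(300_000));   // "256-511 kB"
console.log(sizeBucket(1_500_000)); // "1024-2047 kB"
```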

@anforowicz (Collaborator)

If the parser needs to look at the whole response, then I agree with #22 (comment) that the secure fallback would be to block responses bigger than X MB. OTOH, maybe there is some wiggle room here to mitigate some of the risk from the initial launch of ORB. Some (half-baked) ideas:

  • Start with passing through responses > X MB, but blocking responses > Y MB. Start with Y = infinity, and later ratchet down until Y = X. (A rough sketch combining this and the next idea follows at the end of this comment.)
  • Consider different fallback decisions depending on the MIME type (text/html seems more security sensitive / we may want to block for parity with CORB; OTOH maybe it is less problematic to initially allow through things like application/octet-stream).
  • I haven't yet looked into how JavaScript parsing would work, but I wonder if the parser can report status for partial input. If there are no JavaScript parsing/syntax errors in the first X MB, then maybe it is okay to pass the response body to the renderer process?

/cc @csreis
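
A rough sketch of how the first two ideas could fit together; the thresholds, names, and decision values below are hypothetical illustrations, not numbers anyone has proposed:

```ts
type Decision = "sniff" | "allow" | "block";

// Hypothetical thresholds: X is the largest response we're willing to buffer
// and sniff; Y is the hard cap, starting at infinity and ratcheting down
// toward X over time.
const X_BYTES = 8 * 1024 * 1024;
let Y_BYTES = Number.POSITIVE_INFINITY;

function decideForResponse(sizeInBytes: number, mimeType: string): Decision {
  if (sizeInBytes <= X_BYTES) {
    return "sniff"; // small enough to run through the JSON/JS check
  }
  if (sizeInBytes > Y_BYTES) {
    return "block"; // over the hard cap
  }
  // Between X and Y: fall back based on the MIME type. text/html is treated
  // as more security-sensitive (parity with CORB); opaque binary types are
  // let through during the initial launch.
  return mimeType === "text/html" ? "block" : "allow";
}
```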

@annevk (Owner) commented May 17, 2021

I don't think it's possible to get a reliable answer from the JavaScript parser for partial input in a way that can be standardized without a ton of effort, but perhaps it's good to get a second opinion on that. @syg @littledan thoughts? In particular, how realistic is it to determine whether a part of a file (say the first 4 kibibytes) is JavaScript and get that standardized?

@syg commented May 17, 2021

Hm, my hunch is that it's kind of hard at the spec level. For the encoding, are you imagining this would require a particular encoding? UTF-8? UTF-16? What if the partial payload falls on a surrogate pair boundary, for example?

For parsing, one problem is that the grammar is specified with "cover grammars", where an ambiguous production is parsed into a tree (e.g. is it the start of an arrow function or is it a comma expression?) that's disambiguated (from the spec's perspective) only after the whole tree has been parsed. I'm not sure how we would refactor the grammar so there's more useful information for partial input when we run into a cover grammar.
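
For a concrete instance of the ambiguity (the snippet below is just an illustration of the CoverParenthesizedExpressionAndArrowParameterList case):

```ts
let a = 1;
const b = 2;

// Both declarations below start with the identical prefix "(a = 3, b". The
// parser cannot tell which construct it has until it sees what follows the
// closing ")": the spec first parses the prefix with the cover grammar and
// refines it afterwards.
const commaExpression = (a = 3, b);            // parenthesized comma expression: assigns a, evaluates to b
const arrowFunction = (a = 3, b = 4) => a + b; // arrow function with default parameter values
```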

Additionally, what kind of implementation latitude are you looking for in error timing? If it's acceptable that an implementation might report a certain class of early errors only after the entire file is parsed, even if they could be detected earlier, that'd probably be much easier to spec (and implement).

@annevk (Owner) commented May 17, 2021

Decoding is tracked by #7 and I think I have a solution that works. And yeah, those are the kinds of problems I heard about when I asked others about this. What we'd basically want is a function that takes x kibibytes of JavaScript and returns probably okay/not okay. The x kibibytes might end (EOF) at an unusual point, because they are (sometimes) only the start of a file. For cases where you'd need the entire file to determine whether there's an error, we would make a judgment call of sorts, as we'd have to decide whether to allow/deny based on the x kibibytes alone.

(The alternative is to always parse the whole file or some kind of hybrid approach as @anforowicz outlines above. That's the plan of record.)
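
As a sketch of the shape of that function (not a workable check): the example below uses the acorn parser and assumes that treating a syntax error at or very near the cut-off as "inconclusive" is acceptable, which is exactly the part the cover-grammar discussion above makes hard. The function name and the position heuristic are illustrative only.

```ts
import { parse } from "acorn";

type PrefixVerdict = "probably-js" | "not-js" | "inconclusive";

// `prefix` is the first x kibibytes of the decoded body, which may stop
// mid-token or mid-construct.
function sniffJavaScriptPrefix(prefix: string): PrefixVerdict {
  try {
    parse(prefix, { ecmaVersion: "latest" });
    return "probably-js"; // the whole prefix parses as a complete program
  } catch (err) {
    // Acorn attaches the offset of the error to the thrown SyntaxError (`pos`).
    // Heuristic (and the weak point): an error at or near the cut-off may only
    // mean we truncated the file, so treat it as inconclusive rather than as
    // "not JavaScript".
    const pos = (err as { pos?: number }).pos;
    if (typeof pos === "number" && pos >= prefix.length - 1) {
      return "inconclusive";
    }
    return "not-js"; // a definite syntax error well before the cut-off
  }
}
```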

@annevk (Owner) commented May 17, 2022

Unless there's a credible alternative to looking at the whole response, I suggest we mark this as something to look into in the future. It's not clear to me that it has to block the MVP.
