Improve logic for media elements #16

annevk · 2021-01-15T13:36:24Z

Closes #4.

README.md

jakearchibald · 2021-01-15T16:21:38Z

Here are the attacks I came up with for partial responses and media elements whatwg/fetch#144 (comment). In summary:

A media element cannot mix opaque and non-opaque data, otherwise you may be able to learn something about the opaque data.
A media element cannot mix opaque data from different places, otherwise you could learn something about unknown opaque data by mixing it with known opaque data.
The content-range must start at the same offset as the requested range, although a 200 or 0- response is ok as long as it's treated as a restart.
Responses shouldn't be combined if it looks like the underlying data has changed. This prevents old cached data being mixed with new data.

I think a media element needs a valid opaque response url, which is the URL it'll accept opaque data from. It can be null (unset), "none", or a URL.

When the media element performs its first fetch for a particular piece of media, if the response is not-opaque, the valid opaque response url becomes "none", otherwise it becomes the final response URL.

Then, subsequent opaque responses must match the valid opaque response url.

I can't decide if it's ok to use two opaque responses that have the same final response url, even if they used a different set of redirects to get there. If it isn't ok, then the valid opaque response url needs to be a list of URLs, which must match the response url list exactly.

https://wicg.github.io/background-fetch/#validate-partial-response-algorithm has some basic validation of range-requested vs range given. Basically, if the response content-range start point is different to the range request, it should be considered an error. It also checks that content-length and/or etags haven't changed, as this suggests the underlying content has changed.

If the media element is reinitialized (eg, the src is changed) then the valid opaque response url is reset to null.

annevk · 2021-01-18T16:07:21Z

Thanks, making those changes will help a lot I think. We can also put this valid opaque response URL field on the request, allowing us to drop the side table entirely from this algorithm. A problem here is getting implementations aligned on their media element fetching logic as that currently seems to be a mess. Some stuff I read on this (and resources linked from there):

Closes #4.

README.md

annevk · 2021-10-04T10:07:49Z

I updated this and would appreciate another look from @anforowicz and @jakearchibald. I think it would be good to land this as it clarifies a number of things.

README.md

anforowicz

AFAIU the proposal here changes how the are-206-reponses-allowed state is handled. Before the changes the state was 1a) stored globally and 1b) stored a set of initiator/media-url pairs. After the changes the state is 2a) stored per request and 2b) only stores 3 states of a request.

In Chrome implementation a request (e.g. network::ResourceRequest, or network::URLLoader or net::URLRequest) is transient - the request goes away after getting a response (200, 206, or some other response kind). Therefore it doesn't seem possible to inspect a previous request's state when deciding whether to block the next request. Maybe I don't understand what is meant by "a request" (e.g. in "a request has an associated no-cors media URL") in the proposed changes.

FWIW, I've just started trying to implement ORB, and for the most part the "before" state of the spec looks good to me. One possible change / clarification is that in Chrome the state (the set of initiator/media-url pairs) would be stored in network::URLLoaderFactory (roughly per HTML document / per script execution context). Such state lifetime would be slightly tighter than what the spec says today (global / per user-agent state) although theoretically the lifetime could be even tighter (per media element).

annevk · 2021-10-05T10:08:47Z

@anforowicz this PR makes it essentially per media element. The media element keeps track of its requests. The initial request it does is marked with "initial-request" and subsequent are marked with the URL (though as I commented in my own review comment that's not strictly necessary).

It also notes that in a user agent that cares about compromised content processes the media element cannot be trusted so that state has to be managed in a different way, but I don't think that means we need to write it down in that way. Does that make sense?

README.md

anforowicz

Thanks for helping me understand this better. Overall changes LGTM, but let me try to propose a few tweaks to (IMO) make things easier to understand.

annevk · 2021-10-06T07:44:48Z

@anforowicz appreciate the careful review! I believe I have now incorporated all your feedback. The reason I went with "subsequent" instead of "repeated" is because the latter did not feel quite right to me as the individual fetches are different and not the same as a previous one. Hope that makes sense.

README.md

anforowicz · 2021-10-06T16:07:36Z

README.md

@@ -65,14 +74,17 @@ To determine whether to allow response _response_ to a request _request_, run th
 1. If _mimeType_ is not failure, then:
   1. If _mimeType_ is an opaque-safelisted MIME type, then return true.
   1. If _mimeType_ is an opaque-blocklisted-never-sniffed MIME type, then return false.
-   1. If _response_'s status is `206` and _mimeType_ is an opaque-blocklisted MIME type, then return false. TODO: is this needed with the requesters set?
+   1. If _response_'s status is 206 and _mimeType_ is an opaque-blocklisted MIME type, then return false.


This step seems a bit undesirable

This special case for 206 seems to mean that a video served as text/html will not work with 206 responses, right? This seems undesirable.

Maybe we can remove this special case?

For non-media, "no-cors media request state" will be "N/A" and ORB will block the response in the following 2 new-ish steps:

If request's no-cors media request state is "subsequent", then return true.

If response's status is 206 and validate a partial response given 0 and response returns invalid, then return false.

If 206 is for the beginning of the body, ORB would still block the response after it doesn't sniff as image/audio/video/javascript (in later steps).

This step seems a bit undesirable

This step stems from https://fetch.spec.whatwg.org/#corb-check. Did Chrome end up removing it?

Maybe we can remove this special case?

I'm not sure how the first quoted step is relevant.

As for the second step, earlier you mentioned that this was useful to have and would happen in the implementation before sniffing. Are you suggesting to only do it when no-cors media request state is "initial"? I suppose we could make that change. It would allow the server to serve random 206 responses as JavaScript (if they happen to parse), but I guess that's okay.

This step stems from https://fetch.spec.whatwg.org/#corb-check.

Doh... I forgot about it. Yes, Chrome's CORB implementation still blocks 206 responses with html/json/xml MIME types. Given this, I think we should just land the PR in its current shape.

Are you suggesting [...]

I was just trying to argue why I think the earlier step (blocking 206 for html/xml/json) might be redundant, because latter steps (without any changes) would block such html/xml/json responses.

anforowicz

Thanks @annevk! I have one more question (sorry for missing it earlier) about blocking 206 for html/xml/json mime types (see my new inline comment).

annevk commented Jan 15, 2021

View reviewed changes

README.md Outdated Show resolved Hide resolved

annevk mentioned this pull request Jan 22, 2021

Content-Range header parsing WICG/background-fetch#153

Open

annevk mentioned this pull request Jan 29, 2021

WIP web-platform-tests/wpt#27272

Draft

annevk added 2 commits October 4, 2021 11:35

Improve logic for media elements

c41b654

Closes #4.

clarify a couple things

fcc5a88

annevk force-pushed the annevk/range-requests branch from 256ce72 to fcc5a88 Compare October 4, 2021 10:03

annevk commented Oct 4, 2021

View reviewed changes

README.md Outdated Show resolved Hide resolved

annevk requested a review from anforowicz October 4, 2021 10:06

anforowicz reviewed Oct 4, 2021

View reviewed changes

README.md Show resolved Hide resolved

anforowicz reviewed Oct 4, 2021

View reviewed changes

README.md Show resolved Hide resolved

anforowicz reviewed Oct 4, 2021

View reviewed changes

anforowicz reviewed Oct 5, 2021

View reviewed changes

README.md Outdated Show resolved Hide resolved