Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve logic for media elements #16

Merged
merged 3 commits into from Oct 7, 2021
Merged

Improve logic for media elements #16

merged 3 commits into from Oct 7, 2021

Conversation

annevk
Copy link
Owner

@annevk annevk commented Jan 15, 2021

Closes #4.

README.md Outdated Show resolved Hide resolved
@jakearchibald
Copy link
Contributor

Here are the attacks I came up with for partial responses and media elements whatwg/fetch#144 (comment). In summary:

  • A media element cannot mix opaque and non-opaque data, otherwise you may be able to learn something about the opaque data.
  • A media element cannot mix opaque data from different places, otherwise you could learn something about unknown opaque data by mixing it with known opaque data.
  • The content-range must start at the same offset as the requested range, although a 200 or 0- response is ok as long as it's treated as a restart.
  • Responses shouldn't be combined if it looks like the underlying data has changed. This prevents old cached data being mixed with new data.

I think a media element needs a valid opaque response url, which is the URL it'll accept opaque data from. It can be null (unset), "none", or a URL.

When the media element performs its first fetch for a particular piece of media, if the response is not-opaque, the valid opaque response url becomes "none", otherwise it becomes the final response URL.

Then, subsequent opaque responses must match the valid opaque response url.

I can't decide if it's ok to use two opaque responses that have the same final response url, even if they used a different set of redirects to get there. If it isn't ok, then the valid opaque response url needs to be a list of URLs, which must match the response url list exactly.

https://wicg.github.io/background-fetch/#validate-partial-response-algorithm has some basic validation of range-requested vs range given. Basically, if the response content-range start point is different to the range request, it should be considered an error. It also checks that content-length and/or etags haven't changed, as this suggests the underlying content has changed.

If the media element is reinitialized (eg, the src is changed) then the valid opaque response url is reset to null.

@annevk
Copy link
Owner Author

annevk commented Jan 18, 2021

Thanks, making those changes will help a lot I think. We can also put this valid opaque response URL field on the request, allowing us to drop the side table entirely from this algorithm. A problem here is getting implementations aligned on their media element fetching logic as that currently seems to be a mess. Some stuff I read on this (and resources linked from there):

README.md Outdated Show resolved Hide resolved
@annevk annevk requested a review from anforowicz October 4, 2021 10:06
@annevk
Copy link
Owner Author

annevk commented Oct 4, 2021

I updated this and would appreciate another look from @anforowicz and @jakearchibald. I think it would be good to land this as it clarifies a number of things.

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
Copy link
Collaborator

@anforowicz anforowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU the proposal here changes how the are-206-reponses-allowed state is handled. Before the changes the state was 1a) stored globally and 1b) stored a set of initiator/media-url pairs. After the changes the state is 2a) stored per request and 2b) only stores 3 states of a request.

In Chrome implementation a request (e.g. network::ResourceRequest, or network::URLLoader or net::URLRequest) is transient - the request goes away after getting a response (200, 206, or some other response kind). Therefore it doesn't seem possible to inspect a previous request's state when deciding whether to block the next request. Maybe I don't understand what is meant by "a request" (e.g. in "a request has an associated no-cors media URL") in the proposed changes.

FWIW, I've just started trying to implement ORB, and for the most part the "before" state of the spec looks good to me. One possible change / clarification is that in Chrome the state (the set of initiator/media-url pairs) would be stored in network::URLLoaderFactory (roughly per HTML document / per script execution context). Such state lifetime would be slightly tighter than what the spec says today (global / per user-agent state) although theoretically the lifetime could be even tighter (per media element).

@annevk
Copy link
Owner Author

annevk commented Oct 5, 2021

@anforowicz this PR makes it essentially per media element. The media element keeps track of its requests. The initial request it does is marked with "initial-request" and subsequent are marked with the URL (though as I commented in my own review comment that's not strictly necessary).

It also notes that in a user agent that cares about compromised content processes the media element cannot be trusted so that state has to be managed in a different way, but I don't think that means we need to write it down in that way. Does that make sense?

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@anforowicz anforowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping me understand this better. Overall changes LGTM, but let me try to propose a few tweaks to (IMO) make things easier to understand.

@annevk
Copy link
Owner Author

annevk commented Oct 6, 2021

@anforowicz appreciate the careful review! I believe I have now incorporated all your feedback. The reason I went with "subsequent" instead of "repeated" is because the latter did not feel quite right to me as the individual fetches are different and not the same as a previous one. Hope that makes sense.

README.md Show resolved Hide resolved
@@ -65,14 +74,17 @@ To determine whether to allow response _response_ to a request _request_, run th
1. If _mimeType_ is not failure, then:
1. If _mimeType_ is an opaque-safelisted MIME type, then return true.
1. If _mimeType_ is an opaque-blocklisted-never-sniffed MIME type, then return false.
1. If _response_'s status is `206` and _mimeType_ is an opaque-blocklisted MIME type, then return false. TODO: is this needed with the requesters set?
1. If _response_'s status is 206 and _mimeType_ is an opaque-blocklisted MIME type, then return false.
Copy link
Collaborator

@anforowicz anforowicz Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step seems a bit undesirable

This special case for 206 seems to mean that a video served as text/html will not work with 206 responses, right? This seems undesirable.

Maybe we can remove this special case?

For non-media, "no-cors media request state" will be "N/A" and ORB will block the response in the following 2 new-ish steps:

  1. If request's no-cors media request state is "subsequent", then return true.
  2. If response's status is 206 and validate a partial response given 0 and response returns invalid, then return false.

If 206 is for the beginning of the body, ORB would still block the response after it doesn't sniff as image/audio/video/javascript (in later steps).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step seems a bit undesirable

This step stems from https://fetch.spec.whatwg.org/#corb-check. Did Chrome end up removing it?

Maybe we can remove this special case?

I'm not sure how the first quoted step is relevant.

As for the second step, earlier you mentioned that this was useful to have and would happen in the implementation before sniffing. Are you suggesting to only do it when no-cors media request state is "initial"? I suppose we could make that change. It would allow the server to serve random 206 responses as JavaScript (if they happen to parse), but I guess that's okay.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step stems from https://fetch.spec.whatwg.org/#corb-check.

Doh... I forgot about it. Yes, Chrome's CORB implementation still blocks 206 responses with html/json/xml MIME types. Given this, I think we should just land the PR in its current shape.

Are you suggesting [...]

I was just trying to argue why I think the earlier step (blocking 206 for html/xml/json) might be redundant, because latter steps (without any changes) would block such html/xml/json responses.

Copy link
Collaborator

@anforowicz anforowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @annevk! I have one more question (sorry for missing it earlier) about blocking 206 for html/xml/json mime types (see my new inline comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restrict opaque-safelisted requesters set more?
3 participants