What is origin in epub context? #873
There is guidance to this effect under the security considerations section of the Content Documents specification:
and
http://www.idpf.org/epub/31/spec/epub-contentdocs.html#sec-scripted-content-security
Well, it looks like guidance might not be enough, since the same issue was reported on the Readium repo in August: readium/readium-js-viewer#559 (iframe not sandboxed). That said, I would understand if anything more were considered “out of scope”.
Due to various technical constraints that depend on the target platform (e.g. cloud / web-browser-based reader vs. native app web view), Readium's core "engine" is not always able to implement fully watertight sandboxing (in Readium's case, HTML resources are displayed inside iframes). Content Documents are served from different origins, and not always over HTTP (e.g. custom URL protocols in a Chrome extension, Electron, or Cordova). In some cases, the reading system can only inject "behaviours" such as media overlays playback, highlights/annotations, etc. into EPUB content when both the app and the content are in the same domain. As for LocalStorage, there is also the inverse of the "everybody can see my data" problem: in some cases the domains of the content URLs vary from one reading session to the next (e.g. random HTTP port number), so a user's recorded data does not persist (e.g. EPUBs that contain scripts that track some sort of activity progress, or that memorise user preferences).
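To make the random-port problem concrete, here is a minimal sketch using the standard URL API. The addresses, ports, and paths are made up for illustration and are not taken from Readium or any other reading system; `sameOrigin` is a hypothetical helper.

```javascript
// Hypothetical helper: two URLs share an origin only if scheme, host and port all match.
// URLs with custom (non-special) schemes serialize their origin as "null" (opaque),
// so they are treated here as never matching anything.
function sameOrigin(a, b) {
  const originA = new URL(a).origin;
  const originB = new URL(b).origin;
  if (originA === "null" || originB === "null") return false;
  return originA === originB;
}

// Illustrative URLs only: the same chapter served in two reading sessions,
// each time from a local HTTP server listening on a different random port.
const sessionOne = "http://127.0.0.1:49152/book/chapter1.xhtml";
const sessionTwo = "http://127.0.0.1:50211/book/chapter1.xhtml";
// Another document of the same book, served during the first session.
const sameSession = "http://127.0.0.1:49152/book/chapter2.xhtml";

console.log(new URL(sessionOne).origin);
console.log(sameOrigin(sessionOne, sameSession));
console.log(sameOrigin(sessionOne, sessionTwo));
```

Because storage is keyed by origin, anything a script saved during the first session would not be found in the second one under these assumptions, even though it is the same book.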
cc @rkwright
The issue was discussed in a meeting on 2021-02-11
2. Origin
See github issue #873.
Dave Cramer: i don't feel like we're going to resolve on this today, but i think of this as one of the big issues around epub that we've been deferring
Dave Cramer: first of all, i think the statement is problematic. Domain is not the right concept here
Ivan Herman: do we have a kind of proper overview of what happens in RS today?
Ivan Herman: we should not reinvent the wheel if possible, just standardize or unify on what the practice is
Dave Cramer: comment describes some of what happens in one RS, the type of complexity that we face
Ivan Herman: is that in line with other RS?
Dave Cramer: less certain on that point
Brady Duga: preface, i am not an expert in this area
Dave Cramer: that was very helpful
Tzviya Siegman: i'm aware that the fetch API as supplanting CORS
Dave Cramer: i think fetch is how things work now
Brady Duga: i thought fetch still relies on CORS?
Dave Cramer: yes, i think the concepts of origin and cross-origin are still there
Ivan Herman: first of all, yes. I think we should ask. Sooner the better.
Dave Cramer: i'm not sure that mental model holds up when we are faced with the task of implementing an RS
Ivan Herman: at the moment we are silent on script behaviour aside from announcing scripts are present
Dave Cramer: I think next step is reaching out to TAG
Tzviya Siegman: i can also help with that
Gregorio Pellegrino: question for ivan, if you open two epubs, do they run on the same host?
Ivan Herman: yes, the second
Brady Duga: i think short answer, the mental model we're discussing here is right - epub is website in a box
Ivan Herman: i said localhost with a different port for each epub, so logically speaking, they would be independent
Dave Cramer: i wonder if each epub should be its own opaque origin
Brady Duga: the other problem is that the epub often isn't local, and the RS might actually be making calls to a server, and may not have control over what the URL is, what the port is, etc.
Dave Cramer: to get the results we want RS may have to do a lot of trickery to disguise how they actually implement things in order to present a unified experience
George Kerscher: do we need a normative/descriptive piece in our spec, and be silent on this otherwise
Ivan Herman: we are pretty silent already
Matt Garrish: to date we've left it to the RS to decide whether to support scripting, and to what extent
Dave Cramer: i don't want to go backwards, saying nothing is problematic
Ivan Herman: will you put together something for TAG? Might be worth sharing with some implementers before you go to TAG
Tzviya Siegman: and I can help with that
Dave Cramer: maybe we can open an issue in our repo with the problem statement, to collect insight from this WG before we formally contact TAG
My comment above dates back to a few years ago. I wrote a more up-to-date analysis for Thorium (iframe, sandboxing, origin, etc.):
The issue was discussed in a meeting on 2021-02-18
List of resolutions:
3. Origin, cont'd
Wendy Reid: this is continuing from last week's meeting
Dave Cramer: i think most of the discussion is in issue 1153
Leonard Rosenthol: the thing that is most problematic is the difference between actually doing this in a browser with a content hosted on a real domain vs doing this on a device (mobile, desktop, etc.)
Dave Cramer: i hear you
Leonard Rosenthol: the problem is that you can't do that
Dave Cramer: could you solve that problem with different subdomains for each title?
Leonard Rosenthol: yes, but only in a world where all the epubs come from the same publisher
Dave Cramer: you're kind of creating a non-conforming RS in this example
Leonard Rosenthol: that would make all web-based RS non-conforming
Wendy Reid: I think dropbox actually does have an ebook reader....
Leonard Rosenthol: they're probably taking advantage of no scripting then
Wendy Reid: i think the solution that most RS have come to is just to avoid scripting entirely
Leonard Rosenthol: that doesn't solve other things, e.g. referencing
Brady Duga: this really seems like a scripting issue
Dave Cramer: Jiminy has real world examples of this sort of stuff
Brady Duga: maybe? It depends on the RS and the content
Leonard Rosenthol: if, say, you're building your own software and documents, and you control the entire system there's no reason why you wouldn't want to do it that way
Dave Cramer: one thing to do is go back to our current language
Leonard Rosenthol: can probably change that so that each epub is its own origin, like you said earlier
Matt Garrish: the original wording came at a time when we were just starting to open epub to scripting
Dave Cramer: to me it feels like a little bit of progress if we relax the current language to say "per epub" instead of "per content document"
Brady Duga: right now the spec is more restrictive, but we're already finding examples IRL where RS are not honoring it
Matt Garrish: depends where we are going with this
Dave Cramer: given all that, should we take the baby step of updating the non-normative guidance that the boundary should be "per epub"?
Brady Duga: does that include changing from "domain" to "origin"?
Dave Cramer: yes, i think so
Wendy Reid: that's everything that was on the agenda tonight
Dave Cramer: i think i do have an action item to talk to TAG about the general ideas around epub security
Wendy Reid: there is most likely going to be a special session at the business group next week about WCAG3
OK so this might be a security issue to some extent.
As far as I know, there’s nothing about “origin” in the EPUB spec.
Why is this an issue?
Because of localStorage. See https://html.spec.whatwg.org/multipage/webstorage.html#the-localstorage-attribute and https://html.spec.whatwg.org/multipage/browsers.html#concept-origin
In other words, Reading System as the origin is valid, which means you can retrieve every item stored in the RS and not only the local storage area for one EPUB file.
Now, at the moment, it appears you can get items set in other EPUB files in some RS. See following screenshot (width was set in one file, we getItem using JavaScript in another file).
Here, we actually retrieve the whole storage using a loop (every item set in different files before running the script can be accessed).
I must admit I would be much more comfortable if origin = each EPUB file and not the whole RS.
If someone stored sensitive data in localStorage at some point, you could theoretically access it from another file, and that would be valid per spec.
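For reference, the loop described above can be sketched roughly as follows. This is an illustration, not the exact script from the screenshot: `dumpStorage` and `MemoryStorage` are hypothetical names, and `MemoryStorage` is only a tiny in-memory stand-in for the Web Storage interface so the sketch also runs outside a reading system. In an RS you would pass `window.localStorage` instead. The stored values are made up.

```javascript
// Minimal in-memory stand-in for the Web Storage interface (assumption for this sketch).
class MemoryStorage {
  constructor() { this.map = new Map(); }
  get length() { return this.map.size; }
  key(index) { return Array.from(this.map.keys())[index] ?? null; }
  getItem(key) { return this.map.has(key) ? this.map.get(key) : null; }
  setItem(key, value) { this.map.set(String(key), String(value)); }
}

// Walks the whole storage area and returns every key/value pair it contains.
function dumpStorage(storage) {
  const items = {};
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    items[key] = storage.getItem(key);
  }
  return items;
}

// One storage area shared by the whole origin (here: the whole RS).
const shared = new MemoryStorage();
shared.setItem("width", "600");          // set by a script in one EPUB file
shared.setItem("progress", "chapter-3"); // set by a script in another EPUB file

// A script in any file running in that origin can read everything.
console.log(dumpStorage(shared));
```

If each EPUB file had its own origin, the same loop would only ever see the items that file's own scripts had stored.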