This repository has been archived by the owner on Jun 24, 2022. It is now read-only.

Existing intervention: user gesture required for sensitive operations #12

Closed
RByers opened this issue Mar 11, 2016 · 10 comments

Comments

@RByers
Member

RByers commented Mar 11, 2016

Chromium has a notion of a "user gesture", which indicates that we believe the user is explicitly interacting with the page (eg. mouse click, but not mouse move or wheel). Certain sensitive operations are then allowed only if they can "consume" a user gesture (eg. one successful window.open call per mousedown/up/click sequence). Some of this maps to the pop-up blocking algorithm in the HTML5 spec, but I'm not sure how well specced the details are, what tests exist, or how interoperable browsers are on this. Perhaps we should try to expand this issue with references and details?
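To make the "consume" idea concrete, here is a minimal sketch of a gesture as a one-shot token. The names `UserGestureToken` and `tryOpenPopup` are hypothetical and do not reflect Chromium's actual internals; the point is only that a gesture can pay for one sensitive operation and no more.

```typescript
// Hypothetical model of a consumable "user gesture" token (not Chromium code).
class UserGestureToken {
  private consumed = false;

  /** Succeeds exactly once per token; every later call fails. */
  tryConsume(): boolean {
    if (this.consumed) return false;
    this.consumed = true;
    return true;
  }
}

// A sensitive operation (e.g. opening a pop-up) succeeds only if there is a
// gesture in progress and it has not already been consumed.
function tryOpenPopup(gesture: UserGestureToken | null): boolean {
  return gesture !== null && gesture.tryConsume();
}

const click = new UserGestureToken();
console.log(tryOpenPopup(click)); // true
console.log(tryOpenPopup(click)); // false: this gesture was already consumed
console.log(tryOpenPopup(null)); // false: no gesture in progress
```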

Here's a (mostly complete) list of the things that require a user gesture in chromium:

  • Allowing pop-ups
  • Going full screen (requestFullscreen)
  • Writing to the clipboard
  • Some scenarios of form submission and requestAutocomplete
  • PresentationRequest::start
  • Enabling mouse lock
  • Various similar operations inside of plugins (fullscreen, mouse lock, etc.)
  • Buffering aggressively when media is paused, and potentially auto-playing media
  • ‘color’ and ‘file’ input types responding to an activation event
  • Showing an IME (eg. on screen keyboard) on element focus (permitted anytime after a user gesture has occurred since page load)
  • WebBluetooth and WebUSB requestDevice (experimental)
@toddreifsteck

Microsoft Edge has found that the "user input flag" flows to some types of callbacks on mobile, which has caused significant interop issues given the lack of a public spec and of agreement between browsers.

(My personal theory is that the history was to make video autoplay "work" for libraries originally built/tested on desktop, but I'll defer to Chrome experts who will be more familiar with the history.)

We’ve observed it flowing through all of the following in Chrome on Android:

  • setTimeout
  • setInterval (the 1st interval, but not any future intervals)
  • window.postMessage

We have observed it does not flow for:

  • Promises
  • RAF

Microsoft Edge's position is that the user flag should either flow to all callbacks OR should be blocked for all callbacks.

We are actively implementing a fix in Edge 14 internal builds that flows the user input flag to setTimeout/setInterval/setImmediate, to unblock a few sites that have issues.

@RByers
Member Author

RByers commented Apr 13, 2016

Interesting, thanks! Can you give us some data on which sites are affected by this? If Edge has never needed this before, then perhaps it's not worth the complexity and Chrome should just change to be simpler too?

What about for pop-up blocking - do you use a similar algorithm? Does it flow across setTimeout?

@jeisinger

Sadly, the "User gesture" concept is not well defined. In WebKit and Blink, we implemented forwarding of the "user gesture" state to the first level of setTimeout calls with a 1s timeout, i.e. if a setTimeout handler invokes setTimeout again, the user gesture won't be forwarded twice, and if the timeout is >1s it won't be forwarded either.
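The forwarding rule described above can be sketched as a small function that decides what gesture state a setTimeout callback inherits. This is a hypothetical model, not WebKit/Blink source; `GestureContext` and `contextForTimer` are illustrative names, and treating a delay of exactly 1000 ms as still forwarded is an assumption.

```typescript
// Hypothetical model of the setTimeout forwarding rule (not WebKit/Blink code).
const MAX_FORWARD_DELAY_MS = 1000; // assumed inclusive boundary

interface GestureContext {
  hasGesture: boolean;
  forwardedFromTimer: boolean; // true if this gesture already arrived via setTimeout
}

// Gesture state handed to a callback registered with setTimeout(cb, delayMs)
// from inside a handler running with context `current`.
function contextForTimer(current: GestureContext, delayMs: number): GestureContext {
  const forward =
    current.hasGesture &&
    !current.forwardedFromTimer && // never forwarded through a second timer level
    delayMs <= MAX_FORWARD_DELAY_MS; // not forwarded past the 1s limit
  return { hasGesture: forward, forwardedFromTimer: forward };
}

const clickCtx: GestureContext = { hasGesture: true, forwardedFromTimer: false };
const t1 = contextForTimer(clickCtx, 500); // first level, short delay
const t2 = contextForTimer(t1, 0); // nested setTimeout
const t3 = contextForTimer(clickCtx, 1500); // delay too long
console.log(t1.hasGesture, t2.hasGesture, t3.hasGesture); // true false false
```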

We don't always forward the user gesture via postMessage: it is not forwarded across processes.

I agree that promises could forward the gestures, but why RAF?

What about stuff like XHR events (or IDB events etc.)

In general, the user gesture thing is a bit tricky to handle, as it has this 1s timeout, so if your XHR doesn't come back in time, you'd have lost the gesture. Not exactly developer friendly :(

@domenic
Collaborator

domenic commented May 31, 2016

So the spec defines this currently: https://html.spec.whatwg.org/multipage/browsers.html#allowed-to-show-a-popup

I have filed two related issues on the spec:

The latter in particular could use implementer feedback on whether the spec aligns with implementations or not.

@jeisinger

Should we also spec that certain operations destroy a user gesture? (Opening a window in Chrome does that.)

@RByers
Member Author

RByers commented Jun 2, 2016

This is a good improvement, thanks @domenic!

There's definitely a variety of ways implementation doesn't match the spec here. I'd like Microsoft's (eg @toddreifsteck's) input so let's discuss those details here.

Yes the list of triggering events is too small (eg. should also contain keydown, mousedown), but it's more complex than that - there's not a 1:1 mapping from event to gesture. For example, on a mousedown mousemove* mouseup sequence we take a single UserGesture - so you can open exactly one pop-up from any of those listeners (not one pop-up per movement). What complexity is actually required here for web compat / good user experience is really hard to say - I'd look to Edge's experience (trying to be compatible with Chrome). If they've got examples where they have been successful with something simpler, I'd be open to trying to change Chrome to match.
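The "one gesture per mousedown mousemove* mouseup sequence" behaviour can be sketched as follows. It is a simplified, hypothetical model: a fresh gesture is granted only at mousedown, every listener in the sequence is assumed to try to open a pop-up, and keyboard or touch input is ignored.

```typescript
// Hypothetical model: one gesture per mousedown ... mouseup/click sequence.
type MouseEventType = "mousedown" | "mousemove" | "mouseup" | "click";

// Returns how many pop-ups succeed if every listener tries to open one.
function countPopupsOpened(events: MouseEventType[]): number {
  let gestureAvailable = false;
  let opened = 0;
  for (const type of events) {
    // Only mousedown starts a new sequence and grants a fresh gesture;
    // mousemove, mouseup and click share the gesture of their sequence.
    if (type === "mousedown") gestureAvailable = true;
    if (gestureAvailable) {
      gestureAvailable = false; // opening a pop-up consumes the gesture
      opened++;
    }
  }
  return opened;
}

console.log(countPopupsOpened(["mousedown", "mousemove", "mousemove", "mouseup", "click"])); // 1
console.log(countPopupsOpened(["mousemove", "mousemove"])); // 0: hovering grants nothing
```

Under this model, the number of mouse movements in a drag does not change how many pop-ups can open, which is the property described above.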

@RByers
Member Author

RByers commented Oct 6, 2016

As part of rationalizing this intervention, we should really also expose an API indicating whether a user gesture is currently in progress. Eg. @dvoytenko has a scenario in AMP that is really no different than the built-in browser scenarios - an untrusted iframe does a postMessage to the main document requesting an action they only want to do in response to a user actually interacting with the frame. I'd argue we should just expose some simple userActivationInProgress bit somewhere.

@dvoytenko

Yes, our security model is that we typically allow more changes to an AMP document if we can confirm user action. For instance, we only allow iframes to resize themselves on user action. If we didn't, the page would jump and auto-resize itself without any constraints, completely obliterating the user experience. There are many other features that are only allowed on user action. Currently, we polyfill this functionality via focused state, and soon we will also deploy a polyfill based on the clipboard. But these are not ideal.
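The host-side policy described in the last two comments can be sketched as a handler that honors an embedded frame's resize request only while a user activation is in progress. `Frame`, `ResizeRequest` and `handleResizeRequest` are hypothetical names, not AMP's code, and `userActivationInProgress` is passed in by the caller to stand for the proposed bit; in current browsers the closest real counterpart is `navigator.userActivation.isActive`, which this DOM-free sketch does not call.

```typescript
// Hypothetical host-side gate for frame resize requests (not AMP's actual code).
interface ResizeRequest {
  frameId: string;
  height: number;
}

interface Frame {
  id: string;
  height: number;
}

// Applies the resize only if the frame exists and the user is interacting;
// unsolicited resizes (which would make the page jump) are rejected.
function handleResizeRequest(
  frames: Map<string, Frame>,
  req: ResizeRequest,
  userActivationInProgress: boolean,
): boolean {
  const frame = frames.get(req.frameId);
  if (frame === undefined || !userActivationInProgress) return false;
  frame.height = req.height;
  return true;
}

const frames = new Map<string, Frame>([["ad-1", { id: "ad-1", height: 250 }]]);
console.log(handleResizeRequest(frames, { frameId: "ad-1", height: 600 }, false)); // false
console.log(handleResizeRequest(frames, { frameId: "ad-1", height: 600 }, true)); // true
```

In a browser, `req` would arrive in a `message` event from the iframe; the activation check is the only part that needs a platform signal.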

@greggman

I'm not sure where to bring this up, but on the question of speccing which gestures count: what about drag-and-drop events? There are pages that say "drop an mp3 here", and they'd like to load and play the sound the moment the mp3 is dropped.
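For reference on the drag-and-drop question, here is a sketch of an "activation-triggering input event" predicate, modeled on my reading of the list in the current HTML spec (keydown other than Escape, mousedown, pointerdown from a mouse, pointerup from a non-mouse pointer, touchend, all trusted only). `InputEventLike` is a hypothetical plain-object stand-in for a DOM event, and UA-reserved shortcut keys are not modeled; check the spec before relying on the details. Under this list, a `drop` event does not by itself grant activation.

```typescript
// Sketch of an activation-triggering check (modeled on the HTML spec's list).
interface InputEventLike {
  type: string;
  isTrusted: boolean;
  key?: string; // keyboard events only
  pointerType?: string; // pointer events only
}

function isActivationTriggering(e: InputEventLike): boolean {
  if (!e.isTrusted) return false; // script-dispatched events never count
  switch (e.type) {
    case "keydown":
      return e.key !== "Escape"; // UA-reserved shortcut keys are not modeled here
    case "mousedown":
    case "touchend":
      return true;
    case "pointerdown":
      return e.pointerType === "mouse";
    case "pointerup":
      return e.pointerType !== "mouse";
    default:
      return false; // e.g. "drop", "mousemove", "wheel"
  }
}

console.log(isActivationTriggering({ type: "drop", isTrusted: true })); // false
console.log(isActivationTriggering({ type: "mousedown", isTrusted: true })); // true
```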

@domenic
Collaborator

domenic commented Apr 1, 2022

It's amazing coming back to this repository and issue and recalling that at one time, our user activation concept was called "allowed to show a popup" and only applied to window.open()!

These days we have a well-defined concept of user activation. (Well, three-ish, actually: user activation consumption, transient user activation checking, and sticky user activation checking.) And it's used by pretty much everything Rick lists in the original post here, with the exception of showing the IME (not really specced anywhere) and some stuff that died (requestAutocomplete(), plugins). Big kudos to @mustaqahmed for all the work on that over the years.
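The three checks mentioned above (consumption, transient activation, sticky activation) can be contrasted in a simplified single-window model. This is a sketch, not the spec algorithm: it ignores propagation to other frames and keeps stickiness as a separate flag. `TRANSIENT_ACTIVATION_DURATION_MS = 5000` is an assumed value, since the spec leaves the duration to the user agent. The expiry also illustrates the earlier concern about an XHR response that arrives too late to use the gesture.

```typescript
// Simplified single-window model of user activation (not the spec algorithm).
const TRANSIENT_ACTIVATION_DURATION_MS = 5000; // assumed; UA-defined in practice

class UserActivationState {
  private lastActivation = Number.NEGATIVE_INFINITY; // timestamp in ms
  private sticky = false;

  /** Called when an activation-triggering input event is dispatched. */
  notifyActivation(now: number): void {
    this.lastActivation = now;
    this.sticky = true;
  }

  /** Sticky activation: has the user ever activated this window? */
  hasBeenActive(): boolean {
    return this.sticky;
  }

  /** Transient activation: was there an activation within the duration? */
  isActive(now: number): boolean {
    return now >= this.lastActivation && now < this.lastActivation + TRANSIENT_ACTIVATION_DURATION_MS;
  }

  /** Consumption: a gated API uses up the current transient activation. */
  consume(now: number): boolean {
    if (!this.isActive(now)) return false;
    this.lastActivation = Number.NEGATIVE_INFINITY;
    return true;
  }
}

const win = new UserActivationState();
win.notifyActivation(0); // user clicks at t = 0
console.log(win.isActive(6000)); // false: e.g. a slow network response has missed the window
console.log(win.hasBeenActive()); // true: sticky activation never expires
```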

So we'll close out this issue as part of the larger project of archiving this repository (#72), as soon as I get write access to this repository.

@domenic domenic closed this as completed Apr 1, 2022