Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2I: amp-form: Initialize form from URL #24876

Closed
mattwomple opened this issue Oct 3, 2019 · 24 comments · Fixed by #24671
Closed

I2I: amp-form: Initialize form from URL #24876

mattwomple opened this issue Oct 3, 2019 · 24 comments · Fixed by #24671
Labels
Component: amp-form Customer: Developer INTENT TO IMPLEMENT Proposes implementation of a significant new feature. https://bit.ly/amp-contribute-code P2: Soon Type: DevX issues impacting developer experience WG: components

Comments

@mattwomple
Copy link
Member

Summary

Introduce amp-form attribute initialize-from-url to populate form inputs
on page load from URL query parameter values.

Feature request (#24533) and work-in-progress pull request (#24671) already exist, but it was suggested this is significant enough of a change to warrant an I2I.

Design document

(none)

Motivation

When a user navigates to a search results URL like /search?q=my+search, it is typically expected that the search text input on the page is populated with the search term, in this case my search. It is already possible to populate an <amp-list> via query parameters thanks to AMP variable substitution with QUERY_PARAM, so it would be great if AMP allowed setting default values of form inputs by checking the URL. For example, if new <amp-form> attribute initialize-from-url existed, then inputs under the form could have their default value set from the URL GET parameters.

Alternatives considered

It is possible output each form input using amp-list. The value attributes of inputs can then be set with mustache, e.g. <input name="search" type="text" value="{{search}}">. This requires a special endpoint created to echo back query parameters. Example: https://ampsupport.wompmobile.com/echoquery/dictionaryformat?q=QUERY_PARAM(q). This has obvious disadvantages:

  • Server call necessary just to render an input
  • placeholder and fallback in amp-list cannot contain the same input or else the form risks having multiple inputs with the same name attribute (they are all in the DOM even if they are not all visible)

Additional context

Minimal example with <amp-list> workaround incorporated:
android.cmphys.com/temp/amp-form-ex.html?q=my+search

Worth noting, the reason the inputs are initialized to my+search is due to bug #24534 which has PR #24626 waiting for review.

/cc @ampproject/wg-approvers @caroqliu

@mattwomple mattwomple added the INTENT TO IMPLEMENT Proposes implementation of a significant new feature. https://bit.ly/amp-contribute-code label Oct 3, 2019
@cramforce
Copy link
Member

Does applying the URL values trigger change events on the inputs that can be reacted to via amp-bind/amp-script?

@mattwomple
Copy link
Member Author

@cramforce Yes. See PR #24671 . If any form input is updated the following event is fired:

const formChangeEvent = createCustomEvent(this.win_, AmpEvents.FORM_VALUE_CHANGE, null, {bubbles: true});
targetChange.dispatchEvent(formChangeEvent);

One of the TODO items on that PR is "Verify it is sufficient to fire one amp form update event at the end of form initialization instead of one per input updated".

I'm still not sure that this change is significant enough to warrant a full design document, but I'm at least familiarizing myself with the expectations and can write one if it would help answer questions like these.

@cramforce
Copy link
Member

I'm worried that this leads to jumping on page load.
CC @choumx @dvoytenko

@dreamofabear
Copy link

Yea, changing input values is ok as long as they don't affect size or position. But anything that can cause content jumping must not be allowed.

This is also probably the wrong usage of FORM_VALUE_CHANGE, which is currently used to indicate form "dirtiness" AKA modified by user. A dynamic default value is not dirty.

Re: the amp-list alternative, we can avoid the network call if we support rendering from local data. This would also address your other FR #24832.

Something like:

<amp-state data-amp-replace="QUERY_PARAM" id=local_data>
  {
    "foo": "QUERY_PARAM(q)"
  }
</amp-state>

<amp-list src="amp://local_data">
  <template type=amp-mustache>
    <input value="{{foo}}">
  </template>
</amp-list>

Probably needs more thought to discourage excessive client-side rendering.

@dreamofabear
Copy link

dreamofabear commented Oct 3, 2019

Just saw your PR. A targeted, form-specific feature seems simpler and reasonable too.

Though we need to watch out for content jumping due to CSS selectors like :checked.

@mattwomple
Copy link
Member Author

mattwomple commented Oct 3, 2019

Thanks for your input @choumx .

Interesting, I intentionally used FORM_VALUE_CHANGE because I do consider the form dirty once inputs are modified from their default values -- default being the values hard-coded in the HTML.

I appreciate the idea in your <amp-list> workaround, but it's probably still not optimal. You cannot include a placeholder with inputs of the same name, so you will certainly see missing content while the <amp-list> component loads and runs.

@nainar
Copy link
Contributor

nainar commented Oct 4, 2019

cc @ampproject/wg-ui-and-a11y and @caroqliu

@caroqliu
Copy link
Contributor

@ampproject/wg-caching What are the considerations regarding query params leaking to forms? Is this a security concern here?

@honeybadgerdontcare
Copy link
Contributor

@ampproject/wg-caching What are the considerations regarding query params leaking to forms? Is this a security concern here?

Interesting, I don't think I'm knowledgeable enough to answer that fully.

Do components have access to the URL already? IOW, does this change give components access to data it doesn't already have?

It seems from the design review that there is concern that it should be given a security review.

@mattwomple
Copy link
Member Author

Do components have access to the URL already? IOW, does this change give components access to data it doesn't already have?

@honeybadgerdontcare Others may be able to provide additional details, but I believe this summarizes the discussion from the design review:

Components have had access to certain parameters in the URL for quite a while thanks to variable substitutions: QUERY_PARAM(abc)

  • amp-list, amp-pixel, and other components can forward query parameters to another server without user interaction on page load
  • input[type=hidden][data-amp-replace] in forms requires form submission to send param values to another server

Where a security review could be warranted is that the suggested implementation <form data-initialize-from-url> no longer individually identifies query parameters that can be extracted from the URL and sent with a form submission. Rather, all supported fields (<select>, <textarea>, and most <input>s that can be set with .value) under the form that have an input with name attribute matching a URL parameter could be pre-populated.

@honeybadgerdontcare
Copy link
Contributor

@mattwomple could we require a specific prefix for these query parameters, e.g. amp-form-, for substitution to be done to their corresponding fields? It's unlikely that such a prefix would be used elsewhere as query parameters and that the parameter is intentionally being used for this purpose.

@mattwomple
Copy link
Member Author

could we require a specific prefix for these query parameters?

@honeybadgerdontcare I suppose I'm not sure who this benefits.

  1. When the developer flags a form with data-initialize-from-url, they are already making the statement that each field within the form is fair game for populating via URL. If anything, I'd suggest we introduce the opposite flag for individual fields: data-omit-from-url-initialization. It seems to me that a blacklist would be less cumbersome to implement than a whitelist (if needed at all).
  2. AMP caches have good reason to insist on AMP pages being safe to serve from alternate domains, but I don't see how marking individual fields as safe makes a difference. This goes back to the first point, if a form is configured to use this new feature, the developer is already making a statement that all of the inputs are safe for pre-population.
  3. The end user doesn't have any control over the fields that can be pre-populated; they get what's served to them.

@honeybadgerdontcare
Copy link
Contributor

Perhaps I misunderstand the implementation here. I was under the impression that there may be URL query parameters (on the original document) that shouldn't be accessible to components. For example a unique id for analytics but shouldn't be shared to form fields that could be sent to another server. By using an explicit prefix then those query parameters are stating they know they may be accessed by components such as amp-form and it's fields and potentially shared to another URL.

@mattwomple
Copy link
Member Author

Ah, sure, good example. Then yes, either whitelisting or blacklisting individual fields would be worthwhile!

@cramforce
Copy link
Member

One question I had: Should this be turned on on the AMP Cache at all?

@mattwomple
Copy link
Member Author

One question I had: Should this be turned on on the AMP Cache at all?

It's reasonable to suggest that this feature doesn't need to work when pages are served from the AMP cache. No publisher is going to want to share a link like https://www.bing.com/amp/s/example.com/search?q=my+search over something from their own domain. The only use case I can see for this feature on the AMP cache is that a page is designed to submit a form to itself (relative action), but since the AMP cache rewrites relative actions to be absolute, this is a no-go anyways. I'm fine with limiting this feature to publisher domains.

@mattwomple
Copy link
Member Author

Follow-up to Design Review 2019-10-23 (#24591) and other comments

  • Is the constructor in extensions/amp-form/0.1/amp-form.js the right place to handle form initialization if attribute data-initialize-from-url is present?
    • This is appropriate since there is no layout/unlayout for amp-form; it is not an AMP custom element.
  • Are query parameters the right choice for this feature, or would a fragment work?
    • Including parameters in a fragment would not handle the scenario where an AMP page is the action target of a GET form submission. For example, if page /search includes a GET form that submits to /search/results, then that form submission will append query parameters, not a fragment, to the action.
  • <input type=checkbox> and <input type=radio> checked-state can be detected with CSS :checked. What about CSS selectors [value=], :empty, :valid, and :invalid for other form fields?
    • Setting .value on supported form fields does not alter those nodes' attributes or children, with the exception of <input type="hidden">. So, :empty CSS selectors will not match any initialized fields and [value=] could only match <input type="hidden">. :valid and :invalid selectors do match fields modified with .value, so page content could be altered by specially designed CSS on page load. Given that pseudo selectors :valid and :invalid are as powerful as :checked in writing CSS logic, there seems no reason to omit <input type=checkbox> and <input type=radio> from the initialize-from-url feature (if it is approved).
  • Is it desirable to have search engines index distinct content for a single AMP page based on query string?
    • This is common practice in e-commerce, to have popular product searches indexed by search engines. amp-list can already change the page content based on query param with src="/results?q=QUERY_PARAM(q)". Further, only crawlers willing to run JavaScript will ever see the distinct content.
  • Could we prefix form field names with amp-form- so that it is clear they are designed to be initialized from URL? Or could we use per-element opt-in via a new attribute like data-allow-initialization?
    • Whitelisting is a good idea. Consider a form that contains a captcha or unique id for analytics. It would not be appropriate to initialize those fields from the URL.
  • Should this be turned on on the AMP Cache at all?
    • Nah.

The example page included in PR #24671 has been updated to match these design considerations. Direct link to sample page using pr-deploy-bot. Notably, it shows some sample layout hacking on page load that could be (ab)used if this feature is accepted.

@twifkak
Copy link
Member

twifkak commented Oct 30, 2019

  1. I'm not sure how to reconcile these two statements:
    a. It is desirable to have search engines index the same page with different query strings.
    b. It is not necessary to enable this on the cache.
    If you include a link to /results?q=couches and a search engine indexes that and wants to display the results in a viewer, then we would need this feature enabled on the cache, no?

  2. Is there a particular harm to letting this run on the cache (or even an unparticular "feels a bit fishy")? If so, then I guess we have two options; which one of these is preferred?
    a. Write a cache-side transformer to remove this attribute
    b. Runtime disables this feature when it senses it is running on a cache

  3. Regarding security implications, I'm not a security expert. But I'm reminded of the "mass assignment" feature in Ruby on Rails, which I believe has been a source of vulnerabilities in the past. Opting into the feature (with the form attribute) is a minimum, but it may be worth exploring ideas for how to make sure this isn't a footgun (e.g. definitely don't modify <input type=hidden>s). Sorry I don't have more time to think about this; gotta run. Maybe there are some good ideas to borrow from Rails history.

@mattwomple
Copy link
Member Author

  1. If you include a link to /results?q=couches and a search engine indexes that and wants to display the results in a viewer, then we would need this feature enabled on the cache, no?

Well, remember that desktop computers are still a thing, so that's a reasonable market share that depends on indexing without caring about AMP caches. And if you live in my fantasy world where signed exchanges will – in reasonable time? – have widespread support, then the AMP cache concern becomes moot; the amp page is no longer tied to the search domain.

  1. Is there a particular harm to letting this run on the cache (or even an unparticular "feels a bit fishy")? If so, then I guess we have two options; which one of these is preferred?

The approach I've taken in PR #24671 is to bail out if a proxy (cache) origin is detected:
17caa8f#diff-4f7d7ef94609d4d079c3fde230f0d924R1222

  1. Regarding security implications, I'm not a security expert. But I'm reminded of the "mass assignment" feature in Ruby on Rails, ... (e.g. definitely don't modify <input type=hidden>s)

The stakes are lower here since we're only talking about client-side programming. The only real concern is "can a user submit a form without knowing what they are submitting?" I argue that this is already possible today, and in AMP. Drop an amp-list inside a form and let it generate <input type="hidden" name="secret" value="{secret}"> via template. All you need to support this amp-list is an echo endpoint, e.g.

Similarly, there already exists a feature in AMP to populate hidden inputs just before form submission; see amp-form: Variable Substitutions.

Outside of AMP, this same feature can be accomplished with a few lines of JavaScript (error handling omitted for brevity):

let url = new URL(location.href);
for (let name of url.searchParams.keys())
  document.querySelector(`[name=${name}]`).value = url.searchParams.get(name);

So, I really don't think we're letting any new dragons loose with this feature request.

@twifkak
Copy link
Member

twifkak commented Oct 31, 2019

  1. If you include a link to /results?q=couches and a search engine indexes that and wants to display the results in a viewer, then we would need this feature enabled on the cache, no?

Well, remember that desktop computers are still a thing, so that's a reasonable market share that depends on indexing without caring about AMP caches. And if you live in my fantasy world where signed exchanges will – in reasonable time? – have widespread support, then the AMP cache concern becomes moot; the amp page is no longer tied to the search domain.

Fair enough. I'd still argue it'd be useful to support this on cache so that mobile in-viewer results are displayed the same as on-origin, but if folks are concerned about the safety of running this feature on-cache, I would just make sure we document this limitation.

  1. Regarding security implications, I'm not a security expert. But I'm reminded of the "mass assignment" feature in Ruby on Rails, ... (e.g. definitely don't modify <input type=hidden>s)

The stakes are lower here since we're only talking about client-side programming. The only real concern is "can a user submit a form without knowing what they are submitting?" I argue that this is already possible today, and in AMP.

I think your statement of the concern is correct. (In the extreme case, the publisher may style the submit button to look like a link, and the user may not think they are submitting anything.)

Agreed it's already possible (maybe with amp-script, too?), but I think it's worth designing what's easy/hard, since people will mostly do what's easy.

It sounds like you're saying we can defer security countermeasures until the server-side, since that's where the potentially dangerous effect actually takes place. I can see that argument, but also an argument for defense-in-depth when it's possible and doesn't hurt DX too much.

Looking at Rails' response to its own high-profile mass assignment vulnerabilities, I see three things:

  • Moving the security logic from the model to the controller (link), since model security may depend on context (e.g. you may only want the "user.email" field to be mass-assignable on a profile page). In essence, AMP is acting as the controller here, so there's nothing for us to do here.
  • Default to disallowing mass-assignment except for whitelisted attributes (link). I think there is something for us here.
  • Require a CSRF token (link). We can't do this client-side, so maybe the only action item is to make sure the answer to Document AMP CSRF solution  #9471 is documented wherever this feature is.

Regarding whitelisting/blacklisting, I guess there are several ways to go. I'll throw out my strawman but I'm not tied too strongly to it:

  • Default to setting everything except input[type=hidden]. (What about form fields that are simply visually hidden with display:none or visibility:hidden or opacity:0 or z-index:-infinity?)
  • Allow hiddens to be set with a data-also-initialize attribute.
  • If at least one form field has a data-only-initialize attribute, then don't set any fields that don't have that attribute.

(Sorry if I'm retreading old ground! I came to this thread late.)

@mattwomple
Copy link
Member Author

Thanks @twifkak, all of your comments are appreciated. I believe the field whitelisting performed in PR #24671 satisfies the bulk of your whitelisting-related suggestions: each field must have attribute data-allow-initialization to be considered for initialization (code link). Do you agree?

I understand this thread and that PR have seen plenty of comments and code changes, so I'll work on adding documentation to the PR soon. Then, everyone can verify the implementation matches the design goals. Thanks for your patience.

@twifkak
Copy link
Member

twifkak commented Oct 31, 2019

Thanks @twifkak, all of your comments are appreciated. I believe the field whitelisting performed in PR #24671 satisfies the bulk of your whitelisting-related suggestions: each field must have attribute data-allow-initialization to be considered for initialization (code link). Do you agree?

Ah, yes, I agree. I hadn't read the PR. Thanks!

@mattwomple
Copy link
Member Author

mattwomple commented Nov 4, 2019

PR #24671 now includes unit tests and documentation. It has been moved out of draft status. Reviews appreciated! pr-deploy-bot sample

@cramforce
Copy link
Member

👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: amp-form Customer: Developer INTENT TO IMPLEMENT Proposes implementation of a significant new feature. https://bit.ly/amp-contribute-code P2: Soon Type: DevX issues impacting developer experience WG: components
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants