Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some metadata URL not re-written #501

Open
kptdobe opened this issue Jan 22, 2024 · 3 comments · May be fixed by #509
Open

Some metadata URL not re-written #501

kptdobe opened this issue Jan 22, 2024 · 3 comments · May be fixed by #509
Assignees
Labels
bug Something isn't working

Comments

@kptdobe
Copy link

kptdobe commented Jan 22, 2024

While looking at source code of page https://www.bitdefender.com.au/solutions/, I discovered:

<meta name="experiment-variants" content="https://main--bitdefender--hlxsites.hlx.page/solutions/experiments/index-experimentation-exp01, https://main--bitdefender--hlxsites.hlx.page/solutions/experiments/index-experimentation-exp02, https://main--bitdefender--hlxsites.hlx.page/solutions/experiments/index-experimentation-exp03">

 
The experiment-variants exposes hlx.page urls in production. Based on @meryllblanchet's feedback, the code consuming this meta is aware of that and does not use the host to fetch the experiments. While this is working, it is not nice to expose internal urls in production.

I suggest to add a list of known metadata for which the URLs would made absolute with production host (like https://github.com/adobe/helix-html-pipeline/blob/main/src/steps/extract-metadata.js#L254)

cc @ramboz

@kptdobe kptdobe added the bug Something isn't working label Jan 22, 2024
@ramboz
Copy link
Collaborator

ramboz commented Jan 22, 2024

@kptdobe Would it be more appropriate to just fix all metadata entries by default? I'm anticipating that customers will potentially add a slew of .page and .live URLs in there (not just for experiments).

For our experimentation plugin, we already would need to support:

  • the URLs to the experiment variants
  • the manifest path to the experiments
  • the manifest path to the campaigns
  • the manifest path to the audiences

And the plugin assumes that you can even rename those via config if authors prefer a different naming convention, so we'd still not fix URLs in that use case.

Add to this the typical SEO links that I've seen customer using:

  • canonical (if customized)
  • alternate language meta tags
  • rss/atom feeds

It seems to me that trying to allow-list every option is a bit of a moot point as we'll obviously always miss a few as the use cases grow.

@kptdobe kptdobe self-assigned this Jan 23, 2024
@kptdobe kptdobe linked a pull request Jan 23, 2024 that will close this issue
@davidnuescheler
Copy link

i think this is a broader conversation that expands into other aspects.
currently we use no heuristics to identify a "value" as a URL and decide if it should be rewritten or not. we only rewrite links (eg. href= attributes) as we are clear on the sematics.

this means that there are a range of values where we currently rely on client sided techniques to remove the hostname and effectively rewrite those on client... this includes link texts (eg. someone just pasting a URL), values in spreadsheets, values in metadata, values in block configs, etc.)

at this juncture, i think it would be better if we rewrote these URLs, or made them relative, depending on the use case but especially the latter yields a lot of incompatibilities with existing javascript code (eg. new URL()) that is in place to handle the write or interpret the current setup with heuristics (eg. checking for .page/ .live ) in javascript code.

i think the end goal should be that we don't serve any URL references as values leaking implementation details or pointing to different origins. i think there is set of standardized metadata where link semantics are implied (eg. canonical, feeds, hreflang) and could rewrite them without much danger of breaking things.

applying a more general rewriting for metadata or other values, has proven problematic in practice on individual projects / customers, so i would definitely be very careful and apply a lot of testing for rewriting custom (project or app specific) metadata, and possibly treat this as an opt-in / feature flag via config bus some time in the future to avoid breaking existing projects.

@tripodsan
Copy link
Contributor

it could also be handled via the upcoming template feature, eg https://{{url.host}}/experiment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants