attribute .href doesn't work #72

Open
ralyodio opened this issue Nov 23, 2021 · 3 comments

@ralyodio

  console.log(body);
  const doc = new DOMParser().parseFromString(body, "text/html");
  const links = [...doc.querySelectorAll('a.result-title')];

  for (const link of links) {
    const title = link.innerText;
    const url = link.href;

    console.log({ title, url });
  }

Does this library not support standard DOM methods? link.href should give me the href attribute.

@b-fuze
Owner

b-fuze commented Nov 23, 2021

So far Deno DOM only implements the Element class. .href is a property of HTMLAnchorElement, one of the many more specific DOM element classes that I haven't gotten around to implementing yet. For now, you can use the getAttribute("href") method of Element.
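As a rough illustration of that workaround, here is a minimal sketch that rewrites the loop from the question to use getAttribute("href"). The LinkLike interface, the describeLinks helper, and the sample objects are hypothetical stand-ins so the snippet runs on its own; the assumption is that the Element objects returned by querySelectorAll could be passed in their place.

```typescript
// Hypothetical structural type: the two members this sketch relies on.
interface LinkLike {
  textContent: string | null;
  getAttribute(name: string): string | null;
}

// Collect the title and the raw (unresolved) href attribute of each link.
function describeLinks(
  links: Iterable<LinkLike>,
): { title: string; url: string | null }[] {
  const results: { title: string; url: string | null }[] = [];
  for (const link of links) {
    results.push({
      title: (link.textContent ?? "").trim(),
      // getAttribute returns the attribute exactly as written, or null.
      url: link.getAttribute("href"),
    });
  }
  return results;
}

// Stand-in objects for illustration only (not real DOM elements).
const sampleLinks: LinkLike[] = [
  {
    textContent: " First result ",
    getAttribute: (name) => (name === "href" ? "/listing/1" : null),
  },
  { textContent: "No link", getAttribute: () => null },
];

const described = describeLinks(sampleLinks);
console.log(described);
```

Note that the value is the attribute text as it appears in the HTML, so relative links stay relative.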

@ralyodio
Author

ok no worries.

@jsejcksn

jsejcksn commented Sep 19, 2022

One complication here is that, in a browser, the document has a location property that is used to resolve fully qualified URLs when accessing properties like HTMLAnchorElement.href.

When a document is parsed from an HTML string using a DOMParser instance, the current API offers no way to attach location information to the resulting document.

This makes it non-trivial to get fully-qualified URLs from properties on elements within the trees of such parsed documents.
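To make the resolution step concrete, here is a small sketch using the standard URL constructor, which accepts a base URL as its second argument. The base URL and the sample href values are made up for illustration.

```typescript
// Assumed page URL; in a browser this role is played by the document's URL.
const pageUrl = "https://example.com/page/hello";

// Illustrative href attribute values as they might appear in HTML.
const sampleHrefs = [
  "about", // relative to the current "directory"
  "/account", // relative to the site root
  "../up", // one directory up
  "?q=1", // query only
  "#top", // fragment only
  "https://en.wikipedia.org/wiki/Hello", // already absolute
];

// Resolve each href against the page URL.
const resolvedHrefs = sampleHrefs.map((href) => new URL(href, pageUrl).href);

for (const [i, href] of sampleHrefs.entries()) {
  console.log(`${href} -> ${resolvedHrefs[i]}`);
}
// about -> https://example.com/page/about
// /account -> https://example.com/account
// ../up -> https://example.com/up
// ?q=1 -> https://example.com/page/hello?q=1
// #top -> https://example.com/page/hello#top
// https://en.wikipedia.org/wiki/Hello -> https://en.wikipedia.org/wiki/Hello
```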

However, this is both desirable and a common task, so I want to share two workaround approaches for resolving URLs from anchor element href attributes:

Functional approach

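This is safer and has better type compatibility than the prototype manipulation described further down: it leaves the library's objects untouched and needs no type assertions. Here's a commented example: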

href-example.ts:

import {
  DOMParser,
  type Element,
} from "https://deno.land/x/deno_dom@v0.1.35-alpha/deno-dom-wasm.ts";
import { assert } from "https://deno.land/std@0.156.0/testing/asserts.ts";

/** Functional form of `element.href` */
function resolveHref(element: Element, url: string | URL): string | undefined {
  const href = element.getAttribute("href");
  //    ^? const href: string | null
  if (!href) return undefined;
  return new URL(href, url).href;
}

function main() {
  const url = new URL("https://example.com/page/hello");

  // Imagine the following HTML came from a fetch request to the URL above:
  // const html = await (await fetch(url)).text();
  const html = `
  <!doctype html>
  <html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>hello world</title>
  </head>
  <body>
    <h1>hello world</h1>
    <a class="external" href="https://en.wikipedia.org/wiki/Hello">wikipedia</a>
    <a class="relative" href="about">about</a>
    <a class="root" href="/account">account</a>
  </body>
  </html>
  `;

  const document = new DOMParser().parseFromString(html, "text/html");
  //    ^? const document: HTMLDocument | null
  assert(document, "The document could not be parsed");

  for (const className of ["external", "relative", "root"]) {
    const anchor = document.querySelector(`a.${className}`);
    //    ^? const anchor: Element | null
    assert(anchor, "Anchor element not found");

    const hrefRaw = anchor.getAttribute("href");
    //    ^? const hrefRaw: string | null

    const href = resolveHref(anchor, url);
    //    ^? const href: string | undefined

    console.log({ hrefRaw, href });
  }
}

if (import.meta.main) main();

% deno run href-example.ts
{
  hrefRaw: "https://en.wikipedia.org/wiki/Hello",
  href: "https://en.wikipedia.org/wiki/Hello"
}
{ hrefRaw: "about", href: "https://example.com/page/about" }
{ hrefRaw: "/account", href: "https://example.com/account" }
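One caveat worth sketching: the URL constructor throws a TypeError when the input cannot be parsed, so scraped pages with malformed href values could abort a loop. Below is a minimal defensive variant; tryResolveHref is a hypothetical helper name, and it takes the raw attribute string rather than an element so the snippet stays self-contained.

```typescript
/**
 * Resolve a raw href against a base URL.
 * Returns undefined when the href is missing, empty, or unparsable.
 */
function tryResolveHref(
  href: string | null,
  base: string | URL,
): string | undefined {
  if (!href) return undefined;
  try {
    return new URL(href, base).href;
  } catch {
    // new URL() throws a TypeError for input it cannot parse.
    return undefined;
  }
}

const exampleBase = "https://example.com/page/hello";

console.log(tryResolveHref("about", exampleBase)); // "https://example.com/page/about"
console.log(tryResolveHref("http://[", exampleBase)); // undefined (unterminated IPv6 host)
console.log(tryResolveHref(null, exampleBase)); // undefined
```

With Deno DOM, the first argument would come from element.getAttribute("href").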

Prototype manipulation

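The previous approach could become tedious if there are lots of hrefs that need to be accessed. This approach defines the href property on the prototype of a created anchor element, setting its getter and setter at the time the document is parsed. Because Deno DOM currently implements only the Element class, that prototype is presumably shared by all elements, so the property becomes available on every element, not only on anchors.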

It allows for obtaining a URL string by directly accessing the href property on an element (like in browser code), but requires using a type assertion when doing so.

href-hack.ts:

import {
  DOMParser,
  type Element,
  type HTMLDocument,
} from "https://deno.land/x/deno_dom@v0.1.35-alpha/deno-dom-wasm.ts";
import { assert } from "https://deno.land/std@0.156.0/testing/asserts.ts";

type HrefAttr = { href: string };
type ElementWithHref = Element & Partial<HrefAttr>;
type HTMLDocumentWithHref = HTMLDocument & { location: HrefAttr };

function createDocumentWithHref(
  html: string,
  url: string | URL,
): HTMLDocumentWithHref {
  const document = new DOMParser().parseFromString(html, "text/html");
  assert(document, "The document could not be parsed");

  (document as HTMLDocumentWithHref).location = new URL(url);

  const elementProto = Object.getPrototypeOf(document.createElement("a"));
  Object.defineProperty(elementProto, "href", {
    configurable: true,
    enumerable: false,
    get() {
      const baseUrl = this.ownerDocument?.location?.href as string | undefined;
      const href = this.getAttribute("href") as string | null;
      if (!(baseUrl && href)) return undefined;
      return new URL(href, baseUrl).href;
    },
    set(url: string) {
      this.setAttribute("href", url);
    },
  });

  return document as HTMLDocumentWithHref;
}

function main() {
  const url = new URL("https://example.com/page/hello");

  // Imagine the following HTML came from a fetch request to the URL above:
  // const html = await (await fetch(url)).text();
  const html = `
  <!doctype html>
  <html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>hello world</title>
  </head>
  <body>
    <h1>hello world</h1>
    <a class="external" href="https://en.wikipedia.org/wiki/Hello">wikipedia</a>
    <a class="relative" href="about">about</a>
    <a class="root" href="/account">account</a>
  </body>
  </html>
  `;

  const document = createDocumentWithHref(html, url);

  for (const className of ["external", "relative", "root"]) {
    const anchor = document.querySelector(`a.${className}`);
    //    ^? const anchor: Element | null
    assert(anchor, "Anchor element not found");

    const hrefRaw = anchor.getAttribute("href");
    //    ^? const hrefRaw: string | null

    // The `.href` property doesn't exist on type Element,
    // so trying to access it will create a compiler diagnostic error:
    //
    // anchor.href;
    //        ~~~~
    // Property 'href' does not exist on type 'Element'.deno-ts(2339)

    // Instead, you must assert that the Element is type ElementWithHref:
    const href = (anchor as ElementWithHref).href;
    //                                       ^ (property) href?: string | undefined

    console.log({ hrefRaw, href });
  }
}

if (import.meta.main) main();

% deno run href-hack.ts
{
  hrefRaw: "https://en.wikipedia.org/wiki/Hello",
  href: "https://en.wikipedia.org/wiki/Hello"
}
{ hrefRaw: "about", href: "https://example.com/page/about" }
{ hrefRaw: "/account", href: "https://example.com/account" }

Both approaches produce the same output.
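One consequence of the prototype-manipulation approach is that a getter installed on a prototype is visible on every object that shares it. The sketch below shows this with a hypothetical stand-in class, FakeElement, which is not part of Deno DOM; the base URL is also made up.

```typescript
// Hypothetical stand-in for a DOM element; only getAttribute is modeled.
class FakeElement {
  tag: string;
  attrs: Map<string, string>;

  constructor(tag: string, attrs: Record<string, string> = {}) {
    this.tag = tag;
    this.attrs = new Map(Object.entries(attrs));
  }

  getAttribute(name: string): string | null {
    return this.attrs.get(name) ?? null;
  }
}

type WithHref = FakeElement & { href?: string };

const hackBase = "https://example.com/page/hello";

// Install the getter on the prototype obtained from an "a" instance.
// All FakeElement instances share this prototype, whatever their tag.
Object.defineProperty(Object.getPrototypeOf(new FakeElement("a")), "href", {
  configurable: true,
  enumerable: false,
  get() {
    const raw = (this as FakeElement).getAttribute("href");
    return raw ? new URL(raw, hackBase).href : undefined;
  },
});

const fakeAnchor = new FakeElement("a", { href: "about" });
const fakeDiv = new FakeElement("div", { href: "/account" });
const fakeSpan = new FakeElement("span");

console.log((fakeAnchor as WithHref).href); // "https://example.com/page/about"
console.log((fakeDiv as WithHref).href); // "https://example.com/account"
console.log((fakeSpan as WithHref).href); // undefined
```

If that side effect matters, the getter could check the tag name before resolving, or the functional approach could be used instead.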
