Extend JS Detection beyond global variables? #2450

Open
developit opened this Issue Sep 12, 2018 · 5 comments

developit commented Sep 12, 2018

Hi there! 👋

JavaScript libraries bundled via Webpack or Rollup generally don't expose or leak any variables into global scope. Given the prevalence of bundling, this seems like it might make a good case for extending Wappalyzer's current JS detection approach to allow more than global nested property access.

I'd love to help out with this, assuming we can agree on a detection strategy that works across the existing 3 drivers. For the DOM drivers (extension and bookmarklet), it seems like a TreeWalker strategy would be useful. This would allow detection to define DOM properties or attributes that signal the presence of Elements rendered by a particular framework or library.

In terms of an implementation, there are two possible approaches one could take. First, a single complete TreeWalker pass could invoke a precomputed list of DOM property detection rules derived from the full Wappalyzer detection config. This has the advantage of using only a single TreeWalker, but the drawback is that a complete traversal is always required in order to ensure there are no false negatives.

A second approach (and one I'm curious to benchmark) would be to create a TreeWalker for each detection rule. This specialized traversal using the filter predicate would potentially be faster, and has the advantage of being able to traverse only until the first match is discovered.

Here's a rough sketch of what the first approach might look like:

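const rules = {
  preact: {
    dom: node => '__preactattr_' in node
  },
  vue: {
    dom: node => '_isVue' in node
  }
};

const found = executeDomFilters(rules);

function executeDomFilters(rules) {
  // Only consider rules that actually define a DOM filter
  const names = Object.keys(rules).filter(name => typeof rules[name].dom === 'function');
  const filters = names.map(name => rules[name].dom);
  const found = [];
  function testNode(node) {
    // Iterate backwards so that splice() never skips the next entry
    for (let i = names.length - 1; i >= 0; i--) {
      if (filters[i](node)) {
        found.push(names[i]);
        names.splice(i, 1);
        filters.splice(i, 1);
      }
    }
    // Accept every node so nextNode() returns after each element,
    // letting the loop below stop once all rules have matched
    return NodeFilter.FILTER_ACCEPT;
  }
  const treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT, testNode);
  // Walk until every rule has matched or the tree is exhausted
  while (names.length > 0 && treeWalker.nextNode());
  return found;
}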
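
For comparison, the second approach (one TreeWalker per detection rule) might look roughly like the following. This is only a sketch: `detectPerRule` is a hypothetical name, the `rules` shape is the one used above, and the `doc` parameter is an assumption added so the document object is passed in explicitly. The numeric constants are the standard `NodeFilter` values, written out so the sketch does not rely on a global.

```javascript
// Numeric values of the standard NodeFilter constants.
const SHOW_ELEMENT = 0x1;
const FILTER_ACCEPT = 1;
const FILTER_SKIP = 3;

// Hypothetical helper: creates one TreeWalker per rule and stops each
// walk at the first element the rule's `dom` predicate matches.
function detectPerRule(rules, root, doc) {
  const found = [];
  for (const name of Object.keys(rules)) {
    const test = rules[name].dom;
    if (typeof test !== 'function') continue; // rule has no DOM filter
    const walker = doc.createTreeWalker(root, SHOW_ELEMENT, {
      // Skip (rather than reject) non-matching nodes so that their
      // descendants are still visited.
      acceptNode: node => (test(node) ? FILTER_ACCEPT : FILTER_SKIP)
    });
    // nextNode() returns the first accepted node, or null if none exists.
    if (walker.nextNode() !== null) found.push(name);
  }
  return found;
}
```

In a page this would be called as `detectPerRule(rules, document.body, document)`. A rule that never matches still costs a full traversal, so whether this beats the single-pass version probably depends on how many rules there are and how early they tend to match, which is what the benchmark would need to show.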

/cc @igrigorik @rviscomi

Contributor

igrigorik commented Sep 19, 2018

@AliasIO wdyt? Would love to see this happen.

Contributor

jvoisin commented Sep 19, 2018

I'm a bit afraid of the performance issues :/

justinfagnani commented Sep 21, 2018

@developit make sure any approach traverses into ShadowRoots. And of course this won't work with closed ShadowRoots at all.

Owner

AliasIO commented Nov 30, 2018

Sorry for the late reply. This is interesting. For a while now I've been thinking of using query selectors instead of regular expressions for HTML, but it looks like the TreeWalker API is better suited? I'm not familiar with it.

developit commented Nov 30, 2018

@AliasIO I think the key is that TreeWalker provides the ability to query arbitrary element properties rather than just attributes. In the example above, neither __preactattr_ nor _isVue could be detected using querySelector, since they are not exposed as attributes.

@justinfagnani no solution will work with closed shadow roots, since the detector is injected too late to patch Element.prototype.attachShadow and only has access to DOM APIs.
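
To make the open shadow root case concrete, here is a rough sketch of a traversal that descends into them. `someElementDeep` is a hypothetical helper, not part of Wappalyzer, and it uses a plain stack-based walk over `children` and `shadowRoot` instead of a TreeWalker. Closed roots stay invisible because `shadowRoot` is `null` for them.

```javascript
// Hypothetical helper: depth-first walk over element children that also
// descends into open shadow roots. Returns true as soon as `test` matches.
function someElementDeep(root, test) {
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (test(node)) return true;
    // `shadowRoot` is null for closed roots and for elements without one.
    if (node.shadowRoot) stack.push(node.shadowRoot);
    const children = node.children || [];
    // Push in reverse so children are popped in document order.
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return false;
}
```

In a page, `someElementDeep(document.body, node => '__preactattr_' in node)` would report whether any reachable element carries that property. Unlike `TreeWalker.nextNode()`, this version also tests the root node itself.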
