Detect third party JS libraries with custom metric #77

Closed
rviscomi opened this Issue Mar 22, 2017 · 23 comments

Comments

4 participants
@rviscomi
Member

rviscomi commented Mar 22, 2017

Using custom metric scripts, detect the presence of third party libraries and their version if available.

For example, a page may have "jquery@3.2.0,modernizr,backbone@1.3.3,...".
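A minimal sketch of how such a string might be built and read back, assuming comma-joined entries with an optional `@version` suffix as in the example above. `serializeLibraries` and `parseLibraries` are hypothetical helper names, not part of any existing custom metric script.

```javascript
// Hypothetical helpers; the "name@version,name" format follows the example in this issue.
function serializeLibraries(libs) {
  return libs
    .map(({ name, version }) => (version ? `${name}@${version}` : name))
    .join(",");
}

function parseLibraries(metric) {
  if (!metric) return [];
  return metric.split(",").map((entry) => {
    const at = entry.lastIndexOf("@");
    // No "@" means the library was detected without a version.
    return at === -1
      ? { name: entry, version: null }
      : { name: entry.slice(0, at), version: entry.slice(at + 1) };
  });
}
```

Splitting on the last `@` keeps a name that itself contains `@` intact, but a name containing a comma would still break this format.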

Contributor

tlauinger commented Mar 27, 2017

I'm a co-author of the paper referenced in #78. Our library detection code is a fork of the library detection code used by the Library Detector for Chrome extension. We made a few modifications to the code more than a year ago, so our fork isn't really up to date any more. I compared our code to the current version today and found the following differences:

Libraries detected by the Chrome extension but not by our code: GWT, Ink, Vaadin, Zurb, Polymer, Highcharts, InfoVis, Blackbird, CreateJS, Google Maps, Spry, Qooxdoo, Ext JS, base2, closure, Processing.js, Mapbox, Sammy, Rico, MochiKit, gRaphaël (fix gRaphaël), Glow, FuseJS, Tween.js, SproutCore, Zepto.js, PhiloGL, LABjs, Head JS, ControlJS, RightJS, Pusher, Swiffy, Move, AmplifyJS, Popcorn.js, Spine, Visibility.js, IfVisible.js, DC.js, Vue, Two, Brewser, Material Design Lite, Kendo UI, Matter.js, Riot, Sea.js, ScrollMagic

Libraries not currently detected by the Chrome extension (added by us): swfobject, flexslider, moment-timezone (plug-in), json3, pep, spf, numeral

Equivalent detection code in both: Lo-Dash, Underscore (test checks that no Lo-Dash detection in window), yepnope, jQuery Tools, D3, Ember.js, Greensock JS, Isotope

Detection code better in the Chrome extension (better error handling etc.): Bootstrap, three.js (better handling of two cases), CamanJS, WebFont Loader (set version to null)

Detection code in the Chrome extension that is more restrictive than ours (fewer false positives when the version property is not extracted, but likely lower coverage of older library versions/when the additional method used in the signature isn't part of the API): FlotCharts, jQuery UI, Dojo, Prototype, Scriptaculous, MooTools, YUI 2/YUI 3, Raphaël (possibly fix ë in name), React, Modernizr, Backbone, Mustache, Fabric.js, Paper.js, Handlebars, Knockout, jQuery Mobile, Angular, Hammer.js, Velocity.js, FastClick, Marionette, Can

Detection code improved in our fork (usually added support for older versions): Leaflet (also allows win.L.VERSION, but additionally use extension restrictions on GeoJSON etc.), Socket.IO (added alternative win.io.Socket), RequireJS (seems to cover more versions), Pixi.js (better error checking), Moment.js (but add win.moment.isMoment check from extension)
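To illustrate the trade-off described above, here is a rough sketch of a detection test that accepts an alternative version property for older releases while still requiring an extra API member to limit false positives. It is not the actual Library Detector or fork code: the `{ version }`-or-`false` return shape, the `version` property name, and the exact `GeoJSON` check are assumptions for illustration.

```javascript
// Illustrative test only; not the actual Library Detector code.
const leafletTest = {
  test(win) {
    const L = win.L;
    // Require an extra API member (GeoJSON) so a bare `L` global is not enough.
    if (!L || !L.GeoJSON) return false;
    // Assumed: some releases expose `version`, others `VERSION`.
    const version = L.version || L.VERSION;
    return version ? { version: String(version) } : false;
  },
};
```

The stricter the signature, the fewer false positives, at the cost of missing releases that lack the extra member.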

How would you like to proceed to add all those libraries? We could try to have our few extra library signatures added to Library Detector; then you could simply integrate their entire file with all the library detection tests.

Member

igrigorik commented Mar 27, 2017

Tobias, thanks for digging into this!

We could try to have our few extra library signatures added to Library Detector; then you could simply integrate their entire file with all the library detection tests.

Big +1 to this. I'd love to avoid trying to replicate efforts and it looks like library detector is a fairly active project, so everyone would benefit if we converge on improving library detector core and reusing it across projects.

Contributor

tlauinger commented Mar 27, 2017

Sounds good! See johnmichel/Library-Detector-for-Chrome#89

Member

igrigorik commented Mar 28, 2017

@tlauinger awesome, thanks!

Contributor

tlauinger commented Mar 29, 2017

johnmichel/Library-Detector-for-Chrome#91 has been merged; the Library Detector code now has all our additional and updated library tests (minus two that weren't that useful after all).

A few more thoughts on things that could make the data easier to work with:

  • Instead of '@', it would be better to use a more distinctive character (or string) to join library names and versions; one that won't appear in "strange" library names or versions, and that doesn't have any special meaning in SQL or regular expressions

  • I assume that at some point, the list of supported libraries will be updated. When analysing library use over time, it's important to distinguish "not detected because the library isn't used by the site" from "not detected because the library wasn't supported by the system at the time of the crawl". So it could be a good idea to keep a document somewhere that lists the dates when support for a library was added

  • For people who are using the library detection data, it can be helpful to document the limitations of the measurement approach:

    • detection of libraries only in the main page but not frames;
    • no detection of duplicate inclusions of a library;
    • higher potential for false positive detections when no version number was extracted;
    • version number extraction works only if the version number is exported by the library code, which libraries don't always do from the start, and sometimes they even drop support for a few versions before reintroducing it (or it's never supported at all);
    • if a website uses heavy minification with dead code removal, or if libraries are included as a "private" reference instead of being accessible as a window-global variable, detection won't work either.
    • Also there's the more general limitation that library detections don't work retroactively (only in the crawls after support was added).

    If you'd like, I could write up these points in a brief library detection README.
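The "not used" versus "not yet supported" distinction above could be handled at analysis time roughly as follows. This is a sketch under stated assumptions: the `supportAddedOn` table, its library names, and its dates are invented placeholders, and `classify` is a hypothetical helper.

```javascript
// Placeholder data: library names and dates are invented for illustration only.
const supportAddedOn = new Map([
  ["examplelib", "2017-04-01"],
  ["otherlib", "2017-06-15"],
]);

function classify(library, crawlDate, detected) {
  const added = supportAddedOn.get(library);
  // ISO dates (YYYY-MM-DD) compare correctly as plain strings.
  if (added === undefined || crawlDate < added) return "unsupported-at-crawl";
  return detected.has(library) ? "detected" : "not-detected";
}
```

With a table like this, a missing detection in an early crawl is reported as "unsupported-at-crawl" rather than being counted as evidence that the site did not use the library.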

Member

rviscomi commented Mar 29, 2017

If you'd like, I could write up these points in a brief library detection README.

Yes please! Let's make a docs/custom-metrics.md doc. See #82 for example.

Member

igrigorik commented Mar 29, 2017

So it could be a good idea to keep a document somewhere that lists the dates when support for a library was added

Let's log the version of Library-Detector-for-Chrome in our traces. This way we can go back to the library commit logs and see what was supported vs not... Ideally, LDfC would have a release log that we can point to; otherwise we're duplicating their work. /cc @johnmichel

Member

rviscomi commented Mar 30, 2017

Another thing to note is that LDfC's detection script has two kinds of output for each library: an object containing version info, or false if the version can't be detected. If a library is detected but not its version, it's the same as no library at all. In other words, it doesn't support the lib@null use case in #80.

I opened johnmichel/Library-Detector-for-Chrome#92 to explore the feasibility of adding support.
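For illustration, supporting the lib@null case might look something like the sketch below. It assumes a convention that LDfC did not have at the time of this comment: a test returns `false` when the library is absent and `{ version: null }` when it is present but the version is unknown. `formatDetection` is a hypothetical helper.

```javascript
// Assumed convention: false = not detected, { version: null } = detected, version unknown.
function formatDetection(name, result) {
  if (!result) return null;
  const version = result.version ?? null;
  return `${name}@${version}`;
}
```

Under that assumption, `formatDetection("modernizr", { version: null })` yields `"modernizr@null"`, while a `false` result yields no entry at all.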

Member

rviscomi commented Mar 30, 2017

The actual integration of LDfC is something else I'd like to get feedback on. One approach would be to run a script that:

The detection object's variable name also has some kind of unique prefix; d41d8cd98f00b204e9800998ecf8427e_LibraryDetectorTests. If that happens to be a good version string, we could extract it and save it somewhere.

Member

igrigorik commented Mar 30, 2017

@johnmichel how is the var name generated? Could we define a cleaner interface for non-extension consumers of libraries.js?

test each library, using the variable name extracted above

@rviscomi not sure I understand what this step does?

Member

rviscomi commented Mar 30, 2017

Roughly

Object.entries(d41d8cd98f00b204e9800998ecf8427e_LibraryDetectorTests).map(([name, lib]) => lib.test(window))...

Written on mobile, not tested. Just an example of accessing the object's props and testing each library.
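A slightly fuller sketch of the same idea, runnable outside a browser. `mockTests` and `fakeWindow` are stand-ins for the prefixed `LibraryDetectorTests` object and the page's `window`, and `detectLibraries` is a hypothetical name. The only thing taken from the thread is that each test returns an object with version info or `false`.

```javascript
// Stand-in for the prefixed LibraryDetectorTests object; real tests inspect the page's globals.
const mockTests = {
  jQuery: { test: (win) => (win.jQuery ? { version: win.jQuery.fn.jquery } : false) },
  Backbone: { test: (win) => (win.Backbone ? { version: win.Backbone.VERSION } : false) },
};

function detectLibraries(tests, win) {
  return Object.entries(tests)
    .map(([name, lib]) => {
      try {
        const result = lib.test(win);
        return result ? `${name}@${result.version}` : null;
      } catch (e) {
        // A throwing test is treated as "not detected" so one bad test doesn't abort the run.
        return null;
      }
    })
    .filter(Boolean)
    .join(",");
}

// A fake window with only jQuery present.
const fakeWindow = { jQuery: { fn: { jquery: "3.2.0" } } };
```

Calling `detectLibraries(mockTests, fakeWindow)` produces a single comma-joined string with one `name@version` entry per detected library.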

Member

igrigorik commented Mar 30, 2017

Oh, doh.. I see, that makes sense.

@johnmichel

johnmichel Mar 30, 2017

@igrigorik What @rviscomi said is basically the gist of it. That unique prefix was in place before I assumed control of the project, so if it's not ideal or usable enough, I'm certainly open to ideas for a cleaner or more straightforward approach.


Member

igrigorik commented Mar 31, 2017

@johnmichel by the looks of it, that script gets injected into the page, so I'm assuming the idea behind the unique fingerprint is to avoid collisions with other content.. Does the prefix change from release to release?

@rviscomi re, version: I guess we can pull it out from manifest.json and record it in our script?
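Pulling the version out of the manifest could be as simple as the sketch below. The manifest text here is a placeholder rather than LDfC's real manifest, and `readDetectorVersion` is a hypothetical helper. How the value would then be recorded alongside the crawl data is left open.

```javascript
// Sample manifest text; the field values are placeholders, not LDfC's actual manifest.
const manifestText = JSON.stringify({
  name: "Example Extension",
  version: "0.0.0",
  manifest_version: 2,
});

function readDetectorVersion(text) {
  const manifest = JSON.parse(text);
  // `version` is a required key in Chrome extension manifests.
  return typeof manifest.version === "string" ? manifest.version : null;
}
```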

@johnmichel

johnmichel Apr 3, 2017

@igrigorik The fingerprint doesn't change from release to release, so you should be able to rely on it being consistent. If it would make sense to rotate it for versioning on the httparchive side, that is also an option that could be explored as it doesn't affect anything outside of the extension.


Member

igrigorik commented Apr 4, 2017

@johnmichel nah, that's fine.. we're better off with the version in the manifest.

Member

igrigorik commented Apr 7, 2017

Now that #85 landed, can we close this? Anything left?

Member

rviscomi commented Apr 7, 2017

@igrigorik the final piece would be unknown version support, but that's being tracked in johnmichel/Library-Detector-for-Chrome#92 so this one can be closed with that caveat

@rviscomi rviscomi closed this Apr 7, 2017

Member

igrigorik commented Apr 7, 2017

Got it, thanks. We should figure out who's taking the lead on that one.. Looks like there is agreement on how we want to tackle it, but not clear who's actually doing it :)

Member

rviscomi commented Apr 7, 2017

You're right. I'll take it.

Member

rviscomi commented May 3, 2017

@tlauinger @johnmichel FYI this is now live. See https://discuss.httparchive.org/t/javascript-library-detection/955. Thanks for your help making this happen!

@johnmichel

johnmichel May 3, 2017

@rviscomi @tlauinger @igrigorik Happy to have helped 😄 Please let me know if future enhancements are desired!

