Use third-party-web's definition for 3P entities #7474

Closed
paulirish opened this issue Mar 12, 2019 · 7 comments

Comments

@paulirish
Member

https://github.com/patrickhulce/third-party-web/blob/master/data/entities.json

We can use this for the 3rd party report filter and lightwallet/budgets.

Patrick said it accounts for 90% of JS execution on non-first-party origins, based on HTTPArchive data. 👍
(I imagine there may be a few blind spots for requests that don't involve JS execution, e.g. ones that would show up in our Caching audit. We can take a look later.)

@patrickhulce
Collaborator

Just ran the numbers again: it's 88.22% of 3rd party script execution time. It's very top-heavy, so the 99 entities I've identified account for that 88.22%, and the remaining ~12% is spread across ~661 domains. If we need more coverage, I could keep going there 👍

@paulirish
Member Author

> Just ran the numbers again

Can you share your BigQuery scripts?

> If we need more coverage, I could keep going there 👍

90% coverage sounds good. Though we're using script execution time as the coverage metric. What if we instead use the frequency reported in the uses-long-cache-ttl details as the coverage metric? nahmean?

@patrickhulce
Collaborator

The BigQuery scripts are all in https://github.com/patrickhulce/third-party-web/tree/master/sql :)

> 90% coverage sg.. Though we're using script execution time as the coverage metric. What if instead we use frequency reported in uses-long-cache-ttl details as the coverage metric? nahmean?

or network-requests ;) I believe HTTPArchive even keeps network payloads in a separate table that'd be even easier to aggregate.

@paulirish
Member Author

paulirish commented Mar 13, 2019 via email

@patrickhulce
Collaborator

I published https://www.npmjs.com/package/third-party-web#npm-module

It exposes getEntity, which takes a URL and returns an entity object :)
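For anyone wiring this into a report filter, here is a rough sketch of grouping request URLs by entity. It is not Lighthouse's implementation. The `groupByEntity` helper, the stub lookup, and the example domains are all made up for illustration, and it assumes the entity object has a `name` field and that a lookup miss returns a falsy value. The lookup is passed in as a parameter so the snippet runs without the package installed.

```javascript
// Group request URLs by the entity that owns them.
// `getEntity` is injected; with the package installed you would pass
// require('third-party-web').getEntity instead of the stub below.
function groupByEntity(urls, getEntity) {
  const groups = new Map();
  for (const url of urls) {
    const entity = getEntity(url);
    // Assumption: misses are falsy and entities expose a `name` field.
    const key = entity ? entity.name : '(unattributed)';
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(url);
  }
  return groups;
}

// Hypothetical stand-in for the real lookup, for illustration only.
const STUB_ENTITIES = [
  {name: 'Example Analytics', domains: ['analytics.example.com']},
  {name: 'Example CDN', domains: ['cdn.example.net']},
];

function stubGetEntity(url) {
  const host = new URL(url).hostname;
  return STUB_ENTITIES.find(entity => entity.domains.includes(host));
}

const grouped = groupByEntity(
  [
    'https://analytics.example.com/collect.js',
    'https://analytics.example.com/beacon',
    'https://cdn.example.net/lib.js',
    'https://unknown.example.org/widget.js',
  ],
  stubGetEntity
);

for (const [name, urls] of grouped) {
  console.log(`${name}: ${urls.length} request(s)`);
}
```

Anything the lookup can't attribute ends up in a single `(unattributed)` bucket, which is where the coverage gaps discussed in this thread would show up.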

I added a lot more entities based on the network requests query, but network requests are more spread out than script execution, so we have ~72% coverage at the moment. The top 50 entities get us 68% coverage, and the next 70 only get us 4% more...

3rd parties representing 48.83 % of total requests
120 Entities representing 71.71 % of 3rd party requests
Top 50 Entities representing 68.00 % of 3rd party requests
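To make the "top N entities cover X%" arithmetic concrete, here is a minimal sketch of the calculation. The request counts are invented purely for illustration (they are not HTTPArchive data), and `coverageOfTopN` is a hypothetical helper, not something from the package or the SQL scripts.

```javascript
// Percentage of all 3rd party requests attributable to the N busiest entities.
// `entityCounts` holds one request count per identified entity;
// `totalThirdParty` also includes requests from domains with no entity.
function coverageOfTopN(entityCounts, totalThirdParty, n) {
  const sorted = [...entityCounts].sort((a, b) => b - a);
  const covered = sorted.slice(0, n).reduce((sum, count) => sum + count, 0);
  return (covered * 100) / totalThirdParty;
}

// Made-up, top-heavy distribution: 6 identified entities and 1000 total
// 3rd party requests, 285 of which come from unidentified domains.
const counts = [400, 200, 80, 20, 10, 5];
const total = 1000;

console.log(coverageOfTopN(counts, total, 2)); // 60
console.log(coverageOfTopN(counts, total, 6)); // 71.5
```

With a distribution like this, the first couple of entities do most of the work and each additional entity adds very little, which is the same shape as the 68% vs. 71.71% numbers above.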

@paulirish
Member Author

paulirish commented Mar 15, 2019 via email

@patrickhulce
Collaborator

Btw, forgot to update here, but I've also exposed a slimmer version of the package.

For example, you can do require('third-party-web/httparchive-nostats-subset') to get the version of the module with only the entities seen in HTTPArchive and without the usage stats (just entity information/domains/etc). This brings the total weight down to just about 55KB ungzipped, which is a lot more tolerable.
