Privacy 2024 - Implement CCPA link metric #119

Merged: 7 commits, Jun 10, 2024

Conversation

bstandaert-wustl
Contributor

@max-ostapenko
Contributor

@bstandaert-wustl please move it into the privacy chapter's custom metric, in the dist/privacy.js file.

@max-ostapenko
Contributor

max-ostapenko commented May 19, 2024

The number of websites for which it makes sense to test CCPA compliance is quite small compared to the dataset, particularly since the crawlers run from Virginia, not California.

Could we define an eligible set of sites to optimize this, e.g. using locale or TLD?

@pmeenan
Member

pmeenan commented May 19, 2024

FWIW, the crawl runs from most of the US regions of Google Cloud, so there's SOME California testing (~1/7 of the crawl), but it's not deterministic and it's hard to say how IP geolocation works for any of the regions.

@bstandaert-wustl
Contributor Author

@max-ostapenko From https://petsymposium.org/popets/2022/popets-2022-0030.pdf:

the CCPA applies to all websites doing business in California, regardless of the domicile of their business or the language of their website.

I don't think we have a reasonable way to filter on "doing business in California". Perhaps we could rank sites by monthly traffic and take only the top fraction (since the CCPA has a minimum user threshold).

@bstandaert-wustl
Contributor Author

Per the discussion in Slack, @UmarIqbal recommends testing this on all sites and framing it as “prevalence of enforcement mechanisms” rather than compliance: https://httparchive.slack.com/archives/C023K97SR8U/p1716560204137519

Are there any other changes you recommend making to this before the deadline?

'campaign',
'deal',
'ad choice',
'january',
Contributor

In what case do month names need to be excluded?

Contributor Author

The list comes from this paper: https://petsymposium.org/popets/2022/popets-2022-0030.pdf. They ran a test crawl and then manually checked the results, so all of the exclusion phrases caused a false positive on at least one site. I don't think they identify which sites though.
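
As a rough illustration of the exclusion-list approach described above, here is a minimal sketch. It is not the code merged in this PR. The function name `findCcpaLinks` and both constant names are hypothetical. The phrase lists are deliberately tiny: 'your privacy choices' is taken from the test output in this thread, the four exclusions are the ones visible in the diff snippet, and 'do not sell' is an assumed inclusion phrase.

```javascript
// Illustrative sketch only; NOT the implementation merged in this PR.
// Assumption: a link counts as a CCPA link when its text contains an
// inclusion phrase and no exclusion phrase.
const CCPA_LINK_PHRASES = ['do not sell', 'your privacy choices']; // 'do not sell' is assumed
const CCPA_EXCLUDE_PHRASES = ['campaign', 'deal', 'ad choice', 'january']; // from the diff snippet

// linkTexts: array of strings, the visible text of each link on a page.
function findCcpaLinks(linkTexts) {
  const matched = new Set();
  for (const raw of linkTexts) {
    // Normalize case and whitespace before matching.
    const text = raw.toLowerCase().replace(/\s+/g, ' ').trim();
    // Skip any link whose text contains an exclusion phrase.
    if (CCPA_EXCLUDE_PHRASES.some((phrase) => text.includes(phrase))) continue;
    for (const phrase of CCPA_LINK_PHRASES) {
      if (text.includes(phrase)) matched.add(phrase);
    }
  }
  // Same field names as the ccpa_link object in the test output.
  return { hasCCPALink: matched.size > 0, CCPALinkPhrases: [...matched] };
}
```

In a page context the input could be collected with something like `Array.from(document.querySelectorAll('a'), (a) => a.textContent)`; whether the merged metric gathers link text this way is not shown in this thread.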

dist/ccpa-link.js (outdated, resolved)
@tunetheweb
Member

@max-ostapenko @bstandaert-wustl what's the latest on this? We need to merge this today if we want to include it in the Web Almanac 2024.

@bstandaert-wustl
Contributor Author

@tunetheweb I think the remaining questions are minor - if we aren't able to resolve them before your deadline, can you go ahead and merge this?

@max-ostapenko
Contributor

Yeah, I see no blockers. 👍

@tunetheweb
Member

Got merge conflicts now, @max-ostapenko. Can you resolve them?

dist/privacy.js (resolved)
dist/privacy.js (outdated, resolved)
@max-ostapenko
Contributor

@tunetheweb please merge

Custom metrics for https://almanac.httparchive.org/en/2022/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240610_WJ_31

Custom metrics for https://ebay.com

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240610_6P_35
Changed custom metrics values:

{
    "_privacy": {
        "privacy_wording_links": [
            {
                "text": "Privacy"
            },
            {
                "text": "Cookies"
            },
            {
                "text": "Your Privacy Choices"
            }
        ],
        "iab_tcf_v1": {
            "present": false,
            "data": null,
            "compliant_setup": null
        },
        "iab_tcf_v2": {
            "present": false,
            "data": null,
            "compliant_setup": null
        },
        "iab_usp": {
            "present": false,
            "privacy_string": null
        },
        "navigator_doNotTrack": true,
        "navigator_globalPrivacyControl": true,
        "document_permissionsPolicy": false,
        "document_featurePolicy": true,
        "referrerPolicy": {
            "entire_document_policy": "unsafe-url",
            "individual_requests": null,
            "link_relations": null
        },
        "media_devices": {
            "navigator_mediaDevices_enumerateDevices": true,
            "navigator_mediaDevices_getUserMedia": false,
            "navigator_mediaDevices_getDisplayMedia": false
        },
        "geolocation": {
            "navigator_geolocation_getCurrentPosition": false,
            "navigator_geolocation_watchPosition": false
        },
        "request_hostnames_with_cname": {
            "www.ebay.com": [
                "e9428.a.akamaiedge.net"
            ],
            "ir.ebaystatic.com": [
                "ebaystatic.ebay.map.fastly.net"
            ],
            "i.ebayimg.com": [
                "ebayimg.map.fastly.net"
            ],
            "secureir.ebaystatic.com": [
                "e9428.a.akamaiedge.net"
            ],
            "rover.ebay.com": [
                "andes.g.ebay.com"
            ],
            "srv.main.ebayrtm.com": [
                "madronaext.g.ebay.com"
            ],
            "monitor.ebay.com": [
                "gisufespipeline22.g.ebay.com"
            ],
            "backstory.ebay.com": [
                "autotrack.g.ebay.com"
            ],
            "ib.adnxs.com": [
                "ib.anycast.adnxs.com"
            ],
            "www.ebayadservices.com": [
                "andes.g.ebay.com"
            ],
            "9fa6c6cd06cc6a66ba81515fcef4fd49.safeframe.googlesyndication.com": [
                "pagead-googlehosted.l.google.com"
            ],
            "devicebind.ebay.com": [
                "signin.g.ebay.com"
            ],
            "cdn.doubleverify.com": [
                "a1241.dsct.akamai.net"
            ],
            "tps.doubleverify.com": [
                "tps-ue1.doubleverify.com"
            ],
            "pages.ebay.com": [
                "e11847.a.akamaiedge.net"
            ],
            "src.ebay-us.com": [
                "h-ebay.online-metrix.net"
            ],
            "tpsc-ue1.doubleverify.com": [
                "tps-ue1.doubleverify.com"
            ],
            "image6.pubmatic.com": [
                "pugm-nje1.pubmnet.com"
            ],
            "ssp-sync.criteo.com": [
                "ssp-sync.us5.vip.prod.criteo.com"
            ],
            "ups.analytics.yahoo.com": [
                "ats-eks.us-east-1.dcs-online-targeting-prd.aws.oath.cloud"
            ],
            "pixel-us-west.rubiconproject.com": [
                "pixel-us-west.rubiconproject.net.akadns.net"
            ],
            "dis.criteo.com": [
                "widget.us5.vip.prod.criteo.com"
            ],
            "x.bidswitch.net": [
                "user-data-us-east.bidswitch.net"
            ]
        },
        "ccpa_link": {
            "hasCCPALink": true,
            "CCPALinkPhrases": [
                "your privacy choices"
            ]
        }
    }
}
Custom metrics for https://walmart.com

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240610_5P_39
Changed custom metrics values:

{
    "_privacy": {
        "privacy_wording_links": [
            {
                "text": "Privacy & Security"
            },
            {
                "text": "Your Privacy Choices"
            },
            {
                "text": "NV Consumer Health Data Privacy Notice"
            }
        ],
        "iab_tcf_v1": {
            "present": false,
            "data": null,
            "compliant_setup": null
        },
        "iab_tcf_v2": {
            "present": false,
            "data": null,
            "compliant_setup": null
        },
        "iab_usp": {
            "present": false,
            "privacy_string": null
        },
        "navigator_doNotTrack": true,
        "navigator_globalPrivacyControl": false,
        "document_permissionsPolicy": false,
        "document_featurePolicy": true,
        "referrerPolicy": {
            "entire_document_policy": null,
            "individual_requests": [
                {
                    "tagName": "IFRAME",
                    "referrerpolicy": "no-referrer-when-downgrade",
                    "count": 1
                }
            ],
            "link_relations": {
                "A": 12
            }
        },
        "media_devices": {
            "navigator_mediaDevices_enumerateDevices": true,
            "navigator_mediaDevices_getUserMedia": true,
            "navigator_mediaDevices_getDisplayMedia": false
        },
        "geolocation": {
            "navigator_geolocation_getCurrentPosition": false,
            "navigator_geolocation_watchPosition": false
        },
        "request_hostnames_with_cname": {
            "www.walmart.com": [
                "e4373.x.akamaiedge.net"
            ],
            "i5.walmartimages.com": [
                "e10798.x.akamaiedge.net"
            ],
            "b.wal.co": [
                "e12404.x.akamaiedge.net"
            ],
            "beacon.walmart.com": [
                "beacon-cdn.walmart.com.akadns.net"
            ],
            "tap.walmart.com": [
                "e7503.x.akamaiedge.net"
            ],
            "": [
                "aa.online-metrix.net"
            ],
            "ib.adnxs.com": [
                "ib.anycast.adnxs.com"
            ],
            "www.facebook.com": [
                "star-mini.c10r.facebook.com"
            ],
            "c.bing.com": [
                "dual-a-0034.a-msedge.net"
            ],
            "sslwidget.criteo.com": [
                "widget.us5.vip.prod.criteo.com"
            ],
            "ct.pinterest.com": [
                "prod.pinterest.global.map.fastly.net"
            ],
            "sp.analytics.yahoo.com": [
                "spdc-global.pbp.gysm.yahoodns.net"
            ],
            "fid.agkn.com": [
                "activationedge-fabrick-1457061833.us-east-1.elb.amazonaws.com"
            ],
            "drfdisvc.walmart.com": [
                "h-walmart.online-metrix.net"
            ],
            "gum.criteo.com": [
                "gum.us5.vip.prod.criteo.com"
            ]
        },
        "ccpa_link": {
            "hasCCPALink": true,
            "CCPALinkPhrases": [
                "your privacy choices"
            ]
        }
    }
}
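
Following the "prevalence of enforcement mechanisms" framing discussed earlier in this thread, per-site results shaped like the two payloads above could be summarized roughly as follows. This is a sketch under assumptions: the helper name `ccpaLinkPrevalence` and its return shape are hypothetical, and the thread does not show how the results are actually analyzed.

```javascript
// Hypothetical helper; not part of this PR.
// results: array of parsed custom-metric payloads shaped like the JSON above,
// i.e. { _privacy: { ccpa_link: { hasCCPALink, CCPALinkPhrases } } }.
function ccpaLinkPrevalence(results) {
  const phraseCounts = {};
  let sitesWithLink = 0;
  for (const result of results) {
    const ccpa = result?._privacy?.ccpa_link;
    // Payloads without the metric, or without a detected link, count only toward the total.
    if (!ccpa || !ccpa.hasCCPALink) continue;
    sitesWithLink += 1;
    for (const phrase of ccpa.CCPALinkPhrases ?? []) {
      phraseCounts[phrase] = (phraseCounts[phrase] ?? 0) + 1;
    }
  }
  return {
    total: results.length,
    sitesWithLink,
    // Share of sites with a detected link; 0 for an empty input.
    share: results.length ? sitesWithLink / results.length : 0,
    phraseCounts,
  };
}
```

Reporting a share of sites with a detected link, rather than a compliance verdict, matches the framing suggested in the Slack discussion referenced above.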

@tunetheweb tunetheweb merged commit 1645f94 into HTTPArchive:main Jun 10, 2024
2 checks passed
4 participants