-
-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Security 2020: additional queries #1479
Conversation
I believe that @nrllh still has some queries that will be included. I still have to check if we've covered everything from the outline, and will add anything that's missing ASAP (was busy with work last week, so apologies for the delay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good.
Some comments on CSP-RO and SAFE_DIVIDE.
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, '(?i)Strict-Transport-Security')), COUNT(0)) AS pct_hsts, | ||
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, '(?i)X-Content-Type-Options')), COUNT(0)) AS pct_xcto, | ||
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, '(?i)Expect-CT')), COUNT(0)) AS pct_expectct, | ||
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, '(?i)Content-Security-Policy')), COUNT(0)) AS pct_csp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this will include Content-Security-Policy-Report-Only
. Maybe should do this?:
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, '(?i)Content-Security-Policy')), COUNT(0)) AS pct_csp, | |
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, '(?i)Content-Security-Policy ')), COUNT(0)) AS pct_csp, |
Also would it be interesting to have a column for CSP-RO? Or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think the other should also have a space in the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No harm I guess but CSP was the only one I thought but as it has another header which starts the same. But yeah should do it for consistency (and who knows what the future will bring!)
COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername))) AS total_with_header, | ||
COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername))) / COUNT(0) AS pct, | ||
COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND STARTS_WITH(url, 'https')) / COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername))) AS pct_header_and_https, | ||
COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Content-Security-Policy')) / COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername))) AS pct_header_and_csp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue here and I actually ran this to see what the difference would be and got this (1st line is CSP, 2nd is CSP-RO):
So not an insignificant difference. It's only 93.8% usage when you're saying it's 100%.
Also interesting that 10% of CSP sites use CSP-RO.
And everyone using CSP-RO is also using CSP - is no one testing any more and everyone has gone all in?
Also think we should do a SAFE_DIVIDE here. Doesn't matter for now, but some of these are very small and could go to zero in near future (I experienced this when trying to run this with a different UNNEST that had a typo!)
Here's what I ended up doing:
#standardSQL
# Analysis of how certain features influence the adoption of other features
SELECT
_TABLE_SUFFIX AS client,
headername,
COUNT(0) AS total_pages,
COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername))) AS total_with_header,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername))), COUNT(0)) AS pct,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND STARTS_WITH(url, 'https')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_https,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Content-Security-Policy')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_csp,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Content-Security-Policy ')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_csp_space,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Content-Security-Policy[^-]')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_csp_not_dash,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Content-Security-Policy-Report-Only')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_csp_report_only,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Cross-Origin-Embedder-Policy')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_coep,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Cross-Origin-Opener-Policy')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_coop,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Cross-Origin-Resource-Policy')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_corp,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Expect-CT')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_expectct,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Feature-Policy')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_featurep,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Permissions-Policy')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_permissionsp,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Referrer-Policy')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_referrerp,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Report-To')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_reportto,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)Strict-Transport-Security')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_hsts,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)X-Content-Type-Options')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_xcto,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)X-Frame-Options')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_xfo,
SAFE_DIVIDE(COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)) AND REGEXP_CONTAINS(respOtherHeaders, '(?i)X-XSS-Protection')), COUNTIF(REGEXP_CONTAINS(respOtherHeaders, CONCAT('(?i)', headername)))) AS pct_header_and_xss,
FROM
`httparchive.summary_requests.2020_08_01_*` AS r,
UNNEST(['Content-Security-Policy', 'Content-Security-Policy-Report-Only', 'Cross-Origin-Embedder-Policy', 'Cross-Origin-Opener-Policy', 'Cross-Origin-Resource-Policy', 'Expect-CT', 'Feature-Policy', 'Permissions-Policy', 'Referrer-Policy', 'Report-To', 'Strict-Transport-Security', 'X-Content-Type-Options', 'X-Frame-Options', 'X-XSS-Protection']) AS headername
WHERE
firstHtml
GROUP BY
client,
headername
ORDER BY
client,
headername
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, perhaps we should do the same for all the headers (that is, to put a space behind it in the regex)
Here the difference between CSP and CSP-RO doesn't matter so much I think; the 1.0 just shows that for hosts that use CSP or CSP-RO, all of them have CSP or CSP-RO - it does matter for the other queries though
Is there a particular reason to include Content-Security-Policy[^-]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason to include
Content-Security-Policy[^-]
?
No. Just me testing if space or dash was better separator. Let's use space since by above that's better as can use in other headers.
D'Oh - just realised this is in Draft. Could have saved myself some time there! Consider it an early review. |
@tomvangoethem are these queries relatively stable? If so we can review and merge immediately and open new PRs for any follow up changes. |
I think they're relatively stable; will do a check tonight and make the proposed changes of @bazzadp (thanks for that!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A copy more issues with @nrllh 's additions. Small (but potentially important!) typo.
Rest of the new changes look fine.
Addressed the review comments; might need some more changes - see my responses to the unresolved comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed we probably should add space for consistency to the other headers but doesn't really matter for now, and other than that it looks good, so OK approving this.
@tomvangoethem let me know if you're good to merge? |
There were still some instances where the header needed to have a space behind it, so now I just added it everywhere. In the future, I guess we would get the headers as a separate column in JSON format, so we wouldn't have to do this "parsing" I'm good to merge! |
+1! |
Progress on #906