Consider factoring in meta refresh tags when calculating redirects #52
Comments
True! There are also other redirect techniques beyond meta redirects that pshtt currently can't recognize. For example, https://abcnews.go.com uses JavaScript to downgrade HTTPS:
<script>
if (window.location.protocol == "https:" && window.parent.location.hostname.indexOf("outbrain") == -1) {
    var _sslurl = window.location.href.replace("https://", "http://");
    window.location.replace(_sslurl);
    window.location.href = _sslurl;
}
</script>
I think the most comprehensive approach would be to use browser automation: "it's the only way to be sure." On the other hand, while that would make it easy to determine whether a site downgrades HTTPS or not, it wouldn't automatically help with the harder problem of determining why/how a site downgrades. If you want to keep this issue specifically about meta redirects, let me know, and I'll move this comment to a dedicated issue about detecting JS redirects.
The main reason I consider meta redirects feasible is that, in theory, we should already have the HTML content from our requests to the site, so no more network activity is necessary. We'd only need to run an HTML parse operation on the retrieved content. Detecting JS redirects would require (as you say) a headless browser, and potentially more network requests if the relevant JS is brought in via an external file rather than an inline script. While HTML parsing isn't trivial, operating a headless browser and making arbitrary additional network requests is less appealing to me. No worries on discussing it all in this issue, IMO.
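To make the "just parse the HTML we already fetched" idea concrete, here is a rough sketch using only the Python standard library. It is not pshtt code: the names `refresh_target`, `MetaRefreshFinder`, and `find_meta_refresh_targets` are hypothetical, the sample markup is made up, and the `content` parsing is a simplification of what browsers actually accept.

```python
import re
from html.parser import HTMLParser

# Optional "url=" prefix in a refresh content value (case-insensitive).
_URL_PREFIX = re.compile(r"\s*url\s*=\s*", re.IGNORECASE)


def refresh_target(content):
    """Return the URL from a refresh content value, or None if there is none."""
    _delay, sep, rest = content.partition(";")
    if not sep:
        return None  # e.g. content="30" only reloads the current page
    match = _URL_PREFIX.match(rest)
    if match:
        rest = rest[match.end():]
    target = rest.strip().strip("'\"")
    return target or None


class MetaRefreshFinder(HTMLParser):
    """Collects redirect targets from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").strip().lower() != "refresh":
            return
        target = refresh_target(attr.get("content", ""))
        if target:
            self.targets.append(target)


def find_meta_refresh_targets(html):
    """Parse already-retrieved HTML; no extra network requests are made."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.targets


# Made-up sample page, not the markup of any real site.
sample = (
    '<html><head><META HTTP-EQUIV="Refresh" '
    'CONTENT="0; URL=http://www.example.com/"></head></html>'
)
print(find_meta_refresh_targets(sample))
```

This only sees tags present in the served HTML, so a meta tag injected by a script would still be missed.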
Not necessarily for relaxing compliance standards around using server-side 80->443 redirects, but just to detect a broader swathe of agency behavior.
For example, segurosocial.gov seems to redirect to socialsecurity.gov, but it actually uses a <meta> tag to do the refresh. And further, it redirects to an insecure URL.
However, this doesn't show up in pshtt at all, so there's no way to detect this kind of thing.
It'd be a new thing to look at (and parse) HTML content instead of just HTTP headers and status codes, but if it's simple enough, it may be worth it, offering a new field or set of fields (separate from the fields there now for server redirects) for downstream tools that care about them.
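As an illustration of what a separate set of fields might look like, here is a minimal sketch that turns a page URL and a meta refresh target into a small report. The function name and every field name are assumptions made up for this example, not existing pshtt output, and the host comparison is a plain hostname check rather than a base-domain comparison.

```python
from urllib.parse import urljoin, urlsplit


def meta_redirect_fields(page_url, refresh_target):
    """Build hypothetical report fields for a meta refresh found on page_url."""
    if not refresh_target:
        return {
            "Meta Redirect": False,
            "Meta Redirect To": None,
            "Meta Redirect Downgrades HTTPS": False,
            "Meta Redirect Changes Host": False,
        }
    # Refresh targets may be relative, so resolve against the page URL.
    destination = urljoin(page_url, refresh_target)
    src, dst = urlsplit(page_url), urlsplit(destination)
    return {
        "Meta Redirect": True,
        "Meta Redirect To": destination,
        # An HTTPS page that sends the visitor to plain HTTP.
        "Meta Redirect Downgrades HTTPS": src.scheme == "https" and dst.scheme == "http",
        # Simplification: compares full hostnames, not registered base domains.
        "Meta Redirect Changes Host": (src.hostname or "") != (dst.hostname or ""),
    }


# Placeholder URLs, not real scan results.
print(meta_redirect_fields("https://a.example/", "http://b.example/x"))
```

Keeping these apart from the server-redirect fields means existing consumers see no change, while tools that care about meta refreshes can opt in.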