
Security 2024 Updated security.txt Metric #125

Merged: 5 commits, Jun 10, 2024


JannisBush (Contributor)

Updated custom metric for HTTPArchive/almanac.httparchive.org#3604

Description of the changes:
Update the parsing of .well-known/security.txt to take all newly defined fields into account, save undefined/future/custom fields, and add a basic check of whether the file is valid (all required fields exist and no field that may occur only once occurs more than once).
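The parsing described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `FIELD_KEYS`, the `FIELD_LINE` regex, and `parseSecurityTxt` are hypothetical names, the output keys simply mirror the JSON shown later in this thread, and PGP-signed files are not handled specially.

```javascript
// Minimal sketch (hypothetical, not the metric's actual code): collect every
// security.txt field into an array so repeated fields are preserved.
const FIELD_KEYS = {
  'contact': 'contact',
  'expires': 'expires',
  'encryption': 'encryption',
  'acknowledgments': 'acknowledgments',
  'preferred-languages': 'preferred_languages',
  'canonical': 'canonical',
  'policy': 'policy',
  'hiring': 'hiring',
  'csaf': 'csaf',
};

// Assumed line format: "Name: value", where the name is letters, digits, hyphens.
const FIELD_LINE = /^([A-Za-z0-9-]+):\s*(.*)$/;

function parseSecurityTxt(text) {
  const data = { other: [] };
  for (const key of Object.values(FIELD_KEYS)) {
    data[key] = [];
  }
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments ("#" at the start of a line).
    if (line === '' || line.startsWith('#')) continue;
    const match = FIELD_LINE.exec(line);
    if (!match) continue;
    const [, name, value] = match;
    // RFC 9116 field names are case-insensitive.
    const key = FIELD_KEYS[name.toLowerCase()];
    if (key) {
      data[key].push(value.trim());
    } else {
      // Unknown, future, or custom fields keep their original name.
      data.other.push([name, value.trim()]);
    }
  }
  return data;
}
```

For instance, `parseSecurityTxt('Contact: https://hackerone.com/ed').contact` yields `['https://hackerone.com/ed']`, while a misspelled `Acknowledgements:` line ends up in `other`, as in the Slack result later in this thread.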


Test websites:

 } else if (line.startsWith('Expires: ')) {
-    data['expires'] = line.substring(9);
+    data['expires'].push(line.substring(9).trim());

Why do these need to be arrays instead of just string values?

JannisBush (Contributor Author) commented Jun 4, 2024

The specification states that some fields (expires and preferred-languages) may occur only once, while other fields may occur several times.

Using arrays makes it easy to count how often each field occurs and, as a basic validity check, to verify that fields allowed only once do not occur multiple times.
It also means we do not simply pick the first or the last entry when several exist. We could concatenate the strings instead, but that would make it harder to split the values again if we later want to ask whether a field occurs more than once.

For example, Facebook lists two policies:

                "policy": [
                    "https://www.facebook.com/whitehat/info/",
                    "https://about.meta.com/security/vulnerability-disclosure-policy"
                ],
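With arrays, the basic validity check reduces to comparing lengths. A minimal sketch, assuming Contact and Expires are the required fields and Expires and Preferred-Languages the once-only fields (as in RFC 9116), and that the keys match the output format in this thread; `checkValidity` is a hypothetical name, not the metric's actual function.

```javascript
// Minimal sketch (hypothetical helper): derive the validity flags from
// per-field arrays such as { contact: [...], expires: [...] }.
const REQUIRED = ['contact', 'expires'];
const AT_MOST_ONCE = ['expires', 'preferred_languages'];

function checkValidity(data) {
  // A missing key counts as zero occurrences.
  const count = (key) => (Array.isArray(data[key]) ? data[key].length : 0);
  // Every required field must appear at least once.
  const allRequiredExist = REQUIRED.every((key) => count(key) >= 1);
  // No once-only field may appear more than once.
  const onlyOneRequirementBroken = AT_MOST_ONCE.some((key) => count(key) > 1);
  return {
    all_required_exist: allRequiredExist,
    only_one_requirement_broken: onlyOneRequirementBroken,
    valid: allRequiredExist && !onlyOneRequirementBroken,
  };
}
```

Because `count` treats a missing key as zero, the same check still works if empty arrays are dropped from the output. Two `policy` entries, as in the Facebook example, do not affect validity.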

tunetheweb (Member) left a comment

This feels quite verbose when so many fields are empty:

        "/.well-known/security.txt": {
            "found": false,
            "data": {
                "status": 500,
                "redirected": false,
                "url": "https://example.com/.well-known/security.txt",
                "signed": false,
                "contact": [],
                "expires": [],
                "encryption": [],
                "acknowledgments": [],
                "preferred_languages": [],
                "canonical": [],
                "policy": [],
                "hiring": [],
                "csaf": [],
                "other": [],
                "all_required_exist": false,
                "only_one_requirement_broken": false,
                "valid": false
            }
        },

Can we only include the fields when they are present to reduce the storage and query size?

JannisBush (Contributor Author)

> This feels quite verbose when so many fields are empty:
>
> Can we only include the fields when they are present to reduce the storage and query size?

I don't know how much empty arrays add to storage and query size. The keys are always the same, so some optimization may already be possible.

However, I adapted the query to keep only non-empty fields. I hope that does not make the query more complex.
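Keeping only non-empty fields could look roughly like this. `dropEmptyFields` is a hypothetical helper; the sketch only assumes the output is a flat object of scalars and arrays like the JSON above.

```javascript
// Minimal sketch (hypothetical helper): omit empty arrays so absent fields
// do not appear in the stored JSON at all.
function dropEmptyFields(data) {
  const result = {};
  for (const [key, value] of Object.entries(data)) {
    // Scalars (status, url, booleans) and non-empty arrays are kept.
    if (Array.isArray(value) && value.length === 0) continue;
    result[key] = value;
  }
  return result;
}
```

One trade-off: queries then have to treat a missing key the same as an empty array.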

JannisBush (Contributor Author) commented Jun 5, 2024

        "/.well-known/security.txt": {
            "found": false,
            "data": {
                "status": 404,
                "redirected": false,
                "url": "https://example.com/.well-known/security.txt",
                "signed": false,
                "other": [
                    [
                        "background-color",
                        "#f0f0f2;"
                    ],
                    [
                        "margin",
                        "0;"
                    ],
                    [
                        "padding",
                        "0;"
                    ],

This was not great, as the inline CSS was detected as Other directives.
I now also save the content type. It MUST be text/plain according to the spec, but it is unclear whether all sites follow the spec; there are probably quite a few sites that do not set any Content-Type header 🤔
Additionally, I only save the data if the status is OK (r.ok has to be true).
This fixes the case of example.com, which returns an HTML document with status 404. However, sites that return their landing page or similar at /.well-known/security.txt with a 200 status code would still be parsed.
I am unsure how best to handle such cases without introducing false negatives.
Ideas:

  • Require the content type to start with text/plain (misses sites that set no Content-Type header or a different one)
  • Only parse directives that start with an uppercase letter (misses sites that use lowercase directives; could also still match other things)
  • Do not allow spaces before the directive name (could miss "styled" security.txt files; also, leading spaces are not required in HTML)
  • Require at least one known directive to be present before saving Other values (misses all lowercase sites)
  • Abort if indicators such as <!doctype html><html> are present
  • ...
  • Maybe the current r.ok requirement is already enough 🤔
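A few of these ideas can be combined in one guard. This is a sketch under assumptions, not what the PR does: `looksLikeSecurityTxt` is a hypothetical helper, and its `ok`, `contentType`, and `body` inputs stand in for `r.ok`, the Content-Type header value, and the response text.

```javascript
// Minimal sketch (hypothetical helper): decide whether a response is worth
// parsing as security.txt. Only the `ok` check reflects the current PR.
function looksLikeSecurityTxt({ ok, contentType, body }, { requireTextPlain = false } = {}) {
  // Current behavior: only parse responses with a 2xx status (r.ok).
  if (!ok) return false;
  // Optional: the spec requires text/plain, but some servers may send
  // no Content-Type header or a different one.
  const type = (contentType || '').toLowerCase();
  if (requireTextPlain && !type.startsWith('text/plain')) return false;
  // Abort on obvious HTML markers near the start of the body.
  const head = (body || '').slice(0, 1024).toLowerCase();
  if (head.includes('<!doctype html') || head.includes('<html')) return false;
  return true;
}
```

The content-type check is off by default because, as the list notes, it would miss sites that send no Content-Type header; the HTML-marker check only catches pages that identify themselves as HTML early in the body.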


github-actions bot commented Jun 5, 2024

Custom metrics for https://almanac.httparchive.org/en/2022/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_HZ_7

Custom metrics for https://example.com/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_J4_8
Changed custom metrics values:

{
    "_well-known": {
        "/.well-known/assetlinks.json": {
            "found": false
        },
        "/.well-known/apple-app-site-association": {
            "found": false
        },
        "/.well-known/gpc.json": {
            "found": false
        },
        "/robots.txt": {
            "found": false
        },
        "/.well-known/security.txt": {
            "found": false,
            "data": {
                "status": 404,
                "redirected": false,
                "url": "https://example.com/.well-known/security.txt",
                "content_type": "text/html; charset=UTF-8"
            }
        },
        "/.well-known/change-password": {
            "found": false,
            "data": {
                "status": 404,
                "redirected": false,
                "url": "https://example.com/.well-known/change-password"
            }
        },
        "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
            "found": false,
            "data": {
                "status": 500,
                "redirected": false,
                "url": "https://example.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
            }
        }
    }
}
Custom metrics for https://securitytxt.org/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_8D_9
Changed custom metrics values:

{
    "_well-known": {
        "/.well-known/assetlinks.json": {
            "found": false
        },
        "/.well-known/apple-app-site-association": {
            "found": false
        },
        "/.well-known/gpc.json": {
            "found": false
        },
        "/robots.txt": {
            "found": true,
            "data": {
                "matched_disallows": {}
            }
        },
        "/.well-known/security.txt": {
            "found": true,
            "data": {
                "status": 200,
                "redirected": false,
                "url": "https://securitytxt.org/.well-known/security.txt",
                "content_type": "text/plain; charset=utf-8",
                "signed": false,
                "contact": [
                    "https://hackerone.com/ed"
                ],
                "expires": [
                    "2025-03-14T00:00:00.000Z"
                ],
                "acknowledgments": [
                    "https://hackerone.com/ed/thanks"
                ],
                "preferred_languages": [
                    "en, fr, de"
                ],
                "canonical": [
                    "https://securitytxt.org/.well-known/security.txt"
                ],
                "policy": [
                    "https://hackerone.com/ed?type=team&view_policy=true"
                ],
                "all_required_exist": true,
                "only_one_requirement_broken": false,
                "valid": true
            }
        },
        "/.well-known/change-password": {
            "found": false,
            "data": {
                "status": 404,
                "redirected": false,
                "url": "https://securitytxt.org/.well-known/change-password"
            }
        },
        "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
            "found": false,
            "data": {
                "status": 404,
                "redirected": false,
                "url": "https://securitytxt.org/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
            }
        }
    }
}
Custom metrics for https://facebook.com/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_BP_A
Changed custom metrics values:

{
    "_well-known": {
        "/.well-known/assetlinks.json": {
            "found": true
        },
        "/.well-known/apple-app-site-association": {
            "found": true
        },
        "/.well-known/gpc.json": {
            "found": false
        },
        "/robots.txt": {
            "found": true,
            "data": {
                "matched_disallows": {
                    "Applebot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "baiduspider": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Bingbot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Discordbot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "DuckDuckBot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "facebookexternalhit": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Googlebot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Google-Extended": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Googlebot-Image": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "GPTBot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "ia_archiver": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "LinkedInBot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "msnbot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Naverbot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Pinterestbot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Screaming Frog SEO Spider": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "seznambot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Slurp": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "teoma": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "TelegramBot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Twitterbot": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Yandex": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ],
                    "Yeti": [
                        "/login.php*&next=",
                        "/login.php/?next=",
                        "/login.php?next=",
                        "/login/*&next=",
                        "/login/?next=",
                        "/login/device-based/regular/login/*&next=",
                        "/login/device-based/regular/login/?next=",
                        "/x/oauth/"
                    ]
                }
            }
        },
        "/.well-known/security.txt": {
            "found": true,
            "data": {
                "status": 200,
                "redirected": false,
                "url": "https://www.facebook.com/.well-known/security.txt",
                "content_type": "text/plain;charset=utf-8",
                "signed": false,
                "contact": [
                    "https://www.facebook.com/whitehat/report/"
                ],
                "expires": [
                    "Thu, 04 Jul 2024 23:55:25 -0700"
                ],
                "acknowledgments": [
                    "https://www.facebook.com/whitehat/thanks/"
                ],
                "policy": [
                    "https://www.facebook.com/whitehat/info/",
                    "https://about.meta.com/security/vulnerability-disclosure-policy"
                ],
                "hiring": [
                    "https://www.metacareers.com/areas-of-work/security/"
                ],
                "all_required_exist": true,
                "only_one_requirement_broken": false,
                "valid": true
            }
        },
        "/.well-known/change-password": {
            "found": true,
            "data": {
                "status": 200,
                "redirected": true,
                "url": "https://www.facebook.com/login.php?next=https%3A%2F%2Fwww.facebook.com%2F.well-known%2Fchange-password"
            }
        },
        "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
            "found": false,
            "data": {
                "status": 404,
                "redirected": false,
                "url": "https://www.facebook.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
            }
        }
    }
}
Custom metrics for https://slack.com

WPT test run results: http://webpagetest.httparchive.org/results.php?test=240605_21_B
Changed custom metrics values:

{
    "_well-known": {
        "/.well-known/assetlinks.json": {
            "found": true
        },
        "/.well-known/apple-app-site-association": {
            "found": true
        },
        "/.well-known/gpc.json": {
            "found": false
        },
        "/robots.txt": {
            "found": true,
            "data": {
                "matched_disallows": {
                    "*": [
                        "/oauth"
                    ]
                }
            }
        },
        "/.well-known/security.txt": {
            "found": true,
            "data": {
                "status": 200,
                "redirected": false,
                "url": "https://slack.com/.well-known/security.txt",
                "content_type": "text/plain;charset=utf-8",
                "signed": false,
                "contact": [
                    "https://hackerone.com/slack/"
                ],
                "policy": [
                    "https://hackerone.com/slack/"
                ],
                "other": [
                    [
                        "Acknowledgements",
                        "https://hackerone.com/slack/thanks"
                    ]
                ],
                "all_required_exist": false,
                "only_one_requirement_broken": false,
                "valid": false
            }
        },
        "/.well-known/change-password": {
            "found": true,
            "data": {
                "status": 200,
                "redirected": true,
                "url": "https://slack.com/signin?redir=%2Faccount%2Fsettings"
            }
        },
        "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
            "found": false,
            "data": {
                "status": 404,
                "redirected": true,
                "url": "https://slack.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200"
            }
        }
    }
}

JannisBush (Contributor Author)

@tunetheweb Can this be merged before the crawl starts tomorrow?

As written above, there might still be a very small number of sites with incorrect "other" values.
However, I think this does not pose a major problem:

  • Such sites (those that return an OK status code and serve a non-security.txt file containing lines that match our regex) should be rare, and they would only add a couple of text entries to an array, so no storage issues are expected
  • We save the content type, which may let us filter out incorrect sites
  • All entries other than "other" will be correct
  • For "other", the main analysis will probably be which values occur most often, which this problem does not affect
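The "which values occur most often" analysis would presumably run as a query over the crawl data; the JavaScript below is only a sketch of the same aggregation. `topOtherFields` is a hypothetical name, and counting each field name at most once per site is an assumed design choice that helps limit the weight of a few misparsed pages.

```javascript
// Minimal sketch (hypothetical helper): tally the most common "other" field
// names across many parsed security.txt results.
function topOtherFields(results, limit = 10) {
  const counts = new Map();
  for (const data of results) {
    // Each field name counts once per site, compared case-insensitively.
    const names = new Set((data.other || []).map(([name]) => name.toLowerCase()));
    for (const name of names) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  // Sort by count (descending), then alphabetically for stable output.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```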

tunetheweb (Member) left a comment

LGTM

tunetheweb merged commit a2f3a0d into HTTPArchive:main on Jun 10, 2024 (2 checks passed)