Weigh how to frame total number of .gov sites #151
Perhaps "functioning domains" is a phrase that would be both accurate and descriptive.
It will be hard at this stage to represent this programmatically so that it's always live. Instead, my suggestion would be that we take the current number and frame something like this:
+1 to that. This needs to be fixed.
Update: we're running a fresh scan and should have updated numbers today.
I've completed the scan. Instead of using the official executive-only list of just under 1,200 domains that OGP publishes on Data.gov, I took the full list of 5,000+ domains and whittled it down by removing classes of domains, which left 1,210 domains. I did that because I'd prefer to err on the side of too many domains. I'm not sure why we say "1,350 domains"; perhaps we were including legislative and judicial domains when we came up with that number? I no longer remember. I then ran these 1,210 through a tool called The scan produced these numbers:
However, I decided to dig into those 214 "not live" domains a bit more, and found that some of them did in fact work when I typed them into my browser. Some showed strange behavior, or were intelligence community login portals that required a Department of Defense computer to access securely. Some had misconfigured servers that just barely allowed desktop browsers to load the content but broke other, non-browser tools like site-inspector. (Some would likely not work on some mobile browsers, either.) Also, some were just intermittently down, or were placeholder pages for shut-down websites. Altogether, roughly 40 or more domains should probably be considered "live" but aren't being properly detected by my automated scans. However, many of these sites are also not appropriate for the DAP. Conversely, some of the domains detected as DAP-eligible would turn out to be ineligible on manual scrutiny, so there's likely some compensation in the other direction too.

The truth is that coming up with an exact, ironclad number of DAP-eligible domains is just not feasible. There's too much cruft and entropy at the margins of the .gov space. And since this doesn't analyze subdomains, the whole enterprise is doomed to be imprecise from the get-go. I've documented and reported one major source of that entropy, the need for "AIA fetching" to accommodate .gov domains with incomplete certificate chains, but I'm not sure exactly where or how to implement it. And if we're really trying to gauge DAP eligibility, we probably need some notion of whether a domain is functionally appropriate. National security sites, live domains with only placeholder content, and intelligence community login portals (and potentially other login portals with no public content) likely won't meet that standard.

In the meantime, defensible descriptions include "around 750", "700-800", or "over 700". I'd pick one of these and leave it until the metrics for gauging DAP eligibility are clearer.
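The first step in the scan comment, starting from the full .gov list and removing whole classes of domains, can be sketched in Python. This is a minimal sketch, not the script used for the scan: it assumes the list is a CSV with `Domain Name` and `Domain Type` columns, and the `EXCLUDED_TYPES` values, the `whittle` helper, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical classes to drop; the thread does not say which classes were
# actually removed to get from 5,000+ domains down to 1,210.
EXCLUDED_TYPES = {"City", "County", "State/Local Govt", "Native Sovereign Nation"}

# Made-up sample rows in the assumed column layout.
SAMPLE = """Domain Name,Domain Type,Agency
USA.GOV,Federal Agency - Executive,General Services Administration
EXAMPLETOWN.GOV,City,Non-Federal Agency
NASA.GOV,Federal Agency - Executive,National Aeronautics and Space Administration
usa.gov,Federal Agency - Executive,General Services Administration
"""


def whittle(csv_text: str, excluded_types: set[str]) -> list[str]:
    """Drop rows whose 'Domain Type' is excluded; return unique, lowercased domains."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        if row["Domain Type"].strip() in excluded_types:
            continue  # remove a whole class of domains at once
        kept.append(row["Domain Name"].strip().lower())
    # dict.fromkeys deduplicates while keeping first-seen order
    return list(dict.fromkeys(kept))


if __name__ == "__main__":
    domains = whittle(SAMPLE, EXCLUDED_TYPES)
    print(len(domains), domains)
```

Filtering by class errs on the side of keeping too many domains, as the comment prefers; the surviving list would then go to a separate liveness scan, which (as noted above) still needs manual review at the margins.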
That was a very enjoyable read.
I would suggest this:
@gbinal I think we should instead remove the denominator entirely and just link to the CSV of participating sites. There's no denominator we can stand behind as an excellent measure of DAP adoption. Even beyond all the weirdness I documented above, not measuring subdomains makes it a blunt instrument that in many cases measures only the tip of the iceberg.
+1. The dashboard has spurred a lot of interest, and we can focus on the continued growth of the participants rather than spending resources on continuously defining a denominator.
Addressed by b903432
From the About This Site section,
Currently, the Digital Analytics Program collects web traffic from almost 300 executive branch government domains, including every cabinet department, out of about 1,350 domains total.
But many of those redirect, or are not live sites at all but reserved domains that aren't in use. The actual number of live second-level domains at the federal level is ~700-800. Given what that paragraph is trying to say, it might be more helpful to use that number instead of 1,350.