Weigh how to frame total number of .gov sites #151
From the About This Site section,
But actually many of those redirect or are not live sites but rather reserved domains that aren't in use. The actual number of live second-level domains at the federal level is ~700-800. Given what that paragraph is trying to say, it might be more helpful to use that number instead of 1,350.
It will be hard at this stage to programmatically represent this so that it's always live. Instead, my suggestion would be that we take the current number and frame something like this,
I've completed the scan. Instead of using the official executive-only list of just under 1200 domains that OGP publishes on Data.gov, I took the full list of 5000+ domains and whittled it down by removing classes of domains, which gives me 1,210 domains. I did that because I'd prefer to err on the side of too many domains.
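For illustration, the whittling step could be sketched roughly like this. It is a minimal sketch, not the script actually used for the scan: the column names (`Domain Name`, `Domain Type`), the sample rows, and the excluded classes are all assumptions made up for the example.

```python
import csv
import io


def filter_domains(csv_text, excluded_types,
                   name_col="Domain Name", type_col="Domain Type"):
    """Return lowercase domain names whose type is not in excluded_types.

    The column names are assumptions about the CSV layout, not a
    documented schema; adjust them to match the real file.
    """
    excluded = {t.strip().lower() for t in excluded_types}
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[type_col].strip().lower() not in excluded:
            kept.append(row[name_col].strip().lower())
    return kept


# Hypothetical sample data, standing in for the full 5000+ row list.
SAMPLE = """Domain Name,Domain Type
EXAMPLE-AGENCY.GOV,Federal Agency
EXAMPLE-CITY.GOV,City
EXAMPLE-COUNTY.GOV,County
"""

# Hypothetical choice of classes to drop.
federal_only = filter_domains(SAMPLE, ["City", "County"])
print(len(federal_only), federal_only)
```

Filtering by exclusion rather than inclusion matches the stated preference to err on the side of too many domains: anything with an unexpected or unknown type stays in the list.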
I'm not sure why we say "1350 domains" -- perhaps we were including legislative and judicial domains in that number when we came up with it? I no longer remember.
The scan produced these numbers:
However, I decided to dig into those 214 "not live" domains a bit more, and found that some of them did in fact work when I typed them into my browser. Some demonstrated strange behavior, or were intelligence community login portals that required a Department of Defense computer to securely access.
Some had misconfigured servers that just barely allowed desktop browsers to load the content, but which broke other non-browser tools, like site-inspector. (Some would likely not work on some mobile browsers, either.) Also, some were just intermittently down, or placeholder pages for shut-down websites.
Altogether, there were something like 40+ domains that probably should be considered "live" but which aren't being properly detected during my automated scans. However, many of these sites are also not appropriate for the DAP. It's also the case that some of the ones detected as DAP-eligible would actually be ineligible, upon manual scrutiny, so there's likely some compensation in the other direction too.
The truth is that coming up with an ironclad exact number for DAP-eligible domains is just not feasible. There's too much cruft and entropy at the margins of the .gov space. And of course, since this isn't analyzing subdomains, the whole enterprise is doomed to be very imprecise from the get-go.
I've documented and reported a major source of the entropy, the need for "AIA fetching" to accommodate .gov domains that have incomplete certificate chains, but am not sure exactly where/how to implement that.
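One way to keep such cases from being lumped together as "not live" is to record why a request failed instead of just that it failed. The sketch below is a hypothetical illustration using only Python's standard library, not how site-inspector works; the function names and category labels are invented for the example. It does not implement AIA fetching. It only separates certificate verification failures, which is where an incomplete chain would be expected to surface in a client that does not fetch missing intermediates, from DNS failures, timeouts, and other errors, so those domains can be set aside for manual review.

```python
import socket
import ssl
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an exception from a failed request to a coarse category.

    The category names are made up for this sketch.
    """
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, so something is listening there.
        return "responds-with-http-error"
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, Exception):
        # urllib wraps the underlying error; unwrap it.
        exc = exc.reason
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "tls-verification-failed"
    if isinstance(exc, ssl.SSLError):
        return "tls-error"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, socket.gaierror):
        return "dns-failure"
    return "unreachable"


def probe(url, timeout=10):
    """Return "live" if the URL loads, otherwise a failure category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "live"
    except OSError as exc:  # URLError, SSLError, timeouts are all OSError
        return classify_failure(exc)
```

A domain that comes back as `tls-verification-failed` or `timeout` is a candidate for a second look in a browser, whereas `dns-failure` is more likely a reserved domain that is not in use.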
And if we're really trying to gauge DAP eligibility, then we probably need to incorporate some idea of whether the domain is functionally appropriate. National security sites, live domains with only placeholder content, intelligence community login portals -- and potentially other kinds of login portals with no public content -- likely aren't going to meet that standard.
In the meantime, defensible descriptions include "around 750", "700-800", or "over 700". I'd pick one of these and leave it until the metrics for gauging DAP eligibility are clearer.
@gbinal I think instead we should just remove the denominator entirely, and just link to the CSV of participating sites. There's no denominator we can stand behind as an excellent measure of DAP adoption -- even beyond all the weirdness I documented above, the lack of measuring subdomains renders it a blunt instrument that in many cases is only measuring the tip of the iceberg.