The Site Scanning program automates a wide range of scans of public federal websites and generates data about website health and best practices.
This repository holds the program's in-depth technical documentation.
To ask a question or leave feedback for the program, please file an issue here or email us at site-scanning@gsa.gov.
- Program Website
- API Documentation
- Central Project Repository
- Analysis Repository
- Federal Website Index Repository
- Site Scanning Engine Repository
- Snapshots Repository
- Extensive List of Links to Technical Details, Snapshots, Analysis Reports, and More (if in doubt, look here)
- Program Overview
- Background History
- 10-Minute Walkthrough
- Questions That Site Scanning Answers
- System Schedule - When ingests, reports, scans, and other jobs take place each week
- Candidate Scans
- How Candidate Scans Become Active Scans
- Program Stakeholders
- Other Resources
- Data Dictionary for the Site Scanning data
- Data Dictionary for the Target URL List
- Terms
- Access the Data
- Analysis Reports: Snapshot - Primary; Snapshot - All; Federal Website Index; Federal Website Index Creation Process
- Snapshots at each stage of the target URL list generation process
- Debugging Guide; Quality Assurance Walkthrough
- Repository for storing one-off snapshots of scan data
- Sample dataset that represents different edge cases
- Snapshots that attempt to remove duplicative websites
- Publicly accessible snapshot of the data in Google Sheets