HTTP Archive - Guided Tour
The HTTP Archive contains a tremendous amount of information that can be used to understand the evolution of the web. And since the raw data is available in Google BigQuery, you can start digging into it with a minimal amount of setup!
If you are new to BigQuery, then the Getting Started guide will walk you through the basic setup. That guide ends with a sample query that explores MIME types from the
summary_pages tables. In this guide, we'll explore more of the tables and build additional queries that you can learn from. The easiest way to get started is by following along, testing some of the queries and learning from them. If you need any help then there is plenty of support available from the community at https://discuss.httparchive.org.
Prerequisites: This guide assumes that you've completed the setup from the Getting Started guide. It also assumes some familiarity with SQL. All of the examples provided will be using Standard SQL. If you are looking to adapt older HTTP Archive queries, written in Legacy SQL, then you may find this migration guide helpful.
This guide is split into multiple sections, each one focusing on different tables in the HTTP Archive. Each section builds on top of the previous one.
- Exploring the Summary Pages Tables
- Exploring the Summary Requests Tables
- JOINing Summary_Pages and Summary_Requests Tables
- Exploring the Pages Tables (Coming Soon)
- Exploring the Requests Tables (Coming Soon)
- Exploring the Lighthouse Tables (Coming Soon)
- Exploring the Response Bodies Tables (Coming Soon)