Content Data API
A data warehouse that stores content and content metrics, to help content owners measure and improve content on GOV.UK.
This repository contains:
- Extract, transform, load (ETL) processes for populating the data warehouse
- An internal tool for exploring the data (AKA the sandbox)
- An API that exposes metrics and content changes
Nomenclature
- Data warehouse: the database where we store all the metrics.
- ETL: extract, transform, load. This is how we get data into the data warehouse.
- Fact: a record containing measurements or metrics.
- Dimension: a characteristic that provides context for a fact (such as the time it was extracted, or the content item it belongs to).
- Star schema: the way we structure data in the data warehouse, using fact and dimension tables.
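To make these terms concrete, here is a minimal sketch in plain Ruby of how an ETL step might turn raw rows into facts that point at dimensions, and how a metric can then be summed per content item. The struct names (`DimensionDate`, `DimensionItem`, `Fact`), the helper methods and the sample rows are all hypothetical and chosen for illustration; the real tables and models in this repository may look different.

```ruby
require "date"

# Hypothetical, simplified shapes. They are NOT the application's real models.
DimensionDate = Struct.new(:id, :date)
DimensionItem = Struct.new(:id, :base_path, :title)
Fact = Struct.new(:date_id, :item_id, :pageviews)

# Transform and load: each raw row becomes one fact that references a date
# dimension and a content item dimension. Dimensions are created only once.
def build_star_schema(raw_rows)
  dates = {}
  items = {}
  facts = []

  raw_rows.each do |row|
    date = dates[row[:date]] ||= DimensionDate.new(dates.size + 1, Date.parse(row[:date]))
    item = items[row[:base_path]] ||= DimensionItem.new(items.size + 1, row[:base_path], row[:title])
    facts << Fact.new(date.id, item.id, Integer(row[:pageviews]))
  end

  { dates: dates.values, items: items.values, facts: facts }
end

# Query: join facts to the item dimension and sum pageviews per base path.
def pageviews_by_path(schema)
  items_by_id = schema[:items].map { |item| [item.id, item] }.to_h
  schema[:facts]
    .group_by { |fact| items_by_id[fact.item_id].base_path }
    .transform_values { |facts| facts.sum(&:pageviews) }
end

# Extract: made-up rows standing in for data pulled from an analytics source.
raw_rows = [
  { date: "2018-01-01", base_path: "/vat-rates", title: "VAT rates", pageviews: "120" },
  { date: "2018-01-02", base_path: "/vat-rates", title: "VAT rates", pageviews: "80" },
  { date: "2018-01-01", base_path: "/bank-holidays", title: "Bank holidays", pageviews: "300" },
]

schema = build_star_schema(raw_rows)
totals = pageviews_by_path(schema)
# totals == { "/vat-rates" => 200, "/bank-holidays" => 300 }
```

Keeping the measurements in narrow fact records and the descriptive attributes in dimensions means a title or path is stored once, however many daily facts refer to it.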
This is a Ruby on Rails application that stores performance metrics and content changes over time and exposes this information via an API. It is built on a PostgreSQL 9.6 database.
Running the application
See the getting started guide for instructions about setting up and running your development VM.
cd /var/govuk/govuk-puppet/development-vm
bowl content-data-api
The application is available at http://content-data-api.dev.gov.uk and runs on port 3235 in your development environment.
Running the test suite
To run the test suite:
$ bundle exec rake
Populating the data warehouse
If you are a GOV.UK developer using the development VM, you can run the replication script to populate the database.
Run ETL processes locally
- To run the ETL process locally, you need to set up Google Analytics credentials in development.