Definitions of the data formats used to represent content on GOV.UK

GOV.UK content schemas

This repo contains schemas and examples of the content that uses them on GOV.UK.

Its aim is to support 'contract testing' between the frontend and publisher apps by expressing the schemas and examples in a strict, machine-processable form.

We use JSON Schema to define the schemas.

For each schema, there are three possible representations:

  • the 'publisher' representation, which is used when a publishing application transmits data to the content store
  • the 'frontend' representation, which is produced by the content store when a frontend application requests data
  • the 'notification' representation, which is used when broadcasting messages about content items on the message queue



Publisher schema defined using component parts

The 'publisher' schema.json is built from three parts:

  • metadata.json
  • details.json
  • links.json

These files are stored in the govuk-content-schemas repository in the formats subdirectory. A build process (implemented using a Rakefile) combines the three component files into the final schema.json file.

The generated files are all stored in the dist subdirectory.

DO NOT EDIT FILES in the dist directory directly. Instead, edit the source files in the formats directory.

In summary the folder structure is:

dist
└── formats
    └── case_study
        ├── frontend
        │   └── schema.json
        └── publisher
            └── schema.json
formats
├── case_study
│   ├── frontend
│   │   └── examples
│   │       ├── archived.json
│   │       ├── case_study.json
│   │       └── translated.json
│   └── publisher
│       ├── details.json
│       └── links.json
└── metadata.json

Combining files to make publisher schema

The build process generates the combined schema.json from the source files. It will write its output to the dist directory (generating any folders if needed).
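The combining step can be pictured roughly as follows. This is a minimal sketch rather than the repository's actual build logic: the method name combine_publisher_schema is hypothetical, and the assumption that details.json and links.json are slotted into the "details" and "links" properties of the metadata schema is made purely for illustration.

```ruby
require "json"

# Hypothetical helper: merges three component schemas into one.
# Assumes the metadata schema is a JSON Schema object with a "properties"
# hash, and that the other two parts describe its "details" and "links"
# properties.
def combine_publisher_schema(metadata, details, links)
  combined = JSON.parse(JSON.generate(metadata)) # deep copy, input stays untouched
  combined["properties"] ||= {}
  combined["properties"]["details"] = details
  combined["properties"]["links"] = links
  combined
end

# Tiny stand-ins for the contents of metadata.json, details.json and links.json.
metadata = {
  "type" => "object",
  "required" => ["title"],
  "properties" => { "title" => { "type" => "string" } },
}
details = { "type" => "object", "properties" => { "body" => { "type" => "string" } } }
links = { "type" => "object", "properties" => {} }

schema = combine_publisher_schema(metadata, details, links)
puts JSON.pretty_generate(schema)
```

In a real build, the three hashes would be read from the files under formats and the result written under dist.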

Generation of frontend schemas

The output from publishing apps will be verified using the publisher schema, so we know that they will generate output which complies with that schema.

However, the frontend JSON is slightly different from the publisher JSON and so it needs a different schema.

To be sure that the frontend examples match up, we need to derive a frontend schema from the publisher schema. This is also done as part of the standard build process.

Validation of frontend examples

To actually validate a frontend example, use the validate_examples Rake task:

$ bundle exec rake validate_examples

This will print the errors out to the console if validation fails.
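To give a feel for the kind of check involved, here is a deliberately simplified stand-in. It is not how the Rake task is implemented and it is not a full JSON Schema validator: the hypothetical validation_errors method only looks at top-level "required" and "additionalProperties": false, and the schema and example are made up.

```ruby
require "json"

# Hypothetical, simplified check. Covers only top-level "required" keys and
# "additionalProperties": false; everything else in JSON Schema is ignored.
def validation_errors(schema, example)
  errors = []
  (schema["required"] || []).each do |key|
    errors << "missing required property '#{key}'" unless example.key?(key)
  end
  if schema["additionalProperties"] == false
    allowed = (schema["properties"] || {}).keys
    (example.keys - allowed).each do |key|
      errors << "unexpected property '#{key}'"
    end
  end
  errors
end

# Made-up schema, for illustration only.
schema = JSON.parse(<<~SCHEMA)
  {
    "type": "object",
    "required": ["base_path", "title"],
    "additionalProperties": false,
    "properties": {
      "base_path": { "type": "string" },
      "title": { "type": "string" }
    }
  }
SCHEMA

example = { "base_path" => "/example", "colour" => "red" }
errors = validation_errors(schema, example)
errors.each { |message| puts message }
```

An empty errors array means the example passed this limited check.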


A Rakefile exists which combines these scripts. It automatically re-generates the intermediate schema files and validates all the examples.

To run the default task, invoke rake on its own. To delete all of the derived files and force a full rebuild, run rake clean build.

Magic fields

Not all fields in the frontend examples are defined in the schema. Some are added by GovukContentSchemas::FrontendSchemaGenerator, for example:

  • base_path
  • updated_at

GovukContentSchemas::SchemaCombiner also adds a few:

  • schema_name
  • document_type
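One way to picture how such fields get injected is sketched below. This is an illustration under assumptions, not the code of GovukContentSchemas::FrontendSchemaGenerator or GovukContentSchemas::SchemaCombiner: the add_magic_fields helper is hypothetical, and the property definitions (plain strings, with a date-time format for updated_at) are guesses.

```ruby
require "json"

# Assumed definitions for the injected fields; the real generator may
# describe them differently.
MAGIC_FIELDS = {
  "base_path" => { "type" => "string" },
  "updated_at" => { "type" => "string", "format" => "date-time" },
}.freeze

# Hypothetical helper: returns a copy of the schema with the extra
# properties merged in, without overwriting ones already defined.
def add_magic_fields(schema, fields = MAGIC_FIELDS)
  result = JSON.parse(JSON.generate(schema)) # deep copy, input stays untouched
  result["properties"] = fields.merge(result["properties"] || {})
  result
end

publisher_schema = {
  "type" => "object",
  "properties" => { "title" => { "type" => "string" } },
}

frontend_schema = add_magic_fields(publisher_schema)
puts frontend_schema["properties"].keys.inspect
```

Because the existing properties are merged last, a field that the source schema already defines keeps its own definition.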