Skip to content

Data formats

bothness edited this page Nov 4, 2025 · 23 revisions

This page describes the data formats intended to be served from the Data API.

CSV (comma-separated values)

CSV is a very common plain text format for encoding tabular data. It is compact, relatively "human-readable", and is compatible with all common spreadsheet packages and data processing tools. A typical CSV looks something like this:

Area code,Area name,Indicator,Time period,Observation value
E06000001,Hartlepool,Population count,1992,91225
E06000001,Hartlepool,Population count,1993,91081
E06000002,Middlesbrough,Population count,1992,144130
E06000002,Middlesbrough,Population count,1993,144467
E06000003,Redcar and Cleveland,Population count,1992,145721
E06000003,Redcar and Cleveland,Population count,1993,144256

CSVW (CSV on the Web)

CSVW is a standard for adding structural and descriptive metadata to CSV files in order to make them machine-readable. This metadata is stored in a separate JSON file alongside the CSV file itself. The standard is designed to be extensible and, when the appropriate fields are included, it allows data to be transformed into many other formats.

The above CSV file could be described by the following JSON:

{
  "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
  "dc:title": "Population count",
  "url": "population-count.csv",
  "tableSchema": {
    "columns": [
      {"titles": "Area code", "datatype": "string"},
      {"titles": "Area name", "datatype": "string"},
      {"titles": "Indicator", "datatype": "string"},
      {"titles": "Time period", "datatype": "date"},
      {"titles": "Observation value", "datatype": "number"}
    ]
  }
}

Note: This example only includes a dataset title (in this case the "dc" in dc:title refers to the Dublin Core vocabulary) and basic column descriptions. Both the dataset-level and column-level metadata can be extended allow for more meaningful processing and data linkage. This CSVW primer gives a sense of how to get more from the standard.

JSON array

In this format, each row of a dataset is formated as a keyed object within an array. The format is the most common way to parse tabular data into JSON, for example when creating charts in Javascript.

The main advantage of using JSON over CSV when storing and transmitting data is that numbers can be explicitly distinguished from strings and nulls (for illustration, the years in this example are encoded in parentheses as "strings"). However, this particular JSON-based format is quite verbose compared with others because the column keys are repeated on every row.

[
  {"areacd": "E06000001", "areanm": "Hartlepool", "indicator": "Population count", "period": "1992", "value": 91225},
  {"areacd": "E06000001", "areanm": "Hartlepool", "indicator": "Population count", "period": "1993", "value": 91081},
  {"areacd": "E06000002", "areanm": "Middlesbrough", "indicator": "Population count", "period": "1992", "value": 144130},
  {"areacd": "E06000002", "areanm": "Middlesbrough", "indicator": "Population count", "period": "1993", "value": 144467},
  {"areacd": "E06000003", "areanm": "Redcar and Cleveland", "indicator": "Population count", "period": "1992", "value": 145721}
  {"areacd": "E06000003", "areanm": "Redcar and Cleveland", "indicator": "Population count", "period": "1993", "value": 144256}
]

JSON column-oriented arrays

This format provides a less common but more space efficient way to store tabular data as JSON. Instead of repeating the column headers in for every row, the data is instead stored in arrays per column. This format can easily be parsed into a JSON array format.

{
  "areacd": ["E06000001", "E06000001", "E06000002", "E06000002", "E06000003", "E06000003"],
  "areanm": ["Hartlepool", "Hartlepool", "Middlesbrough", "Middlesbrough", "Redcar and Cleveland", "Redcar and Cleveland"],
  "indicator": ["Population count", "Population count", "Population count", "Population count", "Population count", "Population count"],
  "period": ["1992", "1992", "1992", "1993", "1993", "1993"],
  "value": [91225, 91081, 144130, 144467, 145721, 144256]
}

JSON-Stat

JSON-Stat is a machine-readable format that is ideal for storing multi-dimensional "data cubes" along with their descriptive and structural metadata. Since it is both highly space-efficient* and contains all the necessary metadata to transform into other formats, it is being used as the internal data format behind Explore Local Statistics (ELS) API.

{
  "version": "2.0",
  "class": "dataset",
  "label": "Population count",
  "id": ["areacd", "period"],
  "size": [3, 2],
  "dimension": {
    "areacd": {
      "label": "Area code",
      "category": {
        "index": {"E06000001": 0, "E06000002": 1, "E06000003": 2},
        "label" : {"E06000001": "Hartlepool", "E06000002": "Middlesbrough", "E06000003": "Redcar and Cleveland"}
      }
    },
    "period": {
      "label": "Time period",
      "category": {
        "index": {"1992": 0, "1993": 1}
      }
    }
  },
  "value": [91225, 91081, 144130, 144467, 145721, 144256]
}

*Note: The JSON-Stat format becomes more space-efficient the more dimensions a dataset has, since each possible value of each dimension only needs to be included once.

Accessible spreadsheet (ODS)

This API aims to provide an accessible, human-readable spreadsheet format that complies with the UK Government Analysis Function standards for releasing statistics in spreadsheets, and their spreadsheet accessibility checklist.

The ODS (OpenDocument Spreadsheet) file format is an open alternative to the proprietary XLSX (Excel) format, and is compatible with almost all widely used spreadsheet packages.

Clone this wiki locally