
Source info service

David Megginson edited this page Jun 22, 2022 · 2 revisions

The Source info service at /api/source-info returns information about the sheets in an Excel workbook.

Invocation

The endpoint accepts a GET request with the common input options (only the url parameter is required). It returns a JSON object containing general information about the workbook, together with a list of JSON objects describing each sheet.

Example

/api/source-info?url=https://example.org/workbook.xlsx
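A minimal Python sketch of building and sending this request with only the standard library. The base URL `https://proxy.hxlstandard.org` is an assumption (substitute the host of your own deployment), and the helper names `source_info_url` and `fetch_source_info` are hypothetical. `urlencode` percent-encodes the workbook URL so that any `?` or `&` inside it is not mistaken for part of the outer query string.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the service; replace with your own deployment's host.
BASE_URL = "https://proxy.hxlstandard.org"


def source_info_url(workbook_url: str, base_url: str = BASE_URL) -> str:
    """Build the /api/source-info request URL (hypothetical helper).

    urlencode percent-encodes the workbook URL, so characters such as
    '&' or '?' inside it do not break the outer query string.
    """
    return f"{base_url}/api/source-info?{urlencode({'url': workbook_url})}"


def fetch_source_info(workbook_url: str, base_url: str = BASE_URL) -> dict:
    """Send the GET request and decode the JSON response (hypothetical helper)."""
    with urlopen(source_info_url(workbook_url, base_url)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the request URL only; no network call is made here.
    print(source_info_url("https://example.org/workbook.xlsx"))
```

The endpoint may well accept the unencoded form shown in the example above; encoding simply avoids ambiguity when the workbook URL carries its own query string.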

JSON response

Example

{
    "url_or_filename": "https://example.org/workbook.xlsx",
    "format": "XLSX",
    "sheets": [
        {
            "name": "input-quality-no-hxl",
            "is_hidden": false,
            "nrows": 5,
            "ncols": 9,
            "has_merged_cells": true,
            "is_hxlated": false,
            "header_hash": "56c6270ee039646436af590e874e6f67",
            "hashtag_hash": null
        },
        {
            "name": "input-quality-hxl",
            "is_hidden": false,
            "nrows": 6,
            "ncols": 9,
            "has_merged_cells": false,
            "is_hxlated": true,
            "header_hash": "56c6270ee039646436af590e874e6f67",
            "hashtag_hash": "3252897e927737b2f6f423dccd07ac93"
        }
    ]
}
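Because the response is plain JSON, a client can decide which sheet to process directly from the sheets list. The following Python sketch (the function name `hxlated_sheets` is hypothetical) returns the zero-based position and name of each visible sheet that already carries HXL hashtags; positions follow the order of the list, which is workbook order.

```python
import json
from typing import List, Tuple

# The example response from this page, trimmed to the fields used below.
EXAMPLE_RESPONSE = """
{
    "url_or_filename": "https://example.org/workbook.xlsx",
    "format": "XLSX",
    "sheets": [
        {"name": "input-quality-no-hxl", "is_hidden": false, "is_hxlated": false},
        {"name": "input-quality-hxl", "is_hidden": false, "is_hxlated": true}
    ]
}
"""


def hxlated_sheets(info: dict) -> List[Tuple[int, str]]:
    """Return (position, name) for each visible sheet that has HXL hashtags.

    Positions are zero-based and follow the order of the "sheets" list,
    which is workbook order.
    """
    return [
        (index, sheet["name"])
        for index, sheet in enumerate(info["sheets"])
        if sheet["is_hxlated"] and not sheet["is_hidden"]
    ]


info = json.loads(EXAMPLE_RESPONSE)
print(info["format"], hxlated_sheets(info))  # XLSX [(1, 'input-quality-hxl')]
```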

Workbook-level properties

| Property | Type | Description |
| --- | --- | --- |
| url_or_filename | string | The URL of the Excel workbook (filenames are supported only in the libhxl-python command-line tools). |
| format | string | The workbook format: currently always "XLS" or "XLSX" (other formats may be supported in the future). |
| sheets | list | One object describing each sheet, in workbook order (see below). |

Sheet-level properties

| Property | Type | Description |
| --- | --- | --- |
| name | string | The name of the sheet. |
| is_hidden | boolean | true if the sheet is hidden in the workbook. |
| nrows | integer | Number of rows of data in the sheet. |
| ncols | integer | Number of columns of data in the sheet. |
| has_merged_cells | boolean | true if the sheet contains one or more areas of merged cells (merged cells are a quality issue for automated data processing). |
| is_hxlated | boolean | true if the sheet contains valid HXL hashtags. |
| header_hash | string | MD5 hash of the contents of the first row of the sheet. |
| hashtag_hash | string or null | MD5 hash of the HXL headers and hashtags, or null if the sheet is not HXLated. |
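Since header_hash and hashtag_hash are hashes of a sheet's header rows, comparing them is a cheap way to spot sheets that share a layout without downloading their contents. The Python sketch below models the sheet-level properties as a dataclass, groups sheets by header_hash, and surfaces the has_merged_cells flag. The names `SheetInfo`, `group_by_header`, and `merged_cell_warnings` are hypothetical client-side helpers, not part of the service.

```python
from collections import defaultdict
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class SheetInfo:
    """One entry from the "sheets" list (hypothetical client-side model)."""
    name: str
    is_hidden: bool
    nrows: int
    ncols: int
    has_merged_cells: bool
    is_hxlated: bool
    header_hash: str
    hashtag_hash: Optional[str]  # None (JSON null) when the sheet is not HXLated

    @classmethod
    def from_json(cls, obj: dict) -> "SheetInfo":
        # Keep only the documented properties, ignoring any extra keys.
        return cls(**{f.name: obj[f.name] for f in fields(cls)})


def group_by_header(sheets: List[SheetInfo]) -> Dict[str, List[str]]:
    """Map each header_hash to the names of the sheets that share it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sheet in sheets:
        groups[sheet.header_hash].append(sheet.name)
    return dict(groups)


def merged_cell_warnings(sheets: List[SheetInfo]) -> List[str]:
    """List the sheets whose merged cells may hinder automated processing."""
    return [sheet.name for sheet in sheets if sheet.has_merged_cells]


# Sheet objects copied from the example response on this page.
EXAMPLE_SHEETS = [
    {"name": "input-quality-no-hxl", "is_hidden": False, "nrows": 5, "ncols": 9,
     "has_merged_cells": True, "is_hxlated": False,
     "header_hash": "56c6270ee039646436af590e874e6f67", "hashtag_hash": None},
    {"name": "input-quality-hxl", "is_hidden": False, "nrows": 6, "ncols": 9,
     "has_merged_cells": False, "is_hxlated": True,
     "header_hash": "56c6270ee039646436af590e874e6f67",
     "hashtag_hash": "3252897e927737b2f6f423dccd07ac93"},
]

sheets = [SheetInfo.from_json(obj) for obj in EXAMPLE_SHEETS]
print(group_by_header(sheets))
# {'56c6270ee039646436af590e874e6f67': ['input-quality-no-hxl', 'input-quality-hxl']}
print(merged_cell_warnings(sheets))  # ['input-quality-no-hxl']
```

In the example data, both sheets share a header_hash, so their first rows have the same contents; only the second sheet adds a hashtag row, which is why only it has a hashtag_hash.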