
Dictionary of metadata

Marius Retegan edited this page Oct 18, 2023 · 4 revisions

This is version 1.0.0 of the dictionary of metadata to be used with the XAS Data Interchange (XDI) format. Each item definition includes:

  1. The name representing the datum
  2. The meaning of the datum
  3. The units of the datum
  4. The format of representing its value

Words used to signify the requirements in the specification shall follow the practice of RFC 2119.

A use of this dictionary is not compliant if it fails to satisfy one or more of the must or required level requirements presented herein.

Overview

The meaning of metadata

The purpose of this dictionary is to identify a set of metadata to be encoded in the specification of the XDI format and to assign names to each meaningful concept. This effort must take a broad view, capturing metadata concepts as broadly as they are used in the community. This effort must also be open ended in that there must be a mechanism for providing new forms of metadata not considered up front. This effort is intended to serve as the XAS metadata dictionary for other data format types, for instance a database format for libraries of XAS spectra or a hierarchical format for multi-spectral datasets.

The XDI syntax

This dictionary has been developed along with the XDI specification. Any examples given in this dictionary use the recommended XDI syntax. The metadata name consists of the capitalized namespace, followed by a dot, followed by a keyword. Here is an example: Element.symbol. When appearing in an XDI file to convey a metadata value, the line begins with a comment token and ends with an end-of-line token. A colon is the delimiting token between the metadata name and its value. Here is an example:

   # Element.symbol: Cu
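
The line syntax described above can be illustrated with a small parser. This is only a sketch: the function name `parse_metadata_line` is hypothetical, `#` is assumed to be the comment token, and the regular expression assumes the character rules for namespaces and keywords given later in this dictionary.

```python
import re

# Assumed pattern for "# Namespace.keyword: value". The first colon after the
# keyword is taken as the delimiter, so values may themselves contain colons.
_META_LINE = re.compile(
    r"^#\s*([A-Za-z][A-Za-z0-9_-]*)\.([A-Za-z0-9_-]+)\s*:\s*(.*)$"
)


def parse_metadata_line(line):
    """Return (namespace, keyword, value), or None if the line is not metadata."""
    match = _META_LINE.match(line.strip())
    if match is None:
        return None
    namespace, keyword, value = match.groups()
    # Names are case insensitive, so they are lowercased for comparison.
    # The value is returned as written, minus surrounding whitespace.
    return namespace.lower(), keyword.lower(), value.strip()


print(parse_metadata_line("# Element.symbol: Cu"))
```

Lowercasing the names is one way to honor the case-insensitivity requirement. A tool that rewrites files may prefer to keep the original spelling alongside the normalized form.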

The format of the value

This section needs work.

  • Specified format
  • Word/number + units, with whitespace separating the value and the unit
  • Word/number (+ units), unit may not be required
  • Free form, text should be stored verbatim

Decisions must be made about character sets and internationalization. Among other decisions:

  1. Identification of standard units and whether units must be specified in a compliant file.
  2. Representations of numerical values and special data types like timestamps.
  3. Standards for identifying facilities and beamlines
  4. Representations of deeply nested data
  5. Empty values?
  6. Define "float" -- see IEEE 754

Explain what "free format string" means.

The dictionary

Name spaces

The purpose of namespaces is to provide sensible, widely understood, semantic groupings of defined metadata tags. All tags associated with conveying information about sample preparation and the measurement environment of the sample belong in the Sample namespace, all tags associated with the configuration of the beamline optics belong in the Beamline namespace, and so on.

Namespaces are strings composed of a subset of the ASCII character set. The first character must be a letter. The remaining characters must be letters, numbers, underscores, or dashes. Letters are ASCII 65 through 90 (A-Z) and ASCII 97 through 122 (a-z). Numbers are ASCII 48 through 57 (0-9). Underscore (_) is ASCII 95 and dash (-) is ASCII 45. The namespace must be interpreted as case insensitive.

Here is a list of all defined semantic groupings:

  1. Facility: Tags related to the synchrotron or other facility at which the measurement was made
  2. Beamline: Tags related to the structure of the beamline and its optics
  3. Mono: Tags related to the monochromator
  4. Detector: Tags related to the details of the photon detection system
  5. Sample: Tags related to the details of sample preparation and measurement
  6. Scan: Tags related to the parameters of the scan
  7. Element: Tags related to the absorbing atom
  8. Column: Tags used for identifying the data columns and their units

Below, specific members of these namespaces are defined. The definitions are not exclusive. Other metadata can be placed in these namespaces as needed. Of course, undefined metadata are unlikely to be interpreted correctly by applications using this dictionary. Metadata added to a defined namespace must not use a defined keyword. The defined namespaces and keywords shall be interpreted without sensitivity to case.

When defined metadata are present, the units and formatting specified below must be observed.


Keywords

Keywords are the words used to denote a specific entry in a namespace.

Keywords are strings composed of a subset of the ASCII character set. All characters must be letters, numbers, underscores, or dashes. The keyword must be interpreted as case insensitive.
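
The character rules for namespaces and keywords can be checked with two regular expressions. The sketch below is illustrative only, and the helper names `is_valid_name` and `same_tag` are hypothetical.

```python
import re

# Namespace: first character a letter, then letters, digits, "_" or "-".
NAMESPACE_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")
# Keyword: letters, digits, "_" or "-" in any position.
KEYWORD_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_name(namespace, keyword):
    """Check both parts of a metadata name against the ASCII rules."""
    return (NAMESPACE_RE.fullmatch(namespace) is not None
            and KEYWORD_RE.fullmatch(keyword) is not None)


def same_tag(a, b):
    """Compare two 'Namespace.keyword' names without regard to case."""
    return a.lower() == b.lower()


print(is_valid_name("Mono", "d_spacing"))
print(is_valid_name("1Mono", "name"))
print(same_tag("Element.Symbol", "element.symbol"))
```

Note that a keyword may begin with a digit, which is what allows integer tags such as those in the Column namespace.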


Required metadata

Three items are essential to the interchange and successful interpretation of XAS data. These are required in all files using the XDI specification.

  • The d-spacing of the monochromator. This is required to convert an abscissa expressed in angle or encoder steps into energy. It is also needed to correct the energy axis of measured data when the translation from the angular position of the monochromator to energy was miscalibrated. See Mono.d_spacing.

  • The element of the absorbing atom. The periodic table is replete with examples of atoms that have absorption edges with very similar edge energies. For example, the tabulated values of the Cr K edge and the Ba L1 edge are both 5989 eV. Without identification of the species of the absorbing atom and of the absorption edge measured, some data cannot be unambiguously identified. See Element.symbol.

  • The absorption edge measured. See above. See Element.edge.

Most other metadata definitions that follow are optional for use with XDI. Some are recommended for use with XDI. The recommended metadata convey information that is of substantive value to the interpretation of the data.
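
A compliance check for the three required items might look like the sketch below. It assumes the metadata have already been read into a dictionary keyed by the full name (for example "Element.symbol"). The function name `missing_required` is hypothetical, and only the presence of each tag is tested, not the validity of its value.

```python
# The three required items, lowercased because names are case insensitive.
REQUIRED_TAGS = ("mono.d_spacing", "element.symbol", "element.edge")


def missing_required(metadata):
    """Return the required tags that are absent from a metadata dictionary."""
    present = {name.lower() for name in metadata}
    return [tag for tag in REQUIRED_TAGS if tag not in present]


header = {"Element.symbol": "Cu", "Element.edge": "K"}
print(missing_required(header))
```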


Defined items in the Facility namespace

  • Namespace: Facility -- Tag: name

    • Description: The name of synchrotron or other X-ray facility. This is recommended for use with XDI.
    • Units: none
    • Format: string
  • Namespace: Facility -- Tag: energy

    • Description: The energy of the current in the storage ring.
    • Units: GeV, MeV
    • Format: float + units
  • Namespace: Facility -- Tag: current

    • Description: The amount of stored current in the storage ring at the beginning of the scan.
    • Units: mA, A
    • Format: float + units
  • Namespace: Facility -- Tag: source

    • Description: A string identifying the source of X-ray generation, such as "bend magnet", "undulator", or "rotating copper anode". This is recommended for use with XDI.
    • Units: none
    • Format: string
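
Values in the "float + units" format, such as Facility.energy and Facility.current, can be split on the first run of whitespace. The sketch below is one possible reading of that format. The function name `parse_float_with_units` is hypothetical, and matching units exactly as spelled in this dictionary is an assumption, since the rules for units are not yet settled.

```python
def parse_float_with_units(value, allowed_units):
    """Split 'number units' into (float, units).

    Raises ValueError if the units are missing or not among allowed_units,
    or if the number cannot be read as a float.
    """
    parts = value.split(None, 1)  # split once, on the first whitespace run
    if len(parts) != 2:
        raise ValueError(f"expected 'number units', got {value!r}")
    number_text, units = parts[0], parts[1].strip()
    if units not in allowed_units:
        raise ValueError(f"unexpected units {units!r}")
    return float(number_text), units


print(parse_float_with_units("7.0 GeV", ("GeV", "MeV")))
print(parse_float_with_units("102.3 mA", ("mA", "A")))
```

Splitting only once keeps multi-word units intact, so a value written as "295 degrees K" yields the units string "degrees K".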

Defined items in the Beamline namespace

  • Namespace: Beamline -- Tag: name

    • Description: The name by which the beamline is known. This is recommended for use with XDI.
    • Units: none
    • Format: free format string
  • Namespace: Beamline -- Tag: collimation

    • Description: A concise statement of how beam collimation is provided
    • Units: none
    • Format: free format string
  • Namespace: Beamline -- Tag: focusing

    • Description: A concise statement about how beam focusing is provided
    • Units: none
    • Format: free format string
  • Namespace: Beamline -- Tag: harmonic_rejection

    • Description: A concise statement about how harmonic rejection is accomplished
    • Units: none
    • Format: free format string

Defined items in the Mono namespace

  • Namespace: Mono -- Tag: name

    • Description: A string identifying the material and diffracting plane or grating spacing of the monochromator
    • Units: none
    • Format: free format string
  • Namespace: Mono -- Tag: d_spacing

    • Description: The known d-spacing of the monochromator under operating conditions. This is a required parameter for use with XDI when data are specified as a function of angle or step count.
    • Units: Å
    • Format: float

This is the appropriate namespace for parameters of an energy dispersive polychromator. Such parameters may be defined in future versions of this dictionary.
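
Mono.d_spacing is what allows an abscissa recorded as monochromator angle to be converted to energy. A minimal sketch of that conversion follows, assuming Bragg's law with the angle given in degrees and the d-spacing in Å. The function name `angle_to_energy` and the `order` parameter are illustrative and are not part of this dictionary.

```python
import math

# Planck constant times the speed of light, in eV * Å.
HC_EV_ANGSTROM = 12398.4198


def angle_to_energy(theta_degrees, d_spacing, order=1):
    """Convert a Bragg angle to photon energy in eV.

    Bragg's law gives order * wavelength = 2 * d * sin(theta),
    and the photon energy is hc / wavelength.
    """
    theta = math.radians(theta_degrees)
    wavelength = 2.0 * d_spacing * math.sin(theta) / order
    return HC_EV_ANGSTROM / wavelength


# Example call with an illustrative d-spacing value and angle.
print(angle_to_energy(12.7, 3.1356))
```

Converting encoder steps to energy would additionally require the steps-per-degree calibration of the instrument, which is not defined here.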


Defined items in the Detector namespace

  • Namespace: Detector -- Tag: i0

    • Description: A description of how the incident flux was measured
    • Units: none
    • Format: free format string
  • Namespace: Detector -- Tag: it

    • Description: A description of how the transmission flux was measured
    • Units: none
    • Format: free format string
  • Namespace: Detector -- Tag: if

    • Description: A description of how the fluorescence flux was measured
    • Units: none
    • Format: free format string
  • Namespace: Detector -- Tag: ir

    • Description: A description of how the reference flux was measured
    • Units: none
    • Format: free format string

(The formatting for this namespace may require attention. This was one of the areas for which James advocated the use of tables.)


Defined items in the Sample namespace

  • Namespace: Sample -- Tag: name

    • Description: A string identifying the measured sample
    • Units: none
    • Format: free format string
  • Namespace: Sample -- Tag: id

    • Description: A number or string uniquely identifying the measured sample. This is intended for interoperation with a database or with laboratory management software.
    • Units: none
    • Format: free format string
  • Namespace: Sample -- Tag: stoichiometry

  • Namespace: Sample -- Tag: prep

    • Description: A string summarizing the method of sample preparation
    • Units: none
    • Format: free format string
  • Namespace: Sample -- Tag: experimenters

    • Description: The names of the experimenters present for the measurement
    • Units: none
    • Format: free format string
  • Namespace: Sample -- Tag: temperature

    • Description: The temperature at which the sample was measured
    • Units: degrees K, degrees C
    • Format: float + units

The Sample namespace is rather open-ended. It is probably impossible to anticipate all the kinds of sample-related metadata that may be useful to attach to data. That said, it would be useful to suggest tags for a number of common kinds of extrinsic parameters.

Here are some other possible tags denoting extrinsic parameters of the experiment along the line of Sample.temperature. These may be added as defined fields in future versions of the XDI specification.

  • Sample.pressure
  • Sample.ph
  • Sample.eh
  • Sample.volume
  • Sample.porosity
  • Sample.density
  • Sample.concentration
  • Sample.resistivity
  • Sample.viscosity
  • Sample.electric_field
  • Sample.magnetic_field
  • Sample.magnetic_moment
  • Sample.crystal_structure
  • Sample.opacity
  • Sample.electrochemical_potential

Defined items in the Scan namespace

  • Namespace: Scan -- Tag: start_time

  • Namespace: Scan -- Tag: end_time

  • Namespace: Scan -- Tag: edge_energy

    • Description: The absorption edge as used in the data acquisition software.
    • Units: eV (recommended), keV, inverse Å
    • Format: float + units

This is the appropriate namespace for any parameters associated with the scan, such as integration times, scan boundaries, or step sizes.


Defined items in the Element namespace

  • Namespace: Element -- Tag: symbol

    • Description: The atomic symbol of the absorbing element. This is a required parameter for use with XDI.

    • Units: none

    • Format: one of these 118 one-, two-, or three-character strings for the standard atomic symbols (not case sensitive):

        H  He Li Be B  C  N  O  F  Ne Na Mg Al Si P  S
        Cl Ar K  Ca Sc Ti V  Cr Mn Fe Co Ni Cu Zn Ga Ge
        As Se Br Kr Rb Sr Y  Zr Nb Mo Tc Ru Rh Pd Ag Cd
        In Sn Sb Te I  Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd
        Tb Dy Ho Er Tm Yb Lu Hf Ta W  Re Os Ir Pt Au Hg
        Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U  Np Pu Am Cm
        Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn
        Uut Fl Uup Lv Uus Uuo
      

      See Wikipedia's list of element symbols.

  • Namespace: Element -- Tag: edge

    • Description: The measured absorption edge. This is a required parameter for use with XDI.

    • Units: none

    • Format: one of these 27 one- or two-character strings (not case sensitive):

        K L L1 L2 L3  M M1 M2 M3 M4 M5 N N1 N2 N3 N4 N5 N6 N7 O O1 O2 O3 O4 O5 O6 O7
      

      See table 10.10 at IUPAC notation for X-ray absorption edges for further explanation. Use of the generic edges L, M, N, and O is discouraged, but they may be used for spectra spanning multiple edges.

  • Namespace: Element -- Tag: reference

    • Description: The absorbing element of the reference spectrum. This is a recommended parameter for use with XDI files containing a reference spectrum.
    • Units: none
    • Format: same as Element.symbol
  • Namespace: Element -- Tag: ref_edge

    • Description: The measured edge of the reference spectrum. This is a recommended parameter for use with XDI files containing a reference spectrum.
    • Units: none
    • Format: same as Element.edge
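
The formats for Element.symbol and Element.edge are closed lists, so validation reduces to a case-insensitive membership test. The sketch below copies the two lists from the definitions above. The helper names `is_valid_symbol` and `is_valid_edge` are hypothetical.

```python
# Atomic symbols exactly as listed for Element.symbol.
ELEMENT_SYMBOLS = """
H  He Li Be B  C  N  O  F  Ne Na Mg Al Si P  S
Cl Ar K  Ca Sc Ti V  Cr Mn Fe Co Ni Cu Zn Ga Ge
As Se Br Kr Rb Sr Y  Zr Nb Mo Tc Ru Rh Pd Ag Cd
In Sn Sb Te I  Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd
Tb Dy Ho Er Tm Yb Lu Hf Ta W  Re Os Ir Pt Au Hg
Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U  Np Pu Am Cm
Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn
Uut Fl Uup Lv Uus Uuo
""".split()

# Edge labels exactly as listed for Element.edge.
EDGES = ("K L L1 L2 L3 M M1 M2 M3 M4 M5 N N1 N2 N3 N4 N5 N6 N7 "
         "O O1 O2 O3 O4 O5 O6 O7").split()

# Lowercased copies, because both formats are not case sensitive.
_SYMBOLS_LOWER = {symbol.lower() for symbol in ELEMENT_SYMBOLS}
_EDGES_LOWER = {edge.lower() for edge in EDGES}


def is_valid_symbol(text):
    """True if text is one of the listed atomic symbols, ignoring case."""
    return text.strip().lower() in _SYMBOLS_LOWER


def is_valid_edge(text):
    """True if text is one of the listed edge labels, ignoring case."""
    return text.strip().lower() in _EDGES_LOWER


print(is_valid_symbol("cu"), is_valid_edge("L3"), is_valid_edge("L4"))
```

The same two functions apply to Element.reference and Element.ref_edge, which share these formats.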

Defined items in the Column namespace

Items in the Column namespace describe single columns of the data table. The first column must be the energy.

All tags in the Column namespace must be integers.

  • Namespace: Column -- Tag: 1

    • Description: A description of the abscissa array for the measured data. This is recommended for use with XDI.
    • Units: eV (recommended), keV, pixel, angle in degrees, angle in radians, step
    • Format: word + units
  • Namespace: Column -- Tag: N

    • Description: A description of the Nth column of the measured data. This is recommended for use with XDI.
    • Units: as needed
    • Format: word (+ units)

The following labels are defined for common array types. Column.N items must use these labels when appropriate. The array label line at the beginning of the data section of the XDI file also must use these labels when those columns are present.

COL_LABEL    Meaning                              choice of units (if required)
--------------------------------------------------------------------------------------
energy       mono energy                          eV / keV / pixel
angle        mono angle                           degrees / radians / steps

i0           monitor intensity
itrans       transmission intensity
ifluor       fluorescence intensity
irefer       reference intensity

mutrans      mu transmission
mufluor      mu fluorescence
murefer      mu reference
normtrans    normalized mu transmission
normfluor    normalized mu fluorescence
normrefer    normalized mu reference

k            wavenumber
chi          EXAFS
chi_mag      magnitude of Filtered chi(k)
chi_pha      phase of Filtered chi(k)
chi_re       real part of Filtered chi(k)
chi_im       imaginary part of Filtered chi(k)

r            radial distance
chir_mag     magnitude of FT[chi(k)]
chir_pha     phase of FT[chi(k)]
chir_re      real part of FT[chi(k)]
chir_im      imaginary part of FT[chi(k)]
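
A Column.N value in the "word (+ units)" format can be split into a label and optional units, and the label compared against the table above. This is a sketch only. The function name `parse_column_value` is hypothetical, and comparing labels exactly as spelled in the table is an assumption, since this dictionary does not state whether labels are case sensitive.

```python
# Labels from the table of defined array types.
DEFINED_LABELS = frozenset("""
energy angle
i0 itrans ifluor irefer
mutrans mufluor murefer normtrans normfluor normrefer
k chi chi_mag chi_pha chi_re chi_im
r chir_mag chir_pha chir_re chir_im
""".split())


def parse_column_value(value):
    """Split a Column.N value into (label, units, is_defined_label).

    units is None when the value holds only a label.
    """
    parts = value.split(None, 1)
    if not parts:
        raise ValueError("empty column description")
    label = parts[0]
    units = parts[1].strip() if len(parts) > 1 else None
    return label, units, label in DEFINED_LABELS


print(parse_column_value("energy eV"))
print(parse_column_value("mutrans"))
print(parse_column_value("my_signal counts"))
```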

Extension fields

Metadata tags carry syntax and may carry semantics. That is, it is possible to have syntactically correct tags that have no definition. Such tags could carry information considered useful by the user or the author of software that, at some point, touches the data.

Such a tag could be an extension within an existing namespace. This has already been discussed in the context of the Sample and Scan namespaces.

Such a tag could also be part of a new namespace. One application of a new namespace would be to tie a group of metadata tags to a particular application. For example, the data processing program Athena might attach tags associated with the parameters for normalizing the data. That might look something like this:

 # Athena.pre1: -150
 # Athena.pre2: -30
 # Athena.nor1: 150
 # Athena.nor2: 800

These define the boundaries of the pre- and post-edge lines used to determine the edge step of the μ(E) spectrum.

The use of such extension tags is encouraged for authors of controls, data acquisition, data analysis, and data archiving software.

If an extension tag is not understood due to its lack of defined semantics, the recommended behavior for software touching the data is to silently preserve the metadata.
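
One way to follow this recommendation is to edit header lines in place and pass every line the software does not understand through unchanged. The sketch below assumes the header is held as a list of text lines and that `#` is the comment token. The function name `update_header` is hypothetical.

```python
def update_header(lines, name, new_value):
    """Return header lines with one tag's value replaced.

    Every other line, including extension tags the caller does not
    understand, is copied through unchanged.
    """
    wanted = name.lower()
    result = []
    for line in lines:
        # Drop the leading comment token, then split name from value.
        parts = line.lstrip("# ").split(":", 1)
        if len(parts) == 2 and parts[0].strip().lower() == wanted:
            result.append(f"# {name}: {new_value}")
        else:
            result.append(line)
    return result


header = [
    "# Element.symbol: Cu",
    "# Athena.pre1: -150",
    "# Athena.nor2: 800",
]
for line in update_header(header, "Element.symbol", "Fe"):
    print(line)
```

Here the Athena tags survive the edit even though the function knows nothing about their meaning.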