Description of the Data

The attributes described in this section are used to provide a description of the content and the units of measurement for each variable. We continue to support the use of the units and long_name attributes as defined in COARDS. We extend COARDS by adding the optional standard_name attribute which is used to provide unique identifiers for variables. This is important for data exchange since one cannot necessarily identify a particular variable based on the name assigned to it by the institution that provided the data.

The standard_name attribute can be used to identify variables that contain coordinate data. But since it is an optional attribute, applications that implement these standards must continue to be able to identify coordinate types based on the COARDS conventions.

Units

The units attribute is required for all variables that represent dimensional quantities (except for boundary variables defined in [cell-boundaries] and climatology variables defined in [climatological-statistics]). The value of the units attribute is a string that can be recognized by the UDUNITS package [UDUNITS], with a few exceptions that are given below. Note that case is significant in the units strings.

The COARDS convention prohibits the unit degrees altogether, but this unit is not forbidden by the CF convention because it may in fact be appropriate for a variable containing, say, solar zenith angle. The unit degrees is also allowed on coordinate variables such as the latitude and longitude coordinates of a transformed grid. In this case the coordinate values are not true latitudes and longitudes, which must always be identified using the more specific forms of degrees as described in [latitude-coordinate] and [longitude-coordinate].

Units are not required for dimensionless quantities. A variable with no units attribute is assumed to be dimensionless. However, a units attribute specifying a dimensionless unit may optionally be included. The canonical unit (see also Section 3.3, "Standard Name") for dimensionless quantities that represent fractions, or parts of a whole, is 1. When a dimensionless quantity is a ratio of dimensional quantities, CF suggests that it may be informative to users of data if the units are given as a ratio of dimensional units, for instance mg kg-1 for a mass ratio of 1e-6, or microlitre litre-1 for a volume ratio of 1e-6.
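As a worked check of the two ratios just mentioned, expanding the prefixes shows that both unit strings denote the same dimensionless factor of 1e-6:

```latex
\frac{1\,\mathrm{mg}}{1\,\mathrm{kg}}
  = \frac{10^{-3}\,\mathrm{g}}{10^{3}\,\mathrm{g}} = 10^{-6},
\qquad
\frac{1\,\mu\mathrm{L}}{1\,\mathrm{L}}
  = \frac{10^{-6}\,\mathrm{L}}{1\,\mathrm{L}} = 10^{-6}
```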

The UDUNITS package defines a few dimensionless units, such as percent, ppm (parts per million, 1e-6), and ppb (parts per billion, 1e-9). The CF convention supports dimensionless units that are UDUNITS compatible, with one exception, concerning the dimensionless units defined by UDUNITS for volume ratios, such as ppmv and ppbv. These units are allowed in the units attribute by CF only if the data variable has no standard_name. These units are prohibited by CF if there is a standard_name, because the standard_name defines whether the quantity is a volume ratio, so the units are needed only to indicate a dimensionless number.

Information describing a dimensionless physical quantity itself (e.g. "area fraction" or "probability") does not belong in the units attribute, but should be given in the long_name or standard_name attributes (see Section 3.2, "Long Name" and Section 3.3, "Standard Name"), in the same way as for physical quantities with dimensional units. As an exception, to maintain backwards compatibility with COARDS, the text strings level, layer, and sigma_level are allowed in the units attribute, in order to indicate dimensionless vertical coordinates. This use of units is not compatible with UDUNITS, and is deprecated by this standard because conventions for more precisely identifying dimensionless vertical coordinates are available (see [dimensionless-vertical-coordinate]).
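The two exceptions above can be written as a small validation rule. The sketch below is hypothetical. `check_units` is not part of UDUNITS or any CF library, and it operates on a plain dict of attributes. The unit sets contain only the strings named in this section, so the volume-ratio list is not exhaustive.

```python
# Hypothetical validator for two CF units rules described above.
# Only the example strings named in the text are listed here.
VOLUME_RATIO_UNITS = {"ppmv", "ppbv"}
DEPRECATED_VERTICAL_UNITS = {"level", "layer", "sigma_level"}


def check_units(attrs: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    units = attrs.get("units")
    if units is None:
        # No units attribute: the variable is assumed to be dimensionless.
        return problems
    if units in VOLUME_RATIO_UNITS and "standard_name" in attrs:
        # The standard_name already says it is a volume ratio.
        problems.append(f"units '{units}' not allowed with a standard_name")
    if units in DEPRECATED_VERTICAL_UNITS:
        # COARDS-style vertical coordinate units: allowed but deprecated.
        problems.append(f"units '{units}' is deprecated for vertical coordinates")
    return problems
```

For example, `check_units({"units": "ppmv", "standard_name": "mole_fraction_of_ozone_in_air"})` reports one problem, while the same units with no standard_name pass.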

The UDUNITS syntax that allows scale factors and offsets to be applied to a unit is not supported by this standard, except for the case of specifying a reference time (see [time-coordinate]). Any scale factors or offsets applied to the data should instead be indicated by the scale_factor and add_offset attributes. Use of these attributes for data packing, which is their most important application, is discussed in detail in [packed-data].
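Unpacking applies the two attributes in the order multiply-then-add: unpacked = packed * scale_factor + add_offset. Below is a minimal sketch with the illustrative helper name `unpack`. It assumes that a missing attribute behaves like 1 (scale_factor) or 0 (add_offset), which matches common practice. [packed-data] remains the normative description, including the data type rules this sketch ignores.

```python
def unpack(packed, scale_factor=1.0, add_offset=0.0):
    """Convert packed values to physical values (multiply, then add)."""
    return [p * scale_factor + add_offset for p in packed]


# Example: shorts packed with scale_factor = 0.01 and add_offset = 273.15
temperatures = unpack([0, 100, 200], scale_factor=0.01, add_offset=273.15)
```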

UDUNITS recognizes the following prefixes and their abbreviations.

Table 3.1. Supported Units
Factor  Prefix      Abbreviation    Factor  Prefix  Abbreviation
1e1     deca,deka   da              1e-1    deci    d
1e2     hecto       h               1e-2    centi   c
1e3     kilo        k               1e-3    milli   m
1e6     mega        M               1e-6    micro   u
1e9     giga        G               1e-9    nano    n
1e12    tera        T               1e-12   pico    p
1e15    peta        P               1e-15   femto   f
1e18    exa         E               1e-18   atto    a
1e21    zetta       Z               1e-21   zepto   z
1e24    yotta       Y               1e-24   yocto   y
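A prefix combines with a unit symbol by simple concatenation, so hPa is 1e2 Pa. The sketch below splits a prefixed symbol using the abbreviations from Table 3.1. It is an illustration only. The caller supplies the set of base symbols, and full prefix names (kilo, deca, deka), exponents and compound units are ignored, although UDUNITS handles all of them.

```python
# Prefix abbreviations and factors transcribed from Table 3.1.
PREFIXES = {
    "da": 1e1, "h": 1e2, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
    "P": 1e15, "E": 1e18, "Z": 1e21, "Y": 1e24,
    "d": 1e-1, "c": 1e-2, "m": 1e-3, "u": 1e-6, "n": 1e-9, "p": 1e-12,
    "f": 1e-15, "a": 1e-18, "z": 1e-21, "y": 1e-24,
}


def split_prefix(symbol: str, base_units: set[str]) -> tuple[float, str]:
    """Split e.g. 'hPa' into (1e2, 'Pa') given known base unit symbols.

    A bare base unit (e.g. 'm' for metre) is matched first, so it is not
    misread as the milli prefix. Longer prefixes ('da') are tried before
    shorter ones ('d').
    """
    if symbol in base_units:
        return 1.0, symbol
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        rest = symbol[len(prefix):]
        if symbol.startswith(prefix) and rest in base_units:
            return PREFIXES[prefix], rest
    raise ValueError(f"cannot parse unit symbol {symbol!r}")


BASE = {"m", "g", "s", "Pa", "K"}   # hypothetical set of base symbols
print(split_prefix("hPa", BASE))    # (100.0, 'Pa')
```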

Long Name

The long_name attribute is defined by the NUG to contain a long descriptive name which may, for example, be used for labeling plots. For backwards compatibility with COARDS this attribute is optional. But it is highly recommended that either this or the standard_name attribute defined in the next section be provided to make the file self-describing. If a variable has no long_name attribute then an application may use, as a default, the standard_name if it exists, or the variable name itself.
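The fallback order for labels fits in one expression. `plot_label` is a hypothetical helper, and the attributes are passed as a plain dict. Because it uses `or`, an empty long_name string is treated the same as a missing one.

```python
def plot_label(var_name: str, attrs: dict) -> str:
    """Pick a plot label: long_name, else standard_name, else the variable name."""
    return attrs.get("long_name") or attrs.get("standard_name") or var_name
```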

Standard Name

A fundamental requirement for exchange of scientific data is the ability to describe precisely the physical quantities being represented. To some extent this is the role of the long_name attribute as defined in the NUG. However, usage of long_name is completely ad hoc. For many applications it is desirable to have a more definitive description of the quantity, which allows users of data from different sources (some of which might be models and others observational) to determine whether quantities are in fact comparable. For this reason each variable may optionally be given a "standard name", whose meaning is defined by this convention. There may be several variables in a dataset with any given standard name, and these may be distinguished by other metadata, such as coordinates ([coordinate-types]) and cell_methods ([cell-methods]).

A standard name is associated with a variable via the attribute standard_name, which takes a string value consisting of a standard name optionally followed by one or more blanks and a standard name modifier (a string value from [standard-name-modifiers]).
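A standard name contains no whitespace, so the attribute value can be split on blanks into at most two parts. In the sketch below, `parse_standard_name` is a hypothetical helper name. It does not check the modifier against [standard-name-modifiers].

```python
from typing import Optional, Tuple


def parse_standard_name(value: str) -> Tuple[str, Optional[str]]:
    """Split a standard_name attribute into (standard name, modifier or None)."""
    parts = value.split()  # any run of blanks counts as one separator
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected '<name> [modifier]', got {value!r}")
```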

The set of permissible standard names is contained in the standard name table. The table entry for each standard name contains the following:

standard name

The name used to identify the physical quantity. A standard name contains no whitespace and is case sensitive.

canonical units

Representative units of the physical quantity. Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units, possibly modified by an operation specified by the standard name modifier (see below and [standard-name-modifiers]) or by the cell_methods attribute (see [cell-methods] and [appendix-cell-methods]) or both.

description

The description is meant to clarify the qualifiers of the fundamental quantities, such as which surface a quantity is defined on or what the flux sign conventions are. We don’t attempt to provide precise definitions of fundamental physical quantities (e.g., temperature), which may be found in the literature. The description may define rules on the variable type, attributes and coordinates which must be complied with by any variable carrying that standard name (such as in Example 3.4).

When appropriate, the table entry also contains the corresponding GRIB parameter code(s) (from ECMWF and NCEP) and AMIP identifiers.
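Because units must be "physically equivalent (not necessarily identical)" to the canonical units, a unit check compares dimensions rather than strings. For instance, psl in Example 3.1 below is stored in hPa, and the canonical units of air_pressure_at_sea_level are Pa. The dimension table in this sketch is hand-written and hypothetical. A real application would derive dimensions from UDUNITS, and the sketch ignores the modifications introduced by modifiers or cell_methods.

```python
# Hypothetical, hand-written table: exponents of the base dimensions
# (mass, length, time, temperature) for a few unit strings.
DIMENSIONS = {
    "Pa": (1, -1, -2, 0),
    "hPa": (1, -1, -2, 0),
    "m s-1": (0, 1, -1, 0),
    "km h-1": (0, 1, -1, 0),
    "K": (0, 0, 0, 1),
    "degC": (0, 0, 0, 1),
}


def physically_equivalent(units: str, canonical_units: str) -> bool:
    """True if both unit strings have the same dimensions (scale may differ)."""
    try:
        return DIMENSIONS[units] == DIMENSIONS[canonical_units]
    except KeyError as err:
        raise ValueError(f"unit not in the illustrative table: {err}") from None
```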

The standard name table is located at https://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml, written in compliance with the XML format, as described in [standard-name-table-format]. Knowledge of the XML format is only necessary for application writers who plan to directly access the table. A formatted text version of the table is provided at https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html, and this table may be consulted in order to find the standard name that should be assigned to a variable. Some standard names (e.g. region and area_type) are used to indicate quantities which are permitted to take only certain standard values. This is indicated in the definition of the quantity in the standard name table, accompanied by a list or a link to a list of the permitted values.

Standard names by themselves are not always sufficient to describe a quantity. For example, a variable may contain data to which spatial or temporal operations have been applied. Or the data may represent an uncertainty in the measurement of a quantity. These quantity attributes are expressed as modifiers of the standard name. Modifications due to common statistical operations are expressed via the cell_methods attribute (see [cell-methods] and [appendix-cell-methods]). Other types of quantity modifiers are expressed using the optional modifier part of the standard_name attribute. The permissible values of these modifiers are given in [standard-name-modifiers].

Example 3.1. Use of standard_name
float psl(lat,lon) ;
  psl:long_name = "mean sea level pressure" ;
  psl:units = "hPa" ;
  psl:standard_name = "air_pressure_at_sea_level" ;

The description in the standard name table entry for air_pressure_at_sea_level clarifies that "sea level" refers to the mean sea level, which is close to the geoid in sea areas.

Ancillary Data

When one data variable provides metadata about the individual values of another data variable it may be desirable to express this association by providing a link between the variables. For example, instrument data may have associated measures of uncertainty. The attribute ancillary_variables is used to express these types of relationships. It is a string attribute whose value is a blank separated list of variable names. The nature of the relationship between variables associated via ancillary_variables must be determined by other attributes. The variables listed by the ancillary_variables attribute will often have the standard name of the variable which points to them including a modifier ([standard-name-modifiers]) to indicate the relationship.

Example 3.2. Ancillary instrument data
  float q(time) ;
    q:standard_name = "specific_humidity" ;
    q:units = "g/g" ;
    q:ancillary_variables = "q_error_limit q_detection_limit" ;
  float q_error_limit(time) ;
    q_error_limit:standard_name = "specific_humidity standard_error" ;
    q_error_limit:units = "g/g" ;
  float q_detection_limit(time) ;
    q_detection_limit:standard_name = "specific_humidity detection_minimum" ;
    q_detection_limit:units = "g/g" ;
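Resolving the link means splitting the attribute on blanks and looking up each name. The sketch below models the file as a nested dict, which is an assumption (a real reader would use a netCDF library). It reproduces Example 3.2, and `ancillary_standard_names` is a hypothetical helper. A listed name that is missing from the file raises KeyError.

```python
def ancillary_standard_names(variables: dict, name: str) -> dict:
    """Map each ancillary variable of `name` to its standard_name (or None)."""
    listed = variables[name].get("ancillary_variables", "").split()
    return {anc: variables[anc].get("standard_name") for anc in listed}


# Example 3.2 as a nested dict {variable name: {attribute: value}}
variables = {
    "q": {"standard_name": "specific_humidity", "units": "g/g",
          "ancillary_variables": "q_error_limit q_detection_limit"},
    "q_error_limit": {"standard_name": "specific_humidity standard_error",
                      "units": "g/g"},
    "q_detection_limit": {"standard_name": "specific_humidity detection_minimum",
                          "units": "g/g"},
}
```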

Alternatively, ancillary_variables may be used as status flags indicating the operational status of an instrument producing the data or as quality flags indicating the results of a quality control test, or some other quantitative quality assessment, performed against the measurements contained in the source variable. In these cases, the flag variable will include a standard name that differs from that of the source variable and indicates the specific type of flag the variable represents.

The standard name table includes many names intended for this purpose. These include general names that can flexibly represent any type of status or quality assessment, and names for specific quality control tests commonly applied to geophysical time series data. Several examples are listed below:

Sample flag variable standard names:
  • status_flag and quality_flag: general flag categories for instrument status or quality assessment

  • climatology_test_quality_flag, flat_line_test_quality_flag, gap_test_quality_flag, spike_test_quality_flag: a subset of standard name flags used to indicate the results of commonly-used geophysical timeseries data quality control tests (consult the standard names table for a full list of published flags)

  • aggregate_quality_flag: flag indicating an aggregate summary of all quality tests performed on the data variable, both automated and manual (i.e. a master quality flag for a particular variable)

The following example illustrates the use of three of these flags to represent two independent quality control tests and an aggregate flag that combines the results of the two tests.

Example 3.3. Ancillary quality flag data
  float salinity(time, z);
    salinity:units = "1";
    salinity:long_name = "Salinity";
    salinity:standard_name = "sea_water_practical_salinity";
    salinity:ancillary_variables = "salinity_qc_generic salinity_qc_flat_line_test salinity_qc_agg";

  int salinity_qc_generic(time, z);
    salinity_qc_generic:long_name = "Salinity Generic QC Process Flag";
    salinity_qc_generic:standard_name = "quality_flag";

  int salinity_qc_flat_line_test(time, z);
    salinity_qc_flat_line_test:long_name = "Salinity Flat Line Test Flag";
    salinity_qc_flat_line_test:standard_name = "flat_line_test_quality_flag";

  int salinity_qc_agg(time, z);
    salinity_qc_agg:long_name = "Salinity Aggregate Flag";
    salinity_qc_agg:standard_name = "aggregate_quality_flag";

Note that the ancillary variables in this example are simplified to exclude the flag_values, flag_masks and flag_meanings attributes described in Section 3.5, "Flags", which they would ordinarily require.
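The flag standard names quoted above follow a visible naming pattern, so an application might sort ancillary flags into the listed categories. This is only a heuristic based on the names in this section, and the standard name table remains authoritative. `flag_kind` is a hypothetical helper.

```python
GENERAL_FLAGS = {"status_flag", "quality_flag"}


def flag_kind(standard_name: str) -> str:
    """Classify a standard_name using the flag names quoted in this section."""
    base = standard_name.split()[0]  # drop any standard name modifier
    if base == "aggregate_quality_flag":
        return "aggregate"
    if base in GENERAL_FLAGS:
        return "general"
    if base.endswith("_test_quality_flag"):
        return "specific test"
    return "not a flag"
```

Applied to Example 3.3, salinity_qc_generic is "general", salinity_qc_flat_line_test is "specific test" and salinity_qc_agg is "aggregate".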

Flags

The attributes flag_values, flag_masks and flag_meanings are intended to make variables that contain flag values self-describing. Status codes and Boolean (binary) condition flags may be expressed with different combinations of flag_values and flag_masks attribute definitions.

The flag_values and flag_meanings attributes describe a status flag consisting of mutually exclusive coded values. The flag_values attribute is the same type as the variable to which it is attached, and contains a list of the possible flag values. The flag_meanings attribute is a string whose value is a blank separated list of descriptive words or phrases, one for each flag value. Each word or phrase should consist of characters from the alphanumeric set and the following five: '_', '-', '.', '+', '@'. If multi-word phrases are used to describe the flag values, then the words within a phrase should be connected with underscores. The following example illustrates the use of flag values to express a speed quality with an enumerated status code.

Example 3.4. A flag variable, using flag_values
  byte current_speed_qc(time, depth, lat, lon) ;
    current_speed_qc:long_name = "Current Speed Quality" ;
    current_speed_qc:standard_name = "status_flag" ;
    current_speed_qc:_FillValue = -128b ;
    current_speed_qc:valid_range = 0b, 2b ;
    current_speed_qc:flag_values = 0b, 1b, 2b ;
    current_speed_qc:flag_meanings = "quality_good sensor_nonfunctional
                                      outside_valid_range" ;

Note that the data variable containing current speed has an ancillary_variables attribute with a value containing current_speed_qc.
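Decoding an enumerated flag means pairing the two attributes by position. The sketch below also checks the word count and the permitted characters described above. `value_meanings` is a hypothetical name, and "alphanumeric" is assumed to mean ASCII letters and digits. str.split() also absorbs the line break inside the flag_meanings string of Example 3.4.

```python
import re

# Characters allowed in each flag_meanings word (ASCII assumed).
_MEANING_WORD = re.compile(r"[A-Za-z0-9_.+@-]+")


def value_meanings(flag_values, flag_meanings: str) -> dict:
    """Build {flag value: meaning} from flag_values and flag_meanings."""
    words = flag_meanings.split()
    if len(words) != len(flag_values):
        raise ValueError("need one flag_meanings word per flag_values entry")
    for word in words:
        if not _MEANING_WORD.fullmatch(word):
            raise ValueError(f"illegal characters in flag meaning {word!r}")
    return dict(zip(flag_values, words))


# Example 3.4; the _FillValue (-128) is not a flag value, so .get() gives None
table = value_meanings([0, 1, 2],
                       "quality_good sensor_nonfunctional\n  outside_valid_range")
```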

The flag_masks and flag_meanings attributes describe a number of independent Boolean conditions using bit field notation by setting unique bits in each flag_masks value. The flag_masks attribute is the same type as the variable to which it is attached, and contains a list of values matching unique bit fields. The flag_meanings attribute is defined as above, one for each flag_masks value. A flagged condition is identified by performing a bitwise AND of the variable value and each flag_masks value; a non-zero result indicates a true condition. Thus, any or all of the flagged conditions may be true, depending on the variable bit settings. The following example illustrates the use of flag_masks to express six sensor status conditions.

Example 3.5. A flag variable, using flag_masks
  byte sensor_status_qc(time, depth, lat, lon) ;
    sensor_status_qc:long_name = "Sensor Status" ;
    sensor_status_qc:standard_name = "status_flag" ;
    sensor_status_qc:_FillValue = 0b ;
    sensor_status_qc:valid_range = 1b, 63b ;
    sensor_status_qc:flag_masks = 1b, 2b, 4b, 8b, 16b, 32b ;
    sensor_status_qc:flag_meanings = "low_battery processor_fault
                                      memory_fault disk_fault
                                      software_fault
                                      maintenance_required" ;
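The bitwise test for Example 3.5 is sketched below. The helper name is hypothetical. A value equal to the _FillValue (0 here) should be treated as missing before decoding.

```python
def true_conditions(value: int, flag_masks, flag_meanings: str) -> list:
    """Return the meanings whose mask bits are set in value (bitwise AND != 0)."""
    words = flag_meanings.split()
    return [word for mask, word in zip(flag_masks, words) if value & mask]


MASKS = [1, 2, 4, 8, 16, 32]
MEANINGS = ("low_battery processor_fault memory_fault disk_fault "
            "software_fault maintenance_required")
print(true_conditions(5, MASKS, MEANINGS))  # ['low_battery', 'memory_fault']
```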

A variable with a standard name of region, area_type or any other standard name which requires string values from a defined list may use flags, together with flag_values and flag_meanings attributes, to record the translation to the string values. The following example illustrates this using integer flag values for a variable with standard name region and flag_meanings selected from the standardized region names (see Section 6.1.1).

Example 3.6. A region variable, using flag_values
  int basin(lat, lon) ;
    basin:standard_name = "region" ;
    basin:flag_values = 1, 2, 3 ;
    basin:flag_meanings = "atlantic_arctic_ocean indo_pacific_ocean global_ocean" ;
data:
  basin = 1, 1, 1, 1, 2, ..... ;

The flag_masks, flag_values and flag_meanings attributes, used together, describe a blend of independent Boolean conditions and enumerated status codes. The flag_masks and flag_values attributes are both the same type as the variable to which they are attached. A flagged condition is identified by a bitwise AND of the variable value and each flag_masks value; a result that equals the corresponding flag_values element indicates a true condition. Repeated flag_masks define a bit field mask that identifies a number of status conditions with different flag_values. The flag_meanings attribute is defined as above, one for each flag_masks bit field and flag_values definition. Each flag_values and flag_masks value must coincide with a flag_meanings value. The following example illustrates the use of flag_masks and flag_values to express two sensor status conditions and one enumerated status code.

Example 3.7. A flag variable, using flag_masks and flag_values
  byte sensor_status_qc(time, depth, lat, lon) ;
    sensor_status_qc:long_name = "Sensor Status" ;
    sensor_status_qc:standard_name = "status_flag" ;
    sensor_status_qc:_FillValue = 0b ;
    sensor_status_qc:valid_range = 1b, 15b ;
    sensor_status_qc:flag_masks = 1b, 2b, 12b, 12b, 12b ;
    sensor_status_qc:flag_values = 1b, 2b, 4b, 8b, 12b ;
    sensor_status_qc:flag_meanings =
         "low_battery
          hardware_fault
          offline_mode calibration_mode maintenance_mode" ;

In this case, mutually exclusive values are blended with Boolean values to maximize use of the available bits in a flag value. The table below represents the four binary digits (bits) expressed by the sensor_status_qc variable in the previous example.

Bit 0 and Bit 1 are Boolean values indicating a low battery condition and a hardware fault, respectively. The next two bits (Bit 2 and Bit 3) express an enumeration indicating abnormal sensor operating modes. Thus, if Bit 0 is set, the battery is low, and if Bit 1 is set, there is a hardware fault, independently of the current sensor operating mode.

Table 3.2. Flag Variable Bits (from Example)
Bit 3 (MSB)   Bit 2   Bit 1       Bit 0 (LSB)
                      H/W Fault   Low Batt

The remaining bits (Bit 2 and Bit 3) are decoded as follows:

Table 3.3. Flag Variable Bit 2 and Bit 3 (from Example)
Bit 3   Bit 2   Mode
0       1       offline_mode
1       0       calibration_mode
1       1       maintenance_mode

The "12b" flag mask is repeated in the sensor_status_qc flag_masks definition to explicitly declare the recommended bit field masks to repeatedly AND with the variable value while searching for matching enumerated values. An application determines if any of the conditions declared in the flag_meanings list are true by simply iterating through each of the flag_masks and AND’ing them with the variable. When a result is equal to the corresponding flag_values element, that condition is true. The repeated flag_masks enable a simple mechanism for clients to detect all possible conditions.
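The iteration just described translates directly into code. Below is a minimal sketch that assumes the attribute values of Example 3.7 are available as Python sequences. `blended_conditions` is a hypothetical helper.

```python
def blended_conditions(value, flag_masks, flag_values, flag_meanings):
    """Return every meaning whose (value AND mask) equals its flag value."""
    words = flag_meanings.split()
    if not (len(flag_masks) == len(flag_values) == len(words)):
        raise ValueError("flag_masks, flag_values and flag_meanings differ in length")
    return [word for mask, expected, word in zip(flag_masks, flag_values, words)
            if (value & mask) == expected]


MASKS = [1, 2, 12, 12, 12]
VALUES = [1, 2, 4, 8, 12]
MEANINGS = """low_battery
              hardware_fault
              offline_mode calibration_mode maintenance_mode"""

# 9 = 0b1001: Bit 0 set (low battery); Bits 3-2 = 10 (calibration_mode)
print(blended_conditions(9, MASKS, VALUES, MEANINGS))
# ['low_battery', 'calibration_mode']
```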