
// Copyright (c) 2019 The Khronos Group Inc.
// Copyright notice at https://www.khronos.org/registry/speccopyright.html

== Introduction to color conversions
=== Color space composition

A ``color space'' determines the meaning of decoded numerical
color values: that is, it is distinct from the bit patterns,
compression schemes and locations in memory used to store the data.

A color space consists of three basic components:

* <<TRANSFER_CONVERSION,Transfer functions>> define the
  relationships between linear light intensity and the numerical
  values stored by the encoding scheme.
  Since the human eye's sensitivity to changes in intensity is
  non-linear, a non-linear encoding scheme typically allows
  improved visual quality at reduced storage cost.
** An opto-electrical transfer function (OETF) describes the
   conversion from ``scene-referred'' normalized linear light
   intensity to a (typically) non-linear electronic representation.
   The inverse function is written ``OETF^&#160;-1^''.
** An electro-optical transfer function (EOTF) describes the
   conversion from the electronic representation to
   ``display-referred'' normalized linear light intensity in
   the display system.
   The inverse function is written ``EOTF^&#160;-1^''.
** An opto-optical transfer function (OOTF) describes the
   relationship between the linear scene light intensity and
   linear display light intensity: OOTF(x) = EOTF(OETF(x)).
   OETF&#160;=&#160;EOTF^&#160;-1^ and
   EOTF&#160;=&#160;OETF^&#160;-1^ only if the OOTF is linear.
** Historically, a non-linear transfer function has been implicit
   due to the non-linear relationship between voltage and intensity
   provided by a CRT display.
   In contrast, many computer graphics applications are best
   performed in a representation with a linear relationship to
   intensity.
** Use of an incorrect transfer function can result in images
   which have too much or too little contrast or saturation,
   particularly in mid-tones.
* <<PRIMARY_CONVERSION,Color primaries>> define the spectral
  response of a ``pure color'' in an additive color model -
  typically, what is meant by ``red'', ``green'' and ``blue''
  for a given system, and (allowing for the relative intensity
  of the primaries) consequently define the system's white
  balance.
** These primary colors might refer to the wavelengths emitted
   by phosphors on a CRT, transmitted by filters on an LCD for a
   given back-light, or emitted by the LED sub-pixels of an OLED.
   The primaries are typically defined in terms of a reference
   display, and represent the most saturated colors the display
   can produce, since other colors are by definition created
   by combining the primaries.
   The definition usually describes a relationship to the
   responses of the human visual system rather than a full
   spectrum.
** Use of incorrect primaries introduces a shift of hue, most
   visible in saturated colors.

<<<
* <<MODEL_CONVERSION,Color models>> describe the distinction
  between a color representation and additive colors.
  Since the human visual system treats differences in absolute
  intensity differently from differences in the spectrum
  composing a color, many formats benefit from transforming
  the color representation into one which can separate these
  aspects of color.
  Color models are frequently ``named'' by listing their
  component color channels.
** For example, a color model might directly represent additive
   primaries (_RGB_), simple color difference values
   (_Y&prime;C~B~C~R~_ -- colloquially _YUV_), or
   separate hue, saturation and intensity (_HSV_/_HSL_).
** Interpreting an image with an incorrect color model typically
   results in wildly incorrect colors: a (0,0,0) triple in an
   _RGB_ additive color model typically represents black, but
   may represent white in _CMYK_, or saturated green in color
   difference models.
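
As an illustration of how the three transfer functions relate, the following Python sketch composes an OETF and an EOTF into an OOTF.
It assumes the BT.709 OETF and, for the display, a pure 2.4 power law (the BT.1886 EOTF with a zero black level); the function names are hypothetical and are not part of this specification.

```python
def oetf_bt709(scene):
    """BT.709 OETF: linear scene light (0..1) to a non-linear signal."""
    if scene < 0.018:
        return 4.5 * scene
    return 1.099 * scene ** 0.45 - 0.099


def eotf_gamma(signal, gamma=2.4):
    """Assumed display EOTF: a pure power law (BT.1886 with zero black level)."""
    return max(signal, 0.0) ** gamma


def ootf(scene):
    """OOTF(x) = EOTF(OETF(x)): linear scene light to linear display light."""
    return eotf_gamma(oetf_bt709(scene))


if __name__ == "__main__":
    for scene in (0.0, 0.18, 0.5, 1.0):
        print(f"scene {scene:.3f} -> signal {oetf_bt709(scene):.3f} "
              f"-> display {ootf(scene):.3f}")
```

With these assumed functions the end points are preserved but mid-tones are displayed darker than the scene value, so the OOTF is not linear and the EOTF is not the inverse of the OETF.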

=== Operations in a color conversion

Conversion between color representations may require a number of
separate conversion operations:

* Conversion between representations with different
  <<PRIMARY_CONVERSION,color primaries>> can be performed directly.
  If the input and output of the conversion do not share the same
  color primaries, this transformation forms the ``core'' of the
  conversion.

* The color primary conversion operates on linear _RGB_
  additive color values; if the input or output is not defined in
  linear terms but with a non-linear <<TRANSFER_CONVERSION,transfer
  function>>, the color primary conversion must be ``wrapped'' with
  the corresponding transfer functions.
  Conventionally, non-linear _RGB_ values are written
  _R&prime;G&prime;B&prime;_.

* If the input or output <<MODEL_CONVERSION,color model>> is not
  defined in terms of additive primaries (for example,
  _Y&prime;C~B~C~R~_ -- colloquially known as _YUV_), the model
  conversion is applied to the non-linear _R&prime;G&prime;B&prime;_
  values; the _Y&prime;~C~C&prime;~BC~C&prime;~RC~_ and _IC~T~C~P~_
  color models are exceptions, being created from both linear and
  non-linear _RGB_.

* Converting numerical values stored in memory to the representation
  of the color model may itself require additional operations: in
  order to remove dependence on bit depth, all the formulae described
  here work with continuous real numbers, so one of the common in-memory
  <<CONVERSION_QUANTIZATION, quantization schemes>> must often be
  applied.

Details of these conversion operations are described in the following
chapters.
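
As a sketch of the last point, the following Python fragment maps between integer code values and the continuous values used by the formulae.
It assumes the ``narrow range'' convention of BT.601 and BT.709, in which 8-bit _Y&prime;_ occupies 16..235 and 8-bit _C~B~_/_C~R~_ occupy 16..240 centered on 128, scaled by a power of two for larger bit depths; the function names are hypothetical.

```python
def decode_narrow(code_y, code_cb, code_cr, bits=8):
    """Integer narrow-range codes to continuous Y' in 0..1, Cb/Cr in -0.5..0.5."""
    scale = 1 << (bits - 8)  # 1 for 8-bit, 4 for 10-bit, and so on
    y = (code_y - 16 * scale) / (219 * scale)
    cb = (code_cb - 128 * scale) / (224 * scale)
    cr = (code_cr - 128 * scale) / (224 * scale)
    return y, cb, cr


def encode_narrow(y, cb, cr, bits=8):
    """Continuous values back to integer narrow-range codes (no clamping)."""
    scale = 1 << (bits - 8)
    # round() is used for simplicity; a real implementation would choose its
    # own rounding rule and clamp to the legal code range.
    return (round(219 * scale * y + 16 * scale),
            round(224 * scale * cb + 128 * scale),
            round(224 * scale * cr + 128 * scale))


if __name__ == "__main__":
    print(decode_narrow(235, 240, 16))
    print(encode_narrow(0.0, 0.0, 0.0, bits=10))
```

Keeping this step separate lets the remaining conversion formulae stay independent of bit depth.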

NOTE: As described in the License Information at the start of
this document, the Khronos Data Format Specification does
not convey a right to implement the operations it describes.
This is particularly true of the conversion formulae in the
following sections, whose inclusion is purely informative.
Please refer to the originating documents and the bodies
responsible for the standards containing these formulae for
the legal framework required for implementation.


<<<

Common cases such as converting a _Y&prime;C~B~C~R~_ image
encoded for 625-line <<bt601,BT.601>> to a _Y&prime;C~B~C~R~_
image encoded for <<bt709,BT.709>> can involve multiple costly
operations.
An example is shown in the following diagram, which represents
sampling from a _Y&prime;C~B~C~R~_ texture in one color space,
and the operations needed to generate a different set of
_Y&prime;C~B~C~R~_ values representing the color of the sample
position in a different color space:

[[conversionexample]]
.Example sampling in one space and converting to a different space
image::images/colorconversion_accurate.{svgpdf}[width="{svgpdf@pdf:475pt:576}",align="center"]

In this diagram, non-linear luma _Y&prime;_ channels are shown
in black and white, color difference _C~B~_/_C~R~_
channels are shown with the colors at the extremes of their range, and
color primary channels are shown as the primary color and black.
Linear representations are shown diagonally divided by a straight line;
non-linear representations are shown with a gamma curve.
The luma and color difference representation is discussed in
<<MODEL_YUV>>.
The interpretation of color primaries is discussed in
<<PRIMARY_CONVERSION>>.
Non-linear transfer functions are described in <<TRANSFER_CONVERSION>>.
As described below, the diagram shows a 2{times}3 grid of
input chroma texel values, corresponding to a 4{times}6 grid of
luma texel values, since the chroma channels are stored at half
the horizontal and half the vertical resolution of the luma
channel (i.e. in ``4:2:0'' representation).
Grayed-out texel values do not contribute to the final output, and are
shown only to indicate relative alignment of the coordinates.

<<<

The stages numbered in <<conversionexample>> show the following operations:

. Arranging the channels from the representation correctly for the
  conversion operations (a ``swizzle'').
  In this example, the implementation requires that the _C~B~_
  and _C~R~_ values be swapped.

. Range expansion to the correct range for the values in the color
  model (handled differently, for example, for ``<<QUANTIZATION_FULL,full>>''
  and ``<<QUANTIZATION_NARROW,narrow>>'' ranges); in this example, the result
  is to increase the effective dynamic range of the encoding: contrast and
  saturation are increased.
+
In this example, operations 1 and 2 can be combined into a single
sparse matrix multiplication of the input channels, although actual
implementations may wish to take advantage of the sparseness.

. Reconstruction to full resolution of channels which are not at the
  full sampling resolution (``chroma reconstruction''), for example by
  replication or interpolation at the sites of the luma samples, allowing
  for the chroma sample positions; this example assumes that the chroma
  samples are being reconstructed through linear interpolation.
  In the diagram, sample positions for each channel are shown as green
  dots, and each channel corresponds to the same region of the image;
  in this example, the chroma samples are located at the horizontal and
  vertical midpoint of quads of luma samples, but different standards
  align the chroma samples differently.
  Note that interpolation for channel reconstruction necessarily happens
  in a non-linear representation for color difference representations
  such as _Y&prime;C~B~C~R~_: creating a linear representation would
  require converting to _RGB_, which in turn requires a full
  set of _Y&prime;C~B~C~R~_ samples for a given location.

. Conversion between color models -- in this example, from non-linear
  _Y&prime;C~B~C~R~_ to non-linear _R&prime;G&prime;B&prime;_.
  For example, the conversion might be that between BT.601
  _Y&prime;C~B~C~R~_ and BT.601 non-linear _R&prime;G&prime;B&prime;_
  described in <<MODEL_BT601>>.
  For _Y&prime;C~B~C~R~_ to _R&prime;G&prime;B&prime;_, this
  conversion is a sparse matrix multiplication.

. Application of a transfer function to convert from non-linear
  _R&prime;G&prime;B&prime;_ to linear _RGB_, using the
  color primaries of the input representation.
  In this case, the conversion might be the OETF^&#160;-1^ described
  in <<TRANSFER_ITU>>.
+
The separation of stages 4 and 5 is specific to the _Y&prime;C~B~C~R~_
to _R&prime;G&prime;B&prime;_ color model conversion.
Other representations such as _Y&prime;~C~C&prime;~BC~C&prime;~RC~_ and
_IC~T~C~P~_ have more complex interactions between the color
model conversion and the transfer function.

. Interpolation of linear color values at the sampling position shown
  with a magenta cross according to the chosen sampling rules.

. Convert from the color primaries of the input representation to the
  desired color primaries of the output representation, which is
  a matrix multiplication operation.
  In this example, this might be the conversion from linear BT.601
  EBU primaries to BT.709 primaries, as described in
  <<PRIMARIES_BT601_EBU>> and <<PRIMARIES_BT709>>.

. Convert from the linear _RGB_ representation using the
  target primaries to a non-linear _R&prime;G&prime;B&prime;_
  representation, for example the OETF described in <<TRANSFER_ITU>>.

. Conversion from non-linear _R&prime;G&prime;B&prime;_ to the
  _Y&prime;C~B~C~R~_ color model, for example as defined
  in <<MODEL_BT709>> (a matrix multiplication).
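
The per-sample stages 4, 5, 7, 8 and 9 above can be sketched in Python as follows; the interpolation of stage 6 is omitted, so the sketch converts a single, already reconstructed sample.
The sketch assumes the published luma coefficients (_K~R~_ = 0.299, _K~B~_ = 0.114 for BT.601; _K~R~_ = 0.2126, _K~B~_ = 0.0722 for BT.709), the ITU OETF, and the xy chromaticities of the EBU and BT.709 primaries with a D65 white point.
All function and constant names are hypothetical, and the primary conversion matrix is derived from the chromaticities rather than copied from a standard.

```python
def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_inv(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def rgb_to_xyz(primaries, white):
    """Linear RGB to CIE XYZ matrix from xy chromaticities (white has Y = 1)."""
    cols = [(x / y, 1.0, (1.0 - x - y) / y) for x, y in primaries]
    m = [[cols[j][i] for j in range(3)] for i in range(3)]
    wx, wy = white
    scale = mat_vec(mat_inv(m), [wx / wy, 1.0, (1.0 - wx - wy) / wy])
    return [[m[i][j] * scale[j] for j in range(3)] for i in range(3)]

D65 = (0.3127, 0.3290)
EBU = [(0.640, 0.330), (0.290, 0.600), (0.150, 0.060)]    # BT.601 625-line
BT709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
EBU_TO_709 = mat_mul(mat_inv(rgb_to_xyz(BT709, D65)), rgb_to_xyz(EBU, D65))

def oetf(l):
    return 4.5 * l if l < 0.018 else 1.099 * l ** 0.45 - 0.099

def oetf_inv(v):
    return v / 4.5 if v < 0.081 else ((v + 0.099) / 1.099) ** (1.0 / 0.45)

def ycbcr_to_rgb(y, cb, cr, kr, kb):
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / (1.0 - kr - kb)
    return [r, g, b]

def rgb_to_ycbcr(r, g, b, kr, kb):
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    return [y, (b - y) / (2.0 * (1.0 - kb)), (r - y) / (2.0 * (1.0 - kr))]

def convert_sample(ycbcr_601):
    rgb_nl = ycbcr_to_rgb(*ycbcr_601, kr=0.299, kb=0.114)      # stage 4
    rgb_lin = [oetf_inv(c) for c in rgb_nl]                    # stage 5
    rgb_709 = mat_vec(EBU_TO_709, rgb_lin)                     # stage 7
    rgb_nl_709 = [oetf(c) for c in rgb_709]                    # stage 8
    return rgb_to_ycbcr(*rgb_nl_709, kr=0.2126, kb=0.0722)     # stage 9

if __name__ == "__main__":
    print(convert_sample((0.5, 0.1, -0.2)))
```

Because both sets of primaries share a white point, neutral grays pass through this chain unchanged; only colored samples are altered.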

If the output is to be written to a frame buffer with reduced-resolution
chroma channels, chroma values for multiple samples need to be combined.
Note that it is easy to introduce inadvertent chroma blurring in this
operation if the source space chroma values are generated by interpolation.

In this example, generating the four linear _RGB_ values
required for linear interpolation at the magenta cross position
requires _six_ chroma samples.
In the example shown, all four _Y&prime;_ values fall between the
same two chroma sample centers on the horizontal axis, and therefore
recreation of these samples by linear blending on the horizontal axis
only requires two horizontally-adjacent samples.
However, the upper pair of _Y&prime;_ values are sited above
the sample position of the middle row of chroma sample centers, and
therefore reconstruction of the corresponding chroma values requires
interpolation between the upper four source chroma values.
The lower pair of _Y&prime;_ values are sited below the sample
position of the middle row of chroma sample centers, and
therefore reconstruction of the corresponding chroma values requires
interpolation between the lower four source chroma values.
In general, reconstructing four chroma values by interpolation may
require four, six or nine source chroma values, depending on which
samples are required.
The worst case is reduced if chroma samples are aligned (``co-sited'')
with the luma values, or if chroma channel reconstruction uses
replication (nearest-neighbor filtering) rather than interpolation.
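
The counts of four, six or nine source chroma values can be checked with a short Python sketch.
It assumes 4:2:0 subsampling with chroma samples sited at the midpoint of each quad of luma samples, bilinear chroma reconstruction, and no clamping at image edges; the function names are hypothetical.

```python
import math


def chroma_taps(luma_coord):
    """Chroma indices used to interpolate chroma at one luma coordinate.

    With midpoint siting, chroma sample i sits at luma coordinate 2*i + 0.5,
    so a luma sample always falls strictly between two chroma samples.
    """
    chroma_coord = (luma_coord - 0.5) / 2.0
    first = math.floor(chroma_coord)
    return {first, first + 1}


def source_chroma_count(x0, y0):
    """Chroma samples needed for the 2x2 luma quad whose top-left is (x0, y0)."""
    cols = chroma_taps(x0) | chroma_taps(x0 + 1)
    rows = chroma_taps(y0) | chroma_taps(y0 + 1)
    return len(cols) * len(rows)


if __name__ == "__main__":
    for quad in ((1, 1), (1, 2), (2, 2)):
        print(quad, source_chroma_count(*quad))
```

A quad that straddles a chroma sample center on one axis needs three chroma samples along that axis instead of two, which gives the six-sample case of the diagram and the nine-sample worst case.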

<<<

An approximation to the conversion described in <<conversionexample>> is
depicted in <<approximateconversionexample>>:

[[approximateconversionexample]]
.Example approximated sampling in one space and converting to a different space
image::images/colorconversion_approximate.{svgpdf}[width="{svgpdf@pdf:475pt:576}",align="center"]

A performance-optimized approximation to our example conversion may
use the following steps:

. Channel rearrangement (as in the previous example)
. Range expansion (as in the previous example)
. Chroma reconstruction combined with sampling.
  In this case, the desired chroma reconstruction operation is
  approximated by adjusting the sample locations to compensate
  for the reduced resolution and sample positions of the chroma
  channels, resulting in a single set of non-linear
  _Y&prime;C~B~C~R~_ values.
. Model conversion from _Y&prime;C~B~C~R~_ to _R&prime;G&prime;B&prime;_
  as described in <<MODEL_BT601>>, here performed _after_ the
  sampling/filtering operation.
. Conversion from non-linear _R&prime;G&prime;B&prime;_ to linear
  _RGB_, using the OETF^&#160;-1^ described
  in <<TRANSFER_ITU>>.
. Conversion of color primaries, corresponding to step 7 of the
  previous example.
. Conversion to a non-linear representation, corresponding to step
  8 of the previous example.
. Conversion to the output color model, corresponding to step 9
  of the previous example.

NOTE: Since stages 1 and 2 represent an affine matrix transform, linear
interpolation of input values may equivalently be performed before
these operations.
This observation allows stages 1..4 to be combined into a single
matrix transformation.
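
The observation in this note can be sketched in Python using homogeneous 4{times}4 matrices.
The sketch assumes that the texel is stored as 8-bit narrow-range (_Y&prime;_, _C~R~_, _C~B~_), so that the swizzle swaps the chroma channels, and that the model conversion uses the BT.601 coefficients; the matrix names are hypothetical.

```python
KR, KB = 0.299, 0.114           # BT.601 luma coefficients
KG = 1.0 - KR - KB


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, v):
    """Apply an affine transform, stored as a 4x4 matrix, to a 3-vector."""
    h = list(v) + [1.0]
    return [sum(m[i][k] * h[k] for k in range(4)) for i in range(3)]


def lerp(a, b, t):
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


# Stage 1: swizzle stored (Y', Cr, Cb) into (Y', Cb, Cr).
SWIZZLE = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
# Stage 2: 8-bit narrow-range expansion (scale and offset).
RANGE = [[1 / 219, 0, 0, -16 / 219],
         [0, 1 / 224, 0, -128 / 224],
         [0, 0, 1 / 224, -128 / 224],
         [0, 0, 0, 1]]
# Stage 4: Y'CbCr to R'G'B' model conversion.
MODEL = [[1, 0, 2 * (1 - KR), 0],
         [1, -2 * KB * (1 - KB) / KG, -2 * KR * (1 - KR) / KG, 0],
         [1, 2 * (1 - KB), 0, 0],
         [0, 0, 0, 1]]

COMBINED = mat_mul(MODEL, mat_mul(RANGE, SWIZZLE))

if __name__ == "__main__":
    a, b, t = (60, 200, 90), (180, 100, 150), 0.3
    print(apply(COMBINED, lerp(a, b, t)))            # filter, then transform
    print(lerp(apply(COMBINED, a), apply(COMBINED, b), t))  # transform, then filter
```

The two printed results agree to within rounding because interpolation weights sum to one, so the offsets of the affine transform are preserved.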

<<<

Large areas of constant color will be correctly converted by this
approximation.
However, there are two sources of errors near color boundaries:

. Interpolation takes place on values with a non-linear representation;
  the repercussions of this are discussed in <<TRANSFER_CONVERSION>>,
  but can introduce both intensity and color shifts.
  Note that applying a non-linear transfer function as part of filtering
  does not improve accuracy for color models other than
  _R&prime;G&prime;B&prime;_ since the non-linear additive values have been
  transformed as part of the color model representation.
. When chroma reconstruction is bilinear and the final sample operation
  is bilinear, the interpolation operation now accesses a maximum of
  only four chroma samples, rather than up to nine for the precise
  series of operations.
  This has the potential to introduce a degree of aliasing in the
  output.
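
The first source of error can be illustrated with a small Python sketch that filters a black-to-white edge in both representations.
It assumes the ITU OETF and its inverse as the transfer function pair; the function names are hypothetical.

```python
def oetf(l):
    """ITU OETF: linear light to non-linear signal."""
    return 4.5 * l if l < 0.018 else 1.099 * l ** 0.45 - 0.099


def oetf_inv(v):
    """Inverse ITU OETF: non-linear signal to linear light."""
    return v / 4.5 if v < 0.081 else ((v + 0.099) / 1.099) ** (1.0 / 0.45)


def filter_linear(a, b, t):
    """Reference: interpolate linear light values directly."""
    return (1.0 - t) * a + t * b


def filter_nonlinear(a, b, t):
    """Approximation: interpolate the encoded values, then linearize."""
    return oetf_inv((1.0 - t) * oetf(a) + t * oetf(b))


if __name__ == "__main__":
    print("linear filtering:    ", filter_linear(0.0, 1.0, 0.5))
    print("non-linear filtering:", filter_nonlinear(0.0, 1.0, 0.5))
```

Halfway across the edge, filtering the encoded values yields roughly half the linear intensity of the reference, whereas regions of constant color are unaffected.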

This approximation produces identical results to the more explicit
sequence of operations in two cases:

. If chroma reconstruction uses nearest-neighbor replication and the
  sampling operation is also a nearest-neighbor operation rather than
  a linear interpolation.
. If the sampling operation is a nearest-neighbor operation and
  chroma reconstruction uses linear interpolation, _if_ the sample
  coordinate position is adjusted to the nearest luma sample location.

As another example, the conversion from BT.709-encoded
_Y&prime;C~B~C~R~_ to sRGB _R&prime;G&prime;B&prime;_ may be considered
to be a simple <<MODEL_YUV,model conversion>> (to
<<PRIMARIES_BT709,BT.709>> _R&prime;G&prime;B&prime;_ non-linear primaries
using the ``<<TRANSFER_ITU,ITU>>'' OETF), since sRGB shares the BT.709
color primaries and is defined as a complementary <<TRANSFER_SRGB,EOTF>>
intended to be combined with BT.709's OETF.
This interpretation imposes a latexmath:[$\gamma \approx$] 1.1
OOTF.
Matching the OOTF of a
<<TRANSFER_ITU,BT.709>>-<<TRANSFER_BT1886,BT.1886>> system,
for which latexmath:[$\gamma \approx$] 1.2, implies using the
<<TRANSFER_BT1886,BT.1886>> EOTF to convert to linear light,
then the <<TRANSFER_SRGB,sRGB>> EOTF^&#160;-1^ to convert back
to sRGB non-linear space.
Recovering linear scene light, which corresponds to a linear OOTF,
means applying the <<TRANSFER_ITU,BT.709>> OETF^&#160;-1^; if the sRGB
_R&prime;G&prime;B&prime;_ target is itself intended to represent
a linear OOTF, then the {_R&prime;~sRGB~_, _G&prime;~sRGB~_,
_B&prime;~sRGB~_} should be calculated as:

[latexmath]
+++++
$$\{\mathit{R}'_\mathit{sRGB},\mathit{G}'_\mathit{sRGB},\mathit{B}'_\mathit{sRGB}\} =
\textrm{EOTF}^{-1}_{sRGB}(\textrm{OETF}^{-1}_{\mathit{BT}.709}
(\{\mathit{R}'_{\mathit{BT}.709},\mathit{G}'_{\mathit{BT}.709},\mathit{B}'_{\mathit{BT}.709}\}))$$
+++++
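
The following Python sketch evaluates this formula per channel, alongside the BT.1886-matching alternative described above.
It assumes the published BT.709 OETF and sRGB EOTF constants and a BT.1886 EOTF simplified to a pure 2.4 power law with zero black level; the function names are hypothetical.

```python
def oetf_inv_bt709(v):
    """Inverse BT.709 OETF: non-linear signal to linear scene light."""
    return v / 4.5 if v < 0.081 else ((v + 0.099) / 1.099) ** (1.0 / 0.45)


def eotf_bt1886(v):
    """Simplified BT.1886 EOTF (assumed zero black level): pure 2.4 power law."""
    return max(v, 0.0) ** 2.4


def eotf_inv_srgb(l):
    """Inverse sRGB EOTF: linear light to sRGB non-linear value."""
    return 12.92 * l if l <= 0.0031308 else 1.055 * l ** (1.0 / 2.4) - 0.055


def bt709_to_srgb_linear_ootf(rgb):
    """The formula above: EOTF^-1_sRGB(OETF^-1_BT.709(x)) per channel."""
    return [eotf_inv_srgb(oetf_inv_bt709(c)) for c in rgb]


def bt709_to_srgb_bt1886_ootf(rgb):
    """Alternative that matches the OOTF of a BT.709-BT.1886 system."""
    return [eotf_inv_srgb(eotf_bt1886(c)) for c in rgb]


if __name__ == "__main__":
    sample = [0.25, 0.5, 0.75]
    print(bt709_to_srgb_linear_ootf(sample))
    print(bt709_to_srgb_bt1886_ootf(sample))
```

Relative to passing the BT.709 values through unchanged, the linear-OOTF variant raises mid-tone code values and the BT.1886 variant lowers them, while black and white are preserved by both.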