// Copyright (c) 2019 The Khronos Group Inc.
// Copyright notice at https://www.khronos.org/registry/speccopyright.html
== Introduction to color conversions
=== Color space composition
A ``color space'' determines the meaning of decoded numerical
color values: that is, it is distinct from the bit patterns,
compression schemes and locations in memory used to store the data.
A color space consists of three basic components:
* <<TRANSFER_CONVERSION,Transfer functions>> define the
relationship between linear light intensity and the numerical
values used by the encoding scheme.
Since the human eye's sensitivity to changes in intensity is
non-linear, a non-linear encoding scheme typically allows
improved visual quality at reduced storage cost.
** An opto-electrical transfer function (OETF) describes the
conversion from ``scene-referred'' normalized linear light
intensity to a (typically) non-linear electronic representation.
The inverse function is written ``OETF^ -1^''.
** An electro-optical transfer function (EOTF) describes the
conversion from the electronic representation to
``display-referred'' normalized linear light intensity in
the display system.
The inverse function is written ``EOTF^ -1^''.
** An opto-optical transfer function (OOTF) describes the
relationship between the linear scene light intensity and
linear display light intensity: OOTF(x) = EOTF(OETF(x)).
OETF = EOTF^ -1^ and
EOTF = OETF^ -1^ only if the OOTF is linear.
** Historically, a non-linear transfer function has been implicit
due to the non-linear relationship between voltage and intensity
provided by a CRT display.
In contrast, many computer graphics applications are best
performed in a representation with a linear relationship to
intensity.
** Use of an incorrect transfer function can result in images
which have too much or too little contrast or saturation,
particularly in mid-tones.
* <<PRIMARY_CONVERSION,Color primaries>> define the spectral
response of a ``pure color'' in an additive color model -
typically, what is meant by ``red'', ``green'' and ``blue''
for a given system, and (allowing for the relative intensity
of the primaries) consequently define the system's white
balance.
** These primary colors might refer to the wavelengths emitted
by phosphors on a CRT, transmitted by filters on an LCD for a
given back-light, or emitted by the LED sub-pixels of an OLED.
The primaries are typically defined in terms of a reference
display, and represent the most saturated colors the display
can produce, since other colors are by definition created
by combining the primaries.
The definition usually describes a relationship to the
responses of the human visual system rather than a full
spectrum.
** Use of incorrect primaries introduces a shift of hue, most
visible in saturated colors.
<<<
* <<MODEL_CONVERSION,Color models>> describe the distinction
between a color representation and additive colors.
Since the human visual system treats differences in absolute
intensity differently from differences in the spectrum
composing a color, many formats benefit from transforming
the color representation into one which can separate these
aspects of color.
Color models are frequently ``named'' by listing their
component color channels.
** For example, a color model might directly represent additive
primaries (_RGB_), simple color difference values
(_Y′C~B~C~R~_ -- colloquially _YUV_), or
separate hue, saturation and intensity (_HSV_/_HSL_).
** Interpreting an image with an incorrect color model typically
results in wildly incorrect colors: a (0,0,0) triple in an
_RGB_ additive color model typically represents black, but
may represent white in _CMYK_, or saturated green in color
difference models.
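
The relationship between the OETF, EOTF and OOTF described above can be sketched numerically.
The following Python fragment is illustrative only: it assumes the BT.709 OETF constants and a simplified BT.1886 EOTF (a pure 2.4 power law with zero black level), neither of which is defined in this section, and the function names are invented for this example.

```python
def oetf_bt709(scene_linear):
    """BT.709 OETF: linear scene light -> non-linear signal (assumed constants)."""
    if scene_linear < 0.018:
        return 4.5 * scene_linear
    return 1.099 * scene_linear ** 0.45 - 0.099


def eotf_bt1886_simplified(signal):
    """Simplified BT.1886 EOTF: pure 2.4 power law (assumes zero black level)."""
    return signal ** 2.4


def ootf(scene_linear):
    """OOTF(x) = EOTF(OETF(x)): linear scene light -> linear display light."""
    return eotf_bt1886_simplified(oetf_bt709(scene_linear))


# Print a few points of the chain scene light -> signal -> display light.
for x in (0.0, 0.18, 0.5, 1.0):
    print(f"scene {x:.2f} -> signal {oetf_bt709(x):.4f} -> display {ootf(x):.4f}")
```

Under these assumptions the end points map to themselves while mid-tones are darkened, so the OOTF is not linear and this OETF is not the inverse of this EOTF.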
=== Operations in a color conversion
Conversion between color representations may require a number of
separate conversion operations:
* Conversion between representations with different
<<PRIMARY_CONVERSION,color primaries>> can be performed directly.
If the input and output of the conversion do not share the same
color primaries, this transformation forms the ``core'' of the
conversion.
* The color primary conversion operates on linear _RGB_
additive color values; if the input or output is not defined in
linear terms but with a non-linear <<TRANSFER_CONVERSION,transfer
function>>, the color primary conversion must be ``wrapped'' with
the relevant transfer functions; conventionally, non-linear _RGB_
values are written _R′G′B′_.
* If the input or output <<MODEL_CONVERSION,color model>> is not
defined in terms of additive primaries (for example,
_Y′C~B~C~R~_ -- colloquially known as _YUV_), the model
conversion is applied to the non-linear _R′G′B′_
values; the _Y′~C~C′~BC~C′~RC~_ and _IC~T~C~P~_
color models are created from both linear and non-linear
_RGB_.
* Converting numerical values stored in memory to the representation
of the color model may itself require additional operations - in
order to remove dependence on bit depth, all the formulae described
here work with continuous real values, so one of the common in-memory
<<CONVERSION_QUANTIZATION, quantization schemes>> must often be
applied to convert to and from the stored values.
Details of these conversion operations are described in the following
chapters.
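
As a rough illustration of the first two operations listed above, the following Python sketch derives a primary conversion matrix from chromaticity coordinates and wraps it with caller-supplied transfer functions.
The chromaticities for 625-line BT.601 (EBU), BT.709 and the D65 white point are assumptions quoted from the respective standards, not from this section, and every function name is hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def inverse(m):
    """Invert a 3x3 matrix using its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def rgb_to_xyz_matrix(primaries, white):
    """Build a linear RGB -> CIE XYZ matrix from xy chromaticities."""
    # One column per primary: XYZ of chromaticity (x, y) with Y = 1.
    cols = [(x / y, 1.0, (1.0 - x - y) / y) for x, y in primaries]
    p = [[cols[c][r] for c in range(3)] for r in range(3)]
    wx, wy = white
    w = [wx / wy, 1.0, (1.0 - wx - wy) / wy]
    # Scale each primary so that RGB = (1, 1, 1) reproduces the white point.
    s = mat_vec(inverse(p), w)
    return [[p[r][c] * s[c] for c in range(3)] for r in range(3)]


# Assumed chromaticities (x, y) for red, green and blue, and for D65 white.
D65 = (0.3127, 0.3290)
BT601_EBU = [(0.640, 0.330), (0.290, 0.600), (0.150, 0.060)]
BT709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]

EBU_TO_709 = mat_mul(inverse(rgb_to_xyz_matrix(BT709, D65)),
                     rgb_to_xyz_matrix(BT601_EBU, D65))


def convert_primaries(rgb_nonlinear, to_linear, to_nonlinear, matrix=EBU_TO_709):
    """Wrap the linear primary conversion with the supplied transfer functions."""
    linear = [to_linear(v) for v in rgb_nonlinear]
    converted = mat_vec(matrix, linear)
    # Out-of-gamut results may be negative; to_nonlinear must tolerate that.
    return [to_nonlinear(v) for v in converted]
```

Because both sets of primaries share the same white point in this sketch, neutral colors are unchanged by the matrix; only colors with some saturation shift.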
NOTE: As described in the License Information at the start of
this document, the Khronos Data Format Specification does
not convey a right to implement the operations it describes.
This is particularly true of the conversion formulae in the
following sections, whose inclusion is purely informative.
Please refer to the originating documents and the bodies
responsible for the standards containing these formulae for
the legal framework required for implementation.
<<<
Common cases such as converting a _Y′C~B~C~R~_ image
encoded for 625-line <<bt601,BT.601>> to a _Y′C~B~C~R~_
image encoded for <<bt709,BT.709>> can involve multiple costly
operations.
An example is shown in the following diagram, which represents
sampling from a _Y′C~B~C~R~_ texture in one color space,
and the operations needed to generate a different set of
_Y′C~B~C~R~_ values representing the color of the sample
position in a different color space:
[[conversionexample]]
.Example sampling in one space and converting to a different space
image::images/colorconversion_accurate.{svgpdf}[width="{svgpdf@pdf:475pt:576}",align="center"]
In this diagram, non-linear luma _Y′_ channels are shown
in black and white, color difference _C~B~_/_C~R~_
channels are shown with the colors at the extremes of their range, and
color primary channels are shown as the primary color and black.
Linear representations are shown diagonally divided by a straight line;
non-linear representations are shown with a gamma curve.
The luma and color difference representation is discussed in
<<MODEL_YUV>>.
The interpretation of color primaries is discussed in
<<PRIMARY_CONVERSION>>.
Non-linear transfer functions are described in <<TRANSFER_CONVERSION>>.
As described below, the diagram shows a 2{times}3 grid of
input chroma texel values, corresponding to a 4{times}6 grid of
luma texel values, since the chroma channels are stored at half
the horizontal and half the vertical resolution of the luma
channel (i.e. in ``4:2:0'' representation).
Grayed-out texel values do not contribute to the final output, and are
shown only to indicate relative alignment of the coordinates.
<<<
The stages numbered in <<conversionexample>> show the following operations:
. Arranging the channels from the representation correctly for the
conversion operations (a ``swizzle'').
In this example, the implementation requires that the _C~B~_
and _C~R~_ values be swapped.
. Range expansion to the correct range for the values in the color
model (handled differently, for example, for ``<<QUANTIZATION_FULL,full>>''
and ``<<QUANTIZATION_NARROW,narrow>>'' ranges); in this example, the result
is to increase the effective dynamic range of the encoding: contrast and
saturation are increased.
+
In this example, operations 1 and 2 can be combined into a single
sparse matrix multiplication of the input channels, although actual
implementations may wish to take advantage of the sparseness.
. Reconstruction to full resolution of channels which are not at the
full sampling resolution (``chroma reconstruction''), for example by
replication or interpolation at the sites of the luma samples, allowing
for the chroma sample positions; this example assumes that the chroma
samples are being reconstructed through linear interpolation.
In the diagram, sample positions for each channel are shown as green
dots, and each channel corresponds to the same region of the image;
in this example, the chroma samples are located at the horizontal and
vertical midpoint of quads of luma samples, but different standards
align the chroma samples differently.
Note that interpolation for channel reconstruction necessarily happens
in a non-linear representation for color difference representations
such as _Y′C~B~C~R~_: creating a linear representation would
require converting to _RGB_, which in turn requires a full
set of _Y′C~B~C~R~_ samples for a given location.
. Conversion between color models -- in this example, from non-linear
_Y′C~B~C~R~_ to non-linear _R′G′B′_.
For example, the conversion might be that between BT.601
_Y′C~B~C~R~_ and BT.601 non-linear _R′G′B′_
described in <<MODEL_BT601>>.
For _Y′C~B~C~R~_ to _R′G′B′_, this
conversion is a sparse matrix multiplication.
. Application of a transfer function to convert from non-linear
_R′G′B′_ to linear _RGB_, using the
color primaries of the input representation.
In this case, the conversion might be the OETF^ -1^ described
in <<TRANSFER_ITU>>.
+
The separation of stages 4 and 5 is specific to the _Y′C~B~C~R~_
to _R′G′B′_ color model conversion.
Other representations such as _Y′~C~C′~BC~C′~RC~_ and
_IC~T~C~P~_ have more complex interactions between the color
model conversion and the transfer function.
. Interpolation of linear color values at the sampling position shown
with a magenta cross according to the chosen sampling rules.
. Convert from the color primaries of the input representation to the
desired color primaries of the output representation, which is
a matrix multiplication operation.
In this example, the conversion is from linear BT.601 EBU
primaries to BT.709 primaries, as described in
<<PRIMARIES_BT601_EBU>> and <<PRIMARIES_BT709>>.
. Convert from the linear _RGB_ representation using the
target primaries to a non-linear _R′G′B′_
representation, for example the OETF described in <<TRANSFER_ITU>>.
. Conversion from non-linear _R′G′B′_ to the
_Y′C~B~C~R~_ color model, for example as defined
in <<MODEL_BT709>>
(a matrix multiplication).
If the output is to be written to a frame buffer with reduced-resolution
chroma channels, chroma values for multiple samples need to be combined.
Note that it is easy to introduce inadvertent chroma blurring in this
operation if the source space chroma values are generated by interpolation.
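
Stages 1, 2 and 4 of the sequence above can be sketched for a single texel as follows.
This is a minimal Python illustration rather than a reference implementation: it assumes 8-bit narrow-range encoding, the BT.601 luma weights _K~R~_ = 0.299 and _K~B~_ = 0.114, and a hypothetical texel layout in which _C~R~_ is stored before _C~B~_; the function names are invented for this example.

```python
# Assumed BT.601 luma weights; the green weight follows from the other two.
KR, KB = 0.299, 0.114
KG = 1.0 - KR - KB


def swizzle(texel):
    """Stage 1: reorder a hypothetical (Y', Cr, Cb) texel to (Y', Cb, Cr)."""
    y, cr, cb = texel
    return y, cb, cr


def expand_narrow_8bit(y, cb, cr):
    """Stage 2: 8-bit narrow range -> Y' in [0, 1], Cb and Cr in [-0.5, 0.5]."""
    return (y - 16) / 219.0, (cb - 128) / 224.0, (cr - 128) / 224.0


def ycbcr_to_rgb_bt601(y, cb, cr):
    """Stage 4: non-linear Y'CbCr -> non-linear R'G'B'."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def decode_texel(texel):
    """Apply stages 1, 2 and 4 to one full-resolution texel."""
    return ycbcr_to_rgb_bt601(*expand_narrow_8bit(*swizzle(texel)))


print(decode_texel((235, 128, 128)))  # narrow-range white
print(decode_texel((16, 128, 128)))   # narrow-range black
```

Chroma reconstruction (stage 3) is omitted here, so the sketch applies only to texels that already have a full set of _Y′C~B~C~R~_ values.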
In this example, generating the four linear _RGB_ values
required for linear interpolation at the magenta cross position
requires _six_ chroma samples.
In the example shown, all four _Y′_ values fall between the
same two chroma sample centers on the horizontal axis, and therefore
recreation of these samples by linear blending on the horizontal axis
only requires two horizontally-adjacent samples.
However, the upper pair of _Y′_ values are sited above
the sample position of the middle row of chroma sample centers, and
therefore reconstruction of the corresponding chroma values requires
interpolation between the upper four source chroma values.
The lower pair of _Y′_ values are sited below the sample
position of the middle row of chroma sample centers, and
therefore reconstruction of the corresponding chroma values requires
interpolation between the lower four source chroma values.
In general, reconstructing four chroma values by interpolation may
require four, six or nine source chroma values, depending on which
samples are required.
The worst case is reduced if chroma samples are aligned (``co-sited'')
with the luma values, or if chroma channel reconstruction uses
replication (nearest-neighbor filtering) rather than interpolation.
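
The counts of four, six or nine source chroma values can be checked with a short Python sketch.
It assumes chroma stored at half resolution on both axes, with chroma sample _j_ centered at luma coordinate 2 _j_ + offset (0.5 for the midpoint siting of the example, 0.0 for co-sited chroma); the function names are hypothetical.

```python
import math


def chroma_taps_1d(luma_index, chroma_offset):
    """Chroma indices needed to linearly reconstruct chroma at one luma site."""
    # Position of the luma site measured in chroma sample units.
    pos = (luma_index - chroma_offset) / 2.0
    lo = math.floor(pos)
    if pos == lo:
        # Exactly on a chroma sample: the neighbor would have zero weight.
        return {lo}
    return {lo, lo + 1}


def chroma_taps_2x2(x0, y0, chroma_offset=0.5):
    """Distinct chroma samples needed for the 2x2 luma quad at (x0, y0)."""
    cols = chroma_taps_1d(x0, chroma_offset) | chroma_taps_1d(x0 + 1, chroma_offset)
    rows = chroma_taps_1d(y0, chroma_offset) | chroma_taps_1d(y0 + 1, chroma_offset)
    return len(cols) * len(rows)


# Midpoint siting: the count depends on where the luma quad falls.
for origin in ((1, 1), (1, 0), (0, 0)):
    print(origin, chroma_taps_2x2(*origin))

# Co-sited chroma: worst case over several quad positions.
print(max(chroma_taps_2x2(x, y, 0.0) for x in range(4) for y in range(4)))
```

With midpoint siting each axis needs two or three chroma samples depending on the quad position, whereas in this model co-sited chroma never needs more than two per axis.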
<<<
An approximation to the conversion described in <<conversionexample>> is
depicted in <<approximateconversionexample>>:
[[approximateconversionexample]]
.Example approximated sampling in one space and converting to a different space
image::images/colorconversion_approximate.{svgpdf}[width="{svgpdf@pdf:475pt:576}",align="center"]
A performance-optimized approximation to our example conversion may
use the following steps:
. Channel rearrangement (as in the previous example)
. Range expansion (as in the previous example)
. Chroma reconstruction combined with sampling.
In this case, the desired chroma reconstruction operation is
approximated by adjusting the sample locations to compensate
for the reduced resolution and sample positions of the chroma
channels, resulting in a single set of non-linear
_Y′C~B~C~R~_ values.
. Model conversion from _Y′C~B~C~R~_ to _R′G′B′_
as described in <<MODEL_BT601>>, here performed _after_ the
sampling/filtering operation.
. Conversion from non-linear _R′G′B′_ to linear
_RGB_, using the OETF^ -1^ described
in <<TRANSFER_ITU>>.
. Conversion of color primaries, corresponding to step 7 of the
previous example.
. Conversion to a non-linear representation, corresponding to step
8 of the previous example.
. Conversion to the output color model, corresponding to step 9
of the previous example.
NOTE: Since stages 1 and 2 represent an affine matrix transform, linear
interpolation of input values may equivalently be performed before
these operations.
This observation allows stages 1..4 to be combined into a single
matrix transformation.
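
The equivalence stated in this note follows from the fact that an affine transform commutes with linear interpolation.
A small Python check, assuming 8-bit narrow-range expansion as the affine step and using arbitrary made-up texel values, might look like this:

```python
def expand_narrow_8bit(texel):
    """Affine range expansion of an 8-bit narrow-range (Y', Cb, Cr) texel."""
    y, cb, cr = texel
    return ((y - 16) / 219.0, (cb - 128) / 224.0, (cr - 128) / 224.0)


def lerp(a, b, t):
    """Linear interpolation between two tuples of channel values."""
    return tuple((1.0 - t) * u + t * v for u, v in zip(a, b))


# Two arbitrary texels and an arbitrary interpolation weight.
texel_a = (40, 100, 200)
texel_b = (200, 180, 90)
weight = 0.25

expand_then_filter = lerp(expand_narrow_8bit(texel_a),
                          expand_narrow_8bit(texel_b), weight)
filter_then_expand = expand_narrow_8bit(lerp(texel_a, texel_b, weight))

# The two orders agree to within floating-point rounding.
print(max(abs(p - q) for p, q in zip(expand_then_filter, filter_then_expand)))
```

The same argument does not extend to the non-linear transfer function in stage 5, which is why only the affine stages can be folded together in this way.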
<<<
Large areas of constant color will be correctly converted by this
approximation.
However, there are two sources of errors near color boundaries:
. Interpolation takes place on values with a non-linear representation;
the repercussions of this are discussed in <<TRANSFER_CONVERSION>>,
but can introduce both intensity and color shifts.
Note that applying a non-linear transfer function as part of filtering
does not improve accuracy for color models other than
_R′G′B′_ since the non-linear additive values have been
transformed as part of the color model representation.
. When chroma reconstruction is bilinear and the final sample operation
is bilinear, the interpolation operation now only accesses a maximum of
four chroma samples, rather than up to nine for the precise series
of operations.
This has the potential to introduce a degree of aliasing in the
output.
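
The first source of error can be illustrated with a single channel.
The Python sketch below blends black and white equally, once on the non-linear values and once on linear light; it assumes the inverse of the BT.709 OETF as the decoding function, and the names are invented for this example.

```python
def oetf_inverse_bt709(signal):
    """Inverse BT.709 OETF: non-linear signal -> linear light (assumed constants)."""
    if signal < 0.081:
        return signal / 4.5
    return ((signal + 0.099) / 1.099) ** (1.0 / 0.45)


black, white = 0.0, 1.0  # non-linear channel values at a hard edge

# Approximate path: filter the non-linear values, then convert to linear light.
approximate = oetf_inverse_bt709(0.5 * (black + white))

# Accurate path: convert each value to linear light, then filter.
accurate = 0.5 * (oetf_inverse_bt709(black) + oetf_inverse_bt709(white))

print(f"approximate {approximate:.4f}, accurate {accurate:.4f}")
```

In this sketch the approximate path yields noticeably less light than the accurate one, so filtered edges appear darker than they should.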
This approximation produces identical results to the more explicit
sequence of operations in two cases:
. If chroma reconstruction uses nearest-neighbor replication and the
sampling operation is also a nearest-neighbor operation rather than
a linear interpolation.
. If the sampling operation is a nearest-neighbor operation and
chroma reconstruction uses linear interpolation, _if_ the sample
coordinate position is adjusted to the nearest luma sample location.
As another example, the conversion from BT.709-encoded
_Y′C~B~C~R~_ to sRGB _R′G′B′_ may be considered
to be a simple <<MODEL_YUV,model conversion>> (to
<<PRIMARIES_BT709,BT.709>> _R′G′B′_ non-linear primaries
using the ``<<TRANSFER_ITU,ITU>>'' OETF), since sRGB shares the BT.709
color primaries and is defined as a complementary <<TRANSFER_SRGB,EOTF>>
intended to be combined with BT.709's OETF.
This interpretation imposes a latexmath:[$\gamma \approx$] 1.1
OOTF.
Matching the OOTF of a
<<TRANSFER_ITU,BT.709>>-<<TRANSFER_BT1886,BT.1886>> system,
for which latexmath:[$\gamma \approx$] 1.2, implies using the
<<TRANSFER_BT1886,BT.1886>> EOTF to convert to linear light,
then the <<TRANSFER_SRGB,sRGB>> EOTF^ -1^ to convert back
to sRGB non-linear space.
Encoding linear scene light with linear OOTF means applying
the <<TRANSFER_ITU,BT.709>> OETF^ -1^; if the sRGB
_R′G′B′_ target is itself intended to represent
a linear OOTF, then the {_R′~sRGB~_, _G′~sRGB~_,
_B′~sRGB~_} should be calculated as:
[latexmath]
+++++
$$\{\mathit{R}'_\mathit{sRGB},\mathit{G}'_\mathit{sRGB},\mathit{B}'_\mathit{sRGB}\} =
\textrm{EOTF}^{-1}_{sRGB}(\textrm{OETF}^{-1}_{\mathit{BT}.709}
(\{\mathit{R}'_{\mathit{BT}.709},\mathit{G}'_{\mathit{BT}.709},\mathit{B}'_{\mathit{BT}.709}\}))$$
+++++
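
A per-channel evaluation of this formula can be sketched in Python.
The BT.709 OETF^ -1^ and sRGB EOTF^ -1^ constants used here are assumptions taken from the respective standards rather than from this section, and the function names are hypothetical.

```python
def oetf_inverse_bt709(signal):
    """Inverse BT.709 OETF: non-linear BT.709 value -> linear light."""
    if signal < 0.081:
        return signal / 4.5
    return ((signal + 0.099) / 1.099) ** (1.0 / 0.45)


def eotf_inverse_srgb(linear):
    """Inverse sRGB EOTF: linear light -> non-linear sRGB value."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1.0 / 2.4) - 0.055


def bt709_to_srgb_linear_ootf(rgb_bt709):
    """Apply EOTF^-1_sRGB(OETF^-1_BT.709(v)) to each non-linear channel."""
    return tuple(eotf_inverse_srgb(oetf_inverse_bt709(v)) for v in rgb_bt709)


print(bt709_to_srgb_linear_ootf((0.0, 0.5, 1.0)))
```

No primary conversion is needed in this sketch because sRGB shares the BT.709 color primaries; only the transfer functions differ.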