doe300 edited this page Dec 18, 2019 · 5 revisions

NOTE: Images are NOT supported; this page simply lists possible ways to access the different OpenCL image types!

OpenCL supported image types

List of all OpenCL 1.2 supported channel types and their allowed channel order types (OpenCL 1.2 specification, tables 5.6 and 5.7).

Component width CL_R CL_Rx CL_A CL_INTENSITY CL_LUMINANCE CL_RG CL_RGx CL_RA CL_RGB CL_RGBx CL_RGBA CL_ARGB CL_BGRA
CL_SNORM_INT8 1 Byte
CL_SNORM_INT16 2 Bytes
CL_UNORM_INT8 1 Byte ✓ (*)
CL_UNORM_INT16 2 Bytes ✓ (*)
CL_UNORM_SHORT_565 2 Bytes
CL_UNORM_SHORT_555 2 Bytes
CL_UNORM_INT_101010 4 Bytes
CL_SIGNED_INT8 1 Byte ✓ (*)
CL_SIGNED_INT16 2 Bytes ✓ (*)
CL_SIGNED_INT32 4 Bytes ✓ (*)
CL_UNSIGNED_INT8 1 Byte ✓ (*)
CL_UNSIGNED_INT16 2 Bytes ✓ (*)
CL_UNSIGNED_INT32 4 Bytes ✓ (*)
CL_HALF_FLOAT 2 Bytes ✓ (*)
CL_FLOAT 4 Bytes ✓ (*)

Entries marked (*) are required to be supported (OpenCL 1.2 specification, table 5.8).

Channel order Components Storage Resulting components
CL_R 1 r (r, 0.0, 0.0, 1.0)
CL_Rx 1 r (r, 0.0, 0.0, 1.0)
CL_A 1 a (0.0, 0.0, 0.0, a)
CL_RG 2 r, g (r, g, 0.0, 1.0)
CL_RGx 2 r, g (r, g, 0.0, 1.0)
CL_RA 2 r, a (r, 0.0, 0.0, a)
CL_RGB 3 r, g, b (r, g, b, 1.0)
CL_RGBx 3 r, g, b (r, g, b, 1.0)
CL_RGBA 4 r, g, b, a (r, g, b, a)
CL_BGRA 4 b, g, r, a (r, g, b, a)
CL_ARGB 4 a, r, g, b (r, g, b, a)
CL_INTENSITY 1 I (I, I, I, I)
CL_LUMINANCE 1 L (L, L, L, 1.0)
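
The expansion in the table above can be sketched as a small lookup. This is only an illustration for floating-point or normalized channel types; the names CHANNEL_ORDERS and expand are hypothetical and not part of any OpenCL API.

```python
# Hypothetical helper: expands the stored channels of one pixel into the
# 4-component value a kernel would see, following the table above.
CHANNEL_ORDERS = {
    # channel order: (stored channels in memory order, resulting components)
    "CL_R":         ("r",    ("r", 0.0, 0.0, 1.0)),
    "CL_Rx":        ("r",    ("r", 0.0, 0.0, 1.0)),
    "CL_A":         ("a",    (0.0, 0.0, 0.0, "a")),
    "CL_RG":        ("rg",   ("r", "g", 0.0, 1.0)),
    "CL_RGx":       ("rg",   ("r", "g", 0.0, 1.0)),
    "CL_RA":        ("ra",   ("r", 0.0, 0.0, "a")),
    "CL_RGB":       ("rgb",  ("r", "g", "b", 1.0)),
    "CL_RGBx":      ("rgb",  ("r", "g", "b", 1.0)),
    "CL_RGBA":      ("rgba", ("r", "g", "b", "a")),
    "CL_BGRA":      ("bgra", ("r", "g", "b", "a")),
    "CL_ARGB":      ("argb", ("r", "g", "b", "a")),
    "CL_INTENSITY": ("I",    ("I", "I", "I", "I")),
    "CL_LUMINANCE": ("L",    ("L", "L", "L", 1.0)),
}


def expand(order, stored):
    """Map the stored channel values to (x, y, z, w) for the given order."""
    names, pattern = CHANNEL_ORDERS[order]
    if len(stored) != len(names):
        raise ValueError("wrong number of stored channels for " + order)
    values = dict(zip(names, stored))
    # Strings in the pattern name a stored channel, numbers are constants.
    return tuple(values[c] if isinstance(c, str) else c for c in pattern)
```

For example, expand("CL_RA", (0.25, 0.5)) fills the missing green and blue components with 0.0 and keeps the stored alpha.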

Intel extension for packed YUV images

This extension enables support for packed YUV image channel orders. At least the channel-order YUYV could be supported by a built-in VideoCore IV texture-type. The only channel-type supported for YUYV images is CL_UNORM_INT8. The border-color for YUV-images is (0.0f, 0.0f, 0.0f, 1.0f), same as e.g. CL_RGB. The pixel-data read is mapped to the components ( V, Y, U, 1.0 ).

VideoCore IV texture types

List of all texture types supported by the VideoCore IV hardware (Broadcom specification, table 18)

Type Components Storage Component width Read components
RGBA8888 4 r, g, b, a 1 Byte (r, g, b, a)
RGBX8888 3 r, g, b 1 Byte (r, g, b, 1.0)
RGBA4444 4 r, g, b, a 4 Bits (r, g, b, a)
RGBA5551 4 r, g, b, a 5/1 Bits (r, g, b, a)
RGB565 3 r, g, b 5/6 Bits (r, g, b, 1.0)
LUMINANCE 1 L 1 Byte (L, L, L, 1.0)
ALPHA 1 a 1 Byte (0, 0, 0, a)
LUMALPHA 2 L, a 1 Byte (L, L, L, a)
ETC1 ? ? ? ?
S16F 1 s 2 Bytes (s)
S8 1 s 1 Byte (s)
S16 1 s 2 Bytes (s)
BW1 1 bw 1 Bit ?
A4 1 a 4 Bits ?
A1 1 a 1 Bit ?
RGBA64 4 r, g, b, a 2 Bytes (r, g, b, a)
RGBA32R (*) 4 r, g, b, a 1 Byte (r, g, b, a)
YUYV422R (*) 4 y, u, y, v 1 Byte (r, g, b, a) or (y, u, y', v) ?

Entries marked (*) are in "raster format" (a rectangular grid of pixels), as used e.g. by videos and images (formats like bmp, gif). All other formats are in T-format (see below).

YUYV422 is used in video compression (e.g. in PAL for TV, hence the raster format). For the memory layout see here and here.
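
As a rough illustration of the packed layout, the following sketch splits one raster row of YUYV data into per-pixel values. It assumes the common packing in which two horizontally adjacent pixels share one U and one V sample and are stored as the four bytes y0, u, y1, v. The function name unpack_yuyv422 is hypothetical, and whether the TMU returns the data in this form is not confirmed by this page.

```python
def unpack_yuyv422(row, width):
    """Split one row of packed YUYV bytes into a list of (y, u, v) tuples.

    Assumes 4 bytes (y0, u, y1, v) per pair of pixels, so width must be even.
    """
    if width % 2 != 0:
        raise ValueError("YUYV rows hold pixels in pairs, width must be even")
    pixels = []
    for i in range(0, width * 2, 4):
        y0, u, y1, v = row[i:i + 4]
        pixels.append((y0, u, v))  # left pixel of the pair
        pixels.append((y1, u, v))  # right pixel shares u and v
    return pixels
```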

Type mapping

Draft of a mapping from OpenCL channel order and channel type to VideoCore IV texture types.

CL_R CL_Rx CL_A CL_INTENSITY CL_LUMINANCE CL_RG CL_RGx CL_RA CL_RGB CL_RGBx CL_RGBA CL_ARGB CL_BGRA CL_YUYV_INTEL
CL_SNORM_INT8 S8 S8 ALPHA? LUMINANCE? LUMINANCE? RGBA8888(R)? RGBA8888(R)? RGBA8888(R)?
CL_SNORM_INT16 S16 S16 S16 S16 S16
CL_UNORM_INT8 S8 S8 ALPHA? LUMINANCE? LUMINANCE RGBA8888(R) (*) RGBA8888(R)? RGBA8888(R)? YUYV422R ?
CL_UNORM_INT16 S16 S16 S16 S16 S16 (*)
CL_UNORM_SHORT_565 RGB565 RGB565
CL_UNORM_SHORT_555 RGBA5551? RGBA5551?
CL_UNORM_INT_101010
CL_SIGNED_INT8 S8 S8 S8 RGBA8888(R)? (*) RGBA8888(R)? RGBA8888(R)?
CL_SIGNED_INT16 S16 S16 S16 (*)
CL_SIGNED_INT32 (*)
CL_UNSIGNED_INT8 S8 S8 S8 RGBA8888(R) (*) RGBA8888(R) RGBA8888(R)
CL_UNSIGNED_INT16 S16 S16 S16 (*)
CL_UNSIGNED_INT32 (*)
CL_HALF_FLOAT S16F S16F S16F S16F S16F RGBA64 (*)
CL_FLOAT (*)

Entries marked (*) are required to be supported (OpenCL 1.2 specification, table 5.8).

Texture formats directly matching the OpenCL image format (e.g. CL_RGBA, CL_UNORM_INT8 and RGBA8888(R)) only need to be unpacked into their components. For any other format (e.g. CL_HALF_FLOAT, CL_LUMINANCE and S16F) the constant components need to be set after unpacking.
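
A minimal sketch of the second case, for CL_LUMINANCE with CL_HALF_FLOAT read as S16F: the single stored value is decoded and the remaining components are filled in afterwards. The helper name read_luminance_half is hypothetical, and little-endian storage of the 16-bit value is an assumption.

```python
import struct


def read_luminance_half(raw):
    """Decode one 2-byte half-float sample and expand it to (L, L, L, 1.0).

    Assumes the sample is stored little-endian.
    """
    (lum,) = struct.unpack("<e", raw)
    # The texture only delivers L; the constant alpha is set after unpacking.
    return (lum, lum, lum, 1.0)
```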

NOTE: Alternatively, images could always be stored as RGBA32R (at least for 8 bits per channel), which would make reading and writing on both host and device side much easier!

Texture formats

VideoCore IV supports T and LT-format (as well as raster-format for RGBA32R and YUYV422R)

A micro-tile is a "rectangular image block with a fixed size of [...] 64 bytes". Micro-tiles are stored "in simple raster order". (Broadcom specification, page 105)

Pixel size micro-tile size
8 Bytes 2 x 4
4 Bytes 4 x 4
2 Bytes 8 x 4
1 Byte 8 x 8
4 Bits 16 x 8
1 Bit 32 x 16

Addressing offset within micro-tile for 4-Byte pixels:

C D E F
8 9 A B
4 5 6 7
0 1 2 3
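
The table and the addressing example above can be checked with a short sketch. It assumes that pixels inside a micro-tile are numbered in raster order starting at the bottom-left corner, as in the 4-Byte example, and that the same rule holds for the other pixel sizes. The names UTILE_DIMS and utile_pixel_index are hypothetical.

```python
UTILE_BYTES = 64

# pixel size in bits -> (width, height) of a micro-tile in pixels
UTILE_DIMS = {64: (2, 4), 32: (4, 4), 16: (8, 4), 8: (8, 8), 4: (16, 8), 1: (32, 16)}


def utile_pixel_index(x, y, bits_per_pixel=32):
    """Index of pixel (x, y) inside its micro-tile, with (0, 0) at the bottom-left."""
    width, height = UTILE_DIMS[bits_per_pixel]
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("coordinates outside of the micro-tile")
    return y * width + x
```

Every entry of UTILE_DIMS covers exactly 64 bytes, and for 32-bit pixels utile_pixel_index(3, 3) yields 15, the F in the top-right corner of the diagram.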

T-format

"T-format is based around 4 Kbyte tiles of 2D image data, formed from 1 Kbyte sub-tiles of data. As an example, for 32bpp pixel mode, 1 Kbyte equates to a block of 16 × 16 image pixels. A 4 Kbyte tile therefore contains 32 × 32 pixels." The image needs to be padded to a whole number of 4 KB tiles in both width and height. (Broadcom specification, pages 105+)

Micro-tiles are stored in "normal" raster order within the sub-tiles (4 x 4 micro-tiles per sub-tile), resulting in the same addressing offsets as in the example above for pixels within a micro-tile with 32-bit pixels. Sub-tiles are stored in circular order within a 4k tile. The order depends on the row of the 4k tile: for even rows the sub-tiles are ordered bottom-left, top-left, top-right, bottom-right; for odd rows top-right, bottom-right, bottom-left, top-left:

even row
1 2
0 3
odd row
3 0
2 1

The order of the 4k tiles themselves is left-to-right for even rows and right-to-left for odd rows (counting rows from zero, starting at the bottom):

k l m n o
j i h g f
a b c d e
9 8 7 6 5
0 1 2 3 4
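
Putting the three levels together, the byte offset of a pixel in a T-format image can be sketched as follows for 32bpp pixels. This is a sketch under assumptions, not a verified implementation: coordinates are counted from the bottom-left corner as in the diagrams, tile rows are counted from zero (so the bottom row is even and runs left-to-right, as in the tile diagram), and the sub-tile order follows the circular order given in the text. The function name t_format_offset is hypothetical.

```python
UTILE_W = UTILE_H = 4  # 32bpp micro-tile size in pixels
SUBTILE = 16           # sub-tile edge in pixels (4 x 4 micro-tiles, 1 KB)
TILE = 32              # tile edge in pixels (2 x 2 sub-tiles, 4 KB)

# (sub-tile column, sub-tile row) -> index inside the 4 KB tile, row 0 = bottom
EVEN_SUBTILE = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
ODD_SUBTILE = {(1, 1): 0, (1, 0): 1, (0, 0): 2, (0, 1): 3}


def t_format_offset(x, y, width_in_tiles):
    """Byte offset of the 32-bit pixel (x, y) in a T-format image."""
    tile_x, tile_y = x // TILE, y // TILE
    odd = tile_y % 2 == 1
    if odd:
        # odd tile rows are stored right-to-left
        tile_x = width_in_tiles - 1 - tile_x
    tile_index = tile_y * width_in_tiles + tile_x

    subtile_map = ODD_SUBTILE if odd else EVEN_SUBTILE
    subtile = subtile_map[((x % TILE) // SUBTILE, (y % TILE) // SUBTILE)]

    # micro-tiles in raster order within the sub-tile, 4 per row
    utile = ((y % SUBTILE) // UTILE_H) * 4 + (x % SUBTILE) // UTILE_W
    # pixels in raster order within the micro-tile
    pixel = (y % UTILE_H) * UTILE_W + (x % UTILE_W)

    return tile_index * 4096 + subtile * 1024 + utile * 64 + pixel * 4
```

With a width of 5 tiles, the pixel (0, 32) lies in the leftmost tile of the second tile row, which is stored as tile 9, matching the tile diagram above.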

See also Mesa VC4 driver here and here.

LT-format

"Linear-tile format is typically used for small textures that are smaller than a full T-format 4K tile, to avoid wasting memory in padding the image out to be a multiple of tiles in size. This format is also micro-tile based but simply stores micro-tiles in a standard raster order." (Broadcom specification, page 107)

The LT-format is automatically selected for smaller textures. (Broadcom specification, page 39) "The hardware assumes a level is in T-format unless either the width or height for the level is less than one T-format tile. In this case use the hardware assumes the level is stored in LT-format." (Broadcom specification, page 40)
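
The selection rule and the LT addressing can be sketched as below. The sketch assumes that a T-format tile spans 8 x 8 micro-tiles (4 x 4 micro-tiles per sub-tile, 2 x 2 sub-tiles per tile), that rows of micro-tiles are padded up to whole micro-tiles, and that coordinates start at the bottom-left corner. All names are hypothetical, and the offset function only covers 32-bit pixels.

```python
# pixel size in bits -> (width, height) of a micro-tile in pixels
UTILE_DIMS = {64: (2, 4), 32: (4, 4), 16: (8, 4), 8: (8, 8), 4: (16, 8), 1: (32, 16)}


def tile_dims(bits_per_pixel):
    """Size in pixels of one 4 KB T-format tile (8 x 8 micro-tiles)."""
    width, height = UTILE_DIMS[bits_per_pixel]
    return width * 8, height * 8


def uses_lt_format(width, height, bits_per_pixel=32):
    """True if a level of this size is smaller than one T-format tile."""
    tile_w, tile_h = tile_dims(bits_per_pixel)
    return width < tile_w or height < tile_h


def lt_format_offset(x, y, width):
    """Byte offset of the 32-bit pixel (x, y) in an LT-format image."""
    utile_w, utile_h = UTILE_DIMS[32]
    utiles_per_row = -(-width // utile_w)  # round up to whole micro-tiles
    utile = (y // utile_h) * utiles_per_row + x // utile_w
    pixel = (y % utile_h) * utile_w + (x % utile_w)
    return utile * 64 + pixel * 4
```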

Raster format

The texture-types RGBA32R and YUYV422R are in raster format, meaning all pixels of a row are stored in consecutive memory, followed by the next row and so on. This makes host-side access easier: the pixels form a simple 1D/2D/3D array in memory, no custom order is required and the data can be copied with a plain memcpy.

Though not officially supported as texture-format by the VideoCore IV architecture, the general 32-bit TMU read could be used to read raster-images of arbitrary size. For this to work, the address would need to be calculated from the image width and height.
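
The address calculation for such a raster image is plain row-major indexing. A minimal sketch, assuming tightly packed rows without padding; the function name raster_address is hypothetical:

```python
def raster_address(base, x, y, width, height, bytes_per_pixel=4):
    """Address of pixel (x, y) in a tightly packed raster image."""
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("coordinates outside of the image")
    return base + (y * width + x) * bytes_per_pixel
```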

Another trick to read 4 components of 32 bits each (e.g. for CL_RGBA with CL_FLOAT) would be to set the image width to 4x the original width, read 4 pixels and combine them into one vector. This only works for original image widths of up to 512 pixels (since 4 * 512 = 2048). Similarly, for 2 components of 32 bits each or 4 components of 16 bits each, an image of twice the width could be imitated, reading 2 pixels and rearranging them into the correct components. Interpolation is not supported by this kind of image access.
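
The widening trick can be simulated on the host to see how the coordinates map. The sketch below treats a CL_RGBA/CL_FLOAT image as an image of single 32-bit words that is 4 times as wide and reads 4 neighbouring words per pixel. Little-endian floats, tightly packed rows and the name read_rgba_float are assumptions; on the device the 4 reads would be TMU reads.

```python
import struct

MAX_TMU_WIDTH = 2048


def read_rgba_float(image, x, y, width):
    """Read pixel (x, y) of an RGBA float image via 4 reads of 32 bits each."""
    wide_width = 4 * width
    if wide_width > MAX_TMU_WIDTH:
        raise ValueError("original width must not exceed 512 pixels")

    def read_word(wide_x, wide_y):
        # one 32-bit "pixel" of the imitated, 4x wider image
        offset = (wide_y * wide_width + wide_x) * 4
        return struct.unpack_from("<f", image, offset)[0]

    return tuple(read_word(4 * x + component, y) for component in range(4))
```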

Image attribute storage

When using the TMU for texture-lookup, the TMU automatically loads 2 or 3 UNIFORMs from the current UNIFORM_POINTER, containing the configuration for the texture to be read. Whether the third UNIFORM is read depends on the settings in the first two.

The first UNIFORM contains the following useful information:

  • The texture base pointer (the address of the image-data) in multiples of 4 KB
  • The lower 4 bits of the texture data type (see VideoCore IV texture types)

The second UNIFORM contains the following useful information:

  • The high bit of the texture data type
  • The image-height modulo 2048
  • The image-width modulo 2048
  • The magnification filter: bilinear or nearest pixel
  • The minification filter: bilinear or nearest pixel
  • The vertical wrap mode: repeat, clamp, mirror, use border color
  • The horizontal wrap mode: repeat, clamp, mirror, use border color

The third UNIFORM contains the following useful information:

  • Type: UNIFORM contains cube-map stride, child-image offset or child-image size information
  • If cube-map stride: The cube-map stride (slice-pitch?) in multiples of 4KB
  • If child-image offset: The horizontal and vertical offset (in pixels) of the child-image to query
  • If child-image size: The width and height of the child-image to query
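
How the texture type and the image size are split across the first two UNIFORMs can be sketched without committing to exact bit positions, which are defined in the Broadcom specification and not reproduced here. The sketch assumes that the numeric value of a texture type equals its position in the texture type table above (so the two raster formats are the only ones needing the high bit). The names TEXTURE_TYPES and tmu_fields are hypothetical.

```python
# Assumed numbering: position in the "VideoCore IV texture types" table.
TEXTURE_TYPES = [
    "RGBA8888", "RGBX8888", "RGBA4444", "RGBA5551", "RGB565", "LUMINANCE",
    "ALPHA", "LUMALPHA", "ETC1", "S16F", "S8", "S16", "BW1", "A4", "A1",
    "RGBA64", "RGBA32R", "YUYV422R",
]


def tmu_fields(texture_type, width, height):
    """Field values for the TMU configuration (not the packed UNIFORMs)."""
    index = TEXTURE_TYPES.index(texture_type)
    return {
        "type_low": index & 0xF,   # lower 4 bits, first UNIFORM
        "type_high": index >> 4,   # high bit, second UNIFORM
        "width": width % 2048,     # a width of 2048 is stored as 0
        "height": height % 2048,
    }
```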

The image attributes (for reading image-data via TMU as well as querying image-info in the corresponding OpenCL C standard-library functions) are stored in a part of the global data as follows:

  1. First UNIFORM for TMU configuration (32 bit)
  2. Second UNIFORM for TMU configuration (32 bit)
  3. Third UNIFORM for TMU configuration, empty if not used (32 bit)
  4. OpenCL channel-order and channel-type information for image query functions (2 x 16 bit)
  5. TODO Additional info for image-arrays, 3D images?
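
The resulting record for the first four entries could look like the following sketch. The field order within the 2 x 16 bit entry (channel order first, then channel type), the little-endian encoding and all names are assumptions for illustration. The two constants are the values of CL_RGBA and CL_UNORM_INT8 from the OpenCL headers.

```python
import struct

# 3 x 32-bit TMU configuration UNIFORMs, then 2 x 16-bit OpenCL image format
IMAGE_ATTRIBUTES = struct.Struct("<IIIHH")

CL_RGBA = 0x10B5
CL_UNORM_INT8 = 0x10D2


def pack_image_attributes(uniform0, uniform1, uniform2, channel_order, channel_type):
    """Pack the image attributes; uniform2 is 0 if the third UNIFORM is unused."""
    return IMAGE_ATTRIBUTES.pack(uniform0, uniform1, uniform2, channel_order, channel_type)


def unpack_image_attributes(raw):
    """Inverse of pack_image_attributes, returns the five fields as a tuple."""
    return IMAGE_ATTRIBUTES.unpack(raw)
```

Each record takes 16 bytes, so the channel-order and channel-type values used by the image query functions sit directly behind the three TMU words.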

The following steps are executed to read image-data from given coordinates:

  1. The coordinate parameter is normalized within the image area, since the TMU accepts normalized image coordinates
  2. The second UNIFORM is modified in-place to represent the wrap- and filter-modes of the (default) sampler used
  3. The UNIFORM_POINTER is set to the address of the first UNIFORM containing the TMU configuration for this image
  4. The values read from the TMU are converted to the channel-type and -order of the query-function
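
Step 1 can be sketched as follows. The exact formula used by the implementation is not given on this page. The sketch assumes the usual convention that unnormalized floating-point coordinates are divided by the image size and that integer coordinates address the pixel center. Both function names are hypothetical.

```python
def pixel_center(ix, iy):
    """Unnormalized float coordinates of the center of integer pixel (ix, iy)."""
    return ix + 0.5, iy + 0.5


def normalize_coords(x, y, width, height):
    """Convert unnormalized float coordinates to the range [0, 1)."""
    return x / width, y / height
```

For a 4 x 2 image, the center of pixel (0, 0) ends up at the normalized coordinates (0.125, 0.25).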