Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-release pass #123

Closed
13 tasks done
lexaknyazev opened this issue May 20, 2020 · 35 comments
Closed
13 tasks done

Pre-release pass #123

lexaknyazev opened this issue May 20, 2020 · 35 comments

Comments

@lexaknyazev
Copy link
Member

lexaknyazev commented May 20, 2020

© 2018 The Khronos Group Inc.

  • Update copyright year?

textures for OpenGL®, OpenGL ES™️, Vulkan® and WebGL™️ applications

  • Maybe change to GPU applications?

The canonical version of the specification is available in the Khronos Registry

  • I don't think that KTX was ever published in the Khronos Registry. Are we going to put it there?

sgdByteLength > 0 ? align(8)

  • Such syntax is a bit unusual (as it lacks the second branch in the ternary operator); also we should add a dummy field here for consistency with other alignments. Maybe change to:
    if (sgdByteLength > 0)
        align(8)      sgdPadding
    

After inflation from supercompression <...>, levelImages looks like this:
align( lcm(texel block size, 4) ) mipPadding

  • Assuming that each mip level could be inflated independently, there's nothing to align in a case when supercompression is used. Does it imply that mipPadding would be empty?
  • Harmonize names of the "virtual" variables (preferably to underscores): row_of_blocks, format-specific-number-of-bytes, texel block size (several places).

OpenGL has a right-handed coordinate system. Therefore when using a skybox cube’s coordinates as the uvw coordinates for sampling a cubemap those coordinates must first be transformed to a left-handed system by multiplying the Z coordinate by -1.
While Vulkan has a left-handed coordinate system, +Y is down, therefore skybox coordinates must be rotated 180 degrees around the X axis before they are used to sample a cubemap.

  • Does it mean that "by-default" KTX2 files are aligned with with neither OpenGL nor Vulkan? Other APIs? Could the KTX2 spec define exact directions for XYZ wrt cubemaps to ensure ecosystem consistency (we'd need that for glTF anyway)?
  • Does Zstandard have any global data? If no, we should remove "T.B.D." from the corresponding column.

The uncompressed byte length is dependent on the which transcode target format is selected.

  • Maybe change to "The effective uncompressed byte length ..." for disambiguation.
  • Upon readiness, could we integrate BasisLZ bitsream into the KTX2 spec? I don't think we can refer to an arbitrary external resource for it.
  • We should add an informative flowchart about transcode target format selection based on the DFD and used platform.

Khronos Data format specification

  • Khronos Data Format Specification

This version of the KTX Specification is published and copyrighted by Khronos®, but is not a Khronos ratified specification.

  • Update for ratification.
@MarkCallow
Copy link
Contributor

MarkCallow commented May 20, 2020

I don't think that KTX was ever published in the Khronos Registry. Are we going to put it there?

That's the intention.

Assuming that each mip level could be inflated independently, there's nothing to align in a case when supercompression is used. Does it imply that mipPadding would be empty?

I'm not sure I'm parsing your question correctly. The placement of mipPadding and wording in the spec. shows it is only used for non-supercompressed data. That takes care of one way to parse the question. If the question rather is "is mipPadding needed when you independently inflate a supercompressed level?" then the answer is yes. The purpose is to match GPU API alignment requirements and thus enable use of mmapping.

Does it mean that "by-default" KTX2 files are aligned with with neither OpenGL nor Vulkan? Other APIs?

The original OpenGL cube mapping extension explicitly states that the coordinate system within the cube is left-handed, i.e. not aligned with OpenGL. Perhaps my interpretation of the equations for direction to face/UV mapping is wrong but they appear to constitute a left-handed coordinate system with +Y up and require that the images have a top down orientation. Vulkan has exactly the same equations so yes, the cube coordinate system is aligned with neither.

Could the KTX2 spec define exact directions for XYZ wrt cubemaps to ensure ecosystem consistency (we'd need that for glTF anyway)?

That is exactly what I am trying to achieve.

Upon readiness, could we integrate BasisLZ bitsream into the KTX2 spec? I don't think we can refer to an arbitrary external resource for it.

It depends how big it is. Let's discuss when we see it.

We should add an informative flowchart about transcode target format selection based on the DFD and used platform.

Would you care to take a stab at it? How best to create a flowchart? Is there some Asciidoctor way or must we create an image for inclusion?

I'll take care of the language fixes you've pointed out.

MarkCallow added a commit that referenced this issue May 21, 2020
@MarkCallow
Copy link
Contributor

I've fixed the typos and checked off the items above. Hope my edits to the issue stick.

@lexaknyazev
Copy link
Member Author

The placement of mipPadding and wording in the spec. shows it is only used for non-supercompressed data.

OK, looks like I was confused by the text before levelImages structure. The definition of mipPadding is clear that it's required only for non-supercompressed data.

If the question rather is "is mipPadding needed when you independently inflate a supercompressed level?" then the answer is yes.

Let's say there's a KTX2 file without supercompression applied. Its layout would be:

<HEADER>
mipPadding
levelImagesData[0]
mipPadding
levelImagesData[1]
mipPadding
levelImagesData[2]
... 

Let's apply zstd to it. The layout would change to:

<HEADER>
zstd(levelImagesData[0])
zstd(levelImagesData[1])
zstd(levelImagesData[2])
... 

Now, to mmap inflated data at runtime, an application would need to manually add required padding between levels, right?

@MarkCallow
Copy link
Contributor

Now, to mmap inflated data at runtime, an application would need to manually add required padding between levels, right?

In libktx, if I am inflating the entire texture I do manually add the padding so levels are aligned correctly. I think this detail is outside the scope of the spec., perhaps an informative note or tip?

Does it mean that "by-default" KTX2 files are aligned with with neither OpenGL nor Vulkan?

Please check the equations in the specs. I'd like confirmation of my interpretation. While I'm confident that they define a left-handed system and that images need to have a top-down orientation, I am less confident about +Y being up. I may have been influenced by all the examples I've looked at having the sky as the +Y image. For example all the cube maps at http://www.humus.name.

@lexaknyazev
Copy link
Member Author

I think this detail is outside the scope of the spec., perhaps an informative note or tip?

I think we should emphasize that supercompressed payload does not include padding that would be present otherwise.

Please check the equations in the specs.

Will do.

@MarkCallow
Copy link
Contributor

@lexaknyazev I've been looking at how to create the flowchart you suggested. The best tool among those supported by asciidoctor-diagram is Mermaid. Unfortunately it has a dependency on Puppeteer and headless Chromium which is huge (> 116MB). I think that is too big. Do you have any suggestions?

@lexaknyazev
Copy link
Member Author

lexaknyazev commented Jun 8, 2020

BasisLZ/ETC1S supports inter-frame (video) encoding for 2D slices. If isIFrame is set the image (frame) is an I frame. That is, it does not refer to the previous image. If unset, the image is a P frame. If the file is not an animation sequence it must be unset. See Section 4.4, “Animation Sequence” for details.

  • Strictly speaking, all texture slices in non-animated assets are I-Frames by definition (as they are self-contained). It would be much more consistent to call this flag isPFrame and reverse its meaning: unset for all non-animated slices and I-Frames; set for P-Frames.

Valid animation files must have the combination of parameters outlined in Section 4.1, “Texture Type” for 2D Array textures in addition to KTXanimData metadata.

  • Is there any reason why the spec does not allow making other array types animated?

@lexaknyazev
Copy link
Member Author

lexaknyazev commented Jun 8, 2020

  • In Section 3.8, the old "BasisU" name is used.

@MarkCallow
Copy link
Contributor

Strictly speaking, all texture slices in non-animated assets are I-Frames by definition (as they are self-contained). It would be much more consistent to call this flag isPFrame and reverse its meaning: unset for all non-animated slices and I-Frames; set for P-Frames.

In a .basis file slice_desc the equivalent cSliceDescFlagsFrameIsIFrame only has meaning for video files (as indicated by basis_texture_type being cBASISTexTypeVideoFrames). With the current encoder implementation only the first frame is an i-frame, the rest are always p-frames. Even if the encoder is modified in future to create more i-frames I would still expect there to be many more p-frames in the sequence.

Like .basis, KTX says the flag is only used for animation sequences. So I think the current sense of the flag is correct. Only if there was some reason to set the flag for non-animated array textures would there be reason to change. I can't see any purpose for the flag in a non-animated array.

Is there any reason why the spec does not allow making other array types animated?

I can't see 1D array textures being an issue. I'm consulting Rich about cube map array textures.

@lexaknyazev
Copy link
Member Author

Like .basis, KTX says the flag is only used for animation sequences. So I think the current sense of the flag is correct.

An I-Frame is just a regular slice that could be used in any texture type, while a P-Frame could exist only in a video sequence. My point is that the flag is technically present in all slices and is always unset in non-animated assets. So an extra rule is needed to "define" this flag only for animated assets.

FWIW, the transcoder does not even check for this flag (currently) and assumes that only the first frame is an I-Frame while all subsequent frames are P-Frames.

I can't see 1D array textures being an issue. I'm consulting Rich about cube map array textures.

1D arrays have pixel height of 0, while ETC1S blocks are 4x4. We may need to further clarify KTX dimensions for supercompressed textures (a new metadata entry?).

For cube map arrays, the transcoder has to either maintain 6 parallel video streams or to flatten a cube map array to a 2D array (the latter may be sub-optimal).

@MarkCallow
Copy link
Contributor

1D arrays have pixel height of 0

Only in KTX, so we can distinguish 1D textures from 2D textures with a height of 1. In reality they have a height of 1. the only reason for needing to distinguish is to select the desired OpenGL texture target.

So an extra rule is needed to "define" this flag only for animated assets.

There is such a rule. Are you saying you want to remove that rule?

1D arrays have pixel height of 0, while ETC1S blocks are 4x4. We may need to further clarify KTX dimensions for supercompressed textures (a new metadata entry?).

Yes, of course. I momentarily forgot. It's block compressed textures that are the issue whether or not they are supercompressed. I think there are some words somewhere about following restrictions laid out in the relevant KDFS section for the format. I think it is only a minority of formats where the pixel size has to be rounded to block size. Most of them allow exact pixel sizes with the remaining pixels in a the last block being ignored somehow.

@lexaknyazev
Copy link
Member Author

lexaknyazev commented Jun 8, 2020

So an extra rule is needed to "define" this flag only for animated assets.

There is such a rule. Are you saying you want to remove that rule?

In my opinion, having the flag set for P-Frames would simplify the definitions so that such a rule could be removed.

I think it is only a minority of formats where the pixel size has to be rounded to block size. Most of them allow exact pixel sizes with the remaining pixels in a the last block being ignored somehow.

  • The KTX spec currently requires textures of all block-compressed formats to be aligned with their block sizes, see Section 3.4. We should fully relax that requirement as it's an API restriction (not relevant anymore in most cases) rather than a file format issue.

@MarkCallow
Copy link
Contributor

In my opinion, having the flag set for P-Frames would simplify the definitions so that such a rule could be removed.

It doesn't change much. Here's the rewrite:

====== imageFlags
Flags giving information about an individual image. The following flag
is valid:
[source,c]
----
isPFrame = 0x02
----

BasisLZ/ETC1S  supports inter-frame (video) encoding for 2D slices.
If `isPFrame` is set the image (frame) is a P frame. That is, it
refers to the previous image of an _animation sequence_. All other
images are I frames.  Only animation sequences can have P frames.
See <<Animation Sequence>> for details.

I okay with this change so I have done it. The code impact is minimal. I kept the same value as the .basis cSliceDescFrameIsIFrame so I can just xor between them.

@MarkCallow
Copy link
Contributor

The KTX spec currently requires textures of all block-compressed formats to be aligned with their block sizes, see Section 3.4. We should fully relax that requirement as it's an API restriction (not relevant anymore in most cases) rather than a file format issue.

Do you remember why we added these requirements? I see Vulkan doesn't place any format-related restrictions on image extents not even for PVRTC1 so it seems like a good idea to relax these.

Do any of the OpenGL extensions place restrictions on the image width & height for CompressedTexImage* or TexStorage* functions? Should I just remove these 3 bullet items?

  • width and height being multiples of 4 for BCn and ETC1/ETC2/EAC formats;
  • width, height, and depth being multiples of the corresponding block size dimensions for ASTC formats;
  • various restrictions for PVRTC formats (see [PVRTC], [PVRTC1_OES], and [PVRTC2_OES]).

@MarkCallow
Copy link
Contributor

MarkCallow commented Jun 10, 2020

WEBGL_compressed_texture_s3tc has this restriction not found in OpenGL:

When level equals zero width and height must be a multiple of 4. When level is greater than 0 width and height must be 0, 1, 2 or a multiple of 4.

I am trying to find out why. But in view of this maybe we should leave the restrictions in place.

@lexaknyazev
Copy link
Member Author

I think that these restrictions should be removed anyway. As long as libktx (or an alternative implementation) is aware of compressed block sizes and an application is aware of runtime API limitations, the latter could figure out how to load such a texture. The file format design shouldn't be bound to the decades-old platform-specific issues.

The spec should still avoid any ambiguity wrt unaligned dimensions. For example, 5x5 BC1 should normatively take 4 blocks (filled 4x4, 4x1, 1x4, 1x1) with 39 pixels unused.

We may need to clarify how orientation metadata should be treated for unused pixels.

We may need to double-check if any special care is needed for PVRTC.

@MarkCallow
Copy link
Contributor

the latter could figure out how to load such a texture.

Any clever ideas on how to do this? The obvious rounding up to the next multiple of the block size will result in edge artifacts.

The spec should still avoid any ambiguity wrt unaligned dimensions. For example, 5x5 BC1 should normatively take 4 blocks (filled 4x4, 4x1, 1x4, 1x1) with 39 pixels unused.

Where does the 1x1 come into it? Do we need to do this. Don't the format specs already discuss how to handle partially filled blocks?

We may need to clarify how orientation metadata should be treated for unused pixels.

What do you mean?

We may need to double-check if any special care is needed for PVRTC.

Please do.

@lexaknyazev
Copy link
Member Author

lexaknyazev commented Jun 17, 2020

the latter could figure out how to load such a texture.

Any clever ideas on how to do this? The obvious rounding up to the next multiple of the block size will result in edge artifacts.

I don't think there's any universal solution. Textures are usually shipped within an application rather than on their own, so it's developer's responsibility to avoid unaligned BC1 on WebGL.

The spec should still avoid any ambiguity wrt unaligned dimensions. For example, 5x5 BC1 should normatively take 4 blocks (filled 4x4, 4x1, 1x4, 1x1) with 39 pixels unused.

Where does the 1x1 come into it?

image

Do we need to do this. Don't the format specs already discuss how to handle partially filled blocks?

The issue is with levelImages. We need some language that defines pixelWidth and friends for subsequent levels.

We may need to clarify how orientation metadata should be treated for unused pixels.

What do you mean?

In the 5x5 example above, where is the origin for lu orientation: (4, 4) or (7, 7)?

@MarkCallow
Copy link
Contributor

In the 5x5 example above, where is the origin for lu orientation: (4, 4) or (7, 7)?

The spec says - in the description of KTXorientation -

Texture data in a KTX file are arranged so that the first pixel in the data stream for each face and/or array element is closest to the origin of the texture coordinate system.

So the full block would be at the lu origin. Perhaps we can clarify by saying "first pixel or block" but I'm not sure it is necessary.

The issue is with levelImages. We need some language that defines pixelWidth and friends for subsequent levels.

Please propose something.

@MarkCallow
Copy link
Contributor

@lexaknyazev Please look at the cube map issue discussed earlier in this issue.

@MarkCallow
Copy link
Contributor

MarkCallow commented Aug 17, 2020

@lexaknyazev there are 2 items remaining to be checked off:

  • Defining exact directions for XYZ wrt cubemaps, assuming you aren't satisified with what's there now.
  • Adding an informative flowchart about transcode target format selection based on the DFD and used platform.

(I've checked these items in the discussion above and repeated them here so as to avoid having to repeatedly scroll through the long discussion.)

For the first, please review and make any suggestions for improvement.

For the second, I think you created an informative flowchart. Please add it to the spec.

@MarkCallow
Copy link
Contributor

@lexaknyazev do you still want to add the informative flowchart? If so, please submit a PR. Please review the cubemap orientation stuff and suggest any changes you think necessary. I want to close this issue.

@lexaknyazev
Copy link
Member Author

We have a separate document on transcode target selection now.

@lexaknyazev
Copy link
Member Author

Cubemaps

As prescribed by the Vulkan and OpenGL specifications, all faces must have a right-down orientation...

The tables given in OpenGL and Vulkan specs are a bit more subtle. They define origins for each face without explicitly orienting the whole cube or even stating its handedness.

Still, the "skybox" example is likely the easiest way to visualize it:

  1. Setup an axis-oriented cube in a left-handed coordinate system with +Y pointing up.
  2. [-X, -Z, +X, +Z] faces now have their local origins in their top-left corners if looking at them from the centre of the cube.
  3. Changing handedness of this space while still enforcing the rules given in the API tables would move all origins to the top-right corners.
  4. Note that this vizualization does not give any immediate hints about texture origins of +Y and -Y faces although they are still well-defined.

Anyway, these details are applicable only when samplerCube* (and alike) shader instructions are used. Some applications may prefer to access face data directly by index and 2D coords. Thankfully, the order of faces is the same everywhere.

KTX cannot make any assumptions about a coordinate system that a particular application may use during texel fetches from a cubemap. Moreover, although KTX uses top-left origin by default, it could be altered by KTXorientation metadata entry (alternatively the latter could be disallowed for cubemaps).

So, the two configurations are possible for applications that use samplerCube*:

  • Left-handed (s, t, r) inputs with per-face (0, 0) in top-left corners (let's make it default)
  • Right-handed (s, t, r) system with per-face (0, 0) in top-right corners (could be signalled with ld orientation metadata)

These could be normatively defined either by copying the Vulkan / OpenGL table here or by making a simple set of labeled images with per-face origins marked.

See the identical face selection tables and the equations for calculating (s, t) in section 15.5.3 ...

The correct section number is 16.5.3.

OpenGL has a right-handed coordinate system.

All OpenGL specs include the following statement: OpenGL does not force left- or right-handedness on any of its coordinates systems. KTX shouldn't suggest otherwise. The notion of OpenGL's object space being right-handed is more like a legacy convention rather than a fixed definition. If anything, it's possible to say that the clip space is left-handed, but with GL_ARB_clip_control extension (core since 4.5) even that is not always the case.

Therefore ... those coordinates must first be transformed to a left-handed system by multiplying the Z coordinate by -1.

This effect is described above. Applications that for whatever reason supply right-handed (s, t, r) values should expect per-face origins in their top-right corners.

Vulkan has a left-handed coordinate system ...

This is a much stronger statement than the Vulkan spec implies since its clip space has been fully configurable since the beginning.

... therefore skybox coordinates must be rotated 180 degrees around the X axis before they are used to sample a cubemap.

This basically describes an opinionated "Y+ up" to "Y+ down" transformation that may sometimes be irrelevant because different engines may have different internal conventions.


Suggestions:

  • Disallow orientation metadata for cubemaps and cubemap arrays.
  • Enhance the "skybox" description with an illustration derived from API tables with top-left origins and left-handed cubemap coords.
  • Rephrase API-specific statements to not make too strong assumptions about application environments.

@MarkCallow
Copy link
Contributor

Thank you. I have no problem rephrasing API specific statements. As to your other points I need to take some time to think about them. This stuff makes my head spin. The number 1 issue for me is having a single cubemap that be displayed correctly on Vulkan and OpenGL in, preferably, their default coordinate systems. Here correctly means that the rendered result reflects ground truth.

The genesis for these comments is that I have found cubemap files that have been hacked to make them work on Vulkan (+Y and -Y images were swapped). When rendered with Vulkan's default coordinate system this results in +X and -X being swapped with respect to ground truth. I discovered this because one of the cubemaps is a place I know in Yokohama and when I saw it I when "Whoops. The Landmark Tower should be on the right not the left." There is absolutely no need to swap +Y and -Y images to get the correct results.

@lexaknyazev
Copy link
Member Author

on Vulkan and OpenGL in, preferably, their default coordinate systems

The uploaded cubemap texture cannot be displayed on its own without some minimal vertex/shaders setup. At that point, applications usually make all kinds of opinionated choices that virtually erase the notion of "default coordinate systems".

The best thing KTX spec can do here is to clearly define its own coordinate system for cubemaps: left-handed, Y+ up, top-left face origins. Then it would be application's responsibility to correctly sample input KTX data.

@MarkCallow
Copy link
Contributor

Okay, let's define our own coordinate system. I agree with making left-handed the default. I'd like to keep orientation metadata though.

Please sketch up a skybox visualization example.

@lexaknyazev
Copy link
Member Author

I'd like to keep orientation metadata though.

This feature adds another level of indirection on top of already confusing subject. Should we keep it, it's probably better to explicitly list all faces four times (rd, ru, ld, lu) to avoid any ambiguities.

It seems to me that the whole "Cubemap Orientation" topic deserves its own appendix (with illustrations and API hints) to not bloat the main spec text. The normative language should simply state the order of faces and the default coordinate system.

@lexaknyazev
Copy link
Member Author

lexaknyazev commented Apr 12, 2021

image

@lexaknyazev
Copy link
Member Author

The image above denotes the proposed KTX cubemap coordinate system that is directly compatible with Vulkan / OpenGL cube sampler, assuming that the latter receives correct left-handed inputs.

The KTXorientation metadata simply defines 2D (face-local) origin location, defaulting to top-left.

@MarkCallow
Copy link
Contributor

MarkCallow commented Apr 13, 2021

I agree the topic should be in an appendix.

The order of the faces is as now and the default coordinate system is top left origin, also as now, right?

What did you use to draw the figure you included above? can you make an SVG version?

@MarkCallow
Copy link
Contributor

it's probably better to explicitly list all faces four times (rd, ru, ld, lu) to avoid any ambiguities.

You mean have 4 versions of the image you created?

@MarkCallow
Copy link
Contributor

I've made the appendix. Please review PR #148.

The correct section number is 16.5.3.

The weekly updates change section numbers. This spec. uses links to internal anchors so even if the section number has changed, clicking the link will take you to the right place.

@lexaknyazev
Copy link
Member Author

The order of the faces is as now and the default coordinate system is top left origin, also as now, right?

Right.

You mean have 4 versions of the image you created?

Initially, my plan was to mark per-face origins on the cube. Now that faces are shown explicitly, I think one image is enough.

can you make an SVG version?

Will look into that.

@MarkCallow
Copy link
Contributor

The last piece of this was just committed in PR #148.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants