
Commits on Oct 23, 2018

  1. Implement ETC1S/ETC2AS image compression

    Explanation:
    
    ETC1S encoding is a subset of ETC1 that uses only one color endpoint per 4x4 block (the modifier indices are identical for both subblocks, the base color is encoded differentially as RGB555 with the differential RGB333 part always set to zero, and the flip bit is always set to zero).
    Usage: crunch_x64.exe -ETC1S input.png -out output.ktx
    
    ETC2AS encoding is a subset of ETC2A that uses ETC1S encoding for color and ETC2A encoding for alpha.
    Usage: crunch_x64.exe -ETC2AS input.png -out output.ktx
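
    A minimal sketch of the degrees of freedom that the ETC1S constraints leave per 4x4 block (the struct and helper below are hypothetical illustrations, not the actual Crunch data structures):

        #include <cstdint>

        struct Etc1sBlock                    // hypothetical helper, for illustration only
        {
            uint8_t r5, g5, b5;              // single base color, RGB555 (the differential RGB333 part is always zero)
            uint8_t intensity_table;         // 0..7, identical for both subblocks
            uint8_t selectors[16];           // 2-bit modifier index per pixel (the flip bit is always zero)
        };

        // Standard ETC 5-to-8-bit base color component expansion.
        static inline uint8_t expand5(uint8_t c5) { return (uint8_t)((c5 << 3) | (c5 >> 2)); }
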
    Alexander Suvorov committed Oct 23, 2018 (commit 8708900)
  2. Make Crunch compression work correctly on CPUs supporting 16 or more threads
    
    Explanation:
    
    The Crunch library has been designed to work correctly with up to 16 threads. Since one of those threads is the main thread, the maximum number of helper threads should be set to 15.
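
    A minimal sketch of the intended clamping, assuming the thread count is derived from the hardware concurrency (the function name is hypothetical, not the actual Crunch API):

        #include <algorithm>
        #include <thread>

        // One thread is reserved for the main thread; Crunch is designed for at most 16 threads in
        // total, so the helper thread count is capped at 15.
        unsigned helper_thread_count()
        {
            unsigned hw = std::max(1u, std::thread::hardware_concurrency());
            return std::min(hw - 1u, 15u);
        }
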
    Alexander Suvorov committed Oct 23, 2018 (commit 4bb735f)

Commits on Jun 7, 2018

  1. Add compression support for ETC1S/ETC2AS encodings

    Explanation:
    
    ETC1S encoding is a subset of ETC1 that uses only one color endpoint per 4x4 block. The base color is therefore always encoded as RGB555, and there is no need to encode block flips. ETC2AS encoding is a subset of ETC2A that uses ETC1S encoding for color and the default ETC2A encoding for alpha.
    
    ETC1S/ETC2AS Crunch compression and decompression are based on the ETC and DXT Crunch compression and decompression algorithms:
    - ETC1S/ETC2AS tiling is performed within the area of 8x8 pixels using DXT1/DXT5 tiling scheme
    - ETC1S color endpoints are generated using standard ETC1 optimization
    - ETC1S color codebook encoding is equivalent to ETC1 codebook encoding
    - ETC1S level encoding is equivalent to DXT1 level encoding
    - ETC2AS alpha codebook encoding is equivalent to ETC2A alpha codebook encoding
    - ETC2AS level encoding is equivalent to DXT5 level encoding
    
    Testing results suggest that ETC1S/ETC2AS encodings can be used to achieve lower bitrates than ETC1/ETC2A on the Kodak test set while providing equivalent image quality (estimated using PSNR).
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.854 sec
    Modified: 1468204 bytes / 5.473 sec
    Improvement: 7.21% (compression ratio) / 81.03% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.925 sec
    Modified: 1914805 bytes / 7.297 sec
    Improvement: 7.28% (compression ratio) / 80.24% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 12.842 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    
    ETC1S Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1S quantization parameters have been selected in such a way, so that ETC1S compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1S encoding]
    Total size: 1363676 bytes
    Total time: 15.586 sec
    Average bitrate: 1.156 bpp
    Average Luma PSNR: 34.047 dB
    Alexander Suvorov committed Jun 7, 2018 (commit 660322d)

Commits on Oct 27, 2017

  1. Optimize vector quantization step

    This change improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    
    The vector quantization algorithm takes floating point vectors as input and performs vector preprocessing right before the quantization. At the same time, the selector training vectors are generated directly from integer selector values, packed into a single uint64. It is therefore more efficient to perform the preprocessing of the selector training vectors (which includes sorting and deduplication) while they are still in packed form. An additional performance boost is achieved by using multiple threads for sorting the training vectors.
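
    A sketch of the preprocessing idea, under the simplifying assumption that each training vector is just a packed uint64 whose duplicate count becomes its weight (names are hypothetical; the real code also distributes the sort across worker threads):

        #include <algorithm>
        #include <cstdint>
        #include <utility>
        #include <vector>

        // Sort the packed selectors, then collapse duplicates into (value, weight) pairs, all before
        // any unpacking into floating point training vectors.
        std::vector<std::pair<uint64_t, uint32_t>> preprocess_selectors(std::vector<uint64_t> packed)
        {
            std::sort(packed.begin(), packed.end());

            std::vector<std::pair<uint64_t, uint32_t>> unique_weighted;
            for (size_t i = 0; i < packed.size();)
            {
                size_t j = i;
                while (j < packed.size() && packed[j] == packed[i]) ++j;
                unique_weighted.emplace_back(packed[i], (uint32_t)(j - i));
                i = j;
            }
            return unique_weighted;
        }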
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.869 sec
    Modified: 1468204 bytes / 5.477 sec
    Improvement: 7.21% (compression ratio) / 81.03% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.961 sec
    Modified: 1914805 bytes / 7.322 sec
    Improvement: 7.28% (compression ratio) / 80.19% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 12.766 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 27, 2017 (commit c1d8e8d)

Commits on Oct 25, 2017

  1. Optimize tile computation step

    This change improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    
    In the tile computation step, pixels within the tiling area are palettized using a general purpose tree clusterization algorithm. At the same time, clusterization of the tile pixels is always performed with the following restrictions: the maximum number of palettized pixels is 64 and the maximum number of clusters is 2. The performance can therefore be improved by solving the palettizing task with a specialized version of the tree clusterizer, which does not maintain the tree structure and uses constant memory.
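
    A constant-memory sketch of the specialized two-cluster split (illustrative only; the real clusterizer works on weighted vectors and uses its own seeding, so names and details here are assumptions):

        #include <cstddef>
        #include <cstdint>

        struct Vec3 { float x, y, z; };

        static inline float dist2(const Vec3& a, const Vec3& b)
        {
            float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
            return dx * dx + dy * dy + dz * dz;
        }

        // Split up to 64 pixels into 2 clusters using only fixed-size storage (no tree, no heap).
        void split_two_clusters(const Vec3* pixels, size_t count /* 1..64 */, uint8_t cluster[64])
        {
            Vec3 mean = { 0, 0, 0 };
            for (size_t i = 0; i < count; ++i) { mean.x += pixels[i].x; mean.y += pixels[i].y; mean.z += pixels[i].z; }
            mean.x /= count; mean.y /= count; mean.z /= count;

            // Seed A: pixel farthest from the mean; seed B: pixel farthest from A.
            size_t a = 0, b = 0;
            for (size_t i = 1; i < count; ++i) if (dist2(pixels[i], mean) > dist2(pixels[a], mean)) a = i;
            for (size_t i = 0; i < count; ++i) if (dist2(pixels[i], pixels[a]) > dist2(pixels[b], pixels[a])) b = i;

            Vec3 c[2] = { pixels[a], pixels[b] };
            for (int iter = 0; iter < 4; ++iter)        // a few Lloyd iterations are enough for 2 clusters
            {
                Vec3   sum[2] = { { 0, 0, 0 }, { 0, 0, 0 } };
                size_t n[2]   = { 0, 0 };
                for (size_t i = 0; i < count; ++i)
                {
                    uint8_t k = dist2(pixels[i], c[0]) <= dist2(pixels[i], c[1]) ? 0 : 1;
                    cluster[i] = k;
                    sum[k].x += pixels[i].x; sum[k].y += pixels[i].y; sum[k].z += pixels[i].z;
                    ++n[k];
                }
                for (int k = 0; k < 2; ++k)
                    if (n[k]) { c[k].x = sum[k].x / n[k]; c[k].y = sum[k].y / n[k]; c[k].z = sum[k].z / n[k]; }
            }
        }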
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.863 sec
    Modified: 1468204 bytes / 5.726 sec
    Improvement: 7.21% (compression ratio) / 80.16% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.950 sec
    Modified: 1914805 bytes / 7.683 sec
    Improvement: 7.28% (compression ratio) / 79.21% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 13.071 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 25, 2017 (commit 21eb70b)
  2. Optimize endpoint reordering algorithm

    This change improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    The main ideas used for optimization of the endpoint reordering algorithm:
    - On each iteration, the list of the chosen endpoints is updated only from one side, so all the computations performed for the unchanged side of the list can be cached and reused on the next iteration.
    - The list of the chosen endpoints can be built using an array of double size, growing from the middle of the array. This eliminates unnecessary memory reallocations and movements.
    - When an element is removed from the list of remaining endpoints, instead of moving all the elements with higher indices, only the last element of the list is moved into the position of the removed element (the original indices of the remaining endpoints are stored within the list elements to maintain proper indexing). Both ideas are sketched below.
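
    A sketch of the two bookkeeping tricks mentioned above (types and names are hypothetical, not the actual Crunch code):

        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // A list that can grow from either end without moving existing elements: backed by an array of
        // double size, with the chosen endpoints accumulating around the middle.
        struct MiddleOutList
        {
            std::vector<uint32_t> buf;
            size_t lo, hi;                                        // occupied range is [lo, hi)

            explicit MiddleOutList(size_t max_size) : buf(max_size * 2), lo(max_size), hi(max_size) {}
            void push_front(uint32_t v) { buf[--lo] = v; }        // at most max_size pushes per side
            void push_back(uint32_t v)  { buf[hi++] = v; }
        };

        // Remaining endpoints carry their original index, so removal is a constant-time swap with the
        // last element instead of shifting every element that follows.
        struct Remaining { uint32_t original_index; /* ... cached per-endpoint data ... */ };

        inline void swap_remove(std::vector<Remaining>& remaining, size_t pos)
        {
            remaining[pos] = remaining.back();
            remaining.pop_back();
        }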
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.848 sec
    Modified: 1468204 bytes / 5.875 sec
    Improvement: 7.21% (compression ratio) / 79.63% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.952 sec
    Modified: 1914805 bytes / 7.834 sec
    Improvement: 7.28% (compression ratio) / 78.80% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 13.261 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 25, 2017 (commit b8b456d)

Commits on Oct 24, 2017

  1. Optimize DXT endpoints computation

    This change improves the compression speed for DXT encoding.
    
    Explanation:
    
    When performing per-component endpoint optimization, the trial solutions are generated using all possible combinations of the component values. Then the error boundary computation is performed for each block color of the trial solution in order to check whether an early out is possible. The important observation here is that some component values are present in several trial solutions and are therefore processed multiple times. The overall performance can thus be improved by computing and caching the errors for all the possible component values in advance.
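
    One way to read the caching step, sketched under the assumption that the "error for a component value" is the weighted squared error of that quantized level against all source pixels of the component (names are hypothetical):

        #include <cstddef>
        #include <cstdint>
        #include <vector>

        static inline int expand5(int v) { return (v << 3) | (v >> 2); }   // 5-bit -> 8-bit component

        // Precompute the error of every possible 5-bit component level once; trial endpoint
        // combinations can then look the values up instead of re-walking the pixels each time.
        std::vector<float> cache_component_errors(const std::vector<uint8_t>& pixel_component,
                                                  const std::vector<float>&   weight)
        {
            std::vector<float> cached(32, 0.0f);
            for (int level = 0; level < 32; ++level)
            {
                int   c   = expand5(level);
                float err = 0.0f;
                for (size_t i = 0; i < pixel_component.size(); ++i)
                {
                    float d = float(c - pixel_component[i]);
                    err += weight[i] * d * d;
                }
                cached[level] = err;
            }
            return cached;
        }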
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.843 sec
    Modified: 1468204 bytes / 6.067 sec
    Improvement: 7.21% (compression ratio) / 78.97% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.983 sec
    Modified: 1914805 bytes / 8.080 sec
    Improvement: 7.28% (compression ratio) / 78.15% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 13.421 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 24, 2017 (commit 7143913)

Commits on Oct 20, 2017

  1. Perform multithreaded node preprocessing for faster vector quantization

    This change significantly improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    
    On each iteration of the vector quantization algorithm, the leaf with the highest variance is selected for splitting. If the leaf gets split, two new leaves are created (leaves that cannot be split are ignored on future iterations). There does not seem to be any simple way to compute or reliably predict the variances of the future leaves in advance, which means that there is no simple way to efficiently perform the split operations in parallel.
    
    Still, there is an interesting observation. Even though the order of the split operations depends on the previous iterations, split operations performed in different subtrees are completely independent. So instead of solving the main quantization task directly, we can first solve an alternative quantization task that has a lot in common with the main task but can be efficiently parallelized, and then reuse the intermediate results of the alternative solution when solving the main task. Specifically, the idea is to efficiently compute an alternative split tree, which is more or less balanced and has approximately the same number of nodes as the main tree. The overlapping part of the main and alternative trees can then be reused while solving the main quantization task.
    
    In order to achieve this, the initial root is first split normally until the number of splittable leaves reaches the number of available threads. Then each leaf is split in a separate thread, while the maximum number of split iterations for each subtree is defined as the maximum number of split iterations for the whole main tree divided by the number of used threads. This way the total number of nodes in the alternative tree will be approximately the same as the number of nodes in the main tree.
    
    Note that in general, the alternative tree does not match the main tree, so some nodes of the alternative tree will never be reused. In practice however, the portion of such unnecessarily precomputed nodes is not very big. And considering that the nodes of the alternative tree are precomputed in parallel using multiple threads, in most cases the overall performance is significantly improved.
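
    A control-flow sketch of the scheme (the Node type and the two callbacks are placeholders standing in for the real quantizer internals, which are not shown here):

        #include <algorithm>
        #include <cstddef>
        #include <functional>
        #include <thread>
        #include <vector>

        // Precompute a roughly balanced "alternative" split tree in parallel; its nodes are later
        // reused wherever they coincide with nodes of the main highest-variance-first tree.
        template <class Node>
        void precompute_alternative_tree(Node* root,
                                         size_t total_split_iterations,
                                         size_t num_threads,
                                         const std::function<bool(Node*)>& split_once,                  // splits one leaf of the subtree, false when nothing is splittable
                                         const std::function<std::vector<Node*>(Node*)>& splittable_leaves)
        {
            // 1. Split sequentially until there is one splittable leaf per thread.
            while (splittable_leaves(root).size() < num_threads && split_once(root)) {}

            // 2. Grow each subtree on its own thread with an equal share of the iteration budget, so
            //    the alternative tree ends up with roughly as many nodes as the main tree.
            size_t per_thread_budget = total_split_iterations / std::max<size_t>(1, num_threads);
            std::vector<std::thread> workers;
            for (Node* leaf : splittable_leaves(root))
                workers.emplace_back([&, leaf] {
                    for (size_t i = 0; i < per_thread_budget; ++i)
                        if (!split_once(leaf)) break;
                });
            for (auto& t : workers) t.join();
        }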
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.840 sec
    Modified: 1468204 bytes / 6.303 sec
    Improvement: 7.21% (compression ratio) / 78.14% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.955 sec
    Modified: 1914805 bytes / 8.342 sec
    Improvement: 7.28% (compression ratio) / 77.43% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 13.322 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 20, 2017 (commit dbbef6a)
  2. Use multiple threads for node split in vector quantization

    This change improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    
    During the node split iteration, identical computations are performed for all the vectors of the split node. The overall performance can be improved by performing independent computations in separate threads. In order to avoid unnecessary threading overhead, on each iteration the number of threads is selected so that each thread processes at least 512 vectors.
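
    The thread-count selection is a one-liner; a sketch under the stated 512-vectors-per-thread assumption (the function name is hypothetical):

        #include <algorithm>
        #include <cstddef>

        // Use at most one thread per 512 vectors so the threading overhead never outweighs the gain.
        size_t threads_for_split(size_t num_vectors, size_t max_threads)
        {
            return std::max<size_t>(1, std::min(max_threads, num_vectors / 512));
        }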
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.892 sec
    Modified: 1468204 bytes / 7.578 sec
    Improvement: 7.21% (compression ratio) / 73.77% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.943 sec
    Modified: 1914805 bytes / 9.993 sec
    Improvement: 7.28% (compression ratio) / 72.95% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 14.753 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 20, 2017 (commit 1028520)

Commits on Oct 19, 2017

  1. Optimize vector quantization algorithm

    This change improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    
    On each iteration of the vector quantization algorithm, the leaf with the highest variance is selected for splitting. At the same time, each split operation adds at most 2 new leaves. Considering this, the search for the leaf with the highest variance can be performed more efficiently if all the leaves are stored in a priority queue (in order to guarantee that texture decompression gives a result identical to the original version of Crunch, the node comparison operation also takes the node index into account).
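
    A sketch of the priority queue with the index tie-breaker (names are hypothetical; whether ties prefer the lower or the higher node index in the real code is an assumption here):

        #include <cstdint>
        #include <queue>
        #include <vector>

        struct LeafRef
        {
            double   variance;
            uint32_t node_index;       // tie-breaker keeping the split order (and thus the output) deterministic
        };

        struct LeafOrder
        {
            bool operator()(const LeafRef& a, const LeafRef& b) const
            {
                if (a.variance != b.variance)
                    return a.variance < b.variance;     // max-heap on variance
                return a.node_index > b.node_index;     // on equal variance, the lower index comes out first
            }
        };

        using LeafQueue = std::priority_queue<LeafRef, std::vector<LeafRef>, LeafOrder>;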
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.844 sec
    Modified: 1468204 bytes / 7.883 sec
    Improvement: 7.21% (compression ratio) / 72.67% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.978 sec
    Modified: 1914805 bytes / 10.490 sec
    Improvement: 7.28% (compression ratio) / 71.63% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 15.165 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 19, 2017 (commit fbe3f6c)

Commits on Oct 17, 2017

  1. Optimize vector quantization algorithm

    This change improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    
    When a node is split during the quantization step, its vectors are divided between the child nodes, and new memory is allocated to store each new set of vectors. At the same time, the set of vectors of the parent node is no longer accessed after the split. Considering that the sets of vectors of the child nodes do not intersect, it is possible to reuse the memory allocated for the parent set of vectors to store the child sets of vectors.
    
    This can be achieved in the following way. All the source vectors are initially stored in a single array. Assume that this common array can be reordered so that the vectors of each node form a contiguous block within it. Then it is sufficient to store only two indices for each node (pointing to the first and the last node vector in the common array) in order to describe the complete set of vectors of that node. This assumption holds for the root node, whose vector indices initially point to the first and the last elements of the complete vector array.
    
    When a node is split, its vectors (stored in a contiguous block within the common array) are reordered so that the vectors of the left child node come first, followed by the vectors of the right child node, and the first/last indices of the child nodes are set accordingly. This way each child node also has its vectors stored in a contiguous block of the common array, defined by two indices, and the split iteration can be repeated. Note that the memory used to store the sets of vectors for all the nodes now needs to be allocated only once.
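
    A sketch of the in-place split, assuming the node is represented by a pair of indices into the shared vector array (types and the partition predicate are placeholders; the real code performs its own deterministic reordering):

        #include <algorithm>
        #include <cstdint>
        #include <utility>
        #include <vector>

        struct Vec16     { float v[16]; };                 // hypothetical training vector type
        struct NodeRange { uint32_t begin, end; };         // a node owns the slice [begin, end) of the shared array

        // Reorder the node's slice in place so the left child's vectors come first, then the right
        // child's; no per-node allocation is needed.
        template <class Pred>
        std::pair<NodeRange, NodeRange> split_in_place(std::vector<Vec16>& all_vectors,
                                                       NodeRange node, Pred goes_left)
        {
            auto first = all_vectors.begin() + node.begin;
            auto last  = all_vectors.begin() + node.end;
            auto mid   = std::stable_partition(first, last, goes_left);

            uint32_t m = uint32_t(mid - all_vectors.begin());
            return { NodeRange{ node.begin, m }, NodeRange{ m, node.end } };
        }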
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.872 sec
    Modified: 1468204 bytes / 8.276 sec
    Improvement: 7.21% (compression ratio) / 71.34% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.971 sec
    Modified: 1914805 bytes / 10.944 sec
    Improvement: 7.28% (compression ratio) / 70.40% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 15.364 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 17, 2017 (commit 11a89d2)

Commits on Oct 13, 2017

  1. Optimize color endpoint solution evaluation

    This change improves the compression speed for DXT encoding.
    
    Explanation:
    
    In order to evaluate an endpoint solution, it is necessary to compute the sum of the squared distances from the source pixels to their nearest block colors, defined by the evaluated endpoint solution. Such computation is quite complicated, so before it is performed, we can compute the sum of the squared distances from the source pixels to the axis-aligned bounding box enclosing all the evaluated block colors (if the source pixel appears to be inside the AABB of the evaluated solution, then the distance is considered to be 0). If the sum of the squared distances to the AABB of the current solution is already bigger than the sum of the squared distances computed for the previously found best solution, then the current solution does not need to be evaluated.
    
    The actual trick here is that the sum of the squared distances to the AABB of the current solution can be computed in constant time using the following approach. The sums of the squared distances for each color component can be computed separately. For each color component the AABB determines 2 planes: the "lower" plane, defined by the lower boundary of the AABB, and the "upper" plane, defined by the upper boundary of the AABB. The sum for each color component is combined from two parts: the sum of the squared distances from the lower plane to all the source pixels which are below the lower plane, and the sum of the squared distances from the upper plane to all the source pixels which are above the upper plane. Considering that the endpoints of the evaluated solution are encoded as RGB565, there are 32 possible planes for the red and blue components, and 64 possible planes for the green component. For each plane it is sufficient to precompute the following two values: the sum of the squared distances from the plane to all the source pixels which are "below" this plane, and the sum of the squared distances from the plane to all the source pixels which are "above" this plane. The total sum of the squared distances from the source pixels to any evaluated AABB can then be represented as a sum of 6 precomputed values, while all the used values can be precomputed in linear time with dynamic programming.
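
    A sketch of the linear-time precompute for one 5-bit component (assuming 8-bit source components and per-pixel weights; the green component would use 64 planes instead of 32, and the names below are hypothetical):

        #include <cstddef>
        #include <cstdint>
        #include <vector>

        static inline int expand5(int v) { return (v << 3) | (v >> 2); }

        // For each of the 32 possible planes: below[p] = weighted sum of squared distances from the
        // plane to all source pixels lying below it, above[p] = the same for pixels lying above it.
        struct PlaneSums { double below[32], above[32]; };

        PlaneSums precompute_plane_sums(const std::vector<uint8_t>& comp, const std::vector<double>& weight)
        {
            // Weighted histogram moments over the 256 possible component values.
            double w[256] = {}, wc[256] = {}, wcc[256] = {};
            for (size_t i = 0; i < comp.size(); ++i)
            {
                double c = comp[i];
                w[comp[i]] += weight[i]; wc[comp[i]] += weight[i] * c; wcc[comp[i]] += weight[i] * c * c;
            }
            // Prefix sums: pref*[t] covers all pixels with component value < t.
            double prefW[257], prefWC[257], prefWCC[257], W = 0, WC = 0, WCC = 0;
            for (int t = 0; t <= 256; ++t)
            {
                prefW[t] = W; prefWC[t] = WC; prefWCC[t] = WCC;
                if (t < 256) { W += w[t]; WC += wc[t]; WCC += wcc[t]; }
            }
            PlaneSums s;
            for (int level = 0; level < 32; ++level)
            {
                int    t = expand5(level);
                double p = t;
                // sum of w * (p - c)^2 over pixels with c < p
                s.below[level] = prefW[t] * p * p - 2.0 * p * prefWC[t] + prefWCC[t];
                // sum of w * (c - p)^2 over pixels with c > p
                double aw = prefW[256] - prefW[t + 1], awc = prefWC[256] - prefWC[t + 1], awcc = prefWCC[256] - prefWCC[t + 1];
                s.above[level] = awcc - 2.0 * p * awc + aw * p * p;
            }
            return s;
        }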
    
    Note: The AABB check seems to work faster than inserting a solution into the hash map. For this reason the AABB check is performed first.
    
    Additional improvements: A few minor adjustments have been made in order to make sure that the texture decompression gives identical result to the original version of Crunch also for 32-bit builds (original Crunch library uses different floating point models for 32-bit and 64-bit builds).
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.861 sec
    Modified: 1468204 bytes / 8.622 sec
    Improvement: 7.21% (compression ratio) / 70.13% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.980 sec
    Modified: 1914805 bytes / 11.294 sec
    Improvement: 7.28% (compression ratio) / 69.46% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 15.529 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 13, 2017 (commit a14a313)

Commits on Oct 10, 2017

  1. Optimize computation of the endpoint cluster indices

    This change improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    
    The vectors processed in the cluster indices computation step are the very same vectors that were used in the vector quantization step. This means that every processed vector already has a specific centroid associated with it. Even though the associated centroid is not necessarily the closest one to the processed vector, the distance to the associated centroid can be used as an upper bound on the distance to the closest centroid. This makes it possible to efficiently perform an early out while computing the distances to the other centroids.
    
    Note: The modified algorithm is supposed to generate decompression result identical to the original version of Crunch. For this reason the centroid associated with a specific training vector is not used as an initial best solution, because it could potentially change the decompression result in cases when the processed training vector is equidistant from multiple centroids (selection of the closest centroid in such cases depends on the processing order).
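
    A sketch of the early out (types and names are hypothetical; the distance metric in the real code includes per-component weights, which are omitted here):

        #include <cstddef>
        #include <cstdint>
        #include <limits>
        #include <vector>

        struct VecN { std::vector<float> v; };   // hypothetical training vector type

        static double dist2(const VecN& a, const VecN& b)
        {
            double d = 0;
            for (size_t i = 0; i < a.v.size(); ++i) { double e = a.v[i] - b.v[i]; d += e * e; }
            return d;
        }

        // Find the closest centroid for one training vector. The centroid it was associated with during
        // quantization is only used as an upper bound, not as the initial best candidate, so tie
        // handling stays identical to the original processing order.
        uint32_t closest_centroid(const VecN& vec, uint32_t associated, const std::vector<VecN>& centroids)
        {
            double   bound      = dist2(vec, centroids[associated]);    // cheap upper bound on the best distance
            double   best       = std::numeric_limits<double>::max();
            uint32_t best_index = 0;
            for (uint32_t c = 0; c < centroids.size(); ++c)
            {
                double limit = best < bound ? best : bound;
                double d     = 0;
                size_t i     = 0;
                for (; i < vec.v.size(); ++i)
                {
                    double e = vec.v[i] - centroids[c].v[i];
                    d += e * e;
                    if (d > limit) break;                               // cannot improve on the bound: early out
                }
                if (i == vec.v.size() && d < best) { best = d; best_index = c; }
            }
            return best_index;
        }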
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.847 sec
    Modified: 1468204 bytes / 8.929 sec
    Improvement: 7.21% (compression ratio) / 69.05% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.953 sec
    Modified: 1914805 bytes / 11.651 sec
    Improvement: 7.28% (compression ratio) / 68.47% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 15.695 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 10, 2017 (commit 65f4431)

Commits on Oct 6, 2017

  1. Optimize vector quantization algorithm

    This change improves the compression speed for both DXT and ETC encodings.
    
    Explanation:
    The main ideas used for optimization of the vector quantization algorithm:
    - intermediate structures can store vector indices instead of the vector data, which minimizes the total amount of copied data when splitting a node (this is especially important for selector quantization, where processed vectors have 16 components)
    - weighted vectors and weighted dot products can be cached
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.893 sec
    Modified: 1468204 bytes / 9.310 sec
    Improvement: 7.21% (compression ratio) / 67.78% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.942 sec
    Modified: 1914805 bytes / 12.232 sec
    Improvement: 7.28% (compression ratio) / 66.89% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 16.121 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 6, 2017 (commit 51f73fd)

Commits on Oct 4, 2017

  1. Optimize DXT color endpoint solution evaluation

    This change improves the compression speed for DXT encoding.
    
    Explanation:
    
    In order to evaluate an endpoint solution, it is necessary to compute the sum of the squared distances from the source pixels to their nearest block colors, defined by the evaluated endpoint solution. Considering that we are looking for a solution with a minimal sum, the computation can be stopped as soon as the current sum is greater than or equal to the previously found best sum. An interesting observation here is that the performance improvement achieved by such an early-out approach depends on the order in which the source pixels are processed. It makes sense to process the pixels with the highest introduced errors first, as this significantly increases the chances of exiting the computation earlier.
    
    On the one hand, equal source pixels are grouped together, so the computed distance for each unique source pixel is multiplied by its weight. For this reason it makes sense to process the pixels with the highest weights first, as their errors have the highest multipliers. On the other hand, pixels that project onto the middle part of the endpoint interval have higher chances of being close to one of the block colors. For this reason it makes sense to first process the pixels whose projections are the most distant from the middle of the endpoint interval, as those pixels will normally introduce the highest errors. In order to combine these two aspects, it is proposed to sort the pixels according to the product of the weight and the distance from the projected pixel to the center of the endpoint interval.
    
    Of course, reordering the pixels on each iteration would be very expensive and is not considered. However, there is a high chance that most endpoint intervals will be aligned similarly to the principal axis, as well as have their centers close to the mean color. As soon as the principal axis is computed, it can be used as an approximation of all the future endpoint intervals, so the projection and reordering of the source pixels is performed only once.
    
    Two approaches have been considered. In the first approach, the pixels are sorted by the product of the weight and the absolute distance in decreasing order. In the second approach, the pixels are sorted by the product of the weight and the signed distance, and then interleaved starting from the opposite ends of the ordered sequence. When tested on the Kodak image set, the interleaving approach shows better results.
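
    A sketch of the interleaving order (names are hypothetical; which end of the sorted sequence is visited first is an assumption):

        #include <algorithm>
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        struct PixelRef
        {
            uint32_t index;   // index of the unique source pixel
            float    key;     // weight * signed distance of the pixel's projection from the interval center
        };

        // Sort by the signed key, then interleave from the two ends of the sorted sequence so that both
        // extremes of the principal axis are evaluated early.
        std::vector<uint32_t> build_evaluation_order(std::vector<PixelRef> pixels)
        {
            std::sort(pixels.begin(), pixels.end(),
                      [](const PixelRef& a, const PixelRef& b) { return a.key < b.key; });

            std::vector<uint32_t> order;
            order.reserve(pixels.size());
            size_t lo = 0, hi = pixels.size();
            while (lo < hi)
            {
                order.push_back(pixels[--hi].index);        // most positive remaining
                if (lo < hi)
                    order.push_back(pixels[lo++].index);    // most negative remaining
            }
            return order;
        }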
    
    Additional optimization: perceptual and uniform versions of the evaluation function are now implemented separately, which slightly improves the performance.
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.845 sec
    Modified: 1468204 bytes / 10.071 sec
    Improvement: 7.21% (compression ratio) / 65.09% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.929 sec
    Modified: 1914805 bytes / 13.248 sec
    Improvement: 7.28% (compression ratio) / 64.13% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 17.126 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Oct 4, 2017 (commit 205829d)

Commits on Sep 13, 2017

  1. Optimize DXT endpoints computation

    This change improves the compression speed for DXT encoding.
    
    Explanation:
    
    When performing per-component endpoint optimization, it is not necessary to go through all the source pixels on every iteration in order to calculate the total weighted squared error for a specific trial endpoint. The computation can be optimized in the following way:
    
    sum(w(i) * (x - p(i)) * (x - p(i))) = sum(w(i) * x * x) - sum(w(i) * 2 * x * p(i)) + sum(w(i) * p(i) * p(i)) = sum(w(i)) * x * x - sum(2 * w(i) * p(i)) * x + sum(w(i) * p(i) * p(i))
    
    The values of sum(w(i)), sum(2 * w(i) * p(i)), sum(w(i) * p(i) * p(i)) can be precalculated for each of 4 selectors, and only have to be updated when the solution improves. This way the error computation on each iteration can be performed using 12 multiplications instead of 2 * N (where N is the number of pixels in the processed cluster).
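
    A sketch of the precomputed sums and the resulting 3-multiplications-per-selector error evaluation (names are hypothetical):

        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Per-selector sums for one color component: sum_w = sum(w(i)), sum_2wp = sum(2 * w(i) * p(i)),
        // sum_wpp = sum(w(i) * p(i) * p(i)).
        struct SelectorSums { double sum_w[4], sum_2wp[4], sum_wpp[4]; };

        SelectorSums precompute_selector_sums(const std::vector<uint8_t>& selector,    // 0..3 per pixel
                                              const std::vector<double>&  weight,      // w(i)
                                              const std::vector<double>&  component)   // p(i)
        {
            SelectorSums s = {};
            for (size_t i = 0; i < selector.size(); ++i)
            {
                int k = selector[i];
                s.sum_w[k]   += weight[i];
                s.sum_2wp[k] += 2.0 * weight[i] * component[i];
                s.sum_wpp[k] += weight[i] * component[i] * component[i];
            }
            return s;
        }

        // Weighted squared error of trial component value x for selector k: 3 multiplications,
        // i.e. 12 in total for the 4 DXT1 selectors.
        inline double trial_error(const SelectorSums& s, int k, double x)
        {
            return s.sum_w[k] * x * x - s.sum_2wp[k] * x + s.sum_wpp[k];
        }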
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.811 sec
    Modified: 1468204 bytes / 10.520 sec
    Improvement: 7.21% (compression ratio) / 63.49% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.936 sec
    Modified: 1914805 bytes / 13.902 sec
    Improvement: 7.28% (compression ratio) / 62.36% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 17.121 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Sep 13, 2017 (commit 4bd4355)

Commits on Sep 12, 2017

  1. Optimize DXT endpoints refinement

    This change improves the compression speed for DXT encoding.
    
    Explanation:
    
    When creating the array of trial alpha endpoints, there is no need to use a bit array for tracking duplicate entries. Instead, the uniqueness of the endpoint pair can be determined using simple comparison operations. Moreover, it is not necessary to go through all the source pixels on every iteration in order to calculate the total squared error for a specific trial endpoint. Considering that selector values are not modified during the refinement step, each selector has a fixed set of pixels associated with it during optimization. This means that the calculation of the total squared error can be optimized in the following way:
    
    sum((x - p(i)) * (x - p(i))) = sum(x * x) - sum(2 * x * p(i)) + sum(p(i) * p(i)) = N * x * x - sum(2 * p(i)) * x + sum(p(i) * p(i))
    
    As the set of pixels associated with a specific selector is fixed, the sum(2 * p(i)) and sum(p(i) * p(i)) values can be precalculated in advance. This means that the error computation for each component now requires only (3 * S) multiplications instead of N (where N is the number of pixels in the processed cluster, and S is the number of selectors, equal to 4 for color components and 8 for alpha components).
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.864 sec
    Modified: 1468204 bytes / 10.794 sec
    Improvement: 7.21% (compression ratio) / 62.60% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.912 sec
    Modified: 1914805 bytes / 14.244 sec
    Improvement: 7.28% (compression ratio) / 61.41% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 17.125 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Sep 12, 2017 (commit 335f0ee)
  2. Optimize DXT endpoints computation

    This change improves the compression speed for DXT encoding.
    
    Explanation:
    
    The main ideas used for the DXT endpoints computation optimization:
    - Instead of using a map in the tree clusterizer, the source vectors can be stored in an array and sorted before the quantization. This might increase the amount of used memory, but it is much more efficient in terms of memory reallocation.
    - Endpoint caching can be used throughout the color endpoint computation, and not just within the optimize_endpoints function. The only place where endpoint caching can not be used is the final step of the try_combinatorial_encoding function, where alternate rounding is used.
    - When computing endpoint codebooks, endpoint optimizer and endpoint refiner can be reused, which eliminates unnecessary memory reallocations.
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.879 sec
    Modified: 1468204 bytes / 11.099 sec
    Improvement: 7.21% (compression ratio) / 61.57% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.919 sec
    Modified: 1914805 bytes / 14.621 sec
    Improvement: 7.28% (compression ratio) / 60.40% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 17.108 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Sep 12, 2017 (commit 3053c9d)

Commits on Sep 11, 2017

  1. Fix miscellaneous compiler warnings

    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.866 sec
    Modified: 1468204 bytes / 11.858 sec
    Improvement: 7.21% (compression ratio) / 58.92% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.878 sec
    Modified: 1914805 bytes / 15.625 sec
    Improvement: 7.28% (compression ratio) / 57.63% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 17.181 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Sep 11, 2017 (commit 3e12aff)

Commits on Aug 11, 2017

  1. Optimize DXT color endpoints computation

    This change significantly improves the compression speed for DXT encoding.
    
    Explanation:
    
    The main ideas used for the DXT color endpoints computation optimization:
    - When the DXT endpoint computation function is called from the quantization algorithm, almost all of its input parameters (except the color metrics) are hardcoded in the quantization code. This makes it possible to optimize the endpoint evaluation function (which is the bottleneck of the endpoint computation algorithm) for this specific set of parameters.
    - In the original version of the evaluation function, selectors are computed every time a new endpoint is evaluated. In fact, this is not necessary, because some selector values are never used, so they can be computed lazily, based on the previously determined optimal endpoint values. This approach significantly reduces the amount of computation.
    
    Other improvements:
    - The original version of Crunch has a minor bug: the counter for the cached endpoint values does not get initialized. This results in nondeterministic DXT conversion of large textures, as the counter overflow can occur at a random moment. The issue is now fixed in the current branch.
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.893 sec
    Modified: 1468204 bytes / 11.882 sec
    Improvement: 7.21% (compression ratio) / 58.88% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.946 sec
    Modified: 1914805 bytes / 15.628 sec
    Improvement: 7.28% (compression ratio) / 57.70% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 17.352 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Aug 11, 2017 (commit 6b3172f)

Commits on Aug 4, 2017

  1. Add compression support for ETC2A textures

    This change makes it possible to use Crunch algorithms for ETC textures with Alpha channel.
    
    Explanation:
    
    For simplicity, Crunch algorithms currently do not use ETC2 specific modes (T, H or P). For this reason, the currently used ETC2A compression format is technically equivalent to ETC1 + Alpha. Note that ETC2 encoding is a superset of ETC1, so any texture, which consists of ETC1 color blocks and ETC2 Alpha blocks, can be correctly decoded by an ETC2A (ETC2_RGBA8) decoder.
    
    Compression scheme for ETC2 Alpha blocks is equivalent to the compression scheme for DXT5 Alpha blocks. ETC2 Alpha endpoint clusterization is performed based on the very same output of the Alpha palettizer which is used for DXT5 Alpha. The only part which is actually different is the Alpha endpoint optimization step.
    
    In order to perform ETC2 Alpha encoding, we can first run the already existing algorithm for DXT5 Alpha endpoint optimization in order to obtain an initial approximate solution. The approximate solution is then refined based on the ETC2 Alpha modifier table. When performing raw ETC2A encoding, all 16 ETC2 Alpha modifiers are used during optimization. However, when performing ETC2A quantization, for performance reasons only 2 Alpha modifiers are currently used (modifier 13, which allows precise approximation on short Alpha intervals, and modifier 11, which has more or less regularly distributed values and is used for large Alpha intervals).
    
    For compatibility reasons, ETC2 color compression wrappers have also been added to the code, though, as has been mentioned before, at the current moment ETC2 specific modes are not used, so ETC2 color compression is currently equivalent to ETC1 compression.
    
    The ETC decoder functionality has been significantly extended: Crunch is now capable of decoding ETC2 and ETC2A textures (input ETC2 textures can have T, H or P blocks).
    
    In order to use ETC2A compression, use the -ETC2A command line option (i.e. "crunch_x64.exe -ETC2A input.png"). By default, compressed ETC2A textures will be decompressed into KTX file format.
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.880 sec
    Modified: 1468204 bytes / 13.288 sec
    Improvement: 7.21% (compression ratio) / 53.99% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.936 sec
    Modified: 1914805 bytes / 18.044 sec
    Improvement: 7.28% (compression ratio) / 51.15% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 17.361 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Aug 4, 2017 (commit bec4114)

Commits on Jul 19, 2017

  1. Use XOR-deltas for selector codebook encoding

    This change improves compression ratio for both DXT and ETC encodings.
    
    Explanation:
    
    When encoding the deltas between two pixel selectors, it is possible to use XOR-deltas instead of modulo-deltas. At first it might seem counterintuitive that an XOR-delta can perform better than a modulo-delta, as it does not reflect the continuity properties of the data as well. The actual trick here is that the encoded selectors are first sorted according to the used delta operation and the corresponding metric. The initial distance maps for the XOR-deltas have been obtained experimentally, using bitrate optimization on the test set of images. Additionally, ETC1 decoding has been optimized for speed: all the normal and flipped ETC1 selectors are now computed in advance.
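
    A sketch contrasting the two delta operations on a packed selector word (the 2-bits-per-pixel packing shown here is illustrative; the actual packed layout and the sorting metric in Crunch differ):

        #include <cstdint>

        // XOR-delta: one operation on the whole packed word, regardless of how many selectors it holds.
        inline uint32_t xor_delta_encode(uint32_t prev, uint32_t next)  { return prev ^ next; }
        inline uint32_t xor_delta_decode(uint32_t prev, uint32_t delta) { return prev ^ delta; }

        // Modulo-delta: has to be applied per 2-bit selector field.
        inline uint32_t modulo_delta_decode(uint32_t prev, uint32_t delta)
        {
            uint32_t out = 0;
            for (int i = 0; i < 16; ++i)
            {
                uint32_t a = (prev  >> (2 * i)) & 3;
                uint32_t d = (delta >> (2 * i)) & 3;
                out |= ((a + d) & 3) << (2 * i);
            }
            return out;
        }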
    
    Note: This modification alters the output file format and makes it incompatible with the previous revisions.
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.899 sec
    Modified: 1468204 bytes / 13.353 sec
    Improvement: 7.21% (compression ratio) / 53.79% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.985 sec
    Modified: 1914805 bytes / 18.111 sec
    Improvement: 7.28% (compression ratio) / 51.03% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1607858 bytes
    Total time: 17.356 sec
    Average bitrate: 1.363 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Jul 19, 2017
    SHA: 54d4084

Commits on Jul 17, 2017

  1. Improve selector weight computation for ETC1 encoding

    This change improves compression ratio for ETC1 encoding.
    
    Explanation:
    
    When computing endpoint weights for ETC1 encoding, it is possible to use delta luma instead of the Euclidean distance between the outer endpoint colors, as it gives approximately the same result.
    
    When computing selector weight, it is important to take into account the following factors:
    - The bigger the difference between the outer endpoint colors, the bigger the error that the corresponding selector can introduce, and therefore the bigger the weight of that selector should be. In the original Crunch algorithm, the selector weight is proportional to the squared distance between the outer endpoint colors. Such optimization improves PSNR, but it might also introduce significant distortion in smooth areas of the image. In order to mitigate this effect, it is proposed to limit the maximum difference between the endpoint colors (currently the delta luma is clamped to 100).
    - Blocks with a low difference between the outer endpoint colors introduce relatively small errors, so their selectors should have smaller weights. In the original algorithm this is achieved by using the squared distance between the outer endpoint colors, though the effect can be amplified further by using powers higher than 2 (currently 2.7), which improves PSNR.
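    For illustration, a minimal sketch of such a weight function (the function name is hypothetical; the constants match the values mentioned above):
    
        #include <algorithm>
        #include <cmath>
        
        // The selector weight grows with the luma difference between the outer endpoint colors,
        // but the difference is clamped (to limit distortion in smooth areas) and raised to a
        // power higher than 2 (to further de-emphasize low-contrast blocks).
        inline float etc1_selector_weight(float delta_luma)
        {
            const float max_delta_luma = 100.0f;
            const float weight_power = 2.7f;
            return std::pow(std::min(delta_luma, max_delta_luma), weight_power);
        }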
    
    In the original Crunch algorithm the encoding weights are initialized asymmetrically (they are set to math::lerp(1.15f, 1.0f, 1.0f / 7.0f) for a horizontal split and to math::lerp(1.15f, 1.0f, 2.0f / 7.0f) for a vertical split). It is proposed to use the same encoding weight for both splits in the case of ETC1 (the coefficient 0.972 used here has been computed as math::lerp(1.15f, 1.0f, 1.5f / 7.0f) / 1.15f).
    
    The ETC1 quantization parameters have been adjusted accordingly to preserve the average Luma PSNR.
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.843 sec
    Modified: 1473711 bytes / 13.312 sec
    Improvement: 6.86% (compression ratio) / 53.85% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.962 sec
    Modified: 1920600 bytes / 18.122 sec
    Improvement: 7.00% (compression ratio) / 50.97% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1612083 bytes
    Total time: 17.351 sec
    Average bitrate: 1.367 bpp
    Average Luma PSNR: 34.050 dB
    Alexander Suvorov committed Jul 17, 2017
    SHA: e972e0b

Commits on Jul 12, 2017

  1. Use diagonal endpoint references for ETC1 encoding

    This change slightly improves compression ratio for ETC1 encoding, and also demonstrates how to adjust the endpoint reference configuration for a specific texture format.
    
    Note: This modification alters the output file format for ETC1 encoding and makes it incompatible with the previous revisions.
    
    Explanation:
    
    In addition to the standard endpoint references (to the top and to the left ETC1 blocks), it is also possible to use an endpoint reference to the top-left diagonal neighbour ETC1 block. Specifically, the first ETC1 subblock will now have the reference value of 3 if the endpoint is copied from the second subblock of the top-left neighbour ETC1 block.
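    A minimal decoder-side sketch of the extended reference scheme (the structure and names are hypothetical, not the actual decoder types):
    
        #include <cstdint>
        
        struct decoded_etc1_block { uint16_t subblock_endpoint[2]; }; // endpoints of an already decoded neighbour block
        
        inline uint16_t resolve_first_subblock_endpoint(int ref, uint16_t endpoint_from_stream,
                                                        const decoded_etc1_block& left,
                                                        const decoded_etc1_block& top,
                                                        const decoded_etc1_block& top_left)
        {
            switch (ref)
            {
                case 1:  return left.subblock_endpoint[1];     // second subblock of the left neighbour
                case 2:  return top.subblock_endpoint[0];      // first subblock of the top neighbour
                case 3:  return top_left.subblock_endpoint[1]; // second subblock of the top-left neighbour (new)
                default: return endpoint_from_stream;          // 0: decoded from the input stream
            }
        }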
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.895 sec
    Modified: 1473711 bytes / 13.356 sec
    Improvement: 6.86% (compression ratio) / 53.78% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.979 sec
    Modified: 1920600 bytes / 18.083 sec
    Improvement: 7.00% (compression ratio) / 51.10% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1633207 bytes
    Total time: 17.434 sec
    Average bitrate: 1.384 bpp
    Average Luma PSNR: 34.057 dB
    Alexander Suvorov committed Jul 12, 2017
    SHA: a004490

Commits on Jul 11, 2017

  1. Use endpoint references for all the ETC1 subblocks

    This change significantly improves compression ratio for ETC1 encoding.
    
    Note: This modification alters the output file format for ETC1 encoding and makes it incompatible with the previous revisions.
    
    Explanation:
    
    Previously, for simplicity, endpoint references for ETC1 encoding have been computed only within the tiling area. Now endpoint references are computed for all the ETC1 subblocks. This means that endpoints can now be inherited from the surrounding ETC1 blocks, which significantly improves the compression ratio.
    
    Endpoint references for ETC1 subblocks are encoded in the following way:
    - The first ETC1 subblock has the reference value of 0 if the endpoint is decoded from the input stream, the value of 1 if the endpoint is copied from the second subblock of the left neighbour ETC1 block, and the value of 2 if the endpoint is copied from the first subblock of the top neighbour ETC1 block.
    - The second ETC1 subblock has the reference value of 0 if the endpoint is copied from the first subblock, the value of 1 if the endpoint is decoded from the input stream and the corresponding ETC1 block is split horizontally, and the value of 2 if the endpoint is decoded from the input stream and the corresponding ETC1 block is split vertically.
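    A minimal decoder-side sketch of the second subblock case (hypothetical names): the reference value selects the endpoint source and, when the endpoint is read from the stream, also determines the split direction:
    
        #include <cstdint>
        
        inline void resolve_second_subblock(int ref, uint16_t first_endpoint, uint16_t endpoint_from_stream,
                                            uint16_t& second_endpoint, bool& vertical_split)
        {
            if (ref == 0)
            {
                second_endpoint = first_endpoint; // copied from the first subblock, split direction is irrelevant
                vertical_split = false;
            }
            else
            {
                second_endpoint = endpoint_from_stream;
                vertical_split = (ref == 2); // 1: horizontal split, 2: vertical split
            }
        }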
    
    DXT Testing:
    
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.901 sec
    Modified: 1473711 bytes / 13.353 sec
    Improvement: 6.86% (compression ratio) / 53.80% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.997 sec
    Modified: 1920600 bytes / 18.096 sec
    Improvement: 7.00% (compression ratio) / 51.09% (compression time)
    
    ETC Testing:
    
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1639063 bytes
    Total time: 17.421 sec
    Average bitrate: 1.389 bpp
    Average Luma PSNR: 34.057 dB
    Alexander Suvorov committed Jul 11, 2017
    SHA: 7402f3d

Commits on Jul 10, 2017

  1. Use modulo deltas for selector codebook encoding

    This change improves compression ratio for both DXT and ETC encodings.
    
    Explanation:
    
    In the original version of Crunch, the selector codebook is encoded with Huffman coding applied to the raw deltas between corresponding pixel selectors of neighbouring codebook elements. However, using Huffman coding for raw deltas has a downside. Specifically, for each individual pixel selector, only about half of all the possible raw deltas are valid. Indeed, once the value of the current selector is determined, the selector delta depends only on the next selector value, so only N out of 2 * N - 1 total raw delta values are possible at any specific point. And yet, the impossible raw delta values are encoded with non-zero probability, as the probability table is calculated over the whole codebook.
    
    The situation can be improved by using modulo deltas instead of raw deltas (modulo 4 for color selectors and modulo 8 for alpha selectors). This eliminates the mentioned implicit restriction on the value of the selector delta, and therefore improves the compression ratio. The distance maps are initialized using squared distances between the selector values (the distances are calculated on a wrapped interval, according to modulo arithmetic).
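    A minimal sketch for 2-bit color selectors (3-bit alpha selectors are analogous with a modulo-8 mask):
    
        // Every modulo delta value 0..3 is reachable regardless of the previous selector value,
        // so the Huffman probability table does not waste code space on impossible symbols.
        inline unsigned color_selector_delta(unsigned prev, unsigned cur) { return (cur - prev) & 3u; }
        inline unsigned color_selector_apply(unsigned prev, unsigned delta) { return (prev + delta) & 3u; }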
    
    DXT Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch (revision ea9b8d8).
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.870 sec
    Modified: 1473711 bytes / 13.286 sec
    Improvement: 6.86% (compression ratio) / 53.98% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.991 sec
    Modified: 1920600 bytes / 18.035 sec
    Improvement: 7.00% (compression ratio) / 51.24% (compression time)
    
    ETC Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1681327 bytes
    Total time: 17.403 sec
    Average bitrate: 1.425 bpp
    Average Luma PSNR: 34.057 dB
    Alexander Suvorov committed Jul 10, 2017
    SHA: e3c1c6b

Commits on Jul 7, 2017

  1. Use 4x4 selector dictionary for ETC1 compression

    This change significantly improves the ETC1 compression ratio.
    
    Explanation:
    
    As has been shown in the previous commit, each element of the ETC1 endpoint dictionary should correspond to a single ETC1 base color. In order to achieve near-lossless compression with an unlimited dictionary, it has been proposed to use 4x2 or 2x4 ETC1 subblocks as building elements, each defined by a single endpoint and selector. This scheme is equivalent to the original DXT compression scheme, except for the different size of the block defined by the dictionary elements.
    
    Now let's pay attention to the following interesting observation. Even though in the original DXT compression scheme the dictionaries are defined so that both endpoints and selectors from the dictionaries correspond to the same decoded block size (4x4 in the case of DXT), the Crunch algorithms do not actually require this. In fact, the selector dictionary and indices are defined after the endpoint optimization is complete. At this point each image pixel is already associated with a specific endpoint. At the same time, the selector computation step only uses those per-pixel endpoint associations as input information, so the size and the shape of the blocks defined by the selector dictionary elements do not depend in any way on the size or shape of the blocks defined by the endpoint dictionary elements.
    
    In other words, the endpoint space of the texture can be split into one set of blocks, defined by the endpoint dictionary and endpoint indices, while the selector space of the texture can be split into a completely different set of blocks, defined by the selector dictionary and selector indices. Endpoint blocks can differ in size from the selector blocks, and endpoint blocks can overlap with the selector blocks in an arbitrary way, and such a setup will still be fully compatible with the existing Crunch algorithms.
    
    In the current commit, the size of the block, defined by an ETC1 selector dictionary element, has been set to 4x4, which significantly improves the compression ratio (the ETC1 quantization parameters have been adjusted to preserve the average Luma PSNR).
    
    Future research:
    The discovered property of the Crunch algorithms opens another dimension for optimization of the compression ratio. Specifically, the quality of the compressed selectors can now be adjusted in two ways: by changing the size of the selector dictionary and by changing the size of the selector block. Note that both DXT and ETC formats have selectors encoded as plain bits in the output format, so there is no technical limitation on the size or shape of the selector block (though, for performance reasons, non-power-of-two selector blocks might require some specific optimizations in the decoder).
    
    DXT Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.859 sec
    Modified: 1482780 bytes / 13.326 sec
    Improvement: 6.28% (compression ratio) / 53.82% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.996 sec
    Modified: 1931586 bytes / 18.121 sec
    Improvement: 6.47% (compression ratio) / 51.02% (compression time)
    
    ETC Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1692204 bytes
    Total time: 17.528 sec
    Average bitrate: 1.434 bpp
    Average Luma PSNR: 34.057 dB
    Alexander Suvorov committed Jul 7, 2017
    SHA: 205f8a1

Commits on Jul 5, 2017

  1. Add compression support for ETC1 textures

    Explanation:
    
    Crunch algorithms are normally used for compression of DXTn textures. However, Crunch algorithms are much more powerful, and with some minor adjustments, those algorithms can be directly used to compress other texture formats. For example, the current commit demonstrates how to use the existing Crunch algorithms to compress ETC1 textures.
    
    Basics:
    
    In general, Crunch is performing the following steps:
    - tiling (determines block encodings)
    - quantization of the tile endpoints (determines endpoint indices)
    - optimization of the endpoints for each tile group (determines endpoint dictionary)
    - quantization of the selectors (determines selector indices)
    - selector refinement for each selector group (determines selector dictionary)
    - compression of the previously determined block encodings, dictionaries and indices
    
    Dictionary element:
    
    When applying Crunch algorithms to a new texture format, it is necessary to first define the dictionary element. In the context of Crunch, this means that the whole image consists of smaller non-overlapping blocks, while the contents of each individual block are determined by an endpoint and a selector from the corresponding dictionaries. For example, in the case of the DXT format, each endpoint and selector codebook element corresponds to a 4x4 pixel block. In general, the size of the blocks which form the encoded image depends on the texture format and quality considerations.
    
    It is proposed to define the dictionaries according to the following limitations:
    - The dictionary elements should be compatible with the existing Crunch algorithms, while the image blocks defined by those dictionary elements should be compatible with the texture encoding format.
    - It should be possible to cover a wide range of image quality and bitrates by just changing the size of the endpoint and selector dictionaries. If there is no limitation on the dictionary size, the encoding should preferably become lossless or near-lossless (not considering the quality loss implied by the texture format itself).
    
    In the case of ETC1, the texture format itself determines the minimal size of the image block defined by an endpoint and a selector: it can be either a 2x4 or a 4x2 rectangle, aligned to the borders of the 4x4 grid. It is not possible to use higher granularity, because each of those rectangles can have only one base color, according to the ETC1 format. For the same reason, any image block defined by an endpoint and a selector from the dictionary should be combined from those aligned 2x4 or 4x2 rectangles.
    
    Let's investigate the possibilities for the endpoint dictionary. According to the ETC1 format, each 4x4 ETC1 block is split in half, and each ETC1 subblock has its own base color and modifier table index. In fact, the base color and the modifier table index simply define the high and the low colors for the subblock (with some limitations on the position of those high and low colors implied by the ETC1 encoding). If we define the endpoint dictionary element in such a way that it contains information about more than one ETC1 base color, then such a dictionary will become incompatible with the existing tile quantization algorithm, for the following reason. The Crunch tiling algorithm first performs quantization of all the tile pixel colors, down to just 2 colors. Then it quantizes all those color pairs coming from different tiles. This approach works quite well for 4x4 DXT blocks, as those 2 colors approximately represent the principal component of the tile pixel colors. In the case of ETC1, however, mixing together pixels which correspond to different base colors does not make much sense, as each group of those pixels has its own low and high color values, independent from the other groups. When those pixels are mixed together, the information about the original principal components of each subblock gets lost.
    
    For the mentioned reason, each endpoint dictionary element should correspond to a single ETC1 base color. In that case, the tile quantization algorithm will work almost the same way as for the DXT format. Each pair of colors generated by the tile palettizer will normally have the subblock base color value somewhere in the middle between those 2 colors, so quantizing those color pairs should also automatically quantize the corresponding base colors. Moreover, each color pair implicitly contains information about the modifier table index (which corresponds to the distance between the high and the low colors), and therefore the corresponding table index will also get automatically quantized.
    
    Endpoint and selector dictionary elements which define a single 2x4 or 4x2 ETC1 subblock are fully compatible with the existing Crunch algorithms (because each ETC1 subblock is associated with a single base color and a single modifier table index). At the same time, those subblocks are the smallest possible blocks that can be defined by a dictionary element for the ETC1 format (as has been shown earlier). Of course, it is also possible to use blocks larger than 2x4 or 4x2 (assuming that all the ETC1 subblocks which form such a block will have the same base color and the same modifier table index); however, with a larger block area it would not be possible to achieve near-lossless quality when the dictionary size is not limited.
    
    As a result, it is proposed to define the dictionaries in the following way:
    - Each element of the endpoint dictionary defines a single base color and a single modifier table index of a 2x4 or a 4x2 pixel block (which represents an ETC1 subblock).
    - Each endpoint is encoded as 3555 (3 bits for the table index and 5 bits for each component of the base color); a packing sketch follows this list.
    - Each element of the selector dictionary defines selectors for a 2x4 or a 4x2 block.
    - Each selector is encoded using 16 bits.
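    As an illustration of the 3555 endpoint layout (the exact bit order used by the packing code is not specified here and may differ):
    
        #include <cstdint>
        
        // 3 bits of modifier table index followed by three 5-bit base color components, 18 bits total.
        inline uint32_t pack_etc1_endpoint_3555(uint32_t table_index, uint32_t r5, uint32_t g5, uint32_t b5)
        {
            return ((table_index & 7u) << 15) | ((r5 & 31u) << 10) | ((g5 & 31u) << 5) | (b5 & 31u);
        }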
    
    ETC1-specific adjustments:
    
    In the case of DXT, the size of the encoded block is 4x4, while the tiling is performed in an 8x8 area (4 blocks). In the case of ETC1, the tiling can be performed either in a 4x4 area (2 blocks) or in an 8x8 area (8 blocks), while other possibilities are either not symmetrical or too complex. For simplicity it is proposed to use a 4x4 area for tiling. There are therefore 3 possible encodings: the 4x4 block is not split (encoded with a single endpoint), the 4x4 block is split horizontally, or the 4x4 block is split vertically.
    
    For simplicity, endpoint references are currently determined only within the tiling area, while the encoding of the endpoint references has been adjusted in the following way:
    - The first ETC1 subblock will always have the reference value of 0
    - The second ETC1 subblock can have the reference value of 0 if it has the same endpoint as the first subblock (note that in such case the flip of the ETC1 block does not need to be defined), the value of 1 if the corresponding ETC1 block is split horizontally, and the value of 2 if the corresponding ETC1 block is split vertically
    
    According to the ETC1 format, the base colors within an ETC1 block can be encoded either individually as 444 and 444, or differentially as 555 and 333. For simplicity, this aspect is currently not taken into account (all the endpoints are encoded as 3555 in the codebook). If the base colors in the resulting ETC1 block cannot be encoded differentially, the decoder will convert both base colors from 555 to 444.
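    A minimal decoder-side sketch of the feasibility check, assuming 5-bit base color components (the RGB333 differential part is a two's complement value, so each per-component delta must fall into the range [-4, 3]):
    
        // Returns true if the second base color can be encoded differentially relative to the first one.
        inline bool can_encode_differentially(const int base0[3], const int base1[3])
        {
            for (int i = 0; i < 3; i++)
            {
                int d = base1[i] - base0[i];
                if ((d < -4) || (d > 3))
                    return false; // fall back to individual mode: truncate both base colors from 555 to 444
            }
            return true;
        }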
    
    At first, it might look like the ETC1 block flipping can bring some complications for Crunch, as the subblock structure might not form a regular grid. This can be easily resolved by mirroring all the vertical ETC1 blocks across the main diagonal of the block after the tiling step (so that all the ETC1 subblocks become 4x2 and form a regular grid). The decoder can mirror the ETC1 selectors back according to the decoded block flip.
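    A minimal sketch of the mirroring step, applied to a 4x4 grid of per-pixel selectors (a simple transpose across the main diagonal):
    
        #include <utility>
        
        // After this, a vertically split ETC1 block can be processed as two 4x2 subblocks.
        inline void mirror_selectors_diagonally(unsigned char selectors[4][4])
        {
            for (int y = 0; y < 4; y++)
                for (int x = y + 1; x < 4; x++)
                    std::swap(selectors[y][x], selectors[x][y]);
        }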
    
    The code adjustments for the ETC1 compression support are pretty straightforward and mostly trivial. Just note that when format-specific adjustments affect performance-critical code, it makes sense to duplicate the body of the affected function and perform format-specific optimizations in each copy of the function individually. For performance reasons, the following 4 functions now have both ETC- and DXT-specific versions:
    - determine_tiles_task_etc() is an ETC-optimized version of the determine_tiles_task(), where dxt_fast class has been replaced with the etc1_optimizer class.
    - determine_color_endpoint_codebook_task_etc() is an ETC-optimized version of the determine_color_endpoint_codebook_task(), where dxt1_endpoint_optimizer class has been replaced with the etc1_optimizer class.
    - pack_color_endpoints_etc() is an ETC-optimized version of the pack_color_endpoints(), where 565565 DXT color endpoint encoding has been replaced with 3555 ETC color endpoint encoding.
    - unpack_etc1() is an ETC version of the unpack_dxt1() function.
    
    The color_quality_power_mul and m_adaptive_tile_color_psnr_derating parameters for the ETC1 format have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the equivalent DXT1 compression when compressing the Kodak test set without mipmaps using default quality.
    
    In order to use ETC1 compression, use the -ETC1 command line option (e.g. "crunch_x64.exe -ETC1 input.png"). By default, compressed ETC1 textures will be decompressed into the KTX file format.
    
    DXT Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps using DXT1 encoding]
    Original: 1582222 bytes / 28.876 sec
    Modified: 1482780 bytes / 13.255 sec
    Improvement: 6.28% (compression ratio) / 54.10% (compression time)
    
    [Compressing Kodak set with mipmaps using DXT1 encoding]
    Original: 2065243 bytes / 36.987 sec
    Modified: 1931586 bytes / 18.068 sec
    Improvement: 6.47% (compression ratio) / 51.15% (compression time)
    
    ETC Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
    
    [Compressing Kodak set without mipmaps using ETC1 encoding]
    Total size: 1887265 bytes
    Total time: 14.954 sec
    Average bitrate: 1.600 bpp
    Average Luma PSNR: 34.049 dB
    Alexander Suvorov committed Jul 5, 2017
    SHA: f284523

Commits on Jun 16, 2017

  1. Optimize selector codebook creation algorithm

    This change significantly improves compression speed.
    
    Explanation:
    When generating the selector codebook, pixel selectors can be processed in groups, while the intermediate error results for those groups can be precalculated.
    
    Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps]
    Original: 1582222 bytes / 28.865 sec
    Modified: 1482780 bytes / 13.340 sec
    Improvement: 6.28% (compression ratio) / 53.78% (compression time)
    
    [Compressing Kodak set with mipmaps]
    Original: 2065243 bytes / 36.988 sec
    Modified: 1931586 bytes / 18.087 sec
    Improvement: 6.47% (compression ratio) / 51.10% (compression time)
    Alexander Suvorov committed Jun 16, 2017
    SHA: 39b85b7

Commits on Jun 14, 2017

  1. Optimize endpoint and selector sorting algorithms

    This change significantly improves compression speed.
    
    Explanation:
    The main ideas used for the endpoint and selector sorting optimization:
    - unpacked color and alpha endpoints can be cached
    - pixel selectors can be processed in groups, while the intermediate error results for those groups can be precalculated
    - instead of maintaining a mask of the processed elements, the remaining elements can be reorganized to form a contiguous block on each iteration (the last remaining element is moved into the position of the processed element; see the sketch after this list)
    - after optimization, endpoint sorting works significantly faster than endpoint reordering, so the overall performance can be improved by moving selector optimization into the endpoint sorting thread
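    A minimal sketch of the reorganization idea mentioned above (generic code, not the actual Crunch containers):
    
        #include <vector>
        
        // Instead of marking the element at 'index' as processed, overwrite it with the last
        // remaining element, so the unprocessed elements always form a contiguous block.
        template <typename T>
        inline void swap_remove(std::vector<T>& remaining, std::size_t index)
        {
            remaining[index] = remaining.back();
            remaining.pop_back();
        }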
    
    Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps]
    Original: 1582222 bytes / 28.863 sec
    Modified: 1482780 bytes / 14.564 sec
    Improvement: 6.28% (compression ratio) / 49.54% (compression time)
    
    [Compressing Kodak set with mipmaps]
    Original: 2065243 bytes / 36.968 sec
    Modified: 1931586 bytes / 19.717 sec
    Improvement: 6.47% (compression ratio) / 46.66% (compression time)
    Alexander Suvorov committed Jun 14, 2017
    SHA: eee6b26

Commits on Jun 9, 2017

  1. Improve and optimize the endpoint reordering algorithm

    This change significantly improves the compression ratio and compression speed.
    
    Explanation:
    After the endpoint codebook has been determined, the endpoints can be reordered in order to improve the compression ratio. On the one hand, endpoint indices of the neighbouring blocks should be similar, as the encoder compresses the deltas between those neighbouring indices. On the other hand, neighbouring endpoints in the codebook should also be similar, as the encoder compresses the deltas between the color components of those neighbouring endpoints. The optimization is based on Zeng's technique, using a weighted function which takes into account both the similarity of the endpoint indices for the neighbouring blocks and the similarity of the neighbouring endpoints in the codebook.
    
    The similarity of the endpoint indices is optimized using the combined neighborhood frequency of the candidate endpoint and all the currently selected endpoints in the list. The similarity of the neighbouring endpoints in the codebook is optimized using the Euclidean distance from the candidate endpoint to the extremity of the selected endpoints list. The original optimization function for the endpoint candidate (i) can be represented as:
    
    F(i) = (total_neighborhood_frequency(i) + 1) * (endpoint_similarity(i) + 1)
    
    The problem with this approach is the following. While endpoint_similarity(i) has a limited range of values, total_neighborhood_frequency(i) grows rapidly with the increasing size of the selected endpoints list. With each iteration this introduces additional imbalance into the weighted function. In order to minimize this effect, it is proposed to normalize total_neighborhood_frequency(i) on each iteration. For computational simplicity, the normalizer is computed as the optimal total_neighborhood_frequency value from the previous iteration, multiplied by a constant. The modified optimization function can be represented as:
    
    F(i) = (total_neighborhood_frequency(i) + total_neighborhood_frequency_normalizer) * (endpoint_similarity(i) + 1)
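    A minimal sketch of one greedy selection step using the normalized weight (the data layout is hypothetical, and the constant used for the normalizer below is an assumption, not the value used in the actual code):
    
        #include <cstdint>
        #include <vector>
        
        // Picks the candidate maximizing F(i); 'candidates' is assumed to be non-empty, and the
        // frequency/similarity arrays are assumed to be updated on every iteration.
        inline std::size_t pick_next_endpoint(const std::vector<std::size_t>& candidates,
                                              const std::vector<uint64_t>& total_neighborhood_frequency,
                                              const std::vector<uint64_t>& endpoint_similarity,
                                              uint64_t prev_best_frequency)
        {
            const uint64_t normalizer = prev_best_frequency / 2 + 1; // assumed constant factor
            std::size_t best_candidate = candidates[0];
            uint64_t best_score = 0;
            for (std::size_t i : candidates)
            {
                uint64_t score = (total_neighborhood_frequency[i] + normalizer) * (endpoint_similarity[i] + 1);
                if (score > best_score)
                {
                    best_score = score;
                    best_candidate = i;
                }
            }
            return best_candidate;
        }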
    
    The main ideas used for endpoint reordering optimization:
    - all the computations, which are common for the endpoint reordering threads, have been moved outside of the threads
    - the ordering histogram offsets, which point to the neighborhood frequency values for a specific endpoint, are now cached, which reduces the number of multiplications when accessing the histogram
    - floating point operations have been replaced with integer operations
    
    Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps]
    Original: 1582222 bytes / 28.873 sec
    Modified: 1482726 bytes / 15.791 sec
    Improvement: 6.29% (compression ratio) / 45.31% (compression time)
    
    [Compressing Kodak set with mipmaps]
    Original: 2065243 bytes / 36.925 sec
    Modified: 1931475 bytes / 20.970 sec
    Improvement: 6.48% (compression ratio) / 43.21% (compression time)
    Alexander Suvorov committed Jun 9, 2017
    SHA: f1d6a5a

Commits on Jun 7, 2017

  1. Completely remove all the chunk related code from the encoder and decoder
    
    This change slightly improves compression speed and simplifies further modification of the code.
    
    Explanation:
    Additional performance boost is achieved by using a linear representation for selectors and storing the selectors of a block in a single uint32/uint64.
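    A minimal sketch of the packed linear representation for 2-bit color selectors (3-bit alpha selectors would use a uint64 in the same way):
    
        #include <cstdint>
        
        // 16 selectors, 2 bits each, stored in pixel order.
        inline uint32_t pack_color_selectors(const uint8_t selectors[16])
        {
            uint32_t packed = 0;
            for (int i = 0; i < 16; i++)
                packed |= (uint32_t)(selectors[i] & 3u) << (i * 2);
            return packed;
        }
        
        inline unsigned get_color_selector(uint32_t packed, int i)
        {
            return (packed >> (i * 2)) & 3u;
        }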
    
    Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps]
    Original: 1582222 bytes / 28.927 sec
    Modified: 1494501 bytes / 17.301 sec
    Improvement: 5.54% (compression ratio) / 40.19% (compression time)
    
    [Compressing Kodak set with mipmaps]
    Original: 2065243 bytes / 36.992 sec
    Modified: 1945365 bytes / 22.548 sec
    Improvement: 5.80% (compression ratio) / 39.05% (compression time)
    Alexander Suvorov committed Jun 7, 2017
    SHA: 5822475

Commits on Jun 2, 2017

  1. Switch from chunk encoding to block encoding while performing image quantization
    
    This change improves compression speed and simplifies further modification of the code.
    
    Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps]
    Original: 1582222 bytes / 28.947 sec
    Modified: 1494501 bytes / 17.642 sec
    Improvement: 5.54% (compression ratio) / 39.05% (compression time)
    
    [Compressing Kodak set with mipmaps]
    Original: 2065243 bytes / 36.965 sec
    Modified: 1945365 bytes / 22.989 sec
    Improvement: 5.80% (compression ratio) / 37.81% (compression time)
    Alexander Suvorov committed Jun 2, 2017
    SHA: e7d458a

Commits on May 31, 2017

  1. Switch from chunk encoding to block encoding after the tile computation

    This change improves compression speed and simplifies further modification of the code.
    
    Explanation:
    This change is required for further optimization of the tile computation code. Additional performance boost is achieved by moving the tile palettizing into the tile computation thread.
    
    Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps]
    Original: 1582222 bytes / 28.928 sec
    Modified: 1494501 bytes / 18.259 sec
    Improvement: 5.54% (compression ratio) / 36.88% (compression time)
    
    [Compressing Kodak set with mipmaps]
    Original: 2065243 bytes / 36.978 sec
    Modified: 1945365 bytes / 23.857 sec
    Improvement: 5.80% (compression ratio) / 35.48% (compression time)
    Alexander Suvorov committed May 31, 2017
    SHA: cd9ba9b

Commits on May 19, 2017

  1. Optimize selector quantization, assignment and refinement

    This change significantly improves compression speed.
    
    Explanation:
    The main ideas used for selector computations optimization:
    - possible pixel values for each endpoint can be cached
    - the distances between the possible pixel values and the actual pixel values within a block can be cached for fast error computation during selector assignment (see the sketch after this list)
    - selector refinement can be efficiently integrated with the selector assignment, as it is based on the same set of cached error values
    - using block encoding instead of chunk encoding for both endpoints and selectors eliminates extra levels of indirection
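    A minimal sketch of the first two ideas for a single DXT1 block (hypothetical layout; in the actual code the per-pixel errors are additionally grouped and the group sums precalculated):
    
        #include <cstdint>
        
        struct rgb { uint8_t r, g, b; };
        
        inline uint32_t squared_error(const rgb& a, const rgb& b)
        {
            int dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
            return (uint32_t)(dr * dr + dg * dg + db * db);
        }
        
        // 'palette' is the cached set of 4 possible pixel colors for one candidate endpoint;
        // the best selector per pixel is a simple argmin over the cached errors.
        inline uint64_t assign_selectors(const rgb pixels[16], const rgb palette[4], uint8_t selectors[16])
        {
            uint64_t total_error = 0;
            for (int i = 0; i < 16; i++)
            {
                uint32_t best = UINT32_MAX;
                uint8_t best_s = 0;
                for (uint8_t s = 0; s < 4; s++)
                {
                    uint32_t e = squared_error(pixels[i], palette[s]);
                    if (e < best) { best = e; best_s = s; }
                }
                selectors[i] = best_s;
                total_error += best;
            }
            return total_error;
        }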
    
    Testing:
    The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6 GHz). All the decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.
    
    [Compressing Kodak set without mipmaps]
    Original: 1582222 bytes / 28.953 sec
    Modified: 1494501 bytes / 19.667 sec
    Improvement: 5.54% (compression ratio) / 32.07% (compression time)
    
    [Compressing Kodak set with mipmaps]
    Original: 2065243 bytes / 36.998 sec
    Modified: 1945365 bytes / 25.642 sec
    Improvement: 5.80% (compression ratio) / 30.69% (compression time)
    Alexander Suvorov committed May 19, 2017
    SHA: 7b6f456