Jaguar is a CUDA-accelerated Channel Optimized Scalar Quantizer (COSQ) generator. By passing it a file consisting of a series of doubles, Jaguar will generate near-optimal quantization points for the given sequence.
The command line arguments for jaguar are in the following format:
./jaguar -b "bit_rate" -t "training_length" -f "sequence_file"

where:
- bit_rate is the bit rate of the quantizer which Jaguar will produce. The bit rate must be between 1 and 10 inclusive.
- training_length is the length of the training sequence in the file. The length must be between 2^20 and 2^25 inclusive, and must be a power of 2.
- sequence_file is the name of the file, located in the same directory as the executable, containing training_length doubles.
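For illustration, the sketch below (hypothetical, not part of Jaguar's repository) generates a training file of 2^20 Gaussian doubles. It assumes the sequence file is read as raw, back-to-back binary doubles; adjust it if your build expects a different format.

```cpp
#include <fstream>
#include <random>
#include <vector>

int main() {
    // 2^20 doubles: the smallest training length Jaguar accepts.
    const std::size_t training_length = 1 << 20;

    std::mt19937_64 rng(42);
    std::normal_distribution<double> gaussian(0.0, 1.0);
    std::vector<double> training(training_length);
    for (double& x : training) x = gaussian(rng);

    // Assumption: Jaguar reads the sequence file as raw binary doubles.
    std::ofstream out("gaussian.bin", std::ios::binary);
    out.write(reinterpret_cast<const char*>(training.data()),
              static_cast<std::streamsize>(training.size() * sizeof(double)));
}
```

The resulting file could then be passed to Jaguar as, for example, ./jaguar -b 8 -t 1048576 -f gaussian.bin.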
Please see Getting Started for more details.
This project was inspired by MTHE 493 at Queen's University. During my final year in the Applied Mathematics and Engineering program, my team and I were tasked with creating an image transmission system which could maintain reasonable image fidelity. The main challenge for this project was generating the COSQs for image transmission, because they took a very long time to generate.
Using Python, I created a naive sequential implementation which generated COSQs. It worked fast enough for low bit rates and small training sizes, however it became impractical at high bit rates and training sizes larger than 100,000.
- Low bit rates are considered to be [1 .. 4] and high bit rates are [5 .. 10]
- Large training sizes are > 100,000
Even after I rewrote the implementation in C++, performance was still lacking at high bit rates and larger training sizes. This was particularly frustrating since the team was under tight deadlines and training an 8-bit quantizer with ~1,000,000 training elements took almost 2 hours.
After the course was over, I still wanted to investigate ways to speed up heavy computational workloads, which ultimately led to this project.
Below is a very brief introduction into scalar quantization and channel modelling.
A scalar quantizer is simply a map

$$Q: \mathbb{R} \rightarrow C$$

The quantizer is characterized by its codebook $C = \{c_1, c_2, \dots, c_N\} \subset \mathbb{R}$ (the set of quantization points) and by its partition of $\mathbb{R}$ into quantization cells $S_1, \dots, S_N$, where every $x \in S_i$ is mapped to the quantization point $c_i$.
It is clear that not every real number can be represented by a value in $C$ without losing accuracy, so why would one do this? In the case of image transmission, this idea is very powerful because images can be very large files. To accommodate the transmission of such large files, which would otherwise have high latency, one can instead reduce the precision of the image by quantizing the image data and send the quantized data rather than the raw image. Although the received image will not be identical to the original, its overall fidelity will not have changed much, provided that the channel error rate is not high.
The error between the original value and its associated quantization point is called the distortion. By representing $x$ with $Q(x)$, the (squared-error) distortion incurred is $(x - Q(x))^2$.
The rate of the scalar quantizer is $R = \log_2 N$, the number of bits needed to index one of the $N$ quantization points.
The logarithm is base 2 because the quantization points will be represented in binary. If we were using ternary numbering, the logarithm would be base 3.
Note: In the source code, I typically use levels to denote the number of quantization levels, and q_points to denote the codebook.
The goal of optimizing scalar quantizers is to minimize the distortion for a given rate or, equivalently, for a fixed number of quantization levels $N$.
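To make these definitions concrete, here is a small illustrative C++ sketch (not taken from Jaguar's source) that quantizes one value with a 2-bit codebook and reports the resulting distortion and rate; the codebook values are made up for the example.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Map x to the nearest quantization point in the codebook (squared-error criterion).
double quantize(double x, const std::vector<double>& q_points) {
    double best = q_points[0];
    for (double c : q_points) {
        if ((x - c) * (x - c) < (x - best) * (x - best)) best = c;
    }
    return best;
}

int main() {
    // A 2-bit quantizer: levels = 4, so the rate is log2(4) = 2 bits.
    std::vector<double> q_points = {-1.5, -0.5, 0.5, 1.5};
    double x = 0.73;
    double q = quantize(x, q_points);
    double distortion = (x - q) * (x - q);
    double rate = std::log2(q_points.size());
    std::printf("Q(%.2f) = %.2f, distortion = %.4f, rate = %.0f bits\n",
                x, q, distortion, rate);
}
```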
In addition to the error incurred from quantization, there is also error from sending data over a channel. The "channel", in the physical sense, is a medium through which electromagnetic signals are sent to transmit a message. From the mathematical perspective, we only care about the likelihood that our message is transmitted without any distortion.
To "model" the probability that the message was received correctly or incorrectly, Jaguar uses the Pólya Channel. The Pólya Channel is an example of a channel with memory. Modeling a Pólya channel generally uses 2 main parameters,
and
then use it to model
This channel is said to model the 'bursty' nature of noise in wireless channels, as when an error occurs it becomes more likely to occur in the next bit as well.
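As a concrete special case: when $\delta = 0$ the channel has no memory and behaves like a binary symmetric channel with crossover probability $\epsilon$, so the probability of receiving index $j$ when index $i$ was sent depends only on the number of differing bits. The sketch below (illustrative, not Jaguar's code) builds that transition matrix.

```cpp
#include <bitset>
#include <cmath>
#include <vector>

// Transition matrix p[i][j] = P(receive index j | send index i) for a memoryless
// binary symmetric channel (the delta = 0 special case) at the given bit rate.
std::vector<std::vector<double>> bsc_transition_matrix(int bit_rate, double epsilon) {
    const int levels = 1 << bit_rate;
    std::vector<std::vector<double>> p(levels, std::vector<double>(levels));
    for (int i = 0; i < levels; ++i) {
        for (int j = 0; j < levels; ++j) {
            // Number of bit positions in which indices i and j differ.
            int flips = static_cast<int>(std::bitset<32>(i ^ j).count());
            p[i][j] = std::pow(epsilon, flips) * std::pow(1.0 - epsilon, bit_rate - flips);
        }
    }
    return p;
}
```

With memory ($\delta > 0$) the transition probabilities also depend on the pattern of previous errors, which is exactly what makes the Pólya channel "bursty".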
Before diving into the COSQ algorithm, one first needs an initial codebook. Jaguar generates it with the "Splitting Technique" described in "A Study of Vector Quantization for Noisy Channels", p. 806, Section B.
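The general idea of the splitting technique, sketched below in plain C++ (a simplification of the paper's method, not Jaguar's implementation): start from the centroid of the training data and repeatedly split every quantization point into two slightly perturbed copies, doubling the codebook until it reaches the desired size. In the full algorithm each split is followed by re-optimization of the doubled codebook.

```cpp
#include <vector>

// Grow an initial codebook by repeated splitting: start with the training mean,
// then replace every point c with c*(1 - delta) and c*(1 + delta) until there are
// `levels` points. Real implementations re-optimize the codebook after each split.
std::vector<double> splitting_technique(const std::vector<double>& training,
                                        int levels,
                                        double delta = 0.001) {  // 0.001 matches the delta listed in the results section
    double mean = 0.0;
    for (double x : training) mean += x;
    mean /= training.size();

    std::vector<double> codebook = {mean};
    while (static_cast<int>(codebook.size()) < levels) {
        std::vector<double> split;
        for (double c : codebook) {
            split.push_back(c * (1.0 - delta));
            split.push_back(c * (1.0 + delta));
        }
        codebook = split;
        // (Re-optimization of the doubled codebook would happen here.)
    }
    return codebook;
}
```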
The COSQ algorithm involves three main operations. This algorithm is very similar to the Lloyd-Max algorithm, except it has been modified to account for channel transmission errors.
Consider the case where there is a fixed codebook $C = \{c_1, \dots, c_N\}$. The NNC says that every training element $x$ should be assigned to the cell of the quantization point nearest to it, i.e. $x \in S_i$ whenever $(x - c_i)^2 \le (x - c_k)^2$ for all $k$.
In the case of Jaguar, since the channel must be accounted for and it uses the squared error, the NNC becomes

$$x \in S_i \quad \text{if} \quad \sum_{j=1}^{N} p(j \mid i)\,(x - c_j)^2 \le \sum_{j=1}^{N} p(j \mid k)\,(x - c_j)^2 \quad \text{for all } k$$

where $p(j \mid i)$ is the probability that index $j$ is received given that index $i$ was sent over the channel.
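A direct, unoptimized CPU version of this assignment step might look like the sketch below (assuming the transition matrix layout p[i][j] = P(index j received | index i sent) from the earlier sketch); Jaguar performs the same computation in parallel on the GPU.

```cpp
#include <vector>

// Channel-aware nearest neighbour condition: assign each training element to the
// cell i that minimizes the expected squared error over all received indices j.
std::vector<int> nnc(const std::vector<double>& training,
                     const std::vector<double>& codebook,
                     const std::vector<std::vector<double>>& p) {
    const int levels = static_cast<int>(codebook.size());
    std::vector<int> cell_index(training.size());
    for (std::size_t t = 0; t < training.size(); ++t) {
        int best_i = 0;
        double best_cost = 1e300;
        for (int i = 0; i < levels; ++i) {
            double cost = 0.0;
            for (int j = 0; j < levels; ++j) {
                double err = training[t] - codebook[j];
                cost += p[i][j] * err * err;   // expected distortion if index i is sent
            }
            if (cost < best_cost) { best_cost = cost; best_i = i; }
        }
        cell_index[t] = best_i;
    }
    return cell_index;
}
```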
Given the fixed quantizer cells $S_1, \dots, S_N$ produced by the NNC, the CC states that each quantization point should be moved to the channel-weighted centroid of the training data:

$$c_j = \frac{\sum_{t=1}^{n} p(j \mid i(t))\, x_t}{\sum_{t=1}^{n} p(j \mid i(t))}$$

where $x_t$ is the $t$-th training element and $i(t)$ is the index of the cell it was assigned to.
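For the squared-error distortion used here, the centroid condition has this closed form, so the update is just a channel-weighted average. A sequential sketch under the same assumptions as the NNC sketch above:

```cpp
#include <vector>

// Centroid condition: update each quantization point c_j to the channel-weighted
// centroid of the training data, using the cell assignments produced by the NNC.
std::vector<double> cc(const std::vector<double>& training,
                       const std::vector<int>& cell_index,
                       const std::vector<std::vector<double>>& p,
                       int levels) {
    std::vector<double> numerator(levels, 0.0), denominator(levels, 0.0);
    for (std::size_t t = 0; t < training.size(); ++t) {
        int i = cell_index[t];                 // cell this element was assigned to
        for (int j = 0; j < levels; ++j) {
            numerator[j]   += p[i][j] * training[t];
            denominator[j] += p[i][j];
        }
    }
    std::vector<double> codebook(levels);
    for (int j = 0; j < levels; ++j) codebook[j] = numerator[j] / denominator[j];
    return codebook;
}
```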
The average overall distortion (both from channel and quantization) can be computed as

$$D = \frac{1}{n} \sum_{t=1}^{n} \sum_{j=1}^{N} p(j \mid i(t))\,(x_t - c_j)^2$$

where

- $N$ is the number of quantization points
- $n$ is the number of training sequence elements
- $p$ is the channel transition matrix
- $i(t)$ is the index of the quantizer cell to which training element $x_t$ belongs
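Written as code under the same assumptions as the earlier sketches, the average distortion is:

```cpp
#include <vector>

// Average distortion over the training sequence, accounting for both quantization
// error and channel transition probabilities.
double distortion(const std::vector<double>& training,
                  const std::vector<double>& codebook,
                  const std::vector<int>& cell_index,
                  const std::vector<std::vector<double>>& p) {
    double total = 0.0;
    for (std::size_t t = 0; t < training.size(); ++t) {
        for (std::size_t j = 0; j < codebook.size(); ++j) {
            double err = training[t] - codebook[j];
            total += p[cell_index[t]][j] * err * err;
        }
    }
    return total / training.size();
}
```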
Putting all the above pieces together, the COSQ algorithm for Jaguar is as follows:
Input: bit rate, training sequence, and channel transition probabilities
d_current = 0;
d_previous = DBL_MAX;
codebook = splitting_technique();
while(true) {
    NNC();
    CC();
    d_current = distortion();
    if((d_previous - d_current) / d_previous < THRESHOLD) {
        break;
    }
    d_previous = d_current;
}

- docs: documentation for the project
- measurements: execution times of Jaguar and the sequential implementation for various bit rates
- test_kernels: smaller projects used to profile and test the kernels used by Jaguar individually
- sequential: sequential COSQ algorithm implementation
- src: source code for Jaguar
- tests: test results (accuracy_test.sh) for various bit rates
This project was developed on Windows 11 WSL 2.
Hardware
- Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz (12 CPUs), ~3.7GHz
- NVIDIA GeForce RTX 2070
- 16384MB RAM
The results of this project are very positive. Jaguar provides a significant speedup compared to the sequential implementation written in C++. Please see the graphs below:
Important! These graphs were collected under the following parameters.
- Training size 1,048,576 (2^20).
- Splitting technique delta 0.001
- COSQ Threshold 0.01
- Pólya Channel parameters $\epsilon = 0, \delta = 0$
Although Jaguar performs very well in comparison, it can still be improved. Please see Improvements & Future work.
Because the project was developed on WSL, these performance measurements are not 100% accurate. There were certainly some background processes on Windows taking CPU time during the measurements, and the same is true for the GPU, since it was also rendering both of the monitors I have connected.
I would like to thank all the people who have helped me with this project.
- Dr. Fady Alajaji for his mathematical guidance.
- Dr. Tamas Linder for supervising my capstone project.
- Dr. Ahmad Afsahi for his guidance on CUDA related resources.
- AmirHossein Sojoodi for providing project improvements.
Below is a full list of the notable resources this project has used.
- Wen-mei W. Hwu, David B. Kirk, Izzat El Hajj - Programming Massively Parallel Processors: A Hands-on Approach, Elsevier (2023)
- Nariman Farvardin - A Study of Vector Quantization for Noisy Channels
- Kiraseya Preusser - Channel Optimized Scalar Quantization over Orthogonal Multiple Access Channels with Memory
- Mark Harris - Optimizing Parallel Reduction in CUDA
- An Algorithm for Vector Quantizer Design - Yoseph Linde, Andres Buzo, Robert M. Gray
- Channel Optimized Quantization of Images over Binary Channels with Memory - Julian Cheng
- Using Simulated Annealing to Design Good Codes - Abbas A. El Gamal, Lane A. Hemachandra, Itzhak Shperling, Victor K. Wei

