# Compressed Sensing (CS) based ECG compressor

# Theoretical Review

## Theory: sparsity and compression

### Understanding Sparsity
__Theoretical Sparsity__

A signal $s \in \mathbb{R}^n$ is considered $k$-sparse if it has exactly $k$ non-zero elements, with $k \ll n$. This means that $n-k$ elements of the signal are exactly zero.
$$
s = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{pmatrix}
$$
where exactly $k$ elements in $s$ are non-zero, and the remaining $n-k$ elements are zero.


__Practical Sparsity__

In real-world signals, _exact sparsity is rare_. Instead, signals are often _representable_ (see next section) as __approximately sparse__: only $k$ elements _of the sparse representation_ are significant and carry most of the signal's information, while the remaining $n-k$ elements have small, negligible values.

The difference lies in the fact that the $n-k$ coefficients are small but not exactly zero.

### Sparse Representation of Signals

"Most natural signals, such as images and audio, are highly compressible. This compressibility means that, when the signal is written in an appropriate basis, only a few modes are active, thus reducing the number of values that must be stored for an accurate representation. In other words, a compressible signal $x \in \mathbb{R}^n$ may be written as a sparse vector $s \in \mathbb{R}^n$ in a transform basis $\Psi \in \mathbb{C}^{n \times n}$:

$$
x = \Psi s.
$$

If the basis $\Psi$ is generic, such as the Fourier or wavelet basis, then only the few active terms in $s$ are required to reconstruct the original signal $x$, reducing the data required to store or transmit the signal." [2]

### Classic Transformation-Based Compression

A typical transformation-based compression algorithm involves the following steps:

1. __Signal capture__: 
    Fully sense the whole __raw__ signal $x$ and store it. In this project, $x$ is the sequence of _voltages_ measured by the ECG machine.
2. __Transformation to a sparse domain__:
    The signal $x$ is transformed to a domain in which it is sparse: the goal is to find the vector $s \in \mathbb{R}^n$ whose coefficients are mostly negligible.

    To do so we use an orthonormal basis matrix $\Psi \in \mathbb{C}^{n \times n}$, also called the __dictionary__. Because $\Psi$ is an orthonormal basis, it satisfies $\Psi^H \Psi = I$, where $\Psi^H$ is the Hermitian conjugate (conjugate transpose) of $\Psi$ and $I$ is the identity matrix. This implies that $\Psi^{-1} = \Psi^H$, which makes both the transformation and its inverse straightforward.

    Therefore, when $\Psi$ is an orthonormal basis, applying $\Psi^H$ to the signal inverts the synthesis $x = \Psi s$, and the sparse representation is obtained directly from the original signal:

    $$
    s = \Psi^H x
    $$

    __The use of transforms__:
    On a mathematical note, $\Psi$ is an orthonormal basis whose columns are functions such as Fourier modes or wavelets.
    In practice, computing $s$ does not require building the dictionary $\Psi$ explicitly, inverting it, and multiplying it by the signal. Instead, the corresponding _transform_ (e.g. FFT, DWT, DCT, ...) is applied directly to the signal $x$ to obtain the _sparse representation_ $s$.

3. __Sparsification__: 
    A fundamental concept is that a threshold is applied to the coefficients, retaining only those that are significant (i.e., above the threshold) and discarding the rest.

    A more detailed view reveals that these steps can be performed using a wide range of techniques, depending on the transform employed, and equivalently, on the choice of dictionary.

    _This is not explored further, as it is outside the scope of this project. It is a vast and interesting topic; the Brunton & Kutz book in the references [2] is a good starting point._

4. __Encoding__:
    The retained coefficients and their positions are then encoded for storage or transmission. 

    _This is another large topic that is not explored here; again, see the referenced book for more._
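
The four steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the dictionary is an orthonormal DCT built explicitly as a matrix (a fast transform would be used in practice), sparsification keeps the `keep` largest coefficients instead of applying a fixed threshold, and the signal is a smooth toy pulse, not ECG data. The names `dct_matrix`, `compress` and `decompress` are hypothetical helpers introduced here.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II analysis matrix C, so that s = C @ x and x = C.T @ s."""
    k = np.arange(n)[:, None]      # frequency index (rows)
    t = np.arange(n)[None, :]      # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi / n * (t + 0.5) * k)
    C[0, :] /= np.sqrt(2.0)        # rescale the constant row to unit norm
    return C

def compress(x, keep):
    """Transform, then keep only the `keep` largest-magnitude coefficients."""
    s = dct_matrix(len(x)) @ x                      # s = Psi^H x
    idx = np.sort(np.argsort(np.abs(s))[-keep:])    # positions of retained coefficients
    return idx, s[idx]                              # what would be encoded and stored

def decompress(idx, values, n):
    """Rebuild the sparse vector and apply the inverse transform."""
    s = np.zeros(n)
    s[idx] = values
    return dct_matrix(n).T @ s                      # x = Psi s

n = 256
t = np.linspace(0.0, 1.0, n)
x = np.exp(-((t - 0.5) / 0.1) ** 2)   # smooth toy pulse, NOT real ECG data
idx, values = compress(x, keep=32)
x_hat = decompress(idx, values, n)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"kept {len(idx)}/{n} coefficients, relative error {rel_err:.2e}")
```

Note that the whole signal must be acquired and transformed before anything can be discarded, which is the cost that the CS approach described next tries to avoid.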

__Complexity__

Such methods can be _extremely_ effective, but they require a _thresholding/sparsification_ step, which introduces non-linearity and computational complexity. 

The following sections show that CS-based methods offer an alternative with a different set of trade-offs.

<center>
    <img src="./.img/MethodsComparison.png" alt="MethodsComparison.png" width="600">
</center>

### Compressed Sensing (CS)

"Mathematically, compressed sensing exploits the _sparsity of a signal_ in a __generic basis__ to achieve full signal reconstruction from surprisingly few measurements.

If a __signal $x$ is k-sparse in $\Psi$ (it's a requirement),__ then instead of measuring $x$ directly (n measurements) and then compressing, it is possible to collect dramatically fewer randomly chosen or compressed measurements and then solve for the non-zero elements of s in the transformed coordinate system." [2]


#### Measurement

Instead of acquiring all $n$ samples, a reduced set of $m$ measurements is obtained directly by projecting the signal $x$  onto a measurement matrix $\Phi$, storing a _compressed measurement_ $y$:

$$
y = \Phi x
$$


where:
- $x \in \mathbb{R}^n$ _real_ signal coming from sensors
- $y \in \mathbb{R}^m$ _compressed measurement_
- $\Phi \in \mathbb{R}^{m \times n}$ with $m \ll n$ is the _measurement matrix_.

__Key concept__:

In the measurement phase the _sparse representation_ $s$ is __not__ computed: the _measurement matrix_ is applied directly to the _real_ signal $x$.

$\Phi$ does not simply "select" $m$ of the $n$ samples of $x$. Instead, $\Phi$ typically contains random or structured elements that ensure the measurements $y$ retain sufficient information to __later recover__ the sparse signal $s$.

Although the _signal $x$ itself is not sparse in the time domain_, __compressed sensing theory exploits the fact that $x$ can be sparsely represented in some transform domain__ (e.g., wavelet or Fourier domain).

_The measurement matrix is discussed in more detail in a later section._


#### Recovery

With knowledge of the _sparse representation_ $s \in \mathbb{R}^n$ of $x$ in the _dictionary_ $\Psi$, it is possible to recover $x$ itself, as previously shown, with:
$$
x = \Psi s
$$

Thus the goal of compressed sensing is to find the __sparsest__ vector $s$ that is consistent with:

$$
y = \Phi x = \Phi \Psi s
$$

where (again):
- $x \in \mathbb{R}^n$ _real_ signal coming from sensors
- $y \in \mathbb{R}^m$ _compressed measurement_
- $\Psi \in \mathbb{R}^{n \times n}$ is the _dictionary_ (same as explained in previous section)
- $\Phi \in \mathbb{R}^{m \times n}$ with $m \ll n$ is the _measurement matrix_.
- $s \in \mathbb{R}^n$ is the _sparse representation_ of $x$ in $\Psi$

__Non convex problem__

"Such system of equations is __under-determined__ since there are infinitely many consistent solution $s$. The __sparsest solution__ is the one that satisfies:

$$
\hat{s} = \arg_{s} \min \|s\|_0 \text{ subject to } y = \Phi \Psi s
$$

where $\|s\|_0$ denotes the $\ell_0$-pseudo-norm, given by the number of _non-zero entries_, also referred to as the _cardinality_ of $s$.

The optimization is non-convex, and in general, the solution can only be found with a brute-force search that is combinatorial in $n$ and $K$. In particular, all possible $K$-sparse vectors in $\mathbb{R}^n$ must be checked; if the exact level of sparsity $K$ is unknown, the search is even broader. Because this search is combinatorial, solving such minimization is intractable for even moderately large $n$ and $K$, and the prospect of solving larger problems does not improve with Moore’s law of exponentially increasing computational power."[2]

__Convex equivalent problem__

Fortunately, under certain conditions on the measurement matrix $\Phi$, it is possible to relax the optimization to a convex $\ell_1$-minimization.

$$
\hat{s} = \arg_{s} \min \|s\|_1 \text{ subject to } y = \Phi \Psi s
$$

__In the presence of noise__, the recovery problem is modified to:

$$
\hat{s} = \arg_{s} \min \|s\|_1 \text{ subject to } \|y - \Phi \Psi s\|_2 \leq \epsilon
$$

where $\epsilon$ is a bound on the noise level.
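
As a rough illustration of $\ell_1$-based recovery, the sketch below uses ISTA (iterative soft-thresholding), which solves the closely related unconstrained form $\min_s \frac{1}{2}\|y - As\|_2^2 + \lambda \|s\|_1$ and not the constrained problem above. Everything here is an assumption made for the example: $A$ is a random Gaussian matrix standing in for $\Phi \Psi$, the sparse vector is synthetic, and `lam` and `n_iter` are hand-picked values.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam=0.01, n_iter=3000):
    """Minimise 0.5 * ||y - A s||_2^2 + lam * ||s||_1 by proximal gradient descent."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ s - y)             # gradient of the quadratic data term
        s = soft_threshold(s - step * grad, step * lam)
    return s

rng = np.random.default_rng(0)
n, m = 64, 32
s_true = np.zeros(n)
s_true[[5, 20, 41]] = [1.5, -2.0, 1.0]           # a 3-sparse toy vector
A = rng.standard_normal((m, n)) / np.sqrt(m)     # stands in for Phi @ Psi
y = A @ s_true                                   # m < n compressed measurements
s_hat = ista(A, y)
rel_err = np.linalg.norm(s_hat - s_true) / np.linalg.norm(s_true)
print(f"relative recovery error: {rel_err:.3f}")
```

The penalty weight `lam` plays a role similar to $\epsilon$: larger values tolerate more noise at the price of shrinking the recovered coefficients.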

There are very specific conditions that must be met for the $\ell_1$-minimization to converge with high probability to the sparsest solution of $\ell_0$-minimization. They can be summarized as follows:
- __Incoherence__: 
    A critical concept in compressed sensing is the _incoherence_ between the measurement matrix $\Phi$ and the dictionary $\Psi$. Incoherence refers to the property that ensures that the rows of $\Phi$ are not too similar to the columns of $\Psi$. This incoherence is vital because it allows the sparse information in the signal $x$ (which is represented in the domain of $\Psi$) to be evenly spread across the measurements $y$. This spreading ensures that no single measurement in $y$ captures too much or too little information about the signal $x$, which is essential for accurate recovery of the sparse signal $s$ from the measurements $y$.

- __Recoverability Condition:__ 
    A $K$-sparse signal $s \in \mathbb{R}^n$ can be properly recovered after Compressive Sensing (CS) if the number of measurements $m$ satisfies:

    $$
    m \geq C K \log\left(\frac{n}{K}\right)
    $$

    where $C$ is a constant that depends on how __incoherent__ $\Phi$ and $\Psi$ are. This condition ensures that enough measurements are taken to accurately recover the sparse signal, accounting for both sparsity and the ambient dimension $n$.

    The recoverability condition is a practical guideline for how many measurements $m$ are needed so that a $K$-sparse signal $s \in \mathbb{R}^n$ can be recovered accurately. The $\log\left(\frac{n}{K}\right)$ term accounts for the dimensionality reduction that occurs when mapping an $n$-dimensional signal into an $m$-dimensional measurement space.

"Roughly speaking, these two conditions guarantee that the matrix $\Phi \Psi$ acts as a unitary transformation on K-sparse vectors $s$, preserving relative distances between vectors and enabling almost certain signal reconstruction with $\ell_1$ convex minimization. This is formulated precisely in terms of the restricted isometry property (RIP) that follows."[2]

- __Restricted Isometry Property (RIP):__
    "The RIP is a property of the matrix $A = \Phi \Psi$ that provides a condition under which the matrix will behave well with respect to sparse signals. Specifically, for a matrix $A$ to satisfy the RIP of order $k$ with a constant $\delta_k$, it must hold that:

    $$
    (1 - \delta_k) \|x\|_2^2 \leq \|A x\|_2^2 \leq (1 + \delta_k) \|x\|_2^2
    $$

    for all $k$-sparse vectors $x$. Here, $\delta_k$ is the smallest constant such that this inequality holds, and it should be close to zero. This ensures that the matrix $A$ approximately preserves the Euclidean length (and hence the geometry) of all $k$-sparse signals, meaning the measurements are nearly isometric.


## Theory: main aspects of study and evaluation metrics

### Aspects relevant to the study

__Work on signal block__

An ECG provides continuous data sampling; depending on the clinical purpose, a record can last from a few minutes to hours or days. This work targets small devices, which take a number of samples between 16 and 1024 as the __signal block to compress__.

__Compression ratio (CR)__

"Important factor for evaluating different methods. CR as follow 
$$
CR(\%) = 100 \frac{n - m}{n}
$$
where $m$ and $n$ are the number of compressed and original samples, respectively. "[1]
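
A minimal helper for this formula, with a hypothetical function name and an input check added as an assumption of this sketch:

```python
def compression_ratio(n, m):
    """CR(%) = 100 * (n - m) / n, with n original and m compressed samples."""
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("expected n > 0 and 0 <= m <= n")
    return 100.0 * (n - m) / n

# Example: a block of 256 samples compressed to 64 measurements
print(compression_ratio(256, 64))  # 75.0
```

With this definition a higher CR means stronger compression: keeping every sample ($m = n$) gives 0%.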

__Compression algorithm’s complexity__

Very relevant "when we talk about limited and weak ECG-recorders. The power consumption usually has a linear relation with the complexity of systems. Supplying the power for 24-h ambulatory or remote ECG recorders is very important, that encourage us to focus on systems that have low power consumption."[1]

The focus here is especially on the _sampling phase_: one of the goals of the project is to demonstrate, as was done in the paper, that a smaller _measurement matrix_ results in a _more efficient sampling phase_.

__Processing speed__

"In emergency situations it will be important. Considering the ambulatory ECG recorders, whatever the data sooner to be presented to a physician, the next orders from a physician can be given sooner as well."

The _reconstruction complexity_ must also be taken into account here: to provide _usable_ ECG data, both acquisition and processing must be fast.

_This work adopts the same fundamental metrics and evaluation aspects proposed in [1] Izadi, V., Shahri, P.K., & Ahani, H. (2020)._

### Metrics to Assess the Accuracy of Reconstructed Signal

The accuracy of the reconstructed signal in ECG compression algorithms is typically evaluated using two common metrics: the Percentage Root Mean Square Difference (PRD) and the Signal-to-Noise Ratio (SNR). These metrics are defined as follows.

__Percentage Root Mean Square Difference (PRD)__

The PRD is a measure of the difference between the original ECG signal and the reconstructed ECG signal. It is calculated using the following equation:

$$
\text{PRD} = 100 \times \sqrt{\frac{\sum_{n=0}^{N-1} (x(n) - \hat{x}(n))^2}{\sum_{n=0}^{N-1} x(n)^2}}
$$

where:
- $x(n)$ is the original ECG signal.
- $\hat{x}(n)$ is the reconstructed ECG signal.
- $N$ is the length of the signal.

__Signal-to-Noise Ratio (SNR)__

The SNR is another measure used to assess the quality of the reconstructed signal. It is calculated from the PRD using the following equation:

$$
\text{SNR} = -20 \log_{10} \left(\frac{\text{PRD}}{100}\right)
$$
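
Both metrics translate directly into NumPy. The function names below are hypothetical, and the four-sample signal is a made-up example chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np

def prd(x, x_hat):
    """Percentage Root Mean Square Difference between original and reconstruction."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return 100.0 * np.sqrt(np.sum((x - x_hat) ** 2) / np.sum(x ** 2))

def snr_db(x, x_hat):
    """SNR in dB, derived from the PRD as -20 * log10(PRD / 100)."""
    return -20.0 * np.log10(prd(x, x_hat) / 100.0)

x = [1.0, 2.0, 3.0, 4.0]        # toy "original" signal
x_hat = [1.0, 2.0, 3.0, 5.0]    # toy "reconstruction" with one wrong sample
print(f"PRD = {prd(x, x_hat):.2f} %, SNR = {snr_db(x, x_hat):.2f} dB")
```

Here the squared error is 1 and the signal energy is 30, so the PRD is $100\sqrt{1/30} \approx 18.26\%$ and the SNR is $10\log_{10}(30) \approx 14.77$ dB.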

### Quality Assessment Based on PRD and SNR

Table 1 from the referenced paper classifies the quality of the reconstructed signal based on the PRD and corresponding SNR values:

| Quality        | PRD Range      | SNR Range       |
|----------------|----------------|-----------------|
| Very Good      | 0% < PRD < 2%  | SNR > 33 dB     |
| Good           | 2% < PRD < 9%  | 20 dB < SNR < 33 dB |
| Undetermined   | PRD ≥ 9%       | SNR ≤ 20 dB     |

This table indicates that when the PRD is less than 2%, the quality of the reconstructed signal can be categorized as "Very Good." For PRD values between 2% and 9%, the quality is considered "Good," and for PRD values above 9%, the quality of the reconstructed signal cannot be precisely determined. __In this study the same metric will be adopted__.
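
The table can be encoded as a small lookup. The function name is hypothetical, and two boundary choices are assumptions of this sketch because the table does not cover them: a PRD of exactly 0% is treated as "Very Good" and a PRD of exactly 2% as "Good".

```python
def reconstruction_quality(prd_percent):
    """Map a PRD value (in percent) to the quality classes of the table above."""
    if prd_percent < 0:
        raise ValueError("PRD cannot be negative")
    if prd_percent < 2.0:
        return "Very Good"
    if prd_percent < 9.0:
        return "Good"
    return "Undetermined"

for value in (0.8, 5.0, 12.0):
    print(value, "->", reconstruction_quality(value))
```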

_Table based on [1] Izadi, V., Shahri, P.K., & Ahani, H. (2020). A compressed-sensing-based compressor for ECG. *Biomedical Engineering Letters*, 10, 299–307. https://doi.org/10.1007/s13534-020-00148-7_

_More information on how such measure was established in APPENDIX_

## Theory: measurement matrix

As previously mentioned, an _ECG device_ samples data continuously, possibly for many consecutive hours. For instance, the MIT-BIH Arrhythmia Database provides _records_ of about $30$ minutes each, sampled at $360$ samples/second, which means that each _record_ has about $650000$ samples.

_Compressed Sensing_ makes it possible to store only a fraction of such data by computing the _compressed measurement_ immediately. This work does not delve into the __hardware__ specifics; the paper by Izadi, Shahri, & Ahani (2020) [1] describes a possible hardware implementation.

What matters for the present study is that, on an ECG signal, the CS-based approach works on _groups of consecutive samples_ within a _record_; each such _"group"_ is a __signal block__.




### Compressing blocks of samples within signal

For a whole signal, CS computes:
$$
y = \Phi x
$$

where:
- $x \in \mathbb{R}^n$ _real_ signal coming from sensors
- $y \in \mathbb{R}^m$ _compressed measurement_
- $\Phi \in \mathbb{R}^{m \times n}$ with $m \ll n$ is the _measurement matrix_.

In practice, "the signal $x$" becomes a single __signal block__ $x_{block} \in \mathbb{R}^d$, where $d$ is the _block size_:

$$
y_{block} = \Phi_{p,d} \cdot x_{block}
$$

where $p \ll d$: $\Phi_{p,d} \in \mathbb{R}^{p \times d}$ reduces the $d$ _original samples_ of the block to the $p$ samples of $y_{block} \in \mathbb{R}^p$, the _compressed measurement_ __for that single block__.

__The compressed measurement $y$ of the whole signal is then obtained by simply concatenating the per-block results.__
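
A minimal sketch of this block-wise measurement follows. It assumes that the same matrix is reused for every block and that trailing samples which do not fill a complete block are dropped; neither choice is prescribed by the text, and `compress_blocks` is a hypothetical name. The input is random data standing in for a record.

```python
import numpy as np

def compress_blocks(x, Phi):
    """Apply the same p x d matrix Phi to each length-d block of x and concatenate."""
    x = np.asarray(x, dtype=float)
    p, d = Phi.shape
    n_blocks = len(x) // d                        # incomplete trailing block is dropped (assumption)
    blocks = x[: n_blocks * d].reshape(n_blocks, d)
    return (blocks @ Phi.T).reshape(-1)           # row i of the product is Phi @ x_block_i

rng = np.random.default_rng(0)
d, p = 16, 4
Phi = rng.choice([-1.0, 1.0], size=(p, d))        # +/-1 entries with equal probability
x = rng.standard_normal(5 * d)                    # stand-in for a record of 5 blocks
y = compress_blocks(x, Phi)
print(f"{len(x)} samples -> {len(y)} measurements")
```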



### How the measurement matrix is generated

In the present work, elements of $\Phi_{p,d}$ are drawn from a Bernoulli distribution, which is a discrete probability distribution.

For each element $\phi_{ij}$ of the matrix $\Phi$, a random value is generated that is either $+1$ or $-1$ with equal probability.
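
One way to generate such a matrix with NumPy is shown below. The function name and the `seed` parameter are assumptions of this sketch; a fixed seed is used so that the decoder can regenerate the same $\Phi$ that was used for sensing.

```python
import numpy as np

def bernoulli_matrix(p, d, seed=None):
    """p x d matrix whose entries are +1 or -1 with probability 1/2 each."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(p, d))

Phi = bernoulli_matrix(4, 16, seed=42)
print(Phi.shape, np.unique(Phi))
```

Since every entry is $\pm 1$, computing $\Phi_{p,d} \cdot x_{block}$ needs only additions and subtractions of samples, with no multiplications.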

__How to Check that Restricted Isometry Property holds__

As previously explained in the theoretical review the Restricted Isometry Property (RIP) is crucial in ensuring that compressed sensing can accurately recover sparse signals from a reduced set of measurements. 

However, checking whether a specific matrix $A$ satisfies the RIP is computationally infeasible for large matrices, because it would involve verifying the condition for every possible sparse support.

Despite this difficulty, generating the measurement matrix $\Phi$ randomly ensures that $A = \Phi \Psi$ satisfies the RIP with high probability. This inherent randomness provides a strong theoretical basis for the effectiveness of compressed sensing without the need for direct verification of the RIP.[1]
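
Although the RIP cannot be certified by sampling, a quick Monte Carlo experiment gives some intuition for the near-isometry. The sketch below makes several assumptions: $\Psi$ is taken as the identity, the $\pm 1$ matrix is scaled by $1/\sqrt{m}$ so that its columns have unit norm, and the sizes are arbitrary. It only inspects randomly drawn $k$-sparse vectors, so it illustrates the property and does not verify it.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, trials = 64, 256, 5, 2000
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)   # scaled so columns have unit norm

ratios = np.empty(trials)
for i in range(trials):
    v = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)      # random k-sparse support
    v[support] = rng.standard_normal(k)
    ratios[i] = np.sum((A @ v) ** 2) / np.sum(v ** 2)   # ||A v||^2 / ||v||^2

print(f"min {ratios.min():.2f}, mean {ratios.mean():.2f}, max {ratios.max():.2f}")
```

Ratios clustered around 1 are consistent with a small $\delta_k$, but the worst case over all $k$-sparse vectors can be larger than anything observed in a random sample.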

## Theory: reconstruction of the signal

### Dictionaries

#### _fixed dictionaries_ vs _adaptive dictionary learning_

"Decreasing the projection matrix's size will affect the order of sparsity. There are two different classes of sparsifying bases: first class is fixed dictionaries such as wavelet transform dictionary or discrete cosine transform (DCT). The second class is adaptive dictionaries that usually present better sparse representation. There are various adaptive dictionary learning algorithms, such as the method of optimal direction (MOD) [19], and K singular value decomposition (K-SVD) [20] which can present efficient sparsifying dictionary if the training set has been selected accurately. For the case of wearable ECG recorders that are used by a patient, after training a dictionary, the probability of major change in ECG data of patients is low; hence adaptive sparsifying dictionary methods can be applied to produce a more efficient sparsifying dictionary. Since the sparsity has a direct relation with the quality of the reconstructed signal, it leads to compensate for the effect of decreasing the length of the projection matrix. In this work, adaptive dictionary learning is used for the ECG signal, and the result shows that it can be a well alternative to the fixed dictionaries used by previous researches."[1]

This idea is reproposed in the present work.

#### Fixed dictionaries
In this work the _DCT_ __fixed dictionary__ is used as a benchmark to test how much dictionary learning can improve reconstruction.

The reader is assumed to be familiar with the topic: these are well-known methods in the _signal compression "world"_.
An overview is provided at the end of the document in the __Appendix__.


#### Adaptive Dictionary Learning

In this work, two main adaptive dictionary learning algorithms are used: the *Method of Optimal Directions (MOD)* and *K-Singular Value Decomposition (K-SVD)*.

The *Method of Optimal Directions (MOD)* is an iterative algorithm where, at each iteration, the goal is to update the dictionary to minimize the reconstruction error given the current sparse coefficients. The dictionary is updated by solving a least-squares problem. This method is particularly efficient when the training set is well-chosen and representative of the signals to be compressed. In this context, MOD generates a dictionary that provides a better sparse representation of the ECG signal, especially when the characteristics of the signal are stable over time, such as in the case of patient-specific ECG data.
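
The least-squares dictionary update of MOD can be sketched as follows. The notation is an assumption of this example: the columns of `X` are training blocks, the columns of `S` are their current sparse codes, and the update is $D = X S^{+}$ followed by a rescaling of the atoms to unit norm. The sparse coding step that alternates with this update is not shown, and the data are synthetic.

```python
import numpy as np

def mod_update(X, S):
    """One MOD dictionary update: D = argmin_D ||X - D S||_F^2, then unit-norm atoms."""
    D = X @ np.linalg.pinv(S)             # least-squares solution D = X S^+
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0               # leave all-zero atoms untouched
    return D / norms

rng = np.random.default_rng(0)
d, n_atoms, n_signals = 16, 8, 200
D_true = rng.standard_normal((d, n_atoms))
D_true /= np.linalg.norm(D_true, axis=0)              # ground-truth unit-norm atoms
S = np.zeros((n_atoms, n_signals))
for j in range(n_signals):                            # 3 active atoms per training signal
    S[rng.choice(n_atoms, size=3, replace=False), j] = rng.standard_normal(3)
X = D_true @ S                                        # synthetic training set
D = mod_update(X, S)
print("max atom error:", np.abs(D - D_true).max())
```

With exact codes and enough training signals the update returns the generating dictionary; in a real training loop the codes are themselves estimates, so the update and the sparse coding step are repeated until the error stops decreasing.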

*K-Singular Value Decomposition (K-SVD)*, on the other hand, is another adaptive dictionary learning technique that builds on the principles of the MOD algorithm but further refines the dictionary update step. In K-SVD, the dictionary atoms are updated one at a time through a singular value decomposition process. This method improves the sparsity of the signal representation by optimizing both the sparse coefficients and the dictionary atoms simultaneously. K-SVD is known for its flexibility and ability to handle complex signals, making it a suitable candidate for ECG signal compression where achieving higher sparsity can significantly improve the reconstruction quality.
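
The atom-by-atom update can be sketched with a rank-1 SVD of the residual restricted to the signals that use the atom. The function name, the in-place update and the random test data are assumptions of this example, and the sparse coding stage that precedes the update in a full training loop is omitted.

```python
import numpy as np

def ksvd_update_atom(X, D, S, j):
    """Update atom j of D and row j of S in place with a rank-1 SVD."""
    users = np.flatnonzero(S[j, :])          # training signals that use atom j
    if users.size == 0:
        return                               # unused atom: left unchanged in this sketch
    # Residual without atom j's contribution, restricted to the signals that use it
    E = X[:, users] - D @ S[:, users] + np.outer(D[:, j], S[j, users])
    U, sigma, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                        # new unit-norm atom
    S[j, users] = sigma[0] * Vt[0, :]        # matching coefficients, support preserved

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 50))                                # toy training signals
D = rng.standard_normal((8, 6))
D /= np.linalg.norm(D, axis=0)
S = rng.standard_normal((6, 50)) * (rng.random((6, 50)) < 0.3)  # random sparse codes
err_before = np.linalg.norm(X - D @ S)
ksvd_update_atom(X, D, S, j=0)
err_after = np.linalg.norm(X - D @ S)
print(f"residual: {err_before:.3f} -> {err_after:.3f}")
```

Because the rank-1 SVD is the best rank-1 fit to the restricted residual, the update cannot increase the approximation error, and restricting it to the signals that already use the atom keeps the sparsity pattern of `S` unchanged.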

Both methods allow for the creation of a more tailored dictionary compared to fixed ones, such as DCT, leading to a sparser representation of the ECG data. This sparser representation compensates for the reduction in the size of the measurement matrix, helping to maintain or improve the quality of the reconstructed signal. These adaptive techniques are particularly advantageous in scenarios where the signal remains relatively consistent over time, such as in long-term ECG monitoring, offering a substantial improvement over fixed dictionaries.



### Reconstruction method
The classic solution in CS is to solve the $\ell_1$-minimization problem, which under the conditions discussed earlier is equivalent to the $\ell_0$-minimization problem. In real cases it is always a good idea to account for the presence of noise, hence the problem is:
$$
\hat{s} = \arg_{s} \min \|s\|_1 \text{ subject to } \|y - \Phi \Psi s\|_2 \leq \epsilon
$$
where $\epsilon$ is a bound on the noise level.

Directly solving the problem __is not convenient in terms of time complexity__, especially because real measurements always present some noise, which needs to be taken into account. 

#### SL0

The *Smoothed ℓ0 (SL0)* algorithm offers an efficient alternative to solving the ℓ1-minimization problem by approximating the ℓ0 norm. The ℓ0 norm represents the number of non-zero elements in a vector, which directly captures sparsity, but solving the ℓ0-minimization problem is NP-hard. To address this, SL0 approximates the ℓ0 norm using a smooth function, which is easier to minimize.

The key idea behind SL0 is to use a sequence of smooth functions to approximate the ℓ0 norm, starting with a highly smooth (wide) approximation and gradually reducing the smoothness (narrowing the approximation) as the algorithm progresses. This allows the algorithm to efficiently find sparse solutions by iteratively refining the approximation.

The SL0 algorithm can be summarized in the following steps:

1. **Initialization**: The algorithm starts by choosing an initial estimate for the sparse coefficients, typically based on a least-squares solution.
   
2. **Smoothing and gradient descent**: A smooth approximation of the ℓ0 norm is applied, and a gradient descent method is used to minimize the smooth function. This step leads to a sparsity-promoting solution.
   
3. **Progressive narrowing**: The smooth function is gradually made less smooth (narrower) over iterations, which more closely approximates the true ℓ0 norm. As the smoothing parameter decreases, the solution becomes sparser.

4. **Stopping criterion**: The algorithm stops once the approximation is sufficiently narrow or a desired level of sparsity is achieved.
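
The steps above can be sketched as follows for the noiseless, equality-constrained case. Several details are assumptions of this example: the Gaussian kernel $\exp(-s_i^2 / 2\sigma^2)$ as the smooth function, a projection back onto $\{s : As = y\}$ after each gradient step, and the parameter values (`sigma_decay`, `mu`, `inner_iters`, `sigma_min`), which are commonly used defaults and not values taken from this work. $A$ stands in for $\Phi \Psi$ and the test data are synthetic.

```python
import numpy as np

def sl0(A, y, sigma_min=1e-4, sigma_decay=0.5, mu=2.0, inner_iters=3):
    """Smoothed-l0 sketch: approximately the sparsest s with A s = y."""
    A_pinv = np.linalg.pinv(A)
    s = A_pinv @ y                          # 1. minimum l2-norm (least-squares) start
    sigma = 2.0 * np.max(np.abs(s))         # wide initial smoothing
    while sigma > sigma_min:                # 4. stop once the approximation is narrow
        for _ in range(inner_iters):
            delta = s * np.exp(-s ** 2 / (2.0 * sigma ** 2))
            s = s - mu * delta              # 2. gradient step on the smoothed l0 measure
            s = s - A_pinv @ (A @ s - y)    #    project back onto {s : A s = y}
        sigma *= sigma_decay                # 3. progressively narrow the kernel
    return s

rng = np.random.default_rng(0)
n, m = 64, 32
s_true = np.zeros(n)
s_true[[5, 20, 41]] = [1.5, -2.0, 1.0]           # a 3-sparse toy vector
A = rng.standard_normal((m, n)) / np.sqrt(m)     # stands in for Phi @ Psi
y = A @ s_true
s_hat = sl0(A, y)
rel_err = np.linalg.norm(s_hat - s_true) / np.linalg.norm(s_true)
print(f"relative recovery error: {rel_err:.2e}")
```

The pseudo-inverse is computed once and reused in every projection, so each iteration costs only a few matrix-vector products.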

SL0 is particularly appealing because it avoids the more computationally expensive ℓ1-minimization while still producing sparse solutions. It also directly incorporates noise tolerance by allowing a relaxation of the equality constraint, as in the standard compressed sensing problem:

$$
\hat{s} = \arg_{s} \min \|s\|_0 \text{ subject to } \|y - \Phi \Psi s\|_2 \leq \epsilon
$$

Though SL0 approximates the ℓ0 norm, it has been shown to perform well in practical applications, often providing faster solutions compared to ℓ1-based methods while maintaining accuracy in the presence of noise.

# References

Direct quotations are enclosed in `"..."` and are followed by a reference number inside `[]`.

[1] Izadi, V., Shahri, P.K., & Ahani, H. (2020). A compressed-sensing-based compressor for ECG. *Biomedical Engineering Letters*, 10, 299–307. https://doi.org/10.1007/s13534-020-00148-7

[2] __Chapter 3.1 "Sparsity and Compressed Sensing" of the Book__:
   Brunton, S. L., & Kutz, J. N. (2022). *Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control* (2nd ed.). Cambridge University Press.

__Data__:

Moody GB, Mark RG. *The impact of the MIT-BIH Arrhythmia Database*. IEEE Eng in Med and Biol 20(3):45-50 (May-June 2001). (PMID: 11446209)

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). *PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals*. Circulation [Online]. 101 (23), pp. e215–e220.



# Appendix

### Discrete Cosine Transform (DCT)

The Discrete Cosine Transform (DCT) is a transform similar to the Discrete Fourier Transform (DFT) but uses only real numbers and cosines. It is widely used in image and video compression (e.g., JPEG, MPEG) due to its properties that are particularly suitable for these applications.

#### Overview of the Discrete Cosine Transform (DCT)

The DCT represents a signal as a sum of cosine functions oscillating at different frequencies. It transforms a sequence of real numbers into a sequence of coefficients representing the signal in the frequency domain.

#### Types of DCT

There are several types of DCT, but the most commonly used are DCT-I, DCT-II, and DCT-III. The most frequently used variant in practical applications is DCT-II, often referred to simply as "the DCT."

__DCT-II (The Most Common DCT)__

For a sequence of $N$ real numbers $x[n]$, where $n = 0, 1, \ldots, N-1$, the DCT-II is defined as:

$$
X[k] = \sum_{n=0}^{N-1} x[n] \cos \left[ \frac{\pi}{N} \left( n + \frac{1}{2} \right) k \right] \quad \text{for} \quad k = 0, 1, \ldots, N-1
$$

__Inverse DCT-II__

The inverse DCT-II (often referred to as IDCT) is defined as:

$$
x[n] = \frac{2}{N} \left( \frac{X[0]}{2} + \sum_{k=1}^{N-1} X[k] \cos \left[ \frac{\pi}{N} \left( n + \frac{1}{2} \right) k \right] \right) \quad \text{for} \quad n = 0, 1, \ldots, N-1
$$

__DCT-I__

The DCT-I is defined for a sequence $x[n]$ of length $N$ as:

$$
X[k] = \frac{1}{2} \left( x[0] + (-1)^k x[N-1] \right) + \sum_{n=1}^{N-2} x[n] \cos \left( \frac{\pi}{N-1} nk \right) \quad \text{for} \quad k = 0, 1, \ldots, N-1
$$

DCT-I is defined only for sequences of length $N \geq 2$ and is less commonly used due to boundary conditions.

__DCT-III__

The DCT-III, often referred to as the inverse DCT of DCT-II (which it is, up to a constant scale factor), is defined as:

$$
x[n] = \frac{1}{2} X[0] + \sum_{k=1}^{N-1} X[k] \cos \left( \frac{\pi}{N} k \left( n + \frac{1}{2} \right) \right) \quad \text{for} \quad n = 0, 1, \ldots, N-1
$$

#### Properties of the DCT

- __Orthogonality__: The cosine basis functions used in DCT are orthogonal.
- __Real-Valued Output__: For real-valued input signals, the DCT output is also real-valued.
- __Energy Compaction__: The DCT tends to concentrate the energy of the signal in a few low-frequency components, making it efficient for compression.

#### Complexity of DCT

__Direct Computation__

The direct computation of DCT for a sequence of length $N$ involves $N$ multiplications and $N-1$ additions for each of the $N$ frequency components, resulting in a total complexity of:

$$
O(N^2)
$$

__Fast Algorithms for DCT__

Fast algorithms, similar to the Fast Fourier Transform (FFT), reduce the computational complexity of the DCT to:

$$
O(N \log N)
$$

These algorithms exploit symmetry properties and use divide-and-conquer approaches to achieve significant computational savings.

__Energy Compaction and Low-Frequency Components in DCT__

The Discrete Cosine Transform (DCT) has a key property known as energy compaction, where most of the signal's energy is concentrated in a few low-frequency components. This property is essential for efficient compression, as it allows significant data reduction while preserving the essential features of the original signal.

In the DCT, the index $k$ represents the frequency component. Low values of $k$ correspond to low-frequency components, which represent slow variations in the signal, while high values of $k$ correspond to high-frequency components, representing rapid variations.

__Frequency Interpretation__

- $k = 0$: The basis function is a constant, representing the average value of the signal.
- Low $k$: Represent slow variations, such as $\cos \left( \frac{\pi}{N} \left( n + \frac{1}{2} \right) \cdot 1 \right)$.
- High $k$: Represent rapid variations, such as $\cos \left( \frac{\pi}{N} \left( n + \frac{1}{2} \right) \cdot (N-1) \right)$.

__Energy Compaction__

The DCT's ability to concentrate energy in low-frequency components means that for many natural signals, including images and audio, most of the significant information can be captured with only a few coefficients. This makes the DCT highly efficient for compression purposes, as the majority of high-frequency coefficients (which represent fine details and noise) can be quantized more coarsely or discarded without significantly affecting the perceived quality of the signal.

#### Matrix Representation of the DCT

The Discrete Cosine Transform (DCT) can be represented in a matrix form, which is particularly useful for understanding the transform as a linear operation. This approach involves the use of an orthonormal basis matrix formed by cosine functions.

__DCT as a Matrix Product__

Let $\mathbf{x}$ be the input signal, which is a column vector of length $N$. The DCT of this signal can be expressed as a matrix-vector multiplication:

$$
\mathbf{X} = \mathbf{\Psi} \mathbf{x}
$$

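where $\mathbf{X}$ is the vector of DCT coefficients, and $\mathbf{\Psi}$ is the $N \times N$ DCT matrix whose elements are defined as:

$$
\mathbf{\Psi}[k,n] = \alpha_k \cos \left[ \frac{\pi}{N} \left( n + \frac{1}{2} \right) k \right] \quad \text{for} \quad k, n = 0, 1, \ldots, N-1, \qquad \alpha_0 = \sqrt{\frac{1}{N}}, \quad \alpha_k = \sqrt{\frac{2}{N}} \ \text{for} \ k \geq 1
$$

With the scaling factors $\alpha_k$ (absent from the unnormalized DCT-II definition above), the matrix $\mathbf{\Psi}$ forms an orthonormal basis for the space of real-valued signals of length $N$. Note that $\mathbf{\Psi}$ here is the analysis matrix (signal to coefficients), so it plays the role of $\Psi^H$ in the relation $s = \Psi^H x$ of the theoretical review.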

__Inverse DCT as a Matrix Product__

The inverse DCT (IDCT) can also be represented in matrix form. Given the DCT coefficients $\mathbf{X}$, the original signal $\mathbf{x}$ can be recovered as:

$$
\mathbf{x} = \mathbf{\Psi}^\top \mathbf{X}
$$

Here, $\mathbf{\Psi}^\top$ is the transpose of the DCT matrix $\mathbf{\Psi}$, not the conjugate transpose (Hermitian), since the DCT is a real-valued transform and $\mathbf{\Psi}$ is a real-valued matrix.

__Orthogonality of the DCT Matrix__

The matrix $\mathbf{\Psi}$ is orthonormal, meaning it satisfies:

$$
\mathbf{\Psi}^\top \mathbf{\Psi} = \mathbf{I}
$$

where $\mathbf{I}$ is the identity matrix. This property ensures that the DCT and IDCT operations are perfect inverses of each other, preserving the energy of the original signal in the frequency domain.
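
These properties are easy to check numerically. The sketch below builds the DCT matrix explicitly, assuming the usual orthonormal scaling ($\sqrt{1/N}$ for the row $k = 0$ and $\sqrt{2/N}$ for the other rows); the function name and the test vector are arbitrary choices made for the example.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix: row k is the k-th scaled cosine basis function."""
    k = np.arange(N)[:, None]      # frequency index (rows)
    n = np.arange(N)[None, :]      # sample index (columns)
    Psi = np.sqrt(2.0 / N) * np.cos(np.pi / N * (n + 0.5) * k)
    Psi[0, :] /= np.sqrt(2.0)      # the k = 0 row uses sqrt(1/N) instead of sqrt(2/N)
    return Psi

N = 8
Psi = dct_matrix(N)
x = np.arange(N, dtype=float)      # arbitrary test signal
X = Psi @ x                        # forward DCT as a matrix product

print("orthonormal:     ", np.allclose(Psi.T @ Psi, np.eye(N)))
print("perfect inverse: ", np.allclose(Psi.T @ X, x))
print("energy preserved:", np.isclose(np.sum(x ** 2), np.sum(X ** 2)))
```

Without the scaling factors the rows are still mutually orthogonal, but they do not have unit norm, so $\mathbf{\Psi}^\top \mathbf{\Psi}$ would not equal the identity.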


---

### Discrete Wavelet Transform (DWT)

The Discrete Wavelet Transform (DWT) is a transform used in signal processing and compression, offering advantages over the Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT). The DWT provides a time-frequency representation of the signal, capturing both frequency and location information.

#### Overview of DWT

The DWT decomposes a signal into a set of wavelets, which are localized in both time and frequency. This allows for multi-resolution analysis, where different parts of the signal can be analyzed at different scales.

#### Key Concepts

- __Wavelets__: Functions that efficiently represent data with sharp changes or edges, localized in time.
- __Scaling and Translation__: Wavelets can be scaled (dilated) and translated (shifted) to capture different frequency components and their locations in the signal.
- __Multi-Resolution Analysis__: DWT performs analysis at multiple resolutions, capturing both coarse and fine details of the signal.

#### DWT Algorithm

The DWT of a signal can be computed using recursive filtering and downsampling. The process involves two main steps: decomposition (analysis) and reconstruction (synthesis).

__Decomposition (Analysis)__

- __Filter Bank__: Apply a pair of filters to the signal: a low-pass filter (L) and a high-pass filter (H). The low-pass filter captures the approximation (low-frequency) components, while the high-pass filter captures the detail (high-frequency) components.
- __Downsampling__: After filtering, each filter output is downsampled by a factor of 2 (keeping every other sample), so the two outputs together contain as many samples as the input.
- __Recursive Decomposition__: The decomposition process is recursively applied to the low-pass filtered signal to create a multi-level decomposition.

__Reconstruction (Synthesis)__

- __Upsampling__: The downsampled components are upsampled by a factor of 2 (inserting zeros between samples).
- __Filter Bank__: Apply the synthesis filters (low-pass and high-pass) to the upsampled components.
- __Combining__: The filtered components are combined to reconstruct the signal.
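
The analysis and synthesis steps above can be sketched for a single decomposition level as follows. This is an illustrative sketch only: it assumes the Haar wavelet (2-tap filters) and an even-length signal, and the function names `analyze` and `synthesize` are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
# Haar filters written as convolution kernels (assumed wavelet choice).
ANALYSIS_LOW = np.array([1.0, 1.0]) / SQRT2
ANALYSIS_HIGH = np.array([-1.0, 1.0]) / SQRT2
SYNTHESIS_LOW = np.array([1.0, 1.0]) / SQRT2
SYNTHESIS_HIGH = np.array([1.0, -1.0]) / SQRT2

def analyze(x):
    """One analysis level: filter with L and H, then downsample by 2."""
    low = np.convolve(x, ANALYSIS_LOW)     # full convolution, length N + 1
    high = np.convolve(x, ANALYSIS_HIGH)
    return low[1::2], high[1::2]           # keep every other sample

def synthesize(approx, detail):
    """One synthesis level: upsample by 2, filter, and add both branches."""
    n = 2 * len(approx)
    up_a = np.zeros(n)
    up_d = np.zeros(n)
    up_a[::2] = approx                     # insert zeros between samples
    up_d[::2] = detail
    y = np.convolve(up_a, SYNTHESIS_LOW) + np.convolve(up_d, SYNTHESIS_HIGH)
    return y[:n]                           # drop the extra convolution sample

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = analyze(x)
x_rec = synthesize(approx, detail)
print(approx)
print(detail)
print(np.allclose(x_rec, x))
```

A multi-level decomposition would call `analyze` again on `approx`, and reconstruction would apply `synthesize` in the reverse order.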

#### Mathematical Formulation

Given a signal $x[n]$:

- __Approximation Coefficients (Low Frequency)__:
  
  $$
  A_j[k] = \sum_n x[n] \cdot \phi_{j,k}[n]
  $$
  
  where $\phi_{j,k}[n]$ are the scaling functions (low-pass).

- __Detail Coefficients (High Frequency)__:
  
  $$
  D_j[k] = \sum_n x[n] \cdot \psi_{j,k}[n]
  $$
  
  where $\psi_{j,k}[n]$ are the wavelet functions (high-pass).
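
As a small worked example, assume the Haar wavelet (a choice made here only for illustration) and the signal $x = (4, 6, 10, 12)$. At level 1 the Haar scaling function $\phi_{1,k}[n]$ equals $1/\sqrt{2}$ for $n \in \{2k, 2k+1\}$, and the wavelet $\psi_{1,k}[n]$ equals $+1/\sqrt{2}$ at $n = 2k$ and $-1/\sqrt{2}$ at $n = 2k+1$; both are zero elsewhere. The level-1 coefficients are then:

$$
A_1 = \left( \frac{4 + 6}{\sqrt{2}}, \frac{10 + 12}{\sqrt{2}} \right) = \left( 5\sqrt{2}, 11\sqrt{2} \right), \qquad
D_1 = \left( \frac{4 - 6}{\sqrt{2}}, \frac{10 - 12}{\sqrt{2}} \right) = \left( -\sqrt{2}, -\sqrt{2} \right)
$$

Applying the same step to $A_1$ gives the level-2 coefficients:

$$
A_2 = \frac{5\sqrt{2} + 11\sqrt{2}}{\sqrt{2}} = 16, \qquad
D_2 = \frac{5\sqrt{2} - 11\sqrt{2}}{\sqrt{2}} = -6
$$

The energy is preserved: $4^2 + 6^2 + 10^2 + 12^2 = 296$ and $16^2 + (-6)^2 + (-\sqrt{2})^2 + (-\sqrt{2})^2 = 296$.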

#### Advantages of DWT

- __Localization__: Wavelets are localized in both time and frequency, allowing DWT to capture transient features more effectively than DFT or DCT.
- __Multi-Resolution Analysis__: DWT provides a hierarchical representation, enabling analysis at multiple resolutions and scales.
- __Efficient Compression__: DWT often achieves better compression efficiency for images and signals with sharp changes or edges, as it can represent such features more compactly.

#### Complexity of DWT

__Direct Computation__

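When the DWT is computed with the filter bank described above, level $j$ filters and downsamples a sequence of length $N/2^{j-1}$. For filters of fixed length, the cost of level $j$ is therefore proportional to $N/2^{j-1}$, not to $N$. For a full $J$-level decomposition the per-level costs form a geometric series, and the total complexity is:

$$
O\left(N + \frac{N}{2} + \cdots + \frac{N}{2^{J-1}}\right) = O(N)
$$

__Fast Algorithms for DWT__

This recursive filtering and downsampling scheme is the fast wavelet transform (Mallat's pyramid algorithm), so the complexity of the fast algorithm is likewise:

$$
O(N)
$$

The algorithm exploits the hierarchical structure of the wavelet transform: each level processes only the approximation coefficients produced by the previous level. By comparison, FFT-based algorithms for the DFT and DCT require $O(N \log N)$ operations.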
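
The linear cost can be illustrated by counting operations in a multi-level decomposition. The sketch below assumes Haar filters (2 taps) and a signal length that is a power of two; the function name `haar_dwt` and the way operations are counted (one multiply-add per filter tap per output sample) are assumptions made for this illustration.

```python
import numpy as np

def haar_dwt(x, levels):
    """Multi-level Haar DWT that also counts multiply-add operations."""
    approx = np.asarray(x, dtype=float)
    details = []
    ops = 0
    for _ in range(levels):
        pairs = approx.reshape(-1, 2)      # current length must be even
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0))
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
        # 2 filters x 2 taps for each of the len/2 output positions.
        ops += 2 * pairs.size
    return approx, details, ops

approx_1024, details_1024, ops_1024 = haar_dwt(np.arange(1024.0), 10)

for N in (256, 1024, 4096):
    levels = int(np.log2(N))               # full decomposition
    _, _, ops = haar_dwt(np.ones(N), levels)
    print(N, ops, ops / N)                 # the ratio stays below 4
```

The operation count per sample stays bounded as $N$ grows, which is the behaviour expected of an $O(N)$ algorithm.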

__Matrix Representation of the DWT__

The Discrete Wavelet Transform (DWT) can also be represented in matrix form, analogous to the matrix representation of the Discrete Cosine Transform (DCT). This approach allows us to see the DWT as a linear operation involving an orthonormal basis formed by wavelet functions.

__DWT as a Matrix Product__

Let $\mathbf{x}$ be the input signal, which is a column vector of length $N$. The DWT of this signal can be expressed as a matrix-vector multiplication:

$$
\mathbf{W} = \mathbf{\Phi} \mathbf{x}
$$

where $\mathbf{W}$ is the vector of wavelet coefficients, and $\mathbf{\Phi}$ is the $N \times N$ wavelet transform matrix. The matrix $\mathbf{\Phi}$ is constructed using wavelet functions (for high-frequency components) and scaling functions (for low-frequency components).

__Inverse DWT as a Matrix Product__

The inverse DWT (IDWT) can be represented in matrix form similarly. Given the wavelet coefficients $\mathbf{W}$, the original signal $\mathbf{x}$ can be recovered as:

$$
\mathbf{x} = \mathbf{\Phi}^\top \mathbf{W}
$$

Here, $\mathbf{\Phi}^\top$ is the transpose of the wavelet transform matrix $\mathbf{\Phi}$. Since the DWT is typically real-valued, we use the transpose rather than the conjugate transpose (Hermitian).

__Orthogonality of the DWT Matrix__

For orthogonal wavelet families (such as Haar or Daubechies) with periodic boundary handling, the matrix $\mathbf{\Phi}$ is orthonormal, which means it satisfies:

$$
\mathbf{\Phi}^\top \mathbf{\Phi} = \mathbf{I}
$$

where $\mathbf{I}$ is the identity matrix. This property ensures that the DWT and IDWT are perfect inverses of each other, preserving the energy of the original signal while transforming it into the wavelet domain.
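
As an illustration of the matrix view, the sketch below builds the matrix of a single-level Haar DWT and checks the properties stated above. The Haar wavelet, the row ordering (scaling rows first, wavelet rows second), and the helper name `haar_matrix_one_level` are assumptions made for this example.

```python
import numpy as np

def haar_matrix_one_level(N):
    """N x N matrix of a single-level Haar DWT (N must be even)."""
    Phi = np.zeros((N, N))
    half = N // 2
    c = 1.0 / np.sqrt(2.0)
    for k in range(half):
        # Scaling-function rows: local averages (low frequency).
        Phi[k, 2 * k] = c
        Phi[k, 2 * k + 1] = c
        # Wavelet rows: local differences (high frequency).
        Phi[half + k, 2 * k] = c
        Phi[half + k, 2 * k + 1] = -c
    return Phi

N = 8
Phi = haar_matrix_one_level(N)
x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

W = Phi @ x                     # wavelet coefficients
x_rec = Phi.T @ W               # inverse DWT uses the transpose

is_orthonormal = np.allclose(Phi.T @ Phi, np.eye(N))
is_perfect_inverse = np.allclose(x_rec, x)
is_energy_preserved = np.isclose(np.sum(x**2), np.sum(W**2))
print(is_orthonormal, is_perfect_inverse, is_energy_preserved)
```

In practice the dense matrix is rarely formed, since the filter-bank implementation computes the same coefficients at lower cost.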

---

### Quality Assessment Based on PRD and SNR

Table 1 from the referenced paper classifies the quality of the reconstructed signal based on the PRD and corresponding SNR values:

| Quality        | PRD Range      | SNR Range       |
|----------------|----------------|-----------------|
| Very Good      | 0% < PRD < 2%  | SNR > 33 dB     |
| Good           | 2% < PRD < 9%  | 20 dB < SNR < 33 dB |
| Undetermined   | PRD ≥ 9%       | SNR ≤ 20 dB     |

This table indicates that a reconstruction with a PRD below 2% can be categorized as "Very Good." For PRD values between 2% and 9% the quality is considered "Good," and for PRD values of 9% or above the quality of the reconstructed signal cannot be precisely determined.
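
The classification in the table can be sketched in a few lines. This sketch assumes the common definitions $\text{PRD} = 100 \cdot \lVert x - \hat{x} \rVert_2 / \lVert x \rVert_2$ and $\text{SNR} = -20 \log_{10}(0.01 \cdot \text{PRD})$; the function names, the synthetic test signal, and the handling of values that fall exactly on a threshold are assumptions made for illustration.

```python
import numpy as np

def prd(x, x_rec):
    """Percentage root-mean-square difference, in percent (assumed definition)."""
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    return 100.0 * np.linalg.norm(x - x_rec) / np.linalg.norm(x)

def snr_db(prd_percent):
    """SNR in dB derived from the PRD (assumed relation)."""
    return -20.0 * np.log10(0.01 * prd_percent)

def quality(prd_percent):
    """Map a PRD value to the categories of the table above."""
    if prd_percent < 2.0:
        return "Very Good"
    if prd_percent < 9.0:       # a PRD of exactly 2% is treated as "Good" here
        return "Good"
    return "Undetermined"

# Synthetic stand-in for a signal; scaled copies give a known PRD.
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2.0 * np.pi * 5.0 * t)
for gain in (1.01, 1.05, 1.20):
    p = prd(x, gain * x)
    print(round(p, 2), round(snr_db(p), 2), quality(p))
```

A gain error of 1%, 5%, and 20% yields a PRD of about 1%, 5%, and 20%, which fall into the "Very Good", "Good", and "Undetermined" categories respectively.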

__Metric Based on Physician Qualitative Assessments__

The classification of the PRD and SNR values into "Very Good," "Good," and "Undetermined" categories was established based on a study by Zigel et al., which is referenced in the paper. In this study, a link was established between the diagnostic distortion of ECG signals and the PRD metric. The researchers conducted qualitative assessments with physicians, who evaluated the diagnostic quality of reconstructed ECG signals at different PRD levels.

The physicians' qualitative assessments provided a subjective but clinically relevant measure of how much distortion could be tolerated in the reconstructed signals before it began to interfere with accurate diagnosis. These evaluations were then correlated with specific PRD values, allowing the researchers to define thresholds where the signal quality was deemed acceptable or unacceptable for clinical use. For instance, a PRD of less than 2% was consistently associated with minimal diagnostic distortion, leading to its classification as "Very Good." As PRD increased, the likelihood of clinically significant distortion also increased, which was reflected in the "Good" and "Undetermined" categories.

This physician-based qualitative assessment was crucial in grounding the PRD and SNR metrics in practical clinical utility, ensuring that the numerical thresholds corresponded to meaningful diagnostic criteria.

_Based on [1]: Izadi, V., Shahri, P.K., & Ahani, H. (2020). A compressed-sensing-based compressor for ECG. *Biomedical Engineering Letters*, 10, 299–307._ https://doi.org/10.1007/s13534-020-00148-7
