# TP1
- Master MVA ENS-Paris Saclay
- Balthazar Neveu
- balthazarneveu@gmail.com

# DCT denoiser

### Question 1. 
#### Maximizing the likelihood
Given $Y = X+B$ where $B \sim \mathcal{N}(0,\sigma^{2})$ and $X$ has the prior density $p(x) = \frac{e^{-\|x\|}}{ \int_{-\infty}^{\infty} e^{-\|u\|}du}$

We're looking for the most likely value $x$ of $X$  given the observation $y$ of the random variable $Y$.

$x^{*} = \text{argmax}_{x}P(X=x|Y=y)$

Let's first apply Bayes' rule: $P(X=x|Y=y) = \frac{P(Y=y|X=x)P(X=x)}{P(Y=y)} = \frac{P(B=y-x|X=x)P(X=x)}{P(Y=y)}  \propto  \frac{e^{-\frac{\|y-x\|^{2}}{2\sigma^2}}e^{-\|x\|}}{P(Y=y)}$

We're searching for the argmax of this expression with regard to $x$; $y$ is fixed, so $P(Y=y)$ is a constant and can be dropped.

We have $x^{*} = \text{argmax}_{x} e^{-\left(\frac{\|y-x\|^{2}}{2\sigma^2} + \|x\|\right)}$

*Since $e^{-u}$ is a monotonically decreasing function of $u$:*

$x^{*}= \text{argmin}_{x} \left(\frac{\|y-x\|^{2}}{2\sigma^2} + \|x\|\right)$

-------
#### Finding the solution of the cost function
Let's minimize $C(x) = \frac{\|y-x\|^{2}}{2\sigma^2} + \|x\|$

- if $x>0$, $\frac{dC(x)}{dx} = \frac{x-y}{\sigma^2} + 1$. The critical point satisfies $x=y - \sigma^{2}$
- if $x<0$, $\frac{dC(x)}{dx} = \frac{x-y}{\sigma^2} - 1$. The critical point satisfies $x=y + \sigma^{2}$


---------

- If $y > \sigma^2$, then $y - \sigma^2 > 0$, and the critical point $x = y - \sigma^2$ is valid as it falls in the $x > 0$ case.
- If $y < -\sigma^2$, then $y + \sigma^2 < 0$, and the critical point $x = y + \sigma^2$ is valid as it falls in the $x < 0$ case.
- If $-\sigma^2 \leq y \leq \sigma^2$, neither critical point falls within its range. In this case the derivative is negative for $x<0$ and positive for $x>0$, so the minimum is reached at $x = 0$.


The solution is therefore a soft thresholding of $y$ with threshold $\sigma^2$: $x^{*} = \text{sign}(y)\max(\|y\| - \sigma^2, 0)$
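
The case analysis above can be checked numerically. The sketch below is only a sanity check under stated assumptions: it is scalar and uses the standard library, the function names (`map_cost`, `soft_threshold`, `brute_force_minimizer`) are illustrative and not part of the assignment code, and the grid step is arbitrary. It compares the closed-form shrinkage by $\sigma^2$ with a brute-force minimization of $\frac{(y-x)^2}{2\sigma^2} + |x|$.

```python
import math


def map_cost(x: float, y: float, sigma: float) -> float:
    """Negative log-posterior up to a constant: (y - x)^2 / (2 sigma^2) + |x|."""
    return (y - x) ** 2 / (2 * sigma**2) + abs(x)


def soft_threshold(y: float, sigma: float) -> float:
    """Closed-form candidate: shrink y toward 0 by sigma^2, clip at 0."""
    return math.copysign(max(abs(y) - sigma**2, 0.0), y)


def brute_force_minimizer(y: float, sigma: float, step: float = 1e-3) -> float:
    """Grid search on [-|y| - 1, |y| + 1] (assumed wide enough), for checking only."""
    half_width = abs(y) + 1.0
    n_points = int(round(2 * half_width / step))
    grid = (-half_width + i * step for i in range(n_points + 1))
    return min(grid, key=lambda x: map_cost(x, y, sigma))


sigma = 1.2  # arbitrary example value, sigma^2 = 1.44
for y in (-3.0, -0.5, 0.0, 0.7, 2.5):
    closed_form = soft_threshold(y, sigma)
    numerical = brute_force_minimizer(y, sigma)
    print(f"y={y:+.2f}  closed form={closed_form:+.3f}  grid search={numerical:+.3f}")
```

Both columns should agree up to the grid step, including the dead zone $|y| \leq \sigma^2$ where the estimate is 0.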
 



### Question 2.
`DCT_denoise` performs a 

### Question 3.
- The hard thresholding function can be differentiated with regard to the input $y$ (almost everywhere), but unfortunately not with regard to the threshold $T=\sigma^{2}$: the output is piecewise constant in $T$, so no useful gradient reaches the threshold.
- A workaround is to use **a very rough approximation** of the hard thresholding function to stay in the standard framework of torch operators: using a bias and a ReLU, it is possible to perform a soft thresholding operation.
- A better idea is to use a differentiable approximation of the hard thresholding function.
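
The difference between the two thresholdings can be observed directly with autograd. This is a minimal sketch on a hand-picked 5-element tensor (the values and the variable names are illustrative assumptions): with `torch.where`, the threshold only enters through a boolean comparison, so the output is not tracked by autograd, whereas the ReLU-based soft thresholding propagates a gradient to the threshold.

```python
import torch

x = torch.tensor([-3.0, -0.5, 0.5, 4.0, 5.0])

# Hard thresholding: the threshold is only used inside a boolean mask,
# which autograd cannot differentiate through.
t_hard = torch.tensor(1.0, requires_grad=True)
hard_out = torch.where(torch.abs(x) > t_hard, x, torch.zeros_like(x))
hard_is_tracked = hard_out.requires_grad

# Soft thresholding built from a bias and two ReLUs: the threshold receives a gradient.
t_soft = torch.tensor(1.0, requires_grad=True)
soft_out = torch.relu(x - t_soft) - torch.relu(-(x + t_soft))
soft_out.sum().backward()
soft_grad = t_soft.grad.item()

print("hard thresholding output tracked by autograd:", hard_is_tracked)
print("d(sum of soft thresholding)/dT:", soft_grad)
```

Here two samples lie above $T$ (each contributing $-1$) and one lies below $-T$ (contributing $+1$), hence the gradient of the sum with regard to $T$.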

##### Approximate differentiable hard thresholding function

The idea is to approximate the hard thresholding by a function satisfying the following properties:
-  *differentiable* with regard to the threshold (and the input obviously)
-  *parametric*: use a temperature parameter $\lambda$ so that when the temperature varies from $+\infty$ to 0, the thresholding function varies from a soft threshold to a hard threshold of value $T$.
-  Using this idea, you can use the approximate differentiable function in a standard deep learning framework and progressively decrease the temperature.

$$f(x) = \text{ReLU}\left(x - T + T \tanh\left(\frac{x-T}{\lambda}\right)\right)$$
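
As a quick check of the two limits (a sketch assuming $T>0$ and $x \neq T$; $f$ only handles the positive branch, the full function being the odd extension $f(x) - f(-x)$):

$$\lambda \to +\infty: \quad \tanh\left(\frac{x-T}{\lambda}\right) \to 0 \quad \Rightarrow \quad f(x) \to \text{ReLU}(x - T)$$

$$\lambda \to 0^{+}: \quad \tanh\left(\frac{x-T}{\lambda}\right) \to \text{sign}(x-T) \quad \Rightarrow \quad f(x) \to \begin{cases} \text{ReLU}(x) = x & \text{if } x > T \\ \text{ReLU}(x - 2T) = 0 & \text{if } x < T \end{cases}$$

The first limit is the positive branch of the soft threshold, the second is the positive branch of the hard threshold.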

![](figures/thresholding_functions.png)


```python
import torch


# Definition of various thresholding functions
def hard_thresholding(x: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    """Non-differentiable with regard to threshold"""
    return torch.where(torch.abs(x) > threshold, x, torch.zeros_like(x))


def soft_thresholding(x: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    """Differentiable with regard to threshold, does not preserve energy of input signal (biased)"""
    return torch.nn.functional.relu(x - threshold) - torch.nn.functional.relu(-(x + threshold))


def assym_differentiable_hard_thresholding(x: torch.Tensor, threshold: torch.Tensor, temperature: float = 1) -> torch.Tensor:
    """One-sided (positive branch only) smooth approximation of hard thresholding"""
    x_offset = x - threshold
    return torch.nn.functional.relu(x_offset + threshold*torch.tanh(x_offset/temperature))


def differentiable_hard_thresholding(x: torch.Tensor, threshold: torch.Tensor, temperature: float = 1) -> torch.Tensor:
    """Approximation of hard thresholding, differentiable with regard to threshold
    When temperature is high, it is close to soft thresholding
    When temperature is close to 0, it is close to hard thresholding
    """
    # Odd extension: apply the one-sided function to x and -x to handle both signs
    return assym_differentiable_hard_thresholding(x, threshold, temperature) - assym_differentiable_hard_thresholding(-x, threshold, temperature)
```

##### Proof of concept
We build a simple toy example where an input Gaussian distribution of standard deviation 8 is hard-thresholded with a threshold of $2.4$.

![toy_example](figures/toy_example.png)

The goal is to fit/learn this threshold. We perform stochastic gradient descent on the Mean Square Error (L²) and decrease the temperature progressively.

![learnable_hard_threshold](figures/learnable_threshold.png)

It is therefore possible to learn the correct hard threshold using gradient descent.
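
A minimal sketch of such a training loop is given below. Only the input standard deviation (8), the target threshold (2.4), the MSE loss and the decreasing temperature come from the text; the batch size, learning rate, initial threshold, temperature schedule and the names `smooth_threshold` and `fit_threshold` are assumptions made for illustration, and this is not necessarily the setup that produced the figures. Whether and how fast the threshold converges depends on these choices.

```python
import torch


def smooth_threshold(x: torch.Tensor, threshold: torch.Tensor, temperature: torch.Tensor) -> torch.Tensor:
    """Symmetric smooth approximation of hard thresholding (same formula as above)."""
    def one_sided(v: torch.Tensor) -> torch.Tensor:
        offset = v - threshold
        return torch.relu(offset + threshold * torch.tanh(offset / temperature))
    return one_sided(x) - one_sided(-x)


def fit_threshold(true_threshold: float = 2.4, init_threshold: float = 1.0,
                  n_steps: int = 500, seed: int = 0) -> tuple:
    """Fit the threshold by SGD on the MSE while annealing the temperature."""
    generator = torch.Generator().manual_seed(seed)
    threshold = torch.nn.Parameter(torch.tensor(init_threshold))
    optimizer = torch.optim.SGD([threshold], lr=0.05)  # assumed learning rate
    temperatures = torch.logspace(1, -1, n_steps)  # assumed schedule: 10 -> 0.1
    final_loss = float("nan")
    for temperature in temperatures:
        # Fresh Gaussian batch of standard deviation 8 at every step
        x = 8.0 * torch.randn(256, generator=generator)
        # Ground truth: exact hard thresholding with the threshold to recover
        target = torch.where(x.abs() > true_threshold, x, torch.zeros_like(x))
        prediction = smooth_threshold(x, threshold, temperature)
        loss = torch.mean((prediction - target) ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        final_loss = loss.item()
    return threshold.item(), final_loss


learned_threshold, final_loss = fit_threshold()
print(f"learned threshold: {learned_threshold:.3f}, final MSE: {final_loss:.4f}")
```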

### Question 4.
The number of significant operations (multiplications) per pixel is $2N^{4}$:
- $N^2$ frequencies (number of channels) multiplied by
  - a convolution kernel of size $N^2$, i.e. $N^2$ multiplications.
  - 2 because of the DCT and the inverse DCT.
  - No bias addition, $N^2$ thresholdings.
- The final normalization is negligible.
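
The count above can be written as a small helper. The function name is illustrative and the patch sizes in the usage lines are example values, not taken from the assignment.

```python
def dct_denoiser_mults_per_pixel(patch_size: int) -> int:
    """Multiplications per pixel for a DCT denoiser implemented with convolutions."""
    n_channels = patch_size**2   # one output channel per DCT frequency
    kernel_size = patch_size**2  # each channel uses an N x N convolution kernel
    n_transforms = 2             # forward DCT and inverse DCT
    return n_transforms * n_channels * kernel_size


for n in (4, 8, 16):  # example patch sizes
    print(f"N={n}: {dct_denoiser_mults_per_pixel(n)} multiplications per pixel")
```

For instance, $N=8$ gives $2 \times 8^4 = 8192$ multiplications per pixel.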
 

### Question 5.
The best threshold (ratio) value for the Zebre picture with AWGN $\sigma=25$ is $2.7$.

$r = \frac{T}{\sigma} = 2.7$
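
A ratio search of this kind can be sketched as a plain grid search that keeps the ratio with the lowest MSE. Everything below is hypothetical: `search_best_ratio` is an illustrative helper, and the toy denoiser thresholds a sparse 1D signal directly instead of DCT coefficients of an image, so it only shows the search mechanics and says nothing about the optimum for the Zebre picture.

```python
import random
from typing import Callable, List, Sequence, Tuple


def search_best_ratio(denoise: Callable[[List[float], float], List[float]],
                      noisy: List[float], clean: List[float],
                      sigma: float, ratios: Sequence[float]) -> Tuple[float, float]:
    """Return (best_ratio, best_mse) over the candidate ratios r, with T = r * sigma."""
    def mse(a: List[float], b: List[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

    scores = [(mse(denoise(noisy, r * sigma), clean), r) for r in ratios]
    best_mse, best_ratio = min(scores)
    return best_ratio, best_mse


def toy_hard_threshold(signal: List[float], threshold: float) -> List[float]:
    """Stand-in denoiser (assumption): hard thresholding in the signal domain."""
    return [v if abs(v) > threshold else 0.0 for v in signal]


rng = random.Random(0)
sigma = 25.0
clean = [200.0 if i % 10 == 0 else 0.0 for i in range(1000)]  # sparse toy signal
noisy = [v + rng.gauss(0.0, sigma) for v in clean]
ratios = [0.5 * k for k in range(11)]  # 0.0, 0.5, ..., 5.0
best_ratio, best_mse = search_best_ratio(toy_hard_threshold, noisy, clean, sigma, ratios)
print(f"best ratio on the toy signal: {best_ratio}, MSE: {best_mse:.2f}")
```

With the real denoiser, `denoise` would wrap the DCT denoising function and the MSE would be computed against the clean image.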

![ratio_search](figures/figure_ratio_search_2_73.png)

| Noisy | DCT denoised $r=2.7$| DCT denoised $r=5$|
|:----:| :----:| :----:|
|![](figures/zebre_noisy.png) | ![](figures/zebre_DCT_denoised_opt.png) | ![](figures/zebre_DCT_denoised.png)  |
|  $\sigma=25$ | Optimum DCT denoiser in the MSE sense | Oversmoothed, threshold is too high |