# TP2
- Master MVA ENS-Paris Saclay
- Balthazar Neveu
- balthazarneveu@gmail.com
- [Web Version](https://balthazarneveu.github.io/MVA24-delire/) | [Github](https://github.com/balthazarneveu/MVA24-delire)

> Please note that all experiments were run on a single image... Drawing conclusions would require more time.

> Bonus part: not done in time

# Default denoisers

Evaluated on an observation with additive white Gaussian noise, $\sigma_{255}=40$.

| $u$ = Original / $\tilde{u}$ = Noisy | 
|:--:|
| ![](figures/in_out.png) |
|Denoised with standard, untuned denoisers|
| ![](figures/basic_default_denoisers.png) |


- TV denoising has not been tuned here (it can do better, as shown later).
- DnCNN trained at noise level $\sigma_{255}=40$ is used here, so the network is expected to work well: there is no gap between its training and evaluation conditions.
- BM3D is given the correct input noise level (`BM3DDenoiser(40)`); it is not a machine-learning-based algorithm.

| BM3D with different tunings.|
|:--:|
| ![bm3d_sensitivity](figures/BM3D_tuning.png) |
|On the left, BM3D expects an image less noisy than it really is and so leaves a lot of residual noise. On the right, BM3D expects an image noisier than it really is, so it ends up oversmoothing.|

## Optimal TV Denoiser

**Auto tuning TV denoiser**

### Question 1


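```python
import numpy as np

# Assumes TVDenoiser, rmse, utilde (noisy image), u (clean image), sigma1, beta, rho
# and the initial TVlamb are already defined earlier in the notebook.
for i in range(1, 100):
    uTV = TVDenoiser(lamb=TVlamb, niter=100).denoise(utilde)
    # Residual between the denoised image and the noisy image
    res = rmse(uTV, utilde)
    # True error between the denoised image and the clean image (monitoring only)
    error = rmse(uTV, u)
    # Should be 0 if the removed residual has exactly the expected noise level
    non_gaussianity_error = sigma1 * beta - res
    # Multiplicative update: lambda stays positive
    correction_factor = rho * non_gaussianity_error
    TVlamb *= np.exp(correction_factor)
    # Stop once the residual is within 1% of the target
    stop_cond = np.fabs(non_gaussianity_error) < 0.01 * (sigma1 * beta)
    if stop_cond:
        break
```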

### Question 2

Initial tuning: $\lambda^{TV}_{n=0} = 0.024 \approx \sigma^2$

$\rightarrow \lambda^{TV}_{n=6} \approx 0.125 \approx 5.1 \sigma^2$ to achieve a 1% tolerance (`residual/sigma = 0.955`)

Although simple, the proposed automatic tuning technique seems to be quite robust to initial values. Even if you start with a high value for $\lambda$ like $10 \sigma^{2}$ where the initial images are oversmoothed, the tuning parameter correction converges to the right value.

![](figures/tuning_parameter_convergence.png)
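The robustness to the initial value can be reproduced on a toy problem. The sketch below is not the TV denoiser used above: it swaps in a hypothetical closed-form shrinkage denoiser (`toy_shrinkage_denoiser`) so that the example runs on its own, and it assumes `beta = 1` and `rho = 5`, which are illustrative values rather than the ones used in the experiments. Only the multiplicative update rule is taken from question 1.

```python
import numpy as np


def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def toy_shrinkage_denoiser(noisy: np.ndarray, lamb: float) -> np.ndarray:
    # Hypothetical stand-in for the TV denoiser:
    # closed-form minimizer of 0.5*||x - noisy||^2 + 0.5*lamb*||x||^2
    return noisy / (1.0 + lamb)


def auto_tune(noisy, sigma, lamb_init, rho=5.0, beta=1.0, tol=0.01, max_iter=200):
    """Multiplicative update of lamb until the residual matches beta * sigma."""
    lamb = lamb_init
    target = beta * sigma
    for n_iter in range(1, max_iter + 1):
        residual = rmse(toy_shrinkage_denoiser(noisy, lamb), noisy)
        mismatch = target - residual
        if abs(mismatch) < tol * target:
            break
        # Residual too small -> increase lamb, residual too large -> decrease it
        lamb *= np.exp(rho * mismatch)
    return lamb, residual, n_iter


rng = np.random.default_rng(0)
sigma = 40 / 255
clean = rng.uniform(0, 1, size=10_000)  # synthetic signal, not the test image
noisy = clean + rng.normal(0, sigma, size=clean.shape)

lamb_low, res_low, _ = auto_tune(noisy, sigma, lamb_init=sigma**2)
lamb_high, res_high, _ = auto_tune(noisy, sigma, lamb_init=10 * sigma**2)
print(f"lambda from sigma^2: {lamb_low:.3f}, lambda from 10 sigma^2: {lamb_high:.3f}")
```

Both starting points should end up at nearly the same $\lambda$, because the residual grows monotonically with $\lambda$ and the update always moves $\lambda$ towards the value where the residual equals the target.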



![](figures/single_standalone_awgn_denoisers.png)

|Denoiser | RMSE(normalized gray levels) | PSNR(dB) |
|:---:|:---:|:---:|
|Tuned TV Denoiser |  0.0630 | 24.0 dB |
|DnCNN Denoiser | 0.0450 | 26.9 dB |
|Real SN_DnCNN | 0.0448 | 27.0 dB |

`RealSN_DnCNN` and `DnCNN` reach similar denoising quality, and both are much better than the correctly tuned TV denoiser (about 3 dB of PSNR).
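For images normalized to $[0, 1]$, the two columns of the table are related by $\text{PSNR} = 20 \log_{10}(1/\text{RMSE})$. A minimal sketch of this conversion, assuming a peak value of 1 (the helper name `psnr_from_rmse` is illustrative and not part of the course code):

```python
import math


def psnr_from_rmse(rmse_value: float, peak: float = 1.0) -> float:
    """PSNR in dB for a given RMSE, `peak` being the maximum possible pixel value."""
    return 20.0 * math.log10(peak / rmse_value)


table_rmse = {"Tuned TV": 0.0630, "DnCNN": 0.0450, "RealSN_DnCNN": 0.0448}
table_psnr = {name: psnr_from_rmse(value) for name, value in table_rmse.items()}
for name, value in table_psnr.items():
    print(f"{name}: {value:.1f} dB")
```

Rounded to one decimal, this gives back the PSNR column of the table (24.0, 26.9 and 27.0 dB).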

### Question 3 
`RealSN_DnCNN` has exactly the same architecture as `DnCNN` but has been trained in a very specific fashion:
the residual of the network satisfies a Lipschitz constraint. This does not change the denoising capabilities of the network under AWGN (this is what we observed in the table of question 2; additionally, the two denoised results look very much alike visually).

The only difference lies in the training: compared to a "classical" MSE minimization with gradient descent, the weights are modified during training so that the residual satisfies the Lipschitz constraint (ReLU is 1-Lipschitz, and the convolution weights of each layer are normalized by their largest singular value). RealSN stands for real Spectral Normalization and uses a "tricky" implementation: instead of naively performing an SVD at each step, it relies on an iterative power method, which avoids computing an SVD at every training step for all network layers.
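The power method mentioned above can be sketched on a plain matrix. This is only an illustration under simplifying assumptions: a dense matrix `W` stands in for a layer, whereas RealSN applies the idea to the convolution operator itself, and the function name and iteration count are hypothetical.

```python
import numpy as np


def spectral_norm_power_iteration(weight: np.ndarray, n_iter: int = 50, seed: int = 0) -> float:
    """Estimate the largest singular value of `weight` without computing an SVD."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=weight.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = weight @ v            # left singular vector estimate
        u /= np.linalg.norm(u)
        v = weight.T @ u          # right singular vector estimate
        v /= np.linalg.norm(v)
    return float(u @ weight @ v)


# Build a test matrix with known singular values (2.0, 1.0, 0.5, 0.1)
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(6, 4)))
V, _ = np.linalg.qr(rng.normal(size=(4, 4)))
W = U @ np.diag([2.0, 1.0, 0.5, 0.1]) @ V.T

sigma_max = spectral_norm_power_iteration(W)
W_normalized = W / sigma_max  # the normalized layer is (at most) 1-Lipschitz
print(sigma_max, np.linalg.norm(W_normalized, 2))
```

In a training loop, only one or a few power iterations are typically run per step, reusing the vectors from the previous step, which is what makes the approach cheap compared to a full SVD.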

### Question 4 

$\alpha = \frac{\sigma^2}{\gamma}$

```python
import numpy as np

def prox_datafit_gaussian_denoising(x: np.ndarray, y: np.ndarray, alpha: float, s: float) -> np.ndarray:
    """
    Proximal operator for Gaussian denoising:

    f(x) = || x - y ||^2 / (2 s^2)

    prox_{alpha f} (x) = (x + y*alpha/s^2)/(1+alpha/s^2)

    Parameters:
        :x - the argument to the proximal operator.
        :y - the noisy observation (flattened).
        :alpha - the value of alpha.
        :s - the standard deviation of the gaussian noise in y.
    """
    a = alpha/(s**2)
    v = (x+y*a)/(1+a)
    return v
```

In the case where $s=\sigma$, the proximal operator becomes
$\text{prox}_{\alpha F}(x)=\frac{\gamma x+y}{\gamma +1}$, which is a weighted average of $x$ and $y$ (the denoiser output $x$ is blended with the noisy observation $y$). When $\gamma=1$, this is just the average $\frac{x+y}{2}$.
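This simplification can be checked numerically. The sketch below re-implements the closed-form proximal operator in a compact form (`prox_gaussian`, an illustrative name) and compares it with the blend $\frac{\gamma x+y}{\gamma +1}$ on random arrays standing in for the images.

```python
import numpy as np


def prox_gaussian(x: np.ndarray, y: np.ndarray, alpha: float, s: float) -> np.ndarray:
    """Closed-form prox of alpha * ||x - y||^2 / (2 s^2)."""
    a = alpha / s**2
    return (x + a * y) / (1 + a)


rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(8, 8))  # stand-in for a denoiser output
y = rng.uniform(0, 1, size=(8, 8))  # stand-in for the noisy observation
sigma = 40 / 255                    # s = sigma in this check

for gamma in (0.8, 1.0, 1.2):
    alpha = sigma**2 / gamma
    blended = (gamma * x + y) / (gamma + 1)
    assert np.allclose(prox_gaussian(x, y, alpha, s=sigma), blended)
```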

<!-- UNROLLING ADMM
Let's unroll the first iteration:
Init: 
- $u^{0}=0$ zero residual
- $y^{0} = x^{0} = \tilde{u}$ noisy image

- Step 1: (*Regularization*) - We first denoise the image.
$$x^{1} = \text{prox}_{\sigma^2 G}(y^{0}-u^{0}) \approx D_{\sigma}(y^{0}-u^{0}) = \text{Denoiser}_{\sigma}(\tilde{u})$$
- Step 2: (*Data term*) - We'll add back a bit of noise by averageing the denoised image with the original noisy image.
$$ y^{1} = \text{prox}_{\alpha F}(x^{1} + u^0, \tilde{u}) =  \frac{\gamma . (x^{1} + u^0)+\tilde{u}}{\gamma +1} = 
\frac{\gamma .\text{Denoiser}_{\sigma}(\tilde{u})  + \tilde{u}}{\gamma +1}$$ -->


<!-- - Step 3: Estimated residual $$u_1 = \frac{x^{1}-y^{0}}{2} =  \frac{\text{Denoiser}_{\sigma}(\tilde{u}) - \tilde{u}}{\gamma+1}$$ -->


<!-- $ y^{k+1} = \text{prox}_{\alpha F}(x^{k+1} + u^k) $ -->
<!-- $$\text{prox}_{\sigma^2 G}(x) \approx D_{\sigma}(x)$$ -->

The $\gamma$ parameter definitely acts as a tuning parameter for the denoiser as can be seen in the next figure. Good news is that compared to the "classic trick" (telling the network to process the image with a tweaked noise value), this tuning parameter actually seems to make sense. 

| $\gamma=0.8$ | $\gamma=1.0$ | $\gamma=1.2$ |
|:----:|:----:|:----:|
|![](figures/drs_same_noise_gamma_0p8.png) |![](figures/drs_same_noise_gamma_1p0.png)|![](figures/drs_same_noise_gamma_1p2.png) |

# Auto tuning of PnP Gaussian denoising with $s \neq \sigma$

------
Results using RealSN DnCNN

| $\sigma = 5$ | $\sigma = 15$ | $\sigma = 40$ |
|:-----: |:-----: |:-----: |
|![](figures/pnp_drs_sigma=5.png) | ![](figures/pnp_drs_sigma=15.png) | ![](figures/pnp_drs_sigma=40.png) |

Observations:
- The image on the left of the table contains a bunch of weird artifacts; this reflects the lack of convergence guarantee shown in question 5.
- The results of PnP DRS for Gaussian denoising with DnCNN at $\sigma=15$ and $\sigma=40$ are much better than simply running inference with the DnCNN trained at $\sigma=15$ or $\sigma=40$.


| no PnP $\sigma = 15$ | no PnP $\sigma = 40$ |
| :----: | :-: |
| ![](figures/DnCNN_s=30_sigma=15.png) |  ![](figures/DnCNN_s=30_sigma=40.png) | 


------

### Question 5


> Disclaimer: *I solved question 5 and wrote the answers without noticing the "Check convergence conditions" section, so I basically had to find everything by myself, which took me roughly half a day*.

| $\sigma = 5$ | $\sigma = 15$ | $\sigma = 40$ |
|:-----: |:-----: |:-----: |
| $\gamma=0.351$|  $\gamma=0.5723$  | $\gamma=1.3715$ | 
|Condition $\gamma<0.062$ | Condition $\gamma<0.55$ |   Condition $\gamma<3.9$ |
|SNR=21.85dB | SNR=23.28dB | SNR=22.90dB |
|PSNR=27.30dB | PSNR=28.74dB | PSNR=28.35dB |

#### 5.1 Convergence conditions
- Since $F(x) = \frac{\|x-y\|^{2}}{2s^2}$ is a quadratic function, it is $\mu$-strongly convex with $\mu = \frac{1}{s^2}$.
- $\gamma$ shall therefore satisfy the following condition: $$\gamma < \frac{\sigma^2}{s^2} \cdot \frac{1+L-2L^2}{L}$$
- Since SN_DnCNN has been trained with the right Lipschitz constraint, we have $0<L<1$.
  - From the Ryu paper, `Plug-and-Play Methods Provably Converge with Properly Trained Denoisers`, we can get an estimate of the Lipschitz constant, $L\approx 0.464$ for DnCNN.

![](figures/lipschitz_constants_estimator.png)

- We can therefore deduce the maximum value of $\gamma$ for each noise level of the pretrained denoisers.

![](figures/gamma_convergence_limits.png)

- It's clear that the optimum $\gamma=0.35 >0.062$ found when $\sigma=5$ violates the constraint, which probably explains why we observe a bunch of artifacts.  
- At $\sigma=15$, the optimum $\gamma=0.57$ is very close to the limit ($\approx 0.557$) but works correctly in practice.
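The limits quoted in this section can be recomputed directly from the condition on $\gamma$. The sketch below assumes $s=30$ and the same estimate $L\approx 0.464$ at every noise level; the function name `gamma_upper_bound` is illustrative.

```python
def gamma_upper_bound(sigma: float, s: float, lipschitz: float) -> float:
    """Upper bound on gamma: (sigma^2 / s^2) * (1 + L - 2 L^2) / L, valid for 0 < L < 1."""
    if not 0.0 < lipschitz < 1.0:
        raise ValueError("The bound requires 0 < L < 1")
    return (sigma / s) ** 2 * (1 + lipschitz - 2 * lipschitz**2) / lipschitz


s = 30      # actual noise level of the observation (on the 0-255 scale)
L = 0.464   # Lipschitz constant estimate taken from Ryu et al.
bounds = {sigma: gamma_upper_bound(sigma, s, L) for sigma in (5, 15, 40)}
for sigma, bound in bounds.items():
    print(f"sigma={sigma}: gamma < {bound:.3f}")
```

The three values land on the limits reported in the table of question 5 (about 0.062, 0.55 and 3.9).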

#### 5.2 Convergence conditions assessment
Convergence is theoretically guaranteed in all cases for `Real SN DnCNN` as long as $\gamma$ respects the constraint shown above.
In practice, when $\gamma$ is small, the regularization term is almost removed, so not much happens.
Empirically, however, the iterative search for the best regularization parameter $\gamma$ at $\sigma=5$ ends up violating the condition.

- PnP **BM3D is not guaranteed to converge, as its Lipschitz** constant is $L>1$.

#### 5.3 Best results at $\sigma=15$ 
Best quantitative results are obtained for $\sigma=15$ (a denoiser trained for a noise level lower than $s=30$). This is also confirmed qualitatively, as textures are better preserved. 
The explanation is not truly straightforward, but first of all the proximal operator of the regularization term is only approximated by the denoiser.
Let's build some intuition about what happens:
- In the case of $\sigma=15$, the first denoiser iteration ends up with an image that still has a lot of residual noise (but less than the original image), and the mechanism goes on: noise is progressively removed.
- In the case of $\sigma=40$, at the first iteration the denoiser oversmooths textures (it denoises too much). The data term then adds back a bit of the original noise, but the same trap is hit again at the next iteration (the signal is less noisy than what the denoiser expects, so the denoiser removes texture and content that it considers as noise rather than true signal). 

If we recall what was said about Tweedie's formula, every time we use the MMSE denoiser to remove a bit of the residual noise, we push the noisy image towards an improved version according to a smoothed version of the posterior distribution: by using the denoiser, we increase the probability that the denoised image belongs to a smoothed version of the posterior image distribution. When $\sigma$ is high, this distribution may be too "blurry", which potentially limits the results. When $\sigma$ is too small, correct convergence may be prevented, as the posterior distribution may be peaky and multimodal.
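For reference, Tweedie's formula for an MMSE denoiser $D_\sigma$ is usually written as follows. The notation is introduced here for illustration: $p_\sigma$ denotes the density of noisy images, i.e. the image distribution $p$ convolved with a Gaussian of standard deviation $\sigma$.

$$D_\sigma(\tilde{x}) = \mathbb{E}[X \mid \tilde{X} = \tilde{x}] = \tilde{x} + \sigma^2 \nabla \log p_\sigma(\tilde{x}), \qquad p_\sigma = p * \mathcal{N}(0, \sigma^2 I)$$

Each denoising step therefore moves the image along the gradient of $\log p_\sigma$, towards regions of higher density under the smoothed distribution; the larger $\sigma$, the blurrier $p_\sigma$.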

### Question 6
- When using DnCNN (without the Lipschitz constraint at training time), there are more visible local artifacts, but the overall trend is similar to what we observed in question 5 for the `Real SN DnCNN`. See the bright spot on the nose for instance. This was already discussed in question 3, but here we see the limitations visually when the Lipschitz constraint was not taken into account at training time. 
- To be honest, **this is not too critical**: when we look at Ryu's estimation of the Lipschitz constant of the denoiser residual, the DnCNN $L$ ($\varepsilon$ in the paper) was close to that of the network trained with the spectral normalization trick.

| PnP DRS , denoiser = Real SN-DnCNN $\sigma=15$ | PnP DRS, denoiser DnCNN $\sigma=15$  |
|:---: | :--:|
| ![](figures/drs_real_dn_cnn_sigma=15.png) | ![](figures/drs_dnn_cnn_sigma=15.png) |


-----

| PnP BM3D | PnP DnCNN  | PnP Real SN DnCNN |
|:---: | :--:|:--:|
| $\sigma=40,  N=20 , \gamma = 1.374$   | $\sigma=40, N=20, \gamma = 1.379$  | $\sigma=40, N=20, \gamma = 1.371$ |
| PSNR=29.1dB | PSNR=28.32dB | PSNR=28.35dB |

**The best quantitative quality (PSNR) is achieved with PnP BM3D** although convergence was not even guaranteed. Qualitatively, PnP BM3D leads to more artifacts (see the unexpected patterns on the cheek for instance). *We have to keep in mind that PSNR is not a perfect measurement of image quality*.
I can find no clear and honest explanation for the mismatch between the theory, which gives no convergence guarantee for PnP BM3D, and the empirical results. It may come from the fact that BM3D roughly denoises in all conditions (whereas neural networks trained on a single noise value behave rather badly as soon as they are outside their training distribution).


| $s=30$ PnP DRS , denoiser = BM3D $\sigma=40$  , DnCNN $\sigma=40$ , Real SN-DnCNN  $\sigma=40$ |
|:---: |
| ![](figures/PnP_BM3D_DnCNN_RealSNDnCNN.png) |

### Question 7: (in)sensitivity to initialization
The PnP algorithm is **almost insensitive to initialization conditions**. We get similar results whether we initialize with the noisy image, with zeros or with random uniform noise (between 0 and 1). This is a remarkable property, and the convergence of DRS has been proved regardless of the initialization.

![](figures/convergence_speed.png)

| Sensitivity to initialization: Normal vs Zeros |  Crazy vs Random noise |
| :--: | :--: |
| Top row : "normal" init with noisy image | Top row : "crazy" noise between -100 and 100, |
| ![](figures/sensitivity_of_initialization_zeros_vs_normal.png) |![](figures/sensitivity_of_initialization_noisy_vs_crazy.png)|
| Bottom row: init with zeros | Bottom row: uniform noise between 0 and 1 |
| Left to right $\sigma = 5, 15, 40$ | Left to right $\sigma = 5, 15, 40$  |

The $\sigma=5$ case (with no guarantee of convergence) becomes more problematic with a bad initialization.
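The insensitivity to initialization can be illustrated on a toy PnP DRS loop. The sketch below is built on assumptions: it follows the update order given in Ryu et al. (which may differ slightly from the order used in the course code), and it replaces the network by a hypothetical linear denoiser `toy_denoiser` whose residual is 0.2-Lipschitz, so it says nothing about DnCNN itself.

```python
import numpy as np


def prox_gaussian(x, y, alpha, s):
    """Closed-form prox of alpha * ||x - y||^2 / (2 s^2)."""
    a = alpha / s**2
    return (x + a * y) / (1 + a)


def toy_denoiser(x, shrink=0.8):
    # Hypothetical linear denoiser; its residual x - D(x) = (1 - shrink) * x is 0.2-Lipschitz
    return shrink * x


def pnp_drs(y, z_init, alpha, s, n_iter=100):
    """PnP Douglas-Rachford splitting with the toy denoiser plugged in."""
    z = z_init.copy()
    for _ in range(n_iter):
        x_half = prox_gaussian(z, y, alpha, s)  # data-fidelity step
        x = toy_denoiser(2 * x_half - z)        # denoising step on the reflected point
        z = z + x - x_half                      # update of the auxiliary variable
    return prox_gaussian(z, y, alpha, s)


rng = np.random.default_rng(0)
s = sigma = 30 / 255
gamma = 1.0
alpha = sigma**2 / gamma
y = rng.uniform(0, 1, size=(16, 16)) + rng.normal(0, s, size=(16, 16))

inits = {
    "noisy": y,
    "zeros": np.zeros_like(y),
    "uniform": rng.uniform(0, 1, size=y.shape),
    "crazy": rng.uniform(-100, 100, size=y.shape),
}
results = {name: pnp_drs(y, z0, alpha, s) for name, z0 in inits.items()}
max_gap = max(np.abs(results[name] - results["noisy"]).max() for name in inits)
print(f"largest gap between initializations: {max_gap:.2e}")
```

With this linear denoiser the whole iteration is a contraction, so every initialization reaches the same fixed point; with a real network this only holds when the convergence condition on $\gamma$ is satisfied.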


Visualizing convergence: $n_\text{iterations PnP} =[0, 3, 6, 9, 20, 100], \gamma= 0.57, \sigma=15$.


|Random uniform noise initialization | Regular initialization from noisy image |  Initialization from zeros |  
| :---: | :---: |:---: |
| ![](figures/convergence_init_from_rand_noise.png) | ![](figures/convergence_init_from_noisy.png) | ![](figures/convergence_init_from_zeros.png) |

------

### Question 8: PnP ADMM $\Leftrightarrow$ PnP DRS

| PnP DRS  (Douglas–Rachford splitting) vs ADMM (alternating directions method of multipliers) |
| :---: | 
| ![](figures/drs_vs_admm.png) Class slides |
|  ![](figures/drs_vs_pnp_ryu.png)  [Ryu et al - Plug-and-Play Methods Provably Converge with Properly Trained Denoisers](https://arxiv.org/abs/1905.05406) |

-------


| Sanity check PnP ADMM vs DRS - noisy init | Sanity check PnP ADMM vs DRS - zero init|
| :---: | :---: |
| ![](figures/convergence_ADMM_DRS_sanity_noisy.png) | ![](figures/convergence_ADMM_DRS_sanity.png)  |

Convergence sanity check - slight difference at the first iteration (due to the order of operations). See the code for details.

> Proof of the equivalence between DRS and ADMM: **not found by myself.** I tried for almost 2 hours and could not find the mechanism. [Ryu et al](https://arxiv.org/abs/1905.05406) Section 9.1 has the proof, which is smart.



------
# Inpainting / Missing pixels

### Question 9

#### Degradation

| Degradation | Mask $A$|
| :--: | :--:  |
|![](figures/inpainting_noisy.png) |![](figures/inpainting_mask.png) |


##### Potential function

Notations: Random variables defined at a given pixel location $i$
- $Y_{i}$ noisy degraded observation with realization $y_{i}$.
- $X_{i}$ unknown ideal gray level (groundtruth) with realization $x_{i}$ - sampled from the prior image distribution.
- $M_{i}$ mask with realization $m_{i}$: the mask is known, $\mathbb{p}({M_{i}=m_{i}})= 1$. This means we have access to an **oracle on the inpainting mask**. 
- $N_{i}$ additive white Gaussian noise with realization $n_{i}$, $N_i \sim \mathcal{N}(0,s^2)$.

$$ Y_i = M_i X_i + N_i$$
For a given observed pixel $y_{i}$, $y_i = m_i x_i + n_i$, where $m_i \sim \operatorname{Ber}(p)$ is known and $n_i$ is additive white Gaussian noise.


$\mathbb{p}_{Y_{i}|X_{i}} (Y_{i}=y_{i} | X_{i}=x_{i}) = \mathbb{p}_{Y_{i}|X_{i}} (Y_{i}=y_{i} | X_{i}=x_{i}, M_{i}={m_i}) = \mathbb{p}_{N_{i}}(N_{i}=y_i - m_i x_i) \propto e^{-\frac{(y_i - m_i x_i)^2}{2 s^2}}$ since the noise distribution of $N_{i}$ is Gaussian.

We assume that the noise is i.i.d., so this formula holds for the whole image of size $(H, W)$ considered as a vector of $\mathbb{R}^{HW}$ (where the noise covariance matrix is diagonal with entries $s^2$ and the degradation mask can also be expressed as a diagonal matrix $A$ filled with 0 and 1).

Potential for a single pixel:
$F(x_i, y_i) = \frac{1}{2 s^2}(m_i .x_i - y_i)^2$
which can be re-written for the whole matrix using vector notations.

$F(x, y) = \frac{1}{2 s^2}\|A.x - y\|^2$, note that the degradation operator $A$ is known. 


##### Proximal operator
Let's use a single pixel to avoid heavy, pointless matrix computations here, as all elements are independent.

$\text{prox}_{\alpha  F}(x) = \text{argmin}_v \frac{1}{2} \|v - x \|^2 + \alpha F(v) = \text{argmin}_v \frac{1}{2} (v - x)^2 + \alpha \frac{1}{2 s^2}(m v - y)^2$.
- If $m=0$, $\text{prox}_{\alpha  F}(x)  =  x$  (*keep the previously inpainted value*)
- If $m=1$, $\text{prox}_{\alpha  F}(x)$ is found by setting the derivative of the objective with respect to $v$ to zero: $(v-x) + \frac{\alpha}{s^2} (v - y) = 0$, hence $v=\frac{s^2 x+\alpha y}{s^2+\alpha}$

$$\text{prox}_{\alpha  F}(x) =\frac{s^2 x+\alpha y}{s^2+\alpha}$$
(*add back a bit of the noisy version where pixels are not masked*)


Finally, the closed form can be written as a one-liner (thus vectorized, no need for `if` conditions).

$$\text{prox}_{\alpha  F}(x) =m*\frac{s^2 x+\alpha y}{s^2+\alpha} + (1-m)*x$$
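The closed form can be sanity-checked numerically by verifying the first-order optimality condition of the minimization problem at every pixel. The sketch below uses a compact re-implementation (`prox_inpainting`, an illustrative name) and random arrays as stand-ins for the images; the values of `alpha` and of the mask probability are arbitrary.

```python
import numpy as np


def prox_inpainting(x, y, mask, alpha, s):
    """Vectorized closed form: blend x and y where mask == 1, keep x where mask == 0."""
    s2 = s**2
    return mask * (s2 * x + alpha * y) / (s2 + alpha) + (1 - mask) * x


rng = np.random.default_rng(0)
shape = (8, 8)
s, alpha = 30 / 255, 0.05
mask = (rng.uniform(0, 1, shape) < 0.5).astype(float)
x = rng.uniform(0, 1, shape)
y = mask * rng.uniform(0, 1, shape) + rng.normal(0, s, shape)

v = prox_inpainting(x, y, mask, alpha, s)

# Gradient of 0.5*(v - x)^2 + alpha/(2 s^2) * (m*v - y)^2 with respect to v
gradient = (v - x) + (alpha / s**2) * mask * (mask * v - y)
max_abs_gradient = np.abs(gradient).max()
print(f"largest gradient magnitude at the prox: {max_abs_gradient:.2e}")
```

The gradient vanishes (up to floating point error) both on visible pixels, where the blend applies, and on masked pixels, where the data term does not depend on $v$.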




#### Results analysis (inpainting)

Using the DRS algorithm

| $\sigma = 5$ | $\sigma = 15$ | $\sigma = 40$ |
|:-----: |:-----: |:-----: |
| $\gamma = 0.2167$ | $\gamma = 1.5 \times 10^{-12}$ | $\gamma = 8 \times 10^{-19}$ | 
|PSNR=5.81dB | PSNR=13.62dB | PSNR=24.09dB |
|SNR=-9.07dB | SNR=5.36dB | SNR=18.65dB |

Results are not good unless the RealSN DnCNN with $\sigma=40$ is used. The problem looks difficult to the human eye though, so the results are still impressive.
 
| $\sigma = 5$ | $\sigma = 15$ | $\sigma = 40$ |
|:-----: |:-----: |:-----: |
|![](figures/Inpainting_DRS_PnP_RealSN_DnCNN_sigma=5_gamma=0.2167_PSNR=-9.07dB_SNR=-9.07dB_niter=100_convergence.png) | ![](figures/Inpainting_DRS_PnP_RealSN_DnCNN_sigma=15_gamma=0.0000_PSNR=5.36dB_SNR=5.36dB_niter=100_convergence.png) | ![](figures/Inpainting_DRS_PnP_RealSN_DnCNN_sigma=40_gamma=0.0000_PSNR=18.65dB_SNR=18.65dB_niter=100_convergence.png) |

```python
import numpy as np

# Code of the proximal operator for the data term (inpainting with known mask oracle + denoising problem)
def prox_datafit_inpainting(x: np.ndarray, y: np.ndarray, mask: np.ndarray, alpha: float, s: float) -> np.ndarray:
    """Proximal Operator for Inpainting combined with gaussian denoising
    using oracle mask

    Args:
        x (np.ndarray): argument to the proximal operator.
        y (np.ndarray): noisy observation.
        mask (np.ndarray): binary image of the same size as x. Assumed a float array filled with 0. and 1.
        alpha (float): alpha value (data fidelity parameter).
        s (float): noise value (standard deviation of the gaussian noise in y).

    Returns:
        np.ndarray: proximal value

    Equations:
    ==========
    f(x) = || M*x - y ||^2 / (2 s^2)

    prox_{alpha f} (x[i]) = (x[i]*s^2 + y[i]*alpha)/(s^2+alpha) if M[i]==1
                          = x[i]                                if M[i]==0
    """
    s2 = (s**2)
    unmasked_proximal_value = (x*s2+y*alpha)/(s2+alpha)
    proximal_value = mask * unmasked_proximal_value + (1-mask) * x
    return proximal_value
```

```python
# Code to generate the inpainting input data (degraded and masked data)
from typing import Tuple

import numpy as np

def create_inpainting_input_data(
        clean_image: np.ndarray,
        proba_visible: float=0.1,
        noise_level_255: float = 1.
    ) -> Tuple[np.ndarray, np.ndarray]:
    """Inpainting input data creation

    Args:
        - clean_image (np.ndarray): normalized clean image between 0 and 1 <-> u
        - proba_visible (float, optional): probability p in the Bernoulli that a pixel is visible
        p(m_i=1) = p, p(m_i=0) = 1-p
        Defaults to 0.1. <-> p
        - noise_level_255 (float, optional): standard deviation of the additive
        Gaussian noise, on the 0-255 scale. Defaults to 1.

    Returns:
        Tuple[np.ndarray, np.ndarray]:
        - noisy image <-> utilde
        - mask (degradation operator) <-> m
    """
    np.random.seed(42)  # Freeze random number generator for reproducibility
    noise_level_normalized = noise_level_255/255.0 
    mask = 1.*(np.random.uniform(0, 1, clean_image.shape) < proba_visible) # Bernoulli mask
    noise = np.random.normal(loc=0, scale=noise_level_normalized, size=(clean_image.shape))
    noisy = mask*clean_image + noise
    return noisy, mask
```

## Appendix
- [Ryu et al - Plug-and-Play Methods Provably Converge with Properly Trained Denoisers](https://arxiv.org/abs/1905.05406)

<!-- ---------

## Appendix 

#### Cheatsheet
$\hat{x} = argmin F(x) + \lambda G(x) = \alpha * F(x) + \sigma^2 G(x)$

$\lambda = \frac{\sigma^2}{\alpha}$ , $\alpha<1$ to leave a bit of noise.

- $s$ actual standard deviation in $\tilde{u}$
- $\sigma$ noise standard deviation used to train the denoiser whose residual is considered as the poximal of $G$ -->