

# Smooth SCAD Monte Carlo Study (SSCAD_ALL.ipynb)

## Produces Table 2 in Section 8 of the paper "Smooth SCAD: A Differentiable SCAD Rule for Wavelet Thresholding with Risk-Based Tuning."

This notebook implements a controlled Monte Carlo study comparing several wavelet thresholding rules under additive Gaussian noise. The goal is to assess mean squared error (MSE) performance across representative test signals and to generate the summary results reported in Table 2 of the manuscript.

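The study focuses on **oracle-tuned thresholding performance** for SCAD and Smooth SCAD, with universal-threshold hard and soft thresholding as baselines; it does not use SURE minimization. All conclusions should be interpreted in that context.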

---

## 1. Model and simulation setup

We work in the standard orthonormal sequence model
$$
d_i = \theta_i + \epsilon_i, \qquad \epsilon_i \sim N(0,\sigma^2),
$$
which arises after applying an orthogonal discrete wavelet transform to a noisy signal.
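
One consequence of orthonormality is that squared error is the same in the coefficient domain and in the signal domain, which is why shrinkage can be judged by reconstruction MSE. The sketch below illustrates this with a random orthonormal matrix standing in for the wavelet transform; the matrix `Q`, the seed, and the toy signal are assumptions for illustration, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed, not the notebook's
N = 8

# Stand-in for an orthonormal DWT matrix: Q from a QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

f = rng.standard_normal(N)       # hypothetical clean signal
y = f + rng.standard_normal(N)   # noisy observation with sigma = 1

theta = Q.T @ f                  # "true" coefficients
d = Q.T @ y                      # noisy coefficients, d = theta + eps

# Orthonormality preserves Euclidean norms, so the squared error is the
# same whether it is measured on coefficients or on signal samples.
err_coeff = np.sum((d - theta) ** 2)
err_signal = np.sum((y - f) ** 2)
print(np.isclose(err_coeff, err_signal))
```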

Four canonical Donoho-Johnstone test signals are considered:

- Doppler (spatially varying oscillation),
- Blocks (piecewise constant with discontinuities),
- HeaviSine (smooth with jumps),
- Bumps (heterogeneous localized spikes).

Each signal has length $N=1024$. Noise is Gaussian with variance $\sigma^2=1$.  
The signal is rescaled so that the signal-to-noise ratio (variance ratio) satisfies
$$
\mathrm{SNR} = \frac{\mathrm{Var}(f)}{\mathrm{Var}(\epsilon)} = 7.
$$
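
The rescaling step can be sketched as follows. It mirrors the scaling used in the notebook's Monte Carlo driver; the helper name `rescale_to_snr` and the sine placeholder signal are hypothetical and serve only as an illustration.

```python
import numpy as np

def rescale_to_snr(base, snr, sigma=1.0):
    """Scale `base` so that Var(f) / sigma^2 equals `snr`."""
    scale = np.sqrt(snr * sigma**2 / np.var(base))
    return base * scale

x = np.linspace(0, 1, 1024)
base = np.sin(4 * np.pi * x)     # placeholder, not one of the four test signals
f = rescale_to_snr(base, snr=7.0, sigma=1.0)

# The variance ratio of the rescaled signal should be 7 up to rounding.
snr_check = np.var(f) / 1.0**2
print(round(snr_check, 6))
```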

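Wavelet transforms are computed with orthonormal compactly supported filters commonly used in the denoising literature, chosen to match the signal characteristics: `sym4` (Symmlet) for Doppler and HeaviSine, `haar` for Blocks, and `db3` (Daubechies) for Bumps. The decomposition uses periodized boundary handling (`mode='per'`).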

---

## 2. Shrinkage rules compared

The following coefficientwise shrinkage rules are applied in the wavelet domain:

1. **Hard thresholding**, with threshold $\lambda$.
2. **Soft thresholding**, with threshold $\lambda$.
3. **Classical SCAD thresholding**, with shape parameter $a=3.7$ (Fan-Li default).
4. **Smooth SCAD thresholding**, using a raised-cosine transition with fixed shape parameter $a=3.0$.

For Smooth SCAD, the transition region is $[\lambda, a\lambda]$, within which the shrinkage amount decreases smoothly from $\lambda$ (at $|d|=\lambda$, where the output is zero) to $0$ (at $|d|=a\lambda$, where the coefficient is left unchanged). The value $a=3.0$ is fixed across all signals and Monte Carlo runs.

No joint optimization over $(\lambda,a)$ is performed in this notebook.
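
For orientation, here is a vectorized sketch of the raised-cosine rule. It follows the same formula as the loop-based `smooth_scad_threshold` in the code cell, but the function name `smooth_scad_vec` and the clipping approach are assumptions of this sketch, and it assumes $\lambda > 0$ and $a > 1$.

```python
import numpy as np

def smooth_scad_vec(d, lam, a=3.0):
    """Vectorized raised-cosine Smooth SCAD (sketch; assumes lam > 0, a > 1)."""
    d = np.asarray(d, dtype=float)
    absd = np.abs(d)
    # Relative position inside the transition zone, clipped to [0, 1].
    s = np.clip((absd - lam) / ((a - 1.0) * lam), 0.0, 1.0)
    # Shrinkage amount: lam at |d| = lam, falling to 0 at |d| = a * lam.
    h = lam * np.cos(0.5 * np.pi * s) ** 2
    return np.where(absd <= lam, 0.0, np.sign(d) * (absd - h))

# Below lam the output is zero, above a*lam the input is unchanged,
# and halfway through the transition (|d| = 2 with lam = 1, a = 3)
# the coefficient is shrunk by about lam / 2.
vals = smooth_scad_vec([0.5, 1.0, 2.0, 3.0, -2.0, 5.0], lam=1.0, a=3.0)
print(vals)
```

The rule is continuous at both ends of the transition zone, which is the property that distinguishes it from hard thresholding.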

---

## 3. Threshold selection strategy

Thresholds for SCAD and Smooth SCAD are selected **by oracle minimization of empirical MSE**, not by SURE. Hard and soft thresholding are applied at the universal threshold
$$
\lambda_U = \sigma \sqrt{2\log N},
$$
where $\log$ denotes the natural logarithm, and serve as fixed (untuned) baselines.

For each Monte Carlo replication, and separately for SCAD and Smooth SCAD:

- A fixed grid of 25 equally spaced candidate thresholds $\lambda$ is constructed over $[0.1\,\lambda_U, \lambda_U]$.
- For each $\lambda$ in the grid, the reconstruction MSE is computed against the known clean signal.
- The minimum MSE over the grid is recorded (oracle choice).

This procedure isolates the intrinsic bias-variance behavior of the two SCAD rules from threshold selection noise. Because the oracle is recomputed in every replication, the reported values are averages of per-replication minima.
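
A minimal sketch of oracle selection over a threshold grid, written directly in the sequence model so that it needs only NumPy. The grid matches the one in the notebook's code cell; the helper `oracle_lambda`, the sparse vector `theta`, the seed, and the use of soft thresholding as the example rule are assumptions for illustration.

```python
import numpy as np

def soft(d, lam):
    """Soft thresholding, used here only as an example rule."""
    return np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)

def oracle_lambda(d, theta, rule, lam_grid):
    """Return (best_lam, best_mse) over the grid, using the known theta."""
    errs = np.array([np.mean((rule(d, lam) - theta) ** 2) for lam in lam_grid])
    k = int(np.argmin(errs))
    return lam_grid[k], errs[k]

rng = np.random.default_rng(1)   # illustrative seed
N, sigma = 1024, 1.0
theta = np.zeros(N)
theta[:32] = 8.0                 # hypothetical sparse coefficient vector
d = theta + sigma * rng.standard_normal(N)

lam_u = sigma * np.sqrt(2.0 * np.log(N))         # about 3.72 for N = 1024
lam_grid = np.linspace(0.1 * lam_u, lam_u, 25)   # same grid as the code cell
best_lam, best_mse = oracle_lambda(d, theta, soft, lam_grid)
print(best_lam, best_mse)
```

Since the universal threshold is the last grid point, the oracle MSE can never exceed the MSE of the same rule at $\lambda_U$.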

---

## 4. Monte Carlo design and reproducibility

For each signal and each shrinkage rule:

- $M=1000$ independent Monte Carlo replications are generated.
- In each replication, fresh Gaussian noise is added, thresholding is applied, and the reconstruction MSE is recorded.
- The reported performance metrics are:
  - AMSE: the average MSE over the $M$ replications,
  - SD: the Monte Carlo standard deviation of the MSE.
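
The two metrics, plus the Monte Carlo standard error of the AMSE itself, can be computed as in this sketch. The per-replication values here are synthetic placeholders, not results from the study, and the standard error `se` is an addition not reported by the notebook.

```python
import numpy as np

rng = np.random.default_rng(2)   # illustrative seed
M = 1000

# Placeholder per-replication MSE values; in the notebook these come from
# the reconstruction error of each replication.
mse_runs = 0.15 + 0.02 * rng.standard_normal(M)

amse = mse_runs.mean()           # AMSE: average over replications
sd = mse_runs.std(ddof=1)        # sample SD, as in the code cell (ddof=1)
se = sd / np.sqrt(M)             # standard error of the AMSE estimate
print(amse, sd, se)
```

SD describes the spread of a single replication's MSE. The uncertainty of the averaged AMSE is smaller by a factor of $\sqrt{M}$, which matters when comparing rules whose AMSE values differ by less than one SD.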

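The Monte Carlo driver `run_mc_signal` supports explicit seeding via a `seed` argument. When `seed` is provided, results are fully reproducible; the same seed is then reused for every signal, so all four signals see the same noise draws. `main()` forwards the `--seed` command-line value, which defaults to `None` (unseeded).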

In the final version of this notebook, the seed will be fixed (e.g., `seed=2026`) to ensure exact reproducibility of all reported numerical results.
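
As a sketch of seeded reproducibility, the example below uses a local `numpy.random.default_rng` generator. This is an alternative to the global `np.random.seed` call used in the notebook's driver, not a description of it, and the helper name `noisy_draws` is hypothetical.

```python
import numpy as np

def noisy_draws(seed, M=3, N=4, sigma=1.0):
    """Hypothetical helper: M noise vectors from a locally seeded generator."""
    rng = np.random.default_rng(seed)
    return sigma * rng.standard_normal((M, N))

a = noisy_draws(2026)
b = noisy_draws(2026)   # same seed: identical draws
c = noisy_draws(2027)   # different seed: different draws
print(np.array_equal(a, b), np.array_equal(a, c))
```

A local generator keeps the random stream independent of any other code that touches NumPy's global state.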

---

## 5. Interpretation of the results

The table produced by this notebook summarizes AMSE and SD values for all four signals at $N=1024$ and $\mathrm{SNR}=7$, with oracle tuning for SCAD and Smooth SCAD and the universal threshold for hard and soft thresholding.

Key observations:

- Both SCAD and Smooth SCAD outperform universal hard and soft thresholding; part of this gap reflects oracle tuning versus a fixed universal threshold.
- Smooth SCAD attains lower AMSE than classical SCAD on all four signals.
- The largest gains occur for **Blocks** and **Bumps**, where suppressing noise without introducing excessive bias is critical.
- Improvements for Doppler and HeaviSine are more modest but systematic.

Because the SCAD and Smooth SCAD thresholds are chosen by oracle MSE minimization, these results reflect the **best achievable performance** of each of these two rules over the threshold grid, not what a data-driven threshold selector would attain.
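
The size of the Smooth SCAD gain over SCAD can be read off the AMSE values printed in the notebook output. The snippet below computes the relative reduction for the three signals whose rows are complete in this export; the Bumps row is truncated here and is therefore left out.

```python
# AMSE values copied from the notebook output: (SCAD, Smooth SCAD).
amse = {
    "Doppler":   (1.3505e-01, 1.3188e-01),
    "Blocks":    (1.5286e-01, 1.4484e-01),
    "HeaviSine": (6.8234e-02, 6.5714e-02),
}

# Relative AMSE reduction of Smooth SCAD with respect to classical SCAD.
rel_gain = {name: (scad - ss) / scad for name, (scad, ss) in amse.items()}
for name, g in rel_gain.items():
    print(f"{name}: {100 * g:.1f}% lower AMSE than SCAD")
```

For these three signals the reductions are a few percent, with Blocks showing the largest of the three.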

---

## 6. Scope and limitations

This notebook does **not** implement:

- SURE minimization of $\lambda$,
- joint tuning of $(\lambda,a)$,
- level-dependent thresholding.

Those extensions are discussed theoretically in the manuscript and can be implemented separately. The present study is intended as a clean, controlled comparison of shrinkage rules under oracle tuning.

---

## 7. Summary

This notebook provides a reproducible Monte Carlo benchmark demonstrating that Smooth SCAD:

- retains the sparsity and low-bias advantages of SCAD,
- benefits from smooth transition behavior,
- achieves consistently lower MSE than classical SCAD under oracle tuning.

The numerical results reported in the manuscript table are generated directly by this notebook.





In [1]:
import numpy as np
import pywt
import argparse

# ============================================================
# Helper: MSE
# ============================================================
def mse(y_hat, y_true):
    return np.mean((y_hat - y_true) ** 2)


# ============================================================
# SCAD and Smooth SCAD shrinkage
# ============================================================
def scad_threshold(coeffs, lam, a=3.7):
    """
    Classical SCAD shrinkage (Fan & Li).
    coeffs: 1D numpy array of coefficients
    lam:    main threshold
    a:      SCAD shape parameter (> 2)
    """
    out = np.zeros_like(coeffs)
    for i, d in enumerate(coeffs):
        ad = abs(d)
        if ad <= lam:
            out[i] = 0.0
        elif ad <= 2 * lam:
            out[i] = np.sign(d) * (ad - lam)
        elif ad <= a * lam:
            out[i] = np.sign(d) * (((a - 1) * ad - a * lam) / (a - 2))
        else:
            out[i] = d
    return out


def smooth_scad_threshold(coeffs, lam, a=3.0):
    """
    Smooth SCAD shrinkage using raised-cosine generator.
    coeffs: 1D numpy array of coefficients
    lam:    main threshold
    a:      shape parameter (> 1)
    """
    out = np.zeros_like(coeffs)
    for i, d in enumerate(coeffs):
        absd = abs(d)
        if absd <= lam:
            out[i] = 0.0
        elif absd >= a * lam:
            out[i] = d
        else:
            # 0 < (absd - lam)/((a-1) * lam) < 1
            s = (absd - lam) / ((a - 1) * lam)
            h = lam * (np.cos((np.pi / 2.0) * s) ** 2)
            out[i] = np.sign(d) * (absd - h)
    return out


# ============================================================
# Donoho-Johnstone test signals
# ============================================================
def doppler(N):
    x = np.linspace(0, 1, N)
    # Standard Doppler test function
    f = np.sqrt(x * (1 - x)) * np.sin((2 * np.pi * 1.05) / (x + 0.05))
    return f


def blocks(N):
    x = np.linspace(0, 1, N)
    t = np.array([0.1, 0.13, 0.15, 0.23, 0.25,
                  0.40, 0.44, 0.65, 0.76, 0.78,
                  0.81])
    h = np.array([4, -5, 3, -4, 5,
                  -4.2, 2.1, 4.3, -3.1, 2.1,
                  -4.2])
    f = np.zeros_like(x)
    for tk, hk in zip(t, h):
        f += hk * (x > tk)
    return f


def heavisine(N):
    x = np.linspace(0, 1, N)
    f = 4.0 * np.sin(4.0 * np.pi * x) \
        - np.sign(x - 0.3) \
        - np.sign(0.72 - x)
    return f


def bumps(N):
    x = np.linspace(0, 1, N)
    t = np.array([0.1, 0.13, 0.15, 0.23, 0.25,
                  0.40, 0.44, 0.65, 0.76, 0.78,
                  0.81, 0.84, 0.85])
    h = np.array([4.0, 5.0, 3.0, 4.0, 5.0,
                  4.2, 2.1, 4.3, 3.1, 5.1,
                  4.2, 3.3, 5.3])
    w = np.array([0.005, 0.005, 0.006, 0.006, 0.005,
                  0.010, 0.010, 0.010, 0.005, 0.008,
                  0.005, 0.008, 0.005])

    f = np.zeros_like(x)
    for tk, hk, wk in zip(t, h, w):
        f += hk / (1.0 + np.abs((x - tk) / wk) ** 4)
    return f


# ============================================================
# Universal threshold
# ============================================================
def univ_thresh(sigma, N):
    return sigma * np.sqrt(2.0 * np.log(N))


# ============================================================
# Monte Carlo driver for one signal
# ============================================================
def run_mc_signal(
    signal_name,
    generator,
    wavelet,
    N=1024,
    SNR=7.0,
    sigma=1.0,
    M=1000,
    seed=2026
):
    """
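    Monte Carlo experiment for a single signal.

    Hard and Soft use the universal threshold; SCAD and Smooth SCAD use
    the oracle threshold over a 25-point grid on [0.1 * lam_u, lam_u].
    `signal_name` is accepted for labeling but not used in this function.

    Returns: (AMSE_vector, STD_vector)
        each is length 4: [Hard, Soft, SCAD, SmoothSCAD]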
    """
    if seed is not None:
        np.random.seed(seed)

    base = generator(N)
    var_base = np.var(base)
    scale = np.sqrt(SNR * sigma**2 / var_base)
    f = base * scale

    MSE_hard = []
    MSE_soft = []
    MSE_scad = []
    MSE_sscad = []

    for _ in range(M):
        noise = sigma * np.random.randn(N)
        y = f + noise

        # Wavelet decomposition
        coeffs = pywt.wavedec(y, wavelet, mode='per')
        cA = coeffs[0]
        dcoeffs = coeffs[1:]

        lam_u = univ_thresh(sigma, N)

        # Hard universal
        d_hard = [pywt.threshold(c, lam_u, mode='hard') for c in dcoeffs]
        rec_hard = pywt.waverec([cA] + d_hard, wavelet, mode='per')
        MSE_hard.append(mse(rec_hard, f))

        # Soft universal
        d_soft = [pywt.threshold(c, lam_u, mode='soft') for c in dcoeffs]
        rec_soft = pywt.waverec([cA] + d_soft, wavelet, mode='per')
        MSE_soft.append(mse(rec_soft, f))

        # SCAD: oracle over lambda grid
        lam_grid = np.linspace(0.1 * lam_u, lam_u, 25)
        best_scad = np.inf
        for lam in lam_grid:
            d_scad = [scad_threshold(c, lam, a=3.7) for c in dcoeffs]
            rec_scad = pywt.waverec([cA] + d_scad, wavelet, mode='per')
            mse_sc = mse(rec_scad, f)
            if mse_sc < best_scad:
                best_scad = mse_sc
        MSE_scad.append(best_scad)

        # Smooth SCAD: oracle over same lambda grid
        best_ss = np.inf
        for lam in lam_grid:
            d_ss = [smooth_scad_threshold(c, lam, a=3.0) for c in dcoeffs]
            rec_ss = pywt.waverec([cA] + d_ss, wavelet, mode='per')
            mse_ss = mse(rec_ss, f)
            if mse_ss < best_ss:
                best_ss = mse_ss
        MSE_sscad.append(best_ss)

    MSE_hard = np.array(MSE_hard)
    MSE_soft = np.array(MSE_soft)
    MSE_scad = np.array(MSE_scad)
    MSE_sscad = np.array(MSE_sscad)

    AMSE = np.array([
        MSE_hard.mean(),
        MSE_soft.mean(),
        MSE_scad.mean(),
        MSE_sscad.mean()
    ])

    STD = np.array([
        MSE_hard.std(ddof=1),
        MSE_soft.std(ddof=1),
        MSE_scad.std(ddof=1),
        MSE_sscad.std(ddof=1)
    ])

    return AMSE, STD


# ============================================================
# Main: run all four signals and print LaTeX table
# ============================================================
def main():
    parser = argparse.ArgumentParser(
        description="Monte Carlo AMSE / Std(MSE) for four test signals "
                    "with Hard, Soft, SCAD, Smooth SCAD."
    )
    parser.add_argument("--N", type=int, default=1024,
                        help="Signal length (default: 1024)")
    parser.add_argument("--snr", type=float, default=7.0,
                        help="SNR in terms of variance ratio (default: 7)")
    parser.add_argument("--M", type=int, default=1000,
                        help="Number of Monte Carlo replications (default: 1000)")
    parser.add_argument("--sigma", type=float, default=1.0,
                        help="Noise standard deviation (default: 1)")
    parser.add_argument("--seed", type=int, default=None,
                        help="Random seed (default: None)")
    args, unknown = parser.parse_known_args()
    N = args.N
    SNR = args.snr
    M = args.M
    sigma = args.sigma
    seed = args.seed

    signals = [
        ("Doppler",   doppler,   "sym4"),
        ("Blocks",    blocks,    "haar"),
        ("HeaviSine", heavisine, "sym4"),
        ("Bumps",     bumps,     "db3"),
    ]

    results = {}

    for name, gen, wav in signals:
        print(f"Running {name}: N={N}, wavelet={wav}, SNR={SNR}, "
              f"sigma={sigma}, M={M}")
        AMSE, STD = run_mc_signal(
            name,
            gen,
            wav,
            N=N,
            SNR=SNR,
            sigma=sigma,
            M=M,
            seed=seed
        )
        results[name] = (AMSE, STD)

    # Print LaTeX table
    print("\n% LaTeX table: AMSE and Std(MSE) for four signals")
    print("\\begin{table}[h!]")
    print("\\centering")
    # Report the sigma actually used rather than a hard-coded value.
    print("\\caption{AMSE and standard deviation of MSE over "
          f"$M={M}$ runs, $N={N}$, SNR (variance ratio) = {SNR}, "
          f"$\\sigma={sigma:g}$.}}")
    print("\\label{tab:amse_std_four_signals}")
    print("\\begin{tabular}{lcccc}")
    print("\\hline")
    print("Signal & Hard & Soft & SCAD & Smooth SCAD \\\\")
    print("\\hline")

    for name, _, _ in signals:
        AMSE, STD = results[name]
        # AMSE row
        print(f"{name} & "
              f"{AMSE[0]:.4e} & {AMSE[1]:.4e} & {AMSE[2]:.4e} & {AMSE[3]:.4e} \\\\")
        # Std(MSE) row
        print(f" & "
              f"({STD[0]:.4e}) & ({STD[1]:.4e}) & ({STD[2]:.4e}) & ({STD[3]:.4e}) \\\\")
        print("\\hline")

    print("\\end{tabular}")
    print("\\end{table}")


if __name__ == "__main__":
    main()


Running Doppler: N=1024, wavelet=sym4, SNR=7.0, sigma=1.0, M=1000
Running Blocks: N=1024, wavelet=haar, SNR=7.0, sigma=1.0, M=1000
Running HeaviSine: N=1024, wavelet=sym4, SNR=7.0, sigma=1.0, M=1000
Running Bumps: N=1024, wavelet=db3, SNR=7.0, sigma=1.0, M=1000

% LaTeX table: AMSE and Std(MSE) for four signals
\begin{table}[h!]
\centering
\caption{AMSE and standard deviation of MSE over $M=1000$ runs, $N=1024$, SNR (variance ratio) = 7.0, $\sigma=1$.}
\label{tab:amse_std_four_signals}
\begin{tabular}{lcccc}
\hline
Signal & Hard & Soft & SCAD & Smooth SCAD \\
\hline
Doppler & 1.5513e-01 & 4.0264e-01 & 1.3505e-01 & 1.3188e-01 \\
 & (1.8200e-02) & (3.2646e-02) & (1.4472e-02) & (1.4609e-02) \\
\hline
Blocks & 2.0078e-01 & 6.6805e-01 & 1.5286e-01 & 1.4484e-01 \\
 & (2.7987e-02) & (4.4013e-02) & (1.8624e-02) & (1.9502e-02) \\
\hline
HeaviSine & 7.7260e-02 & 1.0477e-01 & 6.8234e-02 & 6.5714e-02 \\
 & (1.5162e-02) & (1.1461e-02) & (1.0011e-02) & (1.0515e-02) \\
\hline
Bumps & 2.9867e-01 & 9.5