# Stability and Convergence Report (DreamSim Pairwise Optimization)

## 1. Problem Setup and Notation

We evaluate whether the iterative selection scheme converges toward a reference image under a pairwise ranking signal. Let the reference image be $x_{ref}$ and the current baseline be $x_b$. At each epoch $t$, a batch of $B$ candidates $\{x_{t,b}\}_{b=1}^B$ is generated from a latent vector $z_t$ selected by a Thompson-sampling policy. A perceptual distance $d(\cdot,\cdot)$ is computed using DreamSim; lower is better.

**Selection rule (per candidate):**


$$y_{t,b} = \mathbb{I}[d(x_{t,b}, x_{ref}) < d(x_b, x_{ref})]$$

**Batch best:**


$$\hat{b}_t = \arg\min_b d(x_{t,b}, x_{ref}),\ d_t = d(x_{t,\hat{b}_t}, x_{ref})$$

**Baseline update:**


$$x_b \leftarrow x_{t,\hat{b}_t}\quad\text{iff}\quad d_t < d(x_b, x_{ref})$$
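The three rules above can be sketched for a single epoch as follows. This is a minimal sketch that works on precomputed distances. `candidate_dists` stands in for $d(x_{t,b}, x_{ref})$ and `baseline_dist` stands in for $d(x_b, x_{ref})$. The function name and its return layout are hypothetical and are not taken from the actual pipeline.

```python
from typing import List, Tuple


def epoch_step(candidate_dists: List[float],
               baseline_dist: float) -> Tuple[List[int], int, float, bool, float]:
    """Apply per-candidate labels, batch-best selection and the strict baseline update.

    Hypothetical helper: distances are assumed to be precomputed DreamSim values.
    Returns (labels, best_index, best_dist, improved, new_baseline_dist).
    """
    # y_{t,b} = 1 iff the candidate is strictly closer to the reference than the baseline
    labels = [int(d < baseline_dist) for d in candidate_dists]
    # \hat{b}_t = argmin_b d(x_{t,b}, x_ref); ties resolve to the first index
    best_index = min(range(len(candidate_dists)), key=candidate_dists.__getitem__)
    best_dist = candidate_dists[best_index]
    # Strict update: the baseline changes only if the batch best beats it
    improved = best_dist < baseline_dist
    new_baseline = best_dist if improved else baseline_dist
    return labels, best_index, best_dist, improved, new_baseline


labels, b_hat, d_t, improved, baseline = epoch_step([0.52, 0.41, 0.47], baseline_dist=0.46)
print(labels, b_hat, d_t, improved, baseline)  # [0, 1, 0] 1 0.41 True 0.41
```

Because the update is strict (`<`), a batch whose best candidate merely ties the baseline leaves it unchanged.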

The learned preference model is a logistic Thompson sampler trained on the pairwise labels $y_{t,b}$, with input features $\Delta x = x_{t,b} - x_b$ (candidate minus baseline). The generative model itself is fixed; only the sampling distribution over $z$ is adapted.
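The report does not specify how the logistic Thompson sampler maintains its posterior. One common construction is Bayesian logistic regression with a Laplace (Gaussian) approximation, and we assume it here. The sampler fits the MAP weights by Newton's method and uses the Hessian as the posterior precision. At each epoch it draws one weight vector and picks the proposal whose pairwise features $\Delta x$ score highest. The class name, `prior_var`, `newton_steps`, and the step of scoring a set of proposal features are all hypothetical. The usage lines train on synthetic data, not on DreamSim labels.

```python
import numpy as np


def _sigmoid(z: np.ndarray) -> np.ndarray:
    # Numerically stable logistic function via tanh
    return 0.5 * (1.0 + np.tanh(0.5 * z))


class LogisticThompsonSampler:
    """Hypothetical sketch of a logistic Thompson sampler with a Laplace posterior.

    Features are pairwise differences (candidate minus baseline); labels are in {0, 1}.
    """

    def __init__(self, dim: int, prior_var: float = 1.0, seed: int = 0):
        self.mean = np.zeros(dim)
        self.precision = np.eye(dim) / prior_var
        self.rng = np.random.default_rng(seed)

    def update(self, X: np.ndarray, y: np.ndarray, newton_steps: int = 10) -> None:
        """Bayesian update: the current Gaussian posterior acts as the prior."""
        prior_mean, prior_prec = self.mean.copy(), self.precision.copy()
        w = prior_mean.copy()
        for _ in range(newton_steps):
            p = _sigmoid(X @ w)
            # Gradient and Hessian of the negative log posterior
            grad = prior_prec @ (w - prior_mean) + X.T @ (p - y)
            hess = prior_prec + X.T @ (X * (p * (1.0 - p))[:, None])
            w = w - np.linalg.solve(hess, grad)
        p = _sigmoid(X @ w)
        self.mean = w
        # Laplace approximation: posterior precision = Hessian at the MAP estimate
        self.precision = prior_prec + X.T @ (X * (p * (1.0 - p))[:, None])

    def choose(self, proposal_features: np.ndarray) -> int:
        """Thompson step: sample weights once, return the highest-scoring proposal index."""
        cov = np.linalg.inv(self.precision)
        cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
        w = self.rng.multivariate_normal(self.mean, cov)
        return int(np.argmax(proposal_features @ w))


# Usage on synthetic data: labels depend only on feature 0
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)
sampler = LogisticThompsonSampler(dim=3)
sampler.update(X, y)
idx = sampler.choose(rng.normal(size=(16, 3)))  # index of the chosen proposal
```

Sampling the weights once per decision, instead of using the posterior mean, is what makes this Thompson sampling. Uncertain directions are explored in proportion to their posterior spread.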

## 2. Stability and Convergence Criteria

We define two metrics to assess stability and convergence:

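1) **Baseline improvement count** $N_{imp}$: number of epochs where $d_t$ strictly improves over the current baseline, i.e. epochs in which the baseline update fires.

2) **Best-so-far updates** $N_{bsf}$: number of epochs where $d_t$ is lower than every earlier $d_t$ in the run (a monotone record); the first epoch always counts as a record.

Each earlier $d_t$ either became the baseline or failed to beat it, and the baseline only decreases, so the current baseline is never above any earlier $d_t$. Every baseline improvement is therefore also a record, and $N_{imp} \le N_{bsf}$. The two counts differ when candidates set new records that still do not beat the baseline.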
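Both counters can be computed from the per-epoch batch-best distances $d_t$, as in the sketch below. The helper name and the toy trace are hypothetical; the trace values are made up and do not come from the runs. The second call reproduces the pattern of Tests 1 and 3, where records occur without any baseline improvement.

```python
from typing import List, Tuple


def count_metrics(baseline_dist: float, epoch_best: List[float]) -> Tuple[int, int]:
    """Return (N_imp, N_bsf) for a trace of per-epoch batch-best distances d_t.

    Hypothetical helper; the first epoch is counted as a best-so-far record.
    """
    n_imp = n_bsf = 0
    baseline = baseline_dist
    best_so_far = float("inf")
    for d_t in epoch_best:
        if d_t < baseline:        # strict improvement over the current baseline
            n_imp += 1
            baseline = d_t        # strict update rule
        if d_t < best_so_far:     # new record among candidates (always true at epoch 1)
            n_bsf += 1
            best_so_far = d_t
    return n_imp, n_bsf


trace = [0.55, 0.58, 0.50, 0.52, 0.45]  # made-up d_t values
print(count_metrics(0.60, trace))  # (3, 3): weak baseline, every record also improves it
print(count_metrics(0.30, trace))  # (0, 3): strong baseline, records but no improvements
```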


## 3. Experiments

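We report four runs with different baseline initialization strategies. Each run uses the same pairwise selection rule and DreamSim distance. Run lengths differ (3, 100, 100, and 10 epochs), so improvement rates and window means are not directly comparable across tests.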

### 3.1 Summary Table

| Test | Epochs | Baseline $d(x_b, x_{ref})$ | Improved epochs ($N_{imp}$) | Improvement rate | Best $d_t$ | Best epoch | First $d_t$ | Last $d_t$ | Mean $d_t$ (first 10 epochs) | Mean $d_t$ (last 10 epochs) | Best-so-far updates ($N_{bsf}$) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Test 1 (ref = baseline) | 3 | 0.4342 | 0 | 0.000 | 0.5374 | 3 | 0.5578 | 0.5374 | 0.5479 | 0.5479 | 3 |
| Test 2 (random baseline) | 100 | 0.6205 | 5 | 0.050 | 0.3147 | 25 | 0.5445 | 0.4862 | 0.4970 | 0.4843 | 6 |
| Test 3 (similar baseline) | 100 | 0.2915 | 0 | 0.000 | 0.4090 | 56 | 0.5156 | 0.5555 | 0.5154 | 0.5331 | 4 |
| Test 4 (K=16 best baseline) | 10 | 0.4608 | 1 | 0.100 | 0.3852 | 8 | 0.5219 | 0.4892 | 0.4937 | 0.4937 | 4 |

For runs of 10 epochs or fewer (Tests 1 and 4), the first-10 and last-10 means cover the same epochs and are therefore identical.

### 3.2 Per-test Notes

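**Test 1 (ref = baseline).** The baseline image is intended to be identical to the reference. However, the measured baseline distance is $d(x_b, x_{ref}) = 0.4342$, whereas identical images would give a DreamSim distance near 0, so the baseline as evaluated differs from the reference. Best observed candidate distance was 0.5374 at epoch 3. Improved epochs: 0 / 3.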

**Test 2 (random baseline).** Baseline generated randomly by the model. Baseline distance $d(x_b, x_{ref}) = 0.6205$. Best observed candidate distance was 0.3147 at epoch 25. Improved epochs: 5 / 100.

**Test 3 (similar baseline).** Baseline chosen to be visually close to reference. Baseline distance $d(x_b, x_{ref}) = 0.2915$. Best observed candidate distance was 0.4090 at epoch 56. Improved epochs: 0 / 100.

**Test 4 (K=16 best baseline).** Baseline selected as the closest among K=16 random samples. Baseline distance $d(x_b, x_{ref}) = 0.4608$. Best observed candidate distance was 0.3852 at epoch 8. Improved epochs: 1 / 10.

## 4. Interpretation (Stability vs. Convergence)

**Observation A — Stable but non-convergent regimes.**

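Tests 1 and 3 show zero improvements. In Test 3, the baseline is already close to the reference ($d=0.2915$, the lowest baseline distance of the four tests), while the best candidate found in 100 epochs is much worse ($d=0.4090$). Because no candidate beats the baseline, the strict update rule never fires. The baseline therefore stays fixed (stable) but never moves toward the reference (non-convergent). Test 1 follows the same pattern (best 0.5374 vs. baseline 0.4342), although its 3-epoch run is too short to be conclusive.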


**Observation B — Rare improvements when the baseline is weak.**

Test 2 (random baseline) achieves 5 improvements in 100 epochs (5% rate) and reaches its best distance (0.3147) at epoch 25. No later epoch comes closer. The last-10 mean (0.4843) is only slightly below the first-10 mean (0.4970), which indicates a plateau after early gains rather than continued progress.


**Observation C — K-best baseline helps early but still saturates.**

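Test 4 (K=16 best baseline) yields 1 improvement in 10 epochs, reaching 0.3852 at epoch 8, below its baseline of 0.4608. Its minimum is higher than Test 2's (0.3147), although Test 2 needed 25 epochs to reach that value. Over the first 10 epochs the two runs have similar mean distances (0.4937 vs. 0.4970), and the short run shows no clear trend toward convergence.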

## 5. Why the System Fails to Converge (Hypotheses)

1) **Ceiling effect from a strong baseline.** If $d(x_b, x_{ref})$ is already lower than any distance the generator can reach under the current prompt/latent parameterization, the update rule never fires. This is consistent with Test 3: across 100 epochs, no candidate came closer than 0.4090, against a baseline of 0.2915.

2) **Generator is fixed; learning only reshapes sampling.** The policy updates the distribution over latents, but the generative model is not optimized to minimize DreamSim. If the model cannot represent the reference within its prompt/latent manifold, pairwise signals cannot force convergence.

3) **Baseline latent mismatch (external baseline).** When a baseline image is provided externally, its latent $z_b$ is unknown, so a placeholder is sampled at random in feature space. The differences $\Delta x$ that the pairwise learner sees are then computed against this placeholder rather than the true baseline, which makes the preference signal less informative.

4) **Fixed exploration radius.** The sampling radius $R$ is held constant here ($R = 1.5$). If the optimal region lies outside this radius, convergence cannot occur.
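Hypothesis 1 can be illustrated with a toy model. The sketch below makes a strong assumption: candidate distances are drawn as a fixed floor plus exponential noise, so they never fall below that floor. The floor of 0.40 was chosen only to sit between Test 3's baseline (0.2915) and its best candidate (0.4090). The function, the noise model, and every parameter value are hypothetical, and none of them model the real generator or DreamSim.

```python
import random


def simulate_improvements(baseline_dist: float, floor: float, epochs: int = 100,
                          batch: int = 8, seed: int = 0) -> int:
    """Toy model of the ceiling effect: candidate distances never fall below `floor`.

    Hypothetical assumption: d(x_{t,b}, x_ref) = floor + Exp(mean 0.1).
    Returns the number of epochs in which the strict update rule fires.
    """
    rng = random.Random(seed)
    baseline, n_imp = baseline_dist, 0
    for _ in range(epochs):
        # Batch best d_t: minimum over `batch` toy candidate distances
        d_t = min(floor + rng.expovariate(10.0) for _ in range(batch))
        if d_t < baseline:  # strict update rule
            baseline, n_imp = d_t, n_imp + 1
    return n_imp


print(simulate_improvements(baseline_dist=0.29, floor=0.40))  # 0: baseline below the floor
print(simulate_improvements(baseline_dist=0.62, floor=0.40))  # at least 1 improvement
```

In this toy model a baseline below the floor makes $N_{imp} = 0$ for any number of epochs, and a weak baseline improves only until it approaches the floor.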

## 6. Selection and Ranking Formalization

Given a batch of candidates, the ranking uses DreamSim distance as the utility proxy:


$$\text{score}(x) = -d(x, x_{ref})$$

**Pairwise label:**


$$y_{t,b} = \mathbb{I}[\text{score}(x_{t,b}) > \text{score}(x_b)]$$

**Batch ranking:**


$$x_{t,\hat{b}_t} = \arg\max_{b} \text{score}(x_{t,b})$$

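Because $\text{score}(x) > \text{score}(x')$ exactly when $d(x, x_{ref}) < d(x', x_{ref})$, these definitions coincide with the selection rule and batch best of Section 1. The ranking is deterministic given DreamSim and stable under repeated evaluation (no label flips observed for identical pairs).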
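The stability claim can be checked mechanically by re-evaluating each pair several times and counting pairs whose label changes, as in the sketch below. `dist_fn` is a placeholder for whatever DreamSim call the pipeline uses, and we assume it returns a float. The helper name is hypothetical, and the toy absolute-difference distance in the usage line is not DreamSim.

```python
from typing import Callable, Sequence, Tuple


def count_label_flips(dist_fn: Callable[[object, object], float], ref: object,
                      pairs: Sequence[Tuple[object, object]], repeats: int = 3) -> int:
    """Re-evaluate y = 1[d(candidate, ref) < d(baseline, ref)] and count unstable pairs.

    `dist_fn` is a stand-in for a DreamSim distance call (assumption: returns a float).
    """
    flips = 0
    for candidate, baseline in pairs:
        labels = {int(dist_fn(candidate, ref) < dist_fn(baseline, ref)) for _ in range(repeats)}
        flips += int(len(labels) > 1)  # more than one distinct label means a flip
    return flips


# Toy deterministic distance on scalars: no pair can flip
print(count_label_flips(lambda a, b: abs(a - b), 0.0, [(0.2, 0.5), (0.7, 0.3)]))  # 0
```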

## 7. Conclusions

- The system is **stable** under pairwise selection (labels are consistent), but **convergence is not guaranteed**.

- When the baseline is strong, no candidate beats it, so the strict update rule never fires (Test 3).

- When the baseline is weak, improvements occur but saturate early (Test 2: best distance reached at epoch 25 of 100).

- The current setup optimizes sampling, not the generator; thus it cannot exceed the generator’s representational limits.

