## 1. Expression of $ Z(\alpha, \beta)$

The partition function $ Z(\alpha, \beta) $ ensures that $ p(x) $ is a valid probability distribution:

$$
Z(\alpha, \beta) = \sum_{x \in \{0,1\}^n} \exp \left( \alpha \sum_i x_i + \beta \sum_{i \sim j} \mathbf{1}_{x_i = x_j} \right)
$$

where the sum runs over all possible configurations $ x $ of the system.

### Why is $ Z(\alpha, \beta)$ difficult to compute?
- The sum has **exponential complexity** in $ n $.
- Computing it exactly requires summing over $ 2^n $ configurations, which is infeasible for large $ n $.
- Instead, we use **sampling methods** (like Monte Carlo methods) to approximate it.
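To make the $ 2^n $ cost concrete, here is a minimal brute-force sketch for very small systems. It assumes the spins sit on a 2D grid with free boundaries and that $ i \sim j $ means horizontally or vertically adjacent sites; the helper names `grid_edges` and `brute_force_Z` are illustrative and not part of the notebook's modules.

```python
import itertools
import math


def grid_edges(rows, cols):
    """Neighbouring pairs (i ~ j) on a rows x cols grid with free boundaries (an assumption)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))      # right neighbour
            if r + 1 < rows:
                edges.append((i, i + cols))   # neighbour below
    return edges


def brute_force_Z(alpha, beta, rows, cols):
    """Sum the unnormalised weight over all 2^(rows * cols) configurations."""
    edges = grid_edges(rows, cols)
    total = 0.0
    for x in itertools.product((0, 1), repeat=rows * cols):
        s1 = sum(x)                                # sum_i x_i
        s2 = sum(x[i] == x[j] for i, j in edges)   # number of equal neighbouring pairs
        total += math.exp(alpha * s1 + beta * s2)
    return total


# A 3 x 3 grid already needs 2^9 = 512 terms; a 10 x 10 grid would need 2^100.
print(brute_force_Z(0.5, 0.8, 3, 3))
```

With $ \alpha = \beta = 0 $ every configuration has weight 1, so the function should return exactly $ 2^n $, which makes a quick sanity check.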






## Likelihood and Log-Likelihood 

Let $ x^1, x^2, \dots, x^n $ denote the **observed configurations** of the Ising model, where each $ x^i $ is a binary vector of spin values. Here $ n $ counts the observations, not the number of spins used in the definition of $ Z(\alpha, \beta) $ above.

### Likelihood Function

The **likelihood** is the **product of the probabilities** of these observed configurations given the model parameters $ \alpha $ and $ \beta $. For each configuration $ x^i $, the probability is given by the Ising model distribution:

$$
p(x^i \mid \alpha, \beta) = \frac{1}{Z(\alpha, \beta)} \exp \left( \alpha \sum_j x^i_j + \beta \sum_{j \sim k} \mathbf{1}_{x^i_j = x^i_k} \right)
$$

Assuming the observations are independent, the **likelihood** is the product of these probabilities over all observed configurations $ x^1, x^2, \dots, x^n $:

$$
L(\alpha, \beta \mid x^1, x^2, \dots, x^n) = \prod_{i=1}^{n} p(x^i \mid \alpha, \beta)
$$

Substituting the probability for each observation:

$$
L(\alpha, \beta \mid x^1, x^2, \dots, x^n) = \prod_{i=1}^{n} \frac{1}{Z(\alpha, \beta)} \exp \left( \alpha \sum_j x^i_j + \beta \sum_{j \sim k} \mathbf{1}_{x^i_j = x^i_k} \right)
$$

### Log-Likelihood Function

To simplify the computation, we take the **logarithm** of the likelihood to obtain the **log-likelihood**:

$$
\log L(\alpha, \beta \mid x^1, x^2, \dots, x^n) = \sum_{i=1}^{n} \left( \alpha \sum_j x^i_j + \beta \sum_{j \sim k} \mathbf{1}_{x^i_j = x^i_k} \right) - n \log Z(\alpha, \beta)
$$

Where:
- $ x^i_j $ refers to the $ j $-th element (spin) of the $ i $-th configuration vector $ x^i $.
- The sum $ \sum_j x^i_j $ is the number of spins equal to 1 in the $ i $-th configuration (spins take values in $ \{0, 1\} $).
- The sum $ \sum_{j \sim k} \mathbf{1}_{x^i_j = x^i_k} $ counts the number of neighboring pairs of spins that are equal in the $ i $-th configuration.
- $ Z(\alpha, \beta) $ is the partition function, which normalizes the distribution.
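As an illustration of the log-likelihood formula, the sketch below evaluates it on a toy $ 2 \times 2 $ grid, where $ \log Z(\alpha, \beta) $ can still be computed by enumerating all 16 configurations. The edge list, the three observations, and the function names are made up for the example.

```python
import itertools
import math


def stats(x, edges):
    """Return (sum_j x_j, number of equal neighbouring pairs) for one configuration."""
    return sum(x), sum(x[j] == x[k] for j, k in edges)


def log_likelihood(alpha, beta, observations, edges, n_sites):
    """Log-likelihood of independent observations; log Z is obtained by full enumeration."""
    log_Z = math.log(sum(
        math.exp(alpha * s1 + beta * s2)
        for s1, s2 in (stats(x, edges)
                       for x in itertools.product((0, 1), repeat=n_sites))
    ))
    energy = sum(alpha * s1 + beta * s2
                 for s1, s2 in (stats(x, edges) for x in observations))
    return energy - len(observations) * log_Z


# 2 x 2 grid with sites numbered row by row: (0 1) over (2 3)
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
observations = [(1, 1, 0, 1), (1, 1, 1, 1), (0, 1, 1, 1)]  # illustrative data
print(log_likelihood(0.5, 0.8, observations, edges, n_sites=4))
```

The enumeration inside `log_likelihood` is exactly the step that stops scaling: it is affordable for 4 sites and hopeless for a $ 10 \times 10 $ grid.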

---

### Why is MLE difficult?
- Computing $ Z(\alpha, \beta)$ is **intractable** for all but very small systems.
- Gradient-based optimization is equally hard: the gradient of $ \log Z(\alpha, \beta) $ is an expectation under the model, which again requires a sum over all configurations.

👉 **Solution**: Instead of MLE, we use **Approximate Bayesian Computation (ABC)**.

------------------

In [1]:
import numpy as np

from modules.ABC_reject import *
from modules.Gibbs_sampler import *

In [4]:
# Example run
n = 10  # Grid size
true_alpha, true_beta = 0.5, 0.8

# Simulate the "observed" grid with the Gibbs sampler at the true parameters
observed_grid = run_gibbs(n, true_alpha, true_beta, steps=200)
# observed_grid = ising_model(n, true_alpha, true_beta)
obs_stats = sufficient_statistics(observed_grid)

theta_samples = abc_reject(
    obs_stats,
    prior_alpha=(0, 2),
    prior_beta=(0, 2),
    n=n,
    epsilon=0.5,
    num_samples=1000,
)

# Columns 0 and 1 hold alpha and beta; column 2 is presumably the distance
# between simulated and observed statistics (it is minimised below)
alpha_hat, beta_hat = np.mean(theta_samples[:, :2], axis=0)
print("Estimated alpha and beta:", (alpha_hat, beta_hat))

# Accepted sample with the smallest value in column 2
min_idx = np.argmin(theta_samples[:, 2])
best_alpha, best_beta = theta_samples[min_idx, 0], theta_samples[min_idx, 1]
print("Nearest alpha and beta:", (best_alpha, best_beta))

# Mean absolute error of the posterior-mean estimate
print(
    "Mean absolute error:",
    0.5 * (abs(true_alpha - alpha_hat) + abs(true_beta - beta_hat)),
)

# Approximate Bayesian Computation (ABC-Reject) for Ising Model

## 1. Understanding the Sufficient Statistics

In the Ising model, we define the **sufficient statistics**:

$$
S(x) = \sum_i x_i, \quad S_2(x) = \sum_{i \sim j} \mathbf{1}_{x_i = x_j}
$$

### What is $ S_2 $?
- It represents the number of **neighboring pairs** where the spins are the same.
- This is a key statistic because it captures the effect of the interaction parameter $ \beta $, which favors alignment.
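Both statistics can be computed with a few array comparisons. The sketch below assumes the configuration is stored as a 2D NumPy array of 0/1 values with free boundaries; `grid_statistics` is an illustrative name and may differ from the `sufficient_statistics` function imported from the notebook's modules.

```python
import numpy as np


def grid_statistics(grid):
    """Return (S, S_2) for a 2D array of 0/1 spins with free boundaries (an assumption)."""
    grid = np.asarray(grid)
    s1 = int(grid.sum())                                     # S(x): spins equal to 1
    horizontal = int((grid[:, :-1] == grid[:, 1:]).sum())    # equal left-right pairs
    vertical = int((grid[:-1, :] == grid[1:, :]).sum())      # equal up-down pairs
    return s1, horizontal + vertical


example = np.array([[1, 1, 0],
                    [0, 1, 0],
                    [0, 0, 0]])
print(grid_statistics(example))  # (3, 7)
```

In the example, 3 of the 6 horizontal pairs and 4 of the 6 vertical pairs agree, giving $ S_2 = 7 $.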



## 2. Choosing the Distance Function

Since ABC-Reject compares **simulated** and **observed** data, we define a distance function:

$$
d(x_{\text{obs}}, x_{\text{sim}}) = \left| S(x_{\text{obs}}) - S(x_{\text{sim}}) \right| + \left| S_2(x_{\text{obs}}) - S_2(x_{\text{sim}}) \right|
$$

### Why this choice?
- $ S(x) $ captures the **overall magnetization** (effect of $ \alpha $).
- $ S_2(x) $ captures **neighboring alignment** (effect of $ \beta $).
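- The sum of absolute differences is simple to interpret: it is the total number of units by which the two statistics differ.
- A **small threshold $ \epsilon $** ensures we accept only close matches. With the statistics as defined here, both are integers, so under the strict inequality any $ \epsilon \le 1 $ accepts exact matches only.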
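The distance and the acceptance test amount to a couple of lines. This is a minimal sketch with made-up statistic values; the function names are illustrative.

```python
def l1_distance(obs_stats, sim_stats):
    """Sum of absolute differences between two (S, S_2) pairs."""
    return sum(abs(o - s) for o, s in zip(obs_stats, sim_stats))


def accept(obs_stats, sim_stats, epsilon):
    """ABC acceptance rule: keep the draw if the distance is strictly below epsilon."""
    return l1_distance(obs_stats, sim_stats) < epsilon


obs = (52, 130)   # illustrative (S, S_2) of the observed grid
sim = (50, 133)   # illustrative (S, S_2) of a simulated grid
print(l1_distance(obs, sim))          # 5
print(accept(obs, sim, epsilon=6))    # True
print(accept(obs, sim, epsilon=0.5))  # False
```

Because $ S $ and $ S_2 $ live on different scales (number of sites versus number of edges), one might also consider rescaling each term before summing; the unscaled version above follows the definition in the text.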



## 3. ABC-Reject Algorithm Explained

Since computing the likelihood function is intractable, **ABC** estimates parameters using simulation:

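1. **Obtain the observed data** $ x_{\text{obs}} $, assumed to come from the Ising model with **unknown** $ (\alpha^*, \beta^*) $ (in the experiment above it is simulated with known values so the estimates can be checked).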
2. **For many iterations**:
   - Sample $ (\alpha, \beta) $ from a **prior distribution**.
   - Simulate $ x_{\text{sim}} $ using the Ising model with $ (\alpha, \beta) $.
   - Compute the sufficient statistics $ S(x_{\text{sim}}) $ and $ S_2(x_{\text{sim}}) $.
   - **Accept** $ (\alpha, \beta) $ if the distance is **less than** $ \epsilon $.
3. The set of **accepted** $ (\alpha, \beta) $ values approximates their posterior distribution.
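The steps above can be sketched end to end as follows. This is a self-contained illustration, not the code in `modules.ABC_reject` or `modules.Gibbs_sampler`: the Gibbs sampler, the uniform priors on $ [0, 2] $, the free boundaries, and all function names and default values are assumptions made for the example.

```python
import numpy as np


def gibbs_sample(size, alpha, beta, sweeps, rng):
    """Approximate draw from the Ising model on a size x size grid (free boundaries)."""
    x = rng.integers(0, 2, size=(size, size))
    for _ in range(sweeps):
        for r in range(size):
            for c in range(size):
                nbrs = [x[rr, cc]
                        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < size and 0 <= cc < size]
                ones = sum(nbrs)
                zeros = len(nbrs) - ones
                # Log-odds of x[r, c] = 1 versus 0, given the neighbouring spins
                logit = alpha + beta * (ones - zeros)
                x[r, c] = rng.random() < 1.0 / (1.0 + np.exp(-logit))
    return x


def summary(x):
    """Return the statistics (S, S_2) as an integer array."""
    s1 = int(x.sum())
    s2 = int((x[:, :-1] == x[:, 1:]).sum() + (x[:-1, :] == x[1:, :]).sum())
    return np.array([s1, s2])


def abc_rejection(obs_stats, size, epsilon, n_draws, rng, sweeps=10, prior_high=2.0):
    """Keep the prior draws whose simulated statistics fall within epsilon of obs_stats."""
    accepted = []
    for _ in range(n_draws):
        alpha, beta = rng.uniform(0.0, prior_high, size=2)    # sample from the prior
        sim_stats = summary(gibbs_sample(size, alpha, beta, sweeps, rng))
        if np.abs(obs_stats - sim_stats).sum() < epsilon:     # distance check
            accepted.append((alpha, beta))
    return np.array(accepted, dtype=float).reshape(len(accepted), 2)


rng = np.random.default_rng(0)
x_obs = gibbs_sample(5, 0.5, 0.8, sweeps=50, rng=rng)
posterior = abc_rejection(summary(x_obs), size=5, epsilon=4, n_draws=200, rng=rng)
print(len(posterior), "accepted draws")
if len(posterior):
    print("posterior mean (alpha, beta):", posterior.mean(axis=0))
```

The grid is kept at $ 5 \times 5 $ and the number of draws small so the example runs in a few seconds; with so few accepted draws the posterior mean is only a rough estimate.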



## 4. Why Use ABC-Reject?

- **No need for likelihood computation** (which is intractable due to $ Z(\alpha, \beta) $).
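- **Flexible**: it applies to any model that can be simulated, including models with unknown normalizing constants.
- **Drawback**: Inefficient when the parameter space is high-dimensional or $ \epsilon $ is small, since most proposals are then rejected.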

---


## Why are $ S(x) $ and $ S_2(x) $ sufficient statistics for the Ising Model?

For the Ising model, the parameters we want to estimate are $ \alpha $ and $ \beta $, which affect the magnetization and the interactions between neighboring spins, respectively.

- **Magnetization (summed spins):**

  $$
  S(x) = \sum_i x_i
  $$

  This statistic captures the overall magnetic alignment (effect of $ \alpha $) in the system.

- **Neighbor interactions:**

  $$
  S_2(x) = \sum_{i \sim j} \mathbf{1}_{x_i = x_j}
  $$

  This statistic counts how many neighboring spins are aligned (effect of $ \beta $).


### Intuition:

- The **magnetization statistic** $ S(x) $ counts the spins equal to 1, which is directly influenced by $ \alpha $ (external field).
- The **interaction statistic** $ S_2(x) $ captures how many neighboring spins are aligned, which is directly influenced by $ \beta $ (interaction strength).
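- Together, these statistics are sufficient: the probability $ p(x \mid \alpha, \beta) = \exp\left( \alpha S(x) + \beta S_2(x) \right) / Z(\alpha, \beta) $ depends on $ x $ only through $ S(x) $ and $ S_2(x) $, so by the Fisher-Neyman factorization theorem no additional information (about individual spins or the precise configuration) is needed to estimate $ \alpha $ and $ \beta $.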
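A small numerical check of this point: two different configurations with the same $ (S, S_2) $ receive the same unnormalized log-probability for any $ (\alpha, \beta) $. The sketch assumes a 2D grid with free boundaries; `log_weight` and `grid_statistics` are illustrative names.

```python
import math

import numpy as np


def log_weight(grid, alpha, beta):
    """Unnormalised log-probability, accumulated spin by spin from the model definition."""
    rows, cols = grid.shape
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += alpha * grid[r, c]
            if c + 1 < cols:
                total += beta * (grid[r, c] == grid[r, c + 1])
            if r + 1 < rows:
                total += beta * (grid[r, c] == grid[r + 1, c])
    return float(total)


def grid_statistics(grid):
    """Return (S, S_2) for a 2D array of 0/1 spins."""
    s2 = (grid[:, :-1] == grid[:, 1:]).sum() + (grid[:-1, :] == grid[1:, :]).sum()
    return int(grid.sum()), int(s2)


a = np.array([[1, 0],
              [0, 0]])
b = np.array([[0, 0],
              [0, 1]])

print(grid_statistics(a), grid_statistics(b))          # (1, 2) (1, 2)
print(math.isclose(log_weight(a, 0.5, 0.8),
                   log_weight(b, 0.5, 0.8)))            # True
```

The two grids place the single up-spin in different corners, yet the model cannot tell them apart, which is what sufficiency of $ (S, S_2) $ means here.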

---

### What Does "Sufficiency" Mean in Practice?

- **In the Ising model**: Knowing the total magnetization and the number of aligned neighbors in the system gives us everything we need to estimate the interaction strength $ \beta $ and external field $ \alpha $.
- **For parameter estimation**: Once we have $ S(x) $ and $ S_2(x) $, nothing else from the data is needed. Maximum Likelihood Estimation (MLE) would use only these values if $ Z(\alpha, \beta) $ were tractable, and Approximate Bayesian Computation (ABC) compares only these values between observed and simulated data.




Results for $ \alpha = 0.5 $ and $ \beta = 0.8 $:

| $ n $ (system size) | Number of samples | $ \epsilon $ (tolerance) | $ \alpha $ estimate | $ \beta $ estimate | Time (s) | Mean absolute error |
|---------------------|-------------------|--------------------------|---------------------|--------------------|----------|---------------------|
| 10                  | 1000              | 0.001                    | 0.11                | 1.01               | 120      | 0.31                |
| 20                  | 1000              | 0.001                    | 0.31                | 0.90               | 112      | 0.15                |
| 30                  | 1000              | 0.001                    | 0.92                | 0.71               | 114      | 0.27                |
