# ML Diagnostics Generator (ROC, Calibration, Violin)

## Detailed Analysis

This script evaluates the performance and reliability of the trained machine learning classifiers. It generates three diagnostic figures: a Receiver Operating Characteristic (ROC) curve comparison, a Calibration (Reliability) diagram, and a Split Violin plot visualizing the physical separation of the classes.

The ROC curves quantify how well the models distinguish Hadronic from Quark stars (summarized by the AUC score), and the calibration diagram checks whether the predicted probabilities match the observed class frequencies. The violin plot adds physical context by showing where in the Mass-Radius parameter space the two populations differ most.

## Physics and Math

### 1. Classification Metrics
The performance of the binary classifiers is evaluated using standard statistical metrics.

**ROC Curve and AUC:**
The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the decision threshold is varied. The Area Under the Curve (AUC) equals the probability that the classifier assigns a higher score to a randomly chosen positive instance (Quark star) than to a randomly chosen negative instance (Hadronic star). An AUC of 0.5 corresponds to random guessing and an AUC of 1.0 to perfect separation.
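The ranking interpretation of the AUC can be checked numerically. The following is a minimal sketch on synthetic scores (the data and variable names are illustrative assumptions, not taken from the script): it counts the fraction of positive/negative pairs that are ranked correctly and compares it with `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: label 1 plays the role of "Quark", 0 of "Hadronic"
y_true = rng.integers(0, 2, size=500)
# Positive instances get scores shifted upward by one standard deviation
scores = rng.normal(loc=y_true * 1.0, scale=1.0)

pos = scores[y_true == 1]
neg = scores[y_true == 0]

# Compare every positive score with every negative score;
# ties (if any) count as half a correctly ranked pair
diff = pos[:, None] - neg[None, :]
pairwise_auc = (diff > 0).mean() + 0.5 * (diff == 0).mean()

sk_auc = roc_auc_score(y_true, scores)
print(f"pairwise ranking probability: {pairwise_auc:.6f}")
print(f"roc_auc_score:                {sk_auc:.6f}")
```

The two numbers agree up to floating-point rounding, which is why the AUC is insensitive to any monotonic rescaling of the scores.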

**Calibration (Reliability):**
For the output of a classifier to be interpreted as a probability $P(\text{Quark})$, it must be calibrated. A perfectly calibrated model satisfies:

$$
P(y = 1 \mid \hat{p} = x) = x
$$

where $\hat{p}$ is the predicted probability and $y$ is the true label. The script assesses this using a calibration curve computed via quantile binning.
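To make the calibration condition concrete, here is a small NumPy-only sketch (synthetic data, hypothetical variable names) that builds labels which are calibrated by construction and then estimates $P(y = 1 \mid \hat{p})$ in ten quantile bins. It is an illustration of the definition, not the script's own implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Predicted probabilities, and labels drawn so that P(y=1 | p_hat) = p_hat
p_hat = rng.uniform(0, 1, size=20000)
y = (rng.uniform(0, 1, size=p_hat.size) < p_hat).astype(int)

# Quantile bin edges: each bin receives roughly the same number of samples
n_bins = 10
edges = np.quantile(p_hat, np.linspace(0, 1, n_bins + 1))
bin_idx = np.clip(np.searchsorted(edges, p_hat, side="right") - 1, 0, n_bins - 1)

# Per bin: mean predicted probability vs. observed fraction of positives
mean_pred = np.array([p_hat[bin_idx == b].mean() for b in range(n_bins)])
frac_pos = np.array([y[bin_idx == b].mean() for b in range(n_bins)])
bin_counts = np.bincount(bin_idx, minlength=n_bins)

max_gap = np.abs(frac_pos - mean_pred).max()
print(f"largest |observed - predicted| over bins: {max_gap:.3f}")
```

For a calibrated model the per-bin gap shrinks toward zero as the number of samples per bin grows; a systematic offset in one direction would indicate over- or under-confidence.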

### 2. Physical Regimes
To visualize the physical differences between the populations, the test set is divided into three mass regimes:
1.  **Low Mass:** $M < 1.1 M_{\odot}$ (Crust-dominated / low density).
2.  **Canonical Mass:** $1.1 \le M \le 1.7 M_{\odot}$ (Typical observational range).
3.  **High Mass:** $M > 1.7 M_{\odot}$ (Core-dominated / high density).
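The regime boundaries above can be sketched with `pd.cut`. One detail worth noting: `pd.cut` uses right-closed intervals by default, so a star with exactly $M = 1.1 M_{\odot}$ would land in the Low bin. The sketch below (illustrative values; the `np.nextafter` trick is an assumption, not something the script is known to do) nudges the lower edge down by one floating-point step so that the bins match the inequalities as written.

```python
import numpy as np
import pandas as pd

# Illustrative masses in solar masses, including both boundary values
masses = pd.Series([0.9, 1.1, 1.4, 1.7, 2.0], name="Mass")

# Largest float strictly below 1.1, so that M = 1.1 falls in "Canonical"
lower_edge = np.nextafter(1.1, -np.inf)

regimes = pd.cut(
    masses,
    bins=[-np.inf, lower_edge, 1.7, np.inf],  # (-inf, 1.1), [1.1, 1.7], (1.7, inf)
    labels=["Low", "Canonical", "High"],
)
print(pd.DataFrame({"Mass": masses, "Regime": regimes}))
```

With continuous mass values the boundary cases almost never occur, so plain edges `[…, 1.1, 1.7, …]` would give the same bins in practice.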

## Code Walkthrough

### 1. Configuration and Model Setup
The function `plot_diagnostics` accepts a dictionary of trained models and the test dataset. Feature sets are defined for each model level (`Geo` through `D`) to ensure the correct inputs are passed to each classifier.

```python
feature_sets = {
    'A': ['Mass', 'Radius', 'LogLambda'],
    'B': ['Mass', 'Radius', 'LogLambda', 'Eps_Central'],
    # ...
    'Geo': ['Mass', 'Radius']
}
```

### 2. ROC Curve Comparison
The script iterates through the models in order of increasing complexity. For each model, the predicted probabilities are computed, and the false positive and true positive rates are calculated.

```python
y_probs = model.predict_proba(X_test_all[features])[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_probs)
roc_auc = auc(fpr, tpr)
```
These curves are plotted on a single axis to demonstrate the performance gain achieved by adding physical features (like Tidal Deformability or Topology) compared to the baseline Geometric model.
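The loop described above might look roughly like the following. This is a self-contained sketch under stated assumptions: the data are synthetic, the classifiers are plain `LogisticRegression` stand-ins, and only the `Geo` and `A` feature sets shown earlier are used. The script's real models, styling, and output path may differ.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve

rng = np.random.default_rng(2)
n = 600
y = rng.integers(0, 2, size=n)  # 1 = Quark, 0 = Hadronic (synthetic labels)

# Synthetic stand-in features: column names follow the document, values are made up
X = pd.DataFrame({
    "Mass": rng.normal(1.4, 0.3, size=n),
    "Radius": rng.normal(12.0, 1.0, size=n) - 0.3 * y,
    "LogLambda": rng.normal(2.5, 0.5, size=n) - 0.8 * y,
})
X_train, X_test_all = X.iloc[:400], X.iloc[400:]
y_train, y_test = y[:400], y[400:]

feature_sets = {
    "Geo": ["Mass", "Radius"],
    "A": ["Mass", "Radius", "LogLambda"],
}
# Placeholder classifiers, one per feature set
models = {
    name: LogisticRegression(max_iter=1000).fit(X_train[cols], y_train)
    for name, cols in feature_sets.items()
}

fig, ax = plt.subplots(figsize=(5, 5))
auc_scores = {}
for name, model in models.items():
    features = feature_sets[name]
    y_probs = model.predict_proba(X_test_all[features])[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_probs)
    auc_scores[name] = auc(fpr, tpr)
    ax.plot(fpr, tpr, label=f"Model {name} (AUC = {auc_scores[name]:.3f})")

# Diagonal reference line: performance of a random classifier
ax.plot([0, 1], [0, 1], linestyle=":", color="gray", label="Chance")
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.legend(loc="lower right")
print(auc_scores)
```

Indexing `X_test_all` with each model's own feature list keeps a single test frame in memory while guaranteeing that every classifier sees exactly the columns it was trained on.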

### 3. Calibration Curve
The reliability of the probabilities is checked using `calibration_curve`.

```python
# strategy='quantile' gives each bin (approximately) the same number of samples
prob_true, prob_pred = calibration_curve(y_test, y_probs, 
                                         n_bins=10, strategy='quantile')
```
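Quantile binning is used because the models are highly accurate, so most predictions cluster near 0 or 1. With uniform binning the middle bins would be empty or nearly empty; `calibration_curve` drops empty bins, and sparsely populated ones yield noisy estimates of the true fraction.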
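The effect of the binning strategy can be seen on a toy example. The sketch below uses synthetic, deliberately confident scores drawn from Beta distributions (an assumption made purely for illustration) and compares the number of samples per bin under uniform and quantile edges.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(3)
n = 2000
y_test = rng.integers(0, 2, size=n)

# Confident synthetic scores: piled up near 1 for class 1 and near 0 for class 0
y_probs = np.where(
    y_test == 1,
    rng.beta(8, 0.5, size=n),
    rng.beta(0.5, 8, size=n),
)

n_bins = 10
uniform_edges = np.linspace(0, 1, n_bins + 1)
quantile_edges = np.quantile(y_probs, np.linspace(0, 1, n_bins + 1))

uniform_counts, _ = np.histogram(y_probs, bins=uniform_edges)
quantile_counts, _ = np.histogram(y_probs, bins=quantile_edges)
print("samples per uniform bin: ", uniform_counts)
print("samples per quantile bin:", quantile_counts)

# calibration_curve returns one point per non-empty bin
n_points = {}
for strategy in ("uniform", "quantile"):
    prob_true, prob_pred = calibration_curve(
        y_test, y_probs, n_bins=n_bins, strategy=strategy
    )
    n_points[strategy] = len(prob_true)
print("points on the curve:", n_points)
```

The trade-off is that quantile bins are narrow where predictions are dense and wide where they are sparse, so the points on the reliability diagram are not evenly spaced along the x-axis.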

### 4. Split Violin Plot
This section visualizes the distribution of Radii for Hadronic vs. Quark stars across mass bins.
*   **Data Preparation:** The test set is augmented with the true labels.
*   **Binning:** A `Mass_Bin` column is created using `pd.cut`.
*   **Plotting:** `sns.violinplot` with `split=True` draws the Hadronic distribution on the left half and the Quark distribution on the right half of each violin.
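The three steps above can be sketched as follows. Everything in this example is an assumption made for illustration: the data are synthetic, and the column names `Label` and `Radius` as well as the colors are placeholders that merely echo the description in this document.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
n = 600

# Data preparation: synthetic test set with the true labels attached
df = pd.DataFrame({
    "Mass": rng.uniform(0.8, 2.2, size=n),
    "Label": rng.choice(["Hadronic", "Quark"], size=n),
})
# Made-up radii: the Quark population drifts to smaller radii at higher mass
is_quark = (df["Label"] == "Quark").astype(float)
df["Radius"] = rng.normal(12.0, 0.6, size=n) - 0.8 * is_quark * (df["Mass"] - 0.8)

# Binning: pd.cut is right-closed by default, so M = 1.1 exactly would fall in "Low"
df["Mass_Bin"] = pd.cut(
    df["Mass"],
    bins=[0.0, 1.1, 1.7, np.inf],
    labels=["Low", "Canonical", "High"],
)

# Plotting: split=True puts the first hue level on the left half of each violin
fig, ax = plt.subplots(figsize=(7, 4))
sns.violinplot(
    data=df,
    x="Mass_Bin",
    y="Radius",
    hue="Label",
    hue_order=["Hadronic", "Quark"],
    split=True,
    palette={"Hadronic": "green", "Quark": "magenta"},
    ax=ax,
)
ax.set_xlabel("Mass Regime")
ax.set_ylabel("Radius [km]")
```

`split=True` requires the hue variable to have exactly two levels, which is naturally satisfied by a binary classification label.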

## Visualization Output

The script saves three files to the `plots/` directory:

1.  **`fig_6_roc_combined.pdf`**:
    *   **X-axis:** False Positive Rate.
    *   **Y-axis:** True Positive Rate.
    *   **Content:** A comparison of ROC curves. The Geometric model typically shows the lowest AUC, while models including topological features (Model D) approach perfect classification (AUC $\approx$ 1.0).

2.  **`fig_7b_calibration.pdf`**:
    *   **X-axis:** Mean Predicted Probability.
    *   **Y-axis:** True Positive Fraction.
    *   **Content:** A reliability diagram. Points falling on the dotted diagonal indicate that the predicted probabilities are well calibrated (e.g., among stars assigned a Quark probability of about 0.8, roughly 80% are in fact Quark stars).

3.  **`fig_8_violin_radius.pdf`**:
    *   **X-axis:** Mass Regime (Low, Canonical, High).
    *   **Y-axis:** Radius [km].
    *   **Content:** A split violin plot.
        *   **Green (Left):** Hadronic Radius distribution.
        *   **Magenta (Right):** Quark Radius distribution.
        *   **Insight:** The distributions overlap significantly at low masses but diverge at high masses, which is consistent with the improvement in classifier performance when features that capture high-mass behavior are added.