# Master Pipeline Orchestrator

## Detailed Analysis

This script serves as the central entry point for the entire project. It orchestrates the workflow, which consists of four main stages: synthetic data generation, machine learning model training, comprehensive visualization, and inference on real astrophysical candidates.

The script manages the lifecycle of the dataset. If a valid dataset (`data/thesis_dataset.csv`) exists, it is loaded; otherwise, a parallelized simulation routine is triggered to generate thousands of Equations of State (EoS) for both Hadronic and Quark stars. It ensures class balance between the two populations before passing the data to the machine learning pipeline and the visualization suite.
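The "valid dataset" check can be sketched as follows. This is a minimal illustration, not the script's actual logic: the helper name `dataset_is_valid` and the column list `EXPECTED_COLUMNS` are hypothetical, and it assumes that validity means the file exists and its CSV header matches the expected schema.

```python
import csv
from pathlib import Path

# Hypothetical column list; the real schema comes from the project's configuration.
EXPECTED_COLUMNS = ["Mass", "Radius", "Label"]


def dataset_is_valid(path, expected_columns=EXPECTED_COLUMNS):
    """Return True if `path` exists and its CSV header matches the expected columns."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open(newline="") as handle:
        header = next(csv.reader(handle), None)  # None when the file is empty
    return header == list(expected_columns)


# Regenerate only when no usable dataset is found on disk.
should_generate = not dataset_is_valid("data/thesis_dataset.csv")
```

Reading only the header keeps the check cheap even for large files; a stricter check could also verify row counts or column types.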

## Physics and Math

While this script primarily handles logic and data flow, it also enforces statistical balance in the dataset. To prevent bias in the machine learning models, the majority class is undersampled so that each class contributes the same number of rows:

$$
N_{\text{samples}} = \min(N_{\text{hadronic}}, N_{\text{quark}})
$$

The balanced dataset therefore contains $2N_{\text{samples}}$ rows, and the priors for the classification task are balanced ($P(H) \approx P(Q) \approx 0.5$).

The script also utilizes the **homologous scaling** baseline masses calculated in `calculate_baselines.py`, which are essential for the "Inverse Sampling" technique used in the hadronic generation workers.

## Code Walkthrough

### 1. Configuration and Initialization
The script begins by setting up the necessary directories (`data/` and `plots/`) and loading the physical constants and baselines.

```python
# Directory Setup
os.makedirs(DATA_DIR, exist_ok=True)
os.makedirs("plots", exist_ok=True)

# Physics Initialization
baselines = calculate_baselines()
```
The `baselines` dictionary contains the maximum stable masses for pure hadronic models, which are required inputs for the worker functions.

### 2. Data Management and Generation
The script checks for an existing dataset. If the file is missing or the schema does not match the current configuration, a parallel generation process is initiated using `joblib`.

```python
if should_generate:
    # Define tasks (interleaved Hadronic and Quark batches)
    tasks = []
    for i in range(num_batches):
        t_type = 'hadronic' if i % 2 == 0 else 'quark'
        tasks.append((t_type, CURVES_PER_BATCH, i, i))

    # Execute Parallel Workers (tqdm tracks task dispatch, not completion)
    res = Parallel(n_jobs=N_JOBS)(
        delayed(run_worker_wrapper)(t, baselines) for t in tqdm(tasks)
    )
```
This block distributes the batches across `N_JOBS` parallel workers (joblib uses all available CPU cores when `n_jobs=-1`). Each task generates one batch of Tolman-Oppenheimer-Volkoff (TOV) sequences.
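The excerpt above does not show `run_worker_wrapper`. The sketch below illustrates one way such a wrapper could unpack a task tuple and dispatch to a type-specific worker. Everything in it is an assumption made for illustration: the tuple layout `(type, n_curves, batch_index, seed)`, the placeholder functions `hadronic_worker` and `quark_worker`, and the row format they return. A plain list comprehension stands in for `Parallel` so the example runs without joblib.

```python
def hadronic_worker(n_curves, baselines, batch_idx, seed):
    """Placeholder for the real hadronic generator (hypothetical signature)."""
    return [{"Label": "hadronic", "batch": batch_idx, "seed": seed}
            for _ in range(n_curves)]


def quark_worker(n_curves, batch_idx, seed):
    """Placeholder for the real quark generator (hypothetical signature)."""
    return [{"Label": "quark", "batch": batch_idx, "seed": seed}
            for _ in range(n_curves)]


def run_worker_wrapper(task, baselines):
    """Unpack one task tuple and call the worker that matches its type."""
    t_type, n_curves, batch_idx, seed = task  # assumed tuple layout
    if t_type == "hadronic":
        return hadronic_worker(n_curves, baselines, batch_idx, seed)
    if t_type == "quark":
        return quark_worker(n_curves, batch_idx, seed)
    raise ValueError(f"Unknown task type: {t_type!r}")


# Interleave task types as in the excerpt above: even batches hadronic, odd quark.
tasks = [("hadronic" if i % 2 == 0 else "quark", 3, i, i) for i in range(4)]

# Serial stand-in for the Parallel(...) call.
results = [run_worker_wrapper(t, baselines={}) for t in tasks]

# Flatten the per-batch lists into one list of rows.
rows = [row for batch in results for row in batch]
```

Keeping the wrapper a module-level function with plain-tuple arguments means it can be pickled and sent to worker processes without extra setup.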

### 3. Class Balancing
After generation, the results are consolidated into a pandas DataFrame. The script then balances the classes to ensure the machine learning models are not biased toward the class that was easier to generate (typically Quark stars).

```python
# Undersample the majority class
counts = df['Label'].value_counts()
min_count = counts.min()
df = df.groupby('Label').sample(n=min_count, random_state=42)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
```
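The effect of the balancing step can be checked on a toy DataFrame. This is a self-contained sketch with illustrative values only; it assumes the `Label` column holds the class names as strings, which may differ from the real encoding.

```python
import pandas as pd

# Toy imbalanced dataset: 3 hadronic rows and 5 quark rows (illustrative values).
df = pd.DataFrame({
    "Label": ["hadronic"] * 3 + ["quark"] * 5,
    "Mass": [1.9, 2.0, 2.1, 1.2, 1.4, 1.5, 1.6, 1.8],
})

# Size of the smaller class sets the per-class sample count.
counts = df["Label"].value_counts()
min_count = int(counts.min())

# Draw min_count rows from each class, then shuffle the combined rows.
balanced = df.groupby("Label").sample(n=min_count, random_state=42)
balanced = balanced.sample(frac=1, random_state=42).reset_index(drop=True)

print(balanced["Label"].value_counts().to_dict())
```

Fixing `random_state` makes both the subsampling and the shuffle reproducible between runs.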

### 4. Machine Learning Pipeline
The balanced DataFrame is passed to the training module. This function trains the hierarchy of models (Geometric, Model A, B, C, D) and returns the trained objects along with a held-out test set for independent evaluation.

```python
models_dict, X_test, y_test = train_model(df)
```

### 5. Visualization and Inference Suite
Finally, the script calls a series of specialized plotting functions. These range from ML diagnostics (ROC curves, confusion matrices) to physical insights (sound speed profiles, M-R diagrams, stability windows).

It also performs inference on real astrophysical objects (e.g., GW170817, PSR J0740+6620) using the `analyze_candidates` function.

```python
# ML Diagnostics
plot_diagnostics(models_dict, X_test, y_test)

# Physics Manifolds
plot_grand_summary(df)
plot_physics_manifold(df)

# Inference on Real Data
analyze_candidates(models_dict)
```

## Visualization Output

This script does not produce plots directly but acts as the trigger for the entire visualization suite. Upon successful execution, the `plots/` directory will contain:

*   **ML Diagnostics**: ROC curves, calibration plots, and learning curves.
*   **Physics Manifolds**: Mass-Radius diagrams, Tidal Deformability plots, and EoS comparisons.
*   **Microphysics**: 3D phase space plots, stability windows, and sound speed correlations.
*   **Interpretability**: Partial Dependence Plots (PDP) and SHAP summaries.