In [None]:
# Allow imports from the parent directory; safe to run more than once
import os, sys
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
    sys.path.append(os.path.abspath(".")) 

# Imports for this notebook 
from astropy.table import Table
from analysis import compare_assignments

# **Investigating APOGEE Issues in Resolving Aurora**
## **A Comparison with GALAH**
### High-Dimensional Evidence for the Need for More Dimensions
- The Gaussian Mixture Model results show that Aurora is the only meaningful population (i.e. not background noise) lost when reducing from 7 to 5 Gaussian components in APOGEE.
- This shows that high-dimensional clustering fails to reliably distinguish Aurora, particularly from GS/E 2 (the “alpha plateau” group) and the Splash population, in contrast to GALAH, where Aurora is more cleanly separated.
- The current set of dimensions is therefore insufficient to fully resolve overlapping populations.
- This suggests the need for additional informative dimensions, either chemical or dynamical.
### Low-Dimensional Evidence for the Need for More Dimensions
- Dimensionality reduction supports this case: the `evolution` and instability of the cluster boundaries show Aurora often overlapping with, or splitting from, Splash and GS/E 2.
- The separation in low-dimensional projections is unclear and sometimes incorrect, reinforcing the idea that more discriminative features are needed.

### **Astrophysical Independence**
- Shown in Belokurov & Kravtsov (2022), “From dawn till disc”:
    - Aurora stars formed before the Milky Way spun up into a coherent disc, representing an early, chaotic epoch of galaxy assembly.
    - Splash stars, by contrast, formed from a disrupted disc during the GS/E merger, retaining high rotational coherence.
- From a kinematic perspective, the key difference is in the azimuthal velocity ($V_\phi$):
    - Aurora stars exhibit broad, low-spin $V_\phi$ distributions (i.e., minimal net rotation).
    - Splash stars retain higher, more coherent $V_\phi$ values consistent with their disc origin ($V_\phi \approx 150$ km/s).
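
As a rough illustration of the kinematic quantity discussed above, the sketch below computes $V_\phi$ from Galactocentric Cartesian positions and velocities. The function name `azimuthal_velocity` and the sign convention (positive for motion in the direction of increasing azimuth in a right-handed frame) are assumptions made for this example; many studies flip the sign so that disc rotation is positive, and the input values are purely illustrative.

```python
import numpy as np

def azimuthal_velocity(x, y, vx, vy):
    """Azimuthal velocity V_phi = (x*vy - y*vx) / R in the Galactic plane.

    Hypothetical helper: inputs are Galactocentric Cartesian positions (kpc)
    and velocities (km/s). The sign convention here is an assumption.
    """
    x, y, vx, vy = (np.asarray(a, dtype=float) for a in (x, y, vx, vy))
    cylindrical_radius = np.hypot(x, y)  # R = sqrt(x^2 + y^2)
    return (x * vy - y * vx) / cylindrical_radius

# Illustrative stars: one on a purely tangential orbit, one on a purely radial orbit
v_phi_tangential = azimuthal_velocity(8.0, 0.0, 0.0, 220.0)
v_phi_radial = azimuthal_velocity(8.0, 0.0, 50.0, 0.0)
print(v_phi_tangential, v_phi_radial)
```

A population with little net rotation would show a broad distribution of this quantity centred near zero, while a disc-like population would cluster around a coherent non-zero value.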

### **Quantitative Investigation**
- The discussion above has been purely visual; we can make it quantitative by investigating the probabilistic cluster assignments.
- That is, we take the stars assigned to Aurora, identify their next most likely component, and measure how close its assignment probability is to that of Aurora.
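
A minimal sketch of this second-best comparison on a toy probability matrix. The function name `second_best_summary`, the 0-based column indexing, and the toy values are assumptions for illustration; this is not the implementation of `compare_assignments`, only one plausible way to compute similar quantities.

```python
import numpy as np

def second_best_summary(probs, target):
    """Summarise the runner-up component for stars whose best component is `target`.

    Hypothetical helper: `probs` is an (n_stars, n_components) array of
    per-component assignment probabilities; `target` is a 0-based column index.
    """
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs, axis=1)            # ascending order within each row
    best, second = order[:, -1], order[:, -2]    # highest and second-highest columns
    rows = np.flatnonzero(best == target)        # stars primarily assigned to target

    p_first = probs[rows, best[rows]]
    p_second = probs[rows, second[rows]]
    pct_of_first = 100.0 * p_second / p_first    # runner-up as a percentage of the best
    abs_diff = p_first - p_second                # absolute gap between best and runner-up

    summary = {}
    for component in np.unique(second[rows]):
        selected = second[rows] == component
        summary[int(component)] = {
            "n_stars": int(selected.sum()),
            "mean_pct_of_first": float(pct_of_first[selected].mean()),
            "median_abs_diff": float(np.median(abs_diff[selected])),
        }
    return summary

# Toy data: 4 stars, 3 components; stars 0, 1 and 3 are primarily in component 0
toy_probs = np.array([
    [0.60, 0.30, 0.10],
    [0.50, 0.10, 0.40],
    [0.20, 0.70, 0.10],
    [0.80, 0.20, 0.00],
])
toy_summary = second_best_summary(toy_probs, target=0)
print(toy_summary)
```

A runner-up percentage close to 100 means the star is almost equally well described by both components, whereas a small percentage indicates a confident primary assignment.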



### **Conclusion of this Investigation**
- As expected, the most common second-best fit for Aurora stars is Splash, followed by GS/E.
- The tables also compare the assignments numerically, through the percentage and absolute difference in assignment probability.
- There are many caveats to these numbers; for example, Gaussian probabilities in higher dimensions are orders of magnitude smaller.
- The two surveys were also fitted with different numbers of components, so the fractional split differs.
- We therefore do not place much weight on these numerical comparisons.
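
The dimensionality caveat can be illustrated with the peak density of a unit-covariance Gaussian, which is $(2\pi)^{-d/2}$ in $d$ dimensions. The sketch below is only an illustration under that simplifying assumption: the helper name `standard_normal_peak_density` and the dimensions shown are hypothetical and are not taken from the fitted models, whose covariances also affect the density scale.

```python
import math

def standard_normal_peak_density(n_dims):
    """Density of a unit-covariance Gaussian evaluated at its mean.

    Hypothetical helper for illustration: (2*pi)^(-d/2) for d dimensions.
    """
    return (2.0 * math.pi) ** (-n_dims / 2.0)

# Illustrative dimensions only (not the dimensions of the fitted models)
for n_dims in (2, 5, 7, 10):
    print(n_dims, standard_normal_peak_density(n_dims))
```

Because even the maximum density shrinks with each added dimension, absolute probability differences from fits in feature spaces of different dimension are not directly comparable.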

In [2]:
## Import the APOGEE results and assignment probabilities
apogee_data_path = 'XD_Results/Apogee_postGMM/apogee_GMM_scaled_Gauss7_results.fits'

# Load the result from the APOGEE high-dimensional XD
apogee_results = Table.read(apogee_data_path)

# Map each Gaussian component index (as in prob_gauss_1 and the assignment) to its population label
apogee_labels_name = {
    1: "GS/E 1",
    6: "GS/E 2",
    3: "Splash",
    7: "Aurora",
    2: "Eos",
    5: "Back 1",
    4: "Back 2"
}

In [3]:
## Import the GALAH results and assignment probabilities
galah_data_path = 'XD_Results/Galah_postGMM/Galah_GMM_scaled_Gauss5_results.fits'

# Load the result from the GALAH high-dimensional XD
galah_results = Table.read(galah_data_path)

# Map each Gaussian component index (as in prob_gauss_1 and the assignment) to its population label
galah_labels_name = {
    4: "GS/E",
    2: "Splash",
    1: "Aurora",
    5: "Eos",
    3: "Background",
}

In [4]:
compare_assignments(apogee_results, "Aurora", apogee_labels_name)


Detailed second-best breakdown for stars primarily assigned to 'Aurora':
Total stars: 95

| Second-Best Component   |   # Stars |   Mean % of 1st |   Std % of 1st |   Median % of 1st |   Mean Abs Diff |   Median Abs Diff |   Std Abs Diff |
|-------------------------|-----------|-----------------|----------------|-------------------|-----------------|-------------------|----------------|
| Splash                  |        49 |            9.41 |          15.26 |              1.26 |        0.001368 |          0.000686 |       0.001511 |
| GS/E 2                  |        22 |           16.16 |          19.36 |              6.23 |        0.000722 |          0.000512 |       0.000783 |
| GS/E 1                  |        13 |            6.25 |           9    |              2.45 |        0.001254 |          0.001023 |       0.000867 |
| Back 1                  |         8 |           10.05 |          19.07 |              0.57 |        0.000203 |          5.1e-05  |       0.000271 |
| Eos    

In [5]:
compare_assignments(galah_results, "Aurora", galah_labels_name)


Detailed second-best breakdown for stars primarily assigned to 'Aurora':
Total stars: 141

| Second-Best Component   |   # Stars |   Mean % of 1st |   Std % of 1st |   Median % of 1st |   Mean Abs Diff |   Median Abs Diff |   Std Abs Diff |
|-------------------------|-----------|-----------------|----------------|-------------------|-----------------|-------------------|----------------|
| Splash                  |        47 |           11.02 |          20.84 |              1.94 |         1.8e-05 |             7e-06 |        3.2e-05 |
| GS/E                    |        45 |           15.16 |          25.64 |              1.7  |         3e-05   |             4e-06 |        7.1e-05 |
| Background              |        45 |            8.53 |          14.57 |              1.61 |         4e-06   |             0     |        1.1e-05 |
| Eos                     |         4 |           24.03 |          29.33 |             10.62 |         7e-06   |             6e-06 |        6e-06   |
