# Preprocessing
This notebook performs the following tasks:
- Extinction correction (de-reddening)
- Red giant branch filtering
- Adding Galpy orbital parameters

In [1]:
from Analysis import reddening_correction
from Analysis import rgb_filter
from Analysis import add_galpy_orbital_parameters

## Extinction Correction

- Applies Galactic extinction corrections to Gaia DR3 photometry using the Schlegel, Finkbeiner & Davis (1998) (SFD) dust map and extinction coefficients from Casagrande et al. (2021).
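
The internals of `reddening_correction` are not shown in this notebook, so the following is only a minimal sketch of the per-band arithmetic, assuming the colour excess `E(B-V)` has already been looked up from the SFD map for each star. The function name `deredden` and the coefficient values in `EXAMPLE_COEFFS` are hypothetical placeholders, not the Casagrande et al. (2021) coefficients.

```python
# Illustrative extinction coefficients R_X = A_X / E(B-V).
# PLACEHOLDER values for demonstration only; these are NOT the
# Casagrande et al. (2021) coefficients used by the notebook.
EXAMPLE_COEFFS = {"G": 2.7, "BP": 3.4, "RP": 2.0}


def deredden(g, bp, rp, ebv, coeffs=EXAMPLE_COEFFS):
    """Return extinction-corrected (G0, BP0, RP0) magnitudes.

    Each band is corrected as m0 = m - R_band * E(B-V), where
    `ebv` is the colour excess E(B-V) along the line of sight.
    """
    g0 = g - coeffs["G"] * ebv
    bp0 = bp - coeffs["BP"] * ebv
    rp0 = rp - coeffs["RP"] * ebv
    return g0, bp0, rp0


# Example: one star with E(B-V) = 0.1 mag
g0, bp0, rp0 = deredden(g=16.0, bp=16.5, rp=15.3, ebv=0.1)
print(f"G0 = {g0:.2f}, (BP-RP)0 = {bp0 - rp0:.2f}")
```

Because the BP coefficient is larger than the RP coefficient, the corrected colour is always bluer than the observed one, which is why the correction has to be applied before any BP-RP colour cut.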

In [2]:
raw_data_path_glob_clust = 'data/Allsky_Gaia_45599440.fits'
raw_data_path_stream = 'data/Allsky_Gaia_8910601_rv.fits'

In [5]:
reddening_correction(raw_data_path_glob_clust, dustmaps_dir='dustmaps/')

2025-03-17 22:25:16,570 - INFO - Converting to a Pandas Dataframe...
2025-03-17 22:29:57,987 - INFO - Applying extinction correction...
2025-03-17 22:30:10,277 - INFO - Converting back to FITS format...
2025-03-17 22:31:06,200 - INFO - Saving to new file...
2025-03-17 22:45:37,190 - INFO - Extinction-corrected FITS file saved as: data/Allsky_Gaia_45599440_extinction_corrected.fits


In [3]:
reddening_correction(raw_data_path_stream, dustmaps_dir='dustmaps/')

2025-03-17 21:57:47,722 - INFO - Converting to a Pandas Dataframe...
2025-03-17 21:57:50,601 - INFO - Applying extinction correction...
2025-03-17 21:57:53,716 - INFO - Converting back to FITS format...
2025-03-17 21:57:55,397 - INFO - Saving to new file...
2025-03-17 21:58:02,299 - INFO - Extinction-corrected FITS file saved as: data/Allsky_Gaia_8910601_rv_extinction_corrected.fits


## Red Giant Branch Selection

Filtering is applied to increase the fraction of red giant stars in the sample. Red giants are well suited to tracing the halo because they are:
- **Bright tracers** of old stellar populations, needed in the study of **globular clusters** and **Milky Way substructures**.
- **Observable at large distances**, making it possible to detect faint halo features.
- **Key indicators** of tidal streams and accreted structures, helping reconstruct the Milky Way’s formation history.

### **Filters Applied**
The filters are justified within notebooks 1-3, and are as follows:

#### **In Gaia Query**

| Parameter                 | Condition Applied |
|---------------------------|------------------|
| **Photometric Magnitude (G-band)** | `10 ≤ G ≤ 20.5` (Retaining brighter stars, favoring red giants) |
| **Parallax** | `-0.3 ≤ parallax ≤ 0.3` mas (Selecting distant stars, minimizing foreground contamination) |
| **RUWE (Renormalized Unit Weight Error)** | `ruwe < 1.4` (Ensuring good astrometric solutions) |
| **Proper Motion Constraint** | `(pmra² + pmdec²) < 144`, i.e. total proper motion below 12 mas/yr (Selecting stars with relatively small proper motion, likely halo members) |
| **Photogeometric Distance** | `r_med_photogeo IS NOT NULL` (Ensuring a valid distance estimate from Bailer-Jones) |
| **Random Index Range** | `500000000 ≤ random_index ≤ 685000000` (Random subset selection for manageable data size) |
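
The actual archive query is not included in this notebook. As a rough illustration, the cuts in the table above could be expressed in ADQL along the following lines. The helper name `build_gaia_query`, the choice of tables (`gaiadr3.gaia_source` joined to `external.gaiaedr3_distance`) and the selected columns are assumptions made for this sketch.

```python
def build_gaia_query(random_index_min, random_index_max):
    """Build an ADQL query string mirroring the cuts in the table above.

    ASSUMPTION: the table names, join and column selection are
    illustrative; the notebook does not show the query that was run.
    """
    return f"""
SELECT g.*, d.r_med_photogeo
FROM gaiadr3.gaia_source AS g
JOIN external.gaiaedr3_distance AS d ON g.source_id = d.source_id
WHERE g.phot_g_mean_mag BETWEEN 10 AND 20.5
  AND g.parallax BETWEEN -0.3 AND 0.3
  AND g.ruwe < 1.4
  AND (g.pmra * g.pmra + g.pmdec * g.pmdec) < 144
  AND d.r_med_photogeo IS NOT NULL
  AND g.random_index BETWEEN {random_index_min} AND {random_index_max}
""".strip()


# Example: the random-index window listed in the table
query = build_gaia_query(500000000, 685000000)
print(query)
```

Splitting the sky by `random_index` windows keeps each download to a manageable size while remaining an unbiased subsample of the catalogue.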

#### **Additional Cuts (RGB Filter)**
| Parameter                | Condition Applied |
|--------------------------|---------------|
| **BP-RP Color Cut (Lower)** | `BP-RP ≥ 0.5` (Selecting redder stars, excluding very blue main-sequence stars) |
| **BP-RP Color Cut (Upper)** | `BP-RP ≤ 2` (Excluding low-temperature, early-stage stars and red dwarfs) |
| **G Magnitude Limit** | `G ≤ 19` (Removing dim stars, main-sequence stars and other non-red-giants) |
| **Absolute Magnitude Cut** | `M_G ≤ 5.2` (Selecting evolved stars, avoiding faint dwarfs) |

This filtering strategy refines the sample to increase the likelihood of selecting **red giant stars** and identifying key **halo structures**.
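
As a hedged illustration of how these four cuts combine for a single star, the sketch below applies them to extinction-corrected photometry. It assumes the absolute magnitude is derived from the distance modulus using a distance in parsecs (for example `r_med_photogeo`); the helper names `absolute_magnitude` and `passes_rgb_filter` are hypothetical and are not the implementation of `rgb_filter`.

```python
import math


def absolute_magnitude(apparent_mag, distance_pc):
    """Distance modulus: M = m - 5 * log10(d / 10 pc)."""
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)


def passes_rgb_filter(g0, bp_rp0, distance_pc,
                      min_bp_rp=0.5, max_bp_rp=2.0,
                      max_app_mag=19.0, max_abs_mag=5.2):
    """Return True if a star passes all of the RGB cuts.

    ASSUMPTION: g0 and bp_rp0 are extinction-corrected, and
    distance_pc is a distance estimate in parsecs.
    """
    color_ok = min_bp_rp <= bp_rp0 <= max_bp_rp
    app_ok = g0 <= max_app_mag
    abs_ok = absolute_magnitude(g0, distance_pc) <= max_abs_mag
    return color_ok and app_ok and abs_ok


# A distant giant (10 kpc) passes; a nearby dwarf of the same
# apparent magnitude and colour (500 pc) fails the M_G cut.
print(passes_rgb_filter(16.0, 1.1, 10000.0))
print(passes_rgb_filter(16.0, 1.1, 500.0))
```

A star must satisfy every cut at once, so the fraction passing all filters can be no larger than the smallest of the individual pass fractions.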



In [7]:
extinction_corrected_data_path_glob_clust = 'data/Allsky_Gaia_45599440_extinction_corrected.fits'
extinction_corrected_data_path_stream = 'data/Allsky_Gaia_8910601_rv_extinction_corrected.fits'

In [8]:
rgb_filter(extinction_corrected_data_path_glob_clust, min_bp_rp=0.5, max_bp_rp=2, max_app_mag=19, max_abs_mag=5.2)

2025-03-17 22:46:41,818 - INFO - Loaded 45599440 from FITS File ...
2025-03-17 22:46:41,819 - INFO - Converting to a Pandas Dataframe...
2025-03-17 22:47:12,096 - INFO - Applying RGB filter...
2025-03-17 22:47:15,029 - INFO - 
Total stars before filtering: 45599440
Stars passing BP-RP color filter: 38350020 (84.10%)
Stars passing apparent magnitude filter: 37476340 (82.19%)
Stars passing absolute magnitude filter: 35746085 (78.39%)
Stars passing all filters: 26150553 (57.35%)
Halo RGB filtered FITS file saved as: data/Allsky_Gaia_45599440_extinction_corrected_filtered.fits
2025-03-17 22:47:15,029 - INFO - Saving filtered data to new FITS file...
2025-03-17 22:48:20,772 - INFO - Halo RGB filtered FITS file saved as: data/Allsky_Gaia_45599440_extinction_corrected_filtered.fits


In [6]:
rgb_filter(extinction_corrected_data_path_stream, min_bp_rp=0.5, max_bp_rp=2, max_app_mag=19, max_abs_mag=5.2)

2025-03-17 21:58:20,189 - INFO - Loaded 8910601 from FITS File ...
2025-03-17 21:58:20,190 - INFO - Converting to a Pandas Dataframe...
2025-03-17 21:58:22,812 - INFO - Applying RGB filter...
2025-03-17 21:58:23,609 - INFO - 
Total stars before filtering: 8910601
Stars passing BP-RP color filter: 7438858 (83.48%)
Stars passing apparent magnitude filter: 8910601 (100.00%)
Stars passing absolute magnitude filter: 8910600 (100.00%)
Stars passing all filters: 7438858 (83.48%)
Halo RGB filtered FITS file saved as: data/Allsky_Gaia_8910601_rv_extinction_corrected_filtered.fits
2025-03-17 21:58:23,609 - INFO - Saving filtered data to new FITS file...
2025-03-17 21:58:30,256 - INFO - Halo RGB filtered FITS file saved as: data/Allsky_Gaia_8910601_rv_extinction_corrected_filtered.fits


## Add Orbital Parameters for Tidal Stream Information

This step adds **Galpy-derived orbital parameters** to the filtered dataset of Gaia stars with radial velocities.

### **Functionality**
3. **Computes orbital parameters** using `galpy` and `SkyCoord`:
   - **Energy (`E`)** – The total energy of the star in the Milky Way potential.
   - **Z-component of Angular Momentum (`Lz`)** – Measures rotation about the Galactic z-axis, which passes through the Galactic center perpendicular to the plane.
   - **Galactocentric Radius (`R_gal`)** – The current radial position of the star.
   - **Vertical Action (`Jz`)** – Describes oscillations above and below the Galactic plane.

---

### **Reasoning**
- **Tidal streams share (`E`, `Lz`, `Jz`)** – Stars from the same stream have very similar orbital properties.
- **Energy (`E`) is conserved** along an orbit in a static potential – Helps separate streams from field stars.
- **Angular momentum (`Lz`) and vertical action (`Jz`) cluster streams** – Group stars with common origins.
- **Galactocentric radius (`R_gal`) refines selection** – Ensures clustering aligns with stream positions.
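
To make the quantities above concrete, here is a minimal sketch that computes `E`, `Lz` and `R_gal` from Galactocentric Cartesian phase-space coordinates. It does not use `galpy`: a simple logarithmic halo potential is swapped in purely for illustration, and the function name `orbital_quantities`, the circular speed `V0_KMS` and the flattening `q` are assumptions. The vertical action `Jz` needs an action-angle solver and is not computed here.

```python
import math

V0_KMS = 220.0  # ASSUMPTION: illustrative circular speed in km/s


def orbital_quantities(x, y, z, vx, vy, vz, v0=V0_KMS, q=1.0):
    """Return (E, Lz, R_gal) for one star.

    Positions are in kpc and velocities in km/s, in a right-handed
    Galactocentric Cartesian frame. The potential is an illustrative
    logarithmic halo, Phi = 0.5 * v0^2 * ln(R^2 + (z/q)^2), and is
    NOT the potential used by the notebook.
    """
    r_gal = math.hypot(x, y)                # cylindrical radius [kpc]
    lz = x * vy - y * vx                    # z angular momentum [kpc km/s]
    kinetic = 0.5 * (vx**2 + vy**2 + vz**2)
    potential = 0.5 * v0**2 * math.log(r_gal**2 + (z / q) ** 2)
    return kinetic + potential, lz, r_gal   # E in km^2/s^2


# Example: a star at 8 kpc on a circular orbit in the plane
energy, lz, r_gal = orbital_quantities(8.0, 0.0, 0.0, 0.0, 220.0, 0.0)
print(f"E = {energy:.1f} km^2/s^2, Lz = {lz:.1f} kpc km/s, R = {r_gal:.1f} kpc")
```

Because `E` and `Lz` stay constant along an orbit in a static, axisymmetric potential, stars stripped from the same progenitor remain close together in this space even after they have spread out on the sky.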


In [2]:
rgb_data_path_stream = 'data/Allsky_Gaia_8910601_rv_extinction_corrected_filtered.fits'

In [3]:
add_galpy_orbital_parameters(rgb_data_path_stream)

Processing stars: 100%|██████████| 7438858/7438858 [8:23:12<00:00, 246.38star/s]  


Galpy orbital parameters added. Updated FITS file saved as: data/Allsky_Gaia_8910601_rv_extinction_corrected_filtered_galpy.fits
