# Preprocessing
This notebook performs the following tasks:
- Extinction correction (de-reddening)
- Red Giant Branch Filtering 
- Galpy Orbit parameter determination

In [None]:
# Allow imports from parent directory 
import os, sys
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
    sys.path.append(os.path.abspath(".")) 
    
from Analysis import reddening_correction
from Analysis import rgb_filter
from Analysis import add_galpy_orbital_parameters

## Extinction Correction

- Applies Galactic extinction corrections to Gaia DR3 photometry using the Schlegel, Finkbeiner & Davis (1998) (SFD) dust map and extinction coefficients from Casagrande et al. (2021).
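
As a rough illustration of what the correction involves, the sketch below subtracts `A_X = R_X * E(B-V)` from each band and then forms `M_G` from the distance modulus. It is not the implementation in `Analysis`: the coefficients are round placeholder numbers (Casagrande et al. (2021) give color-dependent values that should be substituted), the `E(B-V)` values stand in for an SFD map lookup, and the distance is assumed to be in parsecs.

```python
import numpy as np

# Hypothetical coefficients R_X = A_X / E(B-V), for illustration only.
# Replace with the color-dependent values from Casagrande et al. (2021).
ILLUSTRATIVE_COEFFS = {"G": 2.7, "BP": 3.4, "RP": 2.0}


def deredden(mag, ebv, coeff):
    """Return the extinction-corrected magnitude m0 = m - R_X * E(B-V)."""
    return np.asarray(mag, dtype=float) - coeff * np.asarray(ebv, dtype=float)


def absolute_magnitude(mag0, distance_pc):
    """Return M = m0 - 5 * log10(d / 10 pc) for a distance in parsecs."""
    distance_pc = np.asarray(distance_pc, dtype=float)
    return np.asarray(mag0, dtype=float) - 5.0 * np.log10(distance_pc / 10.0)


# Toy inputs: apparent magnitudes, E(B-V) values and distances (assumed pc).
g = np.array([15.0, 17.5])
bp = np.array([15.6, 18.1])
rp = np.array([14.3, 16.8])
ebv = np.array([0.05, 0.10])
r_med_photogeo = np.array([1000.0, 10000.0])

dered_G = deredden(g, ebv, ILLUSTRATIVE_COEFFS["G"])
dered_BP = deredden(bp, ebv, ILLUSTRATIVE_COEFFS["BP"])
dered_RP = deredden(rp, ebv, ILLUSTRATIVE_COEFFS["RP"])
dered_BP_RP = dered_BP - dered_RP
M_G = absolute_magnitude(dered_G, r_med_photogeo)

print(dered_G, dered_BP_RP, M_G)
```

The variable names mirror the output columns listed in the `reddening_correction` docstring; everything else in the sketch is an assumption.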

In [1]:
raw_data_path_glob_clust = 'data/Allsky_Gaia_45599440.fits'
raw_data_path_glob_clust_2 = 'data/Allsky_Gaia_42481846.fits'
raw_data_path_stream = 'data/Allsky_Gaia_394217_rv.fits'


In [8]:
?reddening_correction

Signature: reddening_correction(gaia_data_or_path, dustmaps_dir='dustmaps/')
Docstring:
Applies Galactic extinction corrections to Gaia DR3 photometry using the 
Schlegel, Finkbeiner & Davis (1998) (SFD) dust map and extinction coefficients 
from Casagrande et al. (2021).

The function adds new extinction-corrected columns to the dataset:
    - `dered_G`: Extinction-corrected G-band magnitude.
    - `dered_BP`: Extinction-corrected BP-band magnitude.
    - `dered_RP`: Extinction-corrected RP-band magnitude.
    - `dered_BP_RP`: Extinction-corrected BP-RP color index.
    - `M_G`: Absolute magnitude in the G-band, calculated using the extinction-corrected G-band magnitude
             and the Bailer-Jones median photogeometric distance (`r_med_photogeo`).

Parameters:
    gaia_data_or_path (str or pd.DataFrame): Either a DataFrame containing Gaia photometric 
  

### Higher Proper Motion Cut (<12)

In [5]:
reddening_correction(raw_data_path_glob_clust, dustmaps_dir='dustmaps/')

2025-03-17 22:25:16,570 - INFO - Converting to a Pandas Dataframe...
2025-03-17 22:29:57,987 - INFO - Applying extinction correction...
2025-03-17 22:30:10,277 - INFO - Converting back to FITS format...
2025-03-17 22:31:06,200 - INFO - Saving to new file...
2025-03-17 22:45:37,190 - INFO - Extinction-corrected FITS file saved as: data/Allsky_Gaia_45599440_extinction_corrected.fits


### Lower Proper Motion Cut (<4)

In [3]:
reddening_correction(raw_data_path_glob_clust_2, dustmaps_dir='dustmaps/')

2025-03-18 17:15:24,630 - INFO - Converting to a Pandas Dataframe...
2025-03-18 17:15:43,025 - INFO - Applying extinction correction...
2025-03-18 17:15:54,539 - INFO - Converting back to FITS format...
2025-03-18 17:16:24,162 - INFO - Saving to new file...
2025-03-18 17:17:40,680 - INFO - Extinction-corrected FITS file saved as: data/Allsky_Gaia_42481846_extinction_corrected.fits


### Lower Proper Motion Cut And Contains Radial Velocity 
- Used for the stellar stream analysis, since radial velocities allow orbital parameters to be calculated

In [8]:
reddening_correction(raw_data_path_stream, dustmaps_dir='dustmaps/')

2025-03-18 13:25:06,104 - INFO - Converting to a Pandas Dataframe...
2025-03-18 13:25:06,278 - INFO - Applying extinction correction...
2025-03-18 13:25:06,833 - INFO - Converting back to FITS format...
2025-03-18 13:25:06,903 - INFO - Saving to new file...
2025-03-18 13:25:07,170 - INFO - Extinction-corrected FITS file saved as: data/Allsky_Gaia_394217_rv_extinction_corrected.fits


## Red Giant Branch Selection

Filtering is applied to increase the fraction of red giant stars and so enhance the halo population, because red giants are:
- **Bright tracers** of old stellar populations, needed in the study of **globular clusters** and **Milky Way substructures**.
- **Observable at large distances**, thus able to detect faint halo features.
- **Key indicators** of tidal streams and accreted structures, helping reconstruct the Milky Way’s formation history.

### **Filters Applied**
The filters are justified within notebooks 1-3, and are as follows:

#### **In Gaia Query**

| Parameter                 | Condition Applied |
|---------------------------|------------------|
| **Photometric Magnitude (G-band)** | `10 ≤ G ≤ 20.5` (Selecting brighter stars, favoring red giants) |
| **Parallax** | `-0.1 ≤ parallax ≤ 0.1` (Selecting distant stars, minimizing foreground contamination) |
| **RUWE (Renormalized Unit Weight Error)** | `ruwe < 1.4` (Ensuring good astrometric solutions) |
| **Proper Motion Constraint** | `(pmra² + pmdec²) < 144` or `< 16`, i.e. total proper motion below 12 or 4 mas/yr (Selecting stars with relatively small proper motion, likely halo members) |
| **Photogeometric Distance** | `r_med_photogeo IS NOT NULL` (Ensuring a valid distance estimate from Bailer-Jones) |
| **Random Index Range** | `0 ≤ random_index ≤ 700000000` (Random subset selection for manageable data size) |

#### **Additional Cuts (RGB Filter)**
| Parameter                | Proposed Value |
|--------------------------|---------------|
| **BP-RP Color Cut (Lower)** | `BP-RP ≥ 0.8` (Selecting redder stars, excluding very blue main-sequence stars) |
| **G Magnitude Limit** | `G ≤ 18` (Removing dim stars: main-sequence stars and other non-giants) |
| **Absolute Magnitude Cut** | `M_G ≤ 5` (Selecting evolved stars, avoiding faint dwarfs) |
| **Galactic Latitude, b** | `abs(b) > 10°` (Removing the low-latitude population that would otherwise saturate the halo sample) |

This filtering strategy refines the sample to increase the likelihood of selecting **red giant stars** and identifying key **halo structures**.
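
The additional cuts in the table above can be sketched as boolean masks, reporting how many stars pass each cut on its own and how many pass all of them, in the same spirit as the filter logs later in this notebook. This is a minimal sketch, not the code of `rgb_filter`: `min_bp_rp` and `max_app_mag` are the keyword names used in this notebook, while `max_abs_mag`, `min_abs_b` and the input arrays are hypothetical, and it is an assumption that the cuts act on the extinction-corrected columns.

```python
import numpy as np


def rgb_masks(bp_rp, g_mag, abs_mag, gal_lat_deg,
              min_bp_rp=0.8, max_app_mag=18.0,
              max_abs_mag=5.0, min_abs_b=10.0):
    """Return one boolean mask per cut plus their combination under 'all'."""
    masks = {
        "latitude": np.abs(gal_lat_deg) > min_abs_b,  # |b| > 10 deg
        "color": bp_rp >= min_bp_rp,                  # BP-RP >= 0.8
        "apparent": g_mag <= max_app_mag,             # G <= 18
        "absolute": abs_mag <= max_abs_mag,           # M_G <= 5
    }
    # A star is kept only if it passes every individual cut.
    masks["all"] = np.logical_and.reduce(list(masks.values()))
    return masks


# Toy sample of four stars (values invented for illustration).
bp_rp = np.array([1.1, 0.5, 1.3, 0.9])
g_mag = np.array([16.0, 15.0, 19.0, 17.0])
abs_mag = np.array([1.0, 4.0, 2.0, 6.0])
gal_lat_deg = np.array([45.0, -30.0, 60.0, 5.0])

masks = rgb_masks(bp_rp, g_mag, abs_mag, gal_lat_deg)
total = len(bp_rp)
for name, mask in masks.items():
    passed = int(mask.sum())
    print(f"Stars passing {name} cut: {passed} ({100.0 * passed / total:.2f}%)")
```

Because each mask is evaluated independently, the per-cut percentages need not multiply to the combined fraction; only the `all` mask describes the final sample.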



In [2]:
extinction_corrected_data_path_glob_clust = 'data/Allsky_Gaia_45599440_extinction_corrected.fits'
extinction_corrected_data_path_glob_clust_2 = 'data/Allsky_Gaia_42481846_extinction_corrected.fits'
extinction_corrected_data_path_stream = 'data/Allsky_Gaia_394217_rv_extinction_corrected.fits'

### Higher PM Data

In [5]:
rgb_filter(extinction_corrected_data_path_glob_clust, min_bp_rp=0.8, max_app_mag=18)

2025-03-30 17:30:34,648 - INFO - Loaded 45599440 from FITS File ...
2025-03-30 17:30:34,648 - INFO - Converting to a Pandas Dataframe...


2025-03-30 17:38:01,518 - INFO - Applying RGB filter...
2025-03-30 17:38:02,494 - INFO - 
Total stars before filtering: 45599440
Stars passing Galactic latitude cut (|b| > 10°): 13920769 (30.53%)
Stars passing BP-RP color filter: 27868223 (61.12%)
Stars passing apparent magnitude filter: 25708640 (56.38%)
Stars passing absolute magnitude filter: 33741017 (73.99%)
Stars passing all filters: 2452277 (5.38%)
Halo RGB filtered FITS file saved as: data/Allsky_Gaia_45599440_extinction_corrected_filtered.fits
2025-03-30 17:38:02,494 - INFO - Saving filtered data to new FITS file...
2025-03-30 17:38:04,887 - INFO - Halo RGB filtered FITS file saved as: data/Allsky_Gaia_45599440_extinction_corrected_filtered.fits


### Lower PM Data

In [4]:
rgb_filter(extinction_corrected_data_path_glob_clust_2, min_bp_rp=0.8, max_app_mag=18)

2025-03-30 17:22:19,302 - INFO - Loaded 42481846 from FITS File ...
2025-03-30 17:22:19,303 - INFO - Converting to a Pandas Dataframe...


2025-03-30 17:28:00,483 - INFO - Applying RGB filter...
2025-03-30 17:28:01,493 - INFO - 
Total stars before filtering: 42481846
Stars passing Galactic latitude cut (|b| > 10°): 16227516 (38.20%)
Stars passing BP-RP color filter: 23593053 (55.54%)
Stars passing apparent magnitude filter: 23239142 (54.70%)
Stars passing absolute magnitude filter: 29320489 (69.02%)
Stars passing all filters: 3105304 (7.31%)
Halo RGB filtered FITS file saved as: data/Allsky_Gaia_42481846_extinction_corrected_filtered.fits
2025-03-30 17:28:01,493 - INFO - Saving filtered data to new FITS file...
2025-03-30 17:28:04,894 - INFO - Halo RGB filtered FITS file saved as: data/Allsky_Gaia_42481846_extinction_corrected_filtered.fits


### Radial Velocity Data

In [None]:
rgb_filter(extinction_corrected_data_path_stream, min_bp_rp=0.8, max_app_mag=18)

2025-03-18 13:25:46,007 - INFO - Loaded 394217 from FITS File ...
2025-03-18 13:25:46,007 - INFO - Converting to a Pandas Dataframe...
2025-03-18 13:25:46,268 - INFO - Applying RGB filter...
2025-03-18 13:25:46,361 - INFO - 
Total stars before filtering: 394217
Stars passing BP-RP color filter: 301642 (76.52%)
Stars passing apparent magnitude filter: 394217 (100.00%)
Stars passing absolute magnitude filter: 394216 (100.00%)
Stars passing all filters: 301642 (76.52%)
Halo RGB filtered FITS file saved as: data/Allsky_Gaia_394217_rv_extinction_corrected_filtered.fits
2025-03-18 13:25:46,362 - INFO - Saving filtered data to new FITS file...
2025-03-18 13:25:46,668 - INFO - Halo RGB filtered FITS file saved as: data/Allsky_Gaia_394217_rv_extinction_corrected_filtered.fits


## Add Orbital Parameters for Tidal Stream Information

This step adds **galpy-derived orbital parameters** to a dataset of Gaia stars.

### **Functionality**
3. **Computes orbital parameters** using `galpy` and `SkyCoord`:
   - **Energy (`E`)** – The total energy of the star in the Milky Way potential.
   - **Z-component of Angular Momentum (`Lz`)** – Measures rotation about the Galactic center.
   - **Galactocentric Radius (`R_gal`)** – The current radial position of the star.
   - **Vertical Action (`Jz`)** – Describes oscillations above and below the Galactic plane.

---

### **Reasoning**
- **Stream members share (`E`, `Lz`, `Jz`)** – Stars stripped from the same progenitor keep similar orbital properties.  
- **Energy (`E`) is conserved** – In a static potential it stays constant along an orbit, which helps separate streams from field stars.  
- **Angular momentum (`Lz`) and vertical action (`Jz`) cluster streams** – Together they group stars with common origins.  
- **Galactocentric radius (`R_gal`) refines selection** – Ensures clustering aligns with stream positions.  
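
To make the quantities above concrete, the sketch below evaluates `R_gal`, `Lz` and `E` directly from Galactocentric Cartesian positions and velocities. It is only an illustration of the definitions, not what `add_galpy_orbital_parameters` does: the notebook relies on `galpy` and `SkyCoord`, whereas here the potential is a hypothetical spherical logarithmic stand-in with assumed constants `V0_KMS` and `R0_KPC`, and `Jz` is left out because it needs a proper action estimator. Sign and frame conventions for `Lz` can differ between libraries.

```python
import numpy as np

V0_KMS = 220.0  # assumed circular speed of the stand-in potential
R0_KPC = 8.0    # assumed reference radius of the stand-in potential


def orbital_quantities(pos_kpc, vel_kms):
    """Return E, Lz and R_gal for Galactocentric Cartesian inputs of shape (N, 3)."""
    pos = np.atleast_2d(np.asarray(pos_kpc, dtype=float))
    vel = np.atleast_2d(np.asarray(vel_kms, dtype=float))
    x, y, z = pos.T
    vx, vy, vz = vel.T

    r_gal = np.hypot(x, y)           # cylindrical Galactocentric radius
    lz = x * vy - y * vx             # z-component of r x v (right-handed frame)

    # Stand-in potential Phi(r) = V0^2 * ln(r / R0); NOT galpy's Milky Way model.
    r = np.sqrt(x**2 + y**2 + z**2)
    potential = V0_KMS**2 * np.log(r / R0_KPC)
    energy = 0.5 * (vx**2 + vy**2 + vz**2) + potential

    return {"E": energy, "Lz": lz, "R_gal": r_gal}


# Two toy stars (positions in kpc, velocities in km/s), invented for illustration.
positions = [[-8.0, 0.0, 0.0], [0.0, 8.0, 0.0]]
velocities = [[0.0, 220.0, 0.0], [-100.0, 0.0, 50.0]]

result = orbital_quantities(positions, velocities)
print(result)
```

Stars from a common progenitor would be expected to fall close together in the (`E`, `Lz`) plane computed this way, which is the property the clustering step exploits.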


In [11]:
rgb_data_path_stream = 'data/Allsky_Gaia_394217_rv_extinction_corrected_filtered.fits'

In [6]:
add_galpy_orbital_parameters?

Signature: add_galpy_orbital_parameters(gaia_data_or_path)
Docstring:
Computes and adds orbital parameters using galpy for Gaia stars using SkyCoord.
Accepts either a FITS file or a Pandas DataFrame.

Parameters:
    gaia_data_or_path (str or pd.DataFrame): Either a DataFrame containing Gaia data or a file path to a FITS file.

Returns:
    None or pd.DataFrame:
        - If a FITS file is provided, saves the updated data to a new FITS file with `_galpy.fits` suffix.
        - If a DataFrame is provided, returns the modified DataFrame.

Raises:
    ValueError: If required columns are missing from the input data.
File:      ~/Desktop/MPhil_DIS/Gal_Arc/Coursework_GA/Analysis/GA_analysis.py
Type:      function

In [12]:
add_galpy_orbital_parameters(rgb_data_path_stream)

Processing stars: 100%|██████████| 301642/301642 [22:05<00:00, 227.56star/s]


Galpy orbital parameters added. Updated FITS file saved as: data/Allsky_Gaia_394217_rv_extinction_corrected_filtered_galpy.fits
