# III. Filter data using Cordoni's algorithm

> Francisco Carrasco Varela - Pontificia Universidad Católica de Chile (PUC) - ffcarrasco@uc.cl ⭐

<center>
<mark>The following Jupyter Notebook is used to extract and work with Gaia DR3 data<br>
    (and other data releases) </mark>
</center>

In [1]:
# Import all the libraries we will need

%matplotlib inline
from dataclasses import dataclass, field
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib
import numpy as np
import os
from astropy.io import ascii
from astropy.table import vstack, Table
from tabulate import tabulate

import sys
sys.path.insert(0, '../Scripts/')
import Parameters as PR

# 1 -. Creating Bins and extracting parameters

The following step of this Notebook follows the procedure given by [Cordoni et al. (2018)](https://ui.adsabs.harvard.edu/abs/2018ApJ...869..139C/abstract) and [Cordoni et al. (2020)](https://ui.adsabs.harvard.edu/abs/2020ApJ...889...18C/abstract). Although it was originally applied to Gaia DR2, this method uses multiple filters to avoid fake members, and there is no known reason why it could not be applied to Gaia DR3 as well.

## 1.1 -. Reading Previous Data

First of all, read the data from the previous steps. If you have not completed the previous steps, please follow the previous instructions before proceeding, since some files generated by those steps are needed.

In [2]:
# Get data from Vasiliev (2019) file

###################################################
object_name = "NGC104"  # <--- OBJECT NAME FROM PREVIOUS STEPS, EDITABLE
vasiliev_file = "../ObservedData/Vasiliev_2019_Gaia_parameters.dat"
###################################################


data_list = PR.get_GC_params(vasiliev_file)
obj, success = PR.get_selected_GC(object_name, data_list)

if success:
    n_times = 60
    print("Object detected succesfully!")
    print("-"*n_times)
    print(f"Object name: {obj.name}")
    print(f"Mean Proper Motion RA (mas/yr): {obj.pm_RA} +- {obj.err_pm_RA}")
    print(f"Mean Proper Motion DEC (mas/yr): {obj.pm_DEC} +- {obj.err_pm_DEC}")
    print("-"*n_times)
    
def check_if_file_exists(filename_path: str) -> None:
    """
    Checks whether the file with filtered data, which should have been created in the
    previous step of this Notebook, exists. If it does not exist, the program exits;
    in that case, run the previous step of this Notebook first.
    """
    isExist = os.path.exists(filename_path)
    if not isExist:
        print(f"Warning! {filename_path} file not found.")
        print("You must fully run the previous step in this Notebook and create a file with ", end='')
        print("filtered data before running this cell.")
        sys.exit("Create filtered file in previous Notebook step and retry.")
    return

dir_path = f"../Objects/{obj.name.upper()}/"
save_filename = f"2_{obj.name.upper()}_f_data.dat"

filename_filtered_path = f"{dir_path}{save_filename}"

# Check if the filtered file created in the previous step of this Notebook exists
check_if_file_exists(filename_filtered_path)

gaia_data = Table.read(filename_filtered_path, format='ascii.ecsv') # get data from previous Notebook step
print('Data read sucessfully')

Object detected succesfully!
------------------------------------------------------------
Object name: NGC104
Mean Proper Motion RA (mas/yr): 5.237 +- 0.039
Mean Proper Motion DEC (mas/yr): -2.524 +- 0.039
------------------------------------------------------------
Data read sucessfully


## 1.2 -. Create Bins
First of all, [Cordoni et al. (2018)](https://ui.adsabs.harvard.edu/abs/2018ApJ...869..139C/abstract) (hereafter [C18](https://ui.adsabs.harvard.edu/abs/2018ApJ...869..139C/abstract)) use the $G_\text{RP}$ magnitude to divide the data into sub-sections. They use values between $11.0$ and $18.5$ for $G_\text{RP}$, where every sub-section has a width of $0.5$ magnitudes. For the purposes of this Notebook, **every sub-section will be called a bin**. Hence, [C18](https://ui.adsabs.harvard.edu/abs/2018ApJ...869..139C/abstract) use a total of $(18.5 - 11.0)/0.5 = 15$ bins.

However, to make this more customizable for anyone who wants to use this Notebook, I decided to make two small changes:
- You can select the minimum and maximum magnitudes in $G_\text{RP}$
- You can select the number of bins that will be created between these two limits.

So, for instance, if you want to replicate exactly what was done by [C18](https://ui.adsabs.harvard.edu/abs/2018ApJ...869..139C/abstract), you can set the minimum $G_\text{RP}$ to $11.0$, the maximum to $18.5$, and the number of bins to 15 (see the cell below, or search for the `EDITABLE` comment).
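As a quick check of this arithmetic, the C18 configuration can be sketched with NumPy. This is only an illustration: the names `g_rp_min`, `g_rp_max`, `n_bins` and `bin_edges` are made up for this example and are not the variables used in the cell below.

```python
import numpy as np

# Illustrative names only; they are not the variables used elsewhere in this Notebook
g_rp_min = 11.0   # bright limit quoted for C18 (mag)
g_rp_max = 18.5   # faint limit quoted for C18 (mag)
n_bins = 15       # number of bins

# n_bins bins are delimited by n_bins + 1 edges
bin_edges = np.linspace(g_rp_min, g_rp_max, n_bins + 1)
bin_width = (g_rp_max - g_rp_min) / n_bins

print(f"Bin width: {bin_width:.2f} mag")
print(f"First bin: [{bin_edges[0]:.1f}, {bin_edges[1]:.1f}]")
print(f"Last bin:  [{bin_edges[-2]:.1f}, {bin_edges[-1]:.1f}]")
```

With these three numbers every bin is $0.5$ mag wide, which is the layout described above.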

Nevertheless, it is advisable to "let the data speak" first. Before setting custom upper/lower limits, check the range of magnitudes your data covers. In my experience, data fainter than $G_\text{RP} = 19.0$ become less reliable, since their errors start to grow. For that reason, if the data extend to magnitudes fainter than $19.0$ in $G_\text{RP}$, the range is "cut" at $19.0$. You can modify this faint-end cut, but I do not recommend it. A cut for stars/objects that are too bright is usually set around $G_\text{RP} \sim 11$ mag, as pointed out by [C18](https://ui.adsabs.harvard.edu/abs/2018ApJ...869..139C/abstract). In summary, a magnitude range that extends beyond these limits is "cut" automatically. Note that the cell below uses $10.5$ rather than $11.0$ as its default bright limit, so the default range is $[10.5, 19.0]$. If you do not like this, you can manually modify the `checkMinAndMaxValues` function in the cell below; however, again, I do not recommend it.
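The "cut" described above amounts to restricting the observed magnitude range to a pair of limits. A minimal sketch, assuming example limits of $11.0$ and $19.0$ mag; the function name `clip_mag_range` and its parameters are hypothetical and are not part of this Notebook:

```python
def clip_mag_range(mag_min, mag_max, bright_limit=11.0, faint_limit=19.0):
    """Restrict the interval [mag_min, mag_max] to [bright_limit, faint_limit].

    Hypothetical helper for illustration; the default limits are example values.
    """
    clipped_min = max(mag_min, bright_limit)   # do not go brighter than the bright limit
    clipped_max = min(mag_max, faint_limit)    # do not go fainter than the faint limit
    return clipped_min, clipped_max

print(clip_mag_range(9.8, 20.7))    # both ends are cut      -> (11.0, 19.0)
print(clip_mag_range(12.3, 17.9))   # already inside, kept   -> (12.3, 17.9)
```

Each end is compared against its own limit, so a range that already lies inside the limits is returned unchanged.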

For this method, we are interested in the following 4 parameters **for every bin**:
- $G_\text{RP}$: Gaia RP magnitude mean and its standard deviation (in mags)
- $\mu_\text{R}$: Mean proper motion for the bin.
- `astrometric_gof_al` (called `as_gof_al` in the code): "indicative of the goodness-of-fit statistics of the astrometric solution for the source in the along-scan direction". Its mean value and its standard deviation
- $\pi$: Parallax measured by Gaia. Its mean value and its standard deviation (in milliarcseconds)
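
To make the "mean and standard deviation per bin" idea concrete, here is a small sketch that uses `np.digitize` on synthetic numbers. The data are randomly generated for illustration and are not Gaia measurements, and all names (`g_rp`, `parallax`, `edges`, `counts`) are assumptions of this example. The cell below uses an explicit loop over the table instead.

```python
import numpy as np

# Synthetic values for illustration only; these are NOT Gaia measurements
rng = np.random.default_rng(seed=0)
g_rp = rng.uniform(11.0, 19.0, size=500)      # mock G_RP magnitudes
parallax = rng.normal(0.2, 0.1, size=500)     # mock parallaxes (mas)

edges = np.linspace(11.0, 19.0, 5)            # 4 bins, 2 mag wide each
# With right=True, index i means edges[i-1] < x <= edges[i]
bin_index = np.digitize(g_rp, edges, right=True)

counts = []
for i in range(1, len(edges)):
    in_bin = bin_index == i
    n_stars = int(in_bin.sum())
    counts.append(n_stars)
    if n_stars < 2:
        continue  # a sample standard deviation needs at least two stars
    mean_plx = parallax[in_bin].mean()
    std_plx = parallax[in_bin].std(ddof=1)    # sample standard deviation
    print(f"Bin {i}: N = {n_stars}, parallax = {mean_plx:.3f} ± {std_plx:.3f} mas")
```

The same pattern applies to any of the other per-bin quantities by swapping the `parallax` array for the corresponding column.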

Strictly speaking

In [9]:
##############################################################################################
nDiv = 20 # EDITABLE <---- Number of bins
setLimits = False # <---- Do you want to set your own upper/lower limits for G_RP magnitude?
G_RP_upper_limit = 19.5 # If above is True, put your custom upper limit value here (in mags)
G_RP_lower_limit = 11.0 # Your custom lower limit value here (in mags)
##############################################################################################

def checkMinAndMaxValues(minValue: float, maxValue: float, useCustomLimits: bool,
                        ) -> tuple[float, float]:
    """
    As suggested by Cordoni et al. (2018), values for RP Gaia magnitudes
    should have an upper and lower limit. If the min/max value is
    lower/bigger than these limits, then simply "cut" min/max values
    at those limits.
    """
    if useCustomLimits: # if the user wants to use custom values for upper and lower limits...
        Cordoni_lower_mag = G_RP_lower_limit
        Cordoni_upper_mag = G_RP_upper_limit
    else: # if not, use the default acceptable values
        Cordoni_lower_mag = 10.5
        Cordoni_upper_mag = 19.0
    
    # Replace min/max values only if they surpass the limits
    if maxValue > Cordoni_upper_mag:
        maxValue = Cordoni_upper_mag
    if minValue < Cordoni_lower_mag:
        minValue = Cordoni_lower_mag
    
    return minValue, maxValue
                

def getBinSize(values: list[float], 
               numberOfDivisions: int
              ) -> tuple[float, float, float]:
    """
    Obtains the maximum and minimum values of a list (after applying the
    magnitude limits) and divides the interval max - min into N equal parts.
    Returns the maximum, the minimum and the size of each bin, in that order.
    """
    assert (numberOfDivisions != 0), "You cannot divide by zero"
    assert (numberOfDivisions != 1),  "Dividing by 1 division is nonsense"
    
    maxValue = np.amax(values)
    minValue = np.amin(values)
    
    # Apply the magnitude limits (custom or default, depending on setLimits)
    minValue, maxValue = checkMinAndMaxValues(minValue, maxValue, setLimits)

    return maxValue, minValue, (maxValue - minValue)/ (1.0*numberOfDivisions)

@dataclass
class parameterList:
    G_BP: list[float] = field(default_factory=list)
    G_RP: list[float] = field(default_factory=list)
    as_gof_al: list[float] = field(default_factory=list)
    parallax: list[float] = field(default_factory=list)
        
    

@dataclass(kw_only=True)
class Bin:
    ID: int = 0
    params: parameterList = field(default_factory=parameterList)
    minVal_G_RP: float 
    maxVal_G_RP: float
        
    def __post_init__(self):
        self.mean_G_RP = np.mean(self.params.G_RP)
        self.mean_G_BP = np.mean(self.params.G_BP)
        self.mean_as_gof_al = np.mean(self.params.as_gof_al)
        self.mean_parallax = np.mean(self.params.parallax)
        self.std_dev_G_RP = np.std(self.params.G_RP, ddof=1)
        self.std_dev_G_BP = np.std(self.params.G_BP, ddof=1)
        self.std_dev_as_gof_al = np.std(self.params.as_gof_al, ddof=1)
        self.std_dev_parallax = np.std(self.params.parallax, ddof=1)


@dataclass(kw_only=True)
class TotalBins:
    bins: list[Bin] = field(default_factory=list)
 

def printValuesBins(maxValue: float, 
                    minValue: float,
                    nBins: int,
                    binValue: float,
                    totalBins: TotalBins
                   ) -> None:
    """
    A simple print statement to check values obtained from
    getBinSize function.
    """
    print("\n\n")
    len_marker = 90
    print(len_marker*"=")
    text = "Estimated values are: "
    for j in range(1, 5):
        if j == 1:
            print(f"{text}{j}) Max Value G_RP (mag): {maxValue:.3f} # Maximum value for G_RP magnitude")
        
        if j != 1:
            print(len(text)*" " + f"{j}) ", end='')
            
            if j == 2:
                print(f"Min Value G_RP (mag): {minValue:.3f} # Minimum value for G_RP magnitude")
            
            if j == 3:
                print(f"Number of req. Bins: {nBins} # Number of requested Bins")
            
            if j == 4:
                print(f"Bin Range G_RP (mag): {binValue:.3f} # Value of size/range for every bin")
    print(len_marker*"=", end="\n\n")
    
    data_list = []
    
    for data in totalBins.bins:
        temp_list = []
        temp_list.append(data.ID)
        temp_list.append(len(data.params.G_RP))
        temp_list.append(f"{data.mean_G_RP:.2f} ± {data.std_dev_G_RP:.2f}")
        temp_list.append(f"{data.mean_G_BP:.2f} ± {data.std_dev_G_BP:.2f}")
        temp_list.append(f"{data.mean_as_gof_al:.2f} ± {data.std_dev_as_gof_al:.2f}")
        temp_list.append(f"{data.mean_parallax:.2f} ± {data.std_dev_parallax:.2f}")
        data_list.append(temp_list)
    print(tabulate(data_list, headers=["Bin ID", "N elems", "Mean RP (mag)", "Mean BP (mag)",\
                                      "as_gof_al", "Parallax (mas)"], tablefmt='grid'))
        
        
    return

##########################################################


G_RP_gaia_data = gaia_data['phot_rp_mean_mag']
# nDiv is taken from the EDITABLE block at the top of this cell

maxVal, minVal, binVal = getBinSize(G_RP_gaia_data, nDiv)


totBins = TotalBins()

print("Creating Bins...", end="\n\n")

# Add data for every bin
for j in range(0, nDiv):
    minMag_G_RP_bin = minVal + (binVal * j)
    maxMag_G_RP_bin = minVal + (binVal * (j+1))
    print(f'Bin number -> {j+1}: [{minMag_G_RP_bin:.3f}, {maxMag_G_RP_bin:.3f}]')
    tempParameter = parameterList()
    for data in gaia_data:
        # Main condition, given by Cordoni et al. (2018): the star falls inside this bin
        mainCondition = minMag_G_RP_bin < data['phot_rp_mean_mag'] <= maxMag_G_RP_bin
        # Second simple condition to avoid masked numpy null values
        secondCondition = -25 < data['phot_bp_mean_mag'] <= 25
        if mainCondition and secondCondition:
            tempParameter.G_BP.append(data['phot_bp_mean_mag'])
            tempParameter.G_RP.append(data['phot_rp_mean_mag'])
            tempParameter.as_gof_al.append(data['astrometric_gof_al'])
            tempParameter.parallax.append(data['parallax'])
    newBin = Bin(ID=j+1, params=tempParameter, minVal_G_RP=minMag_G_RP_bin, maxVal_G_RP=maxMag_G_RP_bin)
    totBins.bins.append(newBin)
 
printValuesBins(maxVal, minVal, nDiv, binVal, totBins)

Creating Bins...

Bin number -> 1: [10.527, 10.951]
Bin number -> 2: [10.951, 11.374]
Bin number -> 3: [11.374, 11.798]
Bin number -> 4: [11.798, 12.222]
Bin number -> 5: [12.222, 12.645]
Bin number -> 6: [12.645, 13.069]
Bin number -> 7: [13.069, 13.493]
Bin number -> 8: [13.493, 13.916]
Bin number -> 9: [13.916, 14.340]
Bin number -> 10: [14.340, 14.764]
Bin number -> 11: [14.764, 15.187]
Bin number -> 12: [15.187, 15.611]
Bin number -> 13: [15.611, 16.034]
Bin number -> 14: [16.034, 16.458]
Bin number -> 15: [16.458, 16.882]
Bin number -> 16: [16.882, 17.305]
Bin number -> 17: [17.305, 17.729]
Bin number -> 18: [17.729, 18.153]
Bin number -> 19: [18.153, 18.576]
Bin number -> 20: [18.576, 19.000]



Estimated values are: 1) Max Value G_RP (mag): 19.000 # Maximum value for G_RP magnitude
                      2) Min Value G_RP (mag): 10.527 # Minimum value for G_RP magnitude
                      3) Number of req. Bins: 20 # Number of requested Bins
                      4) Bin Range

# 2 -. Interpolate and extrapolate data

In [None]:
for tempBin in totBins.bins:
    print(f"Bin number {tempBin.ID}", end="\n\n")
    print(sum(tempBin.params.G_BP)/len(tempBin.params.G_BP))
    if tempBin.ID == 7:
        for x in tempBin.params.G_BP:
            print(type(x), x)
    if tempBin.ID == 8:
        break

In [None]:
gaia_data.colnames

In [None]:
G_RP = gaia_data['phot_rp_mean_mag']
G_BP = gaia_data['phot_bp_mean_mag']
print(G_BP - G_RP)
plt.gca().invert_yaxis()
plt.scatter(G_BP-G_RP , G_BP, s = 0.5)
plt.show()

In [None]:
hola = [1,2,3,45,5]
hola_np = np.asarray(hola)
mean_val = np.mean(hola_np)
print(mean_val, type(mean_val))

In [None]:
(18.5 - 11.0)/0.5