### Info
We are able to use two catalogues:
- `LSBS_no_par_sel` contains a number of objects, including many of the objects used in SpaceFluff.
- `FDSDWARF_LSB` contains objects expected to be LSB dwarfs or UDGs in the Fornax cluster, on the basis of their physical properties.

### Goal
The goal of this notebook is to extract the physical properties (color, surface brightness, effective radius, etc.) of all the objects present in the catalogue, so we can attach them to our SpaceFluff data for analysis. We also want to extract (the names of) the objects thought to be LSB/UDG Fornax cluster members, as a supposed form of 'ground truth' against which to compare the voting behavior of SpaceFluff volunteers.

### Findings

In this notebook, we'll come to find out that:
- the object names (e.g. 'UDGcand_102') from the catalogue(s) match those used in SpaceFluff, so we can use the objects' names as handles to compare. Alternatively, coordinates (RA/DEC) can also be used.
- the two catalogues mentioned above report the same values for the object properties they share (color, concentration, effective radius, etc.), so it doesn't matter from which of the two we extract those properties. Each catalogue has a few parameters that the other lacks, and sometimes the names differ (e.g. `PA` vs `pos_angle`). The only relevant difference for us is the surface brightness (`mue_r`, $\mu_{e,r}$), which is present only in `LSBS_no_par_sel`. We need this.
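
The coordinate-based alternative mentioned in the first finding could be sketched as follows. This is a minimal, standard-library-only illustration and not the method used in this notebook: the function names `angular_separation_deg` and `match_by_coords`, the 1-arcsecond tolerance, and the example coordinates are all assumptions made for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # haversine formula, which stays accurate for small separations
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def match_by_coords(catalogue_a, catalogue_b, tol_arcsec=1.0):
    """Return (name_a, name_b) pairs whose positions agree within tol_arcsec.

    Each catalogue is a list of dicts with 'name', 'RA' and 'DEC' keys (degrees).
    """
    tol_deg = tol_arcsec / 3600.0
    pairs = []
    for a in catalogue_a:
        for b in catalogue_b:
            sep = angular_separation_deg(a['RA'], a['DEC'], b['RA'], b['DEC'])
            if sep <= tol_deg:
                pairs.append((a['name'], b['name']))
    return pairs


# made-up positions, for illustration only
cat_a = [{'name': 'obj_A', 'RA': 54.0, 'DEC': -35.0}]
cat_b = [
    {'name': 'obj_B', 'RA': 54.0001, 'DEC': -35.0},  # about 0.3 arcsec away from obj_A
    {'name': 'obj_C', 'RA': 55.0, 'DEC': -35.0},     # far away from obj_A
]
matched_pairs = match_by_coords(cat_a, cat_b)
```

The nested loop compares every pair, which is fine for a few hundred objects but scales poorly. For larger catalogues, a dedicated cross-matching routine would likely be a better choice.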

In [1]:
from astropy.io import fits
import numpy as np
import pandas as pd

# Extract catalogue data

## Extract `FDSDWARF_LSB.fits`

In [2]:
# load fits file, and extract column names and object data

hdul = fits.open('./FDSDWARF_LSB.fits')
header = hdul[0].header
data_selective = hdul[1].data

col_names = data_selective.columns.names
print('FDSDWARF_LSB columns:', col_names)

# extract each target's name to a list. 
#  these target names match those used in SpaceFluff (we'll verify this later in this notebook)
targets_selective = [d['target'] for d in data_selective if 'UDGcand' in d['target']]

FDSDWARF_LSB columns: ['target', 'RA', 'DEC', 'PA', 'PA_e', 'arat', 'arat_e', 'r_mag', 'r_mag_e', 'g_mag', 'g_mag_e', 'r_nuc', 'g_nuc', 'reff', 'reff_e', 'n', 'n_e', 'u', 'u_e', 'g', 'g_e', 'r', 'r_e', 'i', 'i_e', 'C', 'RFF', 'Class', 'Ref.']


See http://cdsarc.u-strasbg.fr/ftp/J/A+A/620/A165/ReadMe for a description of the columns printed above.

In [3]:
# Loop over every entry in data_selective (which is the `FDSDWARF_LSB.fits` file) and map its properties to a dictionary.

selected_data = []

for d in data_selective:
    if 'UDGcand' in d['target']:
        
        object_properties = {
            "name": d[0]  # first manually assign 'name', since I prefer 'name' to 'target'
        }
        
        for idx, column in enumerate(data_selective.columns.names[1:]):  # then loop over the rest of the properties 
            object_properties[column] = d[idx+1]                         # and assign the property using its existing name
        
        
        selected_data.append(object_properties)
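
The row-to-dictionary mapping in the cell above can also be written by indexing each row by column name, as the notebook already does with `d['target']`. The sketch below is only an illustration: `row_to_properties` is a hypothetical helper, and the example row is a plain dict with made-up values standing in for a FITS record.

```python
def row_to_properties(row, column_names, rename=None):
    """Map one catalogue row to a dict keyed by column name.

    `row` must support lookup by column name (row[col]).
    `rename` optionally maps original column names to preferred keys.
    """
    rename = rename or {}
    return {rename.get(col, col): row[col] for col in column_names}


# made-up stand-in for one catalogue row
example_columns = ['target', 'RA', 'DEC']
example_row = {'target': 'UDGcand_0', 'RA': 54.0, 'DEC': -35.0}

example_properties = row_to_properties(
    example_row, example_columns, rename={'target': 'name'}
)
```

Looking values up by name rather than by position avoids the `idx+1` offset bookkeeping and keeps the `target` to `name` rename in one place.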

In [4]:
# save target names to txt file for later comparison to classification votes
#  we only need to do this once. Can uncomment the line below if we need to run it again.

# np.savetxt('sf_catalogue_targets.txt', targets_selective, delimiter=',', fmt="%s")

In [5]:
# in notebook `sf_12-04-2021`, I extracted a list of unique target names from `classify-classifications.csv`. 
candidate_names_classify = np.loadtxt('../analysis/sf_candidate_names__classification-classify.txt', dtype=str)

In [6]:
# find intersection of names between FDSDWARF_LSB.fits and classify_classifications.csv
intersecting = list(set(targets_selective) & set(candidate_names_classify))

print('Number of intersecting targets:', len(intersecting))

Number of intersecting targets: 238
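
Besides the size of the intersection, it can help to see which names appear in only one of the two lists. A minimal sketch with plain Python sets follows; `compare_name_sets` is a hypothetical helper and the names in the example are made up.

```python
def compare_name_sets(names_a, names_b):
    """Split two collections of names into shared names and names unique to each side."""
    set_a, set_b = set(names_a), set(names_b)
    return {
        'both': sorted(set_a & set_b),
        'only_a': sorted(set_a - set_b),
        'only_b': sorted(set_b - set_a),
    }


# made-up names, for illustration only
name_overlap = compare_name_sets(
    ['UDGcand_1', 'UDGcand_2'],
    ['UDGcand_2', 'UDGcand_3'],
)
```

In this notebook the two inputs would presumably be `targets_selective` and `candidate_names_classify`.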


## Extract `LSBS_no_par_sel.fits`

In [7]:
hdul = fits.open('./LSBS_no_par_sel.fits')
header = hdul[0].header
data_no_selection = hdul[1].data

# extract all UDGcand_* targets from LSBS_no_par_sel
#  note that this fits file also contains other targets. we might want to check if any of those happen to be 
#   spacefluff candidates, but with another name. check using RA/dec
targets_no_selection = [d['target'] for d in data_no_selection if 'UDGcand' in d['target']]

print('LSBS_no_par_sel columns:', data_no_selection.columns.names)

LSBS_no_par_sel columns: ['target', 'RA', 'DEC', 'Reff', 'r_mag', 'g_mag', 'axis_ratio', 'pos_angle', 'n', 'u', 'g', 'r', 'i', 'ue', 'ge', 're', 'ie', 'Reffe', 'r_mage', 'ne', 'C', 'mue_r', 'bae', 'RFF']


In [8]:
# map targets' properties to a list of dicts, same as above with `FDSDWARF_LSB`

spacefluff_data = []

spacefluff_names = set(candidate_names_classify)  # build the set once instead of once per row

for d in data_no_selection:
    if d['target'] in spacefluff_names:
        
        object_properties = {
            "name": d[0]
        }
        
        for idx, column in enumerate(data_no_selection.columns.names[1:]):
            object_properties[column] = d[idx+1]
        
        spacefluff_data.append(object_properties)

In [9]:
# convert the list of objects to a DataFrame (and inspect the head to see if it worked properly)
df_spacefluff_data = pd.DataFrame(spacefluff_data)

#df_spacefluff_data.head()

In [10]:
# save the DataFrame to csv for later use:
df_spacefluff_data.to_csv('./sf_spacefluff_object_data.csv', sep=",", index=False)

# load and inspect created .csv to see if it saved correctly:
df_spacefluff_data_read = pd.read_csv('./sf_spacefluff_object_data.csv', comment="#")

---
# Compare properties

Compare object properties between the selective and non-selective catalogues, to see if they match or if a different (more resource-intensive) method was used to extract objects' properties in the selective catalogue.
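
One caveat when comparing catalogue values is that exact equality on floating-point numbers can flag differences that are only rounding noise. The sketch below shows a tolerance-based comparison that also reports which columns differ. It is an illustration under assumptions: `mismatched_columns` and the relative tolerance of `1e-6` are hypothetical choices, and the example values are made up.

```python
import math


def mismatched_columns(props_a, props_b, rel_tol=1e-6):
    """Return the sorted names of shared keys whose values differ.

    Numeric values are compared with a relative tolerance (two NaNs count as equal);
    everything else is compared with ==.
    """
    mismatches = []
    for key in sorted(set(props_a) & set(props_b)):
        a, b = props_a[key], props_b[key]
        both_numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
        if both_numeric:
            same = math.isclose(a, b, rel_tol=rel_tol) or (math.isnan(a) and math.isnan(b))
        else:
            same = a == b
        if not same:
            mismatches.append(key)
    return mismatches


# made-up values, for illustration only
props_sel = {'name': 'UDGcand_0', 'r_mag': 19.5, 'n': 1.0, 'PA': 10.0}
props_nosel = {'name': 'UDGcand_0', 'r_mag': 19.5000001, 'n': 1.2, 'pos_angle': 10.0}

differing = mismatched_columns(props_sel, props_nosel)
```

Returning the column names instead of a set of booleans makes it easier to see what to investigate. Values stored as NumPy `float32` are not instances of Python's `float`, so they would need converting first.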

In [11]:
def check_object_property_match(index_sel, index_nosel):
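    '''
    Compare the properties shared by both catalogues for a single object.

    @param {int} index_sel: index of the object in selected_data
    @param {int} index_nosel: index of the object in spacefluff_data
    @returns {set} the distinct comparison outcomes ({True} if all shared properties match);
                   None if the names at the two indices differ
    '''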
    candidate_sel = selected_data[index_sel]        # properties of candidate according to `FDSDWARF_LSB.fits`
    candidate_nosel = spacefluff_data[index_nosel]  # properties of candidate according to `LSBS_no_par_sel.fits`

    columns_match = []
    
    if candidate_sel['name'] == candidate_nosel['name']:  
        col_sel = set(candidate_sel.keys())      # get properties of objects offered by FDSDWARF_LSB.fits
        col_nosel = set(candidate_nosel.keys())  # ^, but for LSBS_no_par_sel.fits

        col_intersection = col_sel.intersection(col_nosel)  # get the properties present in both the .fits files, 
                                                            #  so we can compare them in a loop
        for column in col_intersection:
            match = candidate_sel[column] == candidate_nosel[column]
            columns_match.append(match)

        set_match = set(columns_match)
        return set_match

In [12]:
# check if properties of UDGcand_3 match (I picked the indices by inspection)
check_object_property_match(0, 2)  

{True}

So all properties that exist under the same name in both .fits files match for UDGcand_3. I also manually checked the properties whose names differ between the files (e.g. PA vs. pos_angle); for this candidate, they match too.
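
The manual check of differently named columns could be automated by renaming one catalogue's keys before comparing. The sketch below is hedged accordingly: only the `PA`/`pos_angle` pair is stated in this notebook, `rename_columns` is a hypothetical helper, and the example values are made up. Other plausible pairs suggested by the printed column lists (such as `arat`/`axis_ratio` or `reff`/`Reff`) are guesses that would need verifying against the catalogue documentation.

```python
# alias stated in this notebook; any further pairs are unverified guesses
COLUMN_ALIASES = {'PA': 'pos_angle'}


def rename_columns(props, aliases):
    """Return a copy of props with keys renamed according to aliases."""
    return {aliases.get(key, key): value for key, value in props.items()}


# made-up values, for illustration only
sel_example = {'name': 'UDGcand_0', 'PA': 12.0}
nosel_example = {'name': 'UDGcand_0', 'pos_angle': 12.0}

renamed = rename_columns(sel_example, COLUMN_ALIASES)
shared_columns = sorted(set(renamed) & set(nosel_example))
```

After renaming, the position angle shows up in the set of shared columns and would be included in the same comparison loop as the other properties.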

Now, since we have most of the code anyway, let's check for all other objects that exist in SpaceFluff (note: in the creation of `spacefluff_data` I only selected objects in SpaceFluff, not all `UDGcand` objects. We could also check for all `UDGcand` objects, but I won't be doing that here).

In [13]:
objects_sel = [d['target'] for d in data_selective]      # extract target names
objects_nosel = [d['target'] for d in data_no_selection] # ^

objects_intersection = set(objects_sel).intersection(set(objects_nosel))  # get the intersection of target names

In [14]:
object_index_lookup = {}

for object_name in objects_intersection:  # create object like {'UDGcand_1': { 'sel': None, 'nosel': None }}
    object_index_lookup[object_name] = {  #  so we only have to loop each *_data list once
        'sel': None,
        'nosel': None
    }

In [15]:
# loop through each list (selected_data and spacefluff_data), and assign the object's index in the list to the lookup

for index, obj in enumerate(selected_data):
    if obj['name'] in objects_intersection:
        object_index_lookup[obj['name']]['sel'] = index
        
for index, obj in enumerate(spacefluff_data):
    if obj['name'] in objects_intersection:
        object_index_lookup[obj['name']]['nosel'] = index
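
The lookup built in the two cells above can also be sketched with dictionary comprehensions, which build a name-to-index map for each list and then combine them. `build_index_lookup` is a hypothetical helper, and the example lists are made up.

```python
def build_index_lookup(list_sel, list_nosel, names):
    """Map each name to its index in both lists (None where the name is absent)."""
    index_sel = {obj['name']: i for i, obj in enumerate(list_sel)}
    index_nosel = {obj['name']: i for i, obj in enumerate(list_nosel)}
    return {
        name: {'sel': index_sel.get(name), 'nosel': index_nosel.get(name)}
        for name in names
    }


# made-up lists, for illustration only
example_sel = [{'name': 'UDGcand_1'}, {'name': 'UDGcand_2'}]
example_nosel = [{'name': 'UDGcand_2'}]

example_lookup = build_index_lookup(
    example_sel, example_nosel, {'UDGcand_1', 'UDGcand_2'}
)
```

As in the loops above, each list is traversed once, and a name that appears more than once in a list keeps the index of its last occurrence.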

In [16]:
# loop through the lookup and compare each object's properties. if any don't match, the loop prints the object's name
#  so we can manually investigate what's up

all_match = 0
not_in_spacefluff = 0

for (name, indices) in object_index_lookup.items():
    if indices['sel'] is not None and indices['nosel'] is not None:
        set_match = check_object_property_match(indices['sel'], indices['nosel'])
        
        if set_match == set([True]):
            all_match += 1
        else:
            print("Properties don't match, investigate!", name, indices) 
    
    else:
        # object is missing from selected_data (not a UDGcand) or from spacefluff_data (not in SpaceFluff),
        #  which is not a problem
        not_in_spacefluff += 1
        continue
    
len(objects_intersection) == all_match + not_in_spacefluff

True