## Remove Object Class from Training Set

**Objective:** Learn how to remove an object class from the `avocado` training set so that we can treat specific objects as "anomalous" in our training process.

**Notes:**
* Quick-view all attributes of an object with `pprint(vars(object_name))`
* In [avocado_paper_figures.ipynb](https://github.com/lfulmer/avocado/blob/master/notebooks/avocado_paper_figures.ipynb), what does "predictions" mean?
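The `pprint(vars(...))` tip above works for any Python object that stores its attributes in a `__dict__`. Here is a minimal sketch using a hypothetical `ToyObject` class (a stand-in for illustration, not part of `avocado`):

```python
from pprint import pprint


class ToyObject:
    """Hypothetical stand-in for an object whose attributes we want to inspect."""

    def __init__(self):
        self.name = "toy"
        self.metadata = {"class": 64, "ddf": True}
        self.features = None


toy = ToyObject()

# vars() returns the instance's attribute dictionary; pprint formats it readably
toy_attributes = vars(toy)
pprint(toy_attributes)
```

Because `vars()` returns a plain dictionary, `sorted(vars(obj))` is a quick way to list only the attribute names when the values are too large to print.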

In [1]:
# Imports

# Standard
import sys
import copy

import numpy as np
import pandas as pd
from pprint import pprint
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Auxiliary
import avocado
avocado.__file__

'/astro/users/lfulmer/.conda/envs/earthseed/lib/python3.8/site-packages/avocado-0.1-py3.8.egg/avocado/__init__.py'

In [2]:
### Load the plasticc training set from a .h5 file, including both observations and metadata
train = avocado.Dataset.load("plasticc_train")

### Load the plasticc testing set from a .h5 file, including both observations and metadata
### THIS TAKES A WHILE. DON'T DO THIS MORE THAN ONCE IF YOU CAN HELP IT.
# testing_set = avocado.Dataset.load("plasticc_test")

### Explore the training set

In [4]:
# How many astronomical objects are in the training set?
print(f"When we first load the training set, it contains {len(train.metadata)} objects.")

When we first load the training set, it contains 7848 objects.


In [7]:
# What attributes do these avocado objects have?
pprint(vars(train))
pprint(vars(train.get_object(0)))

{'chunk': None,
 'classifier': None,
 'features': None,
 'metadata':                           ra     decl    ddf  host_specz  host_photoz  \
object_id                                                               
plasticc_000000615  349.0461 -61.9438   True       0.000        0.000   
plasticc_000000713   53.0859 -27.7844   True       1.818        1.627   
plasticc_000000730   33.5742  -6.5796   True       0.232        0.226   
plasticc_000000745    0.1899 -45.5867   True       0.304        0.281   
plasticc_000001124  352.7113 -63.8237   True       0.193        0.241   
...                      ...      ...    ...         ...          ...   
plasticc_130739978   26.7188 -14.9403  False       0.000        0.000   
plasticc_130755807  120.1013 -62.6967  False       0.172        2.561   
plasticc_130762946  203.1081 -55.6821  False       0.000        0.000   
plasticc_130772921   79.1016 -35.5018  False       0.000        0.000   
plasticc_130779836  301.9922 -17.4263  False       0.00

In [8]:
# What columns do the metadata and observations have?
print(train.metadata.columns)
print(train.get_object(0).observations.columns)

Index(['ra', 'decl', 'ddf', 'host_specz', 'host_photoz', 'host_photoz_error',
       'mwebv', 'class', 'true_submodel', 'redshift', 'true_distmod',
       'true_lensdmu', 'true_vpec', 'true_rv', 'true_av', 'true_peakmjd',
       'libid_cadence', 'tflux_u', 'tflux_g', 'tflux_r', 'tflux_i', 'tflux_z',
       'tflux_y', 'galactic'],
      dtype='object')
Index(['object_id', 'time', 'flux', 'flux_error', 'detected', 'band'], dtype='object')


In [12]:
# Print one class
# train.metadata[train.metadata['class'] == 92]
# train.objects[train.metadata['class'] == 92]

In [11]:
# Print everything but one class
# train.metadata[train.metadata['class'] != 92]
# train.objects[train.metadata['class'] != 92]

In [4]:
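### Make copies for the isolate_ and remove_ training sets
# copy.copy is shallow: each copy starts out sharing train's metadata and objects.
# The attributes are reassigned (not modified in place) later, so train is unchanged.
isolate_train = copy.copy(train)
remove_train = copy.copy(train)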

### Create a mask that removes an astronomical object class from the training set
Class 64 is a [kilonova](https://en.wikipedia.org/wiki/Kilonova), "when two neutron stars or a neutron star and a black hole merge into each other."

In [14]:
# Choose a specific class, and create masks that isolate and remove the chosen class

specific_class = 64
isolate_mask = train.metadata['class'] == specific_class
remove_mask = train.metadata['class'] != specific_class
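To see what the two masks do, here is a minimal sketch on a toy metadata table. The object IDs and class labels are made up for illustration; only the `class` column name mirrors `train.metadata`.

```python
import pandas as pd

# Toy stand-in for train.metadata (hypothetical IDs and labels)
toy_metadata = pd.DataFrame(
    {"class": [64, 90, 42, 64, 90]},
    index=pd.Index(["obj_a", "obj_b", "obj_c", "obj_d", "obj_e"], name="object_id"),
)

toy_class = 64
toy_isolate_mask = toy_metadata["class"] == toy_class
toy_remove_mask = toy_metadata["class"] != toy_class

# Each mask is a boolean Series aligned with the metadata index
toy_isolated = toy_metadata[toy_isolate_mask]
toy_removed = toy_metadata[toy_remove_mask]

print(len(toy_isolated), len(toy_removed))  # 2 3
```

The two masks are exact complements, so every object lands in exactly one of the two subsets; `~toy_isolate_mask` would give the same result as the `!=` comparison.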

### Redefine the training set 
... to include only selected object classes (or to remove a selected object class)

In [6]:
# Redefine

isolate_train.metadata = train.metadata[isolate_mask]
isolate_train.objects = train.objects[isolate_mask]
isolate_train.name = 'plasticc_isolate_train'

remove_train.metadata = train.metadata[remove_mask]
remove_train.objects = train.objects[remove_mask]
remove_train.name = 'plasticc_remove_train'
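The copy-then-reassign pattern above can be sanity-checked on a small stand-in. The sketch below uses a hypothetical `ToyDataset` class (not the real `avocado.Dataset`) and assumes that `objects` behaves like a NumPy object array, so that a boolean mask can index it.

```python
import copy

import numpy as np
import pandas as pd


class ToyDataset:
    """Hypothetical stand-in for an avocado Dataset (not the real class)."""

    def __init__(self, name, metadata, objects):
        self.name = name
        self.metadata = metadata
        self.objects = objects


toy_train = ToyDataset(
    "toy_train",
    pd.DataFrame({"class": [64, 90, 42, 64]}, index=["a", "b", "c", "d"]),
    np.array(["obj_a", "obj_b", "obj_c", "obj_d"], dtype=object),
)

# Shallow copy, then reassign the attributes on the copy only
toy_remove = copy.copy(toy_train)
toy_mask = (toy_train.metadata["class"] != 64).to_numpy()
toy_remove.metadata = toy_train.metadata[toy_mask]
toy_remove.objects = toy_train.objects[toy_mask]
toy_remove.name = "toy_remove_train"

# Sanity checks: the class is gone, lengths agree, and the original is untouched
print(64 in toy_remove.metadata["class"].values)
print(len(toy_remove.metadata) == len(toy_remove.objects))
print(len(toy_train.metadata))
```

The same three checks (class absent, metadata and objects the same length, original size unchanged) are worth running on `remove_train` and `train` before saving anything.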

**Next:** Save the new training set with a single object class removed. Adjust settings within avocado so that I can run <br> `avocado_augment plasticc_remove_train plasticc_remove_augment` on Epyc. See [avocado documentation](https://avocado-classifier.readthedocs.io/en/latest/plasticc.html#augmenting-the-plasticc-dataset).

### Resting Place