# ELAsTiCC taxonomy

_Alex Malz (GCCL@RUB)_ , _Rob Knop (raknop@lbl.gov)_

The original thought behind this was to define a "bit" mask (really, the decimal equivalent), but in practice that's not what we have.  Rather, what we have is a hierarchical classification, where each level of the hierarchy is one power of 10, and the categories at the next level down are different for each parent category.  The lowest power of 10 whose digit is not zero represents how specific the category is; if the ones digit is not zero, then the category is as specific as the taxonomy gets.
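
As a rough illustration of how the digits encode specificity, here is a minimal sketch that assumes the four-digit codes shown in the tree near the bottom of this notebook.  The helper names `specificity` and `parent_code` are hypothetical and not part of any ELAsTiCC package.

```python
def specificity(class_id: int) -> int:
    """Number of hierarchy levels a four-digit code pins down (1 to 4)."""
    digits = f"{class_id:04d}"            # e.g. 100 -> "0100"
    # Trailing zeros mean "not specified at this level".
    return max(len(digits.rstrip("0")), 1)


def parent_code(class_id: int):
    """Zero out the lowest nonzero digit; None for top-level codes."""
    if class_id % 1000 == 0:
        # 0 (Meta), 1000 (Static), 2000 (Variable) sit directly under "Alert".
        return None
    for power in (1, 10, 100):
        digit = (class_id // power) % 10
        if digit:
            return class_id - digit * power


print(specificity(2222), parent_code(2222))   # Ia: fully specific, parent is SN-like
print(specificity(2200), parent_code(2200))   # Non-Recurring: two levels, parent is Variable
```

Note that in this sketch the Meta code `0` is treated as a top-level category with specificity 1, which is an assumption about how to read the single-digit Meta code.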


## Use of the taxonomy

Brokers will ingest alerts produced by the ELAsTiCC team (eventually, produced by LSST).  Each alert will have information about one _DiaSource_ (DIA="Difference Image Analysis"); this is a single observation at one time of a _DiaObject_.  A _DiaObject_ represents a single astronomical object or event; in practice, it's defined by a position on the sky.  A new source found at the position of an existing object will be assigned to that existing object.  The alert for a _DiaSource_ will also include the data for that source's _DiaObject_, as well as previously-detected sources for that same object, and (if the object was discovered at least a day ago) forced photometry (stored using the _DiaForcedSource_ schema) dating back to at most 30 days before the first detection of the object.

Brokers will then apply whatever algorithms they have to estimate classifications for the source.  Each different algorithm a broker uses is called a _classifier_.  When a classifier responds to an alert, it should produce a set of (_classId_, _probability_) pairs.  All of the _probability_ values for a single source should sum to 1.  The _classId_ values are described by this taxonomy; the actual hierarchical list of values can be found near the bottom of this notebook.
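
A classifier's output can be sanity-checked before it is published.  The following is only a sketch: the function name `validate_classifications` and the tolerance are assumptions, and the pairs are plain Python tuples rather than the Avro records a broker would actually send.

```python
import math


def validate_classifications(pairs, tol=1e-6):
    """Check a list of (classId, probability) pairs for one source.

    Returns True when every probability lies in [0, 1] and the
    probabilities sum to 1 within `tol` (an assumed tolerance).
    """
    if not pairs:
        return False
    if any(not 0.0 <= prob <= 1.0 for _, prob in pairs):
        return False
    total = math.fsum(prob for _, prob in pairs)
    return math.isclose(total, 1.0, abs_tol=tol)


# classId values taken from the tree printed near the bottom of this notebook
good = [(2222, 0.7), (2223, 0.2), (2232, 0.1)]
bad = [(2222, 0.7), (2223, 0.2)]
print(validate_classifications(good), validate_classifications(bad))
```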

### Avro schema

All schemas for ELAsTiCC can be found at https://github.com/LSSTDESC/elasticc/tree/main/alert_schema

Brokers will ingest the "alert" schema; as of this writing, the current version is 0.9.1 ( https://github.com/LSSTDESC/elasticc/blob/main/alert_schema/elasticc.v0_9_1.alert.avsc ).  That schema includes the object, source, and forcedsource schemas (all in the same directory).

Brokers will publish the "brokerClassification" schema; again, as of this writing, the current version is 0.9.1 : https://github.com/LSSTDESC/elasticc/blob/main/alert_schema/elasticc.v0_9_1.brokerClassification.avsc


### Documentation of specific categories

* **Meta/Residual** -- All of the probabilities returned for a single alert should sum to 1.  This is the category to put the probability for "not any of the other things I've assigned a probability for".  One use of this would be for a yes/no binary classifier.  Suppose you just want to report the probability that an event is a SNIa.  You'd assign that probability to the SNIa category, and one minus that probability to this category.  So, if the algorithm produced a 33% chance that it's a SNIa, the SNIa category would get 0.33, and the Meta/Residual category would get 0.67.

* **Meta/NotClassified** -- Use this category to report that your algorithm chose not to classify a source; assign a probability of 1 to this category in that case.  The purpose of this is so that we can diagnose whether and why alerts are getting dropped.  If we receive this classification, we know the alert made it all the way through the system, but that the algorithm did not think it had enough information to actually supply a classification.
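
The two Meta categories can be illustrated with a short sketch.  The integer codes are read off the tree printed near the bottom of this notebook (200 for Residual, 300 for NotClassified, 2222 for Ia); the function names `binary_classification` and `declined` are hypothetical.

```python
RESIDUAL = 200        # Meta/Residual in the tree printed in this notebook
NOT_CLASSIFIED = 300  # Meta/NotClassified
SNIA = 2222           # Ia


def binary_classification(class_id, probability):
    """Output of a yes/no classifier: the remainder goes to Meta/Residual."""
    return [(class_id, probability), (RESIDUAL, 1.0 - probability)]


def declined():
    """Output of a classifier that chose not to classify the source."""
    return [(NOT_CLASSIFIED, 1.0)]


# A 33% chance of SNIa leaves roughly 0.67 for Meta/Residual.
print(binary_classification(SNIA, 0.33))
print(declined())
```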

### Generating the integer codes

The idea is that every level of the tree corresponds to one digit in the number for the classification.

* 1000s : General category (Meta info, Static object, Variable object). (A static object would be something like a persistent subtraction artifact that doesn't get caught by the LSST R/B system.)
* 100s : Within Variable, Non-Recurring vs. Recurring object.
* 10s : Specific category (e.g. is it a SN-like variable, a periodic recurring object, a non-periodic recurring object, etc.)
* 1s : Specific categorization (SNIa, SNIb, AGN, etc.)
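
To make the digit-per-level idea concrete, here is a small sketch that splits a code into its four digits and puts it back together.  It assumes four-digit codes as in the tree printed later in this notebook; `split_code` and `compose_code` are illustrative names only.

```python
PLACES = (1000, 100, 10, 1)  # one decimal place per level of the tree


def split_code(class_id):
    """Return the digit at each level, from most general to most specific."""
    return tuple((class_id // place) % 10 for place in PLACES)


def compose_code(digits):
    """Inverse of split_code: rebuild the integer from per-level digits."""
    return sum(digit * place for digit, place in zip(digits, PLACES))


# 2235 reads as Variable -> Non-Recurring -> Fast -> uLens in the tree below.
print(split_code(2235))
print(compose_code((2, 2, 2, 0)))
```

A zero digit means the code stops at the level above it, so `(2, 2, 2, 0)` is the SN-like category without a specific subtype.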

In [1]:
from treelib import Node, Tree
import string

## Building a phylogenetic tree

Given the hierarchical class relationships, make a tree diagram (and record some hopefully useful information).

## Housekeeping

We need to think about how to sort through the classification information.
`directory` and `index` are very simplistic starting points.
It'll be easier when we have a better idea of what subsampling operations we'll perform.

In [2]:
directory = {}
index = {}

In [3]:
maxdep = 3  # deepest level below the root; levels 0..3 map to the 1000s..1s digits

def branch(tree, parent, children, prepend=("Other",), append=None, directory=directory, index=index):
    """Attach `children` to `parent` in `tree`, assigning each child an integer code."""
    # Catch-all entries are namespaced by the parent, e.g. "SN-like/Other".
    if prepend is not None:
        proc_pre = [parent + "/" + pre for pre in prepend]
        children = proc_pre + children
    if append is not None:
        proc_app = [parent + "/" + app for app in append]
        children = children + proc_app
    # Depth of the parent: count the steps up to the root.
    tmp = parent
    level = 0
    while tree.ancestor(tmp) is not None:
        level += 1
        tmp = tree.ancestor(tmp)
    directory[parent] = {}
    for i, child in enumerate(children):
        directory[parent][child] = i
        # The (i+1)-th child gets digit i+1 in the decimal place for this level.
        offset = (i + 1) * 10 ** (maxdep - level)
        if index[parent] != '':
            index[child] = str(int(index[parent]) + offset)
        else:
            index[child] = str(offset)
        tree.create_node(index[child] + " " + child, child, parent=parent)

It would be better to start with something like `directory` than to build it as we go along, but, hey, this is a hack.
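
One way the "start from a directory" idea could look is sketched below, without `treelib`.  This is not how the notebook builds its tree: the nested dict `TAXONOMY` and the function `assign_codes` are hypothetical, the Meta branch is left out because it is numbered by hand, and an "Other" entry is prepended at every level below the top to mimic the default behaviour of `branch`.

```python
MAXDEP = 3  # same role as `maxdep` in this notebook

# Hypothetical nested description of the non-Meta part of the taxonomy.
# An empty dict marks a category with no named subcategories.
TAXONOMY = {
    "Static": {},
    "Variable": {
        "Non-Recurring": {
            "SN-like": {"Ia": {}, "Ib/c": {}, "II": {}, "Iax": {}, "91bg": {}},
            "Fast": {"KN": {}, "M-dwarf Flare": {}, "Dwarf Novae": {}, "uLens": {}},
            "Long": {"SLSN": {}, "TDE": {}, "ILOT": {}, "CART": {}, "PISN": {}},
        },
        "Recurring": {
            "Periodic": {"Cepheid": {}, "RR Lyrae": {}, "Delta Scuti": {},
                         "EB": {}, "LPV/Mira": {}},
            "Non-Periodic": {"AGN": {}},
        },
    },
}


def assign_codes(subtree, parent=None, parent_code=0, level=0, codes=None):
    """Walk the nested dict and return a flat {name: integer code} mapping."""
    if codes is None:
        codes = {}
    names = list(subtree)
    if parent is not None:
        # Below the top level, the first slot is the parent's catch-all.
        names = [parent + "/Other"] + names
    for i, name in enumerate(names):
        code = parent_code + (i + 1) * 10 ** (MAXDEP - level)
        codes[name] = code
        # Recurse only into real categories that still have a digit left.
        if name in subtree and level < MAXDEP:
            assign_codes(subtree[name], name, code, level + 1, codes)
    return codes


codes = assign_codes(TAXONOMY)
print(codes["Ia"], codes["AGN"], codes["Static/Other"])
```

The sketch relies on dicts preserving insertion order (Python 3.7+), so the order of keys in `TAXONOMY` determines the digits.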

In [4]:
tree = Tree()

basename = "Alert"

index[basename] = ''
tree.create_node(index[basename] + " " + basename, basename)

# need spot for residual, choose not to classify -- metacategory? possibly rename to "Flagged"?
index["Meta"] = "0"
tree.create_node(index["Meta"] + " Meta", "Meta", parent = basename)
branch(tree, "Meta", ["Residual", "NotClassified"])

branch(tree, basename, ["Static", "Variable"], prepend=[] )

branch(tree, "Static", [] )

branch(tree, "Variable", ["Non-Recurring", "Recurring"] )

branch(tree, "Recurring", ["Periodic", "Non-Periodic"])

branch(tree, "Periodic", ["Cepheid", "RR Lyrae", "Delta Scuti", "EB", "LPV/Mira"])

branch(tree, "Non-Periodic", ["AGN"])

branch(tree, "Non-Recurring", ["SN-like", "Fast", "Long"])

branch(tree, "SN-like", ["Ia", "Ib/c", "II", "Iax", "91bg"])

branch(tree, "Fast", ["KN", "M-dwarf Flare", "Dwarf Novae", "uLens"])

branch(tree, "Long", ["SLSN", "TDE", "ILOT", "CART", "PISN"])

tree.show()

 Alert
├── 0 Meta
│   ├── 100 Meta/Other
│   ├── 200 Residual
│   └── 300 NotClassified
├── 1000 Static
│   └── 1100 Static/Other
└── 2000 Variable
    ├── 2100 Variable/Other
    ├── 2200 Non-Recurring
    │   ├── 2210 Non-Recurring/Other
    │   ├── 2220 SN-like
    │   │   ├── 2221 SN-like/Other
    │   │   ├── 2222 Ia
    │   │   ├── 2223 Ib/c
    │   │   ├── 2224 II
    │   │   ├── 2225 Iax
    │   │   └── 2226 91bg
    │   ├── 2230 Fast
    │   │   ├── 2231 Fast/Other
    │   │   ├── 2232 KN
    │   │   ├── 2233 M-dwarf Flare
    │   │   ├── 2234 Dwarf Novae
    │   │   └── 2235 uLens
    │   └── 2240 Long
    │       ├── 2241 Long/Other
    │       ├── 2242 SLSN
    │       ├── 2243 TDE
    │       ├── 2244 ILOT
    │       ├── 2245 CART
    │       └── 2246 PISN
    └── 2300 Recurring
        ├── 2310 Recurring/Other
        ├── 2320 Periodic
        │   ├── 2321 Periodic/Other
        │   ├── 2322 Cepheid
        │   ├── 2323 RR Lyrae
        │   ├── 2324 Delta Scuti
        │   ├── 2325 EB
        │   └── 2326 LPV/Mira
        └── 2330 Non-Periodic
            ├── 2331 Non-Periodic/Other
            └── 2332 AGN

## Building a structure for hierarchical classification

The whole point of this, for me, is for the classification to have corresponding posterior probabilities, or at least confidence flags or scores, because I'd want to use them to rapidly select follow-up candidates.
[This](https://community.lsst.org/t/projects-involving-irregularly-shaped-data/4466) looks potentially relevant.
I guess it could also be used for packaging up additional features into an alert without bloating it up too much.
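
As a sketch of how hierarchical codes and probabilities could work together for candidate selection, the snippet below sums fine-grained probabilities up to a coarser level of the tree.  It assumes four-digit codes as printed above; `roll_up` is a hypothetical helper, and the example probabilities are made up for illustration.

```python
from collections import defaultdict


def roll_up(pairs, level):
    """Sum probabilities over classes that share their first `level` digits.

    `pairs` is a list of (classId, probability); `level` runs from 1
    (the 1000s digit only) to 4 (the full code).
    """
    place = 10 ** (4 - level)
    totals = defaultdict(float)
    for class_id, prob in pairs:
        # Zero out every digit below the requested level.
        totals[(class_id // place) * place] += prob
    return dict(totals)


# Made-up classifier output: Ia, Ib/c, KN, Cepheid
pairs = [(2222, 0.6), (2223, 0.1), (2232, 0.2), (2322, 0.1)]
by_category = roll_up(pairs, level=3)
print(by_category)

# Hypothetical follow-up cut: keep sources that are probably SN-like (code 2220).
is_candidate = by_category.get(2220, 0.0) > 0.5
```

Because the "Other" codes share digits with their siblings, they are folded into the same parent total automatically.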