# PLAsTiCC v2 taxonomy

_Alex Malz (GCCL@RUB)_ (add your name here)

The purpose of this notebook is to outline a bitmask schema for hierarchical classes of LSST alerts.
The bitmask corresponds to a "best" classification to be included in the alert.
Each digit in the bitmask, however, corresponds to a vector of classification probabilities, confidence flags, or scores that can be used to subsample the alert stream.
For the subsampled objects, persistent features could then be queried from a separate database and used for further selection.
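As a minimal sketch of how such codes could subsample a stream, assume each alert carries its code as a string. The `alerts` list, its `code` field and the `select_by_prefix` helper are hypothetical; the codes themselves are taken from the tree built in this notebook. Because each digit encodes one level of the tree, every class in a subtree shares the code of that subtree's root as a prefix, so selecting by class reduces to a string-prefix test.

```python
# Hypothetical alerts, each tagged with a code from this notebook's tree
# (e.g. "111" = Ia, "121" = KN, "115" = 91bg, "221" = AGN).
alerts = [
    {"id": 1, "code": "111"},
    {"id": 2, "code": "121"},
    {"id": 3, "code": "115"},
    {"id": 4, "code": "221"},
]

def select_by_prefix(alerts, prefix):
    """Return the alerts whose code lies in the subtree rooted at `prefix`.

    Each digit encodes one level of the hierarchy, so every descendant
    of a class has that class's code as a prefix.
    """
    return [a for a in alerts if a["code"].startswith(prefix)]

sn_like = select_by_prefix(alerts, "11")   # the SN-like subtree
print([a["id"] for a in sn_like])          # → [1, 3]
```

Selecting with `"1"` instead would return everything under Non-Recurring.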

In [1]:
from treelib import Node, Tree
import string

## Housekeeping

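We need to think about how to sort through the classification information.
`directory` (each parent class mapped to its children and their positions) and `index` (each class mapped to its code string) are very simplistic starting points.
It'll be easier to improve on them once we have a better idea of what subsampling operations we'll perform.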
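One operation these dictionaries already support is going from a code back to class names. A minimal sketch, using a handful of entries copied from the `index` printed later in this notebook (the names `index_subset`, `code_to_name` and `lineage` are hypothetical): invert the mapping, then read off a class's ancestors from the prefixes of its code.

```python
# A subset of the `index` dictionary printed by this notebook
# (class name -> code string; the root "Static" has the empty code).
index_subset = {
    "Static": "", "Non-Recurring": "1", "Recurring": "2",
    "SN-like": "11", "Fast": "12", "Ia": "111", "91bg": "115",
    "KN": "121", "Non-Periodic": "22", "AGN": "221",
}

# Invert it: code string -> class name. Codes are unique, so nothing is lost.
code_to_name = {code: name for name, code in index_subset.items()}

def lineage(code):
    """Return class names from the root down to `code`, one per prefix length."""
    return [code_to_name[code[:k]] for k in range(len(code) + 1)]

print(lineage("115"))  # ['Static', 'Non-Recurring', 'SN-like', '91bg']
```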

In [2]:
directory = {}
index = {}

### Generating the integer codes

The idea is that every level of the tree corresponds to one digit in the bitmask.
The number of objects in the
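To illustrate the one-digit-per-level idea, a code can be decoded by walking `directory` from the root: at each level, the digit is the child's position among its siblings. This sketch hard-codes part of the `directory` printed later in this notebook and uses the same digit alphabet as `digs` in `int2base`; the names `DIGS`, `directory_subset` and `decode` are hypothetical.

```python
import string

# Same alphabet as `digs` in int2base: 0-9, then a-z, then A-Z.
DIGS = string.digits + string.ascii_letters

# Part of the `directory` printed by this notebook: parent -> {child: position}.
directory_subset = {
    "Static": {"Static/Other": 0, "Non-Recurring": 1, "Recurring": 2},
    "Non-Recurring": {"Non-Recurring/Other": 0, "SN-like": 1, "Fast": 2, "Long": 3},
    "SN-like": {"SN-like/Other": 0, "Ia": 1, "Ib/c": 2, "II": 3, "Iax": 4, "91bg": 5},
}

def decode(code, root="Static", directory=directory_subset):
    """Walk down from `root`, letting each digit select one child per level."""
    node, path = root, []
    for ch in code:
        position = DIGS.index(ch)
        # Find the child stored at this position under the current node.
        node = next(c for c, i in directory[node].items() if i == position)
        path.append(node)
    return path

print(decode("112"))  # ['Non-Recurring', 'SN-like', 'Ib/c']
```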

In [3]:
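### TODO Alex: Please reverse integer mask so no leading zeros
### map real to 1 and bogus to 0 - doesn't need a 0 for alert digit

# Digit alphabet: 0-9, then a-z, then A-Z (enough for bases up to 62).
digs = string.digits + string.ascii_letters

def int2base(x, base):
    """Convert integer x to a digit string in the given base."""
    if x < 0:
        sign = -1
    elif x == 0:
        return digs[0]
    else:
        sign = 1

    x *= sign
    digits = []

    # Collect digits least-significant first; integer division avoids float rounding.
    while x:
        digits.append(digs[x % base])
        x //= base

    if sign < 0:
        digits.append('-')

    # Without this reversal, multi-digit results come out least-significant first.
    # branch() only ever passes 0 <= x < base, so every result there is one digit.
    # digits.reverse()

    return ''.join(digits)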

## Building a phylogenetic tree

Given the hierarchical class relationships, make a tree diagram (and record some hopefully useful information).

In [4]:
def branch(tree, parent, children, prepend=("Other",), append=None, directory=directory, index=index):
    """Attach `children` under `parent`; each child's code is the parent's code plus one digit."""
    directory[parent] = {}
    # Optional catch-all classes (e.g. "Static/Other") go before or after the named children.
    if prepend is not None:
        proc_pre = [parent + "/" + pre for pre in prepend]
        children = proc_pre + children
    if append is not None:
        proc_app = [parent + "/" + app for app in append]
        children = children + proc_app
    # The number of children sets the base of this level's digit.
    bigbase = len(children)
    for i, child in enumerate(children):
        directory[parent][child] = i
        index[child] = index[parent] + int2base(i, bigbase)
        tree.create_node(index[child] + " " + child, child, parent=parent)
    return bigbase, directory, index

It would be better to start with something like `directory` than to build it as we go along, but, hey, this is a hack.
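Starting from a declarative structure could look like the following minimal sketch. It assumes the taxonomy is written once as a nested dict (the names `TAXONOMY` and `build_index` are hypothetical) and reproduces the code-assignment rule of `branch` in plain Python, with a `<parent>/Other` slot at position 0 for every non-leaf class; drawing the `treelib` tree is left out.

```python
import string

DIGS = string.digits + string.ascii_letters

# Hypothetical declarative version of the taxonomy built with branch():
# each key is a class, each value is a dict of its subclasses (empty for leaves).
TAXONOMY = {
    "Non-Recurring": {
        "SN-like": {"Ia": {}, "Ib/c": {}, "II": {}, "Iax": {}, "91bg": {}},
        "Fast": {"KN": {}, "M-dwarf Flare": {}, "Dwarf Novae": {}, "uLens": {}},
        "Long": {"SLSN": {}, "TDE": {}, "ILOT": {}, "CART": {}, "PISN": {}},
    },
    "Recurring": {
        "Periodic": {"Cepheid": {}, "RR Lyrae": {}, "Delta Scuti": {}, "EB": {}, "LPV/Mira": {}},
        "Non-Periodic": {"AGN": {}},
    },
}

def build_index(name, children, code="", index=None):
    """Recursively assign a code to `name` and every class below it."""
    if index is None:
        index = {}
    index[name] = code
    if not children:
        return index
    # Same ordering as branch(): the "Other" slot first, then children in declared order.
    slots = [name + "/Other"] + list(children)
    for position, child in enumerate(slots):
        build_index(child, children.get(child, {}), code + DIGS[position], index)
    return index

codes = build_index("Static", TAXONOMY)
print(codes["91bg"], codes["AGN"])  # 115 221
```

Comparing `codes` with the printed `index` is a quick check that the declarative version and the incremental `branch` calls agree.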

In [5]:
tree = Tree()

# index["Alert"] = ''#int2base(0, 1)
# tree.create_node(index["Alert"] + " " + "Alert", "Alert")

# branch(tree, "Alert", ["Bogus", "Real"])#, prepend=["Unclassified"])

# index["Real"] = ''#int2base(0, 1)
# tree.create_node(index["Real"] + " " + "Alert/Real", "Real")

# branch(tree, "Real", ["Static", "Moving"])#, prepend=['Unclassified'])

index["Static"] = ''#int2base(0, 1)
tree.create_node(index["Static"] + " " + "Alert/Real/Static", "Static")

branch(tree, "Static", ["Non-Recurring", "Recurring"])

branch(tree, "Recurring", ["Periodic", "Non-Periodic"])

branch(tree, "Periodic", ["Cepheid", "RR Lyrae", "Delta Scuti", "EB", "LPV/Mira"])

branch(tree, "Non-Periodic", ["AGN"])

branch(tree, "Non-Recurring", ["SN-like", "Fast", "Long"])

branch(tree, "SN-like", ["Ia", "Ib/c", "II", "Iax", "91bg"])

branch(tree, "Fast", ["KN", "M-dwarf Flare", "Dwarf Novae", "uLens"])

branch(tree, "Long", ["SLSN", "TDE", "ILOT", "CART", "PISN"])

tree.show()

 Alert/Real/Static
├── 0 Static/Other
├── 1 Non-Recurring
│   ├── 10 Non-Recurring/Other
│   ├── 11 SN-like
│   │   ├── 110 SN-like/Other
│   │   ├── 111 Ia
│   │   ├── 112 Ib/c
│   │   ├── 113 II
│   │   ├── 114 Iax
│   │   └── 115 91bg
│   ├── 12 Fast
│   │   ├── 120 Fast/Other
│   │   ├── 121 KN
│   │   ├── 122 M-dwarf Flare
│   │   ├── 123 Dwarf Novae
│   │   └── 124 uLens
│   └── 13 Long
│       ├── 130 Long/Other
│       ├── 131 SLSN
│       ├── 132 TDE
│       ├── 133 ILOT
│       ├── 134 CART
│       └── 135 PISN
└── 2 Recurring
    ├── 20 Recurring/Other
    ├── 21 Periodic
    │   ├── 210 Periodic/Other
    │   ├── 211 Cepheid
    │   ├── 212 RR Lyrae
    │   ├── 213 Delta Scuti
    │   ├── 214 EB
    │   └── 215 LPV/Mira
    └── 22 Non-Periodic
        ├── 220 Non-Periodic/Other
        └── 221 AGN



Yeah, not sure these are really useful. . .

In [6]:
print(directory)

{'Static': {'Static/Other': 0, 'Non-Recurring': 1, 'Recurring': 2}, 'Recurring': {'Recurring/Other': 0, 'Periodic': 1, 'Non-Periodic': 2}, 'Periodic': {'Periodic/Other': 0, 'Cepheid': 1, 'RR Lyrae': 2, 'Delta Scuti': 3, 'EB': 4, 'LPV/Mira': 5}, 'Non-Periodic': {'Non-Periodic/Other': 0, 'AGN': 1}, 'Non-Recurring': {'Non-Recurring/Other': 0, 'SN-like': 1, 'Fast': 2, 'Long': 3}, 'SN-like': {'SN-like/Other': 0, 'Ia': 1, 'Ib/c': 2, 'II': 3, 'Iax': 4, '91bg': 5}, 'Fast': {'Fast/Other': 0, 'KN': 1, 'M-dwarf Flare': 2, 'Dwarf Novae': 3, 'uLens': 4}, 'Long': {'Long/Other': 0, 'SLSN': 1, 'TDE': 2, 'ILOT': 3, 'CART': 4, 'PISN': 5}}


In [7]:
print(index)

{'Static': '', 'Static/Other': '0', 'Non-Recurring': '1', 'Recurring': '2', 'Recurring/Other': '20', 'Periodic': '21', 'Non-Periodic': '22', 'Periodic/Other': '210', 'Cepheid': '211', 'RR Lyrae': '212', 'Delta Scuti': '213', 'EB': '214', 'LPV/Mira': '215', 'Non-Periodic/Other': '220', 'AGN': '221', 'Non-Recurring/Other': '10', 'SN-like': '11', 'Fast': '12', 'Long': '13', 'SN-like/Other': '110', 'Ia': '111', 'Ib/c': '112', 'II': '113', 'Iax': '114', '91bg': '115', 'Fast/Other': '120', 'KN': '121', 'M-dwarf Flare': '122', 'Dwarf Novae': '123', 'uLens': '124', 'Long/Other': '130', 'SLSN': '131', 'TDE': '132', 'ILOT': '133', 'CART': '134', 'PISN': '135'}


## Building a structure for hierarchical classification

The whole point of this, for me, is for the classification to have corresponding posterior probabilities, or at least confidence flags or scores, because I'd want to use them to rapidly select follow-up candidates.
[This](https://community.lsst.org/t/projects-involving-irregularly-shaped-data/4466) looks potentially relevant.
I guess it could also be used for packaging up additional features into an alert without bloating it up too much.
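One way to attach posterior probabilities to the hierarchy, assuming a classifier reports a conditional probability for each class given its parent (an assumption; nothing above fixes the classifier's output format), is to multiply along the code's path: the probability of a class is the product of the conditional probabilities at each of its prefixes. Follow-up selection is then a threshold on that product, and a "best" code can be read off by descending the tree. The names `conditional`, `marginal` and `best_code` are hypothetical and the numbers are made up for illustration. Greedy descent is just one simple choice and need not return the leaf with the highest marginal probability.

```python
from math import prod

# Hypothetical output for one alert: code -> P(class | parent class).
# Probabilities of siblings under the same parent sum to 1.
conditional = {
    "0": 0.05, "1": 0.80, "2": 0.15,                  # children of Static
    "10": 0.10, "11": 0.70, "12": 0.15, "13": 0.05,   # children of Non-Recurring
    "110": 0.05, "111": 0.60, "112": 0.15,            # children of SN-like
    "113": 0.10, "114": 0.05, "115": 0.05,
}

def marginal(code, conditional):
    """P(code) as the product of P(prefix | parent prefix) over every level."""
    return prod(conditional[code[:k]] for k in range(1, len(code) + 1))

def best_code(conditional):
    """Greedy descent: at each level, move to the most probable child."""
    code = ""
    while True:
        children = [c for c in conditional
                    if len(c) == len(code) + 1 and c.startswith(code)]
        if not children:
            return code
        code = max(children, key=lambda c: conditional[c])

p_ia = marginal("111", conditional)   # 0.80 * 0.70 * 0.60
print(round(p_ia, 3))                 # 0.336
print(best_code(conditional))         # 111
follow_up = p_ia > 0.3                # hypothetical selection threshold
```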