# Batch Aggregation for SLSN

In [1]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import pandas as pd

import SLSN_batch_aggregation

WORKFLOW_ID_SLSN = 13193





Download the raw classifications and subjects from Dropbox if you don't have them already:

In [2]:
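# Quote each URL so the shell does not treat `&` as the background operator,
# and use -O so the saved filename does not include the query string.
!wget -O superluminous-supernovae-classifications.csv "https://www.dropbox.com/scl/fi/t9kcfsrcix3mpj0u3lyqs/superluminous-supernovae-classifications.csv?rlkey=n0hgnnhxeapa0vfhie30hti4s&dl=0"
!wget -O superluminous-supernovae-subjects.csv "https://www.dropbox.com/scl/fi/l9c85k2qirvo7yvkw0uxr/superluminous-supernovae-subjects.csv?rlkey=8c1ii7d9rig287zvxe7m9wrca&dl=0"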
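
The "if you don't have them already" condition can also be checked in Python rather than by hand. The sketch below is one possible way to do that. `download_if_missing` is a hypothetical helper (it is not part of `SLSN_batch_aggregation`), and it assumes the Dropbox links serve the raw CSV to a non-browser client.

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, filename, fetch=urlretrieve):
    """Download `url` to `filename` unless the file already exists.

    Hypothetical helper for illustration only. Returns True if a
    download was attempted and False if the existing file was kept.
    `fetch` is any callable taking (url, filename); it defaults to
    urllib's urlretrieve.
    """
    if os.path.exists(filename):
        return False
    fetch(url, filename)
    return True
```

Calling `download_if_missing(url, 'superluminous-supernovae-classifications.csv')` with the Dropbox URL from the `wget` cell would then skip the download whenever the file is already present.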

Load the subjects and keep only those belonging to the SLSN workflow:

In [3]:
subjects = pd.read_csv('superluminous-supernovae-subjects.csv')
subjects = subjects[subjects.workflow_id==WORKFLOW_ID_SLSN]

Using the `batch_aggregation` function, create a DataFrame of our aggregated data:

In [4]:
data = SLSN_batch_aggregation.batch_aggregation(generate_new_classifications=False)

Loading classifications
Extracting annotations
Aggregating data
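
The internals of `batch_aggregation` are not shown in this notebook. As a rough illustration of what a majority-vote aggregation step could look like, here is a minimal sketch on toy data. The output column names `most_likely`, `num_votes`, and `agreement` are the ones used later in this notebook; the `answer` column, the toy values, and the `aggregate_votes` function are assumptions made for the example, and the real implementation may differ.

```python
import pandas as pd

# Toy stand-in for extracted annotations: one row per volunteer answer.
extracted = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 2, 2, 2],
    'task':       ['T0'] * 7,
    'answer':     ['yes', 'yes', 'yes', 'no', 'no', 'no', 'yes'],
})


def aggregate_votes(df):
    """Majority-vote aggregation per (subject_id, task). Illustrative only."""
    rows = []
    for (subject_id, task), group in df.groupby(['subject_id', 'task']):
        counts = group['answer'].value_counts()
        rows.append({
            'subject_id': subject_id,
            'task': task,
            'most_likely': counts.idxmax(),            # most common answer
            'num_votes': int(counts.sum()),            # total answers received
            'agreement': counts.max() / counts.sum(),  # fraction backing the winner
        })
    return pd.DataFrame(rows)


toy_aggregated = aggregate_votes(extracted)
```

Under this definition, `agreement` is the fraction of votes that went to the winning answer, so a value of 1.0 means every volunteer gave the same answer.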


In [5]:
classification_data = pd.read_csv('superluminous-supernovae-classifications.csv')

In [6]:
data = data.merge(subjects, on='subject_id')
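
`merge` defaults to an inner join, so aggregated rows with no matching subject are silently dropped, and repeated `subject_id` rows in `subjects` would duplicate aggregated rows. A hedged sketch of how one might check for both, using toy frames (`toy_data` and `toy_subjects` are made up for illustration):

```python
import pandas as pd

toy_data = pd.DataFrame({
    'subject_id': [1, 1, 2, 2, 3],
    'task': ['T0', 'T1', 'T0', 'T1', 'T0'],
})
toy_subjects = pd.DataFrame({
    'subject_id': [1, 2],
    'retirement_reason': ['classification_count', 'classification_count'],
})

# how='left' keeps every aggregated row, indicator=True adds a `_merge`
# column recording where each row came from, and validate='many_to_one'
# raises a MergeError if subject_id is repeated in toy_subjects.
checked = toy_data.merge(
    toy_subjects, on='subject_id', how='left',
    indicator=True, validate='many_to_one',
)

# Aggregated rows that found no matching subject.
unmatched = checked[checked['_merge'] == 'left_only']
```

If `unmatched` is empty and no `MergeError` is raised, the plain inner join gives the same rows.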

The SLSN workflow has two questions (tasks `T0` and `T1`), so we split the aggregated data by task and sort each part by agreement:

In [7]:
q1_aggregated = data[data.task=='T0'].copy()
q1_aggregated.sort_values('agreement', ascending=False, inplace=True)

q2_aggregated = data[data.task=='T1'].copy()
q2_aggregated.sort_values('agreement', ascending=False, inplace=True)

Next, select the subjects whose most likely answer to the first question is "yes", i.e. the SLSN candidates:

In [8]:
candidates = q1_aggregated[(q1_aggregated.most_likely=='yes')].copy()
# To apply an agreement cutoff, extend the mask with `& (q1_aggregated.agreement>=0.7)`

candidates['index'] = candidates['subject_id']
candidates.set_index('index', inplace=True)
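
Combining the "yes" filter with an agreement cutoff needs `&` and parentheses around each condition, because `&` binds more tightly than `==` and `>=` in Python. A small sketch on toy data (the `toy_q1` values are invented, and 0.7 is only the illustrative threshold from the commented-out mask):

```python
import pandas as pd

toy_q1 = pd.DataFrame({
    'subject_id': [1, 2, 3, 4],
    'most_likely': ['yes', 'yes', 'no', 'yes'],
    'agreement': [1.0, 0.6, 0.9, 0.7],
})

AGREEMENT_CUTOFF = 0.7  # illustrative threshold, not a recommended value

# Each condition is parenthesised before being combined with `&`.
mask = (toy_q1.most_likely == 'yes') & (toy_q1.agreement >= AGREEMENT_CUTOFF)
toy_candidates = toy_q1[mask].copy()
```

Here only subjects 1 and 4 survive: subject 2 falls below the cutoff and subject 3 was voted "no".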

In [9]:
# Build an HTML link to each candidate's subject page
urls = [
    f'<a href=https://zooniverse.github.io/slsn-data/subjects/{subject_id}><div>{subject_id}</div></a>'
    for subject_id in candidates.subject_id
]

candidates['subject_page_url'] = urls

The next cell exports an HTML table that is more practical than pretty, but it includes links to the subject pages previously built by Zooniverse. The rows are currently sorted by descending agreement; the order can be changed to whatever is most helpful.

In [10]:
candidates.to_html('candidates.html', escape=False, index=False,
                    columns=['subject_id', 'subject_page_url', 'task', 'num_votes', 'agreement', 'classifications_count', 'retirement_reason'])

You can see an example of this on the GitHub page: [https://htmlpreview.github.io/?https://github.com/astrohayley/SLSN-Aggregation-Example/blob/main/candidates.html](https://htmlpreview.github.io/?https://github.com/astrohayley/SLSN-Aggregation-Example/blob/main/candidates.html)
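
The `escape=False` argument is what keeps the links clickable: by default `to_html` escapes `<`, `>`, and `&`, which would turn the anchor tags into literal text. A minimal sketch of the difference, using a made-up one-row frame and a placeholder `example.org` URL:

```python
import pandas as pd

toy = pd.DataFrame({
    'subject_page_url': ['<a href=https://example.org/1><div>1</div></a>'],
})

escaped = toy.to_html(index=False)                  # default: escape=True
unescaped = toy.to_html(index=False, escape=False)  # raw HTML is kept as-is
```

In `escaped` the cell contains `&lt;a href=...&gt;`, which a browser shows as plain text, while `unescaped` contains the original `<a ...>` tag. Since `escape=False` disables escaping for every column, it is only appropriate when the table contents are trusted.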

You can also interact with this table if you open it in your own Jupyter Notebook. The commented-out cell below enables the `itables` interactive mode; uncomment and run it if you are working in your own notebook.

In [11]:
# from itables import init_notebook_mode
# init_notebook_mode(all_interactive=True)

In [12]:
candidates[['subject_id', 'subject_page_url', 'task', 'num_votes', 'agreement', 'classifications_count', 'retirement_reason']].head()

| index | subject_id | subject_page_url | task | num_votes | agreement | classifications_count | retirement_reason |
|---|---|---|---|---|---|---|---|
| 91002190 | 91002190 | `<a href=https://zooniverse.github.io/slsn-data...` | T0 | 10 | 1.0 | 10 | classification_count |
| 84504566 | 84504566 | `<a href=https://zooniverse.github.io/slsn-data...` | T0 | 10 | 1.0 | 10 | classification_count |
| 91041605 | 91041605 | `<a href=https://zooniverse.github.io/slsn-data...` | T0 | 10 | 1.0 | 10 | classification_count |
| 91041523 | 91041523 | `<a href=https://zooniverse.github.io/slsn-data...` | T0 | 10 | 1.0 | 10 | classification_count |
| 91041525 | 91041525 | `<a href=https://zooniverse.github.io/slsn-data...` | T0 | 10 | 1.0 | 10 | classification_count |
