# Classify the entire Test Set and obtain results

In this notebook, we use the previously trained Gradient Boosting Decision Tree (see notebook [6_train_classifier](6_train_classifier.ipynb) for how to train it) to classify the test set and compute some performance metrics.

We also provide analysis plots, such as the recall and precision as a function of the light curve length.

#### Index<a name="index"></a>
1. [Import Packages](#imports)
2. [Test Set Features](#testFeatures)
    1. [Compute Features](#computeTestFeatures)
    2. [Load Features](#loadTestFeatures)
3. [Load Classifier](#loadClassifier)
4. [Classify Test Set](#testClassify)
5. [Performance](#performance)
    1. [Metrics](#metrics)
    2. [Confusion Matrix](#cm)
    3. [ROC Curves](#roc)
6. [Recall and Precision](#precisionRecall)
    0. [Load test sets](#loadTest)
    1. [Light curve length](#lcLength)
        1. [Calculate results](#lcLengthCalc)
        2. [Setup](#lcLengthSetup)
        3. [Recall](#lcLengthRecall) / [Precision](#lcLengthPrecision) / [Density](#lcLengthDensity)
    2. [Median inter-night gap](#medianGap)
        1. [Calculate results](#medianGapCalc)
        2. [Setup](#medianGapSetup)
        3. [Recall](#medianGapRecall) / [Precision](#medianGapPrecision) / [Density](#medianGapDensity)
    3. [Longest gap](#maxGap)
        1. [Calculate results](#maxGapCalc)
        2. [Setup](#maxGapSetup)
        3. [Recall](#maxGapRecall) / [Precision](#maxGapPrecision) / [Density](#maxGapDensity)
    4. [Number of gaps $\geq$ 10 days](#numberLargeGap)
        1. [Setup](#numberLargeGapSetup)
        2. [Recall](#numberLargeGapRecall) / [Precision](#numberLargeGapPrecision) / [Density](#numberLargeGapDensity)
    5. [Number of observations in [-10, 30] days (observer frame)](#nearPeak)
        1. [Calculate results](#nearPeakCalc)
        2. [Setup](#nearPeakSetup)
        3. [Recall](#nearPeakRecall) / [Precision](#nearPeakPrecision) / [Density](#nearPeakDensity)

## 1. Import Packages<a name="imports"></a>

In [None]:
!pip install ../snmachine/

In [None]:
import os
import pickle
import sys
import time

In [None]:
import lightgbm as lgb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats
import seaborn as sns

In [None]:
from snmachine import snclassifier, gps, snfeatures, analysis
from utils.plasticc_pipeline import create_folder_structure, get_directories, load_dataset
from utils.plasticc_utils import plot_confusion_matrix, plot_roc_curve

#### Aesthetic settings

In [None]:
%matplotlib inline
%config Completer.use_jedi = False  # work around a Jedi issue that can break tab-autocompletion

size_default = 1.5
size_larger = 1.9
sns.set(font_scale=size_default, style="ticks", context="paper")

## 2. Test Set Classification Features<a name="testFeatures"></a>

Before classifying the test set events, we need to obtain their features. 

In [None]:
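has_features = 1  # 1: the test set features were already computed and saved
save_features = 0  # 1: save newly computed features
save_updated_metadata = 0  # 1: overwrite the saved extended metadata with new results
is_only_roll = 1  # 1: use the `roll` versions of the file and folder names
is_updated = 1  # 1: use the `_updated` versions of the files

is_show_good = 1  # 1: only show bins with more than `threshold` objects
# 300 objects = 0.01% of the 3-years baseline test set
threshold = 300 if is_show_good else None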

### 2.1. Compute Features<a name="computeTestFeatures"></a>

If the test set features were never calculated, compute them now. 

In [None]:
os_name = 'baseline_v2_0_paper'
# os_name = 'noroll_v2_0_paper'
# os_name = 'presto_v2_0_paper'

# dataset_name = '3-years baseline'
dataset_name = '1.5-years baseline'

folder_path = '/folder/to/path/'  # write the path to the folder with the test set files (keep the trailing slash)

In [None]:
folder_aug_name = 'aug_wfd_46k'
if is_only_roll:
    folder_aug_name = folder_aug_name[:-3] + 'roll_46k'
if is_updated:
    folder_aug_name = folder_aug_name + '_updated'

Then, **write** which `batch_ids` to compute the features for, the path to the basis onto which the features are projected, and the number of components to keep.

Computing the features takes ~1h45min per `batch_id`. Recorded run times:

* baseline: batches 001 to 012: 13h34min; 10 batches: 18h09min
* no roll: batches 001 to 012: 18h16min


In [None]:
if has_features == 0:
    extra_name_to_save = 'wfd'
    batch_ids = ['000', '001', '002', '003', '004', '005', '006', 
                 '007', '008', '009', '010', '011', '012']
    
    # Path where the basis to project onto was saved;
    # this path changes if we want a different training set
    folder_aug_name = 'aug_wfd_46k'
#     folder_aug_name = 'aug_wfd_46k_v2'
    if is_only_roll:
        folder_aug_name = folder_aug_name[:-3] + 'roll_46k'
    if is_updated:
        folder_aug_name = folder_aug_name + '_updated'
    
    path_saved_eigendecomp = folder_path+f'../../analyses/{folder_aug_name}/wavelet_features'
    print(path_saved_eigendecomp)

    # Number of reduced wavelet components to keep
    number_comps = 40
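
The basis saved in `path_saved_eigendecomp` is used to reduce the wavelet features to `number_comps` components. Assuming that basis is a principal component analysis (PCA) of the training set, i.e. a mean vector plus eigenvectors, the projection can be sketched in plain NumPy as below. The helpers `fit_basis` and `project` are hypothetical, not `snmachine` functions, and the way `snmachine` stores and applies its eigendecomposition may differ.

```python
import numpy as np

def fit_basis(X_train):
    """Mean and principal directions (rows of the returned array) of the training set."""
    mean = X_train.mean(axis=0)
    # SVD of the centred data: the rows of vt are the principal directions,
    # ordered by decreasing singular value
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt

def project(X, mean, components, number_comps):
    """Coordinates of X along the first `number_comps` principal directions."""
    return (X - mean) @ components[:number_comps].T

# Toy data standing in for the wavelet coefficients
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
X_test = rng.normal(size=(50, 10))

mean, components = fit_basis(X_train)
X_test_reduced = project(X_test, mean, components, number_comps=4)
print(X_test_reduced.shape)  # (50, 4)
```

Keeping only the first components retains the directions of largest variance in the training set, which is why the same basis must be reused for the test set.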

### 2.2. Load Features<a name="loadTestFeatures"></a>

Load the previously saved test set features. This takes less than 1 min for the entire test set.

In [None]:
if has_features == 1:
    time_ini = time.time()
    extra_name_to_save = 'wfd'

    batch_ids = ['000', '001', '002', '003', '004', '005', '006', 
                 '007', '008', '009', '010', '011', '012']

    # Collect the aggregated data
    test_data_ids = []
    X_test_ids = []  # features
    y_test_ids = []  # classes
    metadata_test_ids = []

    for batch_id in batch_ids:
        print(f'Batch {batch_id}')

        # Name and path of the test subset
        data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
        if is_only_roll:
            data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
        if is_updated:
            data_file_name = data_file_name[:-5] + '_updated.pckl'
        data_path = os.path.join(folder_path, data_file_name)

        # Path to the test subset features
        analysis_name = data_file_name[:-5]
        folder_analysis_path = folder_path[:-14] + 'analyses'
        directories = get_directories(folder_analysis_path, analysis_name) 
        path_saved_reduced_wavelets = directories['features_directory']

        # Load the features and extended metadata
        with open(os.path.join(path_saved_reduced_wavelets, 'features.pckl'), 'rb') as handle:
            features = pickle.load(handle)
        with open(os.path.join(path_saved_reduced_wavelets, 'extended_metadata.pckl'), 'rb') as handle:
            extended_metadata = pickle.load(handle)

        # Aggregate the data
        print(np.shape(features))
        print(np.shape(extended_metadata.target.astype(int)))
        print('')
        X_test_ids.append(features)
        y_test_ids.append(extended_metadata.target.astype(int))
        metadata_test_ids.append(extended_metadata)
    print(time.time()-time_ini)

## 3. Load Classifier<a name="loadClassifier"></a>

First, **write** in `path_saved_classifier` the path to the folder that contains the trained classifier instance. Additionally, **write** in `classifier_name` the name under which the classifier was saved.

In [None]:
do_classification = 0  # 1: classify the test set; 0: reuse the predictions saved in the extended metadata

In [None]:
# Path where the classifier was saved;
# this path changes if we want a different classifier
path_saved_classifier = folder_path+f'../../analyses/{folder_aug_name}/classifications'
path_saved_plots = folder_path+f'../../analyses/{folder_aug_name}/plots'
print(path_saved_classifier)
print(path_saved_plots)
    
classifier_name = 'full_opt.pck'

Load classifier instance.

In [None]:
with open(os.path.join(path_saved_classifier, classifier_name), 'rb') as handle:
    classifier_instance = pickle.load(handle)

Obtain the classifier.

In [None]:
classifier = classifier_instance.classifier

[Go back to top.](#index)

## 4. Classify Test Set<a name="testClassify"></a>

Compute the predicted class (`y_pred`) and the probability of belonging to each class (`y_probs`). Note that the predicted class is the one with the highest probability.
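
As a toy illustration of that rule with made-up probabilities: the predicted label is the class whose column has the highest probability. The labels below assume the classifier stores its classes sorted as 42, 62, 90 (SN II, SN Ibc and SN Ia in PLAsTiCC), which would make column 2 the SN Ia column; check the actual order on your classifier.

```python
import numpy as np

# Assumed class order of the classifier (one column of y_probs per class)
classes = np.array(['42', '62', '90'])

# Toy class probabilities for three events (each row sums to 1)
y_probs = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6],
                    [0.2, 0.5, 0.3]])

# The predicted class is the column with the highest probability
y_pred = classes[np.argmax(y_probs, axis=1)]
print(y_pred)  # ['42' '90' '62']
```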

In [None]:
if do_classification:
    time_ini = time.time()

    y_pred_test_ids = []
    y_probs_test_ids = []
    for i in np.arange(len(batch_ids)):
        batch_id = batch_ids[i]
        print(f'Batch {batch_id}')

        # Compute the classification results
        y_pred_test = classifier.predict(X_test_ids[i])
        y_probs_test = classifier.predict_proba(X_test_ids[i])

        # Save results to a list
        y_pred_test_ids.append(pd.DataFrame(y_pred_test))
        y_probs_test_ids.append(pd.DataFrame(y_probs_test))

        # Update metadata
        metadata = metadata_test_ids[i]
        metadata['y_pred'] = y_pred_test
        metadata['y_probs_0'] = y_probs_test[:, 0]
        metadata['y_probs_1'] = y_probs_test[:, 1]
        metadata['y_probs_2'] = y_probs_test[:, 2]

        if save_updated_metadata:
            print('Save metadata')
            # Path to the saved metadata
            data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
            if is_only_roll:
                print('only roll')
                data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
            if is_updated:
                data_file_name = data_file_name[:-5] + '_updated.pckl'
            analysis_name = data_file_name[:-5]
            folder_analysis_path = folder_path[:-14] + 'analyses'
            directories = get_directories(folder_analysis_path, analysis_name) 
            path_saved_metadata = directories['features_directory']

            metadata.to_pickle(os.path.join(path_saved_metadata, 'extended_metadata.pckl'))

    print(time.time()-time_ini)

In [None]:
# Also concatenate the true labels and the extended metadata
y_test_all = pd.concat(y_test_ids)
metadata_test_all = pd.concat(metadata_test_ids)

In [None]:
y_pred_test_all = metadata_test_all['y_pred']
y_probs_test_all = np.array(metadata_test_all[['y_probs_0', 'y_probs_1', 'y_probs_2']])
X_test_all = pd.concat(X_test_ids)

[Go back to top.](#index)

## 5. Overall Performance<a name="performance"></a>

If we know the true class label of each event, we can calculate the performance of the classifier. Otherwise, the predictions are stored in `y_pred_test_all` and `y_probs_test_all`.

In this example we know the true class labels.

### 5.1. Metrics<a name="metrics"></a>

We start by computing the area under the ROC curve (AUC) and the PLAsTiCC log-loss. For the AUC, choose which class to consider as *positive* (all other classes are considered *negative*). Then, **write** in `which_column` the column of the probability array that corresponds to that class. Note that the column order is the class order stored in the classifier.
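
For reference, both metrics can be sketched in plain NumPy. The one-vs-rest AUC is the probability that a random positive event gets a higher score than a random negative one. The PLAsTiCC log-loss averages the log-loss within each class and then over classes with class weights; we have not checked which weights `snclassifier.logloss_score` uses, so the equal weights here are an assumption. Both helper names are hypothetical.

```python
import numpy as np

def one_vs_rest_auc(y_true, y_probs, classes, which_column):
    """AUC with `classes[which_column]` as the positive class.

    Fraction of (positive, negative) pairs where the positive event scores
    higher (ties count 1/2). Pairwise, so only suited to small examples.
    """
    is_positive = np.asarray(y_true) == classes[which_column]
    scores = y_probs[:, which_column]
    diff = scores[is_positive][:, None] - scores[~is_positive][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def class_balanced_logloss(y_true, y_probs, classes, weights=None, eps=1e-15):
    """Log-loss averaged within each class, then over classes with `weights`
    (equal weights by default). Assumes every class appears in `y_true`."""
    y_true = np.asarray(y_true)
    probs = np.clip(y_probs, eps, 1 - eps)  # avoid log(0)
    if weights is None:
        weights = np.ones(len(classes))
    per_class = np.array([-np.mean(np.log(probs[y_true == c, j]))
                          for j, c in enumerate(classes)])
    return np.sum(weights * per_class) / np.sum(weights)

classes = np.array(['42', '62', '90'])
y_true = np.array(['42', '62', '90', '90'])
y_probs = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.2, 0.7],
                    [0.3, 0.3, 0.4]])
print(one_vs_rest_auc(y_true, y_probs, classes, which_column=2))  # 1.0
print(class_balanced_logloss(y_true, y_probs, classes))
```

Averaging within each class first keeps rare classes from being swamped by the most common one, which matters for an imbalanced test set.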

In [None]:
which_column = 2 # we are interested in SN Ia vs others

In [None]:
classifier.which_column = which_column
auc_test = snclassifier.auc_score(classifier=classifier, X_features=X_test_all, 
                                  y_true=y_test_all, which_column=which_column)
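# `logloss_score` appears to return the negative log-loss (higher is better),
# which is why its sign is flipped in the bootstrap and confusion-matrix cells
logloss_test = snclassifier.logloss_score(classifier=classifier, X_features=X_test_all, 
                                          y_true=y_test_all)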
print('{:^10} {:^10} {:^10}'.format('', 'AUC', 'Logloss'))
print('{:^10} {:^10.3f} {:^10.3f}'.format('test', auc_test, logloss_test))

In [None]:
number_boot = 300  # number of bootstrap resamples
rng = np.random.default_rng()  # pass a seed for reproducible intervals
ini_time = time.time()
logloss_s = np.zeros(number_boot)
number_objs = y_test_all.shape[0]
for k in range(number_boot):
    # Resample the test set with replacement and recompute the log-loss
    indexes = rng.choice(number_objs, size=number_objs, replace=True)
    logloss_s[k] = - snclassifier.logloss_score(classifier=classifier, 
                                                X_features=X_test_all.iloc[indexes], 
                                                y_true=y_test_all.iloc[indexes])
print(time.time() - ini_time)

In [None]:
# 95% percentile bootstrap confidence interval of the log-loss
percentile_025, percentile_975 = np.percentile(logloss_s, [2.5, 97.5])
boot_data_ci_s = [percentile_025, percentile_975]

In [None]:
boot_data_ci_s  # from `number_boot` = 300 resamples
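
The interval above is a percentile bootstrap: resample the test events with replacement, recompute the metric each time, and take the 2.5th and 97.5th percentiles of the resampled values. A generic sketch of that procedure, applied to a toy mean rather than to the log-loss (`bootstrap_percentile_ci` is a hypothetical helper):

```python
import numpy as np

def bootstrap_percentile_ci(values, statistic=np.mean, number_boot=300,
                            alpha=0.05, seed=None):
    """Percentile bootstrap confidence interval of `statistic(values)`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = len(values)
    boot_stats = np.empty(number_boot)
    for k in range(number_boot):
        # Resample the events with replacement and recompute the statistic
        indexes = rng.choice(n, size=n, replace=True)
        boot_stats[k] = statistic(values[indexes])
    # Central (1 - alpha) interval: the 2.5th and 97.5th percentiles for alpha = 0.05
    return np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Toy per-event losses
per_event_loss = np.random.default_rng(0).exponential(scale=0.5, size=1000)
ci_low, ci_high = bootstrap_percentile_ci(per_event_loss, seed=1)
print(ci_low, ci_high)
```

A class-balanced log-loss is not a plain mean over events, so for that metric the whole score has to be recomputed on each resample, as the cell above does.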

Check how many events were correctly classified (number and fraction).

In [None]:
is_pred_right = y_pred_test_all == y_test_all.astype(str)
np.sum(is_pred_right), np.sum(is_pred_right)/len(is_pred_right)

[Go back to top.](#index)

### 5.2. Confusion Matrix<a name="cm"></a>

Now, plot the confusion matrix.

In [None]:
sns.set(font_scale=size_default, style="ticks")

In [None]:
title = f'Recall\nLog-loss = {-logloss_test:.3f}'
cm = analysis.plot_confusion_matrix(y_test_all.astype(str), y_pred_test_all.astype(str), 
                                    normalise='accuracy', title=title, figsize=(5,5),
                                    dict_label_to_real=analysis.dict_label_to_real_plasticc)
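
The plot is titled *Recall*, which suggests that `normalise='accuracy'` divides each row by the number of true events of that class, so the diagonal holds the per-class recall; we have not checked `analysis.plot_confusion_matrix` itself. A NumPy sketch of such a row-normalised matrix (`recall_confusion_matrix` is a hypothetical helper, with toy labels):

```python
import numpy as np

def recall_confusion_matrix(y_true, y_pred, labels):
    """Confusion matrix with rows normalised by the number of true events per
    class: entry [i, j] is the fraction of class labels[i] predicted as
    labels[j]. Assumes every label occurs in y_true."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)))
    for true, pred in zip(y_true, y_pred):
        cm[index[true], index[pred]] += 1
    return cm / cm.sum(axis=1, keepdims=True)

labels = ['42', '62', '90']
y_true = ['42', '42', '62', '90', '90', '90']
y_pred = ['42', '90', '62', '90', '90', '42']
cm = recall_confusion_matrix(y_true, y_pred, labels)
print(np.diag(cm))  # per-class recall
```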

### 5.3. ROC Curves<a name="roc"></a>

In [None]:
analysis.plot_classifier_roc_curve(y_test_all, y_probs_test_all,
                                   dict_label_to_real=analysis.dict_label_to_real_plasticc)
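
`analysis.plot_classifier_roc_curve` presumably draws one one-vs-rest ROC curve per class from the columns of `y_probs_test_all`; we have not checked its internals. Each such curve can be sketched by lowering a threshold on one probability column and recording the false- and true-positive rates (`one_vs_rest_roc` is a hypothetical helper, with toy scores):

```python
import numpy as np

def one_vs_rest_roc(is_positive, scores):
    """Points (FPR, TPR) of a one-vs-rest ROC curve, obtained by lowering the
    decision threshold through every distinct score, from highest to lowest."""
    is_positive = np.asarray(is_positive, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos, n_neg = is_positive.sum(), (~is_positive).sum()
    fpr, tpr = [0.0], [0.0]
    for cut in np.unique(scores)[::-1]:
        is_selected = scores >= cut  # events called "positive" at this cut
        tpr.append(np.sum(is_selected & is_positive) / n_pos)
        fpr.append(np.sum(is_selected & ~is_positive) / n_neg)
    return np.array(fpr), np.array(tpr)

# Toy scores, e.g. one column of y_probs for SN Ia vs the other classes
fpr, tpr = one_vs_rest_roc([True, True, False, False], [0.9, 0.4, 0.6, 0.1])
# Area under the curve with the trapezoidal rule
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(auc)  # 0.75
```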

In [None]:
sns.set(font_scale=size_default, style="ticks", context="paper")

[Go back to top.](#index)

## 6. Precision and Recall<a name="precisionRecall"></a>

In this section, we plot the precision and recall as a function of several light curve properties.
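
Per bin, the recall is the fraction of the events of a given true class that were classified correctly, and the precision is the fraction of the events predicted as that class that truly belong to it. Both are the same binned fraction and differ only in which mask selects the events, mirroring how `compute_recall_values` takes `is_true_type_list` while `compute_precision_values` takes `is_pred_type_list`. The sketch below ignores the bootstrapped intervals those functions also return; `binned_fraction_right` is a hypothetical helper and the data are toy values.

```python
import numpy as np

def binned_fraction_right(quantity, bins, is_selected, is_pred_right):
    """Per bin of `quantity`: fraction of the selected events classified
    correctly (NaN for empty bins) and number of selected events.

    is_selected = true-class mask -> recall; predicted-class mask -> precision.
    """
    quantity = np.asarray(quantity, dtype=float)
    is_selected = np.asarray(is_selected, dtype=bool)
    is_pred_right = np.asarray(is_pred_right, dtype=bool)
    number_in_bin, _ = np.histogram(quantity[is_selected], bins=bins)
    number_right, _ = np.histogram(quantity[is_selected & is_pred_right], bins=bins)
    with np.errstate(invalid='ignore', divide='ignore'):
        fraction = number_right / number_in_bin
    return fraction, number_in_bin

# Toy data: six events split into two bins of the quantity
quantity = [10, 20, 30, 60, 70, 80]
bins = np.array([0, 50, 100])
is_true_snia = [True, True, False, True, True, False]
is_pred_snia = [True, False, False, True, True, True]
is_pred_right = [True, False, True, True, True, False]

recall, _ = binned_fraction_right(quantity, bins, is_true_snia, is_pred_right)
precision, _ = binned_fraction_right(quantity, bins, is_pred_snia, is_pred_right)
mid_bins = (bins[:-1] + bins[1:]) / 2  # x positions when use_mid_bins = True
print(mid_bins, recall, precision)
```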

In [None]:
calculate_results = 0  # 1: load the test sets and (re)compute the quantities below

### 6.0. Load test sets<a name="loadTest"></a>

In [None]:
if calculate_results:  # calculating the results involves loading the test sets
    batch_ids = ['003'] 
#     batch_ids = ['001', '002', '003'] 
#     batch_ids = ['004', '005', '006', '007', '008'] 
    
    time_ini = time.time() # 1h45 per batch_id

    # Keep some data during this run for possible debug
    test_data_ids = []

    for batch_id in batch_ids:
        print(f'Batch {batch_id}')

        # Name and path of the test subset
        data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
        if is_only_roll:
            print('only roll')
            data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
        if is_updated:
            data_file_name = data_file_name[:-5] + '_updated.pckl'
        data_path = os.path.join(folder_path, data_file_name)
        print(0, data_path)

        # Load the test subset
        dataset = load_dataset(data_path)
        test_data_ids.append(dataset)
        print('')
    print(time.time()-time_ini)

In [None]:
is_true_snia = (y_test_all == 'SN Ia') | (y_test_all == '90') | (y_test_all == 90)
is_true_snii = (y_test_all == 'SN II') | (y_test_all == '42') | (y_test_all == 42)
is_true_snibc = (y_test_all == 'SN Ibc') | (y_test_all == '62') | (y_test_all == 62)

is_pred_snia = (y_pred_test_all == 'SN Ia') | (y_pred_test_all == '90') | (y_pred_test_all == 90)
is_pred_snii = (y_pred_test_all == 'SN II') | (y_pred_test_all == '42') | (y_pred_test_all == 42)
is_pred_snibc = (y_pred_test_all == 'SN Ibc') | (y_pred_test_all == '62') | (y_pred_test_all == 62)

# Use the same class order for the two lists below
is_true_type_list = [is_true_snia, is_true_snibc, is_true_snii] 
is_pred_type_list = [is_pred_snia, is_pred_snibc, is_pred_snii]

**Write** in `sn_order` an ordered list of the names of the classes. This should correspond to the class order used in `is_true_type_list`. Additionally, you can provide the colours with which to plot the classes results.

In [None]:
sn_order = ['SN Ia', 'SN Ibc', 'SN II']
diverg_color = sns.color_palette("Set2", 3, desat=1)
sn_colors = [diverg_color[2], diverg_color[0], diverg_color[1]]

[Go back to top.](#index)

### 6.1. Light curve length<a name="lcLength"></a>

#### 6.1.0. Calculate results<a name="lcLengthCalc"></a>

For the entire test set, it takes ~2min.

In [None]:
if calculate_results: 
    time_ini = time.time()
#     batch_ids = ['000', '001', '002', '003', '004', '005', '006', 
#                  '007', '008', '009', '010', '011', '012']

    lc_length_ids = []
    for i in np.arange(len(batch_ids)):
        batch_id = batch_ids[i]
        print(f'Batch {batch_id}')

        # Compute LC length/duration
        lc_length_id = analysis.compute_lc_length(test_data_ids[i])

        # Save the results to the lists
        lc_length_ids.append(lc_length_id)
        metadata_test_ids[i]['lc_length'] = lc_length_id
        metadata = metadata_test_ids[i]

        if save_updated_metadata:
            # Path to the saved metadata
            data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
            if is_only_roll:
                print('only roll')
                data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
            if is_updated:
                data_file_name = data_file_name[:-5] + '_updated.pckl'
            analysis_name = data_file_name[:-5]
            folder_analysis_path = folder_path[:-14] + 'analyses'
            directories = get_directories(folder_analysis_path, analysis_name) 
            path_saved_metadata = directories['features_directory']

            metadata.to_pickle(os.path.join(path_saved_metadata, 'extended_metadata.pckl'))

    lc_length_all = np.concatenate(lc_length_ids)
    metadata_test_all = pd.concat(metadata_test_ids)
    print(time.time() - time_ini)

#### 6.1.1. Setup<a name="lcLengthSetup"></a>

In [None]:
quantity_name = 'lc_length'
quantity = metadata_test_all[quantity_name]

**Choose** bins for the plot and whether or not to plot the values in the middle of the bins. Additionally, to consider only a subset of events, **mask** those events in `extra_subset`. If `extra_subset = True`, all the events are used.
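
Setting `extra_subset = True` works because the boolean masks are combined with `&`, and `mask & True` returns the mask unchanged; a boolean mask instead keeps only the events passing both selections. A toy illustration with made-up light curve lengths and a 75-day cut like the commented-out `is_good_lc` used for the median inter-night gap:

```python
import numpy as np

is_true_type = np.array([True, False, True, True])
lc_length = np.array([40, 120, 200, 90])  # toy light curve lengths (days)

# extra_subset = True: `mask & True` leaves the mask unchanged (all events used)
extra_subset = True
all_events = is_true_type & extra_subset

# extra_subset as a mask: keep only light curves longer than 75 days
extra_subset = lc_length > 75
long_only = is_true_type & extra_subset
print(all_events, long_only)
```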

In [None]:
bins = np.linspace(0, 300, 61)
use_mid_bins = True  # plot using the middle of the bins
extra_subset = True  # use all events

#### 6.1.2. Recall<a name="lcLengthRecall"></a>

Compute the recall and bootstrapped confidence intervals. Then, plot the figure.

In [None]:
recall_s, boot_recall_ci, number_in_bin_s = analysis.compute_recall_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_true_type_list=is_true_type_list)

In [None]:
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
print(show_bins[np.array(number_in_bin_s[:, 0]) < 10])
print(show_bins[np.array(number_in_bin_s[:, 1]) < 10])
print(show_bins[np.array(number_in_bin_s[:, 2]) < 10])

x_label = 'Light curve length (days)'
x_min, x_max = -.5, 218

x_vline1 = 50
x_vline2 = 175

analysis.plot_sne_has_something(
    something_s=recall_s, boot_has_something_ci=boot_recall_ci,
    bins=show_bins, sn_order=sn_order, 
    **{'colors': sn_colors, 'number_in_bin_s': number_in_bin_s, 
       'threshold': threshold, 'linestyle': ['-', '--', '-.']})

plt.ylabel('Recall')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.vlines(x=x_vline1, ymin=0, ymax=1., colors='gray', linewidth=3, ls='--')
plt.vlines(x=x_vline2, ymin=0, ymax=1., colors='gray', linewidth=3, ls='--')
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4, loc='lower center')
plt.title(dataset_name)

# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_recall_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.1.3. Precision<a name="lcLengthPrecision"></a>

Compute the precision and bootstrapped confidence intervals. Then, plot the figure.

In [None]:
precision_s, boot_precision_ci, number_in_bin_s = analysis.compute_precision_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_pred_type_list=is_pred_type_list)

In [None]:
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
print(show_bins[np.array(number_in_bin_s[:, 0]) < 10])
print(show_bins[np.array(number_in_bin_s[:, 1]) < 10])
print(show_bins[np.array(number_in_bin_s[:, 2]) < 10])

analysis.plot_sne_has_something(
    something_s=precision_s, boot_has_something_ci=boot_precision_ci,
    bins=show_bins, sn_order=sn_order, 
    **{'colors': sn_colors, 'number_in_bin_s': number_in_bin_s, 
       'threshold': threshold, 'linestyle': ['-', '--', '-.']})

plt.ylabel('Precision')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.vlines(x=x_vline1, ymin=0, ymax=1., colors='gray', linewidth=3, ls='--')
plt.vlines(x=x_vline2, ymin=0, ymax=1., colors='gray', linewidth=3, ls='--')
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4, loc='lower center')
plt.title(dataset_name)
# plt.savefig(os.path.join(path_saved_plots, 
#                          f'{quantity_name}_precision_{os_name[:-11]}.pdf'), 
#             bbox_inches='tight')

#### 6.1.4. Density<a name="lcLengthDensity"></a>

In [None]:
cumulative = 0  # 0: density; -1: reverse cumulative (1 - CDF); 1: CDF
show_sne = False  # True: one histogram per SN type

if cumulative != 0:
    bins_hist = 10**4
    plt.ylabel('1 - CDF' if cumulative < 0 else 'CDF')
else:
    bins_hist = bins
    plt.ylabel('Density')
if show_sne:
    for j in np.arange(len(is_true_type_list)):
        sn_type = sn_order[j]
        plt.hist(quantity[is_true_type_list[j] & extra_subset],
                 density=True, histtype='step', bins=bins_hist,
                 label=sn_type, color=sn_colors[j], 
                 cumulative=cumulative)
    plt.legend()
else:
    try:
        x_vals = quantity[extra_subset]
        print('Using a subset of the full test set.')
    except KeyError:
        x_vals = quantity
    plt.hist(x_vals,
             density=True, histtype='step', bins=bins_hist,
             cumulative=cumulative, linewidth=3)
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
#plt.title(title)
# plt.savefig(os.path.join(path_saved_plots, 
#                          f'{quantity_name}_density_{os_name[:-11]}.pdf'), 
#             bbox_inches='tight')
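
With `density=True`, a reverse cumulative histogram (`cumulative=-1` in `plt.hist`) shows, for each bin, the fraction of events in that bin or above, i.e. 1 - CDF. The same quantity can be computed directly with NumPy; `survival_fraction` is a hypothetical helper and the light curve lengths are toy values:

```python
import numpy as np

def survival_fraction(values, x_grid):
    """Fraction of events with value >= x for each x in `x_grid` (1 - CDF)."""
    values = np.sort(np.asarray(values, dtype=float))
    # searchsorted(..., side='left') counts the values strictly below x
    number_below = np.searchsorted(values, x_grid, side='left')
    return 1 - number_below / len(values)

lc_lengths = np.array([20, 60, 60, 120, 200])
fraction_at_least = survival_fraction(lc_lengths, [0, 60, 150, 250])
print(fraction_at_least)
```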

[Go back to top.](#index)

### 6.2. Median inter-night gap<a name="medianGap"></a>

Here we additionally require more than 2 observations per light curve.
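
The median inter-night gap is computed by `analysis.compute_median_internight_gap`, whose internals we do not reproduce here. As a sketch of the idea under our own assumptions: group the observations into nights by the integer part of their MJD (a simplification, since the MJD day boundary is midnight UTC and need not separate observing nights), apply the more-than-2-observations requirement, and take the median difference between consecutive nights. `median_internight_gap` is a hypothetical helper:

```python
import numpy as np

def median_internight_gap(mjd, min_obs=3):
    """Median gap (days) between consecutive observing nights of one light curve.

    Nights are approximated by floor(MJD); light curves with fewer than
    `min_obs` observations (i.e. not more than 2 by default) return NaN.
    """
    mjd = np.asarray(mjd, dtype=float)
    if len(mjd) < min_obs:
        return np.nan
    nights = np.unique(np.floor(mjd))  # sorted, one entry per night
    if len(nights) < 2:
        return np.nan
    return np.median(np.diff(nights))

# Nights 59000, 59003, 59004, 59010 -> gaps of 3, 1 and 6 days
print(median_internight_gap([59000.1, 59000.2, 59003.1, 59004.3, 59010.2]))
```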

#### 6.2.0. Calculate results<a name="medianGapCalc"></a>

For the entire test set, it takes 2h20min.

In [None]:
if calculate_results: 
    time_ini = time.time()
#     batch_ids = ['000', '001', '002', '003', '004', '005', '006', 
#                  '007', '008', '009', '010', '011', '012']

    median_gap_ids = []
    for i in np.arange(len(batch_ids)):
        batch_id = batch_ids[i]
        print(f'Batch {batch_id}')

        # Compute the median inter-night gap
        median_gap_id = analysis.compute_median_internight_gap(
            test_data_ids[i])

        # Save the results to the lists
        median_gap_ids.append(median_gap_id)
        metadata_test_ids[i]['median_internight_gap'] = median_gap_id
        metadata = metadata_test_ids[i]

        if save_updated_metadata:
            # Path to the saved metadata
            data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
            if is_only_roll:
                print('only roll')
                data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
            if is_updated:
                data_file_name = data_file_name[:-5] + '_updated.pckl'
            analysis_name = data_file_name[:-5]
            folder_analysis_path = folder_path[:-14] + 'analyses'
            directories = get_directories(folder_analysis_path, analysis_name) 
            path_saved_metadata = directories['features_directory']
            print(path_saved_metadata)

            metadata.to_pickle(os.path.join(path_saved_metadata, 'extended_metadata.pckl'))

    median_gap_all = np.concatenate(median_gap_ids)
    metadata_test_all = pd.concat(metadata_test_ids)
    print(time.time() - time_ini)

#### 6.2.1. Setup<a name="medianGapSetup"></a>

In [None]:
quantity_name = 'median_internight_gap'
quantity = metadata_test_all[quantity_name]

**Choose** bins for the plot and whether or not to plot the values in the middle of the bins. Additionally, to consider only a subset of events, **mask** those events in `extra_subset`. If `extra_subset = True`, all the events are used.

In [None]:
# is_good_lc = (metadata_test_all['lc_length'] > 75)

In [None]:
bins = np.linspace(-0.5, 50.5, 52)
use_mid_bins = True  # plot using the middle of the bins
extra_subset = True  # use all events
# extra_subset = is_good_lc

#### 6.2.2. Recall<a name="medianGapRecall"></a>

Compute the recall and bootstrapped confidence intervals. Then, plot the figure.

In [None]:
recall_s, boot_recall_ci, number_in_bin_s = analysis.compute_recall_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_true_type_list=is_true_type_list, 
    extra_subset=extra_subset)

In [None]:
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
print(show_bins[np.array(number_in_bin_s[:, 0]) < 10])
print(show_bins[np.array(number_in_bin_s[:, 1]) < 10])
print(show_bins[np.array(number_in_bin_s[:, 2]) < 10])

x_label = 'Median inter-night gap (days)'
x_min, x_max = 1, 27
#x_min, x_max = 1, 7

# x_vline1 = 50

analysis.plot_sne_has_something(
    something_s=recall_s, boot_has_something_ci=boot_recall_ci,
    bins=show_bins, sn_order=sn_order, 
    **{'colors': sn_colors, 'number_in_bin_s': number_in_bin_s, 
       'threshold': threshold, 'linestyle': ['-', '--', '-.']})

plt.ylabel('Recall')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
# plt.vlines(x=x_vline1, ymin=0, ymax=1., colors='gray', linewidth=3, ls='--')
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4, loc='lower right')
plt.title(dataset_name)

# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_recall_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.2.3. Precision<a name="medianGapPrecision"></a>

Compute the precision and bootstrapped confidence intervals. Then, plot the figure.

In [None]:
precision_s, boot_precision_ci, number_in_bin_s = analysis.compute_precision_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_pred_type_list=is_pred_type_list, 
    extra_subset=extra_subset)

In [None]:
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
print(show_bins[np.array(number_in_bin_s[:, 0]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 1]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 2]) < threshold])

analysis.plot_sne_has_something(
    something_s=precision_s, boot_has_something_ci=boot_precision_ci,
    bins=show_bins, sn_order=sn_order, 
    **{'colors': sn_colors, 'number_in_bin_s': number_in_bin_s, 
       'threshold': threshold, 'linestyle': ['-', '--', '-.']})

plt.ylabel('Precision')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4, loc='lower right')
plt.title(dataset_name)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_precision_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.2.4. Density<a name="medianGapDensity"></a>

In [None]:
cumulative = 0  # 0: density; -1: reverse cumulative (1 - CDF); 1: CDF
show_sne = False  # True: one histogram per SN type

if cumulative != 0:
    bins_hist = 10**4
    plt.ylabel('1 - CDF' if cumulative < 0 else 'CDF')
else:
    bins_hist = bins
    plt.ylabel('Density')
if show_sne:
    for j in np.arange(len(is_true_type_list)):
        sn_type = sn_order[j]
        plt.hist(quantity[is_true_type_list[j] & extra_subset],
                 density=True, histtype='step', bins=bins_hist,
                 label=sn_type, color=sn_colors[j], 
                 cumulative=cumulative)
    plt.legend()
else:
    try:
        x_vals = quantity[extra_subset]
        print('Using a subset of the full test set.')
    except KeyError:
        x_vals = quantity
    plt.hist(x_vals,
             density=True, histtype='step', bins=bins_hist,
             cumulative=cumulative, linewidth=3)
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
#plt.title(title)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_density_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

[Go back to top.](#index)

### 6.3. Longest gap<a name="maxGap"></a>

#### 6.3.0. Calculate results<a name="maxGapCalc"></a>

For the entire test set, this takes 1h36min for the baseline and 2h09min for no roll.
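
`analysis.compute_max_and_threshold_gaps(..., threshold=10)` returns, per light curve, the longest gap and the number of gaps of at least 10 days. A minimal sketch of both quantities for one light curve, assuming the gaps are measured between consecutive observation times (the `snmachine` function may instead work with nights or detections); `max_and_threshold_gaps` is a hypothetical name:

```python
import numpy as np

def max_and_threshold_gaps(mjd, threshold=10):
    """Longest gap (days) between consecutive observations and the number of
    gaps of at least `threshold` days."""
    mjd = np.sort(np.asarray(mjd, dtype=float))
    if len(mjd) < 2:
        return np.nan, 0
    gaps = np.diff(mjd)
    return gaps.max(), int(np.sum(gaps >= threshold))

# Gaps of 1, 14, 1 and 24 days
print(max_and_threshold_gaps([0, 1, 15, 16, 40]))
```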

In [None]:
if calculate_results: 
    time_ini = time.time()
#     batch_ids = ['000', '001', '002', '003', '004', '005', '006', 
#                  '007', '008', '009', '010', '011', '012']

    max_gap_ids, number_big_gap_ids = [], []
    for i in np.arange(len(batch_ids)):
        batch_id = batch_ids[i]
        print(f'Batch {batch_id}')

        # Compute the longest gap and the number of gaps >= 10 days
        max_gap_id, number_big_gap_id = analysis.compute_max_and_threshold_gaps(
            test_data_ids[i], threshold=10)

        # Save the results to the lists
        max_gap_ids.append(max_gap_id)
        metadata_test_ids[i]['max_gap'] = max_gap_id
        number_big_gap_ids.append(number_big_gap_id)
        metadata_test_ids[i]['number_big_gap_10'] = number_big_gap_id
        metadata = metadata_test_ids[i]

        if save_updated_metadata:
            # Path to the saved metadata
            data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
            if is_only_roll:
                print('only roll')
                data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
            if is_updated:
                data_file_name = data_file_name[:-5] + '_updated.pckl'
            analysis_name = data_file_name[:-5]
            folder_analysis_path = folder_path[:-14] + 'analyses'
            directories = get_directories(folder_analysis_path, analysis_name) 
            path_saved_metadata = directories['features_directory']
            print(path_saved_metadata)

            metadata.to_pickle(os.path.join(path_saved_metadata, 'extended_metadata.pckl'))

    max_gap_all = np.concatenate(max_gap_ids)
    number_big_gap_10_all = np.concatenate(number_big_gap_ids)
    metadata_test_all = pd.concat(metadata_test_ids)
    print(time.time() - time_ini)

#### 6.3.1. Setup<a name="maxGapSetup"></a>

In [None]:
quantity_name = 'max_gap'
quantity = metadata_test_all[quantity_name]

In [None]:
print(np.min(quantity))
print(np.max(quantity))

**Choose** bins for the plot and whether or not to plot the values in the middle of the bins. Additionally, to consider only a subset of events, **mask** those events in `extra_subset`. If `extra_subset = True`, all the events are used.

In [None]:
np.linspace(0, 50, 51)

In [None]:
bins = np.linspace(0, 50, 51)
use_mid_bins = True  # plot using the middle of the bins
extra_subset = True  # use all events

#### 6.3.2. Recall<a name="maxGapRecall"></a>

Compute the recall and bootstrapped confidence intervals. Then, plot the figure.

In [None]:
recall_s, boot_recall_ci, number_in_bin_s = analysis.compute_recall_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_true_type_list=is_true_type_list)

In [None]:
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
print(show_bins[np.array(number_in_bin_s[:, 0]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 1]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 2]) < threshold])

x_label = 'Longest inter-night gap (days)'
x_min, x_max = 0, 50
#x_min, x_max = 1, 7

analysis.plot_sne_has_something(
    something_s=recall_s, boot_has_something_ci=boot_recall_ci,
    bins=show_bins, sn_order=sn_order, 
    **{'colors': sn_colors, 'number_in_bin_s': number_in_bin_s, 
       'threshold': threshold, 'linestyle': ['-', '--', '-.']})

plt.ylabel('Recall')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4)

plt.title(dataset_name)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_recall_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.3.3. Precision<a name="maxGapPrecision"></a>

Compute the precision and bootstrapped confidence intervals. Then, plot the figure.
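
Precision and recall differ only in which events they condition on. Precision conditions on the *predicted* type instead of the true type, which is why `analysis.compute_precision_values` takes `is_pred_type_list` rather than `is_true_type_list`. The sketch below makes that explicit with a hypothetical function and toy labels. It assumes a 95% percentile bootstrap, and the notebook's function may work differently.

```python
import numpy as np


def binned_precision_bootstrap(quantity, bins, true_type, pred_type, sn_type,
                               n_boot=1000, seed=0):
    """Hypothetical sketch: per-bin precision for one SN type, with a 95%
    percentile-bootstrap confidence interval.

    precision = (events predicted as sn_type that truly are sn_type)
                / (events predicted as sn_type)
    """
    rng = np.random.default_rng(seed)
    quantity = np.asarray(quantity, dtype=float)
    pred_type = np.asarray(pred_type)
    is_pred_type = pred_type == sn_type                  # condition on the prediction
    is_pred_right = pred_type == np.asarray(true_type)  # correct classification
    n_bins = len(bins) - 1

    precision = np.full(n_bins, np.nan)
    ci = np.full((n_bins, 2), np.nan)
    for b in range(n_bins):
        # Half-open bins [lo, hi), except the last one, which is closed
        in_bin = quantity >= bins[b]
        if b == n_bins - 1:
            in_bin &= quantity <= bins[b + 1]
        else:
            in_bin &= quantity < bins[b + 1]
        right = is_pred_right[in_bin & is_pred_type]
        if right.size == 0:
            continue  # nothing predicted as sn_type in this bin
        precision[b] = right.mean()
        # Resample this bin's events with replacement and recompute precision
        boot = rng.choice(right, size=(n_boot, right.size)).mean(axis=1)
        ci[b] = np.percentile(boot, [2.5, 97.5])
    return precision, ci


quantity = [1.0, 1.5, 2.5, 2.7]
true_type = ['Ia', 'II', 'Ia', 'Ia']
pred_type = ['Ia', 'Ia', 'Ia', 'Ia']
precision, ci = binned_precision_bootstrap(
    quantity, [0.0, 2.0, 4.0], true_type, pred_type, 'Ia')
```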

In [None]:
precision_s, boot_precision_ci, number_in_bin_s = analysis.compute_precision_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_pred_type_list=is_pred_type_list)

In [None]:
# x positions for the plot: bin midpoints or bin edges
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
# Bins holding fewer than `threshold` events, for each of the three SN types
print(show_bins[np.array(number_in_bin_s[:, 0]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 1]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 2]) < threshold])

analysis.plot_sne_has_something(
    something_s=precision_s, boot_has_something_ci=boot_precision_ci,
    bins=show_bins, sn_order=sn_order, colors=sn_colors,
    number_in_bin_s=number_in_bin_s, threshold=threshold,
    linestyle=['-', '--', '-.'])

plt.ylabel('Precision')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4)
plt.title(dataset_name)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_precision_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.3.4. Density<a name="maxGapDensity"></a>

In [None]:
cumulative = 0  # 0: density; 1: CDF; -1: reversed cumulative (1 - CDF)
show_sne = False  # True: one histogram per SN type

if cumulative != 0:
    bins_hist = 10**4
    plt.ylabel('1 - CDF' if cumulative < 0 else 'CDF')
else:
    bins_hist = bins
    plt.ylabel('Density')
if show_sne:
    for j in range(len(is_true_type_list)):
        sn_type = sn_order[j]
        sn_number = sn_name_to_number[sn_type]
        plt.hist(quantity[is_true_type_list[j] & extra_subset],
                 density=True, histtype='step', bins=bins_hist,
                 label=sn_type, color=sn_type_color[sn_number],
                 cumulative=cumulative)
    plt.legend()
else:
    if np.ndim(extra_subset) == 0:  # scalar (e.g. True): use the full test set
        x_vals = quantity
    else:
        x_vals = quantity[extra_subset]
        print('Using a subset of the full test set.')
    plt.hist(x_vals,
             density=True, histtype='step', bins=bins_hist,
             cumulative=cumulative, linewidth=3)
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
#plt.title(title)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_density_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')
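
The `cumulative` switch follows `matplotlib.pyplot.hist`. A value of `0` draws the normalised density and `1` draws the CDF. A negative value (e.g. `-1`) accumulates from the right, so each bin shows the fraction of events at or above its left edge, i.e. 1 - CDF. The snippet below reproduces both curves with `numpy.histogram` on illustrative values.

```python
import numpy as np

values = np.array([0.5, 1.5, 1.7, 2.2, 3.9])  # illustrative values
bins = np.arange(0, 5)  # edges 0, 1, 2, 3, 4 -> four bins of width 1

# density=True: counts / (number of events * bin width), so the area is 1
density, _ = np.histogram(values, bins=bins, density=True)

# What plt.hist(..., density=True, cumulative=-1) draws: for each bin, the
# fraction of events in that bin or any later bin, i.e. the fraction at or
# above the bin's left edge (1 - CDF)
counts, _ = np.histogram(values, bins=bins)
frac = counts / counts.sum()
one_minus_cdf = frac[::-1].cumsum()[::-1]
```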

[Go back to top.](#index)

### 6.4. Number of gaps $\geq$ 10 days<a name="numberLargeGap"></a>

Number of inter-night gaps of at least 10 days in each light curve.
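
The gap quantities (`max_gap` in 6.3, `number_big_gap_10` here) are read from the extended metadata, and their computation is not shown in this section. The sketch below rests on three assumptions. A light curve's observing nights are the distinct integer parts of its MJDs, which only approximates real night boundaries. The inter-night gaps are the differences between consecutive nights. A gap counts as large when it is at least `large_gap` days. The function name and the `median_gap` key are hypothetical.

```python
import numpy as np


def gap_statistics(mjd, large_gap=10):
    """Hypothetical sketch of per-light-curve gap statistics.

    Assumes that an observing night is the integer part of the MJD, which
    only approximates how observations are grouped into nights.
    """
    nights = np.unique(np.floor(np.asarray(mjd, dtype=float)))  # sorted nights
    key = f'number_big_gap_{large_gap}'
    if nights.size < 2:  # a single night has no inter-night gap
        return {'max_gap': np.nan, 'median_gap': np.nan, key: 0}
    gaps = np.diff(nights)  # inter-night gaps in days
    return {
        'max_gap': gaps.max(),
        'median_gap': np.median(gaps),
        key: int(np.sum(gaps >= large_gap)),
    }


mjd = [60000.1, 60000.2, 60003.3, 60015.0, 60016.2, 60030.9]
stats = gap_statistics(mjd)
```

In this example the nights are 60000, 60003, 60015, 60016 and 60030, so the gaps are 3, 12, 1 and 14 days. That gives a longest gap of 14 days, a median gap of 7.5 days and two gaps of at least 10 days.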

#### 6.4.1. Setup<a name="numberLargeGapSetup"></a>

In [None]:
quantity_name = 'number_big_gap_10'
quantity = metadata_test_all[quantity_name]
large_gap = 10

In [None]:
print(np.min(quantity))
print(np.max(quantity))

**Choose** bins for the plot and whether to plot the values at the middle of the bins. To consider only a subset of events, set `extra_subset` to a boolean **mask** that selects them; with `extra_subset = True`, all events are used.
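
With `use_mid_bins = False`, the points are placed at the bin edges. For an integer-valued quantity such as this one, the edges are the integers themselves. There is one caveat if the binning follows the `numpy.histogram` convention, which is an assumption about `analysis`. Bins are half-open `[k, k + 1)` except the last one, which is closed, so the two largest values share a bin.

```python
import numpy as np

bins = np.linspace(0, 13, 14)  # edges 0, 1, ..., 13 -> 13 bins
n_big_gaps = np.array([0, 0, 1, 2, 2, 12, 13])  # illustrative counts

counts, edges = np.histogram(n_big_gaps, bins=bins)
# Each integer k falls in [k, k + 1), except that the last bin, [12, 13],
# is closed, so the values 12 and 13 are counted together
```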

In [None]:
np.linspace(0, 13, 14)

In [None]:
bins = np.linspace(0, 13, 14)
use_mid_bins = False  # plot at the bin edges, not the bin midpoints
extra_subset = True  # use all events

#### 6.4.2. Recall<a name="numberLargeGapRecall"></a>

Compute the recall and bootstrapped confidence intervals. Then, plot the figure.
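
Before plotting, the notebook prints the bins that hold fewer than `threshold` events for a given type. These counts are passed to `analysis.plot_sne_has_something` together with `number_in_bin_s`, but how that function treats the sparse bins is not shown here. One plausible approach, sketched with a hypothetical helper, is to replace them by NaN, since matplotlib line plots leave a gap wherever a value is NaN.

```python
import numpy as np


def mask_sparse_bins(values, number_in_bin, threshold):
    """Return a float copy of `values` with NaN wherever a bin holds fewer
    than `threshold` events (hypothetical helper)."""
    values = np.array(values, dtype=float)  # np.array copies by default
    values[np.asarray(number_in_bin) < threshold] = np.nan
    return values


show_bins = np.array([0.0, 1.0, 2.0, 3.0])
recall = [0.90, 0.85, 0.80, 0.50]
number_in_bin = [120, 45, 12, 3]
threshold = 10

recall_plot = mask_sparse_bins(recall, number_in_bin, threshold)
# Same check as the print() calls in the notebook
sparse = show_bins[np.asarray(number_in_bin) < threshold]
```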

In [None]:
recall_s, boot_recall_ci, number_in_bin_s = analysis.compute_recall_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_true_type_list=is_true_type_list)

In [None]:
# x positions for the plot: bin midpoints or bin edges
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
# Bins holding fewer than `threshold` events, for each of the three SN types
print(show_bins[np.array(number_in_bin_s[:, 0]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 1]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 2]) < threshold])

x_label = rf'Number of gaps $\geq$ {large_gap} days'
x_min, x_max = 0, 9

analysis.plot_sne_has_something(
    something_s=recall_s, boot_has_something_ci=boot_recall_ci,
    bins=show_bins, sn_order=sn_order, colors=sn_colors,
    number_in_bin_s=number_in_bin_s, threshold=threshold,
    linestyle=['-', '--', '-.'])

plt.ylabel('Recall')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4)
plt.title(dataset_name)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_recall_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.4.3. Precision<a name="numberLargeGapPrecision"></a>

Compute the precision and bootstrapped confidence intervals. Then, plot the figure.

In [None]:
precision_s, boot_precision_ci, number_in_bin_s = analysis.compute_precision_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_pred_type_list=is_pred_type_list)

In [None]:
# x positions for the plot: bin midpoints or bin edges
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
# Bins holding fewer than `threshold` events, for each of the three SN types
print(show_bins[np.array(number_in_bin_s[:, 0]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 1]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 2]) < threshold])

analysis.plot_sne_has_something(
    something_s=precision_s, boot_has_something_ci=boot_precision_ci,
    bins=show_bins, sn_order=sn_order, colors=sn_colors,
    number_in_bin_s=number_in_bin_s, threshold=threshold,
    linestyle=['-', '--', '-.'])

plt.ylabel('Precision')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4)
# plt.title(dataset_name)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_precision_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.4.4. Density<a name="numberLargeGapDensity"></a>

In [None]:
cumulative = 0  # 0: density; 1: CDF; -1: reversed cumulative (1 - CDF)
show_sne = False  # True: one histogram per SN type

if cumulative != 0:
    bins_hist = 10**4
    plt.ylabel('1 - CDF' if cumulative < 0 else 'CDF')
else:
    bins_hist = bins
    plt.ylabel('Density')
if show_sne:
    for j in range(len(is_true_type_list)):
        sn_type = sn_order[j]
        sn_number = sn_name_to_number[sn_type]
        plt.hist(quantity[is_true_type_list[j] & extra_subset],
                 density=True, histtype='step', bins=bins_hist,
                 label=sn_type, color=sn_type_color[sn_number],
                 cumulative=cumulative)
    plt.legend()
else:
    if np.ndim(extra_subset) == 0:  # scalar (e.g. True): use the full test set
        x_vals = quantity
    else:
        x_vals = quantity[extra_subset]
        print('Using a subset of the full test set.')
    plt.hist(x_vals,
             density=True, histtype='step', bins=bins_hist,
             cumulative=cumulative, linewidth=3)
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
#plt.title(title)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_density_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

[Go back to top.](#index)

### 6.5. Number of observations in [-10, 30] days around peak (observer frame)<a name="nearPeak"></a>

#### 6.5.0. Calculate results<a name="nearPeakCalc"></a>

For the entire baseline and presto test sets, computing `t_peak` took 15h09min and counting the observations near the peak took 4h29min.

For the entire no-roll test set, computing `t_peak` took 41h30min and counting the observations near the peak took 4h29min.
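
This section counts the observations from 10 days before to 30 days after the peak, in the observer frame (`prepeak_10 + postpeak_30`). The sketch below shows such a count under three assumptions: times are in days (MJD), all passbands are pooled together, and an observation at the peak epoch itself counts as post-peak. The function is hypothetical, and `analysis.compute_number_obs_peak` may draw the boundaries differently.

```python
import numpy as np


def count_obs_near_peak(mjd, t_peak, pre=10, post=30):
    """Hypothetical sketch: observations in [t_peak - pre, t_peak + post].

    Times are observer-frame days; every passband is counted together, and an
    observation exactly at t_peak is counted as post-peak (an assumption).
    """
    dt = np.asarray(mjd, dtype=float) - t_peak
    prepeak = int(np.sum((dt >= -pre) & (dt < 0)))
    postpeak = int(np.sum((dt >= 0) & (dt <= post)))
    return prepeak, postpeak


mjd = [59990.0, 59995.0, 60001.0, 60005.0, 60010.0, 60029.5, 60031.0, 60040.0]
prepeak_10, postpeak_30 = count_obs_near_peak(mjd, t_peak=60000.0)
number_near_peak = prepeak_10 + postpeak_30
```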

In [None]:
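# Remove events without any detection from each batch; when some are removed,
# overwrite the saved dataset and update its metadata
# All batches: ['001', '002', '003', '004', '005', '006', '007', '008']
batch_ids = ['004', '005', '006', '007', '008']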

for i, batch_id in enumerate(batch_ids):
    print(f'Batch {batch_id}')

    # Name and path of the test subset
    data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
    if is_only_roll:
        print('only roll')
        data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
    if is_updated:
        data_file_name = data_file_name[:-5] + '_updated.pckl'
    data_path = os.path.join(folder_path, data_file_name)
    print(0, data_path)
    analysis_name = data_file_name[:-5]
    folder_analysis_path = folder_path[:-14] + 'analyses'
    directories = get_directories(folder_analysis_path, analysis_name)
    print(1, directories['features_directory'])
    
    dataset = test_data_ids[i]
    
    ini_time = time.time()
    good_objs = []
    for obj in dataset.object_names:
        obj_data = dataset.data[obj]
        if np.sum(obj_data['detected']) > 0:
            good_objs.append(obj)
    time_taken = time.time() - ini_time
    print(time_taken)
    
    if len(good_objs) != len(dataset.object_names):
        print('trimming bad events')
        ini_time = time.time()
        dataset.update_dataset(good_objs)
        dataset.update_dataset(list(dataset.metadata.index))
        time_taken = time.time() - ini_time
        print(time_taken)

        ini_time = time.time()
        with open(data_path, 'wb') as f:
            pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
            
        metadata = metadata_test_ids[i]
        metadata = metadata.loc[good_objs]
        metadata_test_ids[i] = metadata

        if save_updated_metadata:
            path_saved_metadata = directories['features_directory']
            print(path_saved_metadata)

            metadata.to_pickle(os.path.join(path_saved_metadata, 'extended_metadata.pckl'))
        time_taken = time.time() - ini_time
        print(time_taken)
    else:
        print('good number')
        print(len(good_objs))
        print(len(dataset.object_names))
    
    print('')
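
The cell above keeps only events with at least one detection, looping over `dataset.data[obj]['detected']`. If the observations were stored in a single long table, the same selection could be written in vectorised form with pandas. The `object_id` column and the toy values below are assumptions for illustration.

```python
import pandas as pd

# Toy long table of observations (hypothetical layout)
obs = pd.DataFrame({
    'object_id': ['a', 'a', 'b', 'b', 'c'],
    'detected': [0, 1, 0, 0, 1],
})

# Keep an object if at least one of its observations is a detection
n_detected = obs.groupby('object_id')['detected'].sum()
good_objs = list(n_detected.index[n_detected > 0])

# Restrict the per-object metadata to the kept objects
metadata = pd.DataFrame({'redshift': [0.1, 0.2, 0.3]}, index=['a', 'b', 'c'])
metadata = metadata.loc[good_objs]
```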

In [None]:
# The trimming above had already been run for batches 001-003; batches 004-008 were still missing

In [None]:
if calculate_results:
    time_ini = time.time()  # ~50 min per roll-only batch
#     batch_ids = ['000', '001', '002', '003', '004', '005', '006', 
#                  '007', '008', '009', '010', '011', '012']
    
#     batch_ids = ['001', '002', '003', '004', '005', '006', 
#                  '007', '008']
    
#     batch_ids = ['001', '002', '003']
#     batch_ids = ['004', '005', '006', '007', '008'] 

    t_peak_ids = []
    for i, batch_id in enumerate(batch_ids):
        print(f'Batch {batch_id}')

        # Locate this batch's analysis directories (including the saved GPs)
        data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
        if is_only_roll:
            data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
        if is_updated:
            data_file_name = data_file_name[:-5] + '_updated.pckl'
        analysis_name = data_file_name[:-5]
        folder_analysis_path = folder_path[:-14] + 'analyses'
        directories = get_directories(folder_analysis_path, analysis_name)
        path_saved_gps = directories['intermediate_files_directory']

        # Estimate the time of peak of each light curve from the saved GPs
        t_peak_id = analysis.compute_t_peak(test_data_ids[i], path_saved_gps)

        # Save the results to the lists
        t_peak_ids.append(t_peak_id)
        metadata_test_ids[i]['t_peak'] = t_peak_id
        metadata = metadata_test_ids[i]

        if save_updated_metadata:
            path_saved_metadata = directories['features_directory']
            print(path_saved_metadata)

            metadata.to_pickle(os.path.join(path_saved_metadata, 'extended_metadata.pckl'))

    print(time.time() - time_ini)
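
`analysis.compute_t_peak` receives `path_saved_gps`, which suggests that the peak time comes from previously fitted Gaussian-process models; its internals are not shown here. The sketch below only illustrates the generic step of evaluating a fitted model on a dense time grid and taking the time of maximum flux. `predict` stands for any callable that returns the predicted flux. For a scikit-learn `GaussianProcessRegressor` `gp`, it could be something like `lambda t: gp.predict(t.reshape(-1, 1))`. The Gaussian bump is a toy stand-in, not a real light curve.

```python
import numpy as np


def estimate_t_peak(predict, t_min, t_max, grid_step=0.1):
    """Hypothetical sketch: time at which a fitted model's flux is largest.

    `predict` maps an array of times to predicted flux; the model is
    evaluated on a regular grid and the time of the maximum is returned.
    """
    grid = np.arange(t_min, t_max + grid_step / 2, grid_step)  # include t_max
    return grid[np.argmax(predict(grid))]


def toy_model(t):
    """Toy stand-in for a fitted light-curve model, peaking at 60012.3."""
    return np.exp(-0.5 * ((t - 60012.3) / 8.0) ** 2)


t_peak = estimate_t_peak(toy_model, 60000.0, 60050.0)
```

The grid step sets the resolution of the estimate, so a finer grid trades run time for precision.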

In [None]:
if calculate_results: 
    time_ini = time.time()  # 4h29min for all 13 batches; 32 min for 3 roll-only batches (~11 min each)
#     batch_ids = ['000', '001', '002', '003', '004', '005', '006', 
#                  '007', '008', '009', '010', '011', '012']
    
#     batch_ids = ['001', '002', '003', '004', '005', '006', 
#                  '007', '008']
#     batch_ids = ['001', '002', '003'] 
#     batch_ids = ['004', '005', '006', '007', '008'] 

    number_obs_peak_ids = []
    for i, batch_id in enumerate(batch_ids):
        print(f'Batch {batch_id}')

        # Count the observations within [-10, 30] days of the peak
        t_peak_id = metadata_test_ids[i]['t_peak']
        number_obs_peak_id = analysis.compute_number_obs_peak(
            test_data_ids[i], t_peak_id)

        # Save the results to the lists
        number_obs_peak_ids.append(number_obs_peak_id)
        metadata = pd.concat([metadata_test_ids[i], number_obs_peak_id], axis=1)
        metadata_test_ids[i] = metadata

        if save_updated_metadata:
            # Path to the saved metadata
            data_file_name = f'test_{extra_name_to_save}_{batch_id}_gapless50.pckl'
            if is_only_roll:
                data_file_name = f'test_{extra_name_to_save}_{batch_id}_roll_gapless50.pckl'
            if is_updated:
                data_file_name = data_file_name[:-5] + '_updated.pckl'
            analysis_name = data_file_name[:-5]
            folder_analysis_path = folder_path[:-14] + 'analyses'
            directories = get_directories(folder_analysis_path, analysis_name) 
            path_saved_metadata = directories['features_directory']
            print(path_saved_metadata)

            metadata.to_pickle(os.path.join(path_saved_metadata, 'extended_metadata.pckl'))
    metadata_test_all = pd.concat(metadata_test_ids)
    print(time.time() - time_ini)
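
The per-batch bookkeeping above relies on two behaviours of `pandas.concat`. With `axis=1`, the new columns are matched to the existing metadata by index (object name), not by row position. With the default `axis=0`, the batches are stacked row-wise into `metadata_test_all`. A toy example with illustrative values:

```python
import pandas as pd

# One batch's metadata, indexed by object name (illustrative values)
meta_batch = pd.DataFrame({'t_peak': [60010.0, 60100.0]}, index=['obj1', 'obj2'])

# New per-object columns, deliberately listed in a different row order
near_peak = pd.DataFrame({'prepeak_10': [3, 1], 'postpeak_30': [7, 5]},
                         index=['obj2', 'obj1'])

# axis=1 joins columns by matching index labels, not row positions
merged = pd.concat([meta_batch, near_peak], axis=1)

# Another batch with the same columns; the default axis=0 stacks the rows
other_batch = pd.DataFrame({'t_peak': [60200.0], 'prepeak_10': [2],
                            'postpeak_30': [4]}, index=['obj3'])
metadata_all = pd.concat([merged, other_batch])

quantity = metadata_all['prepeak_10'] + metadata_all['postpeak_30']
```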

#### 6.5.1. Setup<a name="nearPeakSetup"></a>

In [None]:
quantity_name = 'number_near_peak'
quantity = metadata_test_all['prepeak_10'] + metadata_test_all['postpeak_30']

In [None]:
print(np.min(quantity))
print(np.max(quantity))

**Choose** bins for the plot and whether to plot the values at the middle of the bins. To consider only a subset of events, set `extra_subset` to a boolean **mask** that selects them; with `extra_subset = True`, all events are used.

In [None]:
bins = np.linspace(0, 50, 51)
use_mid_bins = False  # plot at the bin edges, not the bin midpoints
extra_subset = True  # use all events

#### 6.5.2. Recall<a name="nearPeakRecall"></a>

Compute the recall and bootstrapped confidence intervals. Then, plot the figure.

In [None]:
recall_s, boot_recall_ci, number_in_bin_s = analysis.compute_recall_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_true_type_list=is_true_type_list)

In [None]:
# x positions for the plot: bin midpoints or bin edges
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
# Bins holding fewer than `threshold` events, for each of the three SN types
print(show_bins[np.array(number_in_bin_s[:, 0]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 1]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 2]) < threshold])

x_label = 'Number of observations near SNe peak'
x_min, x_max = 0, 45
#x_min, x_max = 0, 22

analysis.plot_sne_has_something(
    something_s=recall_s, boot_has_something_ci=boot_recall_ci,
    bins=show_bins, sn_order=sn_order, colors=sn_colors,
    number_in_bin_s=number_in_bin_s, threshold=threshold,
    linestyle=['-', '--', '-.'])

plt.ylabel('Recall')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4)

plt.title(dataset_name)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_recall_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.5.3. Precision<a name="nearPeakPrecision"></a>

Compute the precision and bootstrapped confidence intervals. Then, plot the figure.

In [None]:
precision_s, boot_precision_ci, number_in_bin_s = analysis.compute_precision_values(
    quantity=quantity, bins=bins, is_pred_right=is_pred_right, 
    use_mid_bins=use_mid_bins, is_pred_type_list=is_pred_type_list)

In [None]:
# x positions for the plot: bin midpoints or bin edges
if use_mid_bins:
    mid_bins = (bins[:-1]+bins[1:])/2
    show_bins = mid_bins
else:
    show_bins = bins
# Bins holding fewer than `threshold` events, for each of the three SN types
print(show_bins[np.array(number_in_bin_s[:, 0]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 1]) < threshold])
print(show_bins[np.array(number_in_bin_s[:, 2]) < threshold])

analysis.plot_sne_has_something(
    something_s=precision_s, boot_has_something_ci=boot_precision_ci,
    bins=show_bins, sn_order=sn_order, colors=sn_colors,
    number_in_bin_s=number_in_bin_s, threshold=threshold,
    linestyle=['-', '--', '-.'])

plt.ylabel('Precision')
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
plt.ylim(-0.05, 1.05)
plt.legend(handletextpad=.4, borderaxespad=.3, handlelength=1.5,
           labelspacing=.2, borderpad=.3, columnspacing=.4)
plt.title(dataset_name)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_precision_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

#### 6.5.4. Density<a name="nearPeakDensity"></a>

In [None]:
cumulative = 0  # 0: density; 1: CDF; -1: reversed cumulative (1 - CDF)
show_sne = False  # True: one histogram per SN type

if cumulative != 0:
    bins_hist = 10**4
    plt.ylabel('1 - CDF' if cumulative < 0 else 'CDF')
else:
    bins_hist = bins
    plt.ylabel('Density')
if show_sne:
    for j in range(len(is_true_type_list)):
        sn_type = sn_order[j]
        sn_number = sn_name_to_number[sn_type]
        plt.hist(quantity[is_true_type_list[j] & extra_subset],
                 density=True, histtype='step', bins=bins_hist,
                 label=sn_type, color=sn_type_color[sn_number],
                 cumulative=cumulative)
    plt.legend()
else:
    if np.ndim(extra_subset) == 0:  # scalar (e.g. True): use the full test set
        x_vals = quantity
    else:
        x_vals = quantity[extra_subset]
        print('Using a subset of the full test set.')
    plt.hist(x_vals,
             density=True, histtype='step', bins=bins_hist,
             cumulative=cumulative, linewidth=3)
plt.xlabel(x_label)
plt.xlim(x_min, x_max) 
#plt.title(title)
# plt.savefig(os.path.join(path_saved_plots, 
#                         f'{quantity_name}_density_{os_name[:-11]}.pdf'), 
#            bbox_inches='tight')

[Go back to top.](#index)