# Disposition

Disposition is the attempt to categorise KOIs as CANDIDATE or FALSE POSITIVE, with an accompanying score (`koi_score`); `koi_disposition` additionally marks CONFIRMED planets.

This section also contains a list of boolean and string status flags, which can be one-hot encoded into 126 fields.

In [147]:
# Python imports and settings
import numpy  as np
import pandas as pd
import seaborn as sns
import scipy.stats
import re
from pydash import py_ as _
from sklearn.preprocessing import OneHotEncoder

from src.dataset_koi import koi, koi_columns, koi_column_types
from src.utilities import onehot_encode_comments

# https://stackoverflow.com/questions/11707586/how-do-i-expand-the-output-display-to-see-more-columns-of-a-pandas-dataframe
pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)  # None = unlimited width (-1 is deprecated in pandas >= 1.0)
pd.set_option('display.max_rows', 8)  # 8 is required for .describe()

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Load Dataset

In [267]:
dataset            = koi['archive'].join(koi['disposition'])
onehot_disposition = pd.get_dummies(dataset[['koi_disposition']])
onehot_comments    = onehot_encode_comments(dataset, 'koi_comment', '---', without=['NO_COMMENT'])


onehot_dataset = (
    pd.concat([ 
        koi['archive'], 
        onehot_disposition,
        koi['disposition'],
        onehot_comments,
    ], axis=1)
    .drop(columns=['koi_comment'])
)
onehot_dataset

kepoi_name,kepler_name,koi_disposition,koi_disposition_CANDIDATE,koi_disposition_CONFIRMED,koi_disposition_FALSE POSITIVE,koi_pdisposition,koi_score,koi_fpflag_nt,koi_fpflag_ss,koi_fpflag_co,koi_fpflag_ec,koi_comment_ALL_TRANS_CHASES,koi_comment_ALT_ROBO_ODD_EVEN_TEST_FAIL,koi_comment_ALT_SEC_SAME_DEPTH_AS_PRI_COULD_BE_TWICE_TRUE_PERIOD,koi_comment_ALT_SIG_PRI_MINUS_SIG_POS_TOO_LOW,koi_comment_ALT_SIG_PRI_MINUS_SIG_TER_TOO_LOW,koi_comment_ALT_SIG_PRI_OVER_FRED_TOO_LOW,koi_comment_CENTROID_SIGNIF_UNCERTAIN,koi_comment_CENT_CROWDED,koi_comment_CENT_FEW_DIFFS,koi_comment_CENT_FEW_MEAS,koi_comment_CENT_KIC_POS,koi_comment_CENT_NOFITS,koi_comment_CENT_RESOLVED_OFFSET,koi_comment_CENT_SATURATED,koi_comment_CENT_UNCERTAIN,koi_comment_CENT_UNRESOLVED_OFFSET,koi_comment_CLEAR_APO,koi_comment_CROWDED_DIFF,koi_comment_DEEP_V_SHAPED,koi_comment_DEPTH_ODDEVEN_ALT,koi_comment_DEPTH_ODDEVEN_DV,koi_comment_DV_SIG_PRI_MINUS_SIG_POS_TOO_LOW,koi_comment_DV_SIG_PRI_OVER_FRED_TOO_LOW,koi_comment_EPHEM_MATCH,koi_comment_EYEBALL,koi_comment_FIT_FAILED,koi_comment_HALO_GHOST,koi_comment_HAS_SEC_TCE,koi_comment_INCONSISTENT_TRANS,koi_comment_INDIV_TRANS_CHASES,koi_comment_INDIV_TRANS_CHASES_MARSHALL,koi_comment_INDIV_TRANS_CHASES_MARSHALL_SKYE,koi_comment_INDIV_TRANS_CHASES_MARSHALL_ZUMA,koi_comment_INDIV_TRANS_CHASES_SKYE,koi_comment_INDIV_TRANS_MARSHALL,koi_comment_INDIV_TRANS_MARSHALL_SKYE,koi_comment_INDIV_TRANS_MARSHALL_ZUMA,koi_comment_INDIV_TRANS_RUBBLE,koi_comment_INDIV_TRANS_RUBBLE_MARSHALL_SKYE,koi_comment_INDIV_TRANS_RUBBLE_SKYE,koi_comment_INDIV_TRANS_RUBBLE_SKYE_ZUMA,koi_comment_INDIV_TRANS_RUBBLE_SKYE_ZUMA_TRACKER,koi_comment_INDIV_TRANS_SKYE,koi_comment_INDIV_TRANS_SKYE_ZUMA,koi_comment_INDIV_TRANS_SKYE_ZUMA_TRACKER,koi_comment_INDIV_TRANS_ZUMA,koi_comment_INVERT_DIFF,koi_comment_IS_SEC_TCE,koi_comment_KIC_OFFSET,koi_comment_LPP_ALT,koi_comment_LPP_ALT_TOO_HIGH,koi_comment_LPP_DV,koi_comment_LPP_DV_TOO_HIGH,koi_comment_MARSHALL_FAIL,koi_comment_MOD_NONUNIQ_ALT,koi_comment_MOD_NONUNIQ_DV,koi_comment_MOD_ODDEVEN_ALT,koi_comment_MOD_ODDEVEN_DV,koi_comment_MOD_POS_ALT,koi_comment_MOD_POS_DV,koi_comment_MOD_SEC_ALT,koi_comment_MOD_SEC_DV,koi_comment_MOD_TER_ALT,koi_comment_MOD_TER_DV,koi_comment_OTHER_TCE_AT_SAME_PERIOD_DIFF_EPOCH,koi_comment_PARENT_IS_002305372-pri,koi_comment_PARENT_IS_002449084-pri,koi_comment_PARENT_IS_003352751-pri,koi_comment_PARENT_IS_003858884-01,koi_comment_PARENT_IS_004482641-01,koi_comment_PARENT_IS_005024292-01,koi_comment_PARENT_IS_005036538-01,koi_comment_PARENT_IS_005343976-pri,koi_comment_PARENT_IS_005471619-pri,koi_comment_PARENT_IS_005513861-pri,koi_comment_PARENT_IS_006367628-pri,koi_comment_PARENT_IS_007258889-pri,koi_comment_PARENT_IS_007598128-pri,koi_comment_PARENT_IS_008265951-pri,koi_comment_PARENT_IS_008380743-pri,koi_comment_PARENT_IS_009541127-pri,koi_comment_PARENT_IS_009777062-01,koi_comment_PARENT_IS_010485137-pri,koi_comment_PARENT_IS_010858720-pri,koi_comment_PARENT_IS_012004679-pri,koi_comment_PARENT_IS_3597.01,koi_comment_PARENT_IS_3895.01,koi_comment_PARENT_IS_4673.01,koi_comment_PARENT_IS_489.01,koi_comment_PARENT_IS_5335.01,koi_comment_PARENT_IS_970.01,koi_comment_PARENT_IS_FL-Lyr-pri,koi_comment_PARENT_IS_RR-Lyr-pri,koi_comment_PARENT_IS_UZ-Lyr-pri,koi_comment_PARENT_IS_V2277-Cyg-pri,koi_comment_PARENT_IS_V380-Cyg-pri,koi_comment_PARENT_IS_V380-Cyg-sec,koi_comment_PARENT_IS_V850-Cyg-pri,koi_comment_PERIOD_ALIAS_ALT,koi_comment_PERIOD_ALIAS_DV,koi_comment_PERIOD_ALIAS_IN_ALT_DATA_SEEN_AT_3:1,koi_comment_PERIOD_ALIAS_IN_DV_DATA_SEEN_AT_3:1,koi_comment_PLANET_IN_STAR,koi_comment_PLANET_OCCULT_ALT,koi_comment_PLANET_OCCULT_DV,koi_comment_PLANET_PERIOD_IS_HALF_ALT,koi_comment_PLANET_PERIOD_IS_HALF_DV,koi_comment_RESIDUAL_TCE,koi_comment_RESID_OF_PREV_TCE,koi_comment_SAME_NTL_PERIOD,koi_comment_SAME_P_AS_PREV_NTL_TCE,koi_comment_SATURATED,koi_comment_SEASONAL_DEPTH_ALT,koi_comment_SEASONAL_DEPTH_DIFFS_IN_ALT,koi_comment_SEASONAL_DEPTH_DV,koi_comment_SIGNIF_OFFSET,koi_comment_SIG_SEC_IN_ALT_MODEL_SHIFT,koi_comment_SIG_SEC_IN_DV_MODEL_SHIFT,koi_comment_SWEET_EB,koi_comment_SWEET_NTL,koi_comment_TOO_FEW_CENTROIDS,koi_comment_TOO_FEW_QUARTERS,koi_comment_TRANSITS_NOT_CONSISTENT,koi_comment_TRANS_GAPPED
K00752.01,Kepler-227 b,CONFIRMED,0,1,0,CANDIDATE,1.000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
K00752.02,Kepler-227 c,CONFIRMED,0,1,0,CANDIDATE,0.969,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
K00753.01,,CANDIDATE,1,0,0,CANDIDATE,0.000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
K00754.01,,FALSE POSITIVE,0,0,1,FALSE POSITIVE,0.000,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
K07986.01,,CANDIDATE,1,0,0,CANDIDATE,0.497,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
K07987.01,,FALSE POSITIVE,0,0,1,FALSE POSITIVE,0.021,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
K07988.01,,CANDIDATE,1,0,0,CANDIDATE,0.092,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
K07989.01,,FALSE POSITIVE,0,0,1,FALSE POSITIVE,0.000,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
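
The `onehot_encode_comments` helper is imported from `src.utilities` and its implementation is not shown here. A minimal sketch of how such an encoding could work, assuming `koi_comment` holds flag names joined by `'---'` (inferred from the arguments in the call above); the function name `onehot_encode_delimited` and the sample index labels are hypothetical:

```python
import pandas as pd

def onehot_encode_delimited(series: pd.Series, sep: str = "---",
                            prefix: str = "koi_comment", without=()) -> pd.DataFrame:
    """Hypothetical stand-in for onehot_encode_comments: one 0/1 column per flag."""
    # Split each string on `sep` and build one indicator column per distinct token
    dummies = series.fillna("").str.get_dummies(sep=sep)
    # Drop placeholder tokens such as NO_COMMENT, if present
    dummies = dummies.drop(columns=[c for c in without if c in dummies.columns])
    return dummies.add_prefix(f"{prefix}_")

# Made-up sample rows; the flag names are taken from the table above
comments = pd.Series(
    ["NO_COMMENT", "CENT_KIC_POS", "MOD_SEC_ALT---MOD_SEC_DV", None],
    index=["K1", "K2", "K3", "K4"],
    name="koi_comment",
)
encoded = onehot_encode_delimited(comments, without=["NO_COMMENT"])
print(encoded)
```

Rows with no comment (or only `NO_COMMENT`) end up with all zeros, which matches the shape of the output table above.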


## Correlation between disposition and fpflags + comments 

In [122]:
def flag_count_summary(regex: str) -> pd.DataFrame:
    """Count the flags set on each KOI (row-wise sum), then summarise those counts per disposition."""
    return (
        onehot_dataset
            .groupby('koi_disposition', sort=False)
            .apply(lambda df:
                df.filter(regex=regex)
                    .sum(axis=1)
                    .agg(['min', 'mean', 'max', 'sum'])  # string names avoid the pandas deprecation for passing builtins
            )
    )

print("Frequency - fpflags by koi_disposition")
display( flag_count_summary('fpflag') )
print("Frequency - comment flags by koi_disposition")
display( flag_count_summary('comment_') )

Frequency - fpflags by koi_disposition


koi_disposition,min,mean,max,sum
CONFIRMED,0.0,0.006947,1.0,16.0
CANDIDATE,0.0,0.00124,1.0,3.0
FALSE POSITIVE,0.0,1.399504,4.0,6775.0


Frequency - comment flags by koi_disposition


koi_disposition,min,mean,max,sum
CONFIRMED,0.0,0.19366,6.0,446.0
CANDIDATE,0.0,0.397521,7.0,962.0
FALSE POSITIVE,0.0,2.596158,11.0,12568.0


Observation: in the vast majority of cases, fpflags and comments are used to indicate the reason for a FALSE POSITIVE.

- fpflags: only 19 of the 6,794 flags set (0.3%) belong to CONFIRMED or CANDIDATE KOIs, and none of those KOIs has more than one fpflag set

- comments: only 1,408 of the 13,976 comment flags set (10%) are attached to CONFIRMED or CANDIDATE KOIs
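
The shares quoted above follow directly from the `sum` column of the two frequency tables. A small sketch of that arithmetic, with the sums copied by hand from the output (the helper name `non_fp_share` is made up for illustration):

```python
# Sums copied from the "sum" column of the two frequency tables above
fpflag_sums  = {"CONFIRMED": 16,  "CANDIDATE": 3,   "FALSE POSITIVE": 6775}
comment_sums = {"CONFIRMED": 446, "CANDIDATE": 962, "FALSE POSITIVE": 12568}

def non_fp_share(sums: dict) -> tuple:
    """Return (flags on CONFIRMED or CANDIDATE KOIs, all flags set, their ratio)."""
    non_fp = sums["CONFIRMED"] + sums["CANDIDATE"]
    total = sum(sums.values())
    return non_fp, total, non_fp / total

for name, sums in [("fpflags", fpflag_sums), ("comments", comment_sums)]:
    non_fp, total, share = non_fp_share(sums)
    print(f"{name}: {non_fp} of {total} = {share:.2%}")
# fpflags: 19 of 6794 = 0.28%
# comments: 1408 of 13976 = 10.07%
```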

In [491]:
koi_disposition_columns = ['koi_disposition_CONFIRMED','koi_disposition_CANDIDATE','koi_disposition_FALSE POSITIVE']

def onehot_flag_corr( regex='koi_disposition_|koi_score|koi_fpflag_|koi_comment_' ) -> pd.DataFrame:
    """Correlate the columns matching `regex` against the one-hot koi_disposition_* columns."""
    corr = (
        onehot_dataset
            .filter(regex=regex).astype('int32')   # NOTE: the int cast also truncates a fractional koi_score to 0 or 1
            .corr()
            .filter(regex='koi_disposition_')      # keep only the disposition columns
            .round(2)
            .drop(index=koi_disposition_columns)   # drop the disposition rows (self-correlations)
    )

    # Sort rows by maximum absolute correlation, and reorder columns in conceptual order
    return corr.reindex(
        columns=koi_disposition_columns,
        index=corr
            .abs()
            .max(axis=1)
            .sort_values(ascending=False)
            .index
    )

def display_corr( corr: pd.DataFrame ) -> None:
    """Display a correlation table with a diverging colour gradient."""
    display( corr.style.background_gradient(cmap='coolwarm', high=0.1, low=-0.1) )

In [492]:
display_corr( onehot_flag_corr('koi_disposition_|koi_fpflag_|koi_score') )

,koi_disposition_CONFIRMED,koi_disposition_CANDIDATE,koi_disposition_FALSE POSITIVE
koi_fpflag_ss,-0.3,-0.32,0.54
koi_score,0.53,0.01,-0.46
koi_fpflag_co,-0.28,-0.29,0.49
koi_fpflag_nt,-0.24,-0.25,0.43
koi_fpflag_ec,-0.21,-0.21,0.36


Observations:

- koi_score: is [defined](https://exoplanetarchive.ipac.caltech.edu/docs/API_kepcandidate_columns.html) as the fraction of Monte Carlo iterations where the Robovetter yields a disposition of CANDIDATE. Almost by definition, it is therefore positively correlated with CONFIRMED (0.53) and anti-correlated with FALSE POSITIVE (-0.46). CANDIDATE is almost completely uncorrelated with koi_score (0.01).

- koi_fpflags: each flag correlates positively with FALSE POSITIVE (0.36 to 0.54) and shares an anti-correlation of roughly half that strength with both CONFIRMED and CANDIDATE. This matches the previous observation that the vast majority (99.7%) of fpflags indicate a FALSE POSITIVE.
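
The half-strength anti-correlation is what one would expect from a flag that is set only on FALSE POSITIVE rows, because the remaining rows are split between two classes of similar size. A minimal sketch with synthetic data, assuming class sizes of roughly 2,300 / 2,400 / 4,800 (approximated as sum / mean from the frequency tables above) and a hypothetical flag set on 40% of FALSE POSITIVE rows:

```python
import numpy as np
import pandas as pd

# Approximate class sizes (assumption: sum / mean from the frequency tables, rounded)
sizes = {"CONFIRMED": 2300, "CANDIDATE": 2400, "FALSE POSITIVE": 4800}
labels = np.repeat(list(sizes), list(sizes.values()))

# One-hot encode the synthetic disposition labels
dummies = pd.get_dummies(pd.Series(labels, name="disposition")).astype(int)

# Hypothetical flag: set on the first 40% of FALSE POSITIVE rows, never elsewhere
flag = np.zeros(len(labels), dtype=int)
fp_rows = np.flatnonzero(labels == "FALSE POSITIVE")
flag[fp_rows[: int(0.4 * len(fp_rows))]] = 1

# Pearson correlation of the flag with each disposition indicator
toy_corr = dummies.corrwith(pd.Series(flag)).round(2)
print(toy_corr)
```

The flag comes out positively correlated with FALSE POSITIVE and negatively correlated, at a little over half the magnitude, with each of the other two classes, which is the same pattern as in the fpflag table.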

## Correlation Matrices

In [497]:
# Per-disposition thresholds: (lower bound for positive, upper bound for negative correlations)
limits = {
    "koi_disposition_CONFIRMED":      (0.02, -0.22),
    "koi_disposition_CANDIDATE":      (0.03, -0.22),
    "koi_disposition_FALSE POSITIVE": (0.10, -0.02)
}
comment_corr = onehot_flag_corr('koi_disposition_|koi_comment_')  # compute once, reuse below

for column in koi_disposition_columns:
    upper, lower = limits[column]
    print(f"Top correlations: {column}: >= {upper} | {lower} >=")
    display_corr(
        comment_corr
            .where(lambda df: (df[column] >= upper) | (df[column] <= lower) ).dropna()
            .sort_values(by=column, ascending=False)
    )

print("Top correlations differences between CONFIRMED and CANDIDATE: >= 0.05")
display_corr(
    comment_corr
        .where(lambda df: (df['koi_disposition_CONFIRMED'] - df['koi_disposition_CANDIDATE'] ).abs() >= 0.05 ).dropna()
        # sort explicitly by the FALSE POSITIVE correlation instead of reusing the loop variable
        .sort_values(by='koi_disposition_FALSE POSITIVE', ascending=False)
)

Top correlations: koi_disposition_CONFIRMED: >= 0.02 | -0.22 >=


,koi_disposition_CONFIRMED,koi_disposition_CANDIDATE,koi_disposition_FALSE POSITIVE
koi_comment_CENT_KIC_POS,0.05,0.04,-0.07
koi_comment_PLANET_OCCULT_ALT,0.02,-0.01,-0.0
koi_comment_INDIV_TRANS_MARSHALL_ZUMA,0.02,-0.01,-0.01
koi_comment_MOD_SEC_ALT,-0.22,-0.23,0.39


Top correlations: koi_disposition_CANDIDATE: >= 0.03 | -0.22 >=


,koi_disposition_CONFIRMED,koi_disposition_CANDIDATE,koi_disposition_FALSE POSITIVE
koi_comment_CENT_FEW_MEAS,-0.04,0.08,-0.04
koi_comment_CENT_KIC_POS,0.05,0.04,-0.07
koi_comment_TOO_FEW_QUARTERS,-0.05,0.04,0.01
koi_comment_INDIV_TRANS_CHASES_SKYE,-0.01,0.03,-0.02
koi_comment_TOO_FEW_CENTROIDS,-0.02,0.03,-0.01
koi_comment_MOD_SEC_ALT,-0.22,-0.23,0.39


Top correlations: koi_disposition_FALSE POSITIVE: >= 0.1 | -0.02 >=


,koi_disposition_CONFIRMED,koi_disposition_CANDIDATE,koi_disposition_FALSE POSITIVE
koi_comment_MOD_SEC_ALT,-0.22,-0.23,0.39
koi_comment_EPHEM_MATCH,-0.2,-0.21,0.35
koi_comment_MOD_SEC_DV,-0.2,-0.2,0.34
koi_comment_HAS_SEC_TCE,-0.2,-0.2,0.34
koi_comment_CENT_RESOLVED_OFFSET,-0.2,-0.19,0.33
koi_comment_HALO_GHOST,-0.18,-0.18,0.31
koi_comment_DEEP_V_SHAPED,-0.18,-0.15,0.29
koi_comment_MOD_ODDEVEN_ALT,-0.13,-0.13,0.22
koi_comment_LPP_DV,-0.12,-0.12,0.21
koi_comment_MOD_ODDEVEN_DV,-0.12,-0.12,0.2


Top correlations differences between CONFIRMED and CANDIDATE: >= 0.05


,koi_disposition_CONFIRMED,koi_disposition_CANDIDATE,koi_disposition_FALSE POSITIVE
koi_comment_CENT_FEW_DIFFS,-0.12,0.02,0.08
koi_comment_EYEBALL,-0.06,0.02,0.03
koi_comment_TOO_FEW_QUARTERS,-0.05,0.04,0.01
koi_comment_TOO_FEW_CENTROIDS,-0.02,0.03,-0.01
koi_comment_CENT_FEW_MEAS,-0.04,0.08,-0.04
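
The threshold filtering used for these tables can be tried in isolation. A minimal sketch on five rows copied from the outputs above, using a plain boolean mask with `.loc` instead of `where(...).dropna()`; the helper name `top_correlations` and the variable `sample_corr` are made up for illustration:

```python
import pandas as pd

cols = ["koi_disposition_CONFIRMED", "koi_disposition_CANDIDATE", "koi_disposition_FALSE POSITIVE"]

# Five rows copied from the correlation tables above
sample_corr = pd.DataFrame(
    [[-0.22, -0.23,  0.39],
     [ 0.05,  0.04, -0.07],
     [-0.20, -0.21,  0.35],
     [-0.12,  0.02,  0.08],
     [-0.06,  0.02,  0.03]],
    index=["koi_comment_MOD_SEC_ALT", "koi_comment_CENT_KIC_POS", "koi_comment_EPHEM_MATCH",
           "koi_comment_CENT_FEW_DIFFS", "koi_comment_EYEBALL"],
    columns=cols,
)

def top_correlations(corr: pd.DataFrame, column: str, upper: float, lower: float) -> pd.DataFrame:
    """Keep rows whose correlation with `column` is >= upper or <= lower, sorted descending."""
    mask = (corr[column] >= upper) | (corr[column] <= lower)
    return corr.loc[mask].sort_values(by=column, ascending=False)

# Same thresholds as used for CONFIRMED above
confirmed_top = top_correlations(sample_corr, "koi_disposition_CONFIRMED", 0.02, -0.22)

# Rows where CONFIRMED and CANDIDATE differ by at least 0.05
diff = (sample_corr["koi_disposition_CONFIRMED"] - sample_corr["koi_disposition_CANDIDATE"]).abs()
most_different = sample_corr.loc[diff >= 0.05]

print(confirmed_top)
print(most_different)
```

On this subset, the CONFIRMED filter keeps `CENT_KIC_POS` and `MOD_SEC_ALT`, and the difference filter keeps `CENT_FEW_DIFFS` and `EYEBALL`, in line with the full tables.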


Results:
- A few comment flags positively correlate (weakly) with CONFIRMED
    - CENT_KIC_POS
    - PLANET_OCCULT_ALT
    - INDIV_TRANS_MARSHALL_ZUMA
- More comment flags positively correlate (weakly) with CANDIDATE
    - CENT_KIC_POS
    - CENT_FEW_MEAS
    - TOO_FEW_QUARTERS
    - INDIV_TRANS_CHASES_SKYE
    - TOO_FEW_CENTROIDS
- The comment flags that are most different between CONFIRMED and CANDIDATE are:
    - CENT_FEW_DIFFS
    - EYEBALL
    - TOO_FEW_QUARTERS
    - TOO_FEW_CENTROIDS
    - CENT_FEW_MEAS    
- The fields that correlate most strongly with FALSE POSITIVE are:
    - MOD_SEC_ALT
    - EPHEM_MATCH
    - MOD_SEC_DV
    - HAS_SEC_TCE
    - CENT_RESOLVED_OFFSET
    - HALO_GHOST
    - DEEP_V_SHAPED
    
Observations:
- The vast majority of comment flags correlate positively with FALSE POSITIVE and show an anti-correlation of roughly half that strength with both CONFIRMED and CANDIDATE
- As observed above, 90% of comments are used to label FALSE POSITIVE
- Several of the fields that correlate most with CANDIDATE (CENT_FEW_MEAS, TOO_FEW_QUARTERS, TOO_FEW_CENTROIDS) are also among those with the greatest difference in correlation from CONFIRMED
    - These all seem to flag having too little information about a KOI
- According to the [documentation](https://exoplanetarchive.ipac.caltech.edu/docs/API_kepcandidate_columns.html), a FALSE POSITIVE can occur when:
    1. the KOI is in reality an eclipsing binary star
    2. the Kepler light curve is contaminated by a background eclipsing binary
    3. stellar variability is confused for coherent planetary transits
    4. instrumental artifacts are confused for coherent planetary transits
- The FALSE POSITIVE flags seem to be related to tests for these conditions
- The CONFIRMED flags seem to be related to tests for ruling out these conditions
