# Sibling reidentification

This notebook can be used for replicating the chemical tagging figure in the paper.

The notebook has two parts. The first introduces the scripts that calculate, for each star, how many other stars are more chemically similar to it than its sibling is. The second shows how to use this information to replicate the figure used in the paper.

## Estimating "stellar doppelgangers"

The number of chemical doppelgangers can be estimated by running the script ```/tagging/scripts/xx``` for the factorDis and faderDis methods, which gets the distances for an existing neural network model, or the script ```/tagging/scripts/xx``` for the polyDis method. The cells below call ```estimate_neural_doppelgangers.py``` for the neural network models and ```estimate_polynomial_doppelgangers.py``` for polyDis.
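
As a rough illustration of what "more chemically similar than their sibling" means, the following minimal sketch counts doppelgangers from a latent representation. It is not the code of the scripts above: the function `count_doppelgangers`, the Euclidean metric, and the assumption that each star's sibling is stored in the same array are all illustrative choices made here.

```python
import numpy as np


def count_doppelgangers(latents, sibling_index):
    """Count, for each star, how many other stars lie closer than its sibling.

    latents: (n_stars, n_dims) array, a hypothetical latent representation.
    sibling_index: (n_stars,) int array; sibling_index[i] is the row of star i's sibling.
    """
    latents = np.asarray(latents, dtype=float)
    sibling_index = np.asarray(sibling_index)
    n_stars = len(latents)
    counts = np.empty(n_stars, dtype=int)
    for i in range(n_stars):
        # Euclidean distance from star i to every star in the sample (assumed metric)
        distances = np.linalg.norm(latents - latents[i], axis=1)
        sibling_distance = distances[sibling_index[i]]
        closer = distances < sibling_distance
        closer[i] = False  # a star is not its own doppelganger
        counts[i] = np.count_nonzero(closer)
    return counts


# Toy data: stars 0 and 1 are siblings, as are stars 2 and 3.
toy_latents = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.1], [5.0, 5.0]])
toy_siblings = np.array([1, 0, 3, 2])
toy_counts = count_doppelgangers(toy_latents, toy_siblings)
print(toy_counts)
```

A count of zero means the sibling is the nearest neighbour, which is the ideal outcome for chemical tagging.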

In [None]:
! python ../scripts/calculate_neural_latent_distances.py --help

In [None]:
! python ../scripts/estimate_neural_doppelgangers.py --data_file /share/splinter/ddm/taggingProject/taggingClean/data/final/train/spectra_noiseless.pd --model_file /share/splinter/ddm/taggingProject/taggingRepo/outputs/results_fader/run1/adN7214I800 --n_conditioned 2 --savepath "doppelgangers"

In [None]:
! python ../scripts/estimate_neural_doppelgangers.py --data_file /share/splinter/ddm/taggingProject/taggingClean/data/final/train/spectra_noiseless.pd --model_file conditional_parallel_decoder1.p --savepath "doppelgangers" --n_conditioned 2 --n_bins 1000 


In [None]:
! python ../scripts/estimate_polynomial_doppelgangers.py --data_file /share/splinter/ddm/taggingProject/taggingClean/data/final/train/spectra_noiseless.pd --n_pca 50 --n_degree 4 --savepath "doppelgangers_poly"  --n_bins 1000

## Plotting the doppelgangers

The code below plots these doppelgangers as was done in the paper. By default, it uses precalculated outputs found in ```outputs/distances/```; feel free to replace them with those you calculated yourself in the first part of this notebook.
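
The panels are cumulative histograms: each curve gives the fraction of stars whose doppelganger count is at most the value on the x axis. The sketch below computes that fraction directly for a few thresholds. The helper `fraction_within` and the toy ranks are hypothetical, and the sketch assumes the stored rankings are 1-based, which is suggested by the subtraction of one in `load_ranking` but not confirmed elsewhere in the notebook.

```python
import numpy as np


def fraction_within(n_doppelgangers, thresholds):
    """Fraction of stars with at most `threshold` doppelgangers, for each threshold."""
    counts = np.sort(np.asarray(n_doppelgangers))
    # searchsorted with side="right" returns how many entries are <= threshold
    return np.searchsorted(counts, thresholds, side="right") / len(counts)


# Hypothetical 1-based sibling ranks, shifted down by one as in load_ranking.
toy_ranks = [1, 1, 2, 5, 11, 101]
toy_counts = [r - 1 for r in toy_ranks]
toy_fractions = fraction_within(toy_counts, [0, 1, 10, 1000])
print(toy_fractions)
```

A curve that rises quickly at small counts indicates that most siblings are recovered with few or no doppelgangers.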

In [None]:
import numpy as np
import pickle
import matplotlib.pyplot as plt
import matplotlib


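def load_ranking(folder,SNR):
    """Load the pickled rankings for one method (folder) and one noise level (SNR)."""
    with open('../../outputs/rankings/{}/rankings{}.p'.format(folder,SNR), 'rb') as handle: 
        ranking = pickle.load(handle)
    # shift every stored value down by one before plotting
    ranking = [(i-1) for i in ranking]
    return ranking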


folders = ["factorDis","polyDis","factorDiswZ","faderDis","faderDiswZ"]
SNRs = [0,100,50,30]  # 0 stands for infinite SNR (no noise), for readability
rankings = {}
for folder in folders:
    rankings[folder]={}
    for SNR in SNRs:
        rankings[folder][str(SNR)] = load_ranking(folder,SNR)



bins = np.logspace(np.log10(10),np.log10(50000))
bins = np.concatenate((np.arange(0,10),bins))


fig, axes = plt.subplots(2,3,sharex=True, sharey=True,gridspec_kw={'hspace': 0, 'wspace': 0})


def plot_axis(ax,rankings,SNRs,bins,text):
    """Draw one cumulative step histogram per SNR on ax and label the panel with text."""
    for i in range(len(SNRs)):
        SNR = SNRs[i]
        if SNR == 0:
            label = r"no noise"
        else:
            label =r"SNR={}".format(SNR)
        n, bins, patches = ax.hist(rankings[str(SNR)], bins=bins, density=True, histtype='step',cumulative=True, label=label)
    ax.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
    ax.set(xscale="log")
    ax.text(0.7, 0.08, text, transform=ax.transAxes, size=16)
    ax.minorticks_on()
    ax.grid(which="both")
    ax.grid(which='minor', alpha=0.2)
    ax.grid(which='major', alpha=0.5)
    ax.set_ylim(0,1)  # set the limits on the axis being drawn, not on the current pyplot axis
    return ax

axes[0,0] = plot_axis(axes[0,0],rankings["faderDis"],SNRs,bins,"a")
yticks = axes[0,0].yaxis.get_major_ticks() 
yticks[0].label1.set_visible(False)
axes[1,0] = plot_axis(axes[1,0],rankings["faderDiswZ"],SNRs,bins,"c")
axes[0,1] = plot_axis(axes[0,1],rankings["factorDis"],SNRs,bins,"b")
axes[1,1] = plot_axis(axes[1,1],rankings["factorDiswZ"],SNRs,bins,"d")
axes[1,2] = plot_axis(axes[1,2],rankings["polyDis"],SNRs,bins,"e")
#ax3.legend(loc=(0.4,0.4))
#axes[1,2].legend(loc="upper right",bbox_to_anchor=(0., 1.2))
#axes[1,2].legend(loc=((0.18,1.3)),borderpad=3)
axes[1,2].legend(loc=((-0.1,1.1)),borderpad=3,frameon=False)


fig.text(0.5,0.04, "$N_{doppelganger}$", ha="center", va="center")
#fig.text(0.03, 0.5, 'p', va='center', rotation='vertical',fontsize=16)

fig.text(0.035, 0.3, 'with [Fe/H]', va='center', rotation='vertical',fontsize=11)
fig.text(0.05, 0.5, 'p', va='center', rotation='vertical',fontsize=14)
fig.text(0.035, 0.7, 'without [Fe/H]', va='center', rotation='vertical',fontsize=11)
#fig.text(0.035, 0.7, 'p', va='center', rotation='vertical',fontsize=14)
fig.text(0.185,0.9,"FaderDis")
fig.text(0.44,0.9,"FactorDis")
fig.text(0.72,0.9,"PolyDis")

