# DC1 PIT

This short notebook compares PIT (probability integral transform) values computed now with those computed in the past for the DC1 paper.

Three sets of PIT values are compared:

1. PITs computed now using the `qp` CDF method
2. PITs computed now using `scipy.integrate.quad`
3. PITs computed in the past, at the time of writing the DC1 paper (reference sample)

A 1% subset of the data is used for illustration, because computing the PITs with `scipy.integrate.quad` would take many hours for the ~399k galaxies in the DC1 sample.
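
To make the two approaches concrete, here is a minimal sketch on a synthetic Gaussian PDF. It is not the code inside `Sample` or `qp`: the function names `pit_from_grid_cdf` and `pit_from_quad`, the grid, and the example redshift are all hypothetical, and the grid-based CDF is only an assumed stand-in for a tabulated-CDF method. The PIT is the CDF of the photo-z PDF evaluated at the true redshift.

```python
import numpy as np
from scipy.integrate import quad


def pit_from_grid_cdf(z_grid, pdf, z_true):
    """PIT from a trapezoid-rule CDF tabulated on the grid, then interpolated."""
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    # Dividing by cdf[-1] normalizes the PDF to unit area.
    return np.interp(z_true, z_grid, cdf) / cdf[-1]


def pit_from_quad(z_grid, pdf, z_true):
    """PIT from adaptive quadrature of the linearly interpolated PDF."""
    norm = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid))
    area, _ = quad(lambda z: np.interp(z, z_grid, pdf),
                   z_grid[0], z_true, limit=200)
    return area / norm


# Synthetic example (assumed values, not DC1 data).
z_grid = np.linspace(0.0, 2.0, 201)
pdf = np.exp(-0.5 * ((z_grid - 0.5) / 0.1) ** 2)
z_true = 0.537

pit_grid = pit_from_grid_cdf(z_grid, pdf, z_true)
pit_quad = pit_from_quad(z_grid, pdf, z_true)
print(pit_grid, pit_quad, pit_grid - pit_quad)
```

In this sketch the grid version needs only a cumulative sum and one interpolation per object, while the quadrature version runs an adaptive integrator for every object, which is the kind of per-galaxy cost that adds up over a large sample.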

In [None]:
from sample import Sample
import matplotlib.pyplot as plt
import numpy as np
import os
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [None]:
my_path = "/Users/julia/TESTDC1FLEXZ"
pdfs_file = os.path.join(my_path, "1pct_Mar5Flexzgold_pz.out")
ztrue_file = os.path.join(my_path, "1pct_Mar5Flexzgold_idszmag.out")
oldpitfile = os.path.join(my_path, "1pct_TESTPITVALS.out")  # TESTPITVALS.out masked by object ids in 1pct files

In [None]:
%%time
sample_qp_pit = Sample(pdfs_file, ztrue_file, code="FlexZBoost", name="DC1 paper data", qp_pit=True)
print(sample_qp_pit)

In [None]:
%%time
pit_qp = sample_qp_pit.pit

In [None]:
%%time
sample_scipy_pit = Sample(pdfs_file, ztrue_file, code="FlexZBoost", name="DC1 paper data", qp_pit=False)
print(sample_scipy_pit)

In [None]:
%%time
pit_scipy = sample_scipy_pit.pit

In [None]:
# Skip the first line of the file and read only the second column.
pit_dc1 = np.loadtxt(oldpitfile, skiprows=1, usecols=1)

In [None]:
plt.figure(figsize=[7, 5])
# Top panel: qp PITs minus the DC1 reference PITs.
plt.subplot(211)
plt.plot(pit_dc1, pit_qp - pit_dc1, 'k,')
plt.plot([0, 1], [0, 0], 'r--', lw=3)  # zero-difference line
plt.xlabel("PIT DC1")
plt.ylabel("PIT$_{qp}$ - PIT DC1")
plt.xlim(0, 1)
plt.ylim(-0.01, 0.01)
# Bottom panel: scipy PITs minus the DC1 reference PITs.
plt.subplot(212)
plt.plot(pit_dc1, pit_scipy - pit_dc1, 'k,')
plt.plot([0, 1], [0, 0], 'r--', lw=3)  # zero-difference line
plt.xlabel("PIT DC1")
plt.ylabel("PIT$_{scipy}$ - PIT DC1")
plt.xlim(0, 1)
plt.ylim(-0.01, 0.01)
plt.tight_layout()

**Conclusion:** PITs computed with `qp` show small discrepancies relative to the DC1 reference values, whereas the discrepancies are negligible when the `scipy` integration method is used. However, the `scipy` method is less efficient than `qp` and can make the computation infeasible for larger datasets.
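
The visual comparison can be backed by a few summary numbers. The helper below is a hypothetical sketch (the name `summarize_pit_offsets` and the tolerance `tol=1e-3` are assumptions chosen for illustration, not part of `Sample`), shown here on made-up values rather than on the DC1 arrays.

```python
import numpy as np


def summarize_pit_offsets(pit_new, pit_ref, tol=1e-3):
    """Summarize element-wise offsets between two PIT arrays of equal length."""
    diff = np.asarray(pit_new, dtype=float) - np.asarray(pit_ref, dtype=float)
    return {
        "max_abs": float(np.max(np.abs(diff))),
        "rms": float(np.sqrt(np.mean(diff ** 2))),
        # Fraction of objects whose offset exceeds the (arbitrary) tolerance.
        "frac_above_tol": float(np.mean(np.abs(diff) > tol)),
    }


# Made-up values for illustration only.
pit_ref_demo = np.array([0.1, 0.5, 0.9])
pit_new_demo = pit_ref_demo + np.array([0.0, 0.002, -0.0005])
stats_demo = summarize_pit_offsets(pit_new_demo, pit_ref_demo)
print(stats_demo)
```

In this notebook, the same call could be applied as `summarize_pit_offsets(pit_qp, pit_dc1)` and `summarize_pit_offsets(pit_scipy, pit_dc1)`, assuming the arrays have the same length and object ordering.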