#  Creating the Correlation vs GPS Time Dataframe

**Paolo Marcoccia<sup>1</sup>, Felicia Frederiksson<sup>2</sup>, Alex B. Nielsen<sup>1</sup> and Germano Nardini<sup>1</sup>**

<sub>1. University of Stavanger, Institutt for Matematikk og Fysikk, Kjølv Egelands hus, 5.etg, E-blokk, 4021 Stavanger, Norway </sub> <br>
<sub>2. University of Uppsala, Department of Physics and Astronomy, Ångströmlaboratoriet, Lägerhyddsvägen 1, 751 20 Uppsala, Sweden</sub> 

We encourage use of these data in derivative works. If you use the material provided here, please cite [our paper]().

In this notebook, we will learn how to generate the [CorrVsTime.csv](https://github.com/GravWaves-IMF/Correlation-Method-first-2019-/blob/master/Code/GW151012Final/GW151012data.py) dataframe containing the Correlation vs GPS Time for the analyzed events.
The pipeline is described here for the event _GW151012_, but it can easily be applied to all the other events; the cells shown below were in fact executed in the _GW170104_ directory, and switching event only requires changing the directory and the event script.
To run this pipeline, you first need the _residuals.hdf_ file produced by the [CreateResiduals.ipynb](https://github.com/gwastro/gw150914_investigation/blob/master/CreateResiduals.ipynb) notebook.
Let's start by moving into the directory of the event we wish to analyze:

In [1]:
cd GW170104/


/home/kuza91/Documents/IPyNB/GWO1/GW170104


Each event directory contains an [init_module.py](https://github.com/GravWaves-IMF/Correlation-Method-first-2019-/blob/master/Code/init_module.py) and a [GW*event_name*.py](https://github.com/GravWaves-IMF/Correlation-Method-first-2019-/blob/master/Code/GW151012Final/GW151012data.py) script: the first one loads all the modules needed to run the pipeline, while the second one sets some local variables specific to the chosen event.
Let's run both of them (uncomment the line corresponding to the event you want to analyze):

In [2]:
%run init_module.py
#%run GW150914data.py
#%run GW151012data.py
#%run GW151226data.py
%run GW170104data.py

segment length: 16.0


Let's also initialize some empty dictionaries:


In [3]:
data, psd_data, ts, ts3, waveforms, psds = {},{},{},{},{},{}

We will analyze the correlation between the two detectors in a _200 ms_ time interval centered around the coalescence time claimed by _LIGO_.
The correlation is evaluated roughly every _0.1 ms_ (strictly, every 0.2/1999 s, since <em>np.linspace</em> includes both endpoints), so _2000_ correlations are computed in total.
Let's generate some auxiliary vectors to do so:

In [4]:
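rangenum = range(2000)               # indices of the 2000 time shifts
timespan = np.linspace(0.,0.2,2000)  # time shifts from 0 s to 0.2 s, both endpoints included
void = np.zeros(2000)                # zero placeholder for the correlation columns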
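To make the step size concrete, here is a small standalone check (NumPy only; the names mirror the notebook's variables, but nothing is loaded from the event scripts). It shows that <em>np.linspace(0., 0.2, 2000)</em> spaces the shifts by 0.2/1999 s, which is also why the progress log of the parallel run prints 5.0025... rather than 5.0. Passing <em>endpoint=False</em> would give an exact 0.1 ms step instead, at the price of a grid that no longer matches the one used in this notebook.

```python
import numpy as np

n_shifts = 2000
# Same grid as the notebook: both endpoints (0 s and 0.2 s) are included
timespan = np.linspace(0.0, 0.2, n_shifts)
step = timespan[1] - timespan[0]  # 0.2 / 1999 s, i.e. about 0.10005 ms

# Progress value printed by CorrAnal at i = 100, using its formula (Timeshift * 100) / 0.2
progress_at_100 = timespan[100] * 100 / 0.2  # about 5.0025, as in the log

# Alternative grid with an exact 0.1 ms step (the 0.2 s endpoint is excluded)
exact_grid = np.linspace(0.0, 0.2, n_shifts, endpoint=False)
exact_step = exact_grid[1] - exact_grid[0]  # 1e-4 s

print(step, progress_at_100, exact_step)
```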

Now we need to load both the strain data of the detectors around the time of the event and the residual data obtained by running the [CreateResiduals.ipynb](https://github.com/gwastro/gw150914_investigation/blob/master/CreateResiduals.ipynb) notebook.
The strain data of the events can be downloaded from the [GWOSC](https://www.gw-openscience.org/catalog/GWTC-1-confident/) website; however, depending on how old your copy of the strain data is, the files may carry either the _LOSC_ or the _GWOSC_ header name.
In our case, the data of _GW150914_ was saved with the _LOSC_ header name, while the data of the other events was saved with the _GWOSC_ header. To avoid failures when loading the strain data, check the file names carefully and set the <em>hd_nm</em> variable in the [GW*event_name*.py](https://github.com/GravWaves-IMF/Correlation-Method-first-2019-/blob/master/Code/GW151012Final/GW151012data.py) script accordingly.
Furthermore, the [res.py](https://github.com/GravWaves-IMF/Correlation-Method-first-2019-/blob/master/Code/res.py) file provides two loaders: <em>res.get_LSC_Full_strain()</em>, written for files with the _LOSC_ header, and <em>res.get_GWSC_Full_strain()</em>, written for files with the _GWOSC_ header.
Since every event other than _GW150914_ uses the _GWOSC_ header, in our case the loading step reads:

In [5]:
# Use the first loader for GW150914 (LOSC header)

#strain = res.get_LSC_Full_strain(hd_nm,psd_start_time-pad_data, psd_end_time + pad_data)

# Use the second loader for all the other events (GWOSC header)

strain = res.get_GWSC_Full_strain(hd_nm,psd_start_time-pad_data, psd_end_time + pad_data)

# Load the residual strain (requires the residuals.hdf file)
resstrain = res.get_Full_residual_strain()
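Choosing the right loader by hand is easy to get wrong. A hypothetical helper that infers the header from the file name is sketched below; it assumes that the downloaded files follow the usual naming in which the header appears as `_LOSC_` or `_GWOSC_` (the file names used here are illustrative, with `<gps>` standing for the GPS start time). The exact value expected by <em>hd_nm</em> in the event scripts is not shown in this notebook, so the helper only returns a plain label and the name of the matching <em>res.py</em> loader.

```python
import os

# Hypothetical mapping from file-name header to the matching loader in res.py
LOADERS = {"LOSC": "get_LSC_Full_strain", "GWOSC": "get_GWSC_Full_strain"}


def detect_header(path):
    """Return 'GWOSC' or 'LOSC' from a strain file name (assumed naming convention)."""
    name = os.path.basename(path)
    if "_GWOSC_" in name:
        return "GWOSC"
    if "_LOSC_" in name:
        return "LOSC"
    raise ValueError("cannot infer the header name from {!r}".format(name))


# Illustrative file names only
for f in ["H-H1_LOSC_4_V2-<gps>-32.hdf5", "L-L1_GWOSC_4KHZ_R1-<gps>-32.hdf5"]:
    header = detect_header(f)
    print(f, "->", header, "->", "res." + LOADERS[header] + "()")
```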

Let's also initialize an auxiliary dataframe, which will be used to store the correlation data:

In [6]:
df = pd.DataFrame({'GPSTime' : basetime + timespan, 
     'Timeshift' : timespan,
     'SigCorr' : void,
     'ResCorr' : void,
     'DiffCorr' : void})

Now let's cut the data down to ±75 s around the event and whiten it in the frequency band between <em>f_low</em> and <em>f_high</em>; the values of these two variables are set for each event in the [GW*event_name*.py](https://github.com/GravWaves-IMF/Correlation-Method-first-2019-/blob/master/Code/GW151012Final/GW151012data.py) script:  

In [7]:
# Keep 75 s of data on each side of tc, for both the strain and the residuals
for ifo in ifos:
    ts[ifo] = strain[ifo].time_slice((tc - 75.), (tc + 75.))
    ts3[ifo] = resstrain[ifo].time_slice((tc - 75.), (tc + 75.))

ts = res.whiten(ts, f_low, f_high)
ts3 = res.whiten(ts3, f_low, f_high)
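<em>res.whiten</em> is part of the authors' <em>res.py</em> and its internals are not shown here. As a generic illustration of what whitening and band-limiting do, the sketch below (NumPy only, hypothetical function name <em>whiten_bandpass</em>) estimates the PSD with a simple averaged periodogram, divides the Fourier transform of the data by the amplitude spectral density and zeroes every frequency outside [f_low, f_high]. It is not claimed to reproduce <em>res.whiten</em>: real pipelines usually also taper the data edges and use overlapping Welch segments.

```python
import numpy as np


def whiten_bandpass(x, fs, f_low, f_high, nseg=4096):
    """Whiten x by its own averaged-periodogram PSD and keep only f_low..f_high (Hz)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < nseg:
        raise ValueError("time series shorter than one PSD segment")
    # Average the Hann-windowed periodograms of non-overlapping segments (one-sided PSD)
    win = np.hanning(nseg)
    segments = x[: (n // nseg) * nseg].reshape(-1, nseg)
    periodograms = np.abs(np.fft.rfft(segments * win, axis=1)) ** 2
    psd = 2.0 * periodograms.mean(axis=0) / (fs * np.sum(win ** 2))
    # Interpolate the PSD onto the frequency grid of the full-length FFT
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd_full = np.interp(freqs, np.fft.rfftfreq(nseg, d=1.0 / fs), psd)
    # Divide by the amplitude spectral density and zero everything outside the band
    xf = np.fft.rfft(x) / np.sqrt(psd_full)
    xf[(freqs < f_low) | (freqs > f_high)] = 0.0
    return np.fft.irfft(xf, n=n)


# Usage on synthetic Gaussian noise sampled at 4096 Hz
rng = np.random.default_rng(0)
fs = 4096
x = rng.normal(size=4 * fs)
y = whiten_bandpass(x, fs, 30.0, 300.0)
```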

We can finally define the function that estimates the correlation between the detectors as a function of GPS time, i.e. at each time shift <em>basetime + Timeshift</em>. We write it so that it can be parallelized with the _multiprocessing_ module:


In [8]:
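def CorrAnal(i):
    # Print the progress every 100 steps and at the last step
    if i % 100 == 0 or i == 1999:
        print("Percentage of completition : {}".format((df.Timeshift[i] * 100) / 0.2))
    shift = df.Timeshift[i]
    # Cross-correlate H1 and L1 for the whitened strain and for the whitened residuals
    tau, corr = res.cross_correlation(ts['H1'], ts['L1'], basetime + shift)
    tau, corr_null = res.cross_correlation(ts3['H1'], ts3['L1'], basetime + shift)
    # Extract the correlation values (timedl and tdlerr come from the event script)
    sig_corr = res.corr_wsgn_near_ml(corr, timedl, tdlerr)
    res_corr = res.corr_wsgn_near_ml(corr_null, timedl, tdlerr)
    # Signed difference: negative when the residual correlation is larger in magnitude
    diff_corr = abs(sig_corr - res_corr)
    if abs(res_corr) > abs(sig_corr):
        diff_corr = -diff_corr
    # Worker processes cannot modify the main process's df, so return the values instead
    return [shift, sig_corr, res_corr, diff_corr]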
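The internals of <em>res.cross_correlation</em> and <em>res.corr_wsgn_near_ml</em> are not shown in this notebook. As a hedged, generic illustration of the underlying idea, the sketch below computes the Pearson correlation between a window of H1 data and the L1 data shifted by every lag up to ±max_lag (hypothetical function <em>windowed_cross_correlation</em>, with times in seconds from the start of the arrays). It also reproduces the signed <em>DiffCorr</em> convention used in <em>CorrAnal</em>: the absolute difference, made negative when the residual correlation is larger in magnitude than the signal correlation.

```python
import numpy as np


def windowed_cross_correlation(h1, l1, fs, start, width, max_lag):
    """Pearson correlation of h1[start:start+width] with l1 shifted by each lag (seconds)."""
    i0 = int(round(start * fs))
    n = int(round(width * fs))
    m = int(round(max_lag * fs))
    if i0 - m < 0 or i0 + m + n > len(l1) or i0 + n > len(h1):
        raise ValueError("window plus lags exceed the available data")
    a = h1[i0:i0 + n]
    lags = np.arange(-m, m + 1)
    corr = np.empty(len(lags))
    for k, lag in enumerate(lags):
        b = l1[i0 + lag:i0 + lag + n]
        corr[k] = np.corrcoef(a, b)[0, 1]
    return lags / fs, corr


def signed_difference(sig_corr, res_corr):
    """Same sign convention as DiffCorr in CorrAnal."""
    diff = abs(sig_corr - res_corr)
    return -diff if abs(res_corr) > abs(sig_corr) else diff


# Synthetic check: L1 is H1 delayed by 20 samples and sign-flipped
rng = np.random.default_rng(1)
fs = 4096
h1 = rng.normal(size=2 * fs)
l1 = -np.roll(h1, 20)
lags, corr = windowed_cross_correlation(h1, l1, fs, start=0.5, width=0.2, max_lag=0.01)
best = np.argmax(np.abs(corr))
print(lags[best], corr[best])
```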

The <em>CorrAnal</em> function is now run over <em>6</em> worker processes: we chose <em>6</em> because the pipeline was run on a laptop with <em>4</em> physical cores and <em>8</em> hardware threads.
The number of processes can be adapted to the machine in use; note, however, that requesting more processes than available cores may slow the computation down, since the extra processes wait in a queue for CPU time:

In [9]:
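if __name__ == '__main__':
    # Relies on the 'fork' start method, so the workers inherit df, ts, ts3 and the event variables
    with Pool(6) as p:
        # map() returns the results in the same order as rangenum
        results = p.map(CorrAnal, rangenum)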

Percentage of completition : 0.0
Percentage of completition : 5.00250125063
Percentage of completition : 10.0050025013
Percentage of completition : 15.0075037519
Percentage of completition : 20.0100050025
Percentage of completition : 25.0125062531
Percentage of completition : 30.0150075038
Percentage of completition : 35.0175087544
Percentage of completition : 40.020010005
Percentage of completition : 45.0225112556
Percentage of completition : 50.0250125063
Percentage of completition : 55.0275137569
Percentage of completition : 60.0300150075
Percentage of completition : 65.0325162581
Percentage of completition : 70.0350175088
Percentage of completition : 75.0375187594
Percentage of completition : 80.04002001
Percentage of completition : 85.0425212606
Percentage of completition : 90.0450225113
Percentage of completition : 95.0475237619
Percentage of completition : 100.0
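The choice of 6 processes on an 8-thread laptop amounts to leaving two logical CPUs free. A hypothetical helper expressing that rule of thumb with the standard library is sketched below; it is only a heuristic, and the best value still depends on the machine and on what else is running on it.

```python
import os


def choose_n_workers(reserve=2):
    """Hypothetical heuristic: all logical CPUs except `reserve`, but at least one."""
    # os.cpu_count() counts logical CPUs (hardware threads) and may return None
    n_logical = os.cpu_count() or 1
    return max(1, n_logical - reserve)


print(choose_n_workers())  # 6 on a machine with 8 logical CPUs
```

In the notebook it could replace the hard-coded value, e.g. <em>Pool(choose_n_workers())</em>.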


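Since we are using the _multiprocessing_ module (not _MPI_), the iterations are not executed in order: the steps passed to the function through the <em>rangenum</em> array are split into chunks and distributed among the worker processes. Moreover, each worker operates on its own copy of <em>df</em>, so only the returned lists reach the main process.
Hence, we now copy the results back into the dataframe, matching each row through its time shift, before saving it (<em>Pool.map</em> itself returns the results in the same order as <em>rangenum</em>, so the matching mainly acts as a safeguard):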

In [10]:
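# Copy each result [Timeshift, SigCorr, ResCorr, DiffCorr] into the row with the same time shift
for i in rangenum:
    for j in rangenum:
        if df.Timeshift[i] == results[j][0]:
            # .loc writes into df itself (chained assignment may silently act on a copy)
            df.loc[i, ['SigCorr', 'ResCorr', 'DiffCorr']] = results[j][1:]
            break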
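The double loop above performs up to 2000 × 2000 comparisons. A hedged alternative (hypothetical helper <em>fill_from_results</em>, pandas and NumPy only) indexes the returned rows by their time shift in a dictionary, so each row of <em>df</em> needs a single lookup. Like the loop, it relies on exact equality of the time shifts, which holds here because the values returned by the workers are copied from <em>df</em> itself.

```python
import random

import numpy as np
import pandas as pd


def fill_from_results(df, results):
    """Copy rows [Timeshift, SigCorr, ResCorr, DiffCorr] into df, matching on Timeshift."""
    # One dictionary entry per result, keyed by the (exact) time shift
    by_shift = {row[0]: row[1:] for row in results}
    values = np.array([by_shift[t] for t in df["Timeshift"]])
    df.loc[:, ["SigCorr", "ResCorr", "DiffCorr"]] = values
    return df


# Toy example: results arriving in a shuffled order
timespan = np.linspace(0.0, 0.2, 2000)
df = pd.DataFrame({"Timeshift": timespan, "SigCorr": 0.0, "ResCorr": 0.0, "DiffCorr": 0.0})
results = [[t, 2 * t, 3 * t, -t] for t in timespan]
random.Random(0).shuffle(results)
fill_from_results(df, results)
print(df.head())
```

Since <em>Pool.map</em> already preserves the input order, building the columns directly with <em>pd.DataFrame(results, columns=[...])</em> would also work for the parallel run.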

Lastly, we just need to save the dataframe to a _.csv_ file:

In [11]:
df.to_csv('CorrVsTime.csv',index = False)

Your [CorrVsTime.csv](https://github.com/GravWaves-IMF/Correlation-Method-first-2019-/blob/master/Code/GW151012Final/GW151012data.py) dataframe is finally ready to be analyzed!