# Running WDF worker on segments

This notebook shows how to use the WDF class `wdfUnitDSWorker` (module `wdfUnitDSWorker.py`) to produce a trigger list for a .gwf dataset. The class is initialized with a `Parameters` object that configures the processing workflow: parameters for downsampling, whitening and wavelet analysis, and file paths for input and output. A separate `fullPrint` input parameter selects which information to keep in the .csv trigger list. The core method, `segmentProcess`, analyzes a data segment defined by start and end GPS times. It consists of several processing steps:

- Logging and directory setup
- Bandpass downsampling
- (Double) whitening
- Wavelet based detection loop
- Parameter estimation and event logging

Let us see how to use it. First, import the required libraries:

In [25]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from glob import glob
import json
import logging
import coloredlogs
import multiprocessing as mp
from datetime import datetime
from tempfile import NamedTemporaryFile

from wdf.processes.wdfUnitDSWorker import *
from wdf.config.Parameters import Parameters

sys.path.append("../scripts")
import segments
import get_git_repo_root

In [15]:
coloredlogs.install(isatty=True)
matplotlib.rcParams['agg.path.chunksize']=10000

%matplotlib inline

In [18]:
custom_segments = segments.get_list(tmin=1387513845, tmax=1387611370)

results_dir = os.path.join(get_git_repo_root.repo_root(), 'results/')
ffl_list = '/virgoData/ffl/raw.ffl'
f_sampling = 16384.0

In [28]:
# ---- Param (to be converted to a separate file)
configuration = {
    "window": 1024,
    "overlap": 768,
    "threshold": 0.3,
    "len": 10.0,
    "sampling": f_sampling,
    "ResamplingFactor": 8,
    "itf": "V1",                  # itf, run and ID are required by the wdfUnitDSWorker class
    "run": "detchar",             # Name of the run, don't change
    "ID": "demo",
    "dir": results_dir,
    "file": ffl_list,
    "channel": "V1:Hrec_hoft_16384Hz",
    "outdir": results_dir,        # output directory for whitened data, keep same as dir
    "ARorder": 1000,              # Order of the autoregressive model, can set lower for synthetic noise
    "learn": 200,                 # number of seconds at beginning of a segment, used to compute AR parameters
    "preWhite": 3,
    "nproc": 4,
    # "gps": 1263751887.0,
    # "lastGPS": 1265760000.0,
    "segments": custom_segments,
}
 
# Create a temporary JSON file to store the parameters and load its content into the par object
tmpjson = NamedTemporaryFile()
with open(tmpjson.name, 'w') as f:
    json.dump(configuration, f)   # the with block closes f on exit

logging.info("Read parameters from JSON file...")
par = Parameters()

try:
    par.load(tmpjson.name)
    logging.info("Done.")
except Exception:
    # logging.exception records the traceback along with the error message
    logging.exception("Cannot read json file")
    # quit()  # <- Restore in script mode

2024-02-28 15:21:05 servergpu1.virgo.infn.it root[4602] INFO Read parameters from JSON file...
2024-02-28 15:21:05 servergpu1.virgo.infn.it root[4602] INFO Done.

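The configuration values are easier to interpret once converted to physical units. The sketch below does that arithmetic for the sampling-related entries. It assumes that `window` and `overlap` are counted in samples of the downsampled series; the notebook does not state the unit, so treat the two durations as illustrative. The downsampled rate itself matches the 2048 Hz reported in the worker log further down.

```python
# Values copied from the configuration cell above
sampling = 16384.0        # Hz, raw channel rate
resampling_factor = 8
window = 1024             # ASSUMPTION: samples of the downsampled series
overlap = 768             # ASSUMPTION: samples of the downsampled series

downsampled_rate = sampling / resampling_factor   # Hz
stride = window - overlap                         # samples between successive windows

print(f"downsampled rate: {downsampled_rate} Hz")
print(f"window length:    {window / downsampled_rate} s")
print(f"window stride:    {stride / downsampled_rate} s")
```

Under that assumption, each analysis window covers 0.5 s and successive windows start 0.125 s apart.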

## Split segments for parallel processing

In [29]:
seglen = 3600         # Chunk length in seconds; long segments are split to leverage multiple CPU cores
segment_list = []

for seg in par.segments:
    start = seg[0]
    end = seg[1]
    if seglen:
        # Cut full-length chunks while at least 1.5 * seglen remains, so the
        # remainder of a split segment is between 0.5 and 1.5 times seglen
        while (end - start) >= seglen * 1.5:
            segment_list.append([start, start + seglen])
            start += seglen
    segment_list.append([start, end])

par.segments = segment_list
logging.info("Segments imported")

par.segments

2024-02-28 15:21:06 servergpu1.virgo.infn.it root[4602] INFO Segments imported


[[1387513848.0, 1387517448.0],
 [1387517448.0, 1387521048.0],
 [1387521048.0, 1387524648.0],
 [1387524648.0, 1387528248.0],
 [1387528248.0, 1387531848.0],
 [1387531848.0, 1387535448.0],
 [1387535448.0, 1387539048.0],
 [1387539048.0, 1387542648.0],
 [1387542648.0, 1387546248.0],
 [1387546248.0, 1387549848.0],
 [1387549848.0, 1387553448.0],
 [1387553448.0, 1387557048.0],
 [1387557048.0, 1387560648.0],
 [1387560648.0, 1387564248.0],
 [1387564248.0, 1387567848.0],
 [1387567848.0, 1387571448.0],
 [1387571448.0, 1387575048.0],
 [1387575048.0, 1387578648.0],
 [1387578648.0, 1387582248.0],
 [1387582248.0, 1387585848.0],
 [1387585848.0, 1387589448.0],
 [1387589448.0, 1387593048.0],
 [1387593048.0, 1387596648.0],
 [1387596648.0, 1387600248.0],
 [1387600248.0, 1387603848.0],
 [1387603848.0, 1387607448.0],
 [1387607448.0, 1387609426.0]]
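
The splitting rule can be checked in isolation. The sketch below wraps the same loop in a standalone function (`split_segment` is a hypothetical name, not part of WDF) and applies it to the start and end GPS times that appear in the output above.

```python
def split_segment(start, end, seglen=3600):
    """Split [start, end] into chunks of seglen seconds using the 1.5 * seglen rule."""
    chunks = []
    # Keep cutting full chunks while at least 1.5 * seglen is left
    while (end - start) >= seglen * 1.5:
        chunks.append([start, start + seglen])
        start += seglen
    # Whatever remains becomes the final chunk
    chunks.append([start, end])
    return chunks


chunks = split_segment(1387513848.0, 1387609426.0)
print(len(chunks))                     # 27
print(chunks[-1])                      # [1387607448.0, 1387609426.0]
print(chunks[-1][1] - chunks[-1][0])   # 1978.0
```

The 95578 s segment yields 26 chunks of 3600 s plus a final chunk of 1978 s. Because of the 1.5 factor, a segment shorter than 5400 s is left whole rather than split into one full chunk and a short tail.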

## Run WDF worker

In [None]:
# ---- Run multiprocess wdf (without state vectors) ----
wdf = wdfUnitDSWorker(par, fullPrint=3)
with mp.Pool(par.nproc) as p:
    # map blocks until every segment has been processed
    p.map(wdf.segmentProcess, par.segments)

logging.info("Job complete!")

2024-02-28 15:21:07 servergpu1.virgo.infn.it root[6959] INFO Analyzing segment: 1387521048.0-1387524648.0 for channel V1:Hrec_hoft_16384Hz downsampled at 2048Hz
2024-02-28 15:21:07 servergpu1.virgo.infn.it root[6958] INFO Analyzing segment: 1387513848.0-1387517448.0 for channel V1:Hrec_hoft_16384Hz downsampled at 2048Hz
2024-02-28 15:21:07 servergpu1.virgo.infn.it root[6960] INFO Analyzing segment: 1387528248.0-1387531848.0 for channel V1:Hrec_hoft_16384Hz downsampled at 2048Hz
2024-02-28 15:21:07 servergpu1.virgo.infn.it root[6961] INFO Analyzing segment: 1387535448.0-1387539048.0 for channel V1:Hrec_hoft_16384Hz downsampled at 2048Hz
2024-02-28 15:21:07 servergpu1.virgo.infn.it root[6960] INFO Start AR parameter estimation
[32m2024-02-28 15:21:07[0m [35mservergpu1.virgo.infn.it[0m [34mroot[69