In [None]:
import os
import warnings

import math
import pycbc
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from gwpy.timeseries import TimeSeries
from matplotlib.ticker import ScalarFormatter
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler, StandardScaler
from sklearn.metrics import roc_auc_score, roc_curve, auc

pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

warnings.filterwarnings('ignore')

from modules import statistical_testing

In this project we deal with two main types of signals. The first type contains glitches: transient spikes in energy caused by external factors such as terrestrial and electromagnetic disturbances.

We have obtained high-confidence glitch readings from the O3a run for all interferometers. Of these, we only consider the readings from the LIGO Livingston (L1) interferometer.

We load the CSV that contains all glitch times from the first half of the third observing run.

In [None]:
glitches = pd.read_csv('./glitches/O3a_allifo.csv', usecols=['GPStime', 'snr', 'duration', 'confidence', 'ifo', 'label'])
glitches = glitches[~glitches.duplicated(subset=['GPStime'], keep='first')]
# glitches["glitch_present"] = 1

glitches.columns
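The text above says only L1 readings are used, but the loading cell keeps every interferometer. Below is a minimal sketch of that restriction. It assumes the `ifo` column stores detector names as plain strings such as `"L1"`. `select_l1_glitches` is a hypothetical helper, and the toy rows are invented for illustration rather than taken from the O3a CSV.

```python
import pandas as pd

def select_l1_glitches(df: pd.DataFrame, ifo: str = "L1") -> pd.DataFrame:
    """Keep rows from a single interferometer, then drop repeated GPS times.

    Hypothetical helper; assumes `ifo` holds strings such as "L1" or "H1".
    """
    # Filter by detector first so another detector cannot shadow an L1 row
    out = df[df["ifo"] == ifo]
    # Same result as df[~df.duplicated(subset=["GPStime"], keep="first")]
    out = out.drop_duplicates(subset=["GPStime"], keep="first")
    return out.reset_index(drop=True)

# Toy rows for illustration only (not from the O3a CSV)
toy = pd.DataFrame({
    "GPStime": [1238166018.1, 1238166018.1, 1238170000.5, 1238180000.2],
    "ifo": ["L1", "L1", "H1", "L1"],
    "label": ["Blip", "Blip", "Scattered_Light", "Whistle"],
})
l1 = select_l1_glitches(toy)
print(l1)
```

The order of the two steps matters. The loading cell removes repeated `GPStime` values across all detectors at once, so an H1 row that happens to share a timestamp with an L1 row and appears first would cause the L1 row to be dropped. Applying a filter like this to the frame returned by `pd.read_csv`, before the `duplicated` mask, avoids that.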

The full CSV file contains the following columns:
- **GPStime**: The timestamp in GPS format indicating the event time.
- **peakFreq**: The frequency at which the signal has the highest intensity.
- **snr**: Signal-to-noise ratio, i.e. how loud the event is relative to the background noise.
- **amplitude**: The strength or height of the signal wave.
- **centralFreq**: The central frequency of the signal's spectral content.
- **duration**: The time span of the signal event.
- **bandwidth**: The range of frequencies covered by the signal.
- **chisq**: The chi-squared statistic for assessing signal fit quality.
- **chisqDof**: The degrees of freedom used in the chi-squared test.
- **confidence**: The confidence assigned to the event's classification label.
- **id**: A unique identifier for the signal event.
- **ifo**: The interferometer associated with the signal detection.
- **label**: The glitch class assigned to the event.
- **imgUrl**: Link to an image or visual representation of the signal.
- **Q-value**: Quality factor indicating the sharpness of the signal.
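The `confidence` column is one way to enforce the "high level of confidence" requirement stated earlier. The notebook does not say whether the O3a CSV was already cut on this column, so the sketch below is hypothetical: `keep_confident` and its 0.9 threshold are illustrative choices, and the rows are toy values.

```python
import pandas as pd

def keep_confident(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Return rows whose `confidence` is at least `threshold`.

    Hypothetical helper; the 0.9 default is an illustrative cut, not a
    value taken from the source data.
    """
    return df[df["confidence"] >= threshold]

# Toy rows for illustration only
toy = pd.DataFrame({
    "GPStime": [1.0, 2.0, 3.0],
    "confidence": [0.95, 0.60, 0.90],
    "label": ["Blip", "Koi_Fish", "Tomte"],
})
confident = keep_confident(toy)
print(confident)
```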

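Columns that are not relevant to us are dropped at load time through the `usecols` argument of `pd.read_csv`, which keeps only `GPStime`, `snr`, `duration`, `confidence`, `ifo` and `label`.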

Let's take a look at the **label** column to see all the different glitch classes.

In [None]:
glitches['label'].unique()
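`unique()` lists the classes but not how many events each one has, which is useful for spotting class imbalance. A minimal sketch using `Series.value_counts` on invented toy labels. On the real frame the equivalent call would be `glitches['label'].value_counts()`.

```python
import pandas as pd

# Toy labels for illustration only
toy = pd.DataFrame({"label": ["Blip", "Whistle", "Blip", "Scattered_Light", "Blip", "Whistle"]})

# Absolute counts per class, sorted from most to least frequent
counts = toy["label"].value_counts()
print(counts)

# Fraction of events in each class
fractions = toy["label"].value_counts(normalize=True)
print(fractions)
```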

The second type of signal we deal with is clean data: segments that do not contain any glitches. These segments have been sourced from times during the O3a run where there are relatively stable levels of energy between areas of 