<div align="center">
<font size="6"> G2Net Gravitational Wave Detection  </font>  
</div> 


<div align="center">
<font size="4"> Find gravitational wave signals from binary black hole collisions  </font>  
</div> 

<img align="left" src="https://raw.githubusercontent.com/kabartay/kaggle-g2net-gravitational-wave-detection/main/pics/header.png" data-canonical-src="https://raw.githubusercontent.com/kabartay/kaggle-g2net-gravitational-wave-detection/main/pics/header.png" width="1350" />

<a id="0"></a>
<h1 style='background:#0788f0; font-size:200%; border:0; color:white;'><center> Table of Contents</center></h1>

1. [Gravitational Wave Detection](#1)
2. [Competition Overview](#2)
3. [Import packages](#3)
4. [Processing GW data: GWpy](#4)  
    4.1 [Time Series](#4.1)  
    4.2 [Spectrograms](#4.2)  
   
[References](#100)

<a id="1"></a>
<h2 style='background:#0788f0; font-size:200%; border:0; color:white'><center> 1 Gravitational Wave Detection </center></h2>

On September 14, 2015 at 09:50:45 UTC the two detectors of the Laser Interferometer Gravitational-Wave Observatory ([LIGO](https://www.ligo.org/)) simultaneously observed a transient gravitational-wave signal. 

<img style="float:left; padding-right:10px" src="https://raw.githubusercontent.com/kabartay/kaggle-g2net-gravitational-wave-detection/main/pics/ripple.jpg" data-canonical-src="https://raw.githubusercontent.com/kabartay/kaggle-g2net-gravitational-wave-detection/main/pics/ripple.jpg" width="225" height="225"/>  

This discovery required the collaboration of experts in physics, mathematics, information science, and computing. Gravitational-wave (GW) signals have led researchers to observe a new population of massive, stellar-origin **black holes (BH)**, to unlock the mysteries of neutron star mergers, and to measure the expansion of the Universe. These signals are unimaginably **tiny ripples** in the fabric of space-time, and even though the detectors in the global GW network are some of the most sensitive instruments on the planet, the signals are buried in detector noise. Analyzing GW data and detecting these signals is a crucial mission for the growing global network of increasingly sensitive GW detectors, and data science could help solve the associated challenges in data analysis and noise characterization. As with the multidisciplinary approach that led to the discovery of GWs, additional expertise will be needed to further GW research. In particular, the social and natural sciences have taken an interest in machine learning, deep learning, classification problems, data mining, and visualization as ways to develop new techniques and algorithms that efficiently handle complex and massive data sets. The increase in computing power and the development of innovative techniques for the rapid analysis of data will be vital to the exciting new field of GW astronomy. Potential outcomes may include increased sensitivity to GW signals, application to control and feedback systems for next-generation detectors, noise removal, data conditioning tools, and signal characterization.  


<img style="float:left; padding-right:10px" src="https://raw.githubusercontent.com/kabartay/kaggle-g2net-gravitational-wave-detection/main/pics/bh.png" data-canonical-src="https://raw.githubusercontent.com/kabartay/kaggle-g2net-gravitational-wave-detection/main/pics/bh.png" width="70" height="70"/>  

[G2Net](https://www.g2net.eu) is a network for **Gravitational Waves, Geophysics and Machine Learning**. Through an Action funded by [COST](https://www.cost.eu) (European Cooperation in Science and Technology), a funding agency for research and innovation networks, G2Net aims to create a broad network of scientists. Drawn from four areas of expertise (GW physics, geophysics, computing science, and robotics), these scientists have agreed on a common goal: tackling challenges in data analysis and noise characterization for GW detectors.

In [None]:
import IPython.display
IPython.display.YouTubeVideo('B4XzLDM3Py8', width=768, height=524)

<a id="2"></a>
<h2 style='background:#0788f0; font-size:200%; border:0; color:white'><center> 2 Competition Overview </center></h2>

<img style="float:left; padding-right:10px" src="https://raw.githubusercontent.com/kabartay/kaggle-g2net-gravitational-wave-detection/main/pics/ego_logo.png" data-canonical-src="https://raw.githubusercontent.com/kabartay/kaggle-g2net-gravitational-wave-detection/main/pics/ego_logo.png" width="70" height="70"/> 

This competition is hosted by the [European Gravitational Observatory (EGO)](https://www.ego-gw.it/). We **aim to detect GW signals** from the mergers of **binary black holes (BBH)**. The task is to build a model that analyzes simulated GW time-series data from a network of 3 Earth-based GW interferometers (LIGO Hanford, LIGO Livingston, and Virgo). We are provided with a **72 GB dataset** of time series containing simulated GW measurements. Each time series contains either detector noise or detector noise plus a simulated GW signal. 

## Task
Identify when a signal is present in the data (`target=1`).  

## Files
- **train/** - the training set files, one npy file per observation; labels are provided in the `training_labels.csv` file listed below   
- **test/** - the test set files; you must predict the probability that the observation contains a gravitational wave   
- **training_labels.csv** - target values indicating whether the associated observation contains a gravitational wave   
- **sample_submission.csv** - a sample submission file in the correct format
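
The example path used later in this notebook (`train/0/0/0/00000e74ad.npy`) suggests that each file sits in nested folders named after the first three characters of its id. The sketch below builds a path under that assumption; `npy_path` is a hypothetical helper, not part of the competition files, and the layout should be verified against the actual dataset.

```python
from pathlib import PurePosixPath

# Dataset root as used in this notebook (Kaggle input directory).
ROOT = '/kaggle/input/g2net-gravitational-wave-detection'

def npy_path(sample_id, split='train', root=ROOT):
    """Build the path of one sample file.

    Assumption: files are nested by the first three characters of the id,
    e.g. id '00000e74ad' -> <root>/train/0/0/0/00000e74ad.npy.
    """
    if len(sample_id) < 3:
        raise ValueError('sample id must have at least 3 characters')
    a, b, c = sample_id[:3]
    return str(PurePosixPath(root) / split / a / b / c / f'{sample_id}.npy')

print(npy_path('00000e74ad'))
```

Passing `split='test'` would give the corresponding test-set path under the same assumed layout.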

<a id="3"></a>
<h2 style='background:#0788f0; font-size:200%; border:0; color:white'><center> 3 Import packages </center></h2>

In [None]:
import os
import gc
import json
import random

import numpy as np
import pandas as pd
import pickle as pkl

import cv2
from PIL import Image
import seaborn as sns
import matplotlib.pyplot as plt
#%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

In [None]:
PATH = '/kaggle/input/g2net-gravitational-wave-detection/'
df_labels = pd.read_csv(PATH+'training_labels.csv')

In [None]:
plt.figure(figsize=(7,6))
ax = sns.countplot(x = df_labels['target'])

Each data sample (`.npy` file) contains 3 time series (1 for each detector: LIGO Hanford, LIGO Livingston, and Virgo). Each series spans 2 s and is sampled at 2048 Hz, which gives 4096 samples per detector.
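
The numbers above fix the geometry of every sample. The following sketch works them out in plain Python; the constant names and `check_shape` are hypothetical, introduced only for illustration.

```python
N_DETECTORS = 3        # LIGO Hanford, LIGO Livingston, Virgo
DURATION_S = 2         # length of each time series in seconds
SAMPLE_RATE_HZ = 2048  # samples per second

n_samples = DURATION_S * SAMPLE_RATE_HZ  # samples per detector
dt = 1 / SAMPLE_RATE_HZ                  # spacing between samples in seconds
nyquist_hz = SAMPLE_RATE_HZ / 2          # highest frequency the sampling can represent

def check_shape(shape):
    """Return True if an array shape matches (detectors, samples)."""
    return tuple(shape) == (N_DETECTORS, n_samples)

print(n_samples, dt, nyquist_hz)  # 4096 0.00048828125 1024.0
print(check_shape((3, 4096)))     # True
```

A check like this can be applied to the shape of a loaded array to catch truncated or transposed files early.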

In [None]:
# Example: load one training sample and inspect its shape
event = '00000e74ad'
file_npy_ev = PATH + 'train/0/0/0/{}.npy'.format(event)
data_ev = np.load(file_npy_ev)
print(data_ev.shape)  # (detectors, samples)
data_ev

In [None]:
# Look up the label for this event: it contains a GW signal
df_labels[df_labels['id'] == event]

<a id="4"></a>
<h2 style='background:#0788f0; font-size:200%; border:0; color:white'><center> 4 Processing GW data: GWpy </center></h2>

[GWpy](https://gwpy.github.io/docs/latest/index.html) is a collaboration-driven Python package providing tools for studying data from ground-based GW detectors. It provides a user-friendly, intuitive interface to the common time-domain and frequency-domain data produced by the [LIGO](https://www.ligo.org/) and [Virgo](https://www.ego-gw.it/) instruments and their analysis, with easy-to-follow tutorials at each step.

In [None]:
# The recommended way of installing GWpy is:
# !conda install -c conda-forge gwpy  # with Conda
# !python -m pip install gwpy          # with Pip

try:
    from gwpy.timeseries import TimeSeries, TimeSeriesDict
    from gwpy.plot import Plot
except ImportError:
    # GWpy is not installed yet: install it quietly, then retry the imports
    ! python -m pip install -q gwpy
    from gwpy.timeseries import TimeSeries, TimeSeriesDict
    from gwpy.plot import Plot

In [None]:
def get_npy_data(file):
    """Load a .npy sample and return one GWpy TimeSeries per detector."""
    data_npy = np.load(file)
    Hanford    = TimeSeries(data_npy[0,:], sample_rate=2048)
    Livingston = TimeSeries(data_npy[1,:], sample_rate=2048)
    Virgo      = TimeSeries(data_npy[2,:], sample_rate=2048)
    return Hanford, Livingston, Virgo

In [None]:
# Get detectors data
Hanford, Livingston, Virgo = get_npy_data(file_npy_ev)

In [None]:
print(Hanford.shape)
Hanford

In [None]:
plt.figure(figsize=(12, 9))
sns.histplot(data=Hanford)
plt.xlim(-2e-20, 2e-20)
plt.xlabel('Amplitude [strain]')
plt.show()

In [None]:
plt.figure(figsize=(12, 9))
sns.histplot(data=Livingston)
plt.xlim(-2e-20, 2e-20)
plt.xlabel('Amplitude [strain]')
plt.show()

In [None]:
plt.figure(figsize=(12, 9))
sns.histplot(data=Virgo)
plt.xlim(-0.5e-20, 0.5e-20)
plt.xlabel('Amplitude [strain]')
plt.show()

In [None]:
plt.figure(figsize=(12, 9))
sns.histplot(data=Hanford)
sns.histplot(data=Livingston)
#sns.histplot(data=Virgo)
plt.xlim(-2e-20, 2e-20)
plt.xlabel('Amplitude [strain]')
plt.show()

<a id="4.1"></a>
<h3 style='background:#0788f0; font-size:200%; border:0; color:white'><center> 4.1 Time Series </center></h3>

In [None]:
# Check out: 
# https://gwpy.github.io/docs/latest/overview.html
# https://gwpy.github.io/docs/latest/plot/index.html

def plot_time_series(Hanford, Livingston, Virgo):
    """Plot time series. Separate subplots.
    Detectors order: LIGO Hanford, LIGO Livingston, and Virgo
    """
    plot = Plot(Hanford, Livingston, Virgo, 
                separate=True, 
                sharex=True, 
                figsize=[18, 12])
    ax = plot.gca()
    ax.set_xlim(0,2)
    ax.set_xlabel('Time [s]')
    plot.show()
    
def plot_time_series_all(Hanford, Livingston, Virgo):
    """Plot time series. All detectors together."""
    plot = Plot(figsize=(18, 4))
    ax = plot.add_subplot()
    ax.plot(Hanford, color='gwpy:ligo-hanford', label='LIGO-Hanford')
    ax.plot(Livingston, color='gwpy:ligo-livingston', label='LIGO-Livingston')
    ax.plot(Virgo, color='gwpy:virgo', label='Virgo')
    ax.set_ylabel('Amplitude [strain]')
    ax.set_xlim(0, 2)
    ax.set_ylim(-2e-20, 2e-20)
    ax.legend()
    
def plot_time_series_LIGO(Hanford, Livingston):
    """Plot time series. LIGO detectors."""
    plot = Plot(figsize=(18, 4))
    ax = plot.add_subplot()
    ax.plot(Hanford, color='gwpy:ligo-hanford', label='LIGO-Hanford')
    ax.plot(Livingston, color='gwpy:ligo-livingston', label='LIGO-Livingston')
    ax.set_ylabel('Amplitude [strain]')
    ax.set_xlim(0, 2)
    ax.set_ylim(-2e-20, 2e-20)
    ax.legend()

In [None]:
plot_time_series(Hanford, Livingston, Virgo)

In [None]:
plot_time_series_LIGO(Hanford, Livingston)

In [None]:
plot_time_series_all(Hanford, Livingston, Virgo)

<a id="4.2"></a>
<h3 style='background:#0788f0; font-size:200%; border:0; color:white'><center> 4.2 Spectrograms </center></h3>

### Generate the Q-transform of a TimeSeries
One of the most useful tools for filtering and visualising short-duration features in a **TimeSeries** is the **Q-transform**. It is regularly used by the Detector Characterization working groups of the LIGO Scientific Collaboration and the Virgo Collaboration to produce high-resolution time-frequency maps of transient noise (glitches) and potential gravitational-wave signals. See the [GWpy Q-scan example](https://gwpy.github.io/docs/latest/examples/timeseries/qscan.html) for details.


In [None]:
# Check out:
# https://gwpy.github.io/docs/latest/examples/timeseries/qscan.html

def plot_spectrograms(detector_data, detector_name, event, target_ev, grid):
    """Plot spectrograms for specific detector's event."""
    
    qspecgram = detector_data.q_transform(outseg=(0.0, 2.0))
    
    plot = qspecgram.plot(figsize=[12, 10])
    ax = plot.gca()
    ax.set_title('{}. Event: {}. Target: {}'.format(detector_name, event, target_ev))
    ax.set_xlabel('Time [s]')
    ax.set_ylabel('Frequency [Hz]')
    #ax.set_yscale('log')
    ax.grid(grid)
    ax.colorbar(cmap='viridis', label='Normalized energy')
    plot.show()

In [None]:
# Get data and target value for specific event
Hanford, Livingston, Virgo = get_npy_data(file_npy_ev)
target_ev = df_labels[df_labels['id'] == event]['target'].iloc[0]

In [None]:
plot_spectrograms(Hanford, 'LIGO Hanford', event, target_ev, False)

In [None]:
plot_spectrograms(Livingston, 'LIGO Livingston', event, target_ev, False)

In [None]:
plot_spectrograms(Virgo, 'Virgo', event, target_ev, True)

In [None]:
# Calculating a Spectrogram from a TimeSeries
# The time-frequency Spectrogram of a TimeSeries can be calculated using the spectrogram() method. 
# We can extend previous examples of plotting a TimeSeries with calculation of a Spectrogram with a 2-second stride:
# https://gwpy.github.io/docs/latest/spectrogram/index.html#calculating-a-spectrogram-from-a-timeseries
# https://gwpy.github.io/docs/latest/api/gwpy.spectrogram.Spectrogram.html#gwpy.spectrogram.Spectrogram
# https://gwpy.github.io/docs/stable/api/gwpy.timeseries.TimeSeries.html
# https://gwpy.github.io/docs/stable/api/gwpy.timeseries.TimeSeries.html#gwpy.timeseries.TimeSeries.spectrogram

def plot_spectrograms_from_timeseries(spectrogram):
    """Plot spectrograms from Time Series."""
    plot = spectrogram.plot(figsize=[12, 10], 
                       norm='log', 
                       #vmin=1e-24, 
                       #vmax=1e-21
                      )
    ax = plot.gca()
    ax.set_ylim(1, 1000)
    ax.set_yscale('log')
    ax.set_xlabel('Time [s]')
    ax.set_ylabel('Frequency [Hz]')
    # Raw string so that the LaTeX backslashes are not treated as escape sequences
    ax.colorbar(label=r'GW strain ASD [strain/$\sqrt{\mathrm{Hz}}$]')
    plot.show()

### Rules for the `spectrogram()` method:
- `stride` cannot be greater than the duration of the TimeSeries
- `fftlength` cannot be greater than `stride`
- `overlap` must be less than `fftlength`
- `window` should not be longer than the input signal
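
These rules can be checked before calling `spectrogram()`. The sketch below is a plain-Python illustration: `check_spectrogram_args` is a hypothetical helper and not part of GWpy, the non-negative check on `overlap` is an added assumption, and the window rule is omitted because it depends on the window array itself. It also reports two derived quantities under the usual FFT conventions: the number of whole strides that fit in the data and the frequency resolution `1 / fftlength`.

```python
def check_spectrogram_args(duration, stride, fftlength, overlap=0.0):
    """Validate spectrogram arguments (all in seconds) against the rules above.

    Returns (n_columns, df): the number of whole strides that fit in the
    data and the frequency resolution in Hz.
    """
    if stride > duration:
        raise ValueError('stride cannot be greater than the duration')
    if fftlength > stride:
        raise ValueError('fftlength cannot be greater than stride')
    if not 0 <= overlap < fftlength:
        raise ValueError('overlap must be non-negative and less than fftlength')
    n_columns = int(duration // stride)
    df = 1.0 / fftlength
    return n_columns, df

# Settings used in this notebook: 2 s of data, stride=2, fftlength=0.75.
n_columns, df = check_spectrogram_args(duration=2, stride=2, fftlength=0.75)
print(n_columns, round(df, 3))  # 1 1.333
```

This arithmetic suggests that a 2 s stride on 2 s of data yields a single time column, so the resulting plot is closer to an averaged spectrum than to a time-resolved map; a shorter stride would trade frequency averaging for time resolution.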

In [None]:
H_spec = Hanford.spectrogram(stride=2,       # number of seconds in single PSD (column of spectrogram). 
                             fftlength=0.75, # number of seconds in single FFT
                             overlap=0,      # number of seconds of overlap between FFTs, defaults to the recommended overlap for the given window (if given), or 0
                             nproc=8         # number of CPUs to use in parallel processing of FFTs
                            ) ** (1/2.)      # sqrt

In [None]:
L_spec = Livingston.spectrogram(stride=2,       # number of seconds in single PSD (column of spectrogram). 
                                fftlength=0.75, # number of seconds in single FFT
                                overlap=0,      # number of seconds of overlap between FFTs, defaults to the recommended overlap for the given window (if given), or 0
                                nproc=8         # number of CPUs to use in parallel processing of FFTs
                                ) ** (1/2.)      # sqrt

In [None]:
V_spec = Virgo.spectrogram(stride=2,       # number of seconds in single PSD (column of spectrogram). 
                           fftlength=0.75, # number of seconds in single FFT
                           overlap=0,      # number of seconds of overlap between FFTs, defaults to the recommended overlap for the given window (if given), or 0
                           nproc=8         # number of CPUs to use in parallel processing of FFTs
                          ) ** (1/2.)      # sqrt

In [None]:
plot_spectrograms_from_timeseries(H_spec)

In [None]:
plot_spectrograms_from_timeseries(L_spec)

In [None]:
plot_spectrograms_from_timeseries(V_spec)

<a id="100"></a>
<h2 style='background:#0788f0; font-size:200%; border:0; color:white'><center> References </center></h2>

- [G2Net](https://www.g2net.eu). COST Action CA17137: A network for Gravitational Waves, Geophysics and Machine Learning.
- [COST](https://www.cost.eu): European Cooperation in Science and Technology.
- [EGO](https://www.ego-gw.it/): European Gravitational Observatory.
- [LIGO](https://www.ligo.org/): Laser Interferometer Gravitational-Wave Observatory.
- [LIGO](https://www.ligo.caltech.edu/) at Caltech.
- B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), "Observation of Gravitational Waves from a Binary Black Hole Merger," Phys. Rev. Lett. 116, 061102 (2016). [DOI:10.1103/PhysRevLett.116.061102](https://link.aps.org/doi/10.1103/PhysRevLett.116.061102)