This code is copied directly from https://www.kaggle.com/mistag/data-preprocessing-with-gwpy by Geir Drange (please upvote his notebook). I'm publishing it only to show how I used it to create a preprocessed dataset.

# Processing of gravitational-wave data
In this notebook we will apply some signal processing to the gravitational-wave data. Preprocessing is definitely required on these signals, and thankfully there is a Python package called [GWpy](https://gwpy.github.io/docs/latest/index.html) that has all the functions we need. Theory will not be discussed here in detail, but there is plenty of information and code on the topic:   
  * [GW tutorials](https://www.gw-openscience.org/tutorials/)
  * [Gravitational Wave Open Science Center](https://www.gw-openscience.org/software/)
  * [A guide to LIGO–Virgo detector noise and extraction of transient gravitational-wave signals](https://iopscience.iop.org/article/10.1088/1361-6382/ab685e)
  
  
First, install GWpy:

In [None]:
!python -m pip install gwpy

In [None]:
from gwpy.timeseries import TimeSeries
from gwpy.plot import Plot
import numpy as np
from scipy import signal
from sklearn.preprocessing import MinMaxScaler
from PIL import Image
from matplotlib import pyplot as plt

# Read & plot files
Let's define one helper function that reads the NumPy data and converts it into GWpy TimeSeries format, and another that plots the data.

In [None]:
def read_file(fname):
    data = np.load(fname)
    d1 = TimeSeries(data[0,:], sample_rate=2048)
    d2 = TimeSeries(data[1,:], sample_rate=2048)
    d3 = TimeSeries(data[2,:], sample_rate=2048)
    return d1, d2, d3

def plot_time_data(d1, d2, d3):
    plot = Plot(d1, d2, d3, separate=True, sharex=True, figsize=[12, 8])
    ax = plot.gca()
    ax.set_xlim(0,2)
    ax.set_xlabel('Time [s]')
    plot.show()

Visualize a file:

In [None]:
d1, d2, d3 = read_file('../input/g2net-gravitational-wave-detection/train/0/0/0/0002b64784.npy')
plot_time_data(d1, d2, d3)
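
As a quick sanity check on the numbers used here, the sketch below assumes each file holds three detector channels of 4096 samples (the length of the window applied later in this notebook). At the 2048 Hz sample rate passed to `TimeSeries`, that covers the 2 s shown on the plot. The array is a synthetic stand-in, not competition data, and the constant names are only illustrative.

```python
import numpy as np

SAMPLE_RATE = 2048  # Hz, as passed to TimeSeries in read_file
N_SAMPLES = 4096    # assumed samples per channel (matches the Tukey window length)

# Synthetic stand-in for np.load(fname): 3 detector channels x N_SAMPLES samples
fake = np.zeros((3, N_SAMPLES))

duration = fake.shape[1] / SAMPLE_RATE            # length of one channel in seconds
times = np.arange(fake.shape[1]) / SAMPLE_RATE    # time stamp of every sample
print(duration, times[-1])
```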

# Preprocess
Then we will follow the general processing steps outlined in [this article](https://iopscience.iop.org/article/10.1088/1361-6382/ab685e):  
* Apply a window function (Tukey - tapered cosine window) to suppress spectral leakage
* Whiten the spectrum
* Bandpass

## Apply window function

The Tukey window looks like this:

In [None]:
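# Tukey window from scipy.signal.windows (recent SciPy releases no longer expose signal.tukey)
window = signal.windows.tukey(4096)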
plt.plot(window);
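
For reference, here is a minimal NumPy-only sketch of how a Tukey (tapered cosine) window can be built by hand. The function name `tukey_sketch` is hypothetical, and the code is meant to illustrate the shape rather than replace SciPy's implementation.

```python
import numpy as np

def tukey_sketch(n_samples, alpha=0.5):
    """Tapered cosine window: cosine ramps at both ends, flat (1.0) in the middle."""
    # Position of each sample, normalised to the range [0, 1]
    x = np.arange(n_samples) / (n_samples - 1)
    window = np.ones(n_samples)
    rising = x < alpha / 2          # first alpha/2 of the window
    falling = x > 1 - alpha / 2     # last alpha/2 of the window
    # Half-cosine ramps from 0 up to 1 and from 1 back down to 0
    window[rising] = 0.5 * (1 - np.cos(2 * np.pi * x[rising] / alpha))
    window[falling] = 0.5 * (1 - np.cos(2 * np.pi * (1 - x[falling]) / alpha))
    return window

window_sketch = tukey_sketch(4096)
```

With `alpha=0.5`, which is also the default in SciPy's `tukey`, the first and last quarter of the samples are tapered and the middle half is left untouched.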

Let's look at the signal after windowing:

In [None]:
d1, d2, d3 = d1*window, d2*window, d3*window
plot_time_data(d1, d2, d3)

Take a look at the spectrum:

In [None]:
fig2 = d1.asd(fftlength=2).plot(figsize=[12, 6])
plt.xlim(10,1024)
plt.ylim(1e-24, 1e-19);

## Spectral whitening and bandpass filtering
This is super simple with GWpy:

In [None]:
white_data = d1.whiten()
bp_data = white_data.bandpass(35, 350) # frequency range 35-350Hz
fig3 = bp_data.plot(figsize=[12, 6])
plt.xlim(0, 2)
ax = plt.gca()
ax.set_title('Whitened and bandpassed')
ax.set_xlabel('Time [s]');
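
To give a feel for what whitening does, the sketch below divides the signal's spectrum by a smoothed estimate of its own amplitude spectrum, so that every frequency ends up with a comparable amplitude. This is a simplified NumPy-only illustration, not GWpy's implementation, which is more careful about how the noise spectrum is estimated. The names `whiten_sketch` and `smooth_bins` are hypothetical.

```python
import numpy as np

def whiten_sketch(x, smooth_bins=65):
    """Flatten the spectrum of x by dividing by a running mean of its amplitude.

    smooth_bins is assumed to be odd so that the smoothed spectrum keeps its length.
    """
    spectrum = np.fft.rfft(x)
    amplitude = np.abs(spectrum)
    # Running mean over neighbouring frequency bins, edge-padded at both ends
    kernel = np.ones(smooth_bins) / smooth_bins
    padded = np.pad(amplitude, smooth_bins // 2, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")
    # Divide out the estimated amplitude and go back to the time domain
    return np.fft.irfft(spectrum / smoothed, n=len(x))

rng = np.random.default_rng(0)
coloured = np.cumsum(rng.standard_normal(4096))  # random walk: strong low-frequency excess
white = whiten_sketch(coloured)
```

The output amplitude scale differs from what GWpy returns, so this sketch is only useful for seeing the flattening effect on the spectrum.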

Now we have preprocessed data that is ready for further analysis. First, let's define a function that combines all the steps above and returns the preprocessed data:

In [None]:
def preprocess(d1, d2, d3, lf=35, hf=350):
    window = signal.windows.tukey(4096)
    d1, d2, d3 = d1*window, d2*window, d3*window
    white_d1 = d1.whiten()
    white_d2 = d2.whiten()
    white_d3 = d3.whiten()
    bp_d1 = white_d1.bandpass(lf, hf) 
    bp_d2 = white_d2.bandpass(lf, hf)
    bp_d3 = white_d3.bandpass(lf, hf)
    return bp_d1, bp_d2, bp_d3
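
The `lf` and `hf` arguments set the band that survives the bandpass step. As a rough illustration of what keeping 35-350 Hz means, the sketch below uses a brick-wall FFT mask, which is a deliberately simpler technique than the filter GWpy's `bandpass` applies. The function name `bandpass_fft_sketch` and the test tones are assumptions made for the example.

```python
import numpy as np

def bandpass_fft_sketch(x, low_hz, high_hz, sample_rate=2048):
    """Zero every frequency bin outside [low_hz, high_hz] (brick-wall mask)."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    spectrum = np.fft.rfft(x)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=len(x))

t = np.arange(4096) / 2048
inside = np.sin(2 * np.pi * 100 * t)    # 100 Hz tone, inside the 35-350 Hz band
outside = np.sin(2 * np.pi * 500 * t)   # 500 Hz tone, outside the band
filtered = bandpass_fft_sketch(inside + outside, 35, 350)
```

In this toy case the 500 Hz tone is removed and the 100 Hz tone passes through unchanged.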

# Q-Transform
The Q-Transform is related to the Fourier transform and very closely related to the wavelet transform. It produces a time-frequency spectrogram, which is a candidate input for a CNN model.

In [None]:
r1, r2, r3 = read_file('../input/g2net-gravitational-wave-detection/train/0/0/0/0002b64784.npy')
p1, p2, p3 = preprocess(r1, r2, r3)
hq = p1.q_transform(outseg=(0, 2))
fig4 = hq.plot(figsize=[12, 10])
ax = fig4.gca()
fig4.colorbar(label="Normalised energy")
ax.grid(False)
ax.set_yscale('log')
ax.set_xlabel('Time [s]');

## Combine three channels into one RGB image
Since we have 3 detectors, we can combine the Q-Transforms as RGB channels into one color image. Let's make a function for that:

In [None]:
def create_rgb(fname):
    r1, r2, r3 = read_file(fname)
    p1, p2, p3 = preprocess(r1, r2, r3)
    hq1 = p1.q_transform(outseg=(0, 2))
    hq2 = p2.q_transform(outseg=(0, 2))
    hq3 = p3.q_transform(outseg=(0, 2))
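    # One 8-bit colour channel per detector, same height and width as the Q-transform
    img = np.zeros([hq1.shape[0], hq1.shape[1], 3], dtype=np.uint8)
    scaler = MinMaxScaler()
    # Rescale each Q-transform to 0-255 before storing it in its channel
    img[:,:,0] = 255*scaler.fit_transform(hq1)
    img[:,:,1] = 255*scaler.fit_transform(hq2)
    img[:,:,2] = 255*scaler.fit_transform(hq3)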
    return Image.fromarray(img).rotate(90, expand=1)
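
One detail worth knowing about `create_rgb`: `MinMaxScaler` rescales each column of its input independently. Assuming the Q-transform array has time bins as rows and frequency bins as columns, every frequency bin is stretched to the full 0-255 range on its own. The NumPy sketch below contrasts that with a single global rescaling; both helper names are hypothetical.

```python
import numpy as np

def minmax_per_column(a):
    # Default MinMaxScaler behaviour: each column gets its own min and max
    lo, hi = a.min(axis=0), a.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant columns
    return (a - lo) / span

def minmax_global(a):
    # Alternative: one min and one max for the whole array
    return (a - a.min()) / (a.max() - a.min())

demo = np.array([[0.0, 10.0],
                 [1.0, 30.0],
                 [2.0, 20.0]])
per_col = minmax_per_column(demo)  # both columns span 0..1
global_ = minmax_global(demo)      # only the largest entry reaches 1
```

Which variant gives better images for a classifier is an open question; the per-column version is simply what the code above does.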

In [None]:
create_rgb('../input/g2net-gravitational-wave-detection/train/4/1/0/410031196b.npy')

Awesome! (Or crazy?) Next step is to find out if we can train an image classifier with these images...

In [None]:
def id2path(img_id, is_test):
    a, b, c = img_id[0], img_id[1], img_id[2]
    if is_test: return f'../input/g2net-gravitational-wave-detection/test/{a}/{b}/{c}/{img_id}.npy'
    return f'../input/g2net-gravitational-wave-detection/train/{a}/{b}/{c}/{img_id}.npy'
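
The first three characters of an id select the nested directories. A small self-contained variant of the same mapping, written with `pathlib` (the name `id2path_sketch` and the `root` parameter are hypothetical), makes this easy to check against the file paths used earlier in the notebook:

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath('../input/g2net-gravitational-wave-detection')

def id2path_sketch(img_id, is_test, root=ROOT):
    """Build <root>/<train|test>/<c0>/<c1>/<c2>/<id>.npy from the id's first three characters."""
    split = 'test' if is_test else 'train'
    return str(root / split / img_id[0] / img_id[1] / img_id[2] / f'{img_id}.npy')

example = id2path_sketch('0002b64784', is_test=False)
print(example)
```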

In [None]:
import pandas as pd
df = pd.read_csv('../input/g2net-gravitational-wave-detection/sample_submission.csv')

In [None]:
import os
os.makedirs('test_images', exist_ok=True)

In [None]:
def save_test_img(_id):
    fname = id2path(_id, True)
    im = create_rgb(fname)
    im = im.resize((300,300), Image.BILINEAR)
    im.save(f'test_images/{_id}.png', format="png")

In [None]:
# Keep only the first 56,500 ids for this run
df = df[:56500]
len(df)

In [None]:
# https://www.kaggle.com/yasufuminakama/g2net-spectrogram-generation-train
import joblib
from tqdm.auto import tqdm

_ = joblib.Parallel(n_jobs=8)(
    joblib.delayed(save_test_img)(_id) for _id in tqdm(df['id'].values)
)


In [None]:
import shutil

shutil.make_archive('test_images', 'zip', 'test_images')
shutil.rmtree('test_images')