<span style="float: left;padding: 1.3em">![logo](https://github.com/gw-odw/odw/blob/main/Tutorials/logo.png?raw=1)</span>

# Gravitational Wave Open Data Workshop

## Tutorial 1.1: Discovering open data from GW observatories

This notebook describes how to discover what data are available from the [Gravitational-Wave Open Science Center (GWOSC)](https://gwosc.org).
    
View this tutorial on [Google Colaboratory](https://colab.research.google.com/github/gw-odw/odw/blob/main/Tutorials/Day_1/Tuto_1.1_Discovering_Open_Data.ipynb) or launch [mybinder](https://mybinder.org/v2/gh/gw-odw/odw/HEAD).


## Installation (execute only if running on a cloud platform, like Google Colab, or if you haven't done the installation already!)

First, we need to install the software, following the instructions in [Software Setup Instructions](../../setup.md):

> ⚠️ **Warning**: restart the runtime after running the cell below.
>
> To do so, click "Runtime" in the menu and choose "Restart and run all".

In [3]:
# -- Run this line if you are on Google Colab (comment it out if gwosc is already installed)
!pip install -q 'gwosc==0.7.1'

## Initialization

In [4]:
# Check the version of the gwosc package you are using
import gwosc
print(gwosc.__version__)

0.7.1


The version you get should be 0.7.1. If it's not, check that you have followed all the steps in [Software Setup Instructions](../../setup.md).

## A brief presentation of GWOSC and Open Data

Open Science is the movement to make scientific research accessible to everyone and to increase scientific collaboration. It includes different movements and practices, such as open data, open-source software and infrastructure, open access to publications, and citizen science. See the [Wikipedia page](https://en.wikipedia.org/wiki/Open_science) for more information.

Data from the LIGO-Virgo-KAGRA (LVK) collaboration are made available to the public via the [Gravitational-Wave Open Science Center (GWOSC)](https://gwosc.org), as described in the [LIGO Data Management Plan](https://dcc.ligo.org/LIGO-M1000066/public).

For a more detailed presentation of the data, including conventions on file and channel names and details about the preparation of the data, see the paper "_Open Data from the Third Observing Run of LIGO, Virgo, KAGRA, and GEO_" ([link](https://iopscience.iop.org/article/10.3847/1538-4365/acdc9f)).

GWOSC also provides links to software used to analyze LVK data and organizes training sessions (you are participating in one!).

## Querying for event information

The module `gwosc.datasets` provides tools for searching for datasets, including events, catalogs and full-run strain data releases.


For example, we can search for events in the [GWTC-1 catalog](https://gwosc.org/eventapi/html/GWTC-1-confident/), the catalog of all events from the O1 and O2 observing runs. A list of available catalogs can be seen in the [Event Portal](https://gwosc.org/eventapi).

In [5]:
from gwosc.datasets import find_datasets
from gwosc import datasets

#-- List all available catalogs
print("List of available catalogs")
print(find_datasets(type="catalog"))

List of available catalogs
['GWTC', 'GWTC-1-confident', 'GWTC-1-marginal', 'GWTC-2', 'GWTC-2.1-auxiliary', 'GWTC-2.1-confident', 'GWTC-2.1-marginal', 'GWTC-3-confident', 'GWTC-3-marginal', 'IAS-O3a', 'Initial_LIGO_Virgo', 'O1_O2-Preliminary', 'O3_Discovery_Papers', 'O3_IMBH_marginal', 'O4_Discovery_Papers']


In [8]:
#-- Print all the GW events from the GWTC-1 catalog
gwtc1 = datasets.find_datasets(type='events', catalog='GWTC-1-confident')
print('GWTC-1 events:', gwtc1)
print("")

GWTC-1 events: ['GW150914-v3', 'GW151012-v3', 'GW151226-v2', 'GW170104-v2', 'GW170608-v3', 'GW170729-v1', 'GW170809-v1', 'GW170814-v3', 'GW170817-v3', 'GW170818-v1', 'GW170823-v1']



Note that event names have the form _GWyymmdd-vx_, where _x_ is the latest available version of the data set provided by GWOSC.
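The naming convention can be unpacked with a few lines of standard-library Python. The sketch below is only an illustration: the helper `parse_event_name` is hypothetical (it is not part of `gwosc`), and it assumes that the optional `_hhmmss` suffix seen on newer events is a UTC time of day.

```python
import re
from datetime import datetime

# Matches names such as 'GW150914', 'GW150914-v3' or 'GW191103_012549-v1'.
EVENT_NAME = re.compile(r"^GW(\d{6})(?:_(\d{6}))?(?:-v(\d+))?$")


def parse_event_name(name):
    """Return (date/time encoded in the name, version or None). Hypothetical helper."""
    match = EVENT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised event name: {name!r}")
    yymmdd, hhmmss, version = match.groups()
    # '%y' maps two-digit years 00-68 to 2000-2068, which covers all GW events.
    stamp = datetime.strptime(yymmdd + (hhmmss or "000000"), "%y%m%d%H%M%S")
    return stamp, (int(version) if version else None)


print(parse_event_name("GW150914-v3"))
print(parse_event_name("GW191103_012549-v1"))
```

When a name carries no `-vx` suffix, the version is returned as `None`.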

In [9]:
#-- Print all the large strain data sets from LIGO/Virgo/KAGRA observing runs
runs = find_datasets(type='run')
print('Large data sets:', runs)

Large data sets: ['O1', 'O1_16KHZ', 'O2_16KHZ_R1', 'O2_4KHZ_R1', 'O3GK_16KHZ_R1', 'O3GK_4KHZ_R1', 'O3a_16KHZ_R1', 'O3a_4KHZ_R1', 'O3b_16KHZ_R1', 'O3b_4KHZ_R1', 'S5', 'S6']


_Attention: the most recent observing runs, e.g. O2, have labels that combine the name of the run (e.g. O2), the sampling rate (4 or 16 kHz) and the release version (e.g. R1). For O2 there are therefore two labels, 'O2_4KHZ_R1' and 'O2_16KHZ_R1', one for each sampling rate._
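As a rough illustration of this labelling scheme, the sketch below splits a run label into its parts. The helper `parse_run_label` is hypothetical and relies only on the label patterns visible in the output above; older runs such as 'O1' or 'S5' carry no rate or release, so those fields come back as `None`.

```python
def parse_run_label(label):
    """Split e.g. 'O2_4KHZ_R1' into (run, sample rate in kHz, release). Hypothetical helper."""
    run, *rest = label.split("_")
    rate_khz = None
    release = None
    for part in rest:
        if part.endswith("KHZ"):
            rate_khz = int(part[:-3])  # '4KHZ' -> 4
        elif part.startswith("R"):
            release = part  # 'R1'
    return run, rate_khz, release


# A few of the labels printed by find_datasets(type='run') above
for label in ["O1", "O1_16KHZ", "O2_4KHZ_R1", "O3GK_16KHZ_R1", "S5"]:
    print(label, "->", parse_run_label(label))
```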

`datasets.find_datasets` also accepts the `segment` and `detector` keywords to narrow results based on GPS time and detector:

In [10]:
#-- The detector and segment keywords narrow the search results
print(datasets.find_datasets(type='events', catalog='GWTC-1-confident', detector="L1", segment=(1164556817, 1187733618)))

['GW170104-v2', 'GW170608-v3', 'GW170729-v1', 'GW170809-v1', 'GW170814-v3', 'GW170817-v3', 'GW170818-v1', 'GW170823-v1']


Using `gwosc.datasets.event_gps`, we can query for the GPS time of a specific event (it also works without the version number):

In [11]:
from gwosc.datasets import event_gps
gps = event_gps('GW190425')
print(gps)

1240215503.0


<div class="alert alert-info">All of these times are returned in the GPS time system, which counts the number of seconds that have elapsed since the start of the GPS epoch at midnight (00:00) on January 6th 1980. GWOSC provides a <a href="https://gwosc.org/gps/">GPS time converter</a> you can use to translate into datetime, or you can use <a href="https://gwpy.github.io/docs/stable/time/"><code>gwpy.time</code></a>.</div>
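For a quick check without any extra package, the conversion can be sketched with the standard library alone. This is a simplified illustration, not how `gwpy.time` works internally: the helper `gps_to_utc` is hypothetical, and it assumes a fixed offset of 18 leap seconds between GPS and UTC, which holds only for times from 2017 onward (until a new leap second is announced).

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # start of the GPS time scale (UTC)
GPS_UTC_OFFSET = 18  # assumption: leap-second offset valid from 2017-01-01 onward


def gps_to_utc(gps_seconds):
    """Convert GPS seconds to a naive UTC datetime (valid for 2017 and later only)."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - GPS_UTC_OFFSET)


# GPS time of GW190425, as returned by event_gps above
print(gps_to_utc(1240215503.0))  # 2019-04-25 08:18:05
```

For earlier events, such as those from O1, the offset is smaller, so a leap-second-aware tool such as the GWOSC converter or `gwpy.time` is the safer choice.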

In [12]:
# The reverse lookup also works: find the event at a given GPS time
from gwosc.datasets import event_at_gps
print(event_at_gps(1240215503))

GW190425


Note that `event_at_gps` looks for events within 1 second of the given GPS time. If no event is found, it raises an error.
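The matching rule can be pictured with a small offline sketch. It is not the `gwosc` implementation: the function `find_event_near`, the local table `EVENT_GPS` and the choice of `ValueError` are all assumptions made for illustration. The GW190425 time is the one printed above; the GW150914 time is added as a second entry.

```python
# Local lookup table (illustrative only; the real function queries GWOSC)
EVENT_GPS = {
    "GW150914": 1126259462.4,
    "GW190425": 1240215503.0,
}


def find_event_near(gps, tolerance=1.0):
    """Return the first event whose GPS time lies within `tolerance` seconds of `gps`."""
    for name, event_time in EVENT_GPS.items():
        if abs(event_time - gps) <= tolerance:
            return name
    raise ValueError(f"no event found within {tolerance} s of GPS time {gps}")


print(find_event_near(1240215503))  # GW190425

try:
    find_event_near(1240215510)  # 7 s away from GW190425: outside the tolerance
except ValueError as error:
    print("Error:", error)
```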

We can query for the GPS time interval for an observing run:

In [23]:
from gwosc.datasets import run_segment
print(run_segment('O1'))

(1126051217, 1137254417)


In [14]:
# The reverse lookup works here too: find the run that contains a given GPS time
from gwosc.datasets import run_at_gps
print(run_at_gps(1240215503))

O3a_4KHZ_R1
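A run segment is just a `(start, end)` pair of GPS seconds, so simple arithmetic on it is often all that is needed. The sketch below uses the O1 segment printed above; the helpers `segment_duration_days` and `in_segment` are hypothetical, and treating the segment as half-open (start included, end excluded) is an assumption.

```python
O1_SEGMENT = (1126051217, 1137254417)  # output of run_segment('O1') above


def segment_duration_days(segment):
    """Length of a (start, end) GPS segment in days."""
    start, end = segment
    return (end - start) / 86400  # 86400 seconds per day


def in_segment(gps, segment):
    """True if `gps` falls inside the segment (assumed half-open: start <= gps < end)."""
    start, end = segment
    return start <= gps < end


print(f"O1 lasted about {segment_duration_days(O1_SEGMENT):.1f} days")
print(in_segment(1126259462.4, O1_SEGMENT))  # GPS time of GW150914, inside O1
print(in_segment(1240215503.0, O1_SEGMENT))  # GPS time of GW190425, outside O1
```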


Now we can use what we have learned with `run_segment` and `find_datasets` to see only the confident events in O1:

In [15]:
O1_events = datasets.find_datasets(type='events', catalog='GWTC-1-confident', segment=run_segment('O1'))
print(O1_events)

['GW150914-v3', 'GW151012-v3', 'GW151226-v2']


## Querying for data files

The `gwosc.locate` module provides a function to find the URLs of data files associated with a given dataset.

For event datasets, one can get the list of URLs using only the event name:

In [16]:
from gwosc.locate import get_event_urls
urls = get_event_urls('GW150914')
print(urls)

['https://gwosc.org/eventapi/json/GWTC-1-confident/GW150914/v3/H-H1_GWOSC_4KHZ_R1-1126259447-32.hdf5', 'https://gwosc.org/eventapi/json/GWTC-1-confident/GW150914/v3/H-H1_GWOSC_4KHZ_R1-1126257415-4096.hdf5', 'https://gwosc.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5', 'https://gwosc.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126257415-4096.hdf5']


By default, this function returns all of the files associated with a given event, which isn't particularly helpful. However, we can filter on any of these by using keyword arguments, for example to get the URL for the 32-second file for the LIGO-Livingston detector:

In [18]:
urls = get_event_urls('GW150914', duration=32, detector='L1')
print(urls)

['https://gwosc.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5']
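The file names in these URLs encode the same quantities we filtered on. As a sketch, the hypothetical helper `parse_gwosc_filename` below splits a name of the form `<observatory>-<tag>-<GPS start>-<duration>.<extension>`; this layout is inferred from the URLs printed above, and the assumption that the tag begins with the detector name is likewise taken from those examples.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_gwosc_filename(url):
    """Extract detector, GPS start and duration from a GWOSC file URL. Hypothetical helper."""
    filename = PurePosixPath(urlparse(url).path).name
    stem, extension = filename.rsplit(".", 1)
    observatory, tag, gps_start, duration = stem.split("-")
    return {
        "observatory": observatory,       # e.g. 'L'
        "detector": tag.split("_")[0],    # e.g. 'L1'
        "tag": tag,                       # e.g. 'L1_GWOSC_4KHZ_R1'
        "gps_start": int(gps_start),      # first GPS second in the file
        "duration": int(duration),        # file length in seconds
        "format": extension,              # e.g. 'hdf5'
    }


url = ("https://gwosc.org/eventapi/json/GWTC-1-confident/GW150914/v3/"
       "L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5")
print(parse_gwosc_filename(url))
```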


## Query filtered by merger parameters

The `query_events` function of `gwosc.datasets` returns a list of events filtered by some parameters, similar to what the `Query` function of the [event portal](https://gwosc.org/eventapi/html/query/) does. A list of available parameters can be found [here](https://gwosc.readthedocs.io/en/stable/reference/gwosc.datasets.query_events.html) or by running `query_events?`.

Let's see how to use this function to find which events have been detected with a network signal-to-noise ratio (SNR) of at least 30:

In [21]:
from gwosc.datasets import query_events
selection = query_events(select=["network-matched-filter-snr >= 30"])
# Several conditions can be combined; for example, to select an SNR between 25 and 30:
# query_events(select=["network-matched-filter-snr <= 30", "network-matched-filter-snr >= 25"])
print(selection)

['GW170817-v3']


Note that this function returns **all available versions** of every event that satisfies the selection, so the same event can appear more than once. For example, a query for an SNR between 25 and 30 lists GW190814 twice, because two versions of that event satisfy the condition.
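If only one entry per event is wanted, the list can be reduced to the highest version of each event. The sketch below is one possible approach, not a `gwosc` feature: `latest_versions` is a hypothetical helper, the input list is made up to show a repeated event, and every name is assumed to end in `-v<number>`.

```python
def latest_versions(names):
    """Keep only the highest '-vN' version of each event. Hypothetical helper."""
    latest = {}
    for name in names:
        event, _, version = name.rpartition("-v")
        version = int(version)
        if version > latest.get(event, 0):
            latest[event] = version
    return [f"{event}-v{version}" for event, version in sorted(latest.items())]


# Hypothetical query result in which one event appears with two versions
result = ["GW190814-v1", "GW170817-v3", "GW190814-v2"]
print(latest_versions(result))  # ['GW170817-v3', 'GW190814-v2']
```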

# Exercises

Now that you've seen examples of how to query for dataset information using the `gwosc` package, please try and complete the following exercises using that interface:

- How many months did O2 last? (Hint: check the output of _find_datasets(type='run')_ to find the correct label to use)
- How many GWTC-3-confident events were detected during O3b?
- How many events have been detected with a network signal to noise ratio (SNR) >= 30?

In [29]:
#1
from gwosc.datasets import run_segment
o2 = run_segment('O2_16KHZ_R1')
# Convert the run duration from seconds to months, approximating a month as 30 days
time_months = (o2[1] - o2[0]) / (60 * 60 * 24 * 30)
print(time_months)


8.941667052469136


In [31]:
#2
from gwosc.datasets import find_datasets

# All events listed in GWTC-3-confident fall within O3b, so counting the catalog entries answers the question
gwtc3 = find_datasets(type='events', catalog='GWTC-3-confident')
print('GWTC-3 events:', gwtc3)
gwtc3_count = len(gwtc3)
print(gwtc3_count)


GWTC-3 events: ['GW191103_012549-v1', 'GW191105_143521-v1', 'GW191109_010717-v1', 'GW191113_071753-v1', 'GW191126_115259-v1', 'GW191127_050227-v1', 'GW191129_134029-v1', 'GW191204_110529-v1', 'GW191204_171526-v1', 'GW191215_223052-v1', 'GW191216_213338-v1', 'GW191219_163120-v1', 'GW191222_033537-v1', 'GW191230_180458-v1', 'GW200112_155838-v1', 'GW200115_042309-v2', 'GW200128_022011-v1', 'GW200129_065458-v1', 'GW200202_154313-v1', 'GW200208_130117-v1', 'GW200208_222617-v1', 'GW200209_085452-v1', 'GW200210_092254-v1', 'GW200216_220804-v1', 'GW200219_094415-v1', 'GW200220_061928-v1', 'GW200220_124850-v1', 'GW200224_222234-v1', 'GW200225_060421-v1', 'GW200302_015811-v1', 'GW200306_093714-v1', 'GW200308_173609-v1', 'GW200311_115853-v1', 'GW200316_215756-v1', 'GW200322_091133-v1']
35


In [32]:
#3
from gwosc.datasets import query_events
selection = query_events(select=["network-matched-filter-snr >= 30"])
print(selection)

['GW170817-v3']
