In [1]:
# this just replaces the builtin `print` function with IPython's `display` function
# which is better at printing long lists of things;
# feel free to ignore this, and don't try and run it outside of jupyter or ipython
from IPython.display import display as print

# Gravitational wave 'Open Data'

The KAGRA, LIGO, and Virgo projects are all funded by government research funding agencies, and so have a responsibility and an obligation to publish not only their results but also their data, in effect to give it back to the taxpayer.

The [Gravitational-Wave Open Science Center](https://gw-openscience.org) (GWOSC) is jointly operated by the collaborations as the place where data are made available.

When new detections ('_events_') are published, GWOSC makes an hour (or so) of data freely available, and eventually the full observing data set is released (after 18 months of restricted access).

The event datasets are grouped into _catalogues_, called 'GWTC-1' and 'GWTC-2'. In the future 'GWTC-3' will be released, and so on.

## How can I get the data?

The GWOSC [Event Portal](https://www.gw-openscience.org/eventapi/) can be used to see what data are available.

Example event page: [GW150914](https://www.gw-openscience.org/eventapi/html/GWTC-1-confident/GW150914/v3/)

However, you can do much more with the data if you use [Python](https://python.org)!

## `gwosc`, the Python interface to GWOSC

While you can use the GWOSC website to find and download data, that can be slow, with lots of clicking around to find what you want.

The [`gwosc`](https://gwosc.readthedocs.io) Python package can be used to simplify or automate most of that.

First, we have to install it; for that we can use [`pip`](https://pip.pypa.io/).

In [2]:
# this is just a fancy version of 'pip install gwosc' for use inside a jupyter notebook
import sys
!{sys.executable} -m pip install gwosc



Now that it is installed, we can `import` it and start working:

In [3]:
import gwosc
help(gwosc)

Help on package gwosc:

NAME
    gwosc - GWOSC: a python interface to the GW Open Science data archive

PACKAGE CONTENTS
    _version
    api
    catalog
    datasets
    locate
    tests (package)
    timeline
    urls
    utils

VERSION
    0.5.7

AUTHOR
    Duncan Macleod <duncan.macleod@ligo.org>

FILE
    c:\users\spxdmm\miniconda3\envs\py39\lib\site-packages\gwosc\__init__.py




## Querying for datasets with `gwosc`

In order to discover what data are available, we can use the functions in the [`gwosc.datasets`](https://gwosc.readthedocs.io/en/latest/datasets.html) module:

In [4]:
from gwosc import datasets
help(datasets)

Help on module gwosc.datasets in gwosc:

NAME
    gwosc.datasets - `gwosc.datasets` includes functions to query for available datasets.

DESCRIPTION
    To search for all available datasets:
    
    >>> from gwosc import datasets
    >>> datasets.find_datasets()
    ['GW150914', 'GW151226', 'GW170104', 'GW170608', 'GW170814', 'GW170817', 'LVT151012', 'O1', 'S5', 'S6']
    >>> datasets.find_datasets(detector='V1')
    ['GW170814', 'GW170817']
    >>> datasets.find_datasets(type='run')
    ['O1', 'S5', 'S6']
    
    To query for the GPS time of an event dataset (or vice-versa):
    
    >>> datasets.event_gps('GW170817')
    1187008882.43
    >>> datasets.event_at_gps(1187008882)
    'GW170817'
    
    Similar queries are available for observing run datasets:
    
    >>> datasets.run_segment('O1')
    (1126051217, 1137254417)
    >>> datasets.run_at_gps(1135136350)  # event_gps('GW151226')
    'O1'

FUNCTIONS
    dataset_type(dataset, host='https://www.gw-openscience.org')
        Re

Following the example from the `help` message, we can discover the available datasets using the `find_datasets` function:

In [5]:
print(datasets.find_datasets())

['151008-v1',
 '151012.2-v1',
 '151116-v1',
 '161202-v1',
 '161217-v1',
 '170208-v1',
 '170219-v1',
 '170405-v1',
 '170412-v1',
 '170423-v1',
 '170616-v1',
 '170630-v1',
 '170705-v1',
 '170720-v1',
 '190924_232654-v1',
 '191223_014159-v1',
 '191225_215715-v1',
 '200114_020818-v1',
 '200214_224526-v1',
 'BKGW170608_16KHZ_R1',
 'GRB051103-v1',
 'GW150914-v1',
 'GW150914-v2',
 'GW150914-v3',
 'GW151012-v1',
 'GW151012-v2',
 'GW151012-v3',
 'GW151226-v1',
 'GW151226-v2',
 'GW170104-v1',
 'GW170104-v2',
 'GW170608-v1',
 'GW170608-v2',
 'GW170608-v3',
 'GW170729-v1',
 'GW170809-v1',
 'GW170814-v1',
 'GW170814-v2',
 'GW170814-v3',
 'GW170817-v1',
 'GW170817-v2',
 'GW170817-v3',
 'GW170818-v1',
 'GW170823-v1',
 'GW190408_181802-v1',
 'GW190412-v1',
 'GW190412-v2',
 'GW190412-v3',
 'GW190413_052954-v1',
 'GW190413_134308-v1',
 'GW190421_213856-v1',
 'GW190424_180648-v1',
 'GW190425-v1',
 'GW190425-v2',
 'GW190426_152155-v1',
 'GW190503_185404-v1',
 'GW190512_180714-v1',
 'GW190513_205428-v1',
 

This includes _everything_ available, including datasets of different types:

- `event`: data around individual signal detections
- `run`: bulk data for an entire observing period
- `catalog`: groups of detections (roughly grouped by observing period)

To see just the 'event' datasets:

In [6]:
print(datasets.find_datasets(type="event"))

['151008-v1',
 '151012.2-v1',
 '151116-v1',
 '161202-v1',
 '161217-v1',
 '170208-v1',
 '170219-v1',
 '170405-v1',
 '170412-v1',
 '170423-v1',
 '170616-v1',
 '170630-v1',
 '170705-v1',
 '170720-v1',
 '190924_232654-v1',
 '191223_014159-v1',
 '191225_215715-v1',
 '200114_020818-v1',
 '200214_224526-v1',
 'GRB051103-v1',
 'GW150914-v1',
 'GW150914-v2',
 'GW150914-v3',
 'GW151012-v1',
 'GW151012-v2',
 'GW151012-v3',
 'GW151226-v1',
 'GW151226-v2',
 'GW170104-v1',
 'GW170104-v2',
 'GW170608-v1',
 'GW170608-v2',
 'GW170608-v3',
 'GW170729-v1',
 'GW170809-v1',
 'GW170814-v1',
 'GW170814-v2',
 'GW170814-v3',
 'GW170817-v1',
 'GW170817-v2',
 'GW170817-v3',
 'GW170818-v1',
 'GW170823-v1',
 'GW190408_181802-v1',
 'GW190412-v1',
 'GW190412-v2',
 'GW190412-v3',
 'GW190413_052954-v1',
 'GW190413_134308-v1',
 'GW190421_213856-v1',
 'GW190424_180648-v1',
 'GW190425-v1',
 'GW190425-v2',
 'GW190426_152155-v1',
 'GW190503_185404-v1',
 'GW190512_180714-v1',
 'GW190513_205428-v1',
 'GW190514_065416-v1',
 '

Here we can see the success of gravitational-wave detectors just by the number of different event datasets available.

Those with the prefix `GW` are so-called 'confident' detections where we are sure that the signal came from a real merger event, and those without are 'marginal' detections where we aren't so sure (but hopeful!).
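The prefix rule described above can be sketched in a few lines of plain Python. This is only an illustration: `sample_names` is a hand-copied subset of the output shown earlier, not a live query, and the variable names are made up for this example.

```python
# A hand-copied sample of dataset names from the output above (not a live query).
sample_names = [
    "151008-v1",
    "151012.2-v1",
    "200214_224526-v1",
    "GW150914-v3",
    "GW151012-v3",
    "GW170817-v3",
]

# 'Confident' detections carry the GW prefix; everything else in this sample
# is treated as 'marginal', following the description in the text.
confident = [name for name in sample_names if name.startswith("GW")]
marginal = [name for name in sample_names if not name.startswith("GW")]

print(confident)
print(marginal)
```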

Let's see the different catalogues that are available:

In [7]:
print(datasets.find_datasets(type='catalog'))

['GWTC-1-confident',
 'GWTC-1-marginal',
 'GWTC-2',
 'Initial_LIGO_Virgo',
 'O1_O2-Preliminary',
 'O3_Discovery_Papers',
 'O3_IMBH_marginal']

We can filter the events by name and detector to see just the datasets for confident events that were seen by the LIGO-Livingston detector (labelled _L1_):

In [8]:
l1events = datasets.find_datasets(type='event', match="GW", detector='L1')
print(l1events)

['GW150914-v1',
 'GW150914-v2',
 'GW150914-v3',
 'GW151012-v1',
 'GW151012-v2',
 'GW151012-v3',
 'GW151226-v1',
 'GW151226-v2',
 'GW170104-v1',
 'GW170104-v2',
 'GW170608-v1',
 'GW170608-v2',
 'GW170608-v3',
 'GW170729-v1',
 'GW170809-v1',
 'GW170814-v1',
 'GW170814-v2',
 'GW170814-v3',
 'GW170817-v1',
 'GW170817-v2',
 'GW170817-v3',
 'GW170818-v1',
 'GW170823-v1',
 'GW190408_181802-v1',
 'GW190412-v1',
 'GW190412-v2',
 'GW190412-v3',
 'GW190413_052954-v1',
 'GW190413_134308-v1',
 'GW190421_213856-v1',
 'GW190424_180648-v1',
 'GW190425-v1',
 'GW190425-v2',
 'GW190426_152155-v1',
 'GW190503_185404-v1',
 'GW190512_180714-v1',
 'GW190513_205428-v1',
 'GW190514_065416-v1',
 'GW190517_055101-v1',
 'GW190519_153544-v1',
 'GW190521-v1',
 'GW190521-v2',
 'GW190521-v3',
 'GW190521_074359-v1',
 'GW190527_092055-v1',
 'GW190602_175927-v1',
 'GW190620_030421-v1',
 'GW190630_185205-v1',
 'GW190701_203306-v1',
 'GW190706_222641-v1',
 'GW190707_093326-v1',
 'GW190708_232457-v1',
 'GW190719_215514-v1'

<div class="alert alert-warning">If you look closely, you will see that some datasets are just different versions of data for the same event, e.g. <code>GW170814-v1</code>, <code>GW170814-v2</code>, and <code>GW170814-v3</code>, so be aware that not every dataset in this list represents a unique astrophysical phenomenon.</div>
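One way to collapse the versioned names into unique events is to split off the `-vN` suffix and keep the highest version seen for each event. The sketch below assumes every name ends in `-v` followed by a number, as in the output above; the helper `latest_versions` is hypothetical and not part of `gwosc`, and the input list is a hand-copied sample rather than a live query.

```python
import re

# Hand-copied sample of versioned dataset names from the output above.
versioned_names = [
    "GW150914-v1", "GW150914-v2", "GW150914-v3",
    "GW170814-v1", "GW170814-v2", "GW170814-v3",
    "GW190521-v1", "GW190521-v2", "GW190521-v3",
    "GW190521_074359-v1",
]


def latest_versions(dataset_names):
    """Map each event name to the highest version number seen for it."""
    latest = {}
    for name in dataset_names:
        match = re.fullmatch(r"(.+)-v(\d+)", name)
        if match is None:
            continue  # skip names that lack the assumed '-vN' suffix
        event, version = match.group(1), int(match.group(2))
        latest[event] = max(version, latest.get(event, 0))
    return latest


unique_events = latest_versions(versioned_names)
print(unique_events)
```

Ten dataset names reduce to four distinct events here, which is the point of the warning above.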

## Querying for event information

As well as the `find_datasets` function, the `gwosc.datasets` module provides utilities for getting useful information about individual events, including the event time: 

In [9]:
print(datasets.event_gps('GW150914'))

1126259462.4

All of these times are returned in the GPS time system, which counts the number of seconds that have elapsed since the start of the GPS epoch at midnight (00:00) on January 6th 1980. GWOSC provides a [GPS time converter](https://www.gw-openscience.org/gps/) you can use to translate into datetime, or you can use [`gwpy.time`](https://gwpy.github.io/docs/stable/time/).
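For a rough idea of what the converters do, here is a minimal sketch using only the standard library. GPS time does not include leap seconds while UTC does, so the caller has to supply the number of leap seconds added to UTC since the GPS epoch; that count (17 for September 2015) is passed in by hand here rather than looked up. The function name `gps_to_utc` is made up for this example, and in practice the tools linked above are the safer choice.

```python
from datetime import datetime, timedelta, timezone

# Start of the GPS epoch: midnight on January 6th 1980 (UTC).
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_to_utc(gps_seconds, leap_seconds):
    """Convert GPS seconds to a UTC datetime.

    `leap_seconds` is the number of leap seconds inserted into UTC between
    the GPS epoch and the time being converted; it must be supplied by the
    caller because this sketch carries no leap-second table.
    """
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


# 17 leap seconds were added to UTC between January 1980 and September 2015.
gw150914_utc = gps_to_utc(1126259462.4, leap_seconds=17)
print(gw150914_utc.isoformat())
```

This places GW150914 at about 09:50:45 UTC on 14 September 2015.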

## Querying for data files

Most of the time, what you really want is the data, not just metadata about the catalogs.
The `gwosc.locate` module provides a function to discover the URLs of actual data files hosted by GWOSC.

For event datasets, you just need to pass the event name to the `get_event_urls` function:

In [10]:
from gwosc import locate
urls = locate.get_event_urls("GW150914")
print(urls)

['https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/H-H1_GWOSC_4KHZ_R1-1126259447-32.hdf5',
 'https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/H-H1_GWOSC_4KHZ_R1-1126257415-4096.hdf5',
 'https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5',
 'https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126257415-4096.hdf5']

By default, this function returns all of the files associated with a given event, which may not be particularly helpful. However, we can filter the results by using keyword arguments, for example to get the URL for the 32-second file for the LIGO-Livingston detector:

In [11]:
urls = locate.get_event_urls('GW150914', duration=32, detector='L1')
print(urls)

['https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5']

The [HDF5](https://www.hdfgroup.org/solutions/hdf5/) file linked here is a publicly-available file that contains _real_ data from the LIGO-Livingston gravitational-wave detector - real data that includes a _real_ gravitational-wave signal from the first-ever direct observation of a binary black hole!
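The file names in the URLs above appear to encode the observatory, detector, GPS start time, and duration, separated by hyphens. Assuming that pattern holds, a short sketch can pull those fields out and check that the event time falls inside the file. The helper `parse_gwosc_filename` is hypothetical, not part of `gwosc`, and the layout is inferred from the URLs shown rather than taken from a specification.

```python
import posixpath
from urllib.parse import urlparse


def parse_gwosc_filename(url):
    """Split a GWOSC file URL into the fields encoded in its file name.

    Assumes the pattern seen in the URLs above:
    <observatory>-<detector>_<description>-<gps_start>-<duration>.<ext>
    """
    filename = posixpath.basename(urlparse(url).path)
    stem, extension = posixpath.splitext(filename)
    observatory, description, start, duration = stem.split("-")
    return {
        "observatory": observatory,
        "detector": description.split("_")[0],
        "gps_start": int(start),
        "duration": int(duration),
        "format": extension.lstrip("."),
    }


url = (
    "https://www.gw-openscience.org/eventapi/json/GWTC-1-confident/"
    "GW150914/v3/L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5"
)
info = parse_gwosc_filename(url)
print(info)

# The event time returned earlier by datasets.event_gps('GW150914').
event_gps = 1126259462.4
gps_end = info["gps_start"] + info["duration"]
event_in_file = info["gps_start"] <= event_gps < gps_end
print(event_in_file)
```

With these numbers, the 32-second file starts about 15 seconds before the event time.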

## Recap

What have we learned?

- the `gwosc` Python package provides a programmatic way of querying for GW open datasets
- it can be used to discover data for 'events', 'catalogs' and 'runs'
- it provides utility functions to get the GPS time for an event, or the URL of data files

In the next tutorial we will learn how the GWpy Python package can be used to actually download and interact with these data.

<a class="btn btn-primary" href="./2-GWpy.ipynb" target="_blank" role="button">Click here</a> to open the next notebook.