_This notebook was put together by [Keneth Garcia](https://stivengarcia7113.wixsite.com/kenethgarcia). Source and license info are on [GitHub](https://github.com/KenethGarcia/GRB_ML)._

# Swift Data Download
The Neil Gehrels Swift Observatory publishes analysis results for Swift/BAT Gamma-Ray Bursts (GRBs) on [this website](https://swift.gsfc.nasa.gov/results/batgrbcat/) (open access).

This notebook summarizes how to download these data at different resolutions. Throughout this document, we use the _python3_ implementations from the _scripts_ folder. You need an internet connection and a working _Jupyter Notebook_/_Python 3_ environment.

First, import the **main.py** module into the notebook, along with the other packages used here:

In [1]:
from scripts import main
import os  # Packages needed in this notebook
import numpy as np

Then, create an instance of the `SwiftGRBWorker` class from `main.py` and, if needed, set the download paths (by default, the folder containing this notebook is used):

In [10]:
object1 = main.SwiftGRBWorker()
object1.data_path = r'G:\Mi unidad\Cursos\Master_Degree_Project\GRB_ML\Data'  # Change download path
object1.table_path = r'G:\Mi unidad\Cursos\Master_Degree_Project\GRB_ML\Tables'  # Change Table path
object1.original_data_path = r'G:\Mi unidad\Cursos\Master_Degree_Project\GRB_ML\Data\Original_Data'  # Change data directory path
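The paths above are specific to one machine. As a minimal sketch, assuming you prefer folders relative to the notebook, the same layout can be built with `pathlib`. The variable names below are illustrative, and whether the worker attributes accept `Path` objects is not confirmed by this notebook, so converting with `str()` before assigning them is the safer choice.

```python
from pathlib import Path

# Assumption: the notebook's working directory is used as the project root
base_dir = Path.cwd()

# Same folder layout as the cell above, without machine-specific prefixes
data_path = base_dir / "Data"
table_path = base_dir / "Tables"
original_data_path = data_path / "Original_Data"

for folder in (data_path, table_path, original_data_path):
    print(str(folder))  # str(folder) is what you would assign to the worker attribute
```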

Now, there are two approaches to downloading the data: download a single GRB or download the complete dataset.

## Changing the Swift GRB binning
By default, the following sections download the data at 64 ms binning. Some cases require a different resolution or binning; this package handles them through the _resolution_ and _end_ variables.

You can set the _resolution_ variable (`res` in the code) to $2$, $8$, $16$, $64$, or $256$ ms. Additionally, you can set it to $1$ for 1 s binning, or change the _end_ variable to "sn5_10s" to use the data with a signal-to-noise ratio higher than 5 or 10 s binning (these data do not have uniform time spacing).

In [3]:
object1.res = 64  # Light curve resolution in ms: 2, 8, 16, 64 (default), or 256; use 1 for 1 s binning
# object1.end = "sn5_10s"  # Uncomment this line to use the signal-to-noise higher than 5 or 10 s binning data

It is advisable not to change both variables at the same time; this could cause unknown bugs when running package routines and sub-routines.
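A small guard can catch an invalid setting before any download starts. The sketch below is only an illustration: `validate_binning` is a hypothetical helper, not part of `scripts.main`, and it uses `end=None` to mean "leave the _end_ variable at its default", which is an assumption made for this example.

```python
# Hypothetical helper (not part of scripts.main): checks the binning settings
# described above before they are assigned to the worker object.
ALLOWED_RES = {2, 8, 16, 64, 256, 1}  # ms values, plus 1 for 1 s binning


def validate_binning(res=64, end=None):
    """Return (res, end) if the combination looks safe, else raise ValueError."""
    if res not in ALLOWED_RES:
        raise ValueError(f"Unsupported resolution: {res}")
    if end is not None and end != "sn5_10s":
        raise ValueError(f"Unsupported end value: {end}")
    if end is not None and res != 64:
        # The text advises against changing both variables at the same time
        raise ValueError("Change either res or end, not both")
    return res, end


print(validate_binning(res=8))          # (8, None)
print(validate_binning(end="sn5_10s"))  # (64, 'sn5_10s')
```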

## Single data download
If you want, for example, to download data only for GRB060614, you can use the `download_data` method. This method needs the name and Trigger ID associated with that GRB, so we need to look them up in the `summary_general` table and pass them to the method. If you are using this notebook for the first time, you first need to download the summary tables:

In [None]:
object1.summary_tables_download()

The next step is to get the GRB names and IDs from the _summary_general.txt_ table:

In [4]:
sum_general_path = os.path.join(object1.table_path, "summary_general.txt")
GRB_names, ids = np.genfromtxt(sum_general_path, delimiter="|", dtype=str, usecols=(0, 1), unpack=True, autostrip=True)
print(f"There are {len(GRB_names)} GRBs in the summary table")
print(f"Names: {GRB_names}")
print(f"IDs: {ids}")

There are 1391 GRBs in the summary table
Names: ['GRB200829A' 'GRB200819A' 'GRB200809B' ... 'GRB041219B' 'GRB041219A'
 'GRB041217']
IDs: ['993768' '992099' '987745' ... '100368' '100307' '100116']


Then, look up the ID associated with GRB060614:

In [5]:
name = 'GRB060614'  # Change this name if you want another GRB
index = np.where(GRB_names == name)
ID, *other = ids[index]
print(f"{name} has {ID} as ID")

GRB060614 has 214805 as ID
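If you need to look up several GRBs, a dictionary avoids repeating `np.where` and gives a clear error for unknown names. The following is a minimal sketch, not the package's own method: the table excerpt is made up to imitate a pipe-delimited layout (its header and third column are placeholders), and `find_id` is a hypothetical helper. The three name/ID pairs are taken from the output shown above.

```python
import io

import numpy as np

# Made-up excerpt imitating a pipe-delimited summary table;
# the header text and the third column are placeholders
sample = io.StringIO(
    "# GRBname | Trig_ID | other\n"
    "GRB200829A | 993768 | x\n"
    "GRB060614 | 214805 | x\n"
    "GRB041217 | 100116 | x\n"
)
names, trig_ids = np.genfromtxt(
    sample, delimiter="|", dtype=str, usecols=(0, 1), unpack=True, autostrip=True
)

# Map each GRB name to its Trigger ID for direct lookups
id_by_name = dict(zip(names, trig_ids))


def find_id(grb_name):
    """Return the Trigger ID for grb_name, or raise KeyError with a clear message."""
    try:
        return id_by_name[grb_name]
    except KeyError:
        raise KeyError(f"{grb_name} is not in the summary table") from None


print(find_id("GRB060614"))  # 214805
```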


Finally, call the `download_data` method, passing the name and ID. It returns a tuple (name, result), where result is None if the download finished without errors; otherwise, it is a string with details about the error:

In [6]:
result_GRB = object1.download_data(name, ID)
name_done, error = result_GRB  # Unpack the (name, result) tuple
print(f"{name_done} has been downloaded" if not error else f"Error downloading {name_done} data: {error}")

GRB060614 has been downloaded


One remark at this point: **for some GRBs, no data are available because of Swift measurement problems**. As of June 27, 2022, at least 22 GRBs have this problem at 64 ms binning, including _170131A, 160623A, 070125, 060123, 160409A, 140611A, 131031A, 130913A, 130518A, 120817B, 110604A, 101204A, 090827, 090720A, 071112C, 071028B, 071010C, 071006, 070227, 140909A, and 041219A._ If you get the _Not Found for url_ error, this may be the cause.
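Since each download returns a (name, result) tuple, missing-data cases can be told apart from other failures by inspecting the error text. The sketch below assumes the error string contains the phrase _Not Found for url_, as in the remark above. `split_results` is a hypothetical helper, and the sample tuples and their error messages are made up for illustration.

```python
def split_results(results):
    """Separate (name, error) tuples into successes, missing data, and other failures."""
    ok, missing, failed = [], [], []
    for grb_name, error in results:
        if error is None:
            ok.append(grb_name)
        elif "Not Found for url" in error:
            missing.append(grb_name)  # most likely no data on the server
        else:
            failed.append(grb_name)  # e.g. a connection problem
    return ok, missing, failed


# Made-up results imitating the (name, result) tuples described above
sample_results = [
    ("GRB060614", None),
    ("GRB170131A", "404 Client Error: Not Found for url"),
    ("GRB_EXAMPLE", "HTTPSConnectionPool: connection timed out"),
]
ok, missing, failed = split_results(sample_results)
print(f"Downloaded: {ok}")
print(f"No data available: {missing}")
print(f"Other errors: {failed}")
```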

## Multiple data download

The `so_much_downloads` method takes the same kind of arguments as `download_data`, but as arrays: one with the GRB names and one with their IDs. Additionally, a boolean argument named `error` indicates whether to save a report file.

To download the entire GRB dataset, you need only one line of code:

In [None]:
object1.so_much_downloads(GRB_names, ids)

Now, in the Data folder created, you can see the Original_Data subfolder and the "Errors_64ms.txt" summary file. Reading this file, we can check how many errors there are:

In [11]:
GRB_errors = np.genfromtxt(os.path.join(object1.original_data_path, f"Errors_{object1.end}.txt"), delimiter='|', dtype=str, unpack=True)[0]
print(f"There are {len(GRB_errors)} errors.")

There are 22 errors.


If you get any _HTTPSConnectionPool_ errors in the summary file, you can run the following lines as many times as you need:

In [None]:
match = np.where(np.isin(GRB_names, GRB_errors))[0]  # Index the IDs of GRB Errors
object1.so_much_downloads(GRB_names[match], t_ids=ids[match])  # Try to re-download the GRBs
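Instead of re-running the cell by hand, the retry can be wrapped in a bounded loop. This is a sketch under stated assumptions, not the package's behavior: `retry_transient` is a hypothetical helper that accepts any function returning a (name, result) tuple, retries only errors containing _HTTPSConnectionPool_, and stops after `max_rounds`. The `fake_download` function is a stand-in so the example runs without network access.

```python
def retry_transient(download, pending, max_rounds=3):
    """Re-run download(name, t_id) for connection errors; return the errors left over."""
    errors = {}
    for _ in range(max_rounds):
        retry_next = []
        for grb_name, t_id in pending:
            _, error = download(grb_name, t_id)
            if error is None:
                errors.pop(grb_name, None)  # succeeded, forget any earlier error
            else:
                errors[grb_name] = error
                if "HTTPSConnectionPool" in error:
                    retry_next.append((grb_name, t_id))  # worth another attempt
        pending = retry_next
        if not pending:
            break
    return errors


# Stand-in downloader: fails once with a connection error, then succeeds;
# one name always reports missing data
attempts = {}


def fake_download(grb_name, t_id):
    attempts[grb_name] = attempts.get(grb_name, 0) + 1
    if grb_name == "GRB_MISSING":
        return grb_name, "Not Found for url"
    if attempts[grb_name] == 1:
        return grb_name, "HTTPSConnectionPool: timed out"
    return grb_name, None


left = retry_transient(fake_download, [("GRB_FLAKY", "1"), ("GRB_MISSING", "2")])
print(left)  # {'GRB_MISSING': 'Not Found for url'}
```

With the real worker, the `download` argument would be the `download_data` method, assuming it is called with a name and an ID as shown earlier.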

To close this section, note that the original data take approximately 2.67 GB of disk space when uncompressed. Compressing the data with the `gzip` library reduces the size by a factor of 3-4. The current size on disk is:

In [12]:
size = 0  # Set size variable to zero
for path, dirs, files in os.walk(object1.original_data_path):  # Loop over the folder containing all data downloaded
    for f in files:  # Loop over files into folder
        fp = os.path.join(path, f)  # Join file name with folder path
        size += os.stat(fp).st_size  # Get file size and sum over previous size
print(f"There are {round(size / (1024 * 1024), 3)} MB of data")

There are 912.683 MB of data
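As a quick check of the compression factor, the two figures quoted in this section can be compared directly. The calculation assumes that 1 GB corresponds to 1024 MB, matching the 1024-based conversion used in the cell above.

```python
uncompressed_gb = 2.67   # uncompressed size quoted in the text
compressed_mb = 912.683  # compressed size reported by the cell above

# Convert GB to MB (assumption: 1 GB = 1024 MB) and take the ratio
factor = uncompressed_gb * 1024 / compressed_mb
print(f"Compression factor: {factor:.2f}")
```

The result is close to 3, at the lower end of the 3-4 range mentioned above.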


In [None]:
command = r"jupyter nbconvert  README.ipynb --to markdown --output README.md --output-dir='G:\Mi unidad\Cursos\Master_Degree_Project\GRB_ML\Documentation'"  # Raw string keeps the backslashes literal
output = os.system(command)  # Returns 0 when the conversion succeeds
if output == 0:
    print("Readable file created")