# 1. Data reduction for a one-dimensional analysis

In this notebook, we will perform the **data reduction**: starting from the event lists and instrument response functions (the content of the DL3 files), we will obtain binned information that can be used to extract a scientific result (e.g. a spectrum or a light curve). We will reduce data from the MAGIC, H.E.S.S., and LST-1 telescopes.

In [None]:
# - basic dependencies
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord, Angle
from regions import PointSkyRegion, CircleSkyRegion
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pathlib import Path

# - Gammapy dependencies
from gammapy.data import DataStore
from gammapy.maps import Map, MapAxis, RegionGeom
from gammapy.datasets import Datasets, SpectrumDataset
from gammapy.makers import (
    ReflectedRegionsFinder,
    ReflectedRegionsBackgroundMaker,
    SafeMaskMaker,
    SpectrumDatasetMaker,
    WobbleRegionsFinder,
)

# - this repo dependencies
from utils import plot_on_off_regions

The analysis we aim to perform in this tutorial is a point-like or one-dimensional analysis. In general, depending on the region of the sky observed, gamma-ray data might contain emission from different sources. Therefore, when interpreting the observations, one should account for several sources in the model, possibly also considering their extension. In this most general case, the observed gamma-ray events are binned in so-called _data cubes_, that is, three-dimensional histograms of sky coordinates and energy. This type of analysis is referred to as _three-dimensional_ or _spectro-morphological_. We will perform it in the last notebook of this tutorial series.

In [None]:
# read images
img_1 = mpimg.imread("figures/data_cube_grid.png")
img_2 = mpimg.imread("figures/data_cube.png")

# display images
fig, ax = plt.subplots(1, 2)
ax[0].imshow(img_1)
ax[0].axis("off")
ax[1].imshow(img_2)
ax[1].axis("off")
plt.show()

Images credit: Axel Donath.
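
To make the idea of a data cube concrete, here is a minimal sketch that bins a list of events into a three-dimensional histogram using only `numpy`. The event list is randomly generated and the bin edges are arbitrary choices for illustration; none of these values come from real data, and this is not how `Gammapy` builds its maps internally.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# hypothetical event list: sky coordinates in degrees, energies in TeV
n_events = 1000
ra = rng.uniform(82.0, 85.0, n_events)
dec = rng.uniform(20.5, 23.5, n_events)
energy = 10 ** rng.uniform(-1, 1, n_events)  # between 0.1 and 10 TeV

# bin edges along each axis of the cube (arbitrary choices)
ra_edges = np.linspace(82.0, 85.0, 31)
dec_edges = np.linspace(20.5, 23.5, 31)
energy_edges = np.logspace(-1, 1, 5)  # 4 logarithmic energy bins

# three-dimensional histogram with axes ordered as (energy, dec, ra)
cube, _ = np.histogramdd(
    np.column_stack([energy, dec, ra]),
    bins=[energy_edges, dec_edges, ra_edges],
)

print(cube.shape)  # (4, 30, 30)
```

Each slice `cube[i]` is a counts image of the field of view in the `i`-th energy bin.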

Now, in several cases - for example in observations of small portions of the sky - it might happen that a single isolated gamma-ray source occupies the field of view. In that case a simpler, _one-dimensional_ analysis is adopted. We define a small region around the nominal position of the source and consider only the events enclosed by it. We still bin the events in energy, as we want to estimate a spectrum, but since the field of view is mostly empty, the spatial information from other regions of the sky is not relevant.

In [None]:
img_3 = mpimg.imread("figures/one_dimensional_analysis.png")

# display images
fig, ax = plt.subplots(figsize=(6, 4))
ax.imshow(img_3)
ax.axis("off")
plt.show()

This is what we have done in the previous exercise with the _aperture photometry_ technique. To estimate the signal, we have considered only the counts coming from the small $0.2^{\circ}$-radius circle centred on the Crab Nebula. We do not discard the rest of the data, since - as we saw - this region also contains a background that we have to estimate from some other position in the field of view. Still, a complete spatial description of the region observed is not needed.
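
The selection described above can be sketched in a few lines of `numpy`: keep the events within $0.2^{\circ}$ of the source and histogram their energies. The five events and the two energy bins below are made up for illustration, and `angular_separation` is a hypothetical helper defined here, not a `Gammapy` function.

```python
import numpy as np


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula, inputs in degrees)."""
    ra1, dec1 = np.radians(ra1), np.radians(dec1)
    ra2, dec2 = np.radians(ra2), np.radians(dec2)
    hav = (
        np.sin((dec2 - dec1) / 2) ** 2
        + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2
    )
    return np.degrees(2 * np.arcsin(np.sqrt(hav)))


# nominal position of the Crab Nebula and ON region radius, in degrees
source_ra, source_dec = 83.63333, 22.01444
on_radius = 0.2

# hypothetical events: coordinates in degrees, energies in TeV
event_ra = np.array([83.63, 83.70, 84.50, 83.60, 82.90])
event_dec = np.array([22.01, 22.05, 22.80, 21.95, 21.50])
event_energy = np.array([0.2, 1.5, 0.7, 3.0, 0.4])

# keep only the events falling inside the ON region
separation = angular_separation(event_ra, event_dec, source_ra, source_dec)
in_on_region = separation < on_radius

# one-dimensional histogram: ON counts as a function of energy
energy_edges = np.array([0.1, 1.0, 10.0])
on_counts, _ = np.histogram(event_energy[in_on_region], bins=energy_edges)

print(on_counts)  # [1 2]
```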

In what follows, we will let `Gammapy` automatically perform the data reduction for us. In `Gammapy`'s terminology, we will move from **observations** to **datasets**, which contain the signal and background count histograms and the IRF evaluated at the position of the source in the sky. In the next tutorial we will see that this is all we need to perform a statistical analysis.

## 1.1. H.E.S.S. data reduction

The `gammapy.data.DataStore` object allows us to read all the DL3 files (i.e. all the observations) in a directory. Let us start with the H.E.S.S. data.

In [None]:
hess_datastore = DataStore.from_dir("$GAMMAPY_DATA/hess-dl3-dr1/")
hess_datastore.obs_table

This table gives us an overview of the conditions of the different observations. We are interested in the Crab Nebula observations, so let us select the corresponding observation IDs:

In [None]:
crab_obs_mask = hess_datastore.obs_table["TARGET_NAME"] == "Crab"
obs_ids = hess_datastore.obs_table["OBS_ID"][crab_obs_mask]
print(obs_ids)

In [None]:
# let us get the `gammapy.data.Observation`s, the objects representing the individual DL3 files
observations_hess = hess_datastore.get_observations(obs_ids)
print(observations_hess)

### Data reduction
This is where the data reduction starts. We will feed `Gammapy` with the ON region (the region from which we want to estimate the counts).

In [None]:
# define the on region
target_position = SkyCoord(ra=83.63333, dec=22.01444, unit="deg", frame="icrs")
on_region_radius = Angle("0.11 deg")
on_region = CircleSkyRegion(center=target_position, radius=on_region_radius)

In [None]:
energy_axis = MapAxis.from_energy_bounds(
    0.1, 40, nbin=10, per_decade=True, unit="TeV", name="energy"
)
energy_axis_true = MapAxis.from_energy_bounds(
    0.05, 100, nbin=20, per_decade=True, unit="TeV", name="energy_true"
)

geom = RegionGeom.create(region=on_region, axes=[energy_axis])
dataset_empty = SpectrumDataset.create(geom=geom, energy_axis_true=energy_axis_true)

dataset_maker = SpectrumDatasetMaker(
    containment_correction=True, selection=["counts", "exposure", "edisp"]
)
bkg_maker = ReflectedRegionsBackgroundMaker()
safe_mask_masker = SafeMaskMaker(methods=["aeff-max"], aeff_percent=10)

In [None]:
datasets_hess = Datasets()

for obs_id, observation in zip(obs_ids, observations_hess):
    dataset = dataset_maker.run(dataset_empty.copy(name=str(obs_id)), observation)
    dataset_on_off = bkg_maker.run(dataset, observation)
    dataset_on_off = safe_mask_masker.run(dataset_on_off, observation)
    datasets_hess.append(dataset_on_off)

print(datasets_hess)

Let us use the `Observation`s to display the process of signal extraction.

In [None]:
plot_on_off_regions(observations_hess[0], on_region)

In [None]:
plot_on_off_regions(observations_hess[2], on_region)

As explained in the previous tutorial, we have selected an __on region__, in red, to estimate the events coming from the source. This region also contains background counts, i.e. events that are not real gamma rays. To subtract them, we estimate the background from __off regions__ that do not contain any real gamma-ray source. Previously, we considered only one background region, symmetric to the ON region; we now try to fit as many background regions as possible. We have also divided our data set into different energy bins, and we perform this estimation for the events in each energy bin, for example considering all the events with energies in $[300, 1000]\,{\rm GeV}$:

In [None]:
plot_on_off_regions(observations_hess[1], on_region, energies=[300 * u.GeV, 1 * u.TeV])
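
The geometry behind the OFF regions can be sketched as follows: circles of the same size as the ON region are placed at the same offset from the pointing position, by rotating the ON centre around it. The function `reflected_centres` is a hypothetical helper written for this illustration; it uses a flat-sky approximation and ignores exclusion regions and minimum-distance settings, so it is not the algorithm implemented in `ReflectedRegionsFinder`, and the number of regions it finds may differ from the plots above. The $0.5^{\circ}$ offset is an assumed value.

```python
import numpy as np


def reflected_centres(pointing, on_centre, on_radius):
    """Centres of non-overlapping OFF circles at the same offset as the ON region.

    Flat-sky approximation: all coordinates are offsets in degrees.
    Assumes on_radius is smaller than the ON region offset.
    """
    pointing = np.asarray(pointing, dtype=float)
    offset_vec = np.asarray(on_centre, dtype=float) - pointing
    offset = np.hypot(offset_vec[0], offset_vec[1])

    # angle subtended by one circle, as seen from the pointing position
    angle_per_region = 2 * np.arcsin(on_radius / offset)
    # number of circles fitting on the ring, ON region included
    n_regions = int(2 * np.pi / angle_per_region)

    # rotate the ON centre in equal steps, skipping the ON position itself
    phi_on = np.arctan2(offset_vec[1], offset_vec[0])
    phis = phi_on + 2 * np.pi * np.arange(1, n_regions) / n_regions
    return pointing + offset * np.column_stack([np.cos(phis), np.sin(phis)])


# assumed configuration: ON region 0.5 deg away from the pointing position
pointing = (0.0, 0.0)
on_centre = (0.5, 0.0)
on_radius = 0.11

off_centres = reflected_centres(pointing, on_centre, on_radius)
print(len(off_centres))  # 13
```

Because every OFF circle sits at the same offset as the ON region, the acceptance of the instrument is assumed to be the same in all of them, which is what makes the background estimate comparable to the ON counts.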

We can thus build a histogram of the events in the _on_ and _off_ regions as a function of the energy:

In [None]:
datasets_hess[2].plot_counts(
    kwargs_counts={"color": "crimson", "lw": 1.5, "label": "on region"},
    kwargs_background={"color": "dodgerblue", "lw": 1.5, "label": "off region"},
)
plt.show()

By subtracting the two in each of the energy bins, we obtain the so-called _excesses_, that is, the counts that we estimate are coming from the source.

In [None]:
datasets_hess[2].plot_excess()
plt.show()
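
The subtraction can be written out explicitly. When the background is measured in several OFF regions of the same size and acceptance as the ON region, the OFF counts have to be scaled by a factor $\alpha = 1 / N_{\rm off\,regions}$ before subtracting them. The counts below are invented numbers for illustration, assuming three OFF regions.

```python
import numpy as np

# hypothetical counts per energy bin
n_on = np.array([120, 85, 40, 12])  # counts in the ON region
n_off = np.array([300, 150, 45, 6])  # counts summed over all the OFF regions

# assumed: three OFF regions with the same size and acceptance as the ON region
n_off_regions = 3
alpha = 1 / n_off_regions

# background expected in the ON region and excess counts
background = alpha * n_off
excess = n_on - background

print(excess)
```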

Let us check what the `Dataset`s we just created contain, besides the counts. We will see in the next tutorial that this is all we need to fit the spectrum. In the meantime, let us also save the reduced data to disk.

In [None]:
datasets_hess[2].peek()

In [None]:
results_dir = Path("results/spectra/hess")
results_dir.mkdir(exist_ok=True, parents=True)

for observation, dataset in zip(observations_hess, datasets_hess):
    dataset.write(results_dir / f"pha_obs_{observation.obs_id}.fits", overwrite=True)

## 1.2. MAGIC data reduction


We now perform on the MAGIC data the same data reduction we performed on the H.E.S.S. data.

In [None]:
data_store_magic = DataStore.from_dir("$GAMMAPY_DATA/magic/rad_max/data/")
observations_magic = data_store_magic.get_observations(required_irf="point-like")
print(observations_magic)

In [None]:
# define the on region as a PointSkyRegion now
target_position = SkyCoord(ra=83.63333, dec=22.01444, unit="deg", frame="icrs")
on_region = PointSkyRegion(target_position)

In [None]:
# true and estimated energy axes
energy_axis = MapAxis.from_energy_bounds(10, 1e5, nbin=20, unit="GeV", name="energy")
energy_axis_true = MapAxis.from_energy_bounds(
    10, 1e5, nbin=28, unit="GeV", name="energy_true"
)

# the geometry defines the array of ON counts
geom = RegionGeom.create(region=on_region, axes=[energy_axis])

dataset_empty = SpectrumDataset.create(geom=geom, energy_axis_true=energy_axis_true)

In [None]:
# the maker will actually fill the array of counts and compute the IRF
dataset_maker = SpectrumDatasetMaker(
    containment_correction=False, selection=["counts", "exposure", "edisp"]
)

# we need a RegionsFinder to find the OFF regions
# and a BackgroundMaker to fill the array of the OFF counts
region_finder = WobbleRegionsFinder(n_off_regions=1)
bkg_maker = ReflectedRegionsBackgroundMaker(region_finder=region_finder)

In [None]:
datasets_magic = Datasets()

for observation in observations_magic:
    # fill the ON counts array and compute the IRF at the observation offset
    dataset = dataset_maker.run(
        dataset_empty.copy(name=str(observation.obs_id)), observation
    )
    # fill the OFF counts
    dataset_on_off = bkg_maker.run(dataset, observation)
    datasets_magic.append(dataset_on_off)

Let us now check what the `Dataset`s we created contain:

In [None]:
datasets_magic[0].peek()

Let us also save the MAGIC datasets to disk:

In [None]:
results_dir = Path("results/spectra/magic")
results_dir.mkdir(exist_ok=True, parents=True)

for observation, dataset in zip(observations_magic, datasets_magic):
    dataset.write(results_dir / f"pha_obs_{observation.obs_id}.fits", overwrite=True)

## 1.3. LST data reduction
Finally, we repeat the same process for LST. Let us adopt the same configuration we adopted for MAGIC.

In [None]:
data_store_lst = DataStore.from_dir("../crab_lst_data/")
observations_lst = data_store_lst.get_observations(required_irf="point-like")
print(observations_lst)

In [None]:
datasets_lst = Datasets()

for observation in observations_lst:
    # fill the ON counts array and compute the IRF at the observation offset
    dataset = dataset_maker.run(
        dataset_empty.copy(name=str(observation.obs_id)), observation
    )
    # fill the OFF counts
    dataset_on_off = bkg_maker.run(dataset, observation)
    datasets_lst.append(dataset_on_off)

In [None]:
datasets_lst[0].peek()

In [None]:
results_dir = Path("results/spectra/lst")
results_dir.mkdir(exist_ok=True, parents=True)

for observation, dataset in zip(observations_lst, datasets_lst):
    dataset.write(results_dir / f"pha_obs_{observation.obs_id}.fits", overwrite=True)