# Datasets - Reduced data, IRFs, models 

## Introduction

`gammapy.datasets` is a crucial part of the gammapy API. Datasets hold `DL4` data: binned counts, IRFs, models and the associated likelihoods. `Datasets` form the end product of the `makers` stage (see the [makers notebook](makers.ipynb)) and are passed on to the `Fit` or estimator classes for modelling and fitting.

To find the different types of `Dataset` that are supported, see [Datasets home](../../datasets/index.rst#Types-of-supported-datasets).


## Setup

In [None]:
import numpy as np
import astropy.units as u
from astropy.time import Time
from regions import CircleSkyRegion
from astropy.coordinates import SkyCoord
from gammapy.datasets import (
    MapDataset,
    SpectrumDataset,
    Datasets,
    FluxPointsDataset,
)
from gammapy.data import DataStore
from gammapy.maps import WcsGeom, RegionGeom, MapAxes, MapAxis, Map
from gammapy.modeling.models import SkyModel, PowerLawSpectralModel
from gammapy.estimators import FluxPoints

## MapDataset

The counts, exposure, background, masks, and IRF maps are bundled together in a data structure named `MapDataset`. While the `counts` and `background` maps are binned in reconstructed energy and must have the same geometry, the IRF maps can have a different spatial geometry (typically larger and more coarsely binned) and a different spectral range (binned in true energy). It is usually recommended that the true energy axis cover a wider range and be more finely sampled than the reconstructed energy axis.

### Creating an empty dataset 

An empty `MapDataset` can be instantiated from any `WcsGeom` object. The binning of each IRF axis can be configured individually; otherwise, internal defaults are used.

In [None]:
energy_axis = MapAxis.from_energy_bounds(
    1, 10, nbin=11, name="energy", unit="TeV"
)

geom = WcsGeom.create(
    skydir=(83.63, 22.01),
    axes=[energy_axis],
    width=5 * u.deg,
    binsz=0.05 * u.deg,
    frame="icrs",
)

energy_axis_true = MapAxis.from_energy_bounds(
    0.1, 100, nbin=11, name="energy_true", unit="TeV", per_decade=True
)

rad_axis = MapAxis.from_bounds(0, 5, nbin=50, unit="deg", name="rad")

dataset_empty = MapDataset.create(
    geom=geom,
    energy_axis_true=energy_axis_true,
    rad_axis=rad_axis,
    binsz_irf=0.1,
)

In [None]:
dataset_empty.edisp

To see the geometry of each map, we can use:

In [None]:
dataset_empty.geoms

To see how to use `dataset_empty` in the data reduction process, please see the [makers notebook](makers.ipynb).

### Reading and writing datasets

Datasets can be read from and saved to disk using the `read` and `write` methods. Writing saves the various `Map` attributes of the dataset as different HDUs of a single FITS file. The maps are currently stored according to the [gadf specifications for skymaps](https://gamma-astro-data-formats.readthedocs.io/en/latest/skymaps/index.html).

In [None]:
dataset = MapDataset.read(
    "$GAMMAPY_DATA/cta-1dc-gc/cta-1dc-gc.fits.gz", name="test"
)

**Note**: The dataset name is a very important attribute. It acts as the unique identifier for a dataset within `datasets`. No two datasets can have the same name. Models are linked to datasets through the dataset name. See [model management](model_management.ipynb) for details.

## Accessing contents of a dataset

To explore the contents of a `Dataset`, you can simply print it:

In [None]:
print(dataset)

In [None]:
# For a quick info, use
dataset.info_dict()

In [None]:
# To access the individual components of a dataset, eg background, you can simply
dataset.background

`Dataset.background` contains the background map computed from the IRF.
To see the model-corrected background, use `dataset.npred_background()`.
To compute the predicted counts from a particular source model, use `dataset.npred_signal(model_name)`.

*Note* - The reduced IRFs, counts, backgrounds and the model predicted counts, i.e. `npred()`, are all stored as `maps` on a dataset. Standard `Map` operations can be performed on these, e.g. see the [maps notebook](maps.ipynb). The `psf` and `edisp` are stored as `~gammapy.irf.PSFKernelMap` and `~gammapy.irf.EDispKernelMap`, respectively; see the associated documentation for further details.

### Using masks

There are two masks that can be set on a `Dataset`, `mask_safe` and `mask_fit`. 

- The `mask_safe` is computed during the data reduction process according to the specified selection cuts, and should not be changed by the user.
- During modelling and fitting, the user might want to additionally ignore some parts of a reduced dataset, e.g. to restrict the fit to a specific energy range or to ignore parts of the region of interest. This should be done by applying the `mask_fit`. To see details of applying masks, please refer to [Masks-for-fitting](mask_maps.ipynb#Masks-for-fitting:-mask_fit)

Both the `mask_fit` and `mask_safe` must have the same `geom` as the `counts` and `background` maps.
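
How the two masks combine can be sketched with plain NumPy boolean arrays. This is only an illustration with made-up shapes; the arrays `counts`, `mask_safe` and `mask_fit` below are stand-ins, not gammapy objects. The point is that a bin contributes only where both masks are true.

```python
import numpy as np

# Toy cube: 3 energy bins x 4 x 4 spatial pixels (shapes are illustrative)
counts = np.arange(3 * 4 * 4).reshape(3, 4, 4)

# Stand-in for a safe mask: drop the lowest energy bin everywhere
mask_safe = np.ones(counts.shape, dtype=bool)
mask_safe[0] = False

# Stand-in for a fit mask: ignore a 2 x 2 spatial patch in all energy bins
mask_fit = np.ones(counts.shape, dtype=bool)
mask_fit[:, :2, :2] = False

# Only bins that pass both masks are used
mask = mask_safe & mask_fit
n_used = int(mask.sum())
counts_used = int(counts[mask].sum())
print(n_used, counts_used)
```

Because all three arrays share one shape, the masks can be combined element-wise, which is why the masks must share the `geom` of the `counts` map.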

In [None]:
# eg: to see the safe data range
dataset.mask_safe.plot_interactive(add_cbar=True);

In [None]:
# To apply a mask fit - in energy and space
region = CircleSkyRegion(
    SkyCoord(2.1, 1.5, unit="deg", frame="galactic"), 0.7 * u.deg
)
mask_space = dataset.geoms["geom"].region_mask([region], inside=False)
mask_energy = dataset.geoms["geom"].energy_mask(0.6 * u.TeV, 4 * u.TeV)
mask = mask_space & mask_energy  # standard binary operations allowed on masks
dataset.mask_fit = mask
dataset.mask_fit.plot_grid();

To see the allowed energy ranges, you can use
- `dataset.energy_range_safe` : energy range allowed by the `mask_safe`
- `dataset.energy_range_fit` : energy range allowed by the `mask_fit`
- `dataset.energy_range` : the final energy range used in likelihood computation

Each of these properties returns two maps, holding the `min` and `max` energy values at each spatial pixel.
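
The idea of a per-pixel energy range can be sketched with NumPy. This is an illustrative reduction under assumed toy inputs (the `edges` and `mask` arrays are invented here), not the gammapy implementation: for each spatial pixel, take the lowest lower edge and the highest upper edge among the unmasked energy bins.

```python
import numpy as np

# Energy bin edges in TeV (4 bins) and a toy mask of shape (energy, y, x)
edges = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
mask = np.zeros((4, 2, 2), dtype=bool)
mask[1:, 0, 0] = True  # pixel (0, 0): bins 1 to 3 are valid
mask[:3, 0, 1] = True  # pixel (0, 1): bins 0 to 2 are valid
mask[2, 1, 0] = True   # pixel (1, 0): only bin 2 is valid
# pixel (1, 1) stays fully masked

e_lo = edges[:-1][:, None, None]  # lower edge of each bin
e_hi = edges[1:][:, None, None]   # upper edge of each bin

# Masked bins are replaced by +/- inf so they never win the min/max
e_min = np.where(mask, e_lo, np.inf).min(axis=0)
e_max = np.where(mask, e_hi, -np.inf).max(axis=0)

# Pixels without any valid bin have no defined range
valid = mask.any(axis=0)
e_min = np.where(valid, e_min, np.nan)
e_max = np.where(valid, e_max, np.nan)
```

The result is a pair of 2D arrays, one value per spatial pixel, with `nan` where the pixel is fully masked in this sketch.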

In [None]:
dataset.energy_range

In [None]:
# To see the lower energy threshold at each point
dataset.energy_range[0].plot(add_cbar=True)

### Downsampling datasets

It can often be useful to coarsely rebin an initially computed dataset by a specified factor. The number of counts is preserved. By default only the spatial axes are downsampled, but additional axes can be specified, e.g.

In [None]:
downsampled_dataset = dataset.downsample(
    factor=10, axis_name="energy", name="downsampled_dataset"
)

In [None]:
print(downsampled_dataset, dataset)
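
What "counts are preserved" means can be illustrated on a bare NumPy array. This is a sketch of spatial rebinning by summation on an invented toy cube, not the code path that `downsample` uses internally; it only treats counts, whereas a real dataset also carries IRF maps and masks.

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(4, 6, 6))  # toy cube: (energy, y, x)

factor = 2
n_e, n_y, n_x = counts.shape

# Split each spatial axis into (coarse bin, position inside coarse bin)
# and sum over the inner positions, so every 2 x 2 block becomes one pixel
coarse = counts.reshape(
    n_e, n_y // factor, factor, n_x // factor, factor
).sum(axis=(2, 4))

print(counts.shape, "->", coarse.shape)
print(counts.sum(), "==", coarse.sum())
```

Summation is the natural choice for counts because they are additive; the total is unchanged while the number of pixels drops by `factor ** 2`.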

## SpectrumDataset

`SpectrumDataset` inherits from `MapDataset`, is specially adapted for 1D spectral analysis, and uses a `RegionGeom` instead of a `WcsGeom`.
A `MapDataset` can be converted to a `SpectrumDataset` by summing the `counts` and `background` inside the `on_region`; the result can then be used for classical spectral analysis. Containment correction is feasible only for circular regions.
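
The summation inside the `on_region` can be pictured with a small NumPy sketch. It assumes a flat toy pixel grid and selects pixels by their centres, which is a simplification made for illustration; gammapy works with sky regions and the map WCS instead.

```python
import numpy as np

rng = np.random.default_rng(1)
counts = rng.poisson(1.0, size=(3, 21, 21))  # toy cube: (energy, y, x)

# Pixel-centre offsets in deg for 0.1 deg pixels centred on the region centre
binsz = 0.1
offsets = (np.arange(21) - 10) * binsz
x, y = np.meshgrid(offsets, offsets)

# Circular on-region of radius 0.5 deg (flat-sky approximation)
radius = 0.5
in_region = np.hypot(x, y) <= radius

# Sum the selected pixels for every energy bin: one value per energy bin
spectrum = counts[:, in_region].sum(axis=1)
print(spectrum)
```

The cube collapses to a single spatial bin while the energy axis is kept, which is the shape of data a 1D spectral analysis works on.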

In [None]:
on_region = CircleSkyRegion(
    SkyCoord(0, 0, unit="deg", frame="galactic"), 0.5 * u.deg
)
spectrum_dataset = dataset.to_spectrum_dataset(
    on_region, containment_correction=True
)

In [None]:
# For a quick look
spectrum_dataset.peek();

A `MapDataset` can also be integrated over the `on_region` to create a `MapDataset` with a `RegionGeom`. Complex regions can be handled and since the full IRFs are used, containment correction is not required. 

In [None]:
reg_dataset = dataset.to_region_map_dataset(on_region, name="RegionMapDS")
print(reg_dataset)

## FluxPointsDataset

`FluxPointsDataset` is a `Dataset` container for precomputed flux points, which can then be used in fitting.
A `FluxPointsDataset` cannot be read directly; instead, read the data through `FluxPoints` and combine it with an additional `SkyModel`. Similarly, `FluxPointsDataset.write` only saves the `data` component to disk.

In [None]:
flux_points = FluxPoints.read(
    "$GAMMAPY_DATA/tests/spectrum/flux_points/diff_flux_points.fits"
)
model = SkyModel(spectral_model=PowerLawSpectralModel())
fp_dataset = FluxPointsDataset(data=flux_points, models=model)

The masks on a `FluxPointsDataset` are plain `np.array` objects and the data is a `FluxPoints` object. By default, the `mask_safe` masks the upper limit points.
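
The role of such an array mask can be sketched as follows. All numbers are invented for illustration, the `power_law` helper is hypothetical, and the sketch assumes a chi-square comparison between flux points and model.

```python
import numpy as np

# Invented flux points: the last one is an upper limit without an error
energy = np.array([1.0, 2.0, 4.0, 8.0])  # TeV
flux = np.array([3.0e-11, 1.1e-11, 4.2e-12, 2.0e-12])
flux_err = np.array([4.0e-12, 2.0e-12, 1.0e-12, np.nan])
is_ul = np.array([False, False, False, True])

# Safe mask as a plain boolean array: keep everything that is not an upper limit
mask_safe = ~is_ul


def power_law(energy, amplitude=3.0e-11, index=1.5, reference=1.0):
    """Hypothetical power-law model used only for this sketch."""
    return amplitude * (energy / reference) ** (-index)


# Chi-square over the unmasked points only
model = power_law(energy)
residual = (flux[mask_safe] - model[mask_safe]) / flux_err[mask_safe]
chi2 = float(np.sum(residual**2))
print(mask_safe, chi2)
```

Indexing with the boolean array drops the upper limit before the statistic is computed, so its undefined error never enters the sum.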

In [None]:
fp_dataset.mask_safe  # Note: the mask here is simply a numpy array

In [None]:
fp_dataset.data  # is a `FluxPoints` object

In [None]:
fp_dataset.data_shape()  # number of data points

For an example of fitting `FluxPoints`, see [flux point fitting](../analysis/1D/sed_fitting). A `FluxPointsDataset` can also be used for catalog objects, e.g. see the [catalog notebook](catalog.ipynb).

## Datasets

`Datasets` is a collection of `Dataset` objects. They can be of the same type or of different types, e.g. a mix of `FluxPointsDataset`, `MapDataset` and `SpectrumDataset`.

For modelling and fitting of a list of `Dataset` objects, you can either
- Do a joint fitting of all the datasets together
- Stack the datasets together, and then fit them.
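
The difference between the two options can be sketched with a Cash-like Poisson statistic on toy arrays. The counts and predicted counts below are invented, and the `cash` helper is written here for illustration. Real stacking also combines exposures, IRFs and masks, which this sketch leaves out.

```python
import numpy as np


def cash(counts, npred):
    """Cash-like statistic summed over bins: 2 * (npred - counts * log(npred))."""
    return float(np.sum(2.0 * (npred - counts * np.log(npred))))


# Two toy datasets with the same binning (invented numbers)
counts_a = np.array([5, 3, 1])
npred_a = np.array([4.2, 2.9, 1.3])
counts_b = np.array([7, 2, 0])
npred_b = np.array([6.1, 2.5, 0.8])

# Joint fit: each dataset keeps its own bins, the statistics are added
stat_joint = cash(counts_a, npred_a) + cash(counts_b, npred_b)

# Stacked fit: bins are merged first, then a single statistic is computed
counts_stacked = counts_a + counts_b
npred_stacked = npred_a + npred_b
stat_stacked = cash(counts_stacked, npred_stacked)

print(stat_joint, stat_stacked)
```

A joint fit keeps the per-dataset information at the cost of evaluating every dataset separately, while stacking trades that information for a single, smaller dataset.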

`Datasets` is a convenient tool to handle joint fitting of simultaneous datasets. As an example, please see the [joint fitting tutorial](../3D/analysis_mwl.ipynb).

To see how stacking is performed, please see [Implementation of stacking](../../datasets/index.html#stacking-multiple-datasets).

To create a `Datasets` object, pass a list of `Dataset` objects on init, e.g.

In [None]:
# Create some dummy datasets for example purposes
dataset1 = dataset.copy(name="dataset1")
dataset2 = dataset.copy(name="dataset2")
dataset3 = dataset.copy(name="dataset3")

In [None]:
datasets = Datasets([dataset1, dataset2, dataset3])

In [None]:
datasets.info_table()  # quick info of all datasets

In [None]:
datasets.names  # unique name of each dataset

Normal list operations work on `Datasets`, so for example:

In [None]:
dataset0 = datasets[0]  # extracts the first dataset

To select certain datasets within a given time interval, pass `astropy.time.Time` objects to `Datasets.select_time()`

In [None]:
datasets_sub = datasets.select_time(
    time_min=Time(51544, format="mjd"), time_max=Time(51554, format="mjd")
)
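
The selection logic can be pictured with a small pure-Python sketch. It assumes that a dataset is kept only when its whole observation window lies inside the requested interval; that criterion, the `ToyDataset` class and the helper function below are assumptions made for illustration, and times are plain MJD floats rather than `Time` objects.

```python
from dataclasses import dataclass


@dataclass
class ToyDataset:
    """Hypothetical stand-in holding a name and an observation window in MJD."""

    name: str
    t_start: float
    t_stop: float


def select_by_time(datasets, time_min, time_max):
    """Keep datasets whose window is fully contained in [time_min, time_max]."""
    return [
        d for d in datasets if d.t_start >= time_min and d.t_stop <= time_max
    ]


toy_datasets = [
    ToyDataset("inside", 51545.0, 51546.0),
    ToyDataset("overlapping", 51553.5, 51555.0),
    ToyDataset("outside", 51560.0, 51561.0),
]

selected = select_by_time(toy_datasets, time_min=51544.0, time_max=51554.0)
selected_names = [d.name for d in selected]
print(selected_names)
```

Under this assumption a dataset that only partly overlaps the interval is dropped, so the interval should be chosen generously when in doubt.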

If all the datasets have equivalent geometries, they can be stacked together:

In [None]:
stacked = datasets.stack_reduce(name="stacked")
print(stacked)

In [None]:
# Use python list convention to remove/add datasets, eg:
datasets.remove("dataset2")
datasets.names

In [None]:
datasets.append(dataset2)
datasets.names