# Creation of the CTA data sets

In this notebook, the CTA IRFs are read and `MapDataset` objects are created for a given source. The pseudo data sets can optionally be stored to disk (they are written to /data/cta); this is controlled in `analysis_config.yml` (default: True).

You can plot the outcome with Plot CTA datasets.

In [None]:
import numpy as np
import pandas as pd

import astropy.units as u

from gammapy.datasets import MapDataset
from gammapy.data import Observation
from gammapy.maps import WcsGeom, MapAxis
from gammapy.makers import MapDatasetMaker, SafeMaskMaker
from gammapy.irf import load_cta_irfs

from os import path
import warnings
import sys

sys.path.append("../src")
from configure_analysis import AnalysisConfig
analysisconfig = AnalysisConfig()

from flux_utils import SourceModel

## Setup for pseudo datasets

With the IRFs and input models in hand, we generate 100 pseudo data sets for each source and instrument, for both the PD and IC models. For the CTA data sets we use an analysis geometry with 16 energy bins per decade between 0.1 TeV and 154 TeV and spatial bins of $0.02^\circ \times 0.02^\circ$. For each pseudo data set, we assume a total observation time of 200 hours, split equally between four pointing positions with $1^\circ$ offset with respect to the source position. The predicted numbers of source and background events are summed for each pixel, and Poisson-distributed random counts are drawn based on those values.
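
The last step described above, drawing Poisson counts from the summed predictions, can be sketched with plain NumPy. This is only a minimal illustration and not the code used in this notebook: the arrays `npred_source` and `npred_background` and the seed are made-up placeholders standing in for the predicted counts per pixel.

```python
import numpy as np

# Hypothetical predicted counts per pixel (placeholder values, not real CTA predictions)
npred_source = np.array([[0.5, 2.0], [4.0, 0.1]])
npred_background = np.array([[1.0, 1.0], [1.5, 0.9]])

# Sum the source and background predictions pixel by pixel
npred_total = npred_source + npred_background

# Draw one Poisson realisation per pixel; a fixed seed makes the result reproducible
rng = np.random.default_rng(seed=42)
counts = rng.poisson(npred_total)

print(counts)
```

Each call to `rng.poisson` with a fresh random state yields another pseudo data set from the same predicted counts.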

To generate the pseudo data sets, we load the publicly available CTA IRFs.

In [None]:
irfs = load_cta_irfs(analysisconfig.get_file("cta/irfs/irf_file_new_CTA.fits"))
irfs.keys()

As can be seen, the CTA IRFs consist of four standard parts:
- Effective Area (`aeff`)
- Energy Dispersion (`edisp`)
- Point Spread Function (`psf`)
- Background (`bkg`)

Each of these can be easily depicted using the built-in `peek` method:

In [None]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    irfs["aeff"].peek()

In [None]:
# The source can be set in the analysis_config.yml file
source_name = analysisconfig.get_source()

# default flux type is PD
model = SourceModel(sourcename=source_name)
src_pos = model.get_sourceposition
print("Working on source", source_name, "at position", src_pos)

First, get the sky coordinates, generate the pointings and define the map geometry.
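
As a quick sanity check of the binning quoted earlier, the number of energy bins and spatial pixels can be worked out with NumPy alone. The values below (0.1 TeV, 154 TeV, 16 bins per decade, a 6 deg wide map with 0.02 deg pixels) are assumptions taken from the text and the code of this notebook; the actual entries in `analysis_config.yml` are not shown here and may differ.

```python
import numpy as np

# Assumed configuration values (the actual analysis_config.yml is not shown here)
log_emin = np.log10(0.1)    # lower edge: 0.1 TeV
log_emax = np.log10(154.0)  # upper edge: 154 TeV
bins_per_decade = 16

# The number of bins follows from the energy range expressed in decades
n_decades = log_emax - log_emin
n_bins = int(round(n_decades * bins_per_decade))

# np.logspace takes the number of edges, i.e. one more than the number of bins
e_edges = np.logspace(log_emin, log_emax, n_bins + 1)

# Spatial pixels per side for a 6 deg wide map with 0.02 deg bins
n_pix = int(round(6 / 0.02))

print(n_bins, len(e_edges), n_pix)
```

Under these assumptions the axis has 51 energy bins (52 edges) and the map has 300 pixels per side.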

In [None]:
# Defining map geometry for binned simulation
e_edges = np.logspace(
    analysisconfig.get_value("emin", "cta_datasets"),  # log10 of the lower edge in TeV
    analysisconfig.get_value("emax", "cta_datasets"),  # log10 of the upper edge; bkg not properly defined above ~154 TeV
    analysisconfig.get_value("nebin", "cta_datasets"),  # number of points returned by np.logspace, i.e. bin edges
) * u.TeV
energy_reco_axis = MapAxis.from_edges(e_edges, unit="TeV", name="energy", interp="log")

geom = WcsGeom.create(
    skydir=src_pos,
    binsz=analysisconfig.get_value("binwidth", "cta_datasets"),
    width=(6, 6),
    frame=analysisconfig.get_value("frame", "cta_datasets"),
    axes=[energy_reco_axis],
)

# 16 bins/decade is enough also for the true energy axis
energy_true_axis = MapAxis.from_edges(
    e_edges, unit="TeV", name="energy_true", interp="log"
)

We create 4 observations for the 4 pointing positions, each with 1/4 of the total live time (this is not realistic, but does not matter since the observations are stacked in the next step).
Generating the data set may take about 1 min.
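
The geometry of the four pointings can be illustrated without astropy. The sketch below uses the standard spherical formula for moving a given angular separation along a position angle (measured east of north), which should agree with what `directional_offset_by` returns. The source coordinates and the position angles 0, 90, 180 and 270 deg are placeholder assumptions, not values read from `analysis_config.yml`.

```python
import math


def offset_position(ra_deg, dec_deg, position_angle_deg, separation_deg):
    """Return (ra, dec) in degrees after moving separation_deg from the start
    position along position_angle_deg (measured east of north)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    pa, sep = math.radians(position_angle_deg), math.radians(separation_deg)
    sin_dec_new = (
        math.sin(dec) * math.cos(sep)
        + math.cos(dec) * math.sin(sep) * math.cos(pa)
    )
    dec_new = math.asin(sin_dec_new)
    ra_new = ra + math.atan2(
        math.sin(pa) * math.sin(sep) * math.cos(dec),
        math.cos(sep) - math.sin(dec) * sin_dec_new,
    )
    return math.degrees(ra_new) % 360.0, math.degrees(dec_new)


def angular_separation(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1 = math.radians(ra1_deg), math.radians(dec1_deg)
    ra2, dec2 = math.radians(ra2_deg), math.radians(dec2_deg)
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(math.sqrt(a)))


# Placeholder source position and pointing pattern (assumed for illustration)
src_ra, src_dec = 83.63, 22.01
position_angles = [0.0, 90.0, 180.0, 270.0]
offset = 1.0  # deg

pointings = [offset_position(src_ra, src_dec, pa, offset) for pa in position_angles]
for ra_p, dec_p in pointings:
    print(ra_p, dec_p, angular_separation(src_ra, src_dec, ra_p, dec_p))
```

Every pointing lies 1 deg from the source, so the stacked exposure is symmetric around the source position.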

In [None]:
# Get 4 symmetric pointings, each with 1 deg offset from the source position
pointings = src_pos.directional_offset_by(
    analysisconfig.get_value("pointings", "cta_datasets") * u.deg,  # position angles
    analysisconfig.get_value("offset", "cta_datasets") * u.deg,  # angular separation
)

In [None]:
# Generating datasets
stacked_dataset = MapDataset.create(
    geom, name="CTA-dataset-{}".format(source_name), energy_axis_true=energy_true_axis
)
dataset_maker = MapDatasetMaker(selection=["exposure", "background", "psf", "edisp"])
maker_safe_mask = SafeMaskMaker(methods=["offset-max"], offset_max=4.0 * u.deg)
livetime = analysisconfig.get_value("livetime", "cta_datasets")  * u.h

for count, pointing in enumerate(pointings):
    print("Working on pointing", pointing)
    obs = Observation.create(pointing=pointing, livetime=livetime / 4, irfs=irfs)
    with np.errstate(divide="ignore", invalid="ignore"):
        dataset = dataset_maker.run(stacked_dataset.copy(name="P{}".format(count)), obs)
    dataset = maker_safe_mask.run(dataset, obs)

    # check that the background model is finite inside the safe mask,
    # then set it to zero outside the mask, where it may contain infinities
    assert np.isfinite(dataset.background_model.map.data[dataset.mask_safe.data]).all()
    dataset.background_model.map.data[~dataset.mask_safe.data] = 0.0

    # stack the datasets
    with np.errstate(divide="ignore", invalid="ignore"):
        stacked_dataset.stack(dataset)

In [None]:
# The data set is not included in the repo because of its size, but can be written to disk.
if analysisconfig.get_value("write_CTA_pseudodata", "io"):
    outfilename = analysisconfig.get_file(
        "cta/pseudodata/CTA_{}_{}{}_p4.fits.gz".format(
            source_name, int(livetime.value), livetime.unit)
    )
    stacked_dataset.write(outfilename, overwrite=True)