# ABC Guide for XMM-Newton -- Part 2

---

#### Introduction
This tutorial is based on Chapter 6 of the XMM-Newton ABC Guide, prepared by the NASA/GSFC XMM-Newton Guest Observer Facility. This notebook assumes you are using the version of pySAS found on [GitHub](https://github.com/XMMGOF/pysas) and have already configured it to work with your SAS installation (see the [README on GitHub](https://github.com/XMMGOF/pysas/blob/main/README.md)).
#### Expected Outcome
The ability to run source detection, extract spectra, check for pile up, and prepare the spectra for analysis by creating a redistribution matrix file (RMF) and an ancillary response file (ARF).
#### SAS Tasks to be Used

- `evselect`[(Documentation for evselect)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/evselect/index.html)
- `edetect_chain`[(Documentation for edetect_chain)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/edetect_chain/index.html)
- `atthkgen`[(Documentation for atthkgen)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/atthkgen/index.html)
- `srcdisplay`[(Documentation for srcdisplay)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/srcdisplay/index.html)
- `epatplot`[(Documentation for epatplot)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/epatplot/index.html)
- `backscale`[(Documentation for backscale)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/backscale/index.html)
- `rmfgen`[(Documentation for rmfgen)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/rmfgen/index.html)
- `arfgen`[(Documentation for arfgen)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/arfgen/index.html)

#### Prerequisites
<div class="alert alert-block alert-info">
    <b>Note:</b> Before running this notebook, or even starting a Jupyter Lab session, HEASOFT has to be initialized. If you did not initialize HEASOFT before starting this Jupyter Lab session, or opening this notebook, please close this window and initialize HEASOFT (it is not possible to initialize HEASOFT from within a Jupyter Notebook). SAS defaults will need to be set as explained in the README on GitHub (https://github.com/XMMGOF/pysas/blob/main/README.md).
</div>

#### Useful Links

- [`pysas` Documentation](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/pysas/index.html "pysas Documentation")
- [`pysas` on GitHub](https://github.com/XMMGOF/pysas)
- [Common SAS Threads](https://www.cosmos.esa.int/web/xmm-newton/sas-threads "SAS Threads")
- [Users' Guide to the XMM-Newton Science Analysis System (SAS)](https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/sas_usg/USG/SASUSG.html "Users' Guide")
- [The XMM-Newton ABC Guide](https://heasarc.gsfc.nasa.gov/docs/xmm/abc/ "ABC Guide")
- [XMM Newton GOF Helpdesk](https://heasarc.gsfc.nasa.gov/docs/xmm/xmm_helpdesk.html "Helpdesk") - Link to form to contact the GOF Helpdesk.

#### Caveats
This tutorial uses an observation of the Lockman Hole (obsid = '0123700101').


##### Last Reviewed: _25 April 2024, for SAS v21_
##### Last Updated: _25 April 2024_
##### By: Ryan Tanner (ryan.tanner@nasa.gov)
---

In [None]:
# pySAS imports
import pysas
from pysas.wrapper import Wrapper as w

# Useful imports
import os, subprocess

# Imports for plotting
import matplotlib.pyplot as plt
from astropy.visualization import astropy_mpl_style
from astropy.io import fits
from astropy.wcs import WCS
from astropy.table import Table
plt.style.use(astropy_mpl_style)

Now we need to let pysas know which Obs ID we are working with. If you have already worked through Part 1 of this tutorial then when you create the odf object it will auto-detect the observation files.

In [None]:
obsid = '0123700101'
odf = pysas.odfcontrol.ODFobject(obsid)
odf.basic_setup(overwrite=False,repo='heasarc',
                rerun=False, epproc_args=['withoutoftime=yes'])

# File names for this notebook. The User can change these file names.
file_keys = list(odf.files.keys())
unfiltered_event_list = odf.files[file_keys[2]][0]
temporary_event_list = 'temporary_event_list.fits'
light_curve_file='mos1_ltcrv.fits'
gti_rate_file = 'gti_rate.fits'
filtered_event_list = 'filtered_event_list.fits'
attitude_file = 'attitude.fits'
soft_band_file = 'mos1-s.fits'
hard_band_file = 'mos1-h.fits'
mos_all_file = 'mos1-all.fits'
eml_list_file = 'emllist.fits'

---
If you have already worked through Part 1 of this tutorial you can skip the next cell. But if not, or if you want to run it again, the necessary code from Part 1 is in the cell below.

In [None]:
odf.basic_setup(overwrite=False,repo='heasarc',
                rerun=False, epproc_args=['withoutoftime=yes'])

os.chdir(odf.work_dir)

# "Standard" Filter
if not os.path.exists(temporary_event_list):
    inargs = ['table={0}'.format(unfiltered_event_list), 
              'withfilteredset=yes', 
              "expression='(PATTERN <= 12)&&(PI in [200:4000])&&#XMMEA_EM'", 
              'filteredset={0}'.format(temporary_event_list), 
              'filtertype=expression', 
              'keepfilteroutput=yes', 
              'updateexposure=yes', 
              'filterexposure=yes']
    
    w('evselect', inargs).run()
else:
    print('File {0} found. Not applying the standard filter again.'.format(temporary_event_list))

# Make Light Curve File
if not os.path.exists(light_curve_file):
    inargs = ['table={0}'.format(temporary_event_list), 
              'withrateset=yes', 
              'rateset={0}'.format(light_curve_file), 
              'maketimecolumn=yes', 
              'timecolumn=TIME', 
              'timebinsize=100', 
              'makeratecolumn=yes']
    
    w('evselect', inargs).run()
else:
    print('File {0} found. Not making the light curve again.'.format(light_curve_file))

# Make Secondary GTI File
if not os.path.exists(gti_rate_file):
    inargs = ['table={0}'.format(light_curve_file), 
              'gtiset={0}'.format(gti_rate_file),
              'timecolumn=TIME', 
              "expression='(RATE <= 6)'"]
    
    w('tabgtigen', inargs).run()
else:
    print('File {0} found. Not making GTI rate file again.'.format(gti_rate_file))

# Filter Using Secondary GTI File
if not os.path.exists(filtered_event_list):
    inargs = ['table={0}'.format(temporary_event_list),
              'withfilteredset=yes', 
              "expression='GTI({0},TIME)'".format(gti_rate_file), 
              'filteredset={0}'.format(filtered_event_list),
              'filtertype=expression', 
              'keepfilteroutput=yes',
              'updateexposure=yes', 
              'filterexposure=yes']
    
    w('evselect', inargs).run()
else:
    print('File {0} found. Not running evselect filter again.'.format(filtered_event_list))

### 6.6 : Source Detection with `edetect_chain`

The `edetect_chain` task does nearly all the work involved with EPIC source detection. It can process up to three instruments (both MOS cameras and the PN) with up to five images in different energy bands simultaneously. All images must have identical binning and WCS keywords. For this example, we will perform source detection on MOS1 images in two bands ("soft" X-rays with energies between 300 and 2000 eV, and "hard" X-rays with energies between 2000 and 10000 eV) using the filtered event files produced here.
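
The multi-band parameters passed to `edetect_chain` are space-separated lists in which the n-th image, `pimin`, `pimax`, and `ecf` entries all describe the same band. As a minimal sketch, those strings can be built from a single table so the entries cannot drift out of order. The `Band` tuple and the `band_parameters` helper are illustrative names, not part of pySAS, and the energy conversion factors are simply the MOS1 values used in this notebook.

```python
from typing import NamedTuple


class Band(NamedTuple):
    """One energy band: image file, PI limits in eV, energy conversion factor."""
    image: str
    pi_min: int
    pi_max: int
    ecf: float


def band_parameters(bands):
    """Return 'key=value' strings with one space-separated entry per band."""
    # The text above quotes a limit of five images per instrument.
    if not 1 <= len(bands) <= 5:
        raise ValueError('expected between one and five bands')
    images = ' '.join(b.image for b in bands)
    pimin = ' '.join(str(b.pi_min) for b in bands)
    pimax = ' '.join(str(b.pi_max) for b in bands)
    ecf = ' '.join('{0:.3f}'.format(b.ecf) for b in bands)
    return ["imagesets='{0}'".format(images),
            "pimin='{0}'".format(pimin),
            "pimax='{0}'".format(pimax),
            "ecf='{0}'".format(ecf)]


bands = [Band('mos1-s.fits', 300, 2000, 0.878),
         Band('mos1-h.fits', 2000, 10000, 0.220)]
for parameter in band_parameters(bands):
    print(parameter)
```

The resulting strings can be placed in the `inargs` list alongside the remaining `edetect_chain` parameters.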

We will start by generating some files that `edetect_chain` needs: an attitude file and images of the sources in the desired energy bands, with the image binning sizes as needed according to the detector. For the MOS, we'll let the binsize be 22.
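
The soft, hard, and full-band images differ only in the output file name and the PI range, so the `evselect` arguments can be produced by one small function. This is a sketch rather than the notebook's own method: `image_args` is a hypothetical helper, and its parameter values mirror the ones used in the cell that follows.

```python
def image_args(table, imageset, pi_min, pi_max, binsize=22):
    """Build evselect arguments for an image in the PI range [pi_min:pi_max] (eV)."""
    if pi_min >= pi_max:
        raise ValueError('pi_min must be smaller than pi_max')
    expression = '(FLAG == 0)&&(PI in [{0}:{1}])'.format(pi_min, pi_max)
    return ['table={0}'.format(table),
            'withimageset=yes',
            'imageset={0}'.format(imageset),
            'imagebinning=binSize',
            'xcolumn=X',
            'ximagebinsize={0}'.format(binsize),
            'ycolumn=Y',
            'yimagebinsize={0}'.format(binsize),
            'filtertype=expression',
            "expression='{0}'".format(expression)]


# Same binning for every band, as edetect_chain requires identical binning.
soft_args = image_args('filtered_event_list.fits', 'mos1-s.fits', 300, 2000)
hard_args = image_args('filtered_event_list.fits', 'mos1-h.fits', 2000, 10000)
print(soft_args[-1])
print(hard_args[-1])
```

Each list can then be passed to `w('evselect', ...)` in the same way as the hand-written lists.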

In [None]:
os.chdir(odf.work_dir)

if not os.path.exists(attitude_file):
    inargs = ['atthkset={0}'.format(attitude_file),
              'timestep=1']
    
    w('atthkgen', inargs).run()
else:
    print('File {0} found. Not making attitude file again.'.format(attitude_file))


# Soft band selection 300-2000 eV
if not os.path.exists(soft_band_file):
    inargs = ['table={0}'.format(filtered_event_list),
              'withimageset=yes',
              'imageset={0}'.format(soft_band_file),
              'imagebinning=binSize',
              'xcolumn=X',
              'ximagebinsize=22',
              'ycolumn=Y',
              'yimagebinsize=22',
              'filtertype=expression',
              "expression='(FLAG == 0)&&(PI in [300:2000])'"]
    
    w('evselect', inargs).run()
else:
    print('File {0} found. Not making the soft band image file again.'.format(soft_band_file))


# Hard band selection 2000-10000 eV
if not os.path.exists(hard_band_file):
    inargs = ['table={0}'.format(filtered_event_list),
              'withimageset=yes',
              'imageset={0}'.format(hard_band_file),
              'imagebinning=binSize',
              'xcolumn=X',
              'ximagebinsize=22',
              'ycolumn=Y',
              'yimagebinsize=22',
              'filtertype=expression',
              "expression='(FLAG == 0)&&(PI in [2000:10000])'"]
    
    w('evselect', inargs).run()
else:
    print('File {0} found. Not making the hard band image file again.'.format(hard_band_file))

if not os.path.exists(mos_all_file):
    inargs = ['table={0}'.format(filtered_event_list),
              'withimageset=yes',
              'imageset={0}'.format(mos_all_file),
              'imagebinning=binSize',
              'xcolumn=X',
              'ximagebinsize=22',
              'ycolumn=Y',
              'yimagebinsize=22',
              'filtertype=expression',
              "expression='(FLAG == 0)&&(PI in [300:10000])'"]
    
    w('evselect', inargs).run()
else:
    print('File {0} found. Not making the full band image file again.'.format(mos_all_file))

if not os.path.exists(eml_list_file):
    inargs = ["imagesets='{0} {1}'".format(soft_band_file,hard_band_file),
              "eventsets='{0}'".format(filtered_event_list),
              'attitudeset={0}'.format(attitude_file),
              "pimin='300 2000'",
              "pimax='2000 10000'",
              'likemin=10',
              'witheexpmap=yes',
              "ecf='0.878 0.220'",
              'eboxl_list=eboxlist_l.fits',
              'eboxm_list=eboxlist_m.fits',
              'eml_list={0}'.format(eml_list_file),
              'esp_withootset=no']
    
    w('edetect_chain', inargs).run()
else:
    print('File {0} found. Not running edetect_chain again.'.format(eml_list_file))

In [None]:
inargs = ['boxlistset={0}'.format(eml_list_file),
          'imageset={0}'.format(mos_all_file),
          'withimageset=yes',
          'regionfile=regionfile.txt',
          'sourceradius=0.01',
          'withregionfile=yes']

w('srcdisplay', inargs).run()

### 6.7 Extract the Source and Background Spectra for a Single Region

Throughout the following, please keep in mind that some parameters are instrument-dependent. The parameter `specchannelmax` should be set to 11999 for the MOS, or 20479 for the PN. Also, for the PN, the most stringent filters, `(FLAG==0)&&(PATTERN<=4)`, must be included in the expression to get a high-quality spectrum.

For the MOS, the standard filters should be appropriate for many cases, though there are some instances where tightening the selection requirements might be needed. For example, if obtaining the best-possible spectral resolution is critical to your work, and the corresponding loss of counts is not important, only the single pixel events should be selected `(PATTERN==0)`. If your observation is of a bright source, you again might want to select only the single pixel events to mitigate pile up (see §6.8 and §6.9 for a more detailed discussion).

In any case, you'll need to know spatial information about the area over which you want to extract the spectrum, so display the filtered event file with ds9.

Select the object whose spectrum you wish to extract. This will produce a circle (extraction region), centered on the object. The circle's radius can be changed by clicking on it and dragging to the desired size. Adjust the size and position of the circle until you are satisfied with the extraction region; then, double-click on the region to bring up a window showing the center coordinates and radius of the circle. For this example, we will choose the source at (26188.5,22816.5) and set the extraction radius to 300 (in physical units). `expression='((X,Y) in CIRCLE(26188.5,22816.5,300))'`
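
To get a feel for how large a region is on the sky, physical units can be converted to arcseconds. The sketch below assumes that one physical (sky X/Y) unit corresponds to 0.05 arcseconds, which is consistent with the MOS image bin size of 22 giving 1.1'' pixels; the function names are illustrative only.

```python
# Assumption: one sky (X, Y) unit is 0.05 arcseconds.
SKY_UNIT_ARCSEC = 0.05


def physical_to_arcsec(value):
    """Convert a length in physical (sky) units to arcseconds."""
    return value * SKY_UNIT_ARCSEC


def arcsec_to_physical(value):
    """Convert a length in arcseconds to physical (sky) units."""
    return value / SKY_UNIT_ARCSEC


radius_arcsec = physical_to_arcsec(300)   # extraction radius used in this example
pixel_arcsec = physical_to_arcsec(22)     # image bin size used for the MOS images
print('extraction radius: {0:.1f} arcsec'.format(radius_arcsec))
print('image pixel size:  {0:.1f} arcsec'.format(pixel_arcsec))
```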

The inputs for `evselect` to extract the source spectra are as follows.

	table - the event file
	energycolumn - energy column
	withfilteredset - make a filtered event file
	keepfilteroutput - keep the filtered file
	filteredset - name of output file
	filtertype - type of filter
	expression - expression to filter by
	withspectrumset - make a spectrum
	spectrumset - name of output spectrum
	spectralbinsize - size of bin, in eV
	withspecranges - covering a certain spectral range
	specchannelmin - minimum of spectral range
	specchannelmax - maximum of spectral range
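
Because `specchannelmax` depends on the instrument, one way to avoid copying the wrong value is to look it up when building the arguments. The following is a sketch under that assumption: `spectrum_args` is a hypothetical helper, and the two channel limits are the ones quoted at the start of this section.

```python
# Maximum spectral channel per instrument, as quoted at the start of this section.
SPEC_CHANNEL_MAX = {'MOS': 11999, 'PN': 20479}


def spectrum_args(table, filteredset, spectrumset, expression, instrument='MOS'):
    """Build the evselect arguments for extracting a spectrum."""
    if instrument not in SPEC_CHANNEL_MAX:
        raise ValueError('instrument must be one of {0}'.format(sorted(SPEC_CHANNEL_MAX)))
    return {'table': table,
            'energycolumn': 'PI',
            'withfilteredset': 'yes',
            'filteredset': filteredset,
            'keepfilteroutput': 'yes',
            'filtertype': 'expression',
            'expression': "'{0}'".format(expression),
            'withspectrumset': 'yes',
            'spectrumset': spectrumset,
            'spectralbinsize': '5',
            'withspecranges': 'yes',
            'specchannelmin': '0',
            'specchannelmax': str(SPEC_CHANNEL_MAX[instrument])}


args = spectrum_args('filtered_event_list.fits', 'mos1_filtered.fits', 'mos1_pi.fits',
                     '((X,Y) in CIRCLE(26188.5,22816.5,300))')
print(args['specchannelmax'])
```

The helper does not add the PN-specific `(FLAG==0)&&(PATTERN<=4)` selection; that still has to be included in the expression passed in.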

When extracting the background spectrum, follow the same procedures, but change the extraction area. For example, make an annulus around the source; this can be done using two circles, each defining the inner and outer edges of the annulus, then change the filtering expression (and output file name) as necessary. `expression='((X,Y) in CIRCLE(26188.5,22816.5,1500))&&!((X,Y) in CIRCLE(26188.5,22816.5,500))'`
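
The circle and annulus expressions can be generated from the source position so that the source and background regions always share the same center. This is a small illustrative sketch; `circle_expr` and `annulus_expr` are hypothetical helper names, and the coordinates are the ones chosen above.

```python
def circle_expr(x, y, radius):
    """Selection expression for a circular region, in physical units."""
    return '((X,Y) in CIRCLE({0},{1},{2}))'.format(x, y, radius)


def annulus_expr(x, y, inner, outer):
    """Selection expression for an annulus: inside the outer circle, outside the inner."""
    if inner >= outer:
        raise ValueError('inner radius must be smaller than outer radius')
    return circle_expr(x, y, outer) + '&&!' + circle_expr(x, y, inner)


source_expression = circle_expr(26188.5, 22816.5, 300)
background_expression = annulus_expr(26188.5, 22816.5, 500, 1500)
print(source_expression)
print(background_expression)
```

When passed to `evselect`, each string is wrapped in single quotes, as in the `expression` entries used in this notebook.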

Below we extract both the source and the background spectra. The keywords are as described above.

In [None]:
filtered_source = 'mos1_filtered.fits'
filtered_bkg = 'bkg_filtered.fits'
source_spectra_file = 'mos1_pi.fits'
bkg_spectra_file = 'bkg_pi.fits'

inargs = {'table': '{0}'.format(filtered_event_list),
          'energycolumn': 'PI',
          'withfilteredset': 'yes',
          'filteredset': '{0}'.format(filtered_source),
          'keepfilteroutput': 'yes',
          'filtertype': 'expression',
          'expression': "'((X,Y) in CIRCLE(26188.5,22816.5,300))'",
          'withspectrumset': 'yes',
          'spectrumset': '{0}'.format(source_spectra_file),
          'spectralbinsize': '5',
          'withspecranges': 'yes',
          'specchannelmin': '0',
          'specchannelmax': '11999'}

w('evselect', inargs).run()

inargs = {'table': '{0}'.format(filtered_event_list),
          'energycolumn': 'PI',
          'withfilteredset': 'yes',
          'filteredset': '{0}'.format(filtered_bkg),
          'keepfilteroutput': 'yes',
          'filtertype': 'expression',
          'expression': "'((X,Y) in CIRCLE(26188.5,22816.5,1500))&&!((X,Y) in CIRCLE(26188.5,22816.5,500))'",
          'withspectrumset': 'yes',
          'spectrumset': '{0}'.format(bkg_spectra_file),
          'spectralbinsize': '5',
          'withspecranges': 'yes',
          'specchannelmin': '0',
          'specchannelmax': '11999'}

w('evselect', inargs).run()

### 6.8 Check for Pile Up

Depending on how bright the source is and what modes the EPIC detectors are in, event pile up may be a problem. Pile up occurs when a source is so bright that incoming X-rays strike two neighboring pixels or the same pixel in the CCD more than once in a read-out cycle. In such cases the energies of the two events are in effect added together to form one event. If this happens sufficiently often, 
1) the spectrum will appear to be harder than it actually is, and
2) the count rate will be underestimated, since multiple events will be undercounted.

To check whether pile up may be a problem, use the SAS task `epatplot`. Heavily piled sources will be immediately obvious, as they will have a "hole" in the center, but pile up is not always so conspicuous. Therefore, we recommend always checking for it.
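
To build intuition for why brightness matters, a deliberately simplified model treats photon arrivals in a small detector region as a Poisson process and asks how often two or more photons land in the same read-out frame. This sketch ignores the PSF shape and pattern recognition, so it is not a substitute for `epatplot`. The 2.6 s frame time is an assumed value for MOS full-frame mode, and the count rates are hypothetical.

```python
import math


def piled_fraction(rate, frame_time):
    """Fraction of non-empty frames containing two or more photons.

    rate       -- photons per second landing in the region (hypothetical input)
    frame_time -- read-out cycle length in seconds
    """
    mu = rate * frame_time              # mean number of photons per frame
    if mu <= 0:
        return 0.0
    p_zero = math.exp(-mu)              # Poisson probability of no photon
    p_one = mu * math.exp(-mu)          # Poisson probability of exactly one photon
    return (1.0 - p_zero - p_one) / (1.0 - p_zero)


# Assumed MOS full-frame read-out time; the rates are illustrative only.
for rate in (0.01, 0.1, 1.0):
    print('rate = {0:5.2f} cts/s -> piled fraction = {1:.3f}'.format(
        rate, piled_fraction(rate, 2.6)))
```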

<div class="alert alert-block alert-info">
    <b>Note:</b> This procedure requires as input the event file for the source created when the spectrum was made (i.e. 'filtered_source'), not the usual time-filtered event file (i.e. 'filtered_event_list').
</div>

To check for pile up in our Lockman Hole example, run the following cell:

In [None]:
inargs = ['set={0}'.format(filtered_source),
          'plotfile=mos1_epat.ps',
          'useplotfile=yes',
          'withbackgroundset=yes',
          'backgroundset={0}'.format(filtered_bkg)]

w('epatplot', inargs).run()

where

    set - input events file 
    plotfile - output postscript file 
    useplotfile - flag to use file name from "plotfile" 
    withbackgroundset - use background event set for background subtraction? 
    backgroundset - name of background event file

The output of epatplot is a postscript file, mos1_epat.ps, which may be viewed with a postscript viewer such as `gv` (i.e. 'ghostscript viewer', install from a terminal using `sudo apt install gv`), containing two graphs describing the distribution of counts as a function of PI channel.
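
If `gv` is not available, another PostScript-capable viewer may already be installed. The sketch below only looks for a viewer and returns the command that would open the file; it does not launch anything. The list of viewer names is an assumption, and `viewer_command` is a hypothetical helper.

```python
import shutil


def viewer_command(ps_file, viewers=('gv', 'evince', 'okular')):
    """Return [viewer, ps_file] for the first viewer found on PATH, else None."""
    for viewer in viewers:
        if shutil.which(viewer) is not None:
            return [viewer, ps_file]
    return None


command = viewer_command('mos1_epat.ps')
if command is None:
    print('No PostScript viewer found on PATH.')
else:
    print('Viewer command:', ' '.join(command))
```

The returned list can be handed to `subprocess.run` in the same way as the `gv` call in this notebook.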

If you have `gv` installed on your computer the following cell will open the plot.

In [None]:
import subprocess
gv_out = subprocess.run(['gv','mos1_epat.ps'],stdout = subprocess.DEVNULL)

A few words about interpreting the plots are in order. The top is the distribution of counts versus PI channel for each pattern class (single, double, triple, quadruple), and the bottom is the expected pattern distribution (smooth lines) plotted over the observed distribution (histogram). The lower plot shows the model distributions for single and double events and the observed distributions. It also gives the ratio of observed-to-modeled events with 1-$\sigma$ uncertainties for single and double pattern events over a given energy range. (The default is 0.5-2.0 keV; this can be changed with the `pileupnumberenergyrange` parameter.) If the data is not piled up, there will be good agreement between the modeled and observed single and double event pattern distributions. Also, the observed-to-modeled fractions for both singles and doubles in the 0.5-2.0 keV range will be unity, within errors. In contrast, if the data is piled up, there will be clear divergence between the modeled and observed pattern distributions, and the observed-to-modeled fraction for singles will be less than 1.0, and for doubles, it will be greater than 1.0.

Finally, when examining the plots, it should be noted that the observed-to-modeled fractions can be inaccurate. Therefore, the agreement between the modeled and observed single and double event pattern distributions should be the main factor in determining if an observation is affected by pile up or not.
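
The rule of thumb for the observed-to-modeled fractions can be written as a small check. This is only a sketch of the secondary criterion, since the visual agreement of the distributions remains the main test. The function name `pile_up_hint` and the numbers fed to it are hypothetical, not values taken from an `epatplot` output.

```python
def pile_up_hint(singles, singles_err, doubles, doubles_err, n_sigma=1.0):
    """Classify observed-to-modeled fractions for single and double events."""
    singles_low = singles + n_sigma * singles_err < 1.0
    doubles_high = doubles - n_sigma * doubles_err > 1.0
    if singles_low and doubles_high:
        return 'consistent with pile up'
    singles_ok = abs(singles - 1.0) <= n_sigma * singles_err
    doubles_ok = abs(doubles - 1.0) <= n_sigma * doubles_err
    if singles_ok and doubles_ok:
        return 'consistent with no pile up'
    return 'inconclusive'


# Hypothetical fractions and 1-sigma uncertainties.
print(pile_up_hint(0.95, 0.01, 1.10, 0.02))
print(pile_up_hint(1.00, 0.02, 0.99, 0.03))
print(pile_up_hint(0.97, 0.01, 1.00, 0.02))
```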

The source used in our Lockman Hole example is too faint to provide reasonable statistics for `epatplot` and is far from being affected by pile up. For comparison, an example of a bright source (Mkn 421, Obs ID: 0136541101) which is strongly affected by pile up is shown below. Note that the observed-to-modeled fraction for doubles is over 1.0, and there is severe divergence between the modeled and the observed pattern distributions.

<center><img src="pile_up_Mkn_421.png"/></center>

### 6.9 My Observation is Piled Up! Now What?

If you're working with a different (much brighter) dataset that does show signs of pile up, there are a few ways to deal with it. First, using the region selection and event file filtering procedures demonstrated in earlier sections, you can excise the inner-most regions of a source (as they are the most heavily piled up), re-extract the spectrum, and continue your analysis on the excised event file. For this procedure, it is recommended that you take an iterative approach: remove an inner region, extract a spectrum, check with epatplot, and repeat, each time removing a slightly larger region, until the model and observed distribution functions agree. If you do this, be aware that removing too small a region with respect to the instrumental pixel size (1.1'' for the MOS, 4.1'' for the PN) can introduce systematic inaccuracies when calculating the source flux; these are less than 4%, and decrease to less than 1% when the excised region is more than 5 times the instrumental pixel half-size. In any case, be certain that the excised region is larger than the instrumental pixel size!
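
The iterative excision can be prepared by generating one selection expression per trial inner radius. The sketch below also estimates the smallest excision radius suggested by the "5 times the pixel half-size" guideline, reading that guideline as a radius, which is an interpretation rather than something stated above. It assumes 0.05 arcseconds per physical unit, and the helper names and trial radii are hypothetical.

```python
# Pixel sizes quoted above, in arcseconds.
PIXEL_ARCSEC = {'MOS': 1.1, 'PN': 4.1}
# Assumption: one physical (sky) unit is 0.05 arcseconds.
SKY_UNIT_ARCSEC = 0.05


def min_excision_radius(instrument, factor=5.0):
    """Return (arcsec, physical units) for `factor` times the pixel half-size."""
    radius_arcsec = factor * PIXEL_ARCSEC[instrument] / 2.0
    return radius_arcsec, radius_arcsec / SKY_UNIT_ARCSEC


def excised_expressions(x, y, outer, inner_radii):
    """One expression per trial: outer circle minus a growing inner circle."""
    template = '((X,Y) in CIRCLE({0},{1},{2}))&&!((X,Y) in CIRCLE({0},{1},{3}))'
    return [template.format(x, y, outer, inner)
            for inner in inner_radii if inner < outer]


arcsec, physical = min_excision_radius('MOS')
print('MOS minimum excision radius: {0:.2f} arcsec ({1:.0f} physical units)'.format(
    arcsec, physical))
for expression in excised_expressions(26188.5, 22816.5, 300, [60, 80, 100]):
    print(expression)
```

Each expression would be used to re-extract the spectrum and the source event file, followed by another `epatplot` check.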

You can also use the event file filtering procedures to include only single pixel events (PATTERN==0), as these events are less sensitive to pile up than other patterns.

### 6.10 Determine the Spectrum Extraction Areas

Now that we are confident that our spectrum is not piled up, we can continue by finding the source and background region areas. This is done with the task `backscale`, which takes into account any bad pixels or chip gaps, and writes the result into the BACKSCAL keyword of the spectrum table. Alternatively, we can skip running `backscale` and set `setbackscale=yes` when running `arfgen` below. We will show both options for the curious.
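
As a sanity check on the values that `backscale` writes, the purely geometric areas of the two regions can be computed by hand. This sketch ignores bad pixels and chip gaps, which `backscale` does account for, so the real ratio is expected to differ somewhat. The radii are the ones used for the source circle and background annulus in this notebook.

```python
import math


def circle_area(radius):
    """Area of a circle, in squared physical units."""
    return math.pi * radius ** 2


def annulus_area(inner, outer):
    """Area of an annulus between two radii, in squared physical units."""
    if outer <= inner:
        raise ValueError('outer radius must exceed inner radius')
    return math.pi * (outer ** 2 - inner ** 2)


source_area = circle_area(300)
background_area = annulus_area(500, 1500)
scale = source_area / background_area
print('geometric source/background area ratio: {0:.3f}'.format(scale))
```

A ratio of the two BACKSCAL keywords that is far from this geometric estimate would suggest that a region overlaps a chip gap, a large cluster of bad pixels, or the edge of the field of view.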

The inputs for `backscale` are:

    spectrumset - spectrum file
    badpixlocation - event file containing the bad pixels

To find the source and background extraction areas explicitly,

In [None]:
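# The bad-pixel information is read from the event file the spectra were
# extracted from, i.e. the time-filtered event list defined at the top of
# this notebook.
inargs = ['spectrumset={0}'.format(source_spectra_file),
          'badpixlocation={0}'.format(filtered_event_list)]

w('backscale', inargs).run()

inargs = ['spectrumset={0}'.format(bkg_spectra_file),
          'badpixlocation={0}'.format(filtered_event_list)]

w('backscale', inargs).run()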

### 6.11 Create the Photon Redistribution Matrix (RMF) and Ancillary File (ARF)

Now that a source spectrum has been extracted, we need to reformat the detector response by making a redistribution matrix file (RMF) and ancillary response file (ARF). To make the RMF we use `rmfgen`. The input arguments are:

    rmfset - output file
    spectrumset - input spectrum file

Now we can use `arfgen` with the RMF, spectrum, and event file to make the ancillary file (ARF). The input arguments are:

    arfset - output ARF file name
    spectrumset - input spectrum file name
    withrmfset - flag to use the RMF
    rmfset - RMF file created by rmfgen
    withbadpixcorr - flag to include the bad pixel correction
    badpixlocation - file containing the bad pixel information; should be set to the event file from which the spectrum was extracted.
    setbackscale - flag to calculate the area of the source region and write it to the BACKSCAL keyword in the spectrum header
    
Once `rmfgen` and `arfgen` have run in the cell below, the spectrum stored in the file `mos1_pi.fits` is ready to be analyzed using an analysis package such as XSPEC.
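
Before moving on to a fitting package, it can help to confirm that all four products exist in the working directory. The sketch below is a convenience check only; `missing_products` is a hypothetical helper, and the default file names are the ones used in this notebook.

```python
import os


def missing_products(directory,
                     products=('mos1_pi.fits', 'bkg_pi.fits',
                               'mos1_rmf.fits', 'mos1_arf.fits')):
    """Return the names of expected output files not found in `directory`."""
    return [name for name in products
            if not os.path.exists(os.path.join(directory, name))]


missing = missing_products(os.getcwd())
if missing:
    print('Missing products:', ', '.join(missing))
else:
    print('All spectral products found.')
```

In this notebook the directory to check would be `odf.work_dir`.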

In [None]:
inargs = {'rmfset': 'mos1_rmf.fits',
          'spectrumset': '{0}'.format(source_spectra_file)}

w('rmfgen', inargs).run()

# badpixlocation points to the event file the spectrum was extracted from.
inargs = {'arfset': 'mos1_arf.fits',
          'spectrumset': '{0}'.format(source_spectra_file),
          'withrmfset': 'yes',
          'rmfset': 'mos1_rmf.fits',
          'withbadpixcorr': 'yes',
          'badpixlocation': '{0}'.format(filtered_event_list),
          'setbackscale': 'yes'}

w('arfgen', inargs).run()