# ABC Guide for XMM-Newton -- EPIC Source Extraction and Spectrum Creation
<hr style="border: 2px solid #fadbac" />

- **Description:** XMM-Newton ABC Guide, EPIC Source Extraction and Spectrum Creation.
- **Level:** Beginner
- **Data:** XMM observation of the Lockman Hole (obsid=0123700101)
- **Requirements:** Must be run using pySAS version 2.2.2 or higher.
- **Credit:** Ryan Tanner (April 2024)
- **Support:** <a href="https://heasarc.gsfc.nasa.gov/docs/xmm/xmm_helpdesk.html">XMM Newton GOF Helpdesk</a>
- **Last verified to run:** 17 October 2025, for SAS v22.1 and pySAS v2.2.2

<hr style="border: 2px solid #fadbac" />

## 1. Introduction
This tutorial is based on Chapter 7 of The XMM-Newton ABC Guide, prepared by the NASA/GSFC XMM-Newton Guest Observer Facility. This notebook assumes you are at least minimally familiar with pySAS (see the [Long pySAS Introduction](./analysis-xmm-long-intro.ipynb "Long pySAS Intro")).

#### SAS Tasks to be Used

- `evselect`[(Documentation for evselect)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/evselect/index.html)
- `atthkgen`[(Documentation for atthkgen)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/atthkgen/index.html)
- `epatplot`[(Documentation for epatplot)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/epatplot/index.html)
- `backscale`[(Documentation for backscale)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/backscale/index.html)
- `rmfgen`[(Documentation for rmfgen)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/rmfgen/index.html)
- `arfgen`[(Documentation for arfgen)](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/arfgen/index.html)

#### Useful Links

- [`pysas` Documentation](https://xmm-tools.cosmos.esa.int/external/sas/current/doc/pysas/index.html "pysas Documentation")
- [`pysas` on GitHub](https://github.com/XMMGOF/pysas)
- [Common SAS Threads](https://www.cosmos.esa.int/web/xmm-newton/sas-threads/ "SAS Threads")
- [Users' Guide to the XMM-Newton Science Analysis System (SAS)](https://xmm-tools.cosmos.esa.int/external/xmm_user_support/documentation/sas_usg/USG/SASUSG.html "Users' Guide")
- [The XMM-Newton ABC Guide](https://heasarc.gsfc.nasa.gov/docs/xmm/abc/ "ABC Guide")
- [XMM Newton GOF Helpdesk](https://heasarc.gsfc.nasa.gov/docs/xmm/xmm_helpdesk.html "Helpdesk") - Link to form to contact the GOF Helpdesk.

<div class="alert alert-block alert-warning">
    <b>Warning:</b> By default this notebook will place observation data files in your default <tt>data_dir</tt> directory. Make sure pySAS has been configured properly.
</div>

In [None]:
# pySAS imports
import pysas
from pysas import MyTask

# Useful imports
import os, subprocess

# Imports for plotting
import matplotlib.pyplot as plt
from astropy.visualization import astropy_mpl_style
from astropy.io import fits
from astropy.wcs import WCS
from astropy.table import Table
from regions import CircleSkyRegion
from astropy.coordinates import SkyCoord
import astropy.units as u
plt.style.use(astropy_mpl_style)

# To handle certain warnings
import warnings
warnings.filterwarnings("ignore")

Now we need to let pySAS know which Obs ID we are working with. When we run the command `basic_setup` it will auto-detect the observation files and event lists created in the notebook on [EPIC image creation and basic filtering](./analysis-xmm-ABC-guide-EPIC-image-filtering.ipynb).

In [None]:
obsid = '0123700101'

my_obs = pysas.ObsID(obsid)

my_obs.basic_setup(overwrite=False,rerun=False,run_epproc=False,run_rgsproc=False)

os.chdir(my_obs.work_dir)

# File names for this notebook. The user can change these file names.
unfiltered_event_list = my_obs.files['M1evt_list'][0]
temporary_event_list = 'temporary_event_list.fits'
light_curve_file = 'mos1_ltcrv.fits'
gti_rate_file = 'gti_rate.fits'
filtered_event_list = 'filtered_event_list.fits'

***
If you have already worked through the notebook tutorial on [EPIC image creation and basic filtering](./analysis-xmm-ABC-guide-EPIC-image-filtering.ipynb) you can skip the next cell. But if not, or if you want to run it again, the necessary code from that notebook is in the cell below.

In [None]:
# "Standard" Filter
inargs = {'table'           : unfiltered_event_list, 
          'withfilteredset' : 'yes', 
          "expression"      : "'(PATTERN <= 12)&&(PI in [200:4000])&&#XMMEA_EM'", 
          'filteredset'     : temporary_event_list, 
          'filtertype'      : 'expression', 
          'keepfilteroutput': 'yes', 
          'updateexposure'  : 'yes', 
          'filterexposure'  : 'yes'}

MyTask('evselect', inargs).run()

# Make Light Curve File
inargs = {'table'          : temporary_event_list, 
          'withrateset'    : 'yes', 
          'rateset'        : light_curve_file, 
          'maketimecolumn' : 'yes', 
          'timecolumn'     : 'TIME', 
          'timebinsize'    : '100', 
          'makeratecolumn' : 'yes'}

MyTask('evselect', inargs).run()

# Make Secondary GTI File
# Choose the rate based on the plot from the light curve file
filter_rate = 6
inargs = {'table'      : light_curve_file, 
          'gtiset'     : gti_rate_file,
          'timecolumn' : 'TIME', 
          "expression" : "'(RATE <= {0})'".format(filter_rate)}

MyTask('tabgtigen', inargs).run()

# Filter Using Secondary GTI File
inargs = {'table'           : temporary_event_list,
          'withfilteredset' : 'yes', 
          "expression"      : "'GTI({0},TIME)'".format(gti_rate_file), 
          'filteredset'     : filtered_event_list,
          'filtertype'      : 'expression', 
          'keepfilteroutput': 'yes',
          'updateexposure'  : 'yes', 
          'filterexposure'  : 'yes'}

MyTask('evselect', inargs).run()
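
The rate cut applied with `tabgtigen` can be pictured with a toy light curve. The sketch below is plain Python with made-up rates (not data from this observation) and a hypothetical helper, `good_time_intervals`. It assumes the times are bin start times and that the bins are contiguous. It only illustrates how bins with `RATE <= filter_rate` merge into good time intervals, and is not a description of how `tabgtigen` is implemented.

```python
def good_time_intervals(times, rates, max_rate, bin_size):
    """Return (start, stop) intervals covering consecutive bins with rate <= max_rate.

    times are assumed to be bin start times, and every bin is bin_size seconds long.
    """
    intervals = []
    start = None
    for t, r in zip(times, rates):
        if r <= max_rate:
            if start is None:
                start = t                  # open a new good interval
        elif start is not None:
            intervals.append((start, t))   # close it at the start of the bad bin
            start = None
    if start is not None:
        # the light curve ended inside a good interval
        intervals.append((start, times[-1] + bin_size))
    return intervals

# Made-up light curve: 100 s bins with one flare in the middle
toy_times = [0, 100, 200, 300, 400, 500]
toy_rates = [2.1, 3.0, 9.5, 12.0, 4.2, 1.8]
toy_gti = good_time_intervals(toy_times, toy_rates, max_rate=6, bin_size=100)
print(toy_gti)
```

In this toy case the two flaring bins are dropped, leaving one good interval before the flare and one after it.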

## 2. Selecting a Source

As in the notebook on [EPIC image creation and basic filtering](./analysis-xmm-ABC-guide-EPIC-image-filtering.ipynb), we will use the `quick_eplot` method of `ObsID` to create a FITS image file from the filtered event list.

In [None]:
image_file = 'image.fits'
_ = my_obs.quick_eplot(filtered_event_list, image_file = image_file)

We will now define a function that will allow us to plot a region on the image. The resulting image will be zoomed in on the region.

In [None]:
def plot_region(image_file, ra, dec, radius):
    
    # Define region
    ra_ll  = ra-20*radius
    ra_ul  = ra+20*radius
    dec_ll = dec-10*radius
    dec_ul = dec+10*radius
    center = SkyCoord(ra, dec)
    region = CircleSkyRegion(center, radius)
    
    # Open file
    hdu = fits.open(image_file)[0]
    wcs = WCS(hdu.header)

    # Convert region to artist object
    pixel_region = region.to_pixel(wcs)
    artist = pixel_region.as_artist(color='lime')

    # Set image limits
    ra_lim  = [ra_ll.value, ra_ul.value]
    dec_lim = [dec_ll.value, dec_ul.value]
    (xmin, xmax), (ymin, ymax) = wcs.all_world2pix(ra_lim, dec_lim, 0)

    # Plot
    ax = plt.subplot(projection=wcs)
    plt.imshow(hdu.data, origin='lower', norm='log', vmin=1.0, vmax=10.0)
    ax.set_facecolor("black")
    ax.add_artist(artist)
    ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))
    plt.grid(color='blue', ls='solid')
    plt.xlabel('RA')
    plt.ylabel('Dec')
    plt.colorbar()
    plt.show()

In this example we have preselected the coordinates for the source we are interested in. To find the coordinates for a source you can either:

1. Find the coordinates manually using ds9
2. Use an automated method such as `edetect_chain` to detect sources

We will now add a single region around an interesting source. The source coordinates are given in degrees (with decimals) and the source radius is given in arcseconds.

In [None]:
source_RA  = 163.164 * u.deg # degrees
source_Dec = 57.408 * u.deg  # degrees
source_rad = 15.0 * u.arcsec # arcseconds

plot_region(image_file, source_RA, source_Dec, source_rad)

## 3. Extract the Source and Background Spectra for a Single Region

Throughout the following, please keep in mind that some parameters are instrument-dependent. The parameter `specchannelmax` should be set to 11999 for the MOS, or 20479 for the PN. Also, for the PN, the most stringent filters, `(FLAG==0)&&(PATTERN<=4)`, must be included in the expression to get a high-quality spectrum.

For the MOS, the standard filters should be appropriate for many cases, though there are some instances where tightening the selection requirements might be needed. For example, if obtaining the best-possible spectral resolution is critical to your work, and the corresponding loss of counts is not important, only the single pixel events should be selected `(PATTERN==0)`. If your observation is of a bright source, you again might want to select only the single pixel events to mitigate pile up (see §6.8 and §6.9 for a more detailed discussion). In any case, you'll need to know spatial information about the area over which you want to extract the spectrum.

Now using `evselect` we can select just the events inside of the region. These events should mostly be from the source we are interested in. We will use the following filtering expression. (Note: The radius, 15.0 arcsec, has been converted to degrees.)

`expression='((RA,DEC) in CIRCLE(163.164,57.408,0.00416667))'`
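
The unit conversion behind that expression is a division by 3600 (arcseconds per degree). A minimal sketch in plain Python is shown here; `circle_expression` is a hypothetical helper introduced only for illustration, and the radius is rounded to eight decimal places to match the expression quoted above.

```python
def circle_expression(ra_deg, dec_deg, radius_arcsec):
    """Build a CIRCLE selection in sky coordinates, with the radius converted to degrees."""
    radius_deg = radius_arcsec / 3600.0   # 3600 arcsec per degree
    return "((RA,DEC) in CIRCLE({0},{1},{2:.8f}))".format(ra_deg, dec_deg, radius_deg)

example_expression = circle_expression(163.164, 57.408, 15.0)
print(example_expression)
```

The code cell further down does the same conversion with `astropy.units` (`source_rad.to(u.deg).value`), which keeps the full floating-point precision instead of rounding.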

The inputs for `evselect` to extract the source spectra are as follows.

	table - the event file
	energycolumn - energy column
	withfilteredset - make a filtered event file
	keepfilteroutput - keep the filtered file
	filteredset - name of output file
	filtertype - type of filter
	expression - expression to filter by
	withspectrumset - make a spectrum
	spectrumset - name of output spectrum
	spectralbinsize - size of bin, in eV
	withspecranges - covering a certain spectral range
	specchannelmin - minimum of spectral range
	specchannelmax - maximum of spectral range
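
As a quick check of how `spectralbinsize`, `specchannelmin`, and `specchannelmax` fit together, the sketch below counts the bins in the output spectrum. It assumes the channel range is inclusive at both ends and that each PI channel is 1 eV wide. The helper `n_spectral_bins` is hypothetical, and using the same 5 eV bin size for the PN is an assumption made only for the comparison.

```python
def n_spectral_bins(channel_min, channel_max, bin_size):
    """Number of output bins for an inclusive PI channel range and a bin size in eV."""
    n_channels = channel_max - channel_min + 1   # inclusive range
    if n_channels % bin_size != 0:
        raise ValueError("channel range is not a whole number of bins")
    return n_channels // bin_size

mos_bins = n_spectral_bins(0, 11999, 5)   # MOS value of specchannelmax
pn_bins = n_spectral_bins(0, 20479, 5)    # PN value of specchannelmax
print(mos_bins, pn_bins)
```

Under these assumptions the instrument-dependent values of `specchannelmax` each give a whole number of 5 eV bins.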

When extracting the background spectrum, follow the same procedures, but change the extraction area. For example, make an annulus around the source; this can be done using the keyword 'ANNULUS' and then providing the inner and outer edges of the annulus, then change the filtering expression (and output file name) as necessary. For the outer radius we have chosen 2x the inner radius.

`expression='((RA,DEC) in ANNULUS(163.164,57.408,0.00416667,0.00833333))'`
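
With an outer radius of twice the inner radius, the background annulus covers three times the geometric area of the source circle. The short sketch below works through that arithmetic. It is purely geometric: it ignores bad pixels and chip gaps, so the areas actually used in the analysis can differ from these idealized values.

```python
import math

def circle_area(radius):
    """Geometric area of a circle."""
    return math.pi * radius ** 2

def annulus_area(inner_radius, outer_radius):
    """Geometric area between two concentric circles."""
    return math.pi * (outer_radius ** 2 - inner_radius ** 2)

source_radius_arcsec = 15.0
area_ratio = annulus_area(source_radius_arcsec, 2 * source_radius_arcsec) / circle_area(source_radius_arcsec)
print(area_ratio)
```

Because the two regions differ in area, the background spectrum has to be rescaled before it is compared with the source spectrum.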

Below we extract both the source and the background spectra. The keywords are as described above.

In [None]:
filtered_source = 'mos1_filtered.fits'
filtered_bkg = 'bkg_filtered.fits'
source_spectra_file = 'mos1_pi.fits'
bkg_spectra_file = 'bkg_pi.fits'

circle = "CIRCLE({0},{1},{2})".format(source_RA.value,source_Dec.value,source_rad.to(u.deg).value)

inargs = {'table'           : filtered_event_list,
          'energycolumn'    : 'PI',
          'withfilteredset' : 'yes',
          'filteredset'     : filtered_source,
          'keepfilteroutput': 'yes',
          'filtertype'      : 'expression',
          'expression'      : "'((RA,DEC) in {0})'".format(circle),
          'withspectrumset' : 'yes',
          'spectrumset'     : source_spectra_file,
          'spectralbinsize' : '5',
          'withspecranges'  : 'yes',
          'specchannelmin'  : '0',
          'specchannelmax'  : '11999'}

MyTask('evselect', inargs).run()

annulus = "ANNULUS({0},{1},{2},{3})".format(source_RA.value,source_Dec.value,source_rad.to(u.deg).value,2*source_rad.to(u.deg).value)

inargs = {'table'           : filtered_event_list,
          'energycolumn'    : 'PI',
          'withfilteredset' : 'yes',
          'filteredset'     : filtered_bkg,
          'keepfilteroutput': 'yes',
          'filtertype'      : 'expression',
          'expression'      : "'((RA,DEC) in {0})'".format(annulus),
          'withspectrumset' : 'yes',
          'spectrumset'     : bkg_spectra_file,
          'spectralbinsize' : '5',
          'withspecranges'  : 'yes',
          'specchannelmin'  : '0',
          'specchannelmax'  : '11999'}

MyTask('evselect', inargs).run()

## 4. Check for Pile Up

Depending on how bright the source is and what modes the EPIC detectors are in, event pile up may be a problem. Pile up occurs when a source is so bright that incoming X-rays strike two neighboring pixels or the same pixel in the CCD more than once in a read-out cycle. In such cases the energies of the two events are in effect added together to form one event. If this happens sufficiently often, 
1) the spectrum will appear to be harder than it actually is, and
2) the count rate will be underestimated, since multiple events will be undercounted.
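
To get a feel for how quickly pile up grows with brightness, here is a toy model that is not part of the ABC Guide. It assumes the number of photons landing in the same small pixel region during one read-out cycle follows a Poisson distribution with mean `lam`, and it asks what fraction of the frames containing at least one photon actually contain two or more. The function name and the example rates are illustrative only.

```python
import math

def piled_up_fraction(lam):
    """Fraction of non-empty frames holding two or more photons (Poisson toy model, lam > 0)."""
    p_zero = math.exp(-lam)            # no photon in the frame
    p_one = lam * math.exp(-lam)       # exactly one photon
    p_at_least_one = 1.0 - p_zero
    p_two_or_more = 1.0 - p_zero - p_one
    return p_two_or_more / p_at_least_one

for lam in (0.01, 0.1, 1.0):
    print(lam, piled_up_fraction(lam))
```

For small `lam` the fraction is close to `lam / 2`, so in this simplified picture halving the per-frame count rate roughly halves the piled-up fraction.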

To check whether pile up may be a problem, use the SAS task `epatplot`. Heavily piled sources will be immediately obvious, as they will have a "hole" in the center, but pile up is not always so conspicuous. Therefore, we recommend always checking for it.

<div class="alert alert-block alert-info">
    <b>Note:</b> This procedure requires as input the event file for the source created when the spectrum was made (i.e. 'filtered_source'), not the usual time-filtered event file (i.e. 'filtered_event_list').
</div>

To check for pile up in our Lockman Hole example, run the following cell:

In [None]:
inargs = {'set'               : filtered_source,
          'plotfile'          : 'mos1_epat.pdf',
          'useplotfile'       : 'yes',
          'withbackgroundset' : 'yes',
          'backgroundset'     : filtered_bkg}

MyTask('epatplot', inargs).run()

where

    set - input events file 
    plotfile - output postscript file 
    useplotfile - flag to use file name from "plotfile" 
    withbackgroundset - use background event set for background subtraction? 
    backgroundset - name of background event file

The output of `epatplot` is a PDF file, `mos1_epat.pdf`, which is written to the `work_dir` for the Obs ID we are using. The PDF contains two graphs describing the distribution of counts as a function of PI channel. You should get a plot like the one shown below.

In [None]:
print(my_obs.work_dir)

![Pileup Plot](./_files/pile_up_plot1.png)

A few words about interpreting the plots are in order. The top is the distribution of counts versus PI channel for each pattern class (single, double, triple, quadruple), and the bottom is the expected pattern distribution (smooth lines) plotted over the observed distribution (histogram). The lower plot shows the model distributions for single and double events and the observed distributions. It also gives the ratio of observed-to-modeled events with 1-$\sigma$ uncertainties for single and double pattern events over a given energy range. (The default is 0.5-2.0 keV; this can be changed with the `pileupnumberenergyrange` parameter.) If the data is not piled up, there will be good agreement between the modeled and observed single and double event pattern distributions. Also, the observed-to-modeled fractions for both singles and doubles in the 0.5-2.0 keV range will be unity, within errors. In contrast, if the data is piled up, there will be clear divergence between the modeled and observed pattern distributions, and the observed-to-modeled fraction for singles will be less than 1.0, and for doubles, it will be greater than 1.0.

Finally, when examining the plots, it should be noted that the observed-to-modeled fractions can be inaccurate. Therefore, the agreement between the modeled and observed single and double event pattern distributions should be the main factor in determining if an observation is affected by pile up or not.
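
If you copy the observed-to-modeled fractions and their uncertainties from the plot by hand, the "unity within errors" test can be written down explicitly. The sketch below uses hypothetical helper names and made-up numbers that are not taken from any real `epatplot` output. In keeping with the caveat above, treat it only as a secondary check alongside a visual comparison of the distributions.

```python
def consistent_with_unity(ratio, sigma, n_sigma=1.0):
    """True if the ratio lies within n_sigma uncertainties of 1.0."""
    return abs(ratio - 1.0) <= n_sigma * sigma

def pile_up_hint(singles, singles_err, doubles, doubles_err):
    """True if singles fall significantly below 1.0 and doubles rise significantly above 1.0."""
    singles_low = singles < 1.0 and not consistent_with_unity(singles, singles_err)
    doubles_high = doubles > 1.0 and not consistent_with_unity(doubles, doubles_err)
    return singles_low and doubles_high

# Made-up values typed in by hand for illustration
faint_case = pile_up_hint(1.00, 0.05, 1.01, 0.06)
bright_case = pile_up_hint(0.95, 0.01, 1.12, 0.02)
print(faint_case, bright_case)
```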

The source used in our Lockman Hole example is too faint to provide reasonable statistics for `epatplot` and is far from being affected by pile up. For comparison, an example of a bright source (Mkn 421, Obs ID: 0136541101) which is strongly affected by pile up is shown below. Note that the observed-to-modeled fraction for doubles is over 1.0, and there is severe divergence between the model and the observed pattern distribution.

![Mkn 421 Pileup Plot](./_files/pile_up_Mkn_421.png)

## 5. My Observation is Piled Up! Now What?

If you're working with a different (much brighter) dataset that does show signs of pile up, there are a few ways to deal with it. First, using the region selection and event file filtering procedures demonstrated in earlier sections, you can excise the inner-most regions of a source (as they are the most heavily piled up), re-extract the spectrum, and continue your analysis on the excised event file. For this procedure, it is recommended that you take an iterative approach: remove an inner region, extract a spectrum, check with epatplot, and repeat, each time removing a slightly larger region, until the model and observed distribution functions agree. If you do this, be aware that removing too small a region with respect to the instrumental pixel size (1.1'' for the MOS, 4.1'' for the PN) can introduce systematic inaccuracies when calculating the source flux; these are less than 4%, and decrease to less than 1% when the excised region is more than 5 times the instrumental pixel half-size. In any case, be certain that the excised region is larger than the instrumental pixel size!

You can also use the event file filtering procedures to include only single pixel events (PATTERN==0), as these events are less sensitive to pile up than other patterns.
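
The two mitigation options in this section can be expressed as selection expressions in the same style as the ones used earlier in this notebook. The sketch below is only an illustration. `excised_expression` is a hypothetical helper, the trial inner radii are arbitrary, and the Lockman Hole coordinates are reused purely as placeholders, since that source is not piled up. The pixel size is the MOS value quoted in this section.

```python
MOS_PIXEL_ARCSEC = 1.1   # MOS pixel size quoted in this section

def excised_expression(ra_deg, dec_deg, inner_arcsec, outer_arcsec, singles_only=False):
    """ANNULUS selection that removes the core of a source, optionally keeping only singles."""
    if inner_arcsec <= MOS_PIXEL_ARCSEC:
        raise ValueError("excised radius must be larger than the instrumental pixel size")
    inner_deg = inner_arcsec / 3600.0
    outer_deg = outer_arcsec / 3600.0
    expr = "((RA,DEC) in ANNULUS({0},{1},{2:.8f},{3:.8f}))".format(
        ra_deg, dec_deg, inner_deg, outer_deg)
    if singles_only:
        expr = "(PATTERN==0)&&" + expr
    return expr

# One expression per iteration, each removing a slightly larger core
trial_expressions = [excised_expression(163.164, 57.408, r, 15.0) for r in (3.0, 5.0, 7.5)]
singles_expression = excised_expression(163.164, 57.408, 3.0, 15.0, singles_only=True)
for expr in trial_expressions:
    print(expr)
```

Each expression would be wrapped in single quotes and passed to `evselect` as in the cells above, followed by a new `epatplot` check on the resulting event file.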

## 6. Determine the Spectrum Extraction Areas

Now that we are confident that our spectrum is not piled up, we can continue by finding the source and background region areas. This is done with the task `backscale`, which takes into account any bad pixels or chip gaps, and writes the result into the BACKSCAL keyword of the spectrum table. Alternatively, we can skip running `backscale`, and use a keyword in `arfgen` below. We will show both options for the curious.

The inputs for `backscale` are:

    spectrumset - spectrum file
    badpixlocation - event file containing the bad pixels

To find the source and background extraction areas explicitly,

In [None]:
inargs = {'spectrumset'    : source_spectra_file,
          'badpixlocation' : filtered_event_list}

MyTask('backscale', inargs).run()

inargs = {'spectrumset'    : bkg_spectra_file,
          'badpixlocation' : filtered_event_list}

MyTask('backscale', inargs).run()
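
Once both spectra carry a BACKSCAL value, the ratio of the two is what rescales the background to the source region. The sketch below shows that arithmetic with made-up counts and areas. The helper names are hypothetical, `read_backscal` assumes the keyword sits in the first extension of the spectrum file, and exposure differences are ignored because both spectra here come from the same event list.

```python
def read_backscal(spectrum_file):
    """Read BACKSCAL from the spectrum table header (assumed to be the first extension)."""
    from astropy.io import fits   # imported here so the arithmetic below runs without astropy
    return fits.getval(spectrum_file, 'BACKSCAL', ext=1)

def net_counts(source_counts, background_counts, source_backscal, background_backscal):
    """Source counts minus the background counts rescaled to the source area."""
    scaled_background = background_counts * source_backscal / background_backscal
    return source_counts - scaled_background

# Made-up numbers for illustration only
toy_net = net_counts(source_counts=500, background_counts=240,
                     source_backscal=1000.0, background_backscal=3000.0)
print(toy_net)
```

With the files from this notebook, the two areas could be read with `read_backscal(source_spectra_file)` and `read_backscal(bkg_spectra_file)`.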

## 7. Create the Photon Redistribution Matrix (RMF) and Ancillary File (ARF)

Now that a source spectrum has been extracted, we need to reformat the detector response by making a redistribution matrix file (RMF) and ancillary response file (ARF). To make the RMF we use `rmfgen`. The input arguments are:

    rmfset - output file
    spectrumset - input spectrum file

Now we can use `arfgen` with the RMF, spectrum, and event file to make the ancillary file (ARF). The input arguments are:

    arfset - output ARF file name
    spectrumset - input spectrum file name
    withrmfset - flag to use the RMF
    rmfset - RMF file created by rmfgen
    withbadpixcorr - flag to include the bad pixel correction
    badpixlocation - file containing the bad pixel information; should be set to the event file from which the spectrum was extracted
    setbackscale - flag to calculate the area of the source region and write it to the BACKSCAL keyword in the spectrum header

In [None]:
rmf_file = 'mos1_rmf.fits'
arf_file = 'mos1_arf.fits'

inargs = {'rmfset'      : rmf_file,
          'spectrumset' : source_spectra_file}

MyTask('rmfgen', inargs).run()

inargs = {'arfset'         : arf_file,
          'spectrumset'    : source_spectra_file,
          'withrmfset'     : 'yes',
          'rmfset'         : rmf_file,
          'withbadpixcorr' : 'yes',
          'badpixlocation' : filtered_event_list,
          'setbackscale'   : 'yes'}

MyTask('arfgen', inargs).run()
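
What the two response files represent can be pictured with a toy forward fold. The ARF supplies an effective area per energy bin, and the RMF spreads the photons detected in each energy bin over the detector channels. Everything in the sketch below is invented for illustration (a 3x3 matrix, three effective areas, three fluxes, and an exposure), and `fold_model` is a hypothetical helper, not a SAS or XSPEC function.

```python
def fold_model(photon_flux, arf, rmf, exposure):
    """Predict counts per channel from a model flux.

    photon_flux[j]: photons / cm^2 / s in energy bin j
    arf[j]:         effective area in cm^2 for energy bin j
    rmf[i][j]:      probability that a photon in energy bin j is recorded in channel i
    exposure:       exposure time in seconds
    """
    detected = [flux * area * exposure for flux, area in zip(photon_flux, arf)]
    return [sum(row[j] * detected[j] for j in range(len(detected))) for row in rmf]

toy_flux = [1e-4, 2e-4, 1e-4]
toy_arf = [400.0, 500.0, 300.0]
toy_rmf = [[0.8, 0.1, 0.0],    # each column sums to 1, so no photons are lost
           [0.2, 0.8, 0.2],
           [0.0, 0.1, 0.8]]
toy_counts = fold_model(toy_flux, toy_arf, toy_rmf, exposure=10000.0)
print(toy_counts)
```

Fitting packages perform this kind of fold on every trial model before comparing it with the observed spectrum, which is why both files are needed.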

To analyze the spectra, the individual photon counts need to be grouped into energy bins. We also need to include the filenames of the ARF, RMF, and background spectra in the header of the grouped spectra file. We do this by using the `specgroup` command. The input arguments are:

    spectrumset - name of the input (ungrouped) spectra file
    groupedset - name of the output (grouped) spectra file
    arfset - ARF file name
    rmfset - RMF file name
    backgndset - background spectra file name
    mincounts - the minimum number of counts per bin; the bins will be sized to reach the mincounts value

In [None]:
grouped_spectra = 'mos1_grp.fits'

inargs = {'spectrumset' : source_spectra_file,
          'groupedset'  : grouped_spectra,
          'arfset'      : arf_file,
          'rmfset'      : rmf_file,
          'backgndset'  : bkg_spectra_file,
          'mincounts'   : 30}

MyTask('specgroup', inargs).run()
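
The idea behind `mincounts` can be sketched with a simple greedy grouping: walk through the channels in order and close a group as soon as it holds at least the requested number of counts. This is only an illustration of the concept with made-up counts. It is not claimed to be the algorithm `specgroup` uses, and leaving the trailing channels as an under-filled group is a choice made for this sketch.

```python
def group_min_counts(counts, min_counts):
    """Greedily group consecutive channels until each group has at least min_counts."""
    groups = []
    current = []
    total = 0
    for channel, c in enumerate(counts):
        current.append(channel)
        total += c
        if total >= min_counts:
            groups.append(current)   # group is full, start a new one
            current, total = [], 0
    if current:
        groups.append(current)       # trailing channels that never reached min_counts
    return groups

toy_channel_counts = [10, 5, 20, 3, 30, 1, 1]
toy_groups = group_min_counts(toy_channel_counts, min_counts=30)
print(toy_groups)
```

Sparse channels end up in wide groups and well-populated channels in narrow ones, so each grouped bin carries comparable statistical weight.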

At this point, the spectrum stored in the file `mos1_grp.fits` is ready to be analyzed using an analysis package such as XSPEC. For a simple example of that see the notebook [Fitting an EPIC Spectrum in XSPEC](./analysis-xmm-ABC-guide-spectra-fitting.ipynb) based on Chapter 13 of the ABC Guide.