# Identifying and acquiring relevant X-ray observations

This tutorial will explain the basic concepts and components of the X-ray astronomy Python module 'Democratising Archival X-ray Astronomy' (DAXA). We will particularly focus on the various 'mission' classes (implemented for each of the X-ray telescopes that DAXA supports), and the functionality that allows for large numbers of observations to be selected and downloaded.

DAXA mission classes allow the user to interact with and search various X-ray telescope archives, all through an identical interface, within a single module, and through Python commands. It should be simple for anyone with a passing familiarity with Python to identify and acquire X-ray data relevant to their research.

## Import Statements

In [9]:
from daxa.mission import MISS_INDEX, XMMPointed, Chandra, eRASS1DE, eROSITACalPV, NuSTARPointed, Swift, \
    ROSATAllSky, ROSATPointed, ASCA, INTEGRALPointed, Suzaku

from datetime import date
import numpy as np
from astropy.units import Quantity

## What missions are available?

We have implemented support for accessing and searching the archives of many X-ray telescopes; we would also be willing to provide an interface to the data archives of other X-ray telescopes, if feasible. Please feel free to reach out using the support page if you think there is one we should add.

Some telescopes (such as the ROSATPointed/ROSATAllSky and eROSITACalPV/eRASS1DE classes) are not uniquely represented by a single DAXA mission - this is generally the case when the telescope in question has been used in distinctly different ways (e.g. ROSAT had a survey phase and a pointed phase), such that the data for one mode may not be relevant to all applications.

In [None]:
MISS_INDEX

### XMM-Pointed

This class is for acquiring XMM-Newton data, particularly that taken when the telescope is pointing at specific targets; i.e. it cannot be used to filter and download data taken when the telescope is slewing from one target to the next. XMM is still in use, so the number of available observations is constantly increasing - this also means that some data are still in a proprietary period, and DAXA will not be able to access them. **Note that only raw, unprocessed, data can currently be downloaded for XMM using DAXA.**

The three EPIC instruments are supported (PN, MOS1, and MOS2), as well as the two grating spectrometers (RGS1 and RGS2). We also support the acquisition of XMM optical monitor (OM) data, but support for processing it is more limited.


Values that can be passed to the `insts` argument of XMMPointed on declaration (either as single strings or as part of a list):

* PN
* MOS1
* MOS2
* RGS1
* RGS2
* OM

In [4]:
xm = XMMPointed()


### Chandra

The class for acquiring Chandra data, specifically taken when pointing at a target - Chandra does not have easily accessible 'slew' data, and this class does not consider it at all.

Data from all instruments can be downloaded by DAXA - though, unlike XMM, Chandra's instruments do not take data simultaneously, so each ObsID is generally only associated with one instrument. Note though that the gratings (HETG and LETG) are used _with_ another instrument, so cannot be passed by themselves.

**Note that the standard format of Chandra data can be downloaded using DAXA, allowing for the use of the standard Chandra re-processing scripts. This standard download includes pre-made images.**

Values that can be passed to the `insts` argument of Chandra (either as single string or as part of a list):

* ACIS-I
* ACIS-S
* HRC-I
* HRC-S
* LETG [an instrument that this grating was used with must be specified]
* HETG [an instrument that this grating was used with must be specified]

In [2]:
ch = Chandra()

### eROSITA All-Sky DR1 (German Half)

The class for acquiring data from the first data release by the German part of the eROSITA consortium - this covers half the sky to 1/8th the planned final survey depth of eROSITA. All of this data is taken in survey mode, where the telescope is constantly slewing. **Note that the data downloads can include pre-generated images and exposure maps.**

The eROSITA telescope is made up of 7 telescope modules that observe simultaneously, and which can be individually selected on declaration with the `insts` argument - ***it is not recommended to use DAXA to limit the telescope modules being considered, as we crudely modify the event lists to remove events from non-selected telescope modules***.

Values that can be passed to the `insts` argument of eRASS1DE (either as a single string or as part of a list):

* TM1
* TM2
* TM3
* TM4
* TM5
* TM6
* TM7

In [None]:
ea = eRASS1DE()

### eROSITA Calibration and Performance Verification

This class provides DAXA access to the data from the calibration and performance verification stages of the eROSITA mission, including all pointed observations and survey regions (such as the eROSITA Final Equatorial-Depth Survey; eFEDS). **Note that _only_ event list data can be downloaded for this class.**

The eROSITA telescope is made up of 7 telescope modules that observe simultaneously, and which can be individually selected on declaration with the `insts` argument - ***it is not recommended to use DAXA to limit the telescope modules being considered, as we crudely modify the event lists to remove events from non-selected telescope modules***.

Values that can be passed to the `insts` argument of eROSITACalPV (either as a single string or as part of a list):

* TM1
* TM2
* TM3
* TM4
* TM5
* TM6
* TM7

In [None]:
ecpv = eROSITACalPV()

### NuSTAR-Pointed

This class is for acquiring NuSTAR data, particularly that taken when the telescope is pointing at specific targets; i.e. it cannot be used to filter and download data taken when the telescope is slewing from one target to the next. NuSTAR is still in use, so the number of available observations is constantly increasing - this also means that some data are still in a proprietary period, and DAXA will not be able to access them. **Note that data downloads can optionally include pre-generated images.**

NuSTAR has two instruments that observe simultaneously, and are essentially identical, Focal Plane Module A & B.

Values that can be passed to the `insts` argument of NuSTARPointed on declaration (either as single strings or as part of a list):

* FPMA
* FPMB

In [None]:
nu = NuSTARPointed()

### Swift

This class is for acquiring Swift data. Swift is still in use, so the number of available observations is constantly increasing. Also, due to Swift's primary mission of rapid transient follow-up, and how observations are split up, the table of available observations is unusually large (only INTEGRAL is comparable) - **as such this class may take several minutes to declare, depending on your internet connection**.

Swift has three instruments that generally observe simultaneously; the X-ray Telescope (XRT), the Burst Alert Telescope (BAT), which has poor spatial resolution but a large field of view and is sensitive to very high energy photons, and the Ultraviolet and Optical Telescope (UVOT). **Note that data downloads can optionally include pre-generated images, but not for BAT observations.**

Values that can be passed to the `insts` argument of Swift on declaration (either as single strings or as part of a list); the XRT and BAT instruments are selected by default:

* XRT
* BAT
* UVOT

In [None]:
sw = Swift()

### ROSAT All-Sky Survey

This class provides access to data taken by ROSAT during its all-sky survey. Though ROSAT had multiple instruments, this was all taken with the position-sensitive proportional counters (PSPC) - specifically with the 'PSPC-C' instrument (ROSAT had two PSPC instruments). The initial all-sky survey was abandoned after an accidental pass over the sun destroyed PSPC-C, but follow-up observations with PSPC-B were taken towards the end of ROSAT's lifetime to complete the survey (**these are not included in this class**).

**Note: Data acquired through this class will include just event lists by default, but can also include pre-generated images and exposure maps.**

In [None]:
ra = ROSATAllSky()

### ROSAT-Pointed

This class provides access to data taken by ROSAT during the pointed phase of its lifetime (including follow-up observations used to complete the all-sky survey). ROSAT instruments could not observe simultaneously, so each separate observation uses a single instrument. **Note: Data acquired through this class will include just event lists by default, but can also include pre-generated images (PSPC & HRI) and exposure maps (just PSPC).**

Values that can be passed to the `insts` argument of ROSATPointed on declaration (either as single strings or as part of a list):

* PSPCB
* PSPCC
* HRI

In [None]:
rp = ROSATPointed()

### Suzaku

This class provides access to data taken by the Suzaku X-ray telescope during pointed observations (data taken while slewing are not included in the public archive). **Note: Data acquired through this class will include just event lists by default, but can also include pre-generated images.**

We provide access to XIS data, but not XRS (as the cooling system was damaged soon after launch) or HXD (as it was not an imaging instrument).

Values that can be passed to the `insts` argument of Suzaku on declaration (either as single strings or as part of a list):

* XIS0
* XIS1
* XIS2
* XIS3

In [None]:
su = Suzaku()

### ASCA

This class provides access to data taken by the ASCA X-ray telescope during pointed observations (we cannot find anywhere to access the data taken whilst slewing). **Note: Data acquired through this class will include just event lists by default, but can also include pre-generated images, spectra, and lightcurves.**


Values that can be passed to the `insts` argument of ASCA on declaration (either as single strings or as part of a list):

* SIS0
* SIS1
* GIS2
* GIS3

In [None]:
asca = ASCA()

### INTEGRAL-Pointed

This class is for acquiring INTEGRAL data. INTEGRAL is still in use, so the number of available observations is still increasing (_though operations will cease around the end of 2024_). The table of available observations is unusually large (only Swift is comparable) - **as such this class may take several minutes to declare, depending on your internet connection**.

INTEGRAL has a selection of instruments that cover different parts of the X-ray and Gamma-ray energy range - most of them are based on the 'coded mask' technology, and so have very limited spatial resolution. **Note that only raw data/calibration files can be downloaded through DAXA; there are no pre-processed images available**.

Values that can be passed to the `insts` argument of INTEGRALPointed on declaration (either as single strings or as part of a list):

* JEMX1
* JEMX2
* ISGRI
* PICsIT
* SPI

In [None]:
inte = INTEGRALPointed()

## Mission properties

Here we run through the basic properties that each of the DAXA mission classes share. We also show examples, particularly in cases where differences between telescopes result in us assigning different values for particular properties.

### Name

The name assigned to each mission class, so that they can be differentiated both by the user and by DAXA functions. Each mission class has two names, the 'internal DAXA name' (used by DAXA to identify missions) and the 'pretty name', which is typically in a more aesthetically pleasing format.

For instance, we show the 'name' and 'pretty name' of the XMMPointed and eROSITA All-Sky Survey 1 classes:

In [None]:
print(xm.name)
print(xm.pretty_name, '\n')

print(ea.name)
print(ea.pretty_name)

### All Instruments & Chosen Instruments

Most telescopes have multiple instruments, though not all are necessarily selected by default. This can be for a number of reasons, but is generally because they either aren't suited to archival/serendipitous science (which is the primary reason this module exists) or because they aren't X-ray telescopes (like the optical monitors on XMM and Swift).

The instruments whose data is to be acquired are generally specified when the mission class is declared (using the `insts` argument), but can also be set through the `chosen_instruments` property.

Every available instrument for a mission is stored in the `all_mission_instruments` property:

In [None]:
print(rp.all_mission_instruments)
print(ch.all_mission_instruments)

The selected instruments (normally specified on declaration) are stored in the `chosen_instruments` property, which can also be set:

In [None]:
print(rp.chosen_instruments)
print(ch.chosen_instruments)
ch.chosen_instruments = ['ACIS-I', 'ACIS-S']
print(ch.chosen_instruments)

### ObsID Regular Expression

Each of a mission's observations is uniquely identified by an 'ObsID', and each telescope/mission has a different format of ObsID (generally just made up of numeric characters). There are points where the mission class may have to check the format of a supplied ObsID, and it does that by comparing it to the ObsID regular expression:

In [None]:
print(xm.id_regex)
print(rp.id_regex)
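
As a rough illustration of this kind of format check, the sketch below validates candidate ObsIDs against a regular expression using Python's built-in `re` module. The ten-digit pattern is an assumption made for this example (it matches the XMM ObsID '0201903501' used later in this tutorial); the pattern a mission really uses is whatever its `id_regex` property returns, and `obs_id_is_valid` is a hypothetical helper rather than part of DAXA.

```python
import re

# Assumed pattern, for illustration only - in practice use the mission's own `id_regex` property
ASSUMED_ID_REGEX = r'^[0-9]{10}$'


def obs_id_is_valid(obs_id: str, id_regex: str = ASSUMED_ID_REGEX) -> bool:
    """Hypothetical helper that checks whether an ObsID matches an ObsID pattern."""
    return re.match(id_regex, obs_id) is not None


# A ten-digit identifier passes the assumed pattern, a four-digit one does not
print(obs_id_is_valid('0201903501'))
print(obs_id_is_valid('3205'))
```

An ObsID that is valid for one mission can therefore be rejected by another, which is why each mission class carries its own pattern.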

### Field of View

The field of view (FoV) values attached to DAXA mission classes represent the half-width (or radius) of the region of sky that a telescope/instrument observes. Given that each telescope instrument tends to have a unique geometry, this is a simplification, but it is beneficial to have a single number that represents how much of the sky an instrument can see.

In the simplest cases, the FoV property is just a single quantity, meaning that there is either only one instrument, or that every instrument has the same field of view. In other cases there may be multiple instruments associated with a mission, in which case they will all have their own entry in a FoV dictionary.

Finally, some telescopes (such as Chandra) have instruments which have irregular geometries (the ACIS-S and HRC-S chips), or that are frequently used in different observational modes that enable and disable different parts of the sensor. As such, the actual coverage of the FoV can vary dramatically; in cases like these we have gone with the half-width of the longest possible side of the field of view.

In [None]:
print(xm.pretty_name, '-', xm.fov, '\n')
print(rp.pretty_name, '-', rp.fov, '\n')
print(sw.pretty_name, '-', sw.fov, '\n')
print(ch.pretty_name, '-', ch.fov)
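
Because the FoV may be either a single value or a per-instrument dictionary, code that consumes it has to handle both cases. The sketch below shows one way of doing that; `fov_for_instrument` is a hypothetical helper, the instrument names and numbers are placeholders, and plain floats (in arcminutes) stand in for the astropy quantities that the real property holds.

```python
from typing import Dict, Union

# Placeholder type - a single half-width, or one half-width per instrument (arcminutes)
FovType = Union[float, Dict[str, float]]


def fov_for_instrument(fov: FovType, inst: str) -> float:
    """Hypothetical helper returning the FoV half-width that applies to one instrument."""
    if isinstance(fov, dict):
        if inst not in fov:
            raise KeyError(f"No field of view entry for instrument '{inst}'.")
        return fov[inst]
    # A single value applies to every instrument of the mission
    return fov


# Made-up values, not those stored by any DAXA mission
single_fov = 15.0
per_inst_fov = {'INST-A': 60.0, 'INST-B': 19.0}

print(fov_for_instrument(single_fov, 'INST-A'))
print(fov_for_instrument(per_inst_fov, 'INST-B'))
```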

### Coordinate Frame

This property contains the coordinate frame for the central positions of observations taken by the mission - this is largely irrelevant to the user, and will be used in cases where the mission class needs to compare an input coordinate to an observation coordinate. 

Also, practically speaking the difference between the ICRS and FK5 frames (most commonly used) is negligible compared to the typical spatial uncertainty involved in X-ray data:

In [None]:
print(ch.coord_frame)
print(asca.coord_frame)

### Pre-processed Energy Bands

This property will contain the upper and lower energy bounds available for pre-processed data products for a particular mission (if their online dataset supplies energy-bound data products) - these energy bounds are provided on an instrument level (as some missions provide different energy-bound products for different instruments). The left hand column indicates the lower energy bound, and the right hand column the upper energy bound. 

An energy bound being present here does not guarantee that all products supplied by the mission online dataset are available in that bound - e.g. some missions provide bound images and a single, general, exposure map.

In [None]:
rp.preprocessed_energy_bands

If the mission in question cannot provide pre-processed data products that are energy bound, then an error will be raised:

In [None]:
inte.preprocessed_energy_bands

### Observation Information

One of the most important properties of a DAXA mission class - this returns a dataframe of all the observations that are associated with a mission. This can include observations that are not yet publicly available (i.e. they are still in a proprietary period), but will never include observations that are planned but haven't been taken yet. 

In most cases this data is dynamically updated when a mission is declared (i.e. the table is pulled down from a mission server) - this is not the case for eROSITACalPV and eRASS1DE. **Please also note that the Swift and INTEGRALPointed missions have very large `all_obs_info` tables due to the way their data/observations are organised.**

Some information will be constant across telescopes, and some will be mission specific. We present truncated versions of `all_obs_info` for every DAXA mission as of the current date:

In [None]:
date.today().strftime('%d-%B-%Y')

In [None]:
xm.all_obs_info

In [None]:
ch.all_obs_info

In [None]:
ea.all_obs_info

In [None]:
ecpv.all_obs_info

In [None]:
nu.all_obs_info

In [None]:
sw.all_obs_info

In [None]:
ra.all_obs_info

In [None]:
rp.all_obs_info

In [None]:
su.all_obs_info

In [None]:
asca.all_obs_info

In [None]:
inte.all_obs_info

### Filter Array

This is unlikely to ever be accessed directly by the user, but is what defines the observations that a mission currently deems to be accepted/selected. It is a boolean numpy array with a length equal to the number of observations in the `all_obs_info` dataframe; a `True` value means the observation is accepted and a `False` value means it is excluded. All observations start off as accepted.

Various filtering methods can be used to adjust the filter array and set the observations which are to be downloaded/included in a DAXA archive, depending on your particular sample and science case. 

It is also possible to manually set this filter array, as is demonstrated below:

In [None]:
# The filter array defaults to all True, so all observations are accepted
xm.filter_array

In [None]:
# Demonstrating manually setting a filter array - it must be boolean and be the same length as the
#  'all_obs_info' table, otherwise it will not be accepted
demo_filt_arr = np.full(len(xm.all_obs_info), True)
demo_filt_arr[0] = False
xm.filter_array = demo_filt_arr
xm.filter_array

## Selecting relevant observations

Few users will wish to download, process, and maintain ___complete___ observation archives, preferring to just locate data that may be relevant to the sources that they are studying. This can be achieved with the use of several filtering methods which are built into all DAXA missions.

Here we introduce the different filtering methods that are currently implemented for DAXA missions, but we do not provide detailed demonstrations of their use; that is [left to specific case studies](../../tutorials.casestudies.html) designed to show scientists with different needs how DAXA can be used in ways that are most relevant to them.

### Filtering on ObsID

The most basic filtering method available can be used when you already know which observation(s) you are interested in - if you have the ObsID(s) you can just pass them to the `filter_on_obs_ids` method, and select only that data:

In [None]:
help(xm.filter_on_obs_ids)

### Filtering on position

Arguably the most useful type of filtering supported by DAXA missions, the `filter_on_positions` method allows us to search for observations that are relevant to specific positions on the sky (generally these will represent particular objects). A search can be performed either for a single position, or for a whole sample.

The `search_distance` argument controls how close the central coordinate of an observation must be to a search position for that observation to be accepted. The default search distances are defined by the field of view of the telescope (if a mission has multiple instruments with different fields of view then each instrument will be searched with the correct one). The user may also specify their own search distance value (or values), and it is also possible to specify a different search distance for every position.

The `return_pos_obs_info` argument controls whether a dataframe is returned that links passed positions to particular observations that are relevant to them - this dataframe would not include positions that are determined to have no relevant observations.

In [None]:
help(xm.filter_on_positions)
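
The selection criterion described above comes down to an angular separation test between each observation's central coordinate and the search position. The following is a minimal, standard-library sketch of that idea and not DAXA's implementation; `angular_separation` and `select_by_position` are hypothetical helpers, all coordinates are made up, and everything is expressed in degrees rather than as astropy quantities.

```python
from math import asin, cos, degrees, radians, sin, sqrt


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation (degrees) between two positions in degrees, via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    hav = sin((dec2 - dec1) / 2) ** 2 + cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2
    # min() guards against floating point values marginally above one
    return degrees(2 * asin(min(1.0, sqrt(hav))))


def select_by_position(obs_centres, search_pos, search_distance):
    """
    Hypothetical stand-in for a positional filter; returns one boolean per observation, True when
    the observation's central coordinate lies within 'search_distance' (degrees) of 'search_pos'.
    """
    return [angular_separation(ra, dec, *search_pos) <= search_distance for ra, dec in obs_centres]


# Made-up observation central coordinates (RA, Dec) in degrees
obs_centres = [(150.00, 2.20), (150.50, 2.20), (10.00, -45.00)]
# Accept observations centred within 0.25 degrees (15 arcminutes) of the search position
mask = select_by_position(obs_centres, (150.05, 2.25), 0.25)
print(mask)
```

Only the first made-up observation is centred close enough to the search position to be accepted; a larger search distance would also pick up the second.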

### Filtering on name

If you are interested in data relevant to named objects, you can pass the name(s) to the `filter_on_name` method. It will use a lookup service (specifically Sesame) to locate coordinates for the object(s), and then pass them to the `filter_on_positions` method.

Bear in mind that you are reliant on the lookup service having accurate central coordinates for the named object, so it could be worth checking with your own coordinates and using `filter_on_positions` directly if you can't find any observations!

In [None]:
help(xm.filter_on_name)

### Filtering on target type

It is possible to filter observations based on the type of object that was the original target of the observation. **Warning: this should be used with significant caution, as our object taxonomy may not be granular enough to represent all different types of astronomical objects, and conversions between the different target types used by different missions and our target types is imperfect!**

Here we display the DAXA source type taxonomy, the short form codes on the left are what should be passed to the `filter_on_target_type` method - the user may pass either a single target type, or a list of them:

In [None]:
xm.show_allowed_target_types()

In [None]:
help(xm.filter_on_target_type)

### Filtering on time

Observations can be filtered on ___when___ they were taken, with the user specifying a time frame (defined by a start and end date-time) from which they wish to select observations. By default any observation that coincides with that time window (either starting in it, ending in it, or starting and ending outside but being taken during it) will be selected - it is also possible to require that an observation must have taken place entirely within the time window.

___Warning: Observations of survey missions like eRASS1DE and ROSATAllSky may repeatedly visit a particular location, and all that data may be incorporated as one observation, meaning that there may not be constant coverage in such an observation___

In [None]:
help(xm.filter_on_time)
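
The two selection behaviours described above can be written as simple interval comparisons. The sketch below is an illustration under assumptions, not DAXA's code: `in_time_window` is a hypothetical helper, its `over_run` flag borrows the name of the DAXA argument, and the meaning given to it here (True accepts any overlap, False requires the observation to sit entirely inside the window) is our reading of the description.

```python
from datetime import datetime


def in_time_window(obs_start, obs_end, win_start, win_end, over_run=True):
    """
    Hypothetical helper mimicking the described selection logic. With over_run=True any
    observation overlapping the window is accepted; with over_run=False the observation
    must both start and end inside the window.
    """
    if over_run:
        return obs_start <= win_end and obs_end >= win_start
    return obs_start >= win_start and obs_end <= win_end


win = (datetime(2020, 1, 1), datetime(2020, 2, 1))
# This made-up observation starts before the window opens and ends inside it
obs = (datetime(2019, 12, 31, 18), datetime(2020, 1, 1, 6))

print(in_time_window(*obs, *win))
print(in_time_window(*obs, *win, over_run=False))
```

The overlap test also covers an observation that starts before the window and ends after it, which is the third case listed in the description.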

### Filtering on time & position

A method of filtering that combines positional and temporal filtering, so that observations of a particular position, within a particular time window, can be located. The user ___does not___ have to filter for one position-time combination at a time - they may pass a set of positions, with a corresponding set of time windows, and find all the observations that fulfill those requirements.

The `filter_on_positions_at_time` method supports the same arguments as `filter_on_positions` and `filter_on_time`; these set the search distance from each passed coordinate (`search_distance`), whether a dataframe linking particular input coordinates to particular selected observations is returned (`return_obs_info`), and whether only observations that start and end within the specified time period should be considered (`over_run`).

In [None]:
help(xm.filter_on_positions_at_time)

## Downloading data

Once the user has decided upon a set of observations from a particular mission that are relevant to their research, the next step is often to download them. This section specifies what can be downloaded, and how to download it.

### What can be downloaded?

The exact data that can be downloaded depends on what a particular mission has made available on their online archive - event lists are available for all missions bar INTEGRAL, and most missions support the acquisition of pre-generated images, but not all (INTEGRAL and eROSITA CalPV, for instance).

We make a distinction between 'raw' and 'pre-processed' data, which is necessarily fuzzy due to the disparate natures of the various data archives we have to deal with. Broadly speaking '**raw**' data means either absolutely raw files that need to be processed into initial event lists (the case with XMMPointed) or just event lists (uncleaned and pre-cleaned); '**pre-processed**' data includes pre-processed data products such as images, exposure maps, and background maps.

**This means that the user can either set up an archive using the pre-processed data, or re-process data using DAXA interfaces to the various telescope backend software packages.**

Pre-processed products are downloaded by default (in addition to the raw data), but when setting up an Archive you can choose between using pre-processed data or re-processing the raw data; if you do not wish to acquire pre-processed data products, you can pass `download_products=False` to the download method (see below). 

**Event lists are considered raw data (i.e. not pre-processed) for most missions, and will always be downloaded**

#### XMM-Newton

_Currently_ we only support the acquisition of raw data.

#### Chandra

The following data products can be downloaded:

* Full FoV image (if an Archive is constructed including pre-processed Chandra data, this is what is included).
* High-resolution central image.

#### eROSITA All-Sky DR1 (German Half)

The following data products can be downloaded:

* Images
* Exposure maps
* Background maps

#### eROSITA Calibration and Performance Verification

No pre-processed data products are available, only event lists.

#### NuSTAR-Pointed

The following data products can be downloaded:

* Images

#### Swift

The following data products can be downloaded (_no products are available for BAT, only raw data_):

* Images (XRT and UVOT)
* Exposure maps (XRT and UVOT)

#### ROSAT All-Sky Survey

The following data products can be downloaded:

* Images
* Exposure maps

#### ROSAT-Pointed

The following data products can be downloaded:

* Images
* Exposure maps (only PSPC, no HRI)

#### Suzaku

The following data products can be downloaded:

* Images

#### ASCA

The following data products can be downloaded:

* Images (SIS0 + SIS1, and GIS2 + GIS3)
* Exposure maps
* Lightcurves (not included in Archive constructed from pre-processed data)
* Spectra (not included in Archive constructed from pre-processed data)

#### INTEGRAL-Pointed

No pre-processed data (nor even event lists) are available to download for INTEGRAL, only raw data and calibration files.

### How can it be downloaded?

The selected data can be downloaded using the `download` method that is built into each mission class.

If multiple observations have been selected, then downloads can be multi-threaded; this can sometimes offer a speed benefit, though only if the archive and user internet connections are not saturated by the download. The number of cores used can be set via the `num_cores` argument; by default this is set to the NUM_CORES DAXA constant, which can either be set by the user (in the DAXA configuration file, or by setting the value of `daxa.NUM_CORES`), or will be 90% of the cores of the system.
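
To make the idea of spreading downloads over several workers concrete, here is a small standard-library sketch. It is not how DAXA is implemented: `default_num_cores` is an assumed way of turning '90% of the cores' into a whole number, `fake_download` is a placeholder that performs no network access, the ObsIDs are made up, and a thread pool is used purely as a stand-in for whatever DAXA does internally.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from math import floor


def default_num_cores(fraction=0.9):
    """Assumed way of converting a fraction of the system's cores into a whole number (at least one)."""
    return max(1, floor((os.cpu_count() or 1) * fraction))


def fake_download(obs_id):
    """Placeholder for the work of fetching one observation - nothing is actually downloaded."""
    return f"{obs_id} downloaded"


# Made-up identifiers standing in for the ObsIDs selected by a mission's filter array
obs_ids = ['obs-1', 'obs-2', 'obs-3']
with ThreadPoolExecutor(max_workers=default_num_cores()) as pool:
    # map() returns results in the same order as the input ObsIDs
    results = list(pool.map(fake_download, obs_ids))

print(results)
```

With a single selected observation there is nothing to parallelise, which is why the demonstration in this tutorial simply passes `num_cores=1`.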

**Note that it is not necessary to manually activate the download method if you will be creating a DAXA Archive from the filtered mission objects, as that will be done on archive initialisation - [see the Archive tutorial for more information.](archives.html)**

Here we demonstrate a simple XMM download, as well as a Chandra download that includes pre-generated images (standard Chandra reprocessing scripts can be used on this downloaded data):

In [5]:
xm.filter_on_obs_ids('0201903501')
xm.download(num_cores=1)

ch.filter_on_obs_ids('3205')
ch.download(num_cores=1, download_products=True)


## Getting paths to downloaded mission data products

If pre-processed data have been downloaded, it is possible to use methods built into the mission class to retrieve the paths to various data products. **If you intend on using an XGA archive, using these methods should not be necessary** as the data will be moved to the Archive processed data storage structure, but it may still be useful if you just wish to use DAXA to download data.

There are four different get methods:

* `get_evt_list_path` - to retrieve event lists.
* `get_image_path` - to retrieve images.
* `get_expmap_path` - to retrieve exposure maps.
* `get_background_path` - to retrieve background maps.

They are very easy to use - only the ObsID, instrument (in most cases, not necessary if the mission has only one instrument per ObsID), and lower/upper energy bound (for images, exposure maps, and background maps - though if the products only have one energy band it will be completed automatically) need to be provided. They also provide helpful, detailed, error messages if what is requested isn't possible.

Here we retrieve the path to the event list (note that we do not need to pass the instrument name, as Chandra has only one per ObsID):

In [6]:
ch.get_evt_list_path('3205')

'/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/chandra_raw/3205/primary/acisf03205N006_evt2.fits'

We can also retrieve the path to the full-FoV image (again we don't need to pass instrument in this specific instance, nor do we need to pass energy bounds because Chandra only supplies one): 

In [7]:
ch.get_image_path('3205')

'/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/chandra_raw/3205/primary/acisf03205N006_full_img2.fits'

A more typical way of using this function would be this:

In [11]:
ch.get_image_path('3205', Quantity(0.5, 'keV'), Quantity(7, 'keV'), 'ACIS-I')

'/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/chandra_raw/3205/primary/acisf03205N006_full_img2.fits'

The get methods for exposure and background map paths operate identically to the image path method, though will show an error for Chandra:

In [12]:
ch.get_expmap_path('3205', Quantity(0.5, 'keV'), Quantity(7, 'keV'), 'ACIS-I')

PreProcessedNotSupportedError: This mission (Chandra) does not support the download of pre-processed exposure maps, so a path cannot be provided. 