# Finding, downloading, and processing relevant X-ray observations

We want to examine the X-ray emission of these groups, some of which we know have been targeted by XMM observations. We shall work with the XMM data here, but we also check whether the groups fall near any observations taken by the other X-ray missions implemented in DAXA.

Once identified, the XMM observations will be downloaded, and their raw data processed into a state suitable for analysis.

## Import statements

In [1]:
import daxa
daxa.NUM_CORES = 10
daxa.OUTPUT = "/mnt/gs21/scratch/turne540/OVI_groups/data/"
daxa.config.OUTPUT = "/mnt/gs21/scratch/turne540/OVI_groups/data/"
daxa.mission.base.OUTPUT = "/mnt/gs21/scratch/turne540/OVI_groups/data/"
from daxa.archive import Archive
from daxa.process.simple import full_process_xmm
from daxa.mission import XMMPointed, Chandra, NuSTARPointed

import pandas as pd
from astropy.units import Quantity
import os

## Reading the sample file

We read in the sample file that contains the positions of the groups; these will be used to locate relevant X-ray observations:

In [2]:
samp = pd.read_csv("../sample_files/init_group_info.csv")
samp

Unnamed: 0,name,est_ra,est_dec,redshift
0,25124,243.629055,26.73024,0.186
1,44739,229.79362,28.33175,0.118
2,19670A,150.25671,50.793942,0.134
3,19670B,150.21492,50.805014,0.134
4,12833,129.4968,44.2487,0.145
5,44858,230.112,28.88775,0.127
6,PHL1811,328.76992,-9.588805,0.077


## Searching for data

We can use DAXA classes to search the online databases of observations maintained for several X-ray telescopes. We will only make use of XMM data in this analysis, but it will be interesting to see what other data are available.

### XMM-Newton Pointed

We define an XMMPointed instance, which makes the distinction between XMM data that were taken in 'pointing' mode, when the attitude of the spacecraft was fixed (or nearly fixed), and data taken when XMM was slewing to its next target. Slew data is harder to analyse, with decreased spatial resolution, and is not yet implemented in DAXA.

The RA-Dec values for the groups of interest are used to filter the entire archive on position, searching for any XMM pointed observations with an aimpoint within 40 arcminutes of a position of interest.
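The selection criterion can be sketched in plain Python as an angular separation test. This is only an illustration of the idea, not how DAXA performs the search internally; `angular_sep_arcmin` is a hypothetical helper, and the two positions are the PHL1811 entry from the sample table and the aimpoint of ObsID 0204310101 from the XMM observation table.

```python
import math

def angular_sep_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation (arcmin) between two RA-Dec positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays numerically stable at small separations
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(hav))) * 60

# PHL1811 group position (sample file) and the aimpoint of ObsID 0204310101
group_ra, group_dec = 328.76992, -9.588805
aim_ra, aim_dec = 328.7562, -9.373528

sep = angular_sep_arcmin(group_ra, group_dec, aim_ra, aim_dec)
# An observation is kept if its aimpoint lies within the 40 arcmin search radius
within_search_radius = sep <= 40
print(f"{sep:.1f} arcmin", within_search_radius)
```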

In [3]:
# Define the instance
xmm = XMMPointed()
# Keep only observations with an aimpoint within 40 arcmin of a group position
xmm.filter_on_positions(samp[['est_ra', 'est_dec']].values, Quantity(40, 'arcmin'))



This part is a small bodge to account for the fact that the proprietary data I want to use in this notebook aren't associated with my XMM account; I've simply been handed the downloaded data. It may not last very long, as I'll probably add a more elegant way of manually adding data to a DAXA mission (one that doesn't involve accessing protected attributes).

It relies on us having copied the unpacked ODFs into the xmm_pointed_raw DAXA directory, and then manually altering the observation info dataframe to trick DAXA into treating those observations as usable despite their proprietary period. DAXA will check for the ObsIDs when it does the download, find them there, and that will be that:

In [4]:
prop_obsids = ['0900700101', '0900700201']
for obs_id in prop_obsids:
    if os.path.exists('../data/xmm_pointed_raw/{}'.format(obs_id)): 
        rel_rows_msk = xmm._obs_info['ObsID'] == obs_id
        xmm._obs_info.loc[rel_rows_msk, 'proprietary_usable'] = True
xmm.filtered_obs_info

Unnamed: 0,ra,dec,ObsID,start,science_usable,duration,proprietary_end_date,revolution,proprietary_usable,end
3802,328.7562,-9.373528,204310101,2004-11-01 09:06:42,True,0 days 09:08:39,2005-12-01 00:00:00,897,True,2004-11-01 18:15:21
12006,328.75625,-9.373333,761910201,2015-11-29 09:38:07,True,0 days 16:30:00,2016-12-11 23:00:00,2925,True,2015-11-30 02:08:07
13309,243.555,26.071167,801892301,2018-01-22 11:06:41,True,0 days 06:23:20,2019-02-08 23:00:00,3319,True,2018-01-22 17:30:01
13847,230.111667,28.886111,820240301,2019-01-14 04:17:53,True,0 days 06:06:40,2020-02-04 23:00:00,3498,True,2019-01-14 10:24:33
14961,129.414208,44.284083,861080201,2020-10-07 12:28:01,True,0 days 06:18:20,2021-11-13 00:00:00,3815,True,2020-10-07 18:46:21
14962,129.414208,44.284083,861080501,2020-10-07 10:09:04,True,0 days 02:18:57,2021-11-13 00:00:00,3815,True,2020-10-07 12:28:01
14965,150.166375,50.783694,861080101,2020-10-11 11:57:49,True,0 days 06:23:20,2021-11-02 00:00:00,3817,True,2020-10-11 18:21:09
14972,150.166375,50.783694,861080601,2020-10-11 09:53:44,True,0 days 02:04:05,2021-11-02 00:00:00,3817,True,2020-10-11 11:57:49
16245,229.803958,28.380667,900700201,2022-08-17 12:04:49,True,0 days 08:13:20,2023-08-30 00:00:00,4155,True,2022-08-17 20:18:09
16293,243.542417,26.647694,900700101,2022-09-12 02:31:27,True,0 days 08:03:20,2023-09-29 00:00:00,4168,True,2022-09-12 10:34:47
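To illustrate why the override is needed, here is a minimal sketch of a proprietary check. It assumes that an observation only counts as usable once its proprietary end date has passed; that definition, the `proprietary_usable` helper, and the reference date are assumptions made for illustration rather than DAXA's actual logic.

```python
from datetime import datetime

def proprietary_usable(prop_end, now=None):
    """True once the proprietary period has ended (assumed definition)."""
    now = datetime.now() if now is None else now
    return prop_end <= now

# Proprietary end dates copied from the table above
prop_ends = {'0900700201': datetime(2023, 8, 30),
             '0900700101': datetime(2023, 9, 29)}

# A reference date shortly after the observations were taken (illustrative choice)
ref_date = datetime(2022, 10, 1)
flags = {obs_id: proprietary_usable(end, ref_date) for obs_id, end in prop_ends.items()}
print(flags)
```

On that reference date both observations would still be flagged as unusable, which is the situation the manual edit of the observation info dataframe works around.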


### Chandra

We perform a similar search in the Chandra archive, though we won't download or process any Chandra data:

In [5]:
chandra = Chandra()
chandra.filter_on_positions(samp[['est_ra', 'est_dec']].values, Quantity(40, 'arcmin'))
chandra.filtered_obs_info

Unnamed: 0,ra,dec,ObsID,science_usable,proprietary_usable,start,end,duration,proprietary_end_date,target_category,detector,grating,data_mode
8821,129.49708,44.24889,15378,True,True,2013-01-04 02:14:34.000002,2013-01-04 07:49:14.000002,0 days 05:34:40,2014-01-04,GCL,ACIS-I,NONE,TE_00458
9370,130.03917,44.365,21564,True,True,2019-01-04 18:55:19.999998,2019-01-05 00:24:59.999998,0 days 05:29:40,2020-01-07,GCL,ACIS-I,NONE,TE_006E6
11979,130.03917,44.365,22035,True,True,2019-01-06 19:19:28.000001,2019-01-06 22:40:48.000001,0 days 03:21:20,2020-01-07,GCL,ACIS-I,NONE,TE_006E6
13802,328.75625,-9.37353,2958,True,True,2001-12-17 22:59:37.999999,2001-12-18 01:45:37.999999,0 days 02:46:00,2002-12-21,AGN,ACIS-S,NONE,TE_002A2
14238,328.75625,-9.37353,2957,True,True,2001-12-05 08:43:41.000002,2001-12-05 11:22:01.000002,0 days 02:38:20,2002-12-18,AGN,ACIS-S,NONE,TE_002A2
17584,230.48917,28.98831,4791,True,True,2004-04-12 03:22:08.000002,2004-04-12 04:39:28.000002,0 days 01:17:20,2005-04-20,AGN,ACIS-S,NONE,TE_003C2
19467,328.75625,-9.37342,15357,True,True,2012-11-24 20:01:44.999999,2012-11-24 20:36:34.999999,0 days 00:34:50,2013-11-26,AGN,ACIS-S,NONE,TE_00B4A


### NuSTAR Pointed

Finally, we search the NuSTAR archive, with the search distance decreased to 20 arcminutes because NuSTAR has a smaller field of view. Again, we only search through NuSTAR pointed data; no data taken whilst slewing are included in the search:

In [6]:
nustar = NuSTARPointed()
nustar.filter_on_positions(samp[['est_ra', 'est_dec']].values, Quantity(20, 'arcmin'))
nustar.filtered_obs_info

Unnamed: 0,ra,dec,ObsID,science_usable,proprietary_usable,start,end,duration,proprietary_end_date,target_category,exposure_a,exposure_b,ontime_a,ontime_b,nupsdout,issue_flag
816,328.7406,-9.3989,60101004002,True,True,2015-11-28 18:46:08.184003,2015-11-30 04:11:08.184002,1 days 09:24:59.999999,2016-12-11 00:00:00,AGN,0 days 15:12:38,0 days 15:10:13,0 days 16:23:15,0 days 16:23:54,0,0


## Creating an Archive

We'll set up a DAXA archive, which is most useful when dealing with multi-mission data but is also required here to process the data. The archive is given a name, and the processed data are stored in it along with logs of the processing steps.

Creating the archive also automatically downloads the XMM data, if we haven't already triggered that from the mission class instance:

In [7]:
ovi_group_arch = Archive(xmm, 'OVIGroups', clobber=True)
ovi_group_arch.info()


-----------------------------------------------------
Number of missions - 1
Total number of observations - 10
Beginning of earliest observation - 2004-11-01 09:06:42
End of latest observation - 2022-09-12 10:34:47

-- XMM-Newton Pointed --
   Internal DAXA name - xmm_pointed
   Chosen instruments - M1, M2, PN
   Number of observations - 10
   Fully Processed - False
-----------------------------------------------------





## Processing XMM data

The different processing steps that need to be applied to XMM data can be controlled separately, each with an array of user-configurable options, but a full processing stack that goes from raw data to cleaned event lists, images, and exposure maps is also available. It uses default settings, which should produce useful data in most cases:

In [8]:
full_process_xmm(ovi_group_arch)

XMM-Newton Pointed - Generating calibration files: 100%|██████████| 10/10 [01:00<00:00,  6.00s/it]
XMM-Newton Pointed - Generating ODF summary files: 100%|██████████| 10/10 [02:06<00:00, 12.64s/it]
XMM-Newton Pointed - Assembling PN and PN-OOT event lists: 100%|██████████| 8/8 [18:12<00:00, 136.62s/it]
XMM-Newton Pointed - Assembling MOS event lists: 100%|██████████| 20/20 [03:04<00:00,  9.21s/it]
XMM-Newton Pointed - Finding PN/MOS soft-proton flares: 100%|██████████| 28/28 [00:50<00:00,  1.80s/it]
XMM-Newton Pointed - Generating cleaned PN/MOS event lists: 100%|██████████| 27/27 [00:13<00:00,  1.96it/s]
XMM-Newton Pointed - Generating final PN/MOS event lists: 100%|██████████| 23/23 [00:00<00:00, 37.15it/s]
Generating products of type(s) ccf: 100%|██████████| 8/8 [01:06<00:00,  8.28s/it]
Generating products of type(s) image: 100%|██████████| 23/23 [00:01<00:00, 13.22it/s]
Generating products of type(s) expmap: 100%|██████████| 23/23 [04:36<00:00, 12.04s/it]
Generating products of typ

## Checking the data

Here we use the logging facilities of DAXA archives to investigate whether all the data were processed fully and, if they weren't, which observations failed and at which steps. That way we can decide whether there is something extra that can be done for those observations, or there are no further steps to take.

### Overall processing success

The first step is to check the 'overall' success, which will only report as False if all data for an entire observation (i.e. PN, MOS1, and MOS2) have been marked as failing processing. We can see that the processing of two observations has failed outright:

In [9]:
ovi_group_arch.final_process_success

{'xmm_pointed': {'0204310101': True,
  '0761910201': True,
  '0801892301': True,
  '0820240301': True,
  '0861080201': True,
  '0861080501': False,
  '0861080101': True,
  '0861080601': False,
  '0900700201': True,
  '0900700101': True}}
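Rather than reading the failures off by eye, the dictionary can be filtered directly. The sketch below works on a copy of the output above, so it runs without an archive; with a real archive, `ovi_group_arch.final_process_success` would take the place of `final_success`.

```python
# Copied from the `final_process_success` output above
final_success = {'xmm_pointed': {'0204310101': True, '0761910201': True,
                                 '0801892301': True, '0820240301': True,
                                 '0861080201': True, '0861080501': False,
                                 '0861080101': True, '0861080601': False,
                                 '0900700201': True, '0900700101': True}}

# Collect, for each mission, the ObsIDs flagged as having failed outright
failed = {mission: [obs_id for obs_id, ok in flags.items() if not ok]
          for mission, flags in final_success.items()}
print(failed)
```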

### 0861080601 - Outright failure

We can examine the 'observation summary' for 0861080601, which is parsed from the SAS summary file generated by ODF ingest. While the instruments all took science observations, we can see that each of the three telescopes had its filter wheel in the 'CalClosed' position. Given the similar ObsID of the other outright failure, it seems likely that it failed for the same reason:

In [10]:
ovi_group_arch.observation_summaries['xmm_pointed']['0861080601']

{'M1': {'active': True,
  'num_exp': 1,
  'exposures': {'S001': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'CalClosed',
    'ccd_modes': {1: 'Imaging',
     2: 'Imaging',
     4: 'Imaging',
     5: 'Imaging',
     7: 'Imaging'}}}},
 'M2': {'active': True,
  'num_exp': 1,
  'exposures': {'S002': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'CalClosed',
    'ccd_modes': {1: 'Imaging',
     2: 'Imaging',
     3: 'Imaging',
     4: 'Imaging',
     5: 'Imaging',
     6: 'Imaging',
     7: 'Imaging'}}}},
 'PN': {'active': True,
  'num_exp': 14,
  'exposures': {'S003': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'CalClosed',
    'ccd_modes': {1: 'Imaging',
     2: 'Imaging',
     3: 'Imaging',
     4: 'Imaging',
     5: 'Imaging',
     6: 'Imaging',
     7: 'Imaging',
     8: 'Imaging',
     9: 'Imaging',
     10: 'Imaging',
     11: 'Imaging',
     12: 'Imaging'

### 0861080501 - Outright failure

This has failed outright for the same reason as 0861080601: all observations were taken with the CalClosed filter:

In [11]:
ovi_group_arch.observation_summaries['xmm_pointed']['0861080501']

{'M1': {'active': True,
  'num_exp': 1,
  'exposures': {'S001': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'CalClosed',
    'ccd_modes': {1: 'Imaging',
     2: 'Imaging',
     4: 'Imaging',
     5: 'Imaging',
     7: 'Imaging'}}}},
 'M2': {'active': True,
  'num_exp': 1,
  'exposures': {'S002': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'CalClosed',
    'ccd_modes': {1: 'Imaging',
     2: 'Imaging',
     3: 'Imaging',
     4: 'Imaging',
     5: 'Imaging',
     6: 'Imaging',
     7: 'Imaging'}}}},
 'PN': {'active': True,
  'num_exp': 14,
  'exposures': {'S003': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'CalClosed',
    'ccd_modes': {1: 'Imaging',
     2: 'Imaging',
     3: 'Imaging',
     4: 'Imaging',
     5: 'Imaging',
     6: 'Imaging',
     7: 'Imaging',
     8: 'Imaging',
     9: 'Imaging',
     10: 'Imaging',
     11: 'Imaging',
     12: 'Imaging'
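The same check can be made programmatically by walking an observation summary and picking out the exposures taken with the CalClosed filter. This is a minimal sketch with a hypothetical helper, `calclosed_exposures`, run on an abbreviated copy of the summary above: the `ccd_modes` entries are omitted, and only the PN exposure visible in the truncated output (S003) is reproduced.

```python
def calclosed_exposures(obs_summary):
    """List (instrument, exposure ID) pairs taken with the CalClosed filter."""
    return [(inst, exp_id)
            for inst, inst_info in obs_summary.items()
            for exp_id, exp_info in inst_info['exposures'].items()
            if exp_info['filter'] == 'CalClosed']

# Abbreviated copy of the observation summary shown above
summary = {
    'M1': {'active': True, 'num_exp': 1,
           'exposures': {'S001': {'scheduled': True, 'type': 'SCIENCE',
                                  'mode': 'PrimeFullWindow', 'filter': 'CalClosed'}}},
    'M2': {'active': True, 'num_exp': 1,
           'exposures': {'S002': {'scheduled': True, 'type': 'SCIENCE',
                                  'mode': 'PrimeFullWindow', 'filter': 'CalClosed'}}},
    'PN': {'active': True, 'num_exp': 14,
           'exposures': {'S003': {'scheduled': True, 'type': 'SCIENCE',
                                  'mode': 'PrimeFullWindow', 'filter': 'CalClosed'}}},
}

closed = calclosed_exposures(summary)
# Compare against the number of exposures actually listed in the summary
n_listed = sum(len(info['exposures']) for info in summary.values())
all_closed = len(closed) == n_listed
print(closed, all_closed)
```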

### Tracing ObsID-Instrument-SubExposure failures at different stages

We can also check the success flags for the different processing stages at an individual sub-exposure level, to see whether any of the observations had a partial failure in processing. The only problematic sub-exposure we find is '0801892301PNS003', the scheduled PN observation of 0801892301, which failed the `espfilt` (soft-proton filtering) stage and was not allowed to continue:

In [12]:
ovi_group_arch.process_success

{'xmm_pointed': {'cif_build': {'0820240301': True,
   '0801892301': True,
   '0861080501': True,
   '0900700201': True,
   '0900700101': True,
   '0861080101': True,
   '0204310101': True,
   '0861080201': True,
   '0861080601': True,
   '0761910201': True},
  'odf_ingest': {'0900700101': True,
   '0900700201': True,
   '0801892301': True,
   '0861080601': True,
   '0861080501': True,
   '0820240301': True,
   '0861080101': True,
   '0204310101': True,
   '0861080201': True,
   '0761910201': True},
  'epchain': {'0801892301PNS003': True,
   '0820240301PNS003': True,
   '0861080101PNS003': True,
   '0861080201PNS003': True,
   '0900700101PNS003': True,
   '0900700201PNS003': True,
   '0204310101PNS003': True,
   '0761910201PNS003': True},
  'emchain': {'0861080201M1S001': True,
   '0900700201M1U003': True,
   '0861080201M1U003': True,
   '0900700101M1S001': True,
   '0820240301M1S001': True,
   '0861080101M1S001': True,
   '0204310101M1S001': True,
   '0900700201M1S001': True,
   '08018
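Because this dictionary is long, it can help to extract only the failed entries. The sketch below uses two hypothetical helpers and a small illustrative dictionary with the same mission, step, identifier nesting; the `espfilt` flags in it are assumed from the description above, since the printed output is cut off before that step. `split_identifier` assumes a ten-digit ObsID followed by a two-character instrument name.

```python
def failed_steps(process_success):
    """Yield (mission, step, identifier) for every entry flagged False."""
    for mission, steps in process_success.items():
        for step, flags in steps.items():
            for ident, ok in flags.items():
                if not ok:
                    yield mission, step, ident

def split_identifier(ident):
    """Split e.g. '0801892301PNS003' into ObsID, instrument, and sub-exposure."""
    return ident[:10], ident[10:12], ident[12:]

# Illustrative stand-in for `ovi_group_arch.process_success` (not the real output)
example = {'xmm_pointed': {
    'epchain': {'0801892301PNS003': True, '0820240301PNS003': True},
    'espfilt': {'0801892301PNS003': False, '0820240301PNS003': True}}}

failures = list(failed_steps(example))
for mission, step, ident in failures:
    print(mission, step, split_identifier(ident))
```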

### 0801892301PNS003 - `espfilt`

We can check the logs for errors that occurred during the espfilt run for 0801892301PNS003, looking at both the `process_errors` entry (which attempts to parse raw SAS errors into something more compact), and the `raw_process_errors` entry, which gives the whole stderr output.

By doing this we can see that it failed because the PN data for 0801892301 were taken in a small window mode that is not supported by espfilt:

In [13]:
ovi_group_arch.process_errors['xmm_pointed']['espfilt']['0801892301PNS003']

['noPNSmallWindow raised by espfilt - Sorry, espfilt does not work with EPN *Small* window SUBMODEs']

In [14]:
ovi_group_arch.raw_process_errors['xmm_pointed']['espfilt']['0801892301PNS003']

'** espfilt: error (noPNSmallWindow), Sorry, espfilt does not work with EPN *Small* window SUBMODEs\nmv: cannot stat ‘pnS003-gti-2500-8500.fits’: No such file or directory\nmv: cannot stat ‘pnS003-allevc-2500-8500.fits’: No such file or directory\nmv: cannot stat ‘pnS003-hist-2500-8500.qdp’: No such file or directory\n'
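The relationship between the two entries can be illustrated with a regular expression that picks SAS error lines out of the raw stderr and condenses them. This is only a sketch of the idea, not DAXA's actual parser; the pattern and the `summarise_sas_errors` helper are assumptions based on the single error line shown above.

```python
import re

# Matches SAS error lines of the form '** task: error (errorName), message'
SAS_ERR = re.compile(r"^\*\* (\w+): error \((\w+)\), (.*)$", re.MULTILINE)

def summarise_sas_errors(stderr):
    """Condense SAS error lines, ignoring other stderr content (e.g. failed `mv` calls)."""
    return [f"{name} raised by {task} - {msg}"
            for task, name, msg in SAS_ERR.findall(stderr)]

# First two lines of the raw stderr output shown above
raw = ("** espfilt: error (noPNSmallWindow), Sorry, espfilt does not work with "
       "EPN *Small* window SUBMODEs\n"
       "mv: cannot stat ‘pnS003-gti-2500-8500.fits’: No such file or directory\n")

parsed = summarise_sas_errors(raw)
print(parsed)
```

The `mv` lines are a knock-on effect of espfilt not producing its output files, so they are dropped from the summary.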

## Assigning regions to XMM observations

Source regions are a key part of any analysis, but DAXA cannot currently apply source detection to the observations that it processes. We therefore use source regions generated by the XMM Cluster Survey's (XCS) XAPA source finder. Passing them into the DAXA archive will a) store them in the DAXA archive directory structure, and b) convert them from their original pixel coordinates to RA-Dec coordinates (this is safe because I know that the XGA images have the same binning as the XCS images used for source detection):

In [15]:
reg_path = "../sample_files/xcs_regions/{o_id}_pix.reg"
im_path = "../data/archives/OVIGroups/processed_data/xmm_pointed/{o_id}/images/{o_id}_mos1_0.5-2.0keVimg.fits"

In [16]:
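# ObsIDs that have an XCS region file (named '{ObsID}_pix.reg'); other files are skipped
av_reg = [f.split('_')[0] for f in os.listdir('../sample_files/xcs_regions/') if f.endswith('_pix.reg')]
# Pair each region file with the image whose WCS is used for the pixel to RA-Dec conversion
reg_dec = {'xmm_pointed': {o: {'regions': reg_path.format(o_id=o), 'wcs_src': im_path.format(o_id=o)} 
                           for o in av_reg}}

# Assigning the dictionary stores the regions in the archive and triggers the conversion
ovi_group_arch.source_regions = reg_dec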