# Creating and interacting with a DAXA Archive

This tutorial will explain the basic concepts behind the second type of important class in DAXA, the Archive class (with mission classes being the first type, see [the missions tutorial](missions.html)). DAXA Archives are what manage the datasets that we download from various missions, enabling easy access and greatly simplifying processing/reduction - they allow you to stop thinking about all the files and settings that any large dataset entails.

We will cover the following:

* Setting up an Archive from scratch, using filtered DAXA missions.
* Loading an existing Archive from disk.
* The properties of an Archive.
* Accessing processing logs and success information (though we do not cover processing in this part of the documentation).

## Import Statements

In [1]:
from daxa.mission import XMMPointed, Chandra, eRASS1DE, ROSATPointed
from daxa.archive import Archive

import os

## What is a DAXA archive?

DAXA Archives take a set of filtered missions, make sure that their data are downloaded, and enable easy access and organisation of all data files and processing functions. Key functionality includes:

* Storing the logs and errors of all processing steps (if run).
* Allowing for their easy retrieval. 
* Managing the myriad files produced during the processing.
* Keeping track of which processes failed for which data, ensuring that any further processing only runs on data that have successfully passed through the earlier processes.

Archives can also be loaded back into DAXA at a later date, so that the processing logs of data that have since been found to be problematic can be easily inspected, or so that processing steps can be re-run with different settings; this also allows for archives to be updated, if more data become available.

## Creating a new archive

Here we will demonstrate how to set up a new DAXA Archive from scratch - this information can be combined with [the missions tutorial](missions.html) and the <font color='red'>case studies</font> to create an archive from any dataset you might be using.

### Step 1 - Set up and filter missions 

The first thing we have to do is to select the observations that we wish to include in the archive (and indeed the missions that we wish to include). The missions all have different characteristics, so your choice of which to include will be heavily dependent on your science case.

Here we will create an archive of XMM, Chandra, eROSITA All-Sky DR1, and ROSAT pointed observations of a famous galaxy cluster (though the archive would behave the same if it held data for a large sample of objects).

First of all, we define instances of the mission classes that we wish to include:

In [2]:
xm = XMMPointed()
ch = Chandra()
er = eRASS1DE()
rp = ROSATPointed()



Then we filter them to only include observations of our cluster:

In [3]:
xm.filter_on_name("A3667")
ch.filter_on_name("A3667")
er.filter_on_name("A3667")
rp.filter_on_name("A3667")



We then download the available data (though the declaration of an Archive would also trigger this, we do it this way because we wish to download pre-generated products for Chandra and ROSAT pointed observations):

In [4]:
xm.download()
ch.download(download_products=True)
er.download()
rp.download(download_products=True)

Downloading XMM-Newton Pointed data: 100%|██████████████████████████████████████| 8/8 [01:32<00:00, 11.62s/it]
Downloading Chandra data: 100%|███████████████████████████████████████████████| 12/12 [02:52<00:00, 14.36s/it]
Downloading eRASS DE:1 data: 100%|██████████████████████████████████████████████| 1/1 [00:04<00:00,  4.22s/it]
Downloading ROSAT Pointed data: 100%|███████████████████████████████████████████| 3/3 [00:16<00:00,  5.66s/it]


### Step 2 - Setting up an Archive object

Now we create the actual DAXA Archive instance - all this requires is for us to choose an archive name (which is what will be used to load it back in at a later date, if necessary) and to pass in the filtered missions that we have already created:

In [5]:
arch = Archive("A3667", [xm, ch, er, rp], clobber=True)



Now we've declared it, we can use the `info()` method to get a summary of its current status, including the amount of data available:

In [6]:
arch.info()


-----------------------------------------------------
Number of missions - 4
Total number of observations - 24
Beginning of earliest observation - 1992-04-14 18:55:38.000003
End of latest observation - 2020-04-20 12:23:50

-- XMM-Newton Pointed --
   Internal DAXA name - xmm_pointed
   Chosen instruments - M1, M2, PN
   Number of observations - 8
   Fully Processed - False

-- Chandra --
   Internal DAXA name - chandra
   Chosen instruments - ACIS-I, ACIS-S, HRC-I, HRC-S
   Number of observations - 12
   Fully Processed - False

-- eRASS DE:1 --
   Internal DAXA name - erosita_all_sky_de_dr1
   Chosen instruments - TM1, TM2, TM3, TM4, TM5, TM6, TM7
   Number of observations - 1
   Fully Processed - False

-- ROSAT Pointed --
   Internal DAXA name - rosat_pointed
   Chosen instruments - PSPCB, PSPCC, HRI
   Number of observations - 3
   Fully Processed - False
-----------------------------------------------------



### Step 3 - Processing the Archive

We're not actually going to cover _how_ to process things here, as each telescope tends to have its own backend software with a unique way of doing things; they each have their own processing tutorials, which will demonstrate both a one-line processing method, and how to control the reduction in more detail. Any processing method will take the archive object as an argument, and act on the data stored within it.

So instead we include this step here to highlight that the next logical step after the creation of a new archive is to run processing and reduction routines, if raw data have been downloaded. The successful completion of this step will leave you with an archive of data that you can easily manage, access, and use for your scientific analyses.

If you elected to download existing products (most missions support this), then only one processing step is necessary - this reorganises the downloaded data so that they are compatible with DAXA storage and file naming conventions. **It will have run automatically on declaration.**

### Step 4 - Using the data

### Note on saving Archives

Archive instances can be saved and loaded back in (as you'll see in the next section). This can be triggered manually by running the `save()` method, but __this shouldn't generally be necessary__ - the archive is automatically saved upon first setup, and after every processing step.

## Loading an existing archive

As we have intimated, previously created archives can be loaded back in to memory in exactly the same state as when they were saved. We will demonstrate this here with an archive we prepared earlier - it has had XMM processing applied, which will allow us to demonstrate the logging and management functionality. 

Reloading an archive has a number of possible applications:

* Access to archive data management functions - e.g. locating specific data files, identifying what observations are available.
* Checking processing logs - e.g. finding errors or warnings in the processing of data that has since been identified as problematic.
* Updating the archive - either adding another mission, or using the archive to check for new data matching your original mission filtering operations (these are stored in the mission saves, so can be re-run automatically).

All you need to do is set up an Archive instance and pass the name of an existing archive - this assumes your code is running in the same directory as it was originally, as Archives are stored in 'daxa_output' (if the DAXA configuration file hasn't been altered). The configuration can also be altered so that all DAXA outputs are stored in an absolute path, in which case defining an Archive object with the name of an existing dataset would work from any directory.
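As a rough illustration of the directory convention described above, the following sketch checks whether a named archive appears to exist on disk before we try to load it. The `archive_exists` helper and its `output_root` argument are hypothetical (they are not part of DAXA), and the `<output_root>/archives/<name>` layout is an assumption based on the `archive_path` value shown later in this tutorial, for an unaltered configuration:

```python
import os


def archive_exists(archive_name, output_root="daxa_output"):
    """
    Hypothetical helper (not part of DAXA) - returns True if a directory for
    the named archive is present under '<output_root>/archives/'.
    """
    return os.path.isdir(os.path.join(output_root, "archives", archive_name))


# With an unaltered configuration the output root is relative to the current
#  working directory, so the result depends on where the code is run from
print(archive_exists("PHL1811_made_earlier"))
```

If this prints `False` when you expected an archive to be present, the most likely cause is that the code is running from a different directory to the one in which the archive was created.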

Loading in an archive (note that you don't need to pass any missions, loading the archive back in will also reinstate the missions as they were when the Archive was last saved):

In [7]:
prev_arch = Archive("PHL1811_made_earlier")



Once again, we will run the `info()` method, but note that for this archive the XMM-Newton Pointed mission is marked as 'fully processed':

In [8]:
prev_arch.info()


-----------------------------------------------------
Number of missions - 4
Total number of observations - 10
Beginning of earliest observation - 1990-10-31 00:00:00
End of latest observation - 2015-11-30 04:11:08.184002

-- XMM-Newton Pointed --
   Internal DAXA name - xmm_pointed
   Chosen instruments - M1, M2, PN
   Number of observations - 5
   Fully Processed - False

-- Chandra --
   Internal DAXA name - chandra
   Chosen instruments - ACIS-I, ACIS-S, HRC-I, HRC-S
   Number of observations - 3
   Fully Processed - False

-- NuSTAR Pointed --
   Internal DAXA name - nustar_pointed
   Chosen instruments - FPMA, FPMB
   Number of observations - 1
   Fully Processed - False

-- RASS --
   Internal DAXA name - rosat_all_sky
   Chosen instruments - PSPC
   Number of observations - 1
   Fully Processed - False
-----------------------------------------------------



We note that it _is_ possible to declare an Archive with a previously used name and overwrite it - you just have to pass `clobber=True` when you declare the Archive instance. We print the docstring of the Archive class here for reference:

In [9]:
print(Archive.__doc__)


    The Archive class, which is to be used to consolidate and provide some interface with a set
    of mission's data. Archives can be passed to processing and cleaning functions in DAXA, and also
    contain convenience functions for accessing summaries of the available data.

    :param str archive_name: The name to be given to this archive - it will be used for storage
        and identification. If an existing archive with this name exists it will be read in, unless clobber=True.
    :param List[BaseMission]/BaseMission missions: The mission, or missions, which are to be included
        in this archive - any setup processes (i.e. the filtering of data to be acquired) should be
        performed prior to creating an archive. The default value is None, but this should be set for any new
        archives, it can only be left as None if an existing archive is being read back in.
    :param bool clobber: If an archive named 'archive_name' already exists, then setting clobber to True
 

## Accessing component missions

The missions that were used to create an archive can be retrieved, giving access to their information tables - note that you cannot just use the filtering methods of a mission to change the data in the archive; altering the observations in an archive requires using <font color='red'>the archive `update()` method.</font>

To retrieve a mission you can either address the archive with the DAXA internal name of the mission, or get the whole list using the `missions` property:

In [10]:
prev_arch['xmm_pointed']

<daxa.mission.xmm.XMMPointed at 0x7fd3083df190>

In [11]:
prev_arch.missions

[<daxa.mission.xmm.XMMPointed at 0x7fd3083df190>,
 <daxa.mission.chandra.Chandra at 0x7fd3387b4160>,
 <daxa.mission.nustar.NuSTARPointed at 0x7fd369ef6340>,
 <daxa.mission.rosat.ROSATAllSky at 0x7fd34838cd90>]

In [12]:
prev_arch['xmm_pointed'].filtered_obs_info

| | ra | dec | ObsID | start | science_usable | duration | proprietary_end_date | revolution | proprietary_usable | end |
|---|---|---|---|---|---|---|---|---|---|---|
| 922 | 157.7463 | 31.04889 | 102041001 | 2000-12-07 04:57:14 | True | 0 days 01:30:06 | 2002-04-06 00:00:00 | 182 | True | 2000-12-07 06:27:20 |
| 1044 | 67.555425 | -61.35056 | 105261001 | 2000-09-27 06:56:36 | True | 0 days 04:16:54 | 2002-08-25 00:00:00 | 147 | True | 2000-09-27 11:13:30 |
| 3802 | 328.7562 | -9.373528 | 204310101 | 2004-11-01 09:06:42 | True | 0 days 09:08:39 | 2005-12-01 00:00:00 | 897 | True | 2004-11-01 18:15:21 |
| 6014 | 234.89625 | -83.59306 | 502671101 | 2008-04-01 17:24:48 | True | 0 days 05:25:20 | 2009-05-29 00:00:00 | 1522 | True | 2008-04-01 22:50:08 |
| 12006 | 328.75625 | -9.373333 | 761910201 | 2015-11-29 09:38:07 | True | 0 days 16:30:00 | 2016-12-11 23:00:00 | 2925 | True | 2015-11-30 02:08:07 |


## Archive general properties

Here we run through the general properties of the archive class, summarising their meaning and content.

### Name

The `archive_name` property returns the name that was given to the archive on creation - this cannot be changed.

In [13]:
prev_arch.archive_name

'PHL1811_made_earlier'

### Archive Path

This property (`archive_path`) returns the absolute path to the top level of this archive's storage directory:

In [14]:
prev_arch.archive_path

'/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/'

### Mission Names

In addition to the `missions` property discussed earlier, we include a `mission_names` property which lists the internal names of the mission classes associated with the archive:

In [15]:
prev_arch.mission_names

['xmm_pointed', 'chandra', 'nustar_pointed', 'rosat_all_sky']

## Processing-related Archive properties

This section deals with Archive properties that are related to processing of the available data into something scientifically useful - this is why we loaded an existing archive with processing applied, to demonstrate the contents of these properties:

### Process Success

The `process_success` property is very important - it is a nested dictionary which records which processing steps were 'successful' (usually defined as no errors being detected, and expected files being found) for which data. Those that were successful have a boolean value of True, those that weren't have a boolean value of False.

Top level keys will always be mission name, the next level down will be the process name, and the layer below that will be the unique IDs of the data the process acted on. This is often an ObsID, but can also be ObsID + instrument name, or ObsID + instrument name + sub-exposure ID.

This allows you (but more importantly the archive itself) to know which stages have failed for which data - that in turn means any processing that is dependent on a previous stage can know which data to skip. All this ensures no interruptions when reducing large sets of data.
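To make the skipping behaviour concrete, here is a minimal sketch that uses a small stand-in dictionary with the same mission, process, unique ID layout. The `ids_passing` helper is hypothetical and is not how DAXA implements this internally; the stand-in values are illustrative only:

```python
# Illustrative stand-in for part of a `process_success` dictionary - the layout
#  (mission -> process -> unique ID -> bool) follows the description above
process_success = {
    "xmm_pointed": {
        "espfilt": {
            "0502671101M1S001": True,
            "0761910201M1S001": True,
            "0502671101PNS003": False,
        }
    }
}


def ids_passing(success, mission, process):
    """Return the unique IDs that passed the given process for a mission."""
    return [uid for uid, ok in success[mission][process].items() if ok]


# A step that depends on `espfilt` would only be run for these identifiers
to_process = ids_passing(process_success, "xmm_pointed", "espfilt")
print(to_process)
```

Any identifier with a False flag is simply left out of the list handed to the next stage, which is why a single failure does not interrupt the reduction of the rest of the data.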

In this case we've run all processing steps on the XMM data in the archive, note the following entry:

* `espfilt` - 0502671101PNS003 

It failed safely, and was not considered for the next processing stages. **Also note that the ObsID 0102041001 does not appear** after the `odf_ingest` step, as all of its data were taken in CalClosed mode and can't be used for the study of the target objects:

In [16]:
prev_arch.process_success

{'xmm_pointed': {'cif_build': {'0502671101': True,
   '0761910201': True,
   '0105261001': True,
   '0102041001': True,
   '0204310101': True},
  'odf_ingest': {'0102041001': True,
   '0105261001': True,
   '0502671101': True,
   '0204310101': True,
   '0761910201': True},
  'epchain': {'0105261001PNS003': True,
   '0105261001PNU002': True,
   '0502671101PNS003': True,
   '0204310101PNS003': True,
   '0761910201PNS003': True},
  'emchain': {'0105261001M1U002': True,
   '0105261001M2U002': True,
   '0105261001M1S001': True,
   '0105261001M2S002': True,
   '0502671101M1S001': True,
   '0502671101M2S002': True,
   '0204310101M1S001': True,
   '0204310101M2S002': True,
   '0761910201M1S001': True,
   '0761910201M2S002': True},
  'espfilt': {'0105261001M1S001': True,
   '0105261001M2S002': True,
   '0105261001M1U002': True,
   '0105261001M2U002': True,
   '0204310101M2S002': True,
   '0204310101M1S001': True,
   '0502671101M1S001': True,
   '0761910201M1S001': True,
   '0502671101M2S002': T

### Process Names

This property (`process_names`) stores a dictionary, with mission names as keys, listing the processes that have been run on each mission:

In [17]:
prev_arch.process_names

{'xmm_pointed': ['cif_build',
  'odf_ingest',
  'epchain',
  'emchain',
  'espfilt',
  'cleaned_evt_lists',
  'merge_subexposures'],
 'chandra': [],
 'nustar_pointed': [],
 'rosat_all_sky': []}
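One simple use of this structure is to spot the missions that have not yet had any processing applied. The sketch below works on a copy of the output shown above rather than on a live archive:

```python
# Copied from the `process_names` output shown above
process_names = {
    "xmm_pointed": ["cif_build", "odf_ingest", "epchain", "emchain", "espfilt",
                    "cleaned_evt_lists", "merge_subexposures"],
    "chandra": [],
    "nustar_pointed": [],
    "rosat_all_sky": [],
}

# An empty list means that no processing step has been recorded for that mission
unprocessed = [mission for mission, procs in process_names.items() if not procs]
print(unprocessed)
```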

### Process Logs (stdout)

All of the logs for all processes run by DAXA are stored, and can be accessed through the `process_logs` property - this is structured in the exact same way as `process_success`, as a nested dictionary. The only difference here is that the final values are strings rather than booleans.

We show the log for a single process applied to a single piece of data, otherwise this tutorial document would be very long indeed - this particular process worked perfectly:

In [18]:
print(prev_arch.process_logs['xmm_pointed']['espfilt']['0204310101M1S001'])

espfilt:- Executing (routine): espfilt eventfile=/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0204310101/P0204310101M1S001MIEVLI0000.FIT withoot=no ootfile=dataset method=histogram withsmoothing=yes smooth=51 withbinning=yes binsize=60 ratio=1.2 withlongnames=yes elow=2500 ehigh=8500 rangescale=6 allowsigma=3 limits='0.1 6.5' keepinterfiles=no  -w 1 -V 4
espfilt:- espfilt (espfilt-4.3)  [xmmsas_20211130_0941-20.0.0] started:  2024-04-11T16:29:59.000
espfilt:-  ESPFILT: Processing eventlist: /Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0204310101/P0204310101M1S001MIEVLI0000.FIT
espfilt:-  *FOV IMAGE* = mos1S001-fovim-2500-8500.fits
evselect:- Executing (routine): evselect table=/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0204310101/P0204310101M1S001MIEV

### Process Raw Errors (stderr)

Related to the previous section is the `raw_process_errors` property, which stores the stderr output of any process that generated one - note that if no stderr was produced, then we do not create an entry. DAXA makes a distinction between 'raw process errors' and the 'process errors' you'll see in the next section - this is because we attempt to parse any stderr and extract particular errors, versus this property which is just the raw text:

In [19]:
prev_arch.raw_process_errors

{'xmm_pointed': {'cif_build': {},
  'odf_ingest': {},
  'epchain': {},
  'emchain': {},
  'cleaned_evt_lists': {},
  'merge_subexposures': {}},
 'chandra': {},
 'nustar_pointed': {},
 'rosat_all_sky': {}}

### Process Parsed Errors (parsed from stderr)

We attempt to extract the pertinent information from the raw error outputs, and this is what gets stored in `process_errors` - again with the same storage structure. Here we just show a single entry, for one of the pieces of data that we noted had failed a processing step in the `process_success` section of this tutorial:

In [20]:
prev_arch.process_errors['xmm_pointed']['espfilt']['0502671101PNS003']

['noCounts raised by espfilt - All histo counts are zero! Check your FOV Lightcurve!']

### Process Parsed Warnings (parsed from stderr)

We make a distinction between errors and warnings, as do many pieces of backend software. The `process_warnings` property acts exactly as the `process_errors` property, but it is warnings that have been extracted rather than errors:

In [21]:
prev_arch.process_warnings['xmm_pointed']['epchain']["0502671101PNS003"][109]

{'originator': 'epframes',
 'name': 'notHKconstant',
 'message': 'HK parameter F1725 is not constant, range = [           0 ,         256 ]. Use default:          256'}
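Because a single piece of data can generate many warnings, it can be helpful to tally them rather than read each one. The following sketch assumes that the warnings for one identifier are held as a list of dictionaries with the keys shown above; the first entry is taken from that output, while the other two entries (including the `hypothetical_task` originator) are invented purely for illustration:

```python
from collections import Counter

# Stand-in for the parsed warnings of one piece of data; only the first
#  entry is real, the remaining two are invented for this example
warnings_for_id = [
    {"originator": "epframes", "name": "notHKconstant",
     "message": "HK parameter F1725 is not constant, range = [           0 ,"
                "         256 ]. Use default:          256"},
    {"originator": "epframes", "name": "notHKconstant",
     "message": "(invented example message)"},
    {"originator": "hypothetical_task", "name": "exampleWarning",
     "message": "(invented example message)"},
]

# Counting how many times each (originator, name) pair occurs
tally = Counter((w["originator"], w["name"]) for w in warnings_for_id)
for (originator, name), count in tally.most_common():
    print(originator, name, count)
```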

### Process Extra Information

The `process_extra_info` property is something of a catch-all for any information passed into, or produced by, a processing step that the archive might need access to later on. This can include things like the paths to the key output files of a process. This isn't really meant to be useful to the user, but can still be accessed.

We only show the contents for one processing step (`cleaned_evt_lists`) so as not to make the tutorial too long:

In [22]:
prev_arch.process_extra_info['xmm_pointed']['cleaned_evt_lists']

{'0105261001M1S001': {'evt_clean_path': '/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0105261001/M1S001_clean.fits',
  'en_key': ''},
 '0105261001M2S002': {'evt_clean_path': '/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0105261001/M2S002_clean.fits',
  'en_key': ''},
 '0105261001M2U002': {'evt_clean_path': '/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0105261001/M2U002_clean.fits',
  'en_key': ''},
 '0105261001M1U002': {'evt_clean_path': '/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0105261001/M1U002_clean.fits',
  'en_key': ''},
 '0761910201M1S001': {'evt_clean_path': '/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_

### Final process success

This property (`final_process_success`) is the final arbiter of whether a particular ObsID of a mission can be used by the end user - this is only decided when whatever DAXA defines as the 'final processing step' for a particular mission is run (for XMM this is the assembly of cleaned event lists). 

If it is False, then none of the data for that ObsID reached the final processing step - note here that 0102041001 is False, this is because all the instruments were in calibration mode with the filter closed. As such, this ObsID is moved from the 'processed_data' directory in the archive storage structure, to the 'failed_data' directory.

In [23]:
prev_arch.final_process_success

{'xmm_pointed': {'0102041001': False,
  '0105261001': True,
  '0204310101': True,
  '0502671101': True,
  '0761910201': True},
 'chandra': {},
 'nustar_pointed': {},
 'rosat_all_sky': {}}
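If you want a list of the ObsIDs that are usable, the dictionary can be filtered on its boolean flags. This sketch works on a copy of the output shown above, and the `usable` name is our own choice rather than a DAXA property:

```python
# Copied from the `final_process_success` output shown above
final_process_success = {
    "xmm_pointed": {"0102041001": False, "0105261001": True, "0204310101": True,
                    "0502671101": True, "0761910201": True},
    "chandra": {},
    "nustar_pointed": {},
    "rosat_all_sky": {},
}

# Keeping only the ObsIDs with a True flag, for every mission
usable = {mission: [oi for oi, ok in flags.items() if ok]
          for mission, flags in final_process_success.items()}
print(usable["xmm_pointed"])
```

Missions that have not reached their final processing step have an empty dictionary, and so produce an empty list here.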

### Observation Summaries

This property (`observation_summaries`) contains a summary of the exact data available for particular observations of a mission, and is meant to allow DAXA processes to assess whether they can run on a particular ObsID (not really relevant to end users). The property is populated in different ways for each mission - the XMMPointed mission, for instance, parses the output summary file from `odf_ingest` and turns it into an extremely detailed summary of the state of each instrument on XMM for an observation.

For clarity we only show the output for one ObsID:

In [24]:
prev_arch.observation_summaries['xmm_pointed']['0204310101']

{'M1': {'active': True,
  'num_exp': 1,
  'exposures': {'S001': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'Thin1',
    'ccd_modes': {'1': 'Imaging',
     '2': 'Imaging',
     '3': 'Imaging',
     '4': 'Imaging',
     '5': 'Imaging',
     '6': 'Imaging',
     '7': 'Imaging'}}}},
 'M2': {'active': True,
  'num_exp': 1,
  'exposures': {'S002': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'Thin1',
    'ccd_modes': {'1': 'Imaging',
     '2': 'Imaging',
     '3': 'Imaging',
     '4': 'Imaging',
     '5': 'Imaging',
     '6': 'Imaging',
     '7': 'Imaging'}}}},
 'PN': {'active': True,
  'num_exp': 14,
  'exposures': {'S003': {'scheduled': True,
    'type': 'SCIENCE',
    'mode': 'PrimeFullWindow',
    'filter': 'Thin1',
    'ccd_modes': {'1': 'Imaging',
     '2': 'Imaging',
     '3': 'Imaging',
     '4': 'Imaging',
     '5': 'Imaging',
     '6': 'Imaging',
     '7': 'Imaging',
     '8': 'Imaging',
     '9

## Data management Archive functions

This section introduces the built-in methods that allow you to manage the archive and its data:

### Get path to data

One of the most useful methods of the archive class is `get_current_data_path()`, which allows you to programmatically retrieve the current path to a particular ObsID's data in the archive storage structure (this takes into account the `final_process_success` flag - remember that entirely failed data are moved to a 'failed_data' directory).

All we need to do is pass the mission name and the ObsID, and the top-level data path will be returned. Here we show two examples, one for an ObsID with a True final success flag, and one with a False final success flag:

In [25]:
prev_arch.get_current_data_path('xmm_pointed', '0204310101')

'/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0204310101/'

In [26]:
prev_arch.get_current_data_path('xmm_pointed', '0102041001')

'/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/failed_data/xmm_pointed/0102041001/'
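The two outputs above differ only in the 'processed_data' or 'failed_data' component of the path. The sketch below reproduces that choice with a hypothetical `sketch_data_path` function; it is not DAXA's implementation, it only mirrors the layout visible in these examples, and it does not cover data that have not yet been through a final processing step:

```python
import os


def sketch_data_path(archive_path, mission_name, obs_id, final_success):
    """
    Hypothetical re-creation of the directory choice described above; this is
    not DAXA's implementation, only the layout visible in the example outputs.
    """
    top_dir = "processed_data" if final_success else "failed_data"
    # The trailing empty string makes the result end with a path separator
    return os.path.join(archive_path, top_dir, mission_name, obs_id, "")


arch_pth = "daxa_output/archives/PHL1811_made_earlier"
print(sketch_data_path(arch_pth, "xmm_pointed", "0204310101", True))
print(sketch_data_path(arch_pth, "xmm_pointed", "0102041001", False))
```

In practice you should always call `get_current_data_path()` rather than building paths by hand, as it reflects the current state of the archive.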

### Get failed processes

Often we will wish to know exactly which data failed a particular processing step - this could be inferred from the `process_success` property, but we also provide a convenient get method (`get_failed_processes()`) that will retrieve the unique identifier of each piece of data that failed a specified processing step:

In [27]:
prev_arch.get_failed_processes('espfilt')

{'xmm_pointed': ['0502671101PNS003']}

### Get logs

The `get_process_logs()` method provides a more convenient way of accessing specific logs stored in the `process_logs` property. It allows for the retrieval of logs based on several criteria, with the only requirement being the passing of a process name. 

Beyond that you can specify the mission name, ObsID, and instrument to retrieve logs for - as for some processes there are sub-exposures of a given instrument, giving this information can still result in multiple logs being returned. You can also pass __lists__ of mission name, ObsID, and instrument and retrieve sets of logs that way.

It is also possible to specify an exact unique identifier (0502671101M2S002 for instance).
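To illustrate why an ObsID and instrument can still match several logs, here is a sketch of this kind of selection applied to a list of unique identifiers. The `select_ids` function is hypothetical (it is not DAXA's implementation), and it assumes the XMM-style identifier layout seen in this tutorial: a 10-character ObsID, then a 2-character instrument name, then a sub-exposure ID:

```python
def select_ids(unique_ids, obs_id=None, inst=None):
    """
    Hypothetical filter (not DAXA's implementation) for XMM-style identifiers,
    assumed to be a 10-character ObsID, a 2-character instrument name, and
    then an optional sub-exposure ID.
    """
    # Accept either a single value or a list for each criterion
    obs_ids = [obs_id] if isinstance(obs_id, str) else obs_id
    insts = [inst] if isinstance(inst, str) else inst

    selected = []
    for uid in unique_ids:
        if obs_ids is not None and uid[:10] not in obs_ids:
            continue
        if insts is not None and uid[10:12] not in insts:
            continue
        selected.append(uid)
    return selected


ids = ["0502671101M1S001", "0502671101M2S002", "0502671101PNS003", "0204310101M1S001"]
print(select_ids(ids, obs_id="0502671101", inst="PN"))
print(select_ids(ids, inst=["M1", "M2"]))
```

Leaving a criterion as None places no restriction on it, so passing only a list of instruments selects matching identifiers from every ObsID.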

We demonstrate fetching the `espfilt` logs for the PN instrument of 0502671101:

In [32]:
prev_arch.get_process_logs('espfilt', obs_id='0502671101', inst='PN')

{'xmm_pointed': {'0502671101PNS003': "espfilt:- Executing (routine): espfilt eventfile=/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0502671101/P0502671101PNS003PIEVLI0000.FIT withoot=yes ootfile=/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0502671101/P0502671101PNS003OOEVLI0000.FIT method=histogram withsmoothing=yes smooth=51 withbinning=yes binsize=60 ratio=1.2 withlongnames=yes elow=2500 ehigh=8500 rangescale=15 allowsigma=3 limits='0.1 6.5' keepinterfiles=no  -w 1 -V 4\nespfilt:- espfilt (espfilt-4.3)  [xmmsas_20211130_0941-20.0.0] started:  2024-04-11T16:30:06.000\nespfilt:-  ESPFILT: Processing eventlist: /Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0502671101/P0502671101PNS003PIEVLI0000.FIT\nespfilt:-  *FOV IMAGE* = pnS003-fovim-2500-8500.fits\nev

### Get raw errors

There is also an equivalent method, `get_process_raw_error_logs()`, that fetches raw error logs. It behaves in exactly the same way as `get_process_logs()`; see the previous section for information on the possible arguments:

In [33]:
prev_arch.get_process_raw_error_logs('espfilt', obs_id='0502671101', inst='PN')



### Get failed logs

There is one final method that allows for log retrieval - `get_failed_logs()`. This only takes a process name as an argument, and will retrieve the logs and raw error logs for every piece of data that failed the specified processing step. It returns them as a tuple, with the first entry being the dictionary of logs and the second being the dictionary of raw errors:

In [30]:
logs, errors = prev_arch.get_failed_logs('espfilt')

We only show 0502671101PNS003:

In [34]:
logs['xmm_pointed']['0502671101PNS003']

"espfilt:- Executing (routine): espfilt eventfile=/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0502671101/P0502671101PNS003PIEVLI0000.FIT withoot=yes ootfile=/Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0502671101/P0502671101PNS003OOEVLI0000.FIT method=histogram withsmoothing=yes smooth=51 withbinning=yes binsize=60 ratio=1.2 withlongnames=yes elow=2500 ehigh=8500 rangescale=15 allowsigma=3 limits='0.1 6.5' keepinterfiles=no  -w 1 -V 4\nespfilt:- espfilt (espfilt-4.3)  [xmmsas_20211130_0941-20.0.0] started:  2024-04-11T16:30:06.000\nespfilt:-  ESPFILT: Processing eventlist: /Users/dt237/code/DAXA/docs/source/notebooks/tutorials/daxa_output/archives/PHL1811_made_earlier/processed_data/xmm_pointed/0502671101/P0502671101PNS003PIEVLI0000.FIT\nespfilt:-  *FOV IMAGE* = pnS003-fovim-2500-8500.fits\nevselect:- Executing (routine): evselec

In [35]:
errors



## Adding region files to the Archive (optional)

Region files are created by running source detection algorithms on images generated from X-ray observations, and DAXA **does not yet have the ability to generate them itself**. However, they are a crucial part of many analyses, and as such a crucial part of data archives.

If you have created region files (**in the DS9 format**), you can add them to the archive so that they are included in the storage structure. Regions must be in RA-Dec coordinates; if they are instead defined in the pixel coordinates of the image they were detected in, then an image or WCS information must also be passed so that they can be converted. The regions are passed as a dictionary with one of the following structures:

* {'mission_name': {'ObsID': 'path to regions'}}
* {'mission_name': {'ObsID': [list of region objects]}}
* {'mission_name': {'ObsID': {'region': 'path to regions'}}}
* {'mission_name': {'ObsID': {'region': [list of region objects]}}}
* {'mission_name': {'ObsID': {'region': ..., 'wcs_src': 'path to image'}}}
* {'mission_name': {'ObsID': {'region': ..., 'wcs_src': XGA Image}}}
* {'mission_name': {'ObsID': {'region': ..., 'wcs_src': Astropy WCS object}}}

For instance, we have used a tool to generate source regions for the observations we have processed, and want to add those regions to the archive. As the region files are in pixel coordinates we pass the paths to the images we used (unnecessary if regions are already in RA-Dec coordinates):

In [36]:
# Listing the region files in the test directory
reg_files = os.listdir('region_files')

# Setting up the structure of the dictionary we will pass to the archive at the end of this
reg_paths = {'xmm_pointed': {}}
# Iterating through the ObsIDs in the XMMPointed mission
for oi in prev_arch['xmm_pointed'].filtered_obs_ids:
    # Checking to see which have a corresponding region file
    if any(oi in rf for rf in reg_files):
        # Generating the path to the image we need for pixel to RA-Dec conversion
        im_pth = os.path.join(prev_arch.get_current_data_path('xmm_pointed', oi),
                              'images', '{}_mos1_0.5-2.0keVimg.fits'.format(oi))
        # Setting up the entry in the final dictionary, with the path to the regions and the image
        reg_paths['xmm_pointed'][oi] = {'regions': 'region_files/{}.reg'.format(oi), 'wcs_src': im_pth}

# Adding the regions to the archive
prev_arch.source_regions = reg_paths

Once added to the archive, you can also retrieve the regions through a property (we did not discuss it earlier in this tutorial) - `source_regions`. It returns them as Python 'regions' module objects:

In [37]:
prev_arch.source_regions

{'xmm_pointed': {'0105261001': [<EllipseSkyRegion(<SkyCoord (ICRS): (ra, dec) in deg
       (67.59133516, -61.53306334)>, width=0.007545166292795352 deg, height=0.007545166292795352 deg, angle=186.39335083127384 deg)>,
   <EllipseSkyRegion(<SkyCoord (ICRS): (ra, dec) in deg
       (67.178534, -61.37451667)>, width=0.008274114359963867 deg, height=0.008274114359963867 deg, angle=466.9975179351541 deg)>,
   <EllipseSkyRegion(<SkyCoord (ICRS): (ra, dec) in deg
       (67.46879016, -61.36577221)>, width=0.00850673285572214 deg, height=0.00850673285572214 deg, angle=140.94986422162475 deg)>,
   <EllipseSkyRegion(<SkyCoord (ICRS): (ra, dec) in deg
       (68.01317832, -61.35516589)>, width=0.007519977428013009 deg, height=0.007519977428013009 deg, angle=261.2582496754984 deg)>,
   <EllipseSkyRegion(<SkyCoord (ICRS): (ra, dec) in deg
       (68.00486199, -61.33624502)>, width=0.013147992908181612 deg, height=0.013147992908181612 deg, angle=265.95154602021546 deg)>,
   <EllipseSkyRegion(<SkyCo