# Creating and interacting with a DAXA Archive

This tutorial will explain the basic concepts behind the second type of important class in DAXA, the Archive class (with mission classes being the first type, see [the missions tutorial](missions.html)). DAXA Archives are what manage the datasets that we download from various missions, enabling easy access and greatly simplifying processing/reduction - they allow you to stop thinking about all the files and settings that any large dataset entails.

We will cover the following:

* Setting up an Archive from scratch, using filtered DAXA missions.
* Loading an existing Archive from disk.
* The properties of an Archive.
* Accessing processing logs and success information (though we do not cover processing in this part of the documentation).

## Import Statements

In [1]:
from daxa.mission import XMMPointed, Chandra, eRASS1DE, ROSATPointed
from daxa.archive import Archive

## What is a DAXA archive?

DAXA Archives take a set of filtered missions, make sure that their data are downloaded, and enable easy access and organisation of all data files and processing functions. Key functionality includes:

* Storing the logs and errors of all processing steps (if run).
* Allowing for their easy retrieval. 
* Managing the myriad files produced during the processing.
* Keeping track of which processes failed for which data, ensuring that any further processing only runs on data that have successfully passed through the earlier processes.
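
As an illustration of that last point, the sketch below shows the general idea of success tracking in plain Python. It is not DAXA's internal implementation: the `process_success` dictionary, the step names, and the ObsID strings are all hypothetical, chosen only to show how a later step can be restricted to data that passed every earlier step.

```python
# Hypothetical record of which processing steps succeeded for which
# observation; the step names and ObsIDs are invented for illustration.
process_success = {
    "step_one": {"obs_a": True, "obs_b": True, "obs_c": False},
    "step_two": {"obs_a": True, "obs_b": False},
}


def passed_all(success: dict, required_steps: list) -> list:
    """Return the ObsIDs that succeeded in every one of the required steps."""
    # Start from the ObsIDs seen by the first required step
    candidates = list(success[required_steps[0]])
    return [
        obs_id
        for obs_id in candidates
        # An ObsID missing from a step's record is treated as not having passed
        if all(success[step].get(obs_id, False) for step in required_steps)
    ]


# Only observations that passed both earlier steps go forward
ready_for_step_three = passed_all(process_success, ["step_one", "step_two"])
print(ready_for_step_three)
```

In this toy record only `obs_a` passed both steps, so it is the only observation a third step would act on.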

Archives can also be loaded back into DAXA at a later date, so that the processing logs of data that have since been found to be problematic can be easily inspected, or so that processing steps can be re-run with different settings. This also allows archives to be updated if more data become available.

## Creating a new archive

Here we will demonstrate how to set up a new DAXA Archive from scratch - this information can be combined with [the missions tutorial](missions.html) and the <font color='red'>case studies</font> to create an archive from any dataset you might be using.

### Step 1 - Set up and filter missions 

The first thing we have to do is to select the observations that we wish to include in the archive (and indeed the missions that we wish to include). The missions all have different characteristics, so your choice of which to include will be heavily dependent on your science case.

Here we will create an archive of XMM, Chandra, eROSITA All-Sky DR1, and ROSAT pointed observations of a famous galaxy cluster (though the archive would behave the same if it held data for a large sample of objects).

First of all, we define instances of the mission classes that we wish to include:

In [2]:
xm = XMMPointed()
ch = Chandra()
er = eRASS1DE()
rp = ROSATPointed()



Then we filter them to only include observations of our cluster:

In [3]:
xm.filter_on_name("A3667")
ch.filter_on_name("A3667")
er.filter_on_name("A3667")
rp.filter_on_name("A3667")



We then download the available data. Declaring an Archive would also trigger the download, but we do it explicitly here because we wish to download pre-generated products for the Chandra and ROSAT pointed observations:

In [4]:
xm.download()
ch.download(download_products=True)
er.download()
rp.download(download_products=True)

Downloading Chandra data: 100%|███████████████████████████████████████████████| 12/12 [03:28<00:00, 17.38s/it]
Downloading eRASS DE:1 data: 100%|██████████████████████████████████████████████| 1/1 [00:05<00:00,  5.54s/it]
Downloading ROSAT Pointed data: 100%|███████████████████████████████████████████| 3/3 [00:19<00:00,  6.36s/it]


### Step 2 - Setting up an Archive object

Now we create the actual DAXA Archive instance - all this requires is for us to choose an archive name (which is what will be used to load it back in at a later date, if necessary) and to pass in the filtered missions that we have already created:

In [5]:
arch = Archive("A3667", [xm, ch, er, rp])

Now we've declared it, we can use the `info()` method to get a summary of its current status, including the amount of data available:

In [7]:
arch.info()


-----------------------------------------------------
Number of missions - 4
Total number of observations - 24
Beginning of earliest observation - 1992-04-14 18:55:38.000003
End of latest observation - 2020-04-20 12:23:50

-- XMM-Newton Pointed --
   Internal DAXA name - xmm_pointed
   Chosen instruments - M1, M2, PN
   Number of observations - 8
   Fully Processed - False

-- Chandra --
   Internal DAXA name - chandra
   Chosen instruments - ACIS-I, ACIS-S, HRC-I, HRC-S
   Number of observations - 12
   Fully Processed - False

-- eRASS DE:1 --
   Internal DAXA name - erosita_all_sky_de_dr1
   Chosen instruments - TM1, TM2, TM3, TM4, TM5, TM6, TM7
   Number of observations - 1
   Fully Processed - False

-- ROSAT Pointed --
   Internal DAXA name - rosat_pointed
   Chosen instruments - PSPCB, PSPCC, HRI
   Number of observations - 3
   Fully Processed - False
-----------------------------------------------------
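
As a quick sanity check on the summary above, the per-mission observation counts should add up to the reported total. The dictionary below simply copies the numbers from the `info()` output by hand; it is not something returned by DAXA.

```python
# Observation counts copied by hand from the info() output above,
# keyed on the internal DAXA mission names
obs_per_mission = {
    "xmm_pointed": 8,
    "chandra": 12,
    "erosita_all_sky_de_dr1": 1,
    "rosat_pointed": 3,
}

total_obs = sum(obs_per_mission.values())
print(total_obs)  # 24, matching 'Total number of observations'
```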



### Step 3 - Processing the Archive

We're not actually going to cover _how_ to process things here, as each telescope tends to have its own backend software with a unique way of doing things; they each have their own processing tutorials, which will demonstrate both a one-line processing method, and how to control the reduction in more detail. Any processing method will take the archive object as an argument, and act on the data stored within it.

So instead we include this step here to highlight that the next logical step after the creation of a new archive is to run processing and reduction routines, if raw data have been downloaded. The successful completion of this step will leave you with an archive of data that you can easily manage, access, and use for your scientific analyses.

If you elected to download existing products (most missions support this), then only one processing step is necessary - this reorganises the downloaded data so that they are compatible with DAXA storage and file naming conventions. **It will have run automatically on declaration.**

## Loading an existing archive

As we have intimated, previously created archives can be loaded back into memory in exactly the same state as when they were saved. We will demonstrate this here with an archive we prepared earlier; it has had XMM processing applied, which will allow us to demonstrate the logging and management functionality.

Reloading an archive has a number of possible applications:

* Access to archive data management functions - e.g. locating specific data files, identifying what observations are available.
* Checking processing logs - e.g. finding errors or warnings in the processing of data that have since been identified as problematic.
* Updating the archive - either adding another mission, or using the archive to check for new data matching your original mission filtering operations (these are stored in the mission saves, so can be re-run automatically).

All you need to do is set up an Archive instance and pass the name of an existing archive. This assumes your code is running in the same directory as it was originally, as Archives are stored in 'daxa_output' (if the DAXA configuration file hasn't been altered). The configuration can also be altered so that all DAXA outputs are stored in an absolute path, in which case defining an Archive object with the name of an existing archive would work from any directory.
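
A minimal sketch of that reloading pattern follows, assuming the default configuration (outputs under `daxa_output` in the working directory). The helper names `default_output_dir_present` and `reload_archive` are hypothetical and not part of DAXA; the only DAXA call used is `Archive(archive_name)` with no missions passed. The import sits inside the function so that the directory check can be used on its own.

```python
from pathlib import Path
from typing import Optional


def default_output_dir_present(start: Optional[Path] = None) -> bool:
    """Check whether a 'daxa_output' directory exists in the given directory
    (the current working directory by default)."""
    base = Path.cwd() if start is None else Path(start)
    return (base / "daxa_output").is_dir()


def reload_archive(archive_name: str):
    """Reload a previously created archive by name (hypothetical helper)."""
    if not default_output_dir_present():
        # Assumes the default, relative, output location has not been changed
        raise FileNotFoundError(
            "No 'daxa_output' directory here; run from the original directory."
        )
    # Imported here so the check above works even without DAXA installed
    from daxa.archive import Archive

    # Passing only the name of an existing archive reads it back in
    return Archive(archive_name)
```

For the archive created earlier in this tutorial, the call would be `reload_archive("A3667")`. If the configuration has been changed to use an absolute output path, the directory check is unnecessary and `Archive("A3667")` can be called directly.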

Loading in an archive:

We note

In [8]:
help(Archive)

Help on class Archive in module daxa.archive.base:

class Archive(builtins.object)
 |  Archive(archive_name: str, missions: Union[List[daxa.mission.base.BaseMission], daxa.mission.base.BaseMission] = None, clobber: bool = False)
 |  
 |  The Archive class, which is to be used to consolidate and provide some interface with a set
 |  of mission's data. Archives can be passed to processing and cleaning functions in DAXA, and also
 |  contain convenience functions for accessing summaries of the available data.
 |  
 |  :param str archive_name: The name to be given to this archive - it will be used for storage
 |      and identification. If an existing archive with this name exists it will be read in, unless clobber=True.
 |  :param List[BaseMission]/BaseMission missions: The mission, or missions, which are to be included
 |      in this archive - any setup processes (i.e. the filtering of data to be acquired) should be
 |      performed prior to creating an archive. The default value is No

## Archive properties

### ...

## 