# Overview of the data flow in ASTRA

**Goals of this notebook:**

- Very brief explanation/overview of the basic user interface to load data from disk:
    - Load data from .fits files
    - Configure the "instrument"
    - Reject observations based on different conditions (e.g. HEADER values)
    - Reject wavelength regions

## Loading data from disk

In this section we look at how to load spectral data from disk. This is done in a general way through the `DataClass` object:

```py 
from ASTRA.data_objects import DataClass
```

This object ingests a list of observations, assigns each one an ID (based on the hash of the filename), and divides them into different sub-Instruments. Furthermore, it only opens a spectrum in memory when it is needed.
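
To make the two ideas above concrete (an ID derived from the filename, and loading only on demand), here is a minimal sketch in plain Python. It is not ASTRA's implementation: the `LazyObservation` class, the choice of SHA-1, and the `fake_loader` function are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


class LazyObservation:
    """Hypothetical stand-in for a single observation; not part of ASTRA."""

    def __init__(self, path: Path, loader: Callable[[Path], list]):
        self.path = Path(path)
        # Assumption: the ID is a hash of the file name (ASTRA's scheme may differ).
        self.obs_id = hashlib.sha1(self.path.name.encode("utf-8")).hexdigest()[:12]
        self._loader = loader
        self._data = None
        self._loaded = False
        self.load_count = 0

    @property
    def data(self) -> list:
        # The file is only read the first time the data is requested.
        if not self._loaded:
            self._data = self._loader(self.path)
            self._loaded = True
            self.load_count += 1
        return self._data


def fake_loader(path: Path) -> list:
    # Placeholder for reading a .fits file; returns dummy flux values.
    return [1.0, 2.0, 3.0]


obs = LazyObservation(Path("night1/spectrum_A.fits"), fake_loader)
```

Creating `obs` does not call the loader; only the first access to `obs.data` does, and later accesses reuse the cached result.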

### How to set up our instrument

We can configure ASTRA to load files in two different ways:

1) Through a path to a file that contains (in each line) the full path to the desired fits file
2) An iterable python object (e.g., a list, tuple) where each entry is the path to a fits file

```py
from pathlib import Path

data_in_path = list(Path("/home/amiguel/spectra_collection/ESPRESSO/proxima").glob("*.fits"))
```
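
Option 1) can be prepared with the standard library alone. The sketch below writes one full path per line into a text file. The helper `write_path_list`, the file name `observations.txt`, and the temporary directory are placeholders, and whether `DataClass` expects the location of that file as a `str` or a `Path` is an assumption to check against the ASTRA documentation.

```python
import tempfile
from pathlib import Path


def write_path_list(fits_dir: Path, list_file: Path) -> int:
    """Write the absolute path of every .fits file in fits_dir, one per line."""
    paths = sorted(p.resolve() for p in fits_dir.glob("*.fits"))
    list_file.write_text("".join(f"{p}\n" for p in paths))
    return len(paths)


# Demonstration on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    fits_dir = Path(tmp)
    for name in ("obs_1.fits", "obs_2.fits", "notes.txt"):
        (fits_dir / name).touch()
    list_file = fits_dir / "observations.txt"
    n_written = write_path_list(fits_dir, list_file)
    lines = list_file.read_text().splitlines()
```

The path to the resulting text file would then be used in place of the list stored in `data_in_path`.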

### Selection and configuration of the Instrument

After generating the paths of the observations, the next step is to configure the instrument that we are using. The current version of ASTRA has two limitations:

1) We can't mix data from multiple instruments in the same *DataClass* object
2) It is not able to automatically determine the instrument associated with a given file.

This means that the user must manually define the instrument that is in use. Then, similarly to all other ASTRA objects, we can
[configure](../../user_guide/configuration) multiple parameters to fine-tune the data pre-processing.



```py
from ASTRA.Instruments import ESPRESSO

instrument = ESPRESSO

inst_options = {
    "minimum_order_SNR": 10,
}
```

### Loading the data from disk

There are two ways of loading the data from disk (both work in the same fashion):
-  A) Load the data in an independent process (through *DataClassManager*)
-  B) Load the data in the main Python process (through *DataClass*)

**Note:** Option A) makes use of Python's proxy objects, which serialize all communication. This means that we can use option A) to open all observations in a single Python process and share that data with multiple processes without re-opening it.
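
To illustrate what a proxy object is, the sketch below uses only the standard library's `multiprocessing.managers`. It is not ASTRA's `DataClassManager`: the `SpectraStore` and `StoreManager` classes are hypothetical, and the `fork` start method chosen here is only available on Unix-like systems.

```python
import multiprocessing as mp
from multiprocessing.managers import BaseManager


class SpectraStore:
    """Hypothetical object that lives inside the manager process."""

    def __init__(self, paths):
        self._paths = list(paths)

    def add(self, path):
        self._paths.append(path)

    def n_files(self):
        return len(self._paths)


class StoreManager(BaseManager):
    """Manager that can create SpectraStore instances in its own process."""


# Registering the class makes `manager.SpectraStore(...)` return a proxy.
StoreManager.register("SpectraStore", SpectraStore)


def demo() -> int:
    # Assumption: a Unix-like system where the "fork" start method exists.
    manager = StoreManager(ctx=mp.get_context("fork"))
    manager.start()
    try:
        # `store` is a proxy: each method call is serialized and sent to
        # the manager process, where the real object is kept.
        store = manager.SpectraStore(["a.fits", "b.fits"])
        store.add("c.fits")
        return store.n_files()
    finally:
        manager.shutdown()
```

Because the real object stays in the manager process, a proxy can be handed to several worker processes without duplicating the underlying data.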

```py
from ASTRA.data_objects import DataClassManager
from ASTRA.data_objects.DataClass import DataClass

load_independent_process = False

if load_independent_process:  # Option A)
    manager = DataClassManager()
    manager.start()

    # This makes available the same functions as the usual DataClass object
    data: DataClass = manager.DataClass(data_in_path, instrument=instrument, instrument_options=inst_options, storage_path="")
else:  # Option B)
    data = DataClass(data_in_path, instrument=instrument, instrument_options=inst_options, storage_path="")
```

```
2025-04-14 21:09:31.120 | DEBUG    | ASTRA.utils.UserConfigs:receive_user_inputs:216 - Generating internal configs of  - 
2025-04-14 21:09:31.123 | INFO     | ASTRA.utils.UserConfigs:receive_user_inputs:221 - Checking for any parameter that will take default value
2025-04-14 21:09:31.124 | DEBUG    | ASTRA.utils.UserConfigs:receive_user_inputs:228 - Configuration <SAVE_DISK_SPACE> using the default value: DISK_SAVE_MODE.DISABLED
2025-04-14 21:09:31.126 | DEBUG    | ASTRA.utils.UserConfigs:receive_user_inputs:228 - Configuration <WORKING_MODE> using the default value: WORKING_MODE.ONE_SHOT
2025-04-14 21:09:31.126 | INFO     | ASTRA.data_objects.DataClass:__init__:126 - DataClass opening 3 files from a list
```

## Removing activity indicators (Optional)

- ASTRA allows the rejection of specific wavelength intervals that are known to be more sensitive to activity.
- By default, we remove lines that are typically used as activity indicators (in the optical domain; the NIR is not yet included).
- This interface can also be used to manually remove other wavelength regions, as long as it is configured to do so.



```py
from ASTRA.Quality_Control.activity_indicators import Indicators

inds = Indicators()
```

### Removing extra regions

- We must define a unique name (i.e. no repetitions, even among the default "features")
- We must define a region that will be removed from **all** observations that have been loaded from disk
- By default we assume that the region is defined in air. Change to vacuum by passing `vacuum_wavelengths=True`

```py
inds.add_feature(name="feature_1", region=[5000, 5500], vacuum_wavelengths=True)
```
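
Since the region limits can be given either in air or in vacuum, it helps to see how far apart the two scales are. The sketch below applies the vacuum-to-air conversion of Morton (2000), which is widely used in the optical. It is an illustration only: the function `vacuum_to_air` is not part of ASTRA, and the conversion ASTRA applies internally may use a different formula.

```python
def vacuum_to_air(wavelength_vac: float) -> float:
    """Convert a vacuum wavelength in Angstrom to air (Morton 2000)."""
    s2 = (1.0e4 / wavelength_vac) ** 2
    # Refractive index of standard air as a function of vacuum wavelength.
    n = 1.0 + 0.0000834254 + 0.02406147 / (130.0 - s2) + 0.00015998 / (38.9 - s2)
    return wavelength_vac / n


region_vac = [5000.0, 5500.0]
region_air = [vacuum_to_air(w) for w in region_vac]

# Difference between the two scales at each edge of the region, in Angstrom.
shifts = [vac - air for vac, air in zip(region_vac, region_air)]
```

For this region the two scales differ by between 1 and 2 Å, so the choice of air or vacuum matters when the rejected feature is narrow.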

### Applying the selected region

Lastly, we have to ingest this object in our *DataClass* object, so that the rejected wavelengths are included in the spectral mask.
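
Conceptually, a set of uniquely named regions and the spectral mask built from it can be sketched as follows. This is a plain-Python illustration, not ASTRA's code: `RegionSet`, `add`, and `build_mask` are hypothetical names, and treating the region limits as inclusive is an assumption.

```python
from typing import Dict, List, Tuple


class RegionSet:
    """Hypothetical container of named wavelength regions to reject."""

    def __init__(self) -> None:
        self._regions: Dict[str, Tuple[float, float]] = {}

    def add(self, name: str, region: Tuple[float, float]) -> None:
        # Names must be unique, mirroring the requirement described above.
        if name in self._regions:
            raise ValueError(f"Region name already in use: {name!r}")
        low, high = sorted(region)
        self._regions[name] = (low, high)

    def build_mask(self, wavelengths: List[float]) -> List[bool]:
        # True marks a pixel that is rejected (it falls inside any region).
        return [
            any(low <= w <= high for low, high in self._regions.values())
            for w in wavelengths
        ]


regions = RegionSet()
regions.add("feature_1", (5000.0, 5500.0))
mask = regions.build_mask([4990.0, 5000.0, 5250.0, 5500.0, 5510.0])
```

The same mask definition would be evaluated against the wavelength grid of every loaded observation, which is why a region is removed from all of them at once.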

```py
data.remove_activity_lines(inds)
```

```
2025-04-14 17:16:32.407 | INFO     | ASTRA.data_objects.DataClass:remove_activity_lines:216 - Computing activity windows for each RV measurements
```
