# TAT-C for Scalable Services Report
PI: Paul T. Grogan paul.grogan@asu.edu

Contributors: Josue Tapia josue.tapia@asu.edu and Suvan Kumar skuma208@asu.edu

# Introduction

This report provides an overview of the Tradespace Analysis Tool for Constellations (TAT-C) architecture for scalable services. The architecture supports autonomous trade space exploration and assessment of hypothetical mission designs and interfaces with NOAA's ASPEN trade space evaluation tool. It leverages the Celery Python library to distribute and execute tasks in parallel, reducing the simulation runtime per mission assessment. This work has been performed under the project "OSSE / Trade Space Capability for NOAA's Future Mission Design."

# Background
## TAT-C Capabilities
The Tradespace Analysis Tool for Constellations (TAT-C) is an open-source Python software package for early-stage mission modeling, simulation, and analysis for Earth-observing satellite constellations. It was developed with support from the NASA Advanced Information Systems Technology (AIST) program grants NNX17AE06G, 80NSSC17K0586, 80NSSC20K1118, and 80NSSC21K1515. This project uses TAT-C version 3, the third major revision of the tool.

TAT-C simulates the orbital motion and observability conditions of satellite-based instruments and produces data such as orbit track, ground track (projected sensor area), and observation records. Analysis methods compute key mission performance metrics such as revisit time (time between subsequent observations of a fixed point) and data latency (time from observation of a fixed point until the first available downlink opportunity) to support trade studies.
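
As a rough illustration of the revisit metric described above, the sketch below computes the gaps between observations of one fixed point using only the standard library. It assumes revisit is measured from the end of one observation to the start of the next. The function name `revisit_times` and the sample observation windows are hypothetical and are not part of TAT-C.

```python
from datetime import datetime, timedelta


def revisit_times(observations):
    """Return the gaps between consecutive observations of one fixed point.

    `observations` is a list of (start, end) datetime pairs. Each gap is
    measured from the end of one observation to the start of the next
    (an assumption made for this sketch).
    """
    ordered = sorted(observations)
    return [
        later_start - earlier_end
        for (_, earlier_end), (later_start, _) in zip(ordered, ordered[1:])
    ]


# Hypothetical observation windows over a single ground point.
observations = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 5)),
    (datetime(2024, 1, 1, 6, 5), datetime(2024, 1, 1, 6, 10)),
    (datetime(2024, 1, 1, 9, 10), datetime(2024, 1, 1, 9, 15)),
]

gaps = revisit_times(observations)
# Mean revisit expressed in floating-point hours.
mean_revisit_hours = sum(gaps, timedelta()) / len(gaps) / timedelta(hours=1)
```

With these sample windows the gaps are 6 hours and 3 hours, so the mean revisit is 4.5 hours.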

The overall objective of this project is to use TAT-C as a pre-processor for the Advanced Systems Performance Evaluation tool for NOAA (ASPEN) Sensor Constellation/Performance (SCP) table. SCP columns that can be informed by TAT-C analysis include (descriptions from ASPEN-91 definition tables):

*  Temporal Refresh: "Time between observations at a location, i.e., time to observe the geographic coverage region D."
*  Data Latency: "Time from 'image taken' to full relay of data to a ground station." (Note: TAT-C does not consider processing time as a part of data latency; in other words, an additional factor must be added to the TAT-C results.)
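
The note on data latency can be illustrated with a small standard-library sketch: latency is the time from the end of an observation to the first later downlink opportunity, and a separate processing allowance is added afterward to approximate the ASPEN definition. The function name `data_latency`, the sample times, and the 20-minute processing allowance are hypothetical values chosen for illustration only.

```python
from datetime import datetime, timedelta


def data_latency(observation_end, downlink_starts):
    """Return the time from an observation to the first later downlink.

    Returns None when no downlink opportunity follows the observation.
    """
    later = [t for t in downlink_starts if t >= observation_end]
    return min(later) - observation_end if later else None


# Hypothetical observation end time and downlink opportunities.
observation_end = datetime(2024, 1, 1, 0, 5)
downlinks = [
    datetime(2023, 12, 31, 23, 0),  # before the observation, so ignored
    datetime(2024, 1, 1, 1, 35),
    datetime(2024, 1, 1, 3, 0),
]

latency_without_processing = data_latency(observation_end, downlinks)

# Hypothetical processing allowance added on top of the computed latency.
processing_time = timedelta(minutes=20)
latency_with_processing = latency_without_processing + processing_time
```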
  
TAT-C models instrument observability constraints rather than geophysical variables. Instruments selected for this report consist of infrared and microwave sounders and visual/infrared imagers.

# TAT-C Configuration
This report uses TAT-C version 3.1.3, which is currently in development on the main branch of the GitHub repository. Technical documentation is available at ReadTheDocs.

To install TAT-C, clone the repository:

    git clone https://github.com/code-lab-org/tatc
and install the dependencies into a new environment (tatc_env) using conda:

    conda env create -f environment.yml
(note: dependency resolution can take upwards of 10 minutes). Activate the new environment when complete:

    conda activate tatc_env
and register the TAT-C library in "editable" mode (enables source changes, if desired):

    pip install -e .
This report also requires the following additional dependencies for parallel processing and interactive features of a Jupyter notebook:

    conda install ipython joblib pandarallel -c conda-forge
and a world country-level shapefile `ne_110m_admin_0_countries.zip` available from NaturalEarth (Select "Download countries" and save the .zip file in the same directory as this notebook).

# Simulation Runtime Limitation
TAT-C simulations have long runtimes because they involve compute-intensive tasks. As a result, TAT-C is practical only for narrow trade space analyses, which evaluate the performance of a few mission architectures. This limitation hinders the search for mission designs that could maximize system performance while minimizing costs.

This motivates scalable approaches that reduce TAT-C's simulation runtime. The following section describes a scalable architecture that relies on the Python Celery library, which executes tasks in parallel and asynchronously.

# Scalable Architecture

The image below shows the scalable architecture developed in this report. It has three main components: the TAT-C application, the message broker and backend, and the worker machines. The Celery framework uses a RabbitMQ message broker that holds tasks in a task queue and distributes them to worker machines, which execute them. As workers complete tasks, they store the results in the backend, a Redis database. Once all tasks are complete, the TAT-C application requests the results from the Redis database and aggregates them to complete the coverage analysis.

<img src='./images/Scalable_Architecture.png' align='center'/>
     

## TAT-C Application

This subsection shows how the TAT-C application breaks down a big task, such as the refresh analysis for a mission architecture, into small and independent tasks that can be executed in parallel. Additionally, the subsection describes the Celery workflows used in the architecture.  
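
The decomposition described above can be sketched as follows: one large analysis over a grid becomes a list of independent per-point argument tuples, each fully serialized so it can be sent to a worker. This is a minimal standard-library sketch. The helper name `build_point_tasks`, the point field names (`id`, `latitude`, `longitude`), the grid spacing, and the placeholder satellite string are assumptions for illustration and are not taken from the TAT-C application.

```python
import json
from datetime import datetime, timedelta, timezone


def build_point_tasks(latitudes, longitudes, satellites, start, end):
    """Return one serialized argument tuple per grid point.

    Each tuple holds (point JSON, list of satellite JSON strings,
    ISO 8601 start time, ISO 8601 end time).
    """
    tasks = []
    point_id = 0
    for latitude in latitudes:
        for longitude in longitudes:
            point = json.dumps(
                {"id": point_id, "latitude": latitude, "longitude": longitude}
            )
            tasks.append((point, satellites, start.isoformat(), end.isoformat()))
            point_id += 1
    return tasks


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=1)
satellites = ['{"name": "placeholder satellite"}']  # hypothetical payload

# A coarse hypothetical grid: 5 latitudes x 6 longitudes = 30 independent tasks.
tasks = build_point_tasks(
    range(-60, 61, 30), range(-180, 180, 60), satellites, start, end
)
```

Because each tuple carries everything one point needs, the tuples can be processed in any order and on any worker.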

### Celery Workflows

Celery processes and handles data flows through workflows, as shown in the figure below.

<img src='./images/celery_workflows.png' width='600' align='center'/>

* The chain workflow executes tasks in sequence, one after another. The result of each task is passed to the following task as its first argument. The output is the result of the last task.
* The group workflow executes a set of tasks in parallel. The output is a list of the results of each task.
* The chord workflow executes a task after a set of tasks has been computed in parallel. The results of the group tasks are passed to the last task as its first argument. The output is the result of the last task.
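
The data flow of the three workflows listed above can be imitated with the standard library. The sketch below is not Celery code: it uses `functools.reduce` and a local thread pool as stand-ins, and the helper names `run_chain`, `run_group`, and `run_chord` are hypothetical. It only illustrates how results move between tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def run_chain(tasks, initial):
    """Run tasks in sequence; each result is the next task's first argument."""
    return reduce(lambda result, task: task(result), tasks, initial)


def run_group(tasks):
    """Run independent zero-argument tasks in parallel.

    Results are returned as a list in the same order as the tasks.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def run_chord(header_tasks, callback):
    """Run a group, then pass the list of results to a final callback."""
    return callback(run_group(header_tasks))


# Chain: (2 + 1) * 10
chain_result = run_chain([lambda x: x + 1, lambda x: x * 10], 2)

# Group: squares of 0..3, computed independently.
group_result = run_group([lambda i=i: i * i for i in range(4)])

# Chord: the same group, followed by a callback that sums the list.
chord_result = run_chord([lambda i=i: i * i for i in range(4)], sum)
```

In this sketch the chain yields a single value, the group yields a list, and the chord yields whatever the final callback returns.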

Note that data transferred from one task to another must be serialized. Since TAT-C outputs GeoDataFrames with coverage statistics, each GeoDataFrame must be serialized (here, to GeoJSON) before its results are passed from one task to the next.
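
To illustrate why serialization matters, the sketch below round-trips one coverage record through a GeoJSON string using only the `json` module. Duration values are converted to floating-point hours before encoding, because `timedelta` objects are not JSON serializable. This is a standard-library stand-in rather than the GeoDataFrame serialization used by the tasks in this report. The helper names and the `point_id` property are hypothetical, while the `access` and `revisit` property names follow the task code in this report.

```python
import json
from datetime import timedelta


def serialize_result(point_id, longitude, latitude, access, revisit):
    """Encode one coverage record as a GeoJSON FeatureCollection string."""
    feature = {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {
            "point_id": point_id,
            # timedelta is not JSON serializable, so store hours as floats
            "access": access / timedelta(hours=1),
            "revisit": revisit / timedelta(hours=1),
        },
    }
    return json.dumps({"type": "FeatureCollection", "features": [feature]})


def deserialize_result(payload):
    """Decode the GeoJSON string and restore the duration properties."""
    collection = json.loads(payload)
    for feature in collection["features"]:
        properties = feature["properties"]
        properties["access"] = timedelta(hours=properties["access"])
        properties["revisit"] = timedelta(hours=properties["revisit"])
    return collection


# Hypothetical record: 30 minutes of access and a 3 hour revisit.
payload = serialize_result(
    0, -111.9, 33.4, timedelta(minutes=30), timedelta(hours=3)
)
restored = deserialize_result(payload)
```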

### Grouping Small Tasks for Scalable Services

There are multiple ways to structure TAT-C functions with Celery's workflows to produce coverage or data latency statistics. We determined that the following workflow offers lower network latency. The diagram below shows the workflow implemented in this project.

<img src='./images/run_coverage_analysis.jpg' width='600' align='center'/>

The `run_coverage_analysis_task` performs coverage analysis for a constellation over one target point in the grid. The group workflow runs this task for every point, so its result is a list of coverage analyses, one per point in the grid. This list is passed to the `merge_feature_collections_task`, which merges the list of serialized GeoDataFrames (GeoJSON feature collections) into a single feature collection.

## Message Broker and Backend

This project utilizes a RabbitMQ message broker and a Redis backend hosted on Amazon Web Services (AWS).

## IP Data Transfer

Communication among the message broker, backend, TAT-C application, and worker machines travels over encrypted connections: the Celery configuration uses the `amqps` (broker) and `rediss` (backend) URL schemes, which run AMQP and Redis over TLS. Because the connections are encrypted and routed over the Internet, the network can include workers that are distributed globally.

## Machine Workers
We built a Docker image that packages a worker machine as a container. The container carries all dependencies needed to run a coverage analysis using TAT-C. The image is publicly available on Docker Hub (josuetapia/parallelworker:latest). The Dockerfile used to build the image is shown below:

    # This block defines the TAT-C runtime container using the appropriate
    # base Python environment.

    FROM python:3.10 AS tatc_runtime

    WORKDIR /var/tatc
    COPY pyproject.toml ./
    COPY src src
    RUN python -m pip install . --no-cache-dir

    # This block defines the TAT-C worker container. Using the TAT-C runtime
    # container, it installs and starts the worker application.

    FROM tatc_runtime AS tatc_worker

    WORKDIR /var/tatc
    RUN python -m pip install .[app] --no-cache-dir

    COPY tatc_app tatc_app
    COPY resources resources

    ENV TATC_BROKER=amqp://tatc:tatc07030@tatc-test.code-lab.org:5671//
    ENV TATC_BACKEND=redis://tatc07030@tatc-test.code-lab.org:6379/

    CMD ["celery", "-A", "tatc_app.aws_worker", "worker", "--uid=nobody", "--gid=nogroup"]

The Dockerfile is located in the same directory as the `tatc_app` package, which contains the Celery application initializer (the `aws_worker` module) as well as the coverage tasks. The Celery application initializer is shown below:

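    from celery import Celery
    from skyfield.api import load
    import ssl

    # Celery application: RabbitMQ broker (amqps) and Redis result backend (rediss).
    # Note: ssl.CERT_NONE encrypts the connection but disables certificate verification.
    app = Celery(
        "tatc_app",
        broker="amqps://tatc:tatc07030@tatc-test.code-lab.org:5671//",
        broker_use_ssl={
            "keyfile": None,
            "certfile": None,
            "ca_certs": None,
            "cert_reqs": ssl.CERT_NONE,
        },
        backend="rediss://:tatc07030@tatc-test.code-lab.org:6379/",
        redis_backend_use_ssl={
            "ssl_keyfile": None,
            "ssl_certfile": None,
            "ssl_ca_certs": None,
            "ssl_cert_reqs": ssl.CERT_NONE,
        },
        # task modules registered with this application
        include=["tatc_app.latency_tasks", "tatc_app.coverage_tasks"],
    )

    # Load the DE421 planetary ephemeris at import time
    # (Skyfield downloads the file if it is not already present).
    load("de421.bsp")

    if __name__ == "__main__":
        app.start()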
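
The Dockerfile above defines the `TATC_BROKER` and `TATC_BACKEND` environment variables, while the initializer hard-codes its connection URLs. One possible variation, sketched below under stated assumptions, reads the URLs from the environment and enables certificate verification when a CA bundle path is supplied. The helper name `build_celery_settings`, the `TATC_CA_CERTS` variable, and the example hosts are hypothetical. The dictionary keys mirror the keyword arguments used in the initializer.

```python
import os
import ssl


def build_celery_settings(env=None):
    """Build Celery connection settings from environment variables.

    Certificate verification is enabled only when a CA bundle path is
    provided through the (hypothetical) TATC_CA_CERTS variable.
    """
    env = os.environ if env is None else env
    ca_certs = env.get("TATC_CA_CERTS")
    cert_reqs = ssl.CERT_REQUIRED if ca_certs else ssl.CERT_NONE
    return {
        "broker": env["TATC_BROKER"],
        "broker_use_ssl": {
            "keyfile": None,
            "certfile": None,
            "ca_certs": ca_certs,
            "cert_reqs": cert_reqs,
        },
        "backend": env["TATC_BACKEND"],
        "redis_backend_use_ssl": {
            "ssl_keyfile": None,
            "ssl_certfile": None,
            "ssl_ca_certs": ca_certs,
            "ssl_cert_reqs": cert_reqs,
        },
    }


# Hypothetical environment for illustration; real values come from deployment.
example_env = {
    "TATC_BROKER": "amqps://user:password@broker.example.org:5671//",
    "TATC_BACKEND": "rediss://:password@backend.example.org:6379/",
    "TATC_CA_CERTS": "/etc/ssl/certs/ca-certificates.crt",
}
settings = build_celery_settings(example_env)

# The settings could then be passed to the application constructor, e.g.:
# app = Celery("tatc_app", include=[...], **settings)
```

Keeping credentials out of the source file also means the same image can be pointed at a different broker or backend without a rebuild.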

The following script shows the underlying TAT-C functions that support `run_coverage_analysis_task` and `merge_feature_collections_task`:

    from datetime import datetime, timedelta
    import geopandas as gpd
    from geojson_pydantic import FeatureCollection
    import json
    from itertools import chain
    import pandas as pd
    from tatc.schemas.instrument import Instrument
    from tatc.schemas.point import Point
    from tatc.schemas.satellite import Satellite
    from tatc.analysis.coverage import (
        collect_multi_observations,
        aggregate_observations,
        reduce_observations,
        grid_observations,
    )

    # from .schemas import CoverageAnalysisResult
    from .aws_worker import app


    @app.task
    def run_coverage_analysis_task(
        point: str, satellites: list, start: str, end: str
    ) -> str:
        """
        Task to run coverage analysis.

        Args:
            point (str): JSON serialized :class:`tatc.schemas.Point` object.
            satellites (list): List of JSON serialized :class:`tatc.schemas.Satellite` objects.
            start (str): ISO 8601 serialized start time.
            end (str): ISO 8601 serialized end time.

        Returns:
            str: GeoJSON serialized `FeatureCollection` containing coverage analysis.
        """
        # call analysis function, parsing the serialized arguments
        results = reduce_observations(
            aggregate_observations(
                collect_multi_observations(
                    Point.parse_raw(point),
                    [Satellite.parse_raw(satellite) for satellite in satellites],
                    datetime.fromisoformat(start),
                    datetime.fromisoformat(end),
                )
            )
        )
        # convert timedelta columns to floating-point hours so they can be
        # serialized to JSON
        results["access"] = results["access"].apply(lambda t: t / timedelta(hours=1))
        results["revisit"] = results["revisit"].apply(lambda t: t / timedelta(hours=1))
        return results.to_json(show_bbox=False, drop_id=True)


    @app.task
    def merge_feature_collections_task(collections: list) -> str:
        """
        Task to merge a list of feature collections into a single feature collection.

        Args:
            collections (list): GeoJSON serialized list of feature collections.

        Returns:
            str: GeoJSON serialized feature collection.
        """
        # parse each collection and concatenate all features into one collection
        return FeatureCollection(
            type="FeatureCollection",
            features=list(
                chain(
                    *list(
                        FeatureCollection.model_validate_json(collection).features
                        for collection in collections
                    )
                )
            ),
        ).model_dump_json()