---
title: IDs from Juno
format:
  html:
    code-fold: true
output-file: juno.html

---

## Background

Spacecraft-Solar equatorial

https://pds-ppi.igpp.ucla.edu/data/JNO-SS-3-FGM-CAL-V1.0/INDEX/INDEX.TAB

### Coordinate System of Data

1. **SE (Solar Equatorial)**
    - Code: `se`
    - Resampling options: 
        - Number of seconds (1 or 60): `se_rN[N]s`
        - Resampled 1 hour: `se_r1h`

2. **PC (Planetocentric)**
    - Code: `pc`
    - Resampling options: 
        - Number of seconds (1 or 60): `pc_rN[N]s`
        
3. **SS (Sun-State)**
    - Code: `ss`
    - Resampling options: 
        - Number of seconds (1 or 60): `ss_rN[N]s`
        
4. **PL (Payload)**
    - Code: `pl`
    - Resampling options: 
        - Number of seconds (1 or 60): `pl_rN[N]s`


```txt
------------------------------------------------------------------------------
Juno Mission Phases                                                           
------------------------------------------------------------------------------
Start       Mission                                                           
Date        Phase                                                             
==============================================================================
2011-08-05  Launch                                                            
2011-08-08  Inner Cruise 1                                                    
2011-10-10  Inner Cruise 2                                                    
2013-05-28  Inner Cruise 3                                                    
2013-11-05  Quiet Cruise                                                      
2016-01-05  Jupiter Approach                                                  
2016-06-30  Jupiter Orbital Insertion                                         
2016-07-05  Capture Orbit                                                     
2016-10-19  Period Reduction Maneuver                                         
2016-10-20  Orbits 1-2                                                        
2016-11-09  Science Orbits                                                    
2017-10-11  Deorbit
```
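
The phase table above can double as a lookup: given a date, return the phase that was active. A minimal sketch using only the standard library; the names `MISSION_PHASES` and `mission_phase` are hypothetical helpers, not part of the project code.

```python
from bisect import bisect_right
from datetime import date

# (start date, phase) pairs copied from the table above, in chronological order.
MISSION_PHASES = [
    (date(2011, 8, 5), "Launch"),
    (date(2011, 8, 8), "Inner Cruise 1"),
    (date(2011, 10, 10), "Inner Cruise 2"),
    (date(2013, 5, 28), "Inner Cruise 3"),
    (date(2013, 11, 5), "Quiet Cruise"),
    (date(2016, 1, 5), "Jupiter Approach"),
    (date(2016, 6, 30), "Jupiter Orbital Insertion"),
    (date(2016, 7, 5), "Capture Orbit"),
    (date(2016, 10, 19), "Period Reduction Maneuver"),
    (date(2016, 10, 20), "Orbits 1-2"),
    (date(2016, 11, 9), "Science Orbits"),
    (date(2017, 10, 11), "Deorbit"),
]
_STARTS = [start for start, _ in MISSION_PHASES]


def mission_phase(day: date) -> str:
    """Return the phase whose start date is the latest one not after `day`."""
    i = bisect_right(_STARTS, day)
    if i == 0:
        raise ValueError(f"{day} is before launch")
    return MISSION_PHASES[i - 1][1]
```

For example, `mission_phase(date(2014, 2, 24))` falls between the 2013-11-05 and 2016-01-05 rows and so resolves to the Quiet Cruise phase.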

```txt
File Naming Convention                                                        
==============================================================================
Convention:                                                                   
   fgm_jno_LL_CCYYDDDxx_vVV.ext                                               
Where:                                                                        
   fgm - Fluxgate Magnetometer three character instrument abbreviation        
   jno - Juno                                                                 
    LL - CODMAC Data level, for example, l3 for level 3                       
    CC - The century portion of a date, 20                                    
    YY - The year of century portion of a date, 00-99                         
   DDD - The day of year, 001-366                                             
    xx - Coordinate system of data (se = Solar equatorial, ser = Solar        
         equatorial resampled, pc = Planetocentric, ss = Sun-State,           
         pl = Payload)                                                        
     v - separator to denote Version number                                   
    VV - version                                                              
   ext - file extension (sts = Standard Time Series (ASCII) file, lbl = Label 
         file)                                                                
Example:                                                                      
   fgm_jno_l3_2014055se_v00.sts    
```
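
The naming convention above is regular enough to parse mechanically. The following is a sketch under stated assumptions, not project code: `FgmFileName` and `parse_fgm_filename` are hypothetical names, and the optional `_rNNs` / `_r1h` resampling suffix is inferred from the coordinate codes listed earlier rather than from the convention text itself.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class FgmFileName(NamedTuple):
    level: str                 # CODMAC level, e.g. "l3"
    day: date                  # calendar date decoded from CCYYDDD
    coord: str                 # se, ser, pc, ss or pl
    resampling: Optional[str]  # e.g. "r60s"; None if absent (assumed suffix)
    version: int
    ext: str                   # "sts" or "lbl"


_PATTERN = re.compile(
    r"^fgm_jno_(?P<level>l\d)_(?P<yyyyddd>\d{7})(?P<coord>ser|se|pc|ss|pl)"
    r"(?:_(?P<resampling>r\d+[sh]))?_v(?P<version>\d{2})\.(?P<ext>sts|lbl)$"
)


def parse_fgm_filename(name: str) -> FgmFileName:
    """Split a Juno FGM file name into its documented components."""
    m = _PATTERN.match(name.lower())
    if m is None:
        raise ValueError(f"not a recognised FGM file name: {name!r}")
    # %j is the day of year, so "2014055" is day 55 of 2014.
    day = datetime.strptime(m["yyyyddd"], "%Y%j").date()
    return FgmFileName(
        m["level"], day, m["coord"], m["resampling"], int(m["version"]), m["ext"]
    )
```

Applied to the documented example `fgm_jno_l3_2014055se_v00.sts`, this yields level `l3`, the date 2014-02-24 (day 055), coordinate system `se`, version 0, and extension `sts`.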

There are three principal coordinate systems used to represent the data in this archive. The SE coordinate system is a Spacecraft-Solar equatorial system and is used for cruise data only. The sun-state (ss) and planetocentric (pc) systems are used for Earth Fly By (EFB) and Jupiter orbital data. Cartesian representations are used for all three coordinate systems. These coordinate systems are specified relative to a “target body”, which may be any solar system object (for orbital operations this will be Jupiter). In what follows we reference Jupiter as the target body, but if, for example, observations near a satellite (such as Io) are desired in Io-centric coordinates, the satellite Io may be specified as the target body.

The SE coordinate system is defined using the sun-spacecraft vector as the primary reference vector (x) and the Sun’s rotation axis as the secondary reference vector (z). The x axis lies along the sun-spacecraft vector, the z axis lies in the plane defined by the Sun’s rotation axis and the sun-spacecraft vector, and the y axis completes the system.

The ss coordinate system is defined using the instantaneous Jupiter-Sun vector as the primary reference vector (x direction). The x axis lies along this vector and is taken to be positive toward the Sun. The Jupiter orbital velocity vector is the second vector used to define the coordinate system; the y axis lies in the plane determined by the Jupiter-Sun vector and the velocity vector and is orthogonal to the x axis (very nearly the negative of the velocity vector). The vector cross product of x and y yields a vector z parallel to the northward (upward) normal of the orbit plane of Jupiter. This system is sometimes called a sun-state (ss) coordinate system since its principal vectors are the Sun vector and the Jupiter state vector.

## Setup

Run the following command in a shell first. `kedro pipeline` is a project-specific command, so it is only available from inside the Kedro project directory.

```{sh}
kedro pipeline create juno
```

To generate the candidate data, run `kedro run --from-inputs=jno.feature_1s --to-outputs=candidates.jno_1s`.

In [None]:
#| default_exp pipelines/juno/pipeline

In [None]:
#| hide
%load_ext autoreload
%autoreload 2

In [None]:
#| code-summary: import all the packages needed for the project
#| export
#| output: hide
from ids_finder.core import extract_features
from fastcore.utils import *
from fastcore.test import *

import polars as pl
import pandas as pd

from loguru import logger

from typing import Callable, Dict


#### `Kedro`

In [None]:
#| export
from kedro.pipeline import Pipeline, node
from kedro.pipeline.modular_pipeline import pipeline

In [None]:
#| eval: false
from ids_finder.utils.basic import load_catalog
catalog = load_catalog()
catalog.list()

## Dataset Overview

### Index

In [None]:
pds_dir = "https://pds-ppi.igpp.ucla.edu/data"

possible_coords = ["se", "ser", "pc", "ss", "pl"]
possible_exts = ["sts", "lbl"]
possible_data_rates = ["1s", "1min", "1h"]

juno_ss_config = {
    "DATA_SET_ID": "JNO-SS-3-FGM-CAL-V1.0",
    "FILE_SPECIFICATION_NAME": "INDEX/INDEX.LBL",
}

juno_j_config = {
    "DATA_SET_ID": "JNO-J-3-FGM-CAL-V1.0",
    "FILE_SPECIFICATION_NAME": "INDEX/INDEX.LBL",
}

#### Process index

In [None]:
#| export
import pandas
import pdpipe as pdp

In [None]:
#| export
def process_jno_index(df: pandas.DataFrame):
    
    _index_time_format = "%Y-%jT%H:%M:%S.%f"
    
    df.columns = df.columns.str.replace(" ", "")
    jno_index_pipeline = pdp.PdPipeline(
        [
            pdp.ColDrop(["PRODUCT_ID", "CR_DATE", "PRODUCT_LABEL_MD5CHECKSUM"]),
            pdp.ApplyByCols("SID", str.rstrip),
            pdp.ApplyByCols("FILE_SPECIFICATION_NAME", str.rstrip),
            pdp.ColByFrameFunc(
                "START_TIME",
                lambda df: pandas.to_datetime(df["START_TIME"], format=_index_time_format),
            ),
            pdp.ColByFrameFunc(
                "STOP_TIME",
                lambda df: pandas.to_datetime(df["STOP_TIME"], format=_index_time_format),
            ),
            # pdp.ApplyByCols(['START_TIME', 'STOP_TIME'], pandas.to_datetime, format=_index_time_format), # NOTE: This is slow
        ]
    )
    
    return jno_index_pipeline(df)
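
The `_index_time_format` string above relies on `%j`, the zero-padded day of year, so index timestamps carry a year and a day-of-year rather than a month and day. A small standard-library illustration follows; the timestamp string is a made-up example in that layout, not a row copied from the index.

```python
from datetime import datetime

# Same format string as `_index_time_format` in `process_jno_index`.
INDEX_TIME_FORMAT = "%Y-%jT%H:%M:%S.%f"

# Illustrative timestamp (not taken from INDEX.TAB): day 181 of 2016.
parsed = datetime.strptime("2016-181T12:30:00.000", INDEX_TIME_FORMAT)
print(parsed.isoformat())  # 2016-06-29T12:30:00
```

Because 2016 is a leap year, day 181 falls on 29 June rather than 30 June.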


#### Pipeline

In [None]:
#| export
from kedro.pipeline import pipeline, node

In [None]:
#| export
def create_jno_index_pipeline():
    jno_index_pipeline = pipeline([
        node(process_jno_index, inputs="raw_JNO_SS_index", outputs="JNO_SS_index"),
        node(process_jno_index, inputs="raw_JNO_J_index", outputs="JNO_J_index"),
        node(lambda x1, x2: pandas.concat([x1, x2]), inputs=["JNO_SS_index", "JNO_J_index"], outputs="JNO_index")
    ])
    return jno_index_pipeline

In [None]:
raw_JNO_SS_index = catalog.load('raw_JNO_SS_index')
raw_JNO_J_index = catalog.load('raw_JNO_J_index')
jno_index = catalog.load('JNO_index')

jno_ss_index = jno_index[lambda df: df["DATA_SET_ID"] == "JNO-SS-3-FGM-CAL-V1.0"]
jno_j_index  = jno_index[lambda df: df["DATA_SET_ID"] == "JNO-J-3-FGM-CAL-V1.0"]

#### Check the data

In [None]:
#| echo: false
starting_date = jno_ss_index['START_TIME'].min().date()
ending_date = jno_ss_index['STOP_TIME'].max().date()

print(f"JNO-SS Starting date: {starting_date}")
print(f"JNO-SS Ending date: {ending_date}")

starting_date = jno_j_index['START_TIME'].min().date()
ending_date = jno_j_index['STOP_TIME'].max().date()
print(f"JNO-J Starting date: {starting_date}")
print(f"JNO-J Ending date: {ending_date}")

JNO-SS Starting date: 2011-08-25
JNO-SS Ending date: 2016-06-29
JNO-J Starting date: 2016-07-07
JNO-J Ending date: 2022-12-15


In [None]:
#| echo: false
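# NOTE: `starting_date` and `ending_date` were last assigned from the JNO-J index
# in the previous cell, while `available_dates` is built from the JNO-SS index.
available_dates = pandas.concat([jno_ss_index['START_TIME'].dt.date, jno_ss_index['STOP_TIME'].dt.date]).unique()
full_year_range = pandas.date_range(start=starting_date, end=ending_date)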

missing_dates = full_year_range[~full_year_range.isin(available_dates)]

if len(missing_dates) == 0:
    print(f"No days are missing.")
else:
    print(f"The following days are missing")
    print(coll_repr(missing_dates.map(lambda x: x.strftime("%Y-%m-%d"))))

The following days are missing
(#2353) ['2016-07-07','2016-07-08','2016-07-09','2016-07-10','2016-07-11','2016-07-12','2016-07-13','2016-07-14','2016-07-15','2016-07-16'...]


### Magnetic field data

#### Downloading data 

In [None]:
#| export
import pooch
from pooch import Unzip

In [None]:
# | export
time_resolutions = ['1sec', '1min']

def download_mag_data(
    start: str = None,
    end: str = None,
    ts: str = '1sec',  # time resolution
):
    base_url = 'https://pds-ppi.igpp.ucla.edu/ditdos/download?id=pds://PPI/JNO-SS-3-FGM-CAL-V1.0/DATA/CRUISE/SE'
    files = pooch.retrieve(
        url=f"{base_url}/{ts.upper()}",
        known_hash=None,
        path = "../data/01_raw/",
        processor=Unzip(extract_dir=f"jno_ss_se_{ts}")
    )
    return files
    

#### Preprocessing data

Convert all files from `lbl` format to `parquet` format for faster processing

In [None]:
#| export
from ids_finder.utils.basic import concat_partitions

In [None]:
# | export
def preprocess_mag_data(raw_data: Dict[str, Callable]) -> pl.DataFrame:
    """
    Preprocess the raw dataset (only minor transformations)

    - Applying naming conventions for columns
    - Parsing and typing data
    - Changing storing format (from `lbl` to `parquet`)
    - Dropping useless columns
    """

    df = concat_partitions(raw_data)
    df_pl = (
        pl.from_dataframe(df)
        .lazy()
        .with_columns(time=pl.col("SAMPLE UTC").str.to_datetime("%Y %j %H %M %S %f"))
        .drop(["SAMPLE UTC", "DECIMAL DAY", "INSTRUMENT RANGE"])
        .sort("time")
        .collect()
    )
    return df_pl

#### Processing data

In [None]:
# | export
from ids_finder.utils.basic import partition_data_by_year

In [None]:
#| export
def process_mag_data(
    raw_data: pl.DataFrame,
    ts: str = None,  # time resolution
    coord: str = None,
) -> pl.DataFrame | Dict[str, pl.DataFrame]:
    """
    Partitioning data, for the sake of memory
    """
    return partition_data_by_year(raw_data)
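
`partition_data_by_year` comes from `ids_finder.utils.basic` and its implementation is not shown here. As a rough illustration of the idea only, the standard-library sketch below groups timestamped records into a dict keyed by year. The function name `partition_by_year`, the `Record` type, and the `year_YYYY` key naming are all assumptions, not the project's actual behavior.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Record = Tuple[datetime, float]  # (time, value); a stand-in for one data row


def partition_by_year(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records by the calendar year of their timestamp."""
    partitions: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        partitions[f"year_{record[0].year}"].append(record)
    return dict(partitions)
```

Returning one entry per year lets each partition be saved and loaded separately, which is the memory motivation stated in the docstring above.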

#### Pipeline

In [None]:
# | export
def create_mag_data_pipeline(
    sat_id,
    ts: str = "1s",  # time resolution,
    tau: str = "60s",  # time window
    **kwargs,
) -> Pipeline:
    node_download_mag_data = node(
        download_mag_data,
        inputs=dict(
            start="params:start_date",
            end="params:end_date",
        ),
        outputs=f"raw_mag_{ts}",
        name=f"download_{sat_id.upper()}_magnetic_field_data",
    )

    node_preprocess_data = node(
        preprocess_mag_data,
        inputs=dict(
            raw_data=f"raw_mag_{ts}",
        ),
        outputs=f"inter_mag_{ts}",
        name=f"preprocess_{sat_id.upper()}_magnetic_field_data",
    )
    
    node_process_data = node(
        process_mag_data,
        inputs=f"inter_mag_{ts}",
        outputs=f"primary_mag_rtn_{ts}",
        name=f"process_{sat_id.upper()}_magnetic_field_data",
    )
    
    node_extract_features = node(
        extract_features,
        inputs=[f"primary_mag_rtn_{ts}", "params:tau", "params:extract_params"],
        outputs=f"feature_tau_{tau}",
        name=f"extract_{sat_id}_features",
    )

    nodes = [
        node_download_mag_data,
        node_preprocess_data,
        node_process_data,
        node_extract_features,
    ]

    pipelines = pipeline(
        nodes,
        namespace=sat_id,
        parameters={
            "params:tau": "params:tau",
            "params:extract_params": "params:jno_1s_params",
        },
    )
    return pipelines

## Processing the whole data

In [None]:
#| export
from ids_finder.candidates import create_candidate_pipeline

In [None]:
#| export
def create_pipeline(**kwargs) -> Pipeline:
    # create_jno_index_pipeline()
    sat_id = "jno"
    return create_mag_data_pipeline(sat_id) + create_candidate_pipeline(sat_id)

## Test

In [None]:
jno_ss_se_1s = catalog.load('primary_jno_ss_se_1s')
jno_1s_params = catalog.load('params:jno_1s_params')
candidates_jno_ss_se_1s = catalog.load('candidates_jno_ss_se_1s')

### Estimate

1 day of data resampled by 1 sec is about 12 MB.

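So 1 year of data is about 4 GB, and 6 years of Juno cruise data is about 24 GB.

At a download rate of about 250 KB/s, the Juno data alone would take roughly 27 hours to download (24 GB / 250 KB/s ≈ 96,000 s). The estimate in the cell below is larger because it also adds `thm_file_size` to every file.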

In [None]:
num_of_files = 6*365
jno_file_size = 12e3
thm_file_size = 40e3
files_size = jno_file_size + thm_file_size
downloading_rate = 250
processing_rate = 1/60

time_to_download = num_of_files * files_size / downloading_rate / 60 / 60
space_required = num_of_files * files_size / 1e6
time_to_process = num_of_files / processing_rate / 60 / 60

print(f"Time to download: {time_to_download:.2f} hours")
print(f"Disk space required: {space_required:.2f} GB")
print(f"Time to process: {time_to_process:.2f} hours")

Time to download: 126.53 hours
Disk space required: 113.88 GB
Time to process: 36.50 hours
