### Imports

We start by importing all necessary libraries, setting the current working directory, and defining the relevant physical constants: the _mass of the earth_ $M_{earth}$, Newton's _gravitational constant_ $G$ and the _standard gravitational parameter_ $\mu \approx G \cdot M_{earth}$ (where the approximation omits the mass of the satellite, since $M_{earth} + M_{sat}\approx M_{earth}$).
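To get a feeling for how harmless this approximation is, the following minimal sketch compares $G \cdot M_{earth}$ with the full two-body parameter $G \cdot (M_{earth} + M_{sat})$. The satellite mass `M_sat` is an assumed order-of-magnitude value chosen for illustration only, and `G` is hard-coded to keep the snippet independent of `scipy`.

```python
# Values mirror the constants defined in this notebook; M_sat is an assumption.
G = 6.67430e-11      # gravitational constant [m^3 kg^-1 s^-2]
M_earth = 5.972e24   # earth mass [kg]
M_sat = 1.0e3        # ASSUMPTION: illustrative satellite mass [kg]

mu_approx = G * M_earth  # approximation used in this notebook [m^3 s^-2]

# Relative error of the approximation, computed analytically:
# (G*(M_earth + M_sat) - G*M_earth) / (G*(M_earth + M_sat)) = M_sat / (M_earth + M_sat).
# A direct floating-point subtraction would return 0, because M_sat is far
# below the double-precision resolution at the magnitude of M_earth.
rel_err = M_sat / (M_earth + M_sat)

print(f"mu_approx      = {mu_approx:.6e} m^3/s^2")
print(f"relative error = {rel_err:.1e}")
```

Under this assumption the relative error is many orders of magnitude below the precision of the constants themselves, so dropping $M_{sat}$ has no practical effect.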

In [1]:
# PROCESSING DATA IMPORTS
import numpy as np
import pandas as pd
from datetime import datetime
from datetime import timedelta
import scipy.constants
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# SYSTEM 
import os
import sys
from pathlib import Path
import time

# SPECIFY WORKING DIRECTORY (LOCATION OF SATELLITE .csv FILES)
cwd = Path(os.getcwd())

# VISUALISATION
import matplotlib.pyplot as plt
%matplotlib inline

# IMPORT UTILITY FUNCTIONS
sys.path.append(str(cwd.parent)+'/src/') # append path to ./src/ for following imports
import kepler_utils as kutls
import ana_utils as autls
import preprocessing_utils as preputls

# ML IMPORTS
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

import statsmodels.tsa.stattools as sts
import statsmodels.graphics.tsaplots as sgt

import pmdarima as pm

from scipy.optimize import curve_fit



# CONSTANTS 
M_earth = 5.972E24 # earth mass [kg]
G = scipy.constants.G # grav constant [m^3 s^{-2} kg^{-1}]
mu = G*M_earth # standard grav parameter for m (mass moving object) << M_earth [m^3 s^{-2}]  

print('imports complete.')

imports complete


### Loading and Preprocessing Data

The script _download_DORIS_data.py_ downloads the data for a chosen satellite in a chosen time frame and for chosen control centers (which have computed the state vector from the raw observational data) and returns a single .csv-file. 

After the .csv-file is loaded, it contains duplicated time indices because the data is separated into several .Z-files (to make downloads feasible) with overlapping starting and ending times of the observations. These duplicated indices have to be dropped.
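As a minimal sketch of how such overlaps produce duplicates and how they can be removed, consider two toy tables standing in for two consecutive files. The timestamps, the column name `x` and the choice to keep the first occurrence are assumptions for illustration; the actual `DropDuplIdx` transformer may behave differently.

```python
import pandas as pd

# Toy data: two "files" whose time ranges overlap at 00:10 (hypothetical values).
idx_a = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:05", "2024-01-01 00:10"])
idx_b = pd.to_datetime(["2024-01-01 00:10", "2024-01-01 00:15"])
part_a = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=idx_a)
part_b = pd.DataFrame({"x": [3.0, 4.0]}, index=idx_b)

# Concatenating the parts yields one duplicated timestamp.
combined = pd.concat([part_a, part_b])

# Keep the first row for every timestamp and drop later repeats.
deduplicated = combined[~combined.index.duplicated(keep="first")]

print(deduplicated)
```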

Moreover, the original data is provided in the _SP3 format_, which records positions in **km** and velocities in **dm/s**. 
To be consistent with the physical constants defined above, we convert these units into **m** and **m/s**.
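The conversion itself is a pair of constant scale factors (1 km = 1000 m, 1 dm/s = 0.1 m/s). A minimal sketch on a toy state vector follows; the column names `x, y, z, vx, vy, vz` are hypothetical and need not match those in the actual .csv-file.

```python
import pandas as pd

KM_TO_M = 1.0e3     # 1 km   = 1000 m
DMS_TO_MS = 1.0e-1  # 1 dm/s = 0.1 m/s

# Toy state vector; column names and values are hypothetical.
raw = pd.DataFrame({
    "x": [7000.0], "y": [0.0], "z": [0.0],      # positions [km]
    "vx": [0.0], "vy": [75000.0], "vz": [0.0],  # velocities [dm/s]
})

pos_cols = ["x", "y", "z"]
vel_cols = ["vx", "vy", "vz"]

converted = raw.copy()
converted[pos_cols] = raw[pos_cols] * KM_TO_M    # positions now in [m]
converted[vel_cols] = raw[vel_cols] * DMS_TO_MS  # velocities now in [m/s]

print(converted)
```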

For the sake of readability, we create a single pipeline that loads and preprocesses the data (as described above) using the `sklearn.pipeline.Pipeline` class.

In [11]:
sat_path = str(cwd.parent)+'/sat/s6assa2024/s6assa_20_24.csv'

if not os.path.isfile(sat_path):
    raise Exception('Indicated path does not point to a valid file')

# DEFINE CUSTOM TRANSFORMERS
load_sat = preputls.LoadSingleSat(path=sat_path)
drop_dupl_idx = preputls.DropDuplIdx()
convert_units = preputls.ConvertUnits()

# BUILD PIPELINE FOR PREPROCESSING
prep_pipeline = Pipeline(
    steps=[
    ('load_sat', load_sat),                 # load satellite data
    ('drop_duplicated_idx', drop_dupl_idx), # drop duplicated indices
    ('convert_units',convert_units)         # convert units 
])

# LOAD AND PREPROCESS SATELLITE DATA 
s6ssa = prep_pipeline.fit_transform(None)

print('\n loading of satellite data complete.')


 loading of satellite data complete.
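The transformers used in the pipeline live in `preprocessing_utils` and are not reproduced in this notebook. As a rough sketch of how such steps can be written, the two classes below are hypothetical stand-ins (not the actual `DropDuplIdx` and `ConvertUnits` implementations) built on scikit-learn's `TransformerMixin` and `BaseEstimator`. Unlike the pipeline above, which starts with a loading step and is therefore called with `None`, this toy pipeline receives its data directly.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class DropDuplicateIndex(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: keep the first row of each repeated index value."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        return X[~X.index.duplicated(keep="first")]


class ScaleColumns(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: multiply selected columns by a constant factor."""

    def __init__(self, columns, factor):
        self.columns = columns
        self.factor = factor

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        X = X.copy()  # avoid modifying the caller's DataFrame
        X[self.columns] = X[self.columns] * self.factor
        return X


# Toy input: one duplicated index value, positions in [km].
toy = pd.DataFrame({"x": [1.0, 1.0, 2.0]}, index=[0, 0, 1])

toy_pipeline = Pipeline(steps=[
    ("drop_duplicated_idx", DropDuplicateIndex()),
    ("convert_units", ScaleColumns(columns=["x"], factor=1.0e3)),  # km -> m
])

result = toy_pipeline.fit_transform(toy)
print(result)
```

Keeping each step in its own class means that steps can be reordered, removed or tested in isolation without touching the rest of the preprocessing.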


Next, we load the scheduled manoeuvres obtained from the _download_maneuver_schedule.py_ script and filter the data for the manoeuvres scheduled for _Sentinel 6a_ between 01/01/2020 and 12/31/2024.

In [10]:
# LOADING LIST OF PRE-SCHEDULED MANEUVERS
path_ref = str(cwd.parent)+'/ref/'

if os.path.isdir(path_ref):
    path_list_man = path_ref + 'manoeuvres_schedule.csv'
    if os.path.isfile(path_list_man):
        manoeuvres = pd.read_csv(path_list_man, index_col=0)
    else:
        raise Exception('No file named manoeuvres_schedule.csv found.')
else:
    raise Exception(f'No directory named {path_ref} found.')
   
manoeuvres.start = pd.to_datetime(manoeuvres.start)
manoeuvres.end = pd.to_datetime(manoeuvres.end)

mans6a2024 = manoeuvres[(manoeuvres.sat_id == 's6a') & (manoeuvres.start.dt.year.isin(range(2020,2025)))].drop(['sat_id','end'],axis=1).set_index('start').sort_index()

print('loading manoeuvres complete.')

loading manoeuvres complete
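The filtering step in the cell above can be illustrated on a toy schedule. The rows, the extra `label` column and the second satellite id are made up for this sketch; only the column names `sat_id`, `start` and `end` are taken from the notebook.

```python
import pandas as pd

# Toy schedule with hypothetical entries.
schedule = pd.DataFrame({
    "sat_id": ["s6a", "s6a", "ja3", "s6a"],
    "start": ["2021-03-01 10:00", "2025-01-02 12:00",
              "2022-05-05 08:00", "2020-12-01 06:00"],
    "end": ["2021-03-01 10:20", "2025-01-02 12:30",
            "2022-05-05 08:10", "2020-12-01 06:15"],
    "label": ["a", "b", "c", "d"],
})
schedule["start"] = pd.to_datetime(schedule["start"])
schedule["end"] = pd.to_datetime(schedule["end"])

# Keep only manoeuvres of satellite 's6a' that start in the years 2020 to 2024.
mask = (schedule["sat_id"] == "s6a") & schedule["start"].dt.year.between(2020, 2024)

selected = (
    schedule[mask]
    .drop(columns=["sat_id", "end"])  # no longer needed after filtering
    .set_index("start")               # index by manoeuvre start time
    .sort_index()                     # chronological order
)

print(selected)
```

Splitting the chained expression over several lines leaves the behaviour unchanged but makes each step (filter, drop, index, sort) easier to check.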
