# Orbital Decay Time Prediction

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv('space_decay_wpca.csv')

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14125 entries, 0 to 14124
Data columns (total 24 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   OBJECT_NAME            14125 non-null  object 
 1   OBJECT_ID              14125 non-null  object 
 2   EPOCH                  14125 non-null  object 
 3   INCLINATION            14125 non-null  float64
 4   RA_OF_ASC_NODE         14125 non-null  float64
 5   ARG_OF_PERICENTER      14125 non-null  float64
 6   MEAN_ANOMALY           14125 non-null  float64
 7   NORAD_CAT_ID           14125 non-null  int64  
 8   REV_AT_EPOCH           14125 non-null  int64  
 9   BSTAR                  14125 non-null  float64
 10  MEAN_MOTION_DOT        14125 non-null  float64
 11  MEAN_MOTION_DDOT       14125 non-null  float64
 12  OBJECT_TYPE            14125 non-null  int64  
 13  RCS_SIZE               14125 non-null  int64  
 14  COUNTRY_CODE           14125 non-null  object 
 15  LA

## Feature Selection

> To predict orbital decay, I first ran a mutual information (MI) analysis to measure the dependency between each feature and the target variable. MI is a non-parametric measure that captures both linear and nonlinear relationships between variables. The following features were selected for their relatively high MI scores, which indicate that each carries substantial information about the decay behavior of satellites. Because MI scores each feature on its own, it does not show whether the selected features are redundant with one another.

'BSTAR', 'CROSS_SECTIONAL_AREA', 'MEAN_MOTION_DOT', 'DRAG_EFFECTIVE_AREA', 'APOPERI_RATIO', 'REV_AT_EPOCH', 'INCLINATION', (PC1, PC2)
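
The MI ranking step described above is not shown in the notebook. A minimal sketch of how such a ranking could be produced with scikit-learn's `mutual_info_regression` follows. The helper name `rank_by_mutual_information` and the synthetic columns (`linear`, `nonlinear`, `noise`) are assumptions made for illustration only; they are not the notebook's data or the author's exact procedure.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def rank_by_mutual_information(X, y, random_state=0):
    """Return features sorted by estimated MI with the target (hypothetical helper)."""
    scores = mutual_info_regression(X, y, random_state=random_state)
    return pd.Series(scores, index=X.columns, name="mi").sort_values(ascending=False)


# Synthetic stand-in data: one linear driver, one nonlinear driver, one pure-noise column.
rng = np.random.default_rng(0)
n = 1000
demo_X = pd.DataFrame({
    "linear": rng.normal(size=n),
    "nonlinear": rng.uniform(-2, 2, size=n),
    "noise": rng.normal(size=n),
})
demo_y = 2.0 * demo_X["linear"] + 2.0 * demo_X["nonlinear"] ** 2 + 0.1 * rng.normal(size=n)

mi_ranking = rank_by_mutual_information(demo_X, demo_y)
print(mi_ranking)
```

On the real data, the same helper would presumably be called as `rank_by_mutual_information(X, Y)` with the numeric feature frame and `ORBITAL_DECAY_TIME` as the target. The squared term is there to show that MI picks up a dependency that a linear correlation coefficient would largely miss.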

In [5]:
from sklearn.feature_selection import RFE
from catboost import CatBoostRegressor
from sklearn.model_selection import cross_val_score

In [8]:
# Drop the target plus identifier, date, and categorical text columns that are not model inputs.
X = df.drop(columns=['ORBITAL_DECAY_TIME','EPOCH','OBJECT_ID','OBJECT_NAME','COUNTRY_CODE','SITE','ESTIMATED_DECAY_EPOCH'])
Y = df['ORBITAL_DECAY_TIME']

model = CatBoostRegressor(verbose=0, n_estimators=100)

cv_scores = []
feature_counts = range(1, X.shape[1] + 1)
selected_features_list = []
feature_rankings_list = []

# Try every subset size from 1 up to the full feature set.
# Note: RFE is fit on the full dataset before cross-validation, so the
# selection step has seen every fold and the CV scores are optimistic.
for n_features in feature_counts:
    rfe = RFE(estimator=model, n_features_to_select=n_features)
    X_reduced = rfe.fit_transform(X, Y)

    # 4-fold CV on the reduced feature set; scores are negative MSE.
    scores = cross_val_score(model, X_reduced, Y, cv=4, scoring='neg_mean_squared_error')
    cv_scores.append(np.mean(scores))

    selected_features_list.append(X.columns[rfe.support_])
    feature_rankings_list.append(rfe.ranking_)

# The largest negative MSE corresponds to the lowest MSE.
best_idx = int(np.argmax(cv_scores))
optimal_features = feature_counts[best_idx]
optimal_selected_features = selected_features_list[best_idx]
optimal_feature_rankings = feature_rankings_list[best_idx]

print(f"Optimal number of features: {optimal_features}")
print(f"Selected features: {list(optimal_selected_features)}")
print(f"Feature rankings: {optimal_feature_rankings}")

Optimal number of features: 5
Selected features: ['BSTAR', 'MEAN_MOTION_DOT', 'APOPERI_RATIO', 'DRAG_EFFECTIVE_AREA', 'PC2']
Feature rankings: [ 6 11  5  3  9  4  1  1  2 10 13 12  1  8  1  7  1]
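
The ranking array printed above is positional: entry *i* belongs to column *i* of `X`, a rank of 1 marks a selected feature, and larger ranks mark features that RFE eliminated earlier. A small sketch of pairing ranks with column names is given below. It uses synthetic data and a `DecisionTreeRegressor` as a stand-in estimator, and the names `demo_X`, `rfe_demo`, and `ranking_table` are illustrative assumptions rather than part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: only columns "a" and "c" drive the target.
rng = np.random.default_rng(0)
demo_X = pd.DataFrame(rng.normal(size=(300, 4)), columns=["a", "b", "c", "d"])
demo_y = 3 * demo_X["a"] - 2 * demo_X["c"] + 0.1 * rng.normal(size=300)

# Keep two features; the rest are eliminated one at a time (default step=1).
rfe_demo = RFE(DecisionTreeRegressor(random_state=0), n_features_to_select=2)
rfe_demo.fit(demo_X, demo_y)

# Attach column names to the positional ranking array and sort by rank.
ranking_table = pd.Series(rfe_demo.ranking_, index=demo_X.columns, name="rank").sort_values()
print(ranking_table)
```

In the notebook, the equivalent call would presumably be `pd.Series(optimal_feature_rankings, index=X.columns).sort_values()`.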


> To ensure both statistical relevance and predictive performance in modeling orbital decay time, I performed feature selection using Recursive Feature Elimination (RFE) with a CatBoost regressor, a gradient boosting model that handles nonlinear relationships and categorical features well. RFE works with any estimator that exposes feature importances or coefficients; CatBoost was chosen for its robustness on tabular data. The five-feature subset yielded the lowest mean squared error, averaged over the four cross-validation folds, of all subset sizes tested. All five selected features also appear in the high mutual information list above, which supports their statistical dependency on the target and their physical relevance to the orbital decay process. These five features are therefore used as the core input set for model development and evaluation.

'BSTAR', 'MEAN_MOTION_DOT', 'APOPERI_RATIO', 'DRAG_EFFECTIVE_AREA', 'PC2'
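
One caveat with choosing a subset size by cross-validation is that the selection step itself should be refit inside each fold; otherwise the held-out data has already influenced which features were kept. The sketch below shows one way to do that by wrapping RFE and the regressor in a scikit-learn `Pipeline`. It is an illustration under stated assumptions, not the notebook's method: it swaps in `GradientBoostingRegressor` for CatBoost to stay within scikit-learn, runs on synthetic data from `make_regression`, and the helper name `cv_score_for_k` is hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline


def cv_score_for_k(X, y, k, cv=4):
    """Mean negative MSE when RFE (keeping k features) is refit inside every CV fold."""
    pipe = Pipeline([
        # Selection step: fitted only on the training part of each fold.
        ("select", RFE(GradientBoostingRegressor(n_estimators=30, random_state=0),
                       n_features_to_select=k)),
        # Final model trained on the features kept by the selection step.
        ("model", GradientBoostingRegressor(n_estimators=30, random_state=0)),
    ])
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_squared_error")
    return scores.mean()


# Synthetic stand-in data: 6 features, 3 of them informative.
demo_X, demo_y = make_regression(n_samples=200, n_features=6, n_informative=3,
                                 noise=5.0, random_state=0)

demo_scores = {k: cv_score_for_k(demo_X, demo_y, k) for k in range(1, 7)}
best_k = max(demo_scores, key=demo_scores.get)  # highest negative MSE = lowest MSE
print(best_k, demo_scores[best_k])
```

Scores from this setup are usually somewhat worse than those obtained when selection is done once on the full dataset, but they are a more honest estimate of out-of-sample error.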

In [10]:
optimal_selected_features=['BSTAR', 'MEAN_MOTION_DOT', 'APOPERI_RATIO', 'DRAG_EFFECTIVE_AREA', 'PC2']

## Model Training

In [9]:
import os
import json
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
import autogluon.tabular as ag

