# Solution - Learning Satellite orbit from data through Perturbation forces (in RSW) - by PySR

In [51]:
import pysr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from src.utils import *
from src.propagation import *
import pickle

Load the CSV file with correct column extraction<br>
Assuming the first column is time, next three are position (r1, r2, r3), last three are velocity (v1, v2, v3)

In [52]:
df = pd.read_csv('./data/Challenge1.csv', header=0, usecols=[0,1,2,3,4,5,6], names=['time','r1', 'r2', 'r3', 'v1', 'v2', 'v3'])

Extract position and velocity arrays<br>
Convert time strings to pandas datetime

In [53]:
t_raw = pd.to_datetime(df['time'])
# Calculate seconds relative to the first time point
t = (t_raw - t_raw.iloc[0]).dt.total_seconds().values # Time in seconds relative to first time point
r = df[['r1', 'r2', 'r3']].values
v = df[['v1', 'v2', 'v3']].values
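The finite-difference step later in the notebook uses a single `dt = t[1] - t[0]`, which presumes uniformly sampled data. A minimal sketch of a check for that assumption, using synthetic timestamps in place of `df['time']` (the 60 s cadence and the helper name `is_uniform` are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

def is_uniform(t_seconds, rtol=1e-9):
    """Return True if consecutive samples are equally spaced (within rtol)."""
    steps = np.diff(t_seconds)
    return bool(np.allclose(steps, steps[0], rtol=rtol, atol=0.0))

# Synthetic stand-in for df['time']: 100 samples at an assumed 60 s cadence
stamps = pd.Timestamp("2025-01-01") + pd.to_timedelta(np.arange(100) * 60, unit="s")
t_raw_demo = pd.Series(stamps)
t_demo = (t_raw_demo - t_raw_demo.iloc[0]).dt.total_seconds().values

uniform_ok = is_uniform(t_demo)                 # evenly spaced grid
gappy_ok = is_uniform(np.delete(t_demo, 10))    # same grid with one dropped sample
```

If the check fails on the real data, the fixed-coefficient stencil used below would no longer apply as written.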

Copying in parameters (from `src/parameter.csv`)

In [54]:
mu = 3.986004414498200e14  # Central Body's gravitational constant (m^3/s^2) - from parameter.csv
Re = 6.378136460000000e6  # Central Body's equatorial radius (m) - from parameter.csv

**Here is an important step**:
To help the algorithm learn, it is often a good idea to non-dimensionalise / rescale your data so that every quantity is of order $\mathcal{O}(1)$.

Alternatively, you may want to use PySR's built-in support for dimensional analysis to aid the learning.

In [55]:
## Finding the typical time scale and length scale of the data
T_dim = np.sqrt(Re**3 / mu)  # Time unit for non-dimensionalization (s)
L_dim = Re  # Length unit for non-dimensionalization (m)
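As a quick check of what this scaling does: with lengths measured in `Re` and times in `sqrt(Re**3 / mu)`, the gravitational parameter becomes exactly 1, which is presumably why `1` is passed in place of `mu` to `RSW2equi_A` and `RSW2equi_b` further down. A small sketch reusing the constants above (`mu_nd` is a name introduced here for illustration):

```python
import numpy as np

mu = 3.986004414498200e14   # m^3/s^2, as in the notebook
Re = 6.378136460000000e6    # m, as in the notebook

T_dim = np.sqrt(Re**3 / mu)  # time unit (s), about 807 s
L_dim = Re                   # length unit (m)

# mu has units of length^3 / time^2, so rescale accordingly
mu_nd = mu * T_dim**2 / L_dim**3
print(f"T_dim = {T_dim:.1f} s, non-dimensional mu = {mu_nd:.12f}")
```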

Convert the r,v representation to equinoctial

In [56]:
equi = ijk2equinoctial(r,v,mu)

 **Non-dimensionalize equinoctial elements** (see comment above)

In [57]:
equi[..., 0] = equi[..., 0] / L_dim  # Only the first element of equinoctial elements has the dimension of length. 
# The rest are dimensionless
t = t / T_dim  # Non-dimensional time

Setting up a 10th-order accurate (11-point) central finite-difference matrix for the first derivative

In [58]:
from scipy.sparse import spdiags

In [59]:
N_pt = equi.shape[0]  # Number of points in the time series
dt = t[1] - t[0]
I  = spdiags(np.array([[0],[0],[0],[0],[0], [1],[0],[0],[0],[0],[0]]) @ np.ones((1,N_pt)) , np.array([0,1,2,3,4,5,6,7,8,9,10]), m = N_pt - 10, n = N_pt)
D1 = spdiags(np.array([[-1/1260], [5/504], [-5/84], [5/21], [-5/6], [0], [5/6], [-5/21], [5/84], [-5/504], [1/1260]]) @ np.ones((1,N_pt)) / dt, np.array([0,1,2,3,4,5,6,7,8,9,10]), m = N_pt - 10, n = N_pt)
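The eleven coefficients in `D1` are those of the 10th-order accurate central difference for the first derivative. As a sanity check, the sketch below applies the same stencil to `sin(t)` and compares it with the familiar 2nd-order central difference; the grid size and test function are arbitrary choices made for illustration.

```python
import numpy as np

# 11-point central-difference coefficients for d/dt (same values as in D1)
coeffs = np.array([-1/1260, 5/504, -5/84, 5/21, -5/6, 0,
                   5/6, -5/21, 5/84, -5/504, 1/1260])

N = 200
t_demo = np.linspace(0.0, 2.0 * np.pi, N)
dt_demo = t_demo[1] - t_demo[0]
y = np.sin(t_demo)

# Row i of D1 acts on samples i..i+10 and estimates the derivative at sample i+5
dy_11pt = np.array([coeffs @ y[i:i + 11] for i in range(N - 10)]) / dt_demo
exact = np.cos(t_demo[5:-5])

# Plain 2nd-order central difference at the same interior points
dy_3pt = (y[6:-4] - y[4:-6]) / (2.0 * dt_demo)

err_11pt = np.max(np.abs(dy_11pt - exact))
err_3pt = np.max(np.abs(dy_3pt - exact))
print(f"11-point error: {err_11pt:.2e}, 3-point error: {err_3pt:.2e}")
```

Because each row needs five samples on either side, the result is ten points shorter than the input, which is why `I` picks out the centre sample of each window.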

 Applying to equinoctial elements

In [60]:
d_equi_dt = D1 @ equi # Time derivative of equinoctial elements 
equi_ = I @ equi # The equinoctial elements value at the time point of the derivative

The equation that maps $F_R$, $F_S$ and $F_W$ to d(element)/dt can be written as:

    d (element)/ dt = A(element,mu) * F_RSW(element) + b(element,mu)

The functions A and b can be found in `propagation.py`
(hint: look at `EquinoctialPropagation.ipynb` or `KeplerianPropagation.ipynb` to see how the derivatives are computed from $F_R$, $F_S$ and $F_W$). Because the data have been non-dimensionalised, `mu = 1` is passed below.

In [61]:
A_ = np.stack([RSW2equi_A(e, 1) for e in equi_])
b_ = np.stack([RSW2equi_b(e, 1) for e in equi_])
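For orientation, here is a minimal sketch of what `A` and `b` look like in the textbook Gauss variational equations for modified equinoctial elements `(p, f, g, h, k, L)`. This is an assumption about the form, written from the standard equations; the function name `mee_A_b` is hypothetical, and the actual `RSW2equi_A` / `RSW2equi_b` in `propagation.py` may differ in conventions or ordering.

```python
import numpy as np

def mee_A_b(elem, mu):
    """Hypothetical helper: A (6x3) and b (6,) with d(elem)/dt = A @ F_RSW + b."""
    p, f, g, h, k, L = elem
    sL, cL = np.sin(L), np.cos(L)
    w = 1.0 + f * cL + g * sL
    s2 = 1.0 + h**2 + k**2          # the quantity called s_ in this notebook
    q = np.sqrt(p / mu)
    hk = h * sL - k * cL

    # Columns correspond to F_R, F_S, F_W
    A = q * np.array([
        [0.0, 2.0 * p / w,              0.0],                  # dp/dt
        [sL,  ((w + 1.0) * cL + f) / w, -g * hk / w],          # df/dt
        [-cL, ((w + 1.0) * sL + g) / w, f * hk / w],           # dg/dt
        [0.0, 0.0,                      s2 * cL / (2.0 * w)],  # dh/dt
        [0.0, 0.0,                      s2 * sL / (2.0 * w)],  # dk/dt
        [0.0, 0.0,                      hk / w],               # dL/dt
    ])
    # Only L changes in the unperturbed (Keplerian) problem
    b = np.array([0.0, 0.0, 0.0, 0.0, 0.0, np.sqrt(mu * p) * (w / p)**2])
    return A, b

# Circular, equatorial orbit at unit radius in non-dimensional units (mu = 1)
A_demo, b_demo = mee_A_b(np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.3]), 1.0)
```

In this form the first five entries of `b` vanish, so the first five rows alone relate the element rates to `F_RSW`.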

In [62]:
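F_RSW_equi = np.array([
    # Full-system alternative: np.linalg.pinv(A_[i]) @ (d_equi_dt[i] - b_[i])
    # Here the last row (the L equation) is dropped, so b_[i] is not subtracted;
    # this assumes b is zero in the first five rows.
    np.linalg.lstsq(A_[i][:-1,...], d_equi_dt[i][:-1], rcond=None)[0]
    for i in range(A_.shape[0])
])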
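The inversion above can be illustrated on synthetic data. The sketch below draws a random 6×3 matrix as a stand-in for one `A_[i]`, sets `b` to be non-zero only in its last entry (an assumption matching the usual equinoctial form, where only `L` has an unperturbed rate), and checks that least squares on the first five rows recovers the force that generated the rates.

```python
import numpy as np

rng = np.random.default_rng(0)

A_demo = rng.normal(size=(6, 3))                    # stand-in for A_[i]
b_demo = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])   # assumed: only the L row is non-zero
F_true = np.array([1e-3, -2e-3, 5e-4])              # arbitrary RSW force

rates = A_demo @ F_true + b_demo                    # plays the role of d_equi_dt[i]

# Drop the last row so that b does not need to be subtracted
F_rec, *_ = np.linalg.lstsq(A_demo[:-1], rates[:-1], rcond=None)
```

With noise-free rates the five remaining equations are consistent, so the least-squares solution coincides with `F_true`; with finite-difference error it becomes a best fit instead.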

In [63]:
print("F_RSW_equi shape:", F_RSW_equi.shape) 
print("equi_ shape:", equi_.shape)
print("d_equi_dt shape:", d_equi_dt.shape)

F_RSW_equi shape: (17271, 3)
equi_ shape: (17271, 6)
d_equi_dt shape: (17271, 6)


In [64]:
FR_tar = F_RSW_equi[:,0] 
FS_tar = F_RSW_equi[:,1]
FW_tar = F_RSW_equi[:,2]
RHS_in = equi_ 

Calculate extra inputs from `equi_` (the equinoctial elements at the centre of each finite-difference stencil), starting by unpacking the six elements

In [65]:
p_ = equi_[:, 0]        
f_ = equi_[:, 1]        
g_ = equi_[:, 2]        
h_ = equi_[:, 3]        
k_ = equi_[:, 4]       
L_ = equi_[:, 5]        

While PySR can directly learn formulas for $F_R$, $F_S$, $F_W$ as functions of ($p$,$f$,$g$,$h$,$k$,$L$), it is often more helpful to incorporate prior knowledge to aid the learning.

The cells below define a few functions of ($p$,$f$,$g$,$h$,$k$,$L$) that are going to be helpful.

In [66]:
s_ = 1 + h_**2 + k_**2  # s = 1 + h^2 + k^2 = sec^2(i/2)
w_ = 1 + f_ * np.cos(L_) + g_ * np.sin(L_)  # w = 1 + f*cos(L) + g*sin(L)
r_val_ = p_ / w_         # r = p / w

In [67]:
sin_u_sin_i_ = 2 * (h_ * np.sin(L_) - k_ * np.cos(L_)) / s_      # sin(u)*sin(i)
cos_u_sin_i_ = 2 * (h_ * np.cos(L_) + k_ * np.sin(L_)) / s_      # cos(u)*sin(i)
cos_i_ = (1 - h_**2 - k_**2) / s_                                # cos(i)
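These identities can be checked numerically from classical elements, assuming the usual definitions `h = tan(i/2) cos(Ω)`, `k = tan(i/2) sin(Ω)`, `L = Ω + ω + ν` and argument of latitude `u = ω + ν`. The angle values below are arbitrary.

```python
import numpy as np

# Arbitrary classical elements (radians)
inc, raan, argp, nu = 0.9, 1.1, 0.4, 2.0

h = np.tan(inc / 2) * np.cos(raan)
k = np.tan(inc / 2) * np.sin(raan)
L = raan + argp + nu
u = argp + nu                      # argument of latitude

s = 1 + h**2 + k**2
sin_u_sin_i = 2 * (h * np.sin(L) - k * np.cos(L)) / s
cos_u_sin_i = 2 * (h * np.cos(L) + k * np.sin(L)) / s
cos_i = (1 - h**2 - k**2) / s

# Compare with the same quantities computed directly from i and u
direct = (np.sin(u) * np.sin(inc), np.cos(u) * np.sin(inc), np.cos(inc))
```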

Stack all features together for PySR input

In [None]:
RHS_in = np.column_stack([
    r_val_**(-4), sin_u_sin_i_,
    cos_u_sin_i_, cos_i_
])
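# You may also want to try a larger feature set, e.g.,
# RHS_in = np.column_stack([p_, f_, g_, h_, k_, L_, s_, w_, r_val_, sin_u_sin_i_, cos_u_sin_i_, cos_i_])
# (scalars such as mu and Re cannot be stacked with the arrays, and a constant column carries no information)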
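PySR labels features `x0`, `x1`, ... in column order, so in the outputs below `x0 = r^-4`, `x1 = sin(u) sin(i)`, `x2 = cos(u) sin(i)` and `x3 = cos(i)`. One way to keep track of that mapping is to build the feature matrix as a labelled `DataFrame`; PySR should then use the column names in the printed equations (worth confirming against the PySR documentation, which also describes a `variable_names` argument to `fit`). The column names and the random placeholder data below are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5

# Placeholder arrays standing in for r_val_, sin_u_sin_i_, cos_u_sin_i_, cos_i_
r_demo = rng.uniform(1.0, 1.2, n)
su_si, cu_si, ci = rng.uniform(-1, 1, (3, n))

feature_names = ["r_inv4", "sin_u_sin_i", "cos_u_sin_i", "cos_i"]  # illustrative names
X_named = pd.DataFrame(
    np.column_stack([r_demo**(-4), su_si, cu_si, ci]),
    columns=feature_names,
)

# Mapping from PySR's default labels to the physical quantities
default_labels = {f"x{j}": name for j, name in enumerate(feature_names)}
```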

Set up PySR symbolic regression model parameters (control in one place)

In [69]:
pysr_params = dict(
    niterations=200,                      # Number of iterations for symbolic search
    binary_operators=["+","*"], # Allowed binary operators
    # unary_operators=["sin", "cos", "sqrt"], # Allowed unary operators
    populations=5,                        # Number of populations
    population_size=50,                   # Population size per generation
    maxsize=25,                            # Maximum expression size
    maxdepth=4,                            # Maximum expression tree depth
)

Model for FR

In [70]:
model_FR = pysr.PySRRegressor(**pysr_params)
model_FR.fit(RHS_in, FR_tar)
print("Best equations for FR:")
print(model_FR)

Compiling Julia backend...
[ Info: Note: you are running with more than 10,000 datapoints. You should consider turning on batching (`options.batching`), and also if you need that many datapoints. Unless you have a large amount of noise (in which case you should smooth your dataset first), generally < 10,000 datapoints is enough to find a functional form.
[ Info: Started!




[ Info: Final population:
[ Info: Results saved to:


Best equations for FR:
PySRRegressor.equations_ = [
	   pick      score                                           equation  \
	0         0.000000                                         0.03210611   
	1         0.274561                                            x0 * x0   
	2         0.407405                                     (x1 * x0) * x1   
	3         0.666627               ((x3 * -12.167491) + -4.436855) * x0   
	4         0.522903                ((x1 * x1) + x3) * (x0 * 2.6445718)   
	5  >>>>  12.893233           (x0 * -0.75) + ((x1 * x0) * (x1 * 2.25))   
	6         0.000252  ((x0 * -0.75) + 3.696897e-9) + ((x1 * x0) * (x...   
	
	           loss  complexity  
	0  1.661541e-02           1  
	1  9.594684e-03           3  
	2  4.247793e-03           5  
	3  1.119795e-03           7  
	4  3.935053e-04           9  
	5  2.489033e-15          11  
	6  2.487776e-15          13  
]
  - outputs/20250924_164928_cKYzBW/hall_of_fame.csv
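The log above suggests using fewer than 10,000 points or turning on batching. A minimal sketch of the first option is random subsampling of the rows before calling `fit`; the sample size of 5,000, the seed, and the placeholder arrays are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholders with the same shapes as RHS_in and FR_tar in this notebook
X_full = rng.normal(size=(17271, 4))
y_full = rng.normal(size=17271)

n_sub = 5000
idx = rng.choice(X_full.shape[0], size=n_sub, replace=False)  # rows without repetition
X_sub, y_sub = X_full[idx], y_full[idx]
# X_sub and y_sub would then be passed to model.fit in place of the full arrays
```

Alternatively, `PySRRegressor` has a `batching` option (with `batch_size`) that evaluates the loss on random mini-batches instead.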


Model for FS

In [71]:
model_FS = pysr.PySRRegressor(**pysr_params)
model_FS.fit(RHS_in, FS_tar)
print("Best equations for FS:")
print(model_FS)

[ Info: Note: you are running with more than 10,000 datapoints. You should consider turning on batching (`options.batching`), and also if you need that many datapoints. Unless you have a large amount of noise (in which case you should smooth your dataset first), generally < 10,000 datapoints is enough to find a functional form.
[ Info: Started!




[ Info: Final population:
[ Info: Results saved to:



Expressions evaluated per second: 3.620e+04
Progress: 884 / 1000 total iterations (88.400%)
════════════════════════════════════════════════════════════════════════════════════════════════════
───────────────────────────────────────────────────────────────────────────────────────────────────
Complexity  Loss       Score      Equation
1           6.858e-03  0.000e+00  y = -0.00010568
3           6.847e-03  7.743e-04  y = x₂ * 0.0049519
5           3.713e-03  3.060e-01  y = (x₂ * x₁) * -0.18369
7           1.516e-16  1.541e+01  y = (x₀ * (x₁ * x₂)) * -1.5
15          1.499e-16  1.458e-03  y = ((x₁ * -1.2265) * (x₀ * x₂)) + ((x₁ * x₀) * (x₂ * -0.2...
                                      7352))
───────────────────────────────────────────────────────────────────────────────────────────────────
════════════════════════════════════════════════════════════════════════════════════════════════════


Model for FW

In [72]:
model_FW = pysr.PySRRegressor(**pysr_params)
model_FW.fit(RHS_in, FW_tar)
print("Best equations for FW:")
print(model_FW)

[ Info: Note: you are running with more than 10,000 datapoints. You should consider turning on batching (`options.batching`), and also if you need that many datapoints. Unless you have a large amount of noise (in which case you should smooth your dataset first), generally < 10,000 datapoints is enough to find a functional form.
[ Info: Started!




[ Info: Final population:
[ Info: Results saved to:


Best equations for FW:
PySRRegressor.equations_ = [
	   pick      score                 equation          loss  complexity
	0         0.000000            -0.0045700823  6.978936e-03           1
	1         0.526796                  x1 * x0  2.433437e-03           3
	2         1.952038    x0 * (x1 * 0.6306385)  4.905707e-05           5
	3  >>>>  12.987334  ((x3 * x0) * x1) * -1.5  2.570671e-16           7
]
  - outputs/20250924_165058_ZVzksr/hall_of_fame.csv
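Reading off the lowest-complexity expressions that reach near-zero loss in the three outputs above, and substituting the feature definitions (`x0 = r^-4`, `x1 = sin(u) sin(i)`, `x2 = cos(u) sin(i)`, `x3 = cos(i)`), the learnt forces can be written as ordinary NumPy functions. The function names are hypothetical and the coefficients are transcribed from the printed equations. The radial component factorises as `-0.75 * r^-4 * (1 - 3 sin²(u) sin²(i))`; together with the other two components this has the same functional form as an oblateness (J2-type) perturbation, although that reading is an interpretation rather than something the notebook states.

```python
import numpy as np

def FR_learnt(x0, x1):
    # (x0 * -0.75) + ((x1 * x0) * (x1 * 2.25))
    return -0.75 * x0 + 2.25 * x0 * x1**2

def FS_learnt(x0, x1, x2):
    # (x0 * (x1 * x2)) * -1.5
    return -1.5 * x0 * x1 * x2

def FW_learnt(x0, x1, x3):
    # ((x3 * x0) * x1) * -1.5
    return -1.5 * x3 * x0 * x1

# Check the factorised form of FR on random inputs
rng = np.random.default_rng(1)
x0, x1 = rng.uniform(0.5, 1.0, 10), rng.uniform(-1, 1, 10)
FR_factored = -0.75 * x0 * (1 - 3 * x1**2)
```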


Save trained models for later use

In [None]:
with open("PySRmodel_FR.pkl", "wb") as f:
    pickle.dump(model_FR, f)
with open("PySRmodel_FS.pkl", "wb") as f:
    pickle.dump(model_FS, f)
with open("PySRmodel_FW.pkl", "wb") as f:
    pickle.dump(model_FW, f)

To compare the result with the correct answer, load the saved models in `EquinoctialPropagation.ipynb` and propagate the orbit with the learnt forces.
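One detail to keep in mind when doing so: the models were fitted to non-dimensional inputs and targets. If the propagation is carried out in SI units (an assumption about `EquinoctialPropagation.ipynb`, which is not shown here), the feature `r^-4` has to be built from `r / L_dim` and the predicted forces multiplied by `L_dim / T_dim**2`. A minimal sketch, with a hypothetical helper name:

```python
import numpy as np

mu = 3.986004414498200e14   # m^3/s^2
Re = 6.378136460000000e6    # m
T_dim = np.sqrt(Re**3 / mu)
L_dim = Re

def force_to_si(F_nd):
    """Hypothetical helper: non-dimensional acceleration -> m/s^2."""
    return np.asarray(F_nd) * L_dim / T_dim**2

# A non-dimensional acceleration of 1 corresponds to mu / Re^2
one_unit_si = force_to_si(1.0)
```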