# Project Direction -- Connecting the Dots

After exploring both the CDM training data and the live TLE orbital catalog, here's where I'm at with this project. Putting my thoughts together before I start building models.

## The problem

There are ~30,000 tracked objects in Earth orbit. Most are defunct satellites, rocket bodies, and fragmentation debris from collisions, explosions, and anti-satellite tests. When two objects are predicted to pass close to each other, the tracking network generates conjunction data messages (CDMs) -- a series of increasingly refined predictions as the time of closest approach (TCA) gets nearer.

The question is: **given a sequence of CDMs for a conjunction event, can we predict whether it's genuinely dangerous (requiring an avoidance maneuver) or a safe pass?**

This matters because:
- Operators get thousands of conjunction warnings per week
- Most are false alarms -- the objects pass safely
- But maneuvers are expensive (fuel, mission disruption, other conjunction risks)
- Missing a real threat could be catastrophic: one collision creates thousands of new fragments and raises the risk of a debris cascade (Kessler syndrome)

## What we're working with

**Training data (ESA Kelvins CDM Dataset)**:
- ~162k CDM records from ~13k conjunction events (2015-2019)
- 103 features per CDM including orbital elements, miss distance, covariance matrices
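- Risk given per CDM as log10 of the computed collision probability, so a binary high-risk label has to be derived by thresholding the final risk (the challenge used -6, i.e. a probability of 1e-6)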
- Heavily imbalanced (maybe ~2% high-risk events)
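
A quick way to check that imbalance estimate is to label each event from its final CDM and count positives. This is a minimal sketch, not the dataset's official labeling: it assumes each row carries `event_id`, `time_to_tca` (days), and `risk` (treated as log10 collision probability), and the -6 cutoff is an assumption to verify against the challenge description. The rows are made-up toy values.

```python
HIGH_RISK_LOG10 = -6.0  # assumed cutoff on log10(collision probability)

def event_labels(cdms):
    """Label each event by the risk on its last CDM (smallest time_to_tca)."""
    last = {}
    for row in cdms:
        eid = row["event_id"]
        if eid not in last or row["time_to_tca"] < last[eid]["time_to_tca"]:
            last[eid] = row
    return {eid: int(r["risk"] >= HIGH_RISK_LOG10) for eid, r in last.items()}

def positive_rate(labels):
    """Fraction of events labeled high-risk."""
    return sum(labels.values()) / len(labels)

# Toy rows (illustrative values only, not real CDMs)
toy_cdms = [
    {"event_id": 1, "time_to_tca": 5.0, "risk": -7.2},
    {"event_id": 1, "time_to_tca": 1.0, "risk": -5.1},
    {"event_id": 2, "time_to_tca": 4.0, "risk": -5.5},
    {"event_id": 2, "time_to_tca": 0.5, "risk": -30.0},
]
toy_labels = event_labels(toy_cdms)
toy_rate = positive_rate(toy_labels)
```

Note that event 2 looks risky early on and then drops away, which is exactly the false-alarm pattern the models need to learn.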

**Live data (CelesTrak TLEs)**:
- 27,000+ tracked objects available in JSON format, no API key
- Updated every 2 hours
- 17 orbital parameters per object
- Can compute any object's position at any time using SGP4 propagation
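
Full position propagation needs an SGP4 library, but a rough altitude per object falls straight out of the mean motion via Kepler's third law. The sketch below assumes a CelesTrak-style JSON record with a `MEAN_MOTION` field in revolutions per day; the record itself is a made-up example. TLE mean motion is a mean element intended for SGP4, so treat the result as an approximation that is good enough for binning objects into shells.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter
EARTH_RADIUS_KM = 6378.137     # equatorial radius

def mean_altitude_km(mean_motion_rev_per_day):
    """Approximate mean altitude from mean motion using Kepler's third law."""
    n_rad_s = mean_motion_rev_per_day * 2.0 * math.pi / 86400.0
    semi_major_axis_km = (MU_EARTH_KM3_S2 / n_rad_s ** 2) ** (1.0 / 3.0)
    return semi_major_axis_km - EARTH_RADIUS_KM

# Hypothetical record shaped like one entry of the JSON catalog
record = {"OBJECT_NAME": "EXAMPLE SAT", "NORAD_CAT_ID": 99999, "MEAN_MOTION": 15.5}
altitude_km = mean_altitude_km(record["MEAN_MOTION"])
print(f"{record['OBJECT_NAME']}: ~{altitude_km:.0f} km")
```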

**Satellite metadata (UCS Database)**:
- 7,500+ active satellites with operator, purpose, country, mass

## Three models I'm planning

### 1. Naive baseline -- orbital shell density prior

The simplest possible predictor: just look at what altitude the conjunction happens at. Busier altitude bands (like ~550 km where Starlink operates, ~780 km where the Iridium-Cosmos collision debris clusters, or ~850 km where the Fengyun-1C debris clusters) have higher baseline risk.

This should be terrible at distinguishing individual events but it sets the floor. If our ML models can't beat "just guess the average for this altitude", something is wrong.
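
Here's a minimal sketch of what that baseline could look like: bin events by altitude, take the high-risk rate per bin, and shrink sparse bins toward the global rate. The 50 km bin width and the `prior_strength` smoothing parameter are assumptions of mine, and the altitude per event is assumed to be computed already.

```python
from collections import defaultdict

def fit_shell_prior(altitudes_km, labels, bin_km=50.0, prior_strength=10.0):
    """Per-shell high-risk rate, shrunk toward the global rate for sparse shells."""
    global_rate = sum(labels) / len(labels)
    counts = defaultdict(lambda: [0, 0])  # shell index -> [positives, total]
    for alt, y in zip(altitudes_km, labels):
        shell = int(alt // bin_km)
        counts[shell][0] += y
        counts[shell][1] += 1
    rates = {
        shell: (pos + prior_strength * global_rate) / (n + prior_strength)
        for shell, (pos, n) in counts.items()
    }
    return rates, global_rate

def predict_shell_prior(alt_km, rates, global_rate, bin_km=50.0):
    """Look up the shell rate; fall back to the global rate for unseen shells."""
    return rates.get(int(alt_km // bin_km), global_rate)

# Toy data (illustrative only)
toy_alts = [540.0, 560.0, 555.0, 780.0, 790.0, 1200.0]
toy_y = [1, 0, 0, 1, 1, 0]
rates, global_rate = fit_shell_prior(toy_alts, toy_y, prior_strength=2.0)
p_780 = predict_shell_prior(785.0, rates, global_rate)
p_unseen = predict_shell_prior(300.0, rates, global_rate)
```

The shrinkage matters because with ~2% positives, most shells will have only a handful of high-risk events and the raw per-bin rate would be very noisy.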

### 2. XGBoost on engineered features

Take the latest CDM in each sequence plus some trend features (is miss distance converging? is uncertainty shrinking?) and throw it at gradient boosting. Gradient-boosted trees were reportedly a common choice among the stronger ESA Kelvins challenge entries, so it should be competitive.

Key features I want to engineer:
- Latest miss distance + trend over last N CDMs
- Covariance shrink rate (uncertainty should decrease as TCA approaches)
- Relative velocity at closest approach
- Object types (payload vs debris -- debris can't maneuver)
- Time to TCA at prediction time
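
The features above could be computed per event roughly like this. It's a sketch under assumptions: the column names (`time_to_tca`, `miss_distance`, `relative_speed`, `c_sigma_r`, `c_object_type`) and the `"DEBRIS"` category value are what I expect from the dataset and need checking against the actual CSV. I'm using the chaser's radial position sigma as a stand-in for "covariance size".

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs; 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0.0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var

def event_features(cdms, last_n=3):
    """Flatten one event's CDM list into a single feature dict."""
    # Sort oldest first: time_to_tca decreases as TCA approaches
    seq = sorted(cdms, key=lambda r: r["time_to_tca"], reverse=True)
    tail = seq[-last_n:]
    latest = seq[-1]
    # Negate time_to_tca so the time axis increases toward TCA
    t = [-r["time_to_tca"] for r in tail]
    return {
        "miss_distance_latest": latest["miss_distance"],
        "miss_distance_trend": slope(t, [r["miss_distance"] for r in tail]),
        "sigma_r_shrink_rate": -slope(t, [r["c_sigma_r"] for r in tail]),
        "relative_speed": latest["relative_speed"],
        "chaser_is_debris": int(latest["c_object_type"] == "DEBRIS"),
        "time_to_tca_latest": latest["time_to_tca"],
        "n_cdms": len(seq),
    }

# Toy event (illustrative values only)
toy_event = [
    {"time_to_tca": 1.0, "miss_distance": 500.0, "c_sigma_r": 20.0,
     "relative_speed": 14000.0, "c_object_type": "DEBRIS"},
    {"time_to_tca": 3.0, "miss_distance": 900.0, "c_sigma_r": 60.0,
     "relative_speed": 14000.0, "c_object_type": "DEBRIS"},
    {"time_to_tca": 2.0, "miss_distance": 700.0, "c_sigma_r": 40.0,
     "relative_speed": 14000.0, "c_object_type": "DEBRIS"},
]
toy_features = event_features(toy_event)
```

A negative `miss_distance_trend` means the predicted miss is shrinking as TCA approaches; a positive `sigma_r_shrink_rate` means uncertainty is tightening, which is the healthy case.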

### 3. Physics-Informed Temporal Fusion Transformer

This is the ambitious one. Instead of flattening each event into a single feature vector, treat the CDM sequence as a time series and let a Transformer attend over it. The key innovation is adding a physics constraint to the loss function. Since the constraint is about distance, this means the model outputs a predicted miss distance alongside the risk score, and it gets penalized if that miss distance violates orbital mechanics.

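The physics constraint works like this: given two objects' orbital elements, there's a minimum orbit intersection distance (MOID), the closest the two orbital paths ever come to each other based on geometry alone. As long as neither orbit changes, the actual miss distance can never be less than the MOID. So if the model predicts miss < MOID, that's physically impossible and it should be penalized. One caveat: drag and other perturbations slowly shift the orbits, and the elements themselves are uncertain, so the MOID is better treated as a soft bound than a hard one.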

This should help the model generalize better, especially on rare high-risk events where data is scarce. The physics constrains the hypothesis space so the model can't learn nonsense patterns.
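
One simple form the penalty could take is a squared hinge on predictions that fall below the MOID, added to the usual classification loss. This is a plain-Python sketch of the arithmetic only; it assumes the model has a miss-distance output, that a MOID per event is computed elsewhere, and `weight` is a hypothetical hyperparameter. In the real training loop the same expression would be written with tensor ops so gradients flow through it.

```python
def moid_penalty(pred_miss_m, moid_m, weight=1.0):
    """Mean squared hinge: nonzero only where predicted miss < MOID."""
    violations = [max(0.0, moid - pred) for pred, moid in zip(pred_miss_m, moid_m)]
    return weight * sum(v * v for v in violations) / len(violations)

def total_loss(task_loss, pred_miss_m, moid_m, weight=1.0):
    """Task loss (e.g. cross-entropy on the risk label) plus the physics term."""
    return task_loss + moid_penalty(pred_miss_m, moid_m, weight)

# Toy batch: the first prediction respects the bound, the second undershoots by 20 m
toy_pred = [500.0, 80.0]
toy_moid = [100.0, 100.0]
toy_penalty = moid_penalty(toy_pred, toy_moid)
toy_total = total_loss(0.5, toy_pred, toy_moid, weight=0.001)
```

Miss distances span several orders of magnitude, so in practice the penalty probably needs to be computed on a log or normalized scale to avoid swamping the task loss.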

## The experiment -- data staleness

Here's the part I think could actually be published. Satellite operators need to decide how often to request fresh tracking data. Fresh data is better but costs resources (radar time, bandwidth, compute). So the question is:

**How stale can your data be before conjunction predictions become unreliable?**

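I can simulate this by truncating CDM sequences at different time horizons -- keeping only the CDMs issued at least 6 hours before TCA, at least 1 day before, at least 3 days before, etc. Then measure how each model's performance degrades as the cutoff moves earlier. With only ~2% positives, raw accuracy would hide most of the effect, so an imbalance-aware metric such as precision-recall AUC is the better yardstick.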
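
The truncation itself is a small filter on `time_to_tca`. A sketch, assuming `time_to_tca` is in days and events are stored as a dict of CDM lists; the horizon values just mirror the ones mentioned above.

```python
HORIZONS_DAYS = [0.25, 1.0, 3.0]  # 6 hours, 1 day, 3 days

def truncate_event(cdms, min_days_before_tca):
    """Keep only CDMs issued at least `min_days_before_tca` days ahead of TCA."""
    return [r for r in cdms if r["time_to_tca"] >= min_days_before_tca]

def staleness_views(events, horizons=HORIZONS_DAYS):
    """Map each horizon to its truncated events, dropping events left with no CDMs."""
    views = {}
    for horizon in horizons:
        kept_events = {}
        for event_id, cdms in events.items():
            kept = truncate_event(cdms, horizon)
            if kept:
                kept_events[event_id] = kept
        views[horizon] = kept_events
    return views

# Toy events (illustrative only)
toy_events = {
    1: [{"time_to_tca": 5.0}, {"time_to_tca": 2.0}, {"time_to_tca": 0.5}],
    2: [{"time_to_tca": 0.8}, {"time_to_tca": 0.1}],
}
toy_views = staleness_views(toy_events)
```

Two things to watch: the label for each event still has to come from the final CDM of the full sequence, not the truncated one, and the set of surviving events shrinks at longer horizons, so comparisons should be made on the events common to all horizons.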

I'm also planning a secondary experiment: does the physics constraint in the Transformer help? Compare the exact same architecture with and without the physics-informed loss. I expect the constrained model to degrade more gracefully with stale data because physics doesn't change -- even with old data, the orbital mechanics constraints are still valid.

## The app

For the interactive application, I want to build a 3D globe showing all tracked objects in real time. This is the visual hook that makes the problem tangible -- you see the actual cloud of debris swarming around Earth.

Plan:
- **3D Earth with orbiting objects** (React Three Fiber + satellite.js)
  - Color-coded by type: green (active), yellow (rocket body), red (debris)
  - Click any object to see its info and upcoming conjunctions
  - Time slider to propagate forward/backward
- **Conjunction alert dashboard** (top-10 highest risk pairs)
- **Risk heatmap by altitude** (which shells are most crowded)
- **Model comparison view** (side-by-side predictions from all three models)

The orbital propagation happens entirely client-side in the browser using satellite.js -- no server needed for the visualization. The server is only called when you want ML predictions on a specific conjunction pair.

Hosting is free using the same infrastructure pattern from a previous project:
- React frontend on GitHub Pages
- FastAPI backend on GitHub Actions with Cloudflare tunnel
- Model weights on HuggingFace Hub
- Dual ping-pong servers for continuous uptime

## Implementation plan

Rough order of operations:

1. **Data pipeline** -- load CDMs, engineer features, build sequence datasets
2. **Baseline + XGBoost** -- get a working prediction pipeline end to end
3. **Transformer** -- build the PI-TFT, train overnight on GPU
4. **Experiments** -- staleness analysis + physics ablation
5. **Backend API** -- FastAPI with all three models
6. **Frontend 3D globe** -- the visual centerpiece
7. **Dashboard panels** -- alerts, charts, filters
8. **Deployment** -- GitHub Actions, Pages, HuggingFace
9. **Polish** -- mobile responsiveness, error handling, documentation

Let's get started.