
# 🛰️ Satellite Telemetry Data Analysis Tool (Jupyter Notebook)

**Goal:** Parse, clean, visualize, and generate simple status reports from satellite-like telemetry data.

**Dataset options:**  
- **Recommended (real telemetry):** *ESA OPS-SAT 1 Re-entry UHF Telemetry* (CSV inside the ZIP). After downloading, set `data_file` to the extracted CSV file path.  
- **Quick start (no download):** This notebook includes a fallback **synthetic dataset** at `/mnt/data/telemetry_example.csv`.

**What you'll learn/do:**
1. Load and inspect telemetry data
2. Clean timestamps & handle missing values
3. Plot key health parameters (voltage, temperature, current)
4. Detect simple anomalies (rule-based + rolling z-score)
5. Generate a daily status summary and export reports

> Tip: Run each cell from top to bottom. If you don’t have the real dataset yet, the notebook will automatically use the synthetic CSV.



## 0) Setup

Install the dependencies if they are missing from your environment:

```bash
pip install pandas matplotlib numpy
```


In [None]:

# Imports and display options
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)



## 1) Choose your data file

- If you downloaded **OPS-SAT 1 UHF telemetry** (CSV), set `data_file` to its path.  
- Otherwise, leave it as `None` and the notebook will use the built-in synthetic dataset.


In [None]:

# Path to your real telemetry CSV (set this if you downloaded OPS-SAT 1 data)
data_file = None  # e.g., data_file = "/home/you/downloads/uhf_telemetry.csv"

# Fallback synthetic data shipped with this project (works out of the box)
fallback_file = "/mnt/data/telemetry_example.csv"

csv_path = Path(data_file) if data_file else Path(fallback_file)
print(f"Using: {csv_path}")
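If the fallback CSV is not present in your environment, a stand-in file can be generated first. The sketch below is one possible generator, not the dataset shipped with this project: the function name `make_synthetic_telemetry`, the column names, the value ranges, and the 95-sample period are all illustrative assumptions, chosen so that the `volt`/`temp`/`curr`/`rssi` name matching in section 3 would pick the columns up.

```python
import numpy as np
import pandas as pd

def make_synthetic_telemetry(n_samples=1440, freq="1min", seed=0):
    """Build a small satellite-like telemetry table (all values are made up)."""
    rng = np.random.default_rng(seed)
    t = pd.date_range("2024-01-01", periods=n_samples, freq=freq, tz="UTC")
    # Slow periodic variation, loosely imitating an orbit-driven cycle (assumed period)
    phase = 2 * np.pi * np.arange(n_samples) / 95.0
    df = pd.DataFrame({
        "timestamp": t,
        "battery_voltage": 7.4 + 0.3 * np.sin(phase) + rng.normal(0, 0.02, n_samples),
        "board_temp": 20.0 + 8.0 * np.sin(phase + 1.0) + rng.normal(0, 0.3, n_samples),
        "bus_current": 0.5 + 0.1 * np.cos(phase) + rng.normal(0, 0.01, n_samples),
        "rssi": -95.0 + rng.normal(0, 2.0, n_samples),
    })
    # Knock out about 2% of one signal so the cleaning step has gaps to handle
    missing = rng.choice(n_samples, size=n_samples // 50, replace=False)
    df.loc[missing, "board_temp"] = np.nan
    return df

synthetic = make_synthetic_telemetry()
print(synthetic.head())
```

To use it as the fallback, write it out once with `synthetic.to_csv(fallback_file, index=False)` before running the load cell.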



## 2) Load and preview data


In [None]:

df = pd.read_csv(csv_path)
print("Shape:", df.shape)
display(df.head())
display(df.describe(include='all'))
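The recommended dataset ships as a CSV inside a ZIP. Extracting it works, but the archive can also be read in place. This is a minimal sketch, assuming the archive holds at least one `.csv` member. The helper name `read_first_csv_from_zip` and the demo file names are hypothetical, and the real archive's layout may differ.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

def read_first_csv_from_zip(zip_path):
    """Load the alphabetically first .csv member of a ZIP archive into a DataFrame."""
    with zipfile.ZipFile(zip_path) as zf:
        csv_names = sorted(n for n in zf.namelist() if n.lower().endswith(".csv"))
        if not csv_names:
            raise FileNotFoundError(f"No CSV member found in {zip_path}")
        with zf.open(csv_names[0]) as fh:
            return pd.read_csv(fh)

# Demo with a throwaway archive so the sketch runs without any download
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo_telemetry.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("uhf_telemetry.csv", "timestamp,battery_voltage\n2024-01-01T00:00:00Z,7.4\n")
    demo_df = read_first_csv_from_zip(demo_zip)

print(demo_df)
```

If the archive contains several CSV files, inspect `zf.namelist()` first and pick the member explicitly instead of relying on alphabetical order.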



## 3) Parse timestamps and basic cleaning
- Convert timestamp to `datetime`
- Drop duplicate rows
- Sort by time and set index
- Optional: interpolate small gaps


In [None]:

# Try common timestamp column names
time_col_candidates = [c for c in df.columns if c.lower() in ["timestamp", "time", "datetime", "date_time"]]
if not time_col_candidates:
    raise ValueError("No obvious timestamp column found. Please rename your time column to 'timestamp'.")

time_col = time_col_candidates[0]
df[time_col] = pd.to_datetime(df[time_col], errors='coerce', utc=True)

# Drop bad timestamps
df = df.dropna(subset=[time_col]).copy()
df = df.drop_duplicates().sort_values(time_col).reset_index(drop=True)

# Set index
df = df.set_index(time_col)

# Auto-detect telemetry columns by case-insensitive substring match on the column name
def pick(colset):
    cols = []
    for name in colset:
        cols.extend([c for c in df.columns if name in c.lower()])
    return list(dict.fromkeys(cols))  # de-duplicate while preserving order

voltage_cols = pick(["volt"])
temp_cols    = pick(["temp"])
curr_cols    = pick(["curr"])
rssi_cols    = pick(["rssi"])

numeric_cols = list(dict.fromkeys([*voltage_cols, *temp_cols, *curr_cols, *rssi_cols]))
numeric_cols = [c for c in numeric_cols if pd.api.types.is_numeric_dtype(df[c])]

print("Detected numeric telemetry columns:", numeric_cols)

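# Interpolate short interior gaps (optional): fills at most 5 consecutive missing
# samples per gap and leaves leading/trailing NaNs untouched, so long outages
# stay visible as missing data instead of being filled in
if numeric_cols:
    df[numeric_cols] = df[numeric_cols].interpolate(limit=5, limit_area="inside")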

# The index must be a DatetimeIndex before any time-based resampling
if not isinstance(df.index, pd.DatetimeIndex):
    raise TypeError("Index must be datetime after parsing.")

# Resample to a uniform 1-minute timeline only if the data is high-rate
# (median sample spacing under 30 s); otherwise keep the original sampling
if df.index.to_series().diff().median() < pd.Timedelta("30s"):
    numeric_part = df[numeric_cols].resample("1min").mean()
    other_part = df.drop(columns=numeric_cols).resample("1min").first()
    df_res = numeric_part.join(other_part)
else:
    df_res = df.copy()

print("Final shape after cleaning/resampling:", df_res.shape)
display(df_res.head())
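How much of a gap actually gets filled depends on `limit`, `limit_area`, and any fill applied afterwards. The toy series below (values are made up) compares two strategies: interpolation followed by `bfill()`/`ffill()`, which leaves no NaNs at all, and interpolation restricted to interior gaps, which fills only the first `limit` samples of each gap. It is a sketch for building intuition, not a recommendation for any particular dataset.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=12, freq="1min", tz="UTC")
s = pd.Series(
    [1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, 8.0, 9.0, np.nan, np.nan, np.nan],
    index=idx,
    name="battery_voltage",
)

# Strategy A: interpolate up to 2 samples per gap, then fill everything that is left
filled_all = s.interpolate(limit=2).bfill().ffill()

# Strategy B: interpolate up to 2 samples per gap, interior gaps only, no extra fill
filled_short = s.interpolate(limit=2, limit_area="inside")

comparison = pd.DataFrame({"raw": s, "fill_all": filled_all, "short_only": filled_short})
print(comparison)
print("NaNs remaining:", comparison.isna().sum().to_dict())
```

Strategy A produces a complete series but turns the tail of the long gap into a flat step copied from the next valid sample. Strategy B keeps that stretch, and the trailing outage, marked as missing.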



## 4) Plot key parameters (one chart per variable)
> Note: Each signal gets its own figure, drawn with Matplotlib's default colors.


In [None]:

def plot_series(series, title):
    plt.figure(figsize=(10, 4))
    series.plot()
    plt.title(title)
    plt.xlabel("Time (UTC)")
    plt.ylabel(series.name)
    plt.tight_layout()
    plt.show()

for col in numeric_cols[:6]:  # plot at most the first six detected signals
    plot_series(df_res[col].dropna(), f"{col} over time")



## 5) Simple anomaly detection
Two approaches:
1. **Rule-based thresholds** (set plausible min/max per signal)  
2. **Rolling z-score** (flag points deviating strongly from local mean)


In [None]:

# 5.1 Rule-based thresholds (customize as needed)
thresholds = {}
for c in numeric_cols:
    s = df_res[c].dropna()
    if s.empty:
        continue
    lo, hi = s.quantile(0.01), s.quantile(0.99)  # heuristic start
    # widen the band on each side by 10% of the 1st-to-99th percentile range
    pad = 0.1 * (hi - lo) if np.isfinite(hi - lo) else 0
    thresholds[c] = (float(lo - pad), float(hi + pad))

thresholds
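Percentile-based limits only describe what the data happened to do, so known engineering limits should take precedence where they exist. The sketch below shows one way to layer manual limits over the automatic ones. The helper `apply_overrides`, the signal names, and every numeric limit are hypothetical placeholders, not values from any real spacecraft.

```python
def apply_overrides(thresholds, overrides):
    """Return a copy of `thresholds` with (lo, hi) replaced for signals that have an override.

    Overrides for signals that are not in `thresholds` are ignored, so a shared
    override table can be reused across datasets with different columns.
    """
    merged = dict(thresholds)
    for name, (lo, hi) in overrides.items():
        if lo >= hi:
            raise ValueError(f"Invalid limits for {name!r}: lo={lo} must be below hi={hi}")
        if name in merged:
            merged[name] = (float(lo), float(hi))
    return merged

# Hypothetical automatic limits, shaped like the `thresholds` dict built above
auto_limits = {"battery_voltage": (6.9, 8.1), "board_temp": (-2.0, 41.0)}

# Hypothetical engineering limits supplied by hand
manual_limits = {"battery_voltage": (6.5, 8.4), "payload_temp": (-10.0, 50.0)}

merged_limits = apply_overrides(auto_limits, manual_limits)
print(merged_limits)
```

In the notebook this would be called as `thresholds = apply_overrides(thresholds, manual_limits)` before computing the rule-based flags.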


In [None]:

def rule_based_flags(df_num, thresholds):
    flags = {}
    for c, (lo, hi) in thresholds.items():
        s = df_num[c]
        flags[c] = (s < lo) | (s > hi)
    return pd.DataFrame(flags, index=df_num.index)

rb_flags = rule_based_flags(df_res[numeric_cols], thresholds)
rb_counts = rb_flags.sum().sort_values(ascending=False)
print("Rule-based anomaly counts per signal:")
display(rb_counts.to_frame("count"))


In [None]:

# 5.2 Rolling z-score
def rolling_z_flags(df_num, window=30, z=3.0):
    """Flag samples more than `z` rolling standard deviations from the rolling mean."""
    min_periods = max(5, window // 3)
    flags = {}
    for c in df_num.columns:
        s = df_num[c].astype(float)
        roll = s.rolling(window, min_periods=min_periods)
        mu = roll.mean()
        sd = roll.std(ddof=0)
        # Treat a zero spread (flat signal) as "no score" instead of dividing by zero
        zsc = (s - mu) / sd.where(sd > 0)
        flags[c] = zsc.abs() > z
    return pd.DataFrame(flags, index=df_num.index)

z_flags = rolling_z_flags(df_res[numeric_cols], window=30, z=3.0)
z_counts = z_flags.sum().sort_values(ascending=False)
print("Rolling-z anomaly counts per signal:")
display(z_counts.to_frame("count"))
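A rolling window that includes the sample being scored lets an outlier inflate its own mean and standard deviation. With a population standard deviation (`ddof=0`) over `n` samples, `|z|` cannot exceed `sqrt(n - 1)`, which is about 5.4 for a 30-sample window. The sketch below injects one artificial spike into made-up data and compares that score with a variant whose baseline uses only the preceding samples. The signal name, noise level, and spike size are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
idx = pd.date_range("2024-01-01", periods=200, freq="1min", tz="UTC")
signal = pd.Series(7.4 + rng.normal(0.0, 0.02, size=200), index=idx, name="battery_voltage")
signal.iloc[150] += 0.5  # injected spike, far outside the noise level

window = 30
min_periods = 10

# Variant 1: the window includes the current sample
roll = signal.rolling(window, min_periods=min_periods)
z_incl = (signal - roll.mean()) / roll.std(ddof=0)

# Variant 2: the baseline is built from the preceding samples only
prev = signal.shift(1).rolling(window, min_periods=min_periods)
z_excl = (signal - prev.mean()) / prev.std(ddof=0)

print("z at spike, window includes sample:", round(float(z_incl.iloc[150]), 2))
print("z at spike, preceding samples only:", round(float(z_excl.iloc[150]), 2))
print("upper bound sqrt(window - 1):      ", round(float(np.sqrt(window - 1)), 2))
```

Both variants flag the spike at a threshold of 3. The practical consequence of the bound is that a threshold close to or above `sqrt(window - 1)` can never fire when the window includes the current sample, so short windows need either a lower threshold or the shifted baseline.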



## 6) Daily status summary
Aggregate min/max/mean and anomaly counts per day.


In [None]:

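daily_stats = df_res[numeric_cols].resample("1D").agg(["min","mean","max"])
# Flatten the (signal, stat) MultiIndex columns to "signal_stat" so the joins below
# combine frames with the same number of column levels
daily_stats.columns = [f"{col}_{stat}" for col, stat in daily_stats.columns]
daily_rb = rb_flags.resample("1D").sum().add_suffix("_rbFlags")
daily_z  = z_flags.resample("1D").sum().add_suffix("_zFlags")

daily = daily_stats.join(daily_rb).join(daily_z)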
display(daily.tail())

# Save report files
out_dir = Path("/mnt/data")
(out_dir / "status_reports").mkdir(parents=True, exist_ok=True)
daily.to_csv(out_dir / "status_reports" / "daily_status_summary.csv")

print("Saved:", out_dir / "status_reports" / "daily_status_summary.csv")



## 7) Export cleaned data and a short text report


In [None]:

clean_path = Path("/mnt/data/cleaned_telemetry.csv")
df_res.to_csv(clean_path)
print("Cleaned data saved to:", clean_path)

# Create a tiny text summary
top_rb = rb_counts.head(3).to_string()
top_z  = z_counts.head(3).to_string()

report = f"""Satellite Telemetry Status Report
=================================
Rows (cleaned): {len(df_res):,}
Time span: {df_res.index.min()}  →  {df_res.index.max()}

Top rule-based anomalies:
{top_rb}

Top rolling-z anomalies:
{top_z}

Notes:
- Thresholds were initialized from the 1st and 99th percentiles (then padded).
- Rolling window = 30 samples for z-score (adjust if your sampling period differs).
- Inspect plots above for context around flagged points.
"""

txt_path = Path("/mnt/data/status_reports/summary.txt")
txt_path.parent.mkdir(parents=True, exist_ok=True)  # in case section 6 was skipped
txt_path.write_text(report, encoding="utf-8")
print("Text report saved to:", txt_path)



## 8) Next steps
- Tune thresholds per signal with domain knowledge
- Add per-mode analysis (e.g., split by ADCS mode if available)
- Create a Streamlit or PyQt dashboard
- Automate daily report generation (cron, Airflow, or simple script)
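
As a starting point for the per-mode analysis, the same min/mean/max summary can be grouped by a mode column instead of by day. This is a minimal sketch, assuming the telemetry carries a categorical mode column. The column name `adcs_mode`, the mode labels, the helper `stats_by_mode`, and all values are hypothetical.

```python
import pandas as pd

def stats_by_mode(df, mode_col, signal_cols):
    """Summarize each signal per operating mode, with flat 'signal_stat' column names."""
    grouped = df.groupby(mode_col)[signal_cols].agg(["min", "mean", "max"])
    grouped.columns = [f"{sig}_{stat}" for sig, stat in grouped.columns]
    return grouped

# Made-up demo data; a real dataset may name or encode its modes differently
demo = pd.DataFrame({
    "adcs_mode": ["detumble", "detumble", "nadir", "nadir", "nadir"],
    "battery_voltage": [7.1, 7.3, 7.6, 7.8, 7.7],
    "board_temp": [12.0, 14.0, 25.0, 27.0, 26.0],
})

per_mode = stats_by_mode(demo, "adcs_mode", ["battery_voltage", "board_temp"])
print(per_mode)
```

Thresholds can then be tuned per mode rather than globally, which tends to matter when a signal's normal range shifts between modes.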
