# 📚 Plotbot Data Access Patterns Guide

This notebook demonstrates the **correct** ways to access data in Plotbot, including:
1. **Time Series Data** - Different access methods for `mag_rtn_4sa.br`
2. **Spectral Data** - Working with `epad` pitch angle distributions
3. **Plot/Figure Control** - How to return (or not return) plot objects

**Created:** 2025-09-30  
**Purpose:** Reference guide for data access patterns

⚠️ **CRITICAL**: Shows why the `.data` property is required to get time-clipped data!


In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import plotbot
from plotbot import *


---
## 🔵 Part 1: Time Series Data Access (mag_rtn_4sa.br)

### 📊 Understanding What Different Access Methods Return

When you call `plotbot()` multiple times with **different time ranges**, Plotbot:
1. **Downloads** new data if needed
2. **MERGES** it with the existing data in the data cubby (accumulating ALL data!)

As a result, different access methods return **different views** of this accumulated data.
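The merge step can be made concrete with plain NumPy. The sketch below is a toy model, not Plotbot's merge engine: `merge_series` is a hypothetical helper that assumes samples are keyed by timestamp, and it simply concatenates, sorts, and drops duplicate timestamps. How Plotbot itself resolves overlapping samples is not covered in this notebook.

```python
import numpy as np

def merge_series(times_a, values_a, times_b, values_b):
    """Merge two time series into one sorted series with unique timestamps.

    Hypothetical helper: a toy model of "accumulate everything",
    not Plotbot's actual merge code.
    """
    times = np.concatenate([times_a, times_b])
    values = np.concatenate([values_a, values_b])
    # np.unique sorts the timestamps and returns the index of each first
    # occurrence, so a timestamp present in both inputs is kept only once.
    times, first_idx = np.unique(times, return_index=True)
    return times, values[first_idx]

# Two non-overlapping 1-hour windows, one synthetic sample every 10 minutes.
step = np.timedelta64(10, 'm')
t1 = np.arange(np.datetime64('2021-01-19T02:00'), np.datetime64('2021-01-19T03:00'), step)
t2 = np.arange(np.datetime64('2021-01-19T05:00'), np.datetime64('2021-01-19T06:00'), step)
v1 = np.ones(len(t1))
v2 = 2 * np.ones(len(t2))

# Arrival order does not matter: the result is sorted by time.
merged_t, merged_v = merge_series(t2, v2, t1, v1)
print(len(t1), len(t2), len(merged_t))
```

With no overlap, the merged length is simply the sum of the two inputs, which is the pattern the trange1/trange2 counts below follow.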

### What Each Method Returns:
```python
# After 2 plotbot calls with different tranges:
mag_rtn_4sa.br              # Returns: ALL merged data (32,959 points)
np.array(mag_rtn_4sa.br)    # Returns: ALL merged data (32,959 points)
mag_rtn_4sa.br.data         # Returns: Time-clipped data (16,480 points for trange2)
mag_rtn_4sa.br.all_data     # Returns: ALL merged data (32,959 points)
```

**None of these are "wrong" - they just return different things!**  
**Choose based on what you need.**
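To see why the views differ, here is a toy model of a variable that keeps every accumulated sample plus the most recent `trange`. `ToyVariable` is hypothetical and only mimics the behavior described above (`.data`, `.datetime_array`, `.all_data`, `np.array(...)`); it is not how Plotbot implements its variables. The half-open `[start, stop)` clipping rule is an assumption.

```python
import numpy as np

class ToyVariable:
    """Hypothetical stand-in for a Plotbot variable such as mag_rtn_4sa.br.

    Holds every accumulated sample plus the most recently requested trange.
    Illustrates the observed behavior; NOT Plotbot's implementation.
    """

    def __init__(self, times, values):
        self._times = np.asarray(times, dtype='datetime64[s]')
        self._values = np.asarray(values, dtype=float)
        self.trange = None  # most recently requested [start, stop]

    def _mask(self):
        # Select samples in [start, stop); the half-open rule is an assumption.
        if self.trange is None:
            return np.ones(len(self._times), dtype=bool)
        start, stop = (np.datetime64(t.replace('/', 'T'), 's') for t in self.trange)
        return (self._times >= start) & (self._times < stop)

    @property
    def data(self):
        # Time-clipped view: only samples inside the current trange.
        return self._values[self._mask()]

    @property
    def datetime_array(self):
        # Timestamps matching .data.
        return self._times[self._mask()]

    @property
    def all_data(self):
        # Everything accumulated so far, regardless of trange.
        return self._values

    def __array__(self, dtype=None, copy=None):
        # np.array(obj) sees the full accumulated array, like .all_data.
        return np.array(self._values, dtype=dtype)


# Synthetic data: two 1-hour windows, one sample every 10 minutes (6 + 6 points).
step = np.timedelta64(10, 'm')
times = np.concatenate([
    np.arange(np.datetime64('2021-01-19T02:00:00'), np.datetime64('2021-01-19T03:00:00'), step),
    np.arange(np.datetime64('2021-01-19T05:00:00'), np.datetime64('2021-01-19T06:00:00'), step),
])
br = ToyVariable(times, np.arange(len(times)))

br.trange = ['2021-01-19/05:00:00', '2021-01-19/06:00:00']  # like the second plotbot call
print(len(br.data), len(np.array(br)), len(br.all_data))  # prints: 6 12 12
```

The same pattern (clipped count versus accumulated count) is what the real `mag_rtn_4sa.br` shows after the second `plotbot()` call below.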

Let's demonstrate this step-by-step...


In [None]:
# Define two NON-OVERLAPPING time ranges (separated by 2 hours)
trange1 = ['2021-01-19/02:00:00', '2021-01-19/03:00:00']  # 1 hour
trange2 = ['2021-01-19/05:00:00', '2021-01-19/06:00:00']  # 1 hour, starting 2 hours after trange1 ends

print("📅 Test Setup:")
print(f"   trange1: {trange1[0]} to {trange1[1]}")
print(f"   trange2: {trange2[0]} to {trange2[1]}")
print("   Gap: 2 hours - data will NOT overlap")

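The "no overlap, 2-hour gap" claim can be checked programmatically before plotting. `parse_trange` below is a hypothetical helper (not part of Plotbot) that assumes the `'YYYY-MM-DD/HH:MM:SS'` string format used in this notebook and converts it with the standard library.

```python
from datetime import datetime, timedelta

def parse_trange(trange):
    """Convert a ['YYYY-MM-DD/HH:MM:SS', ...] pair into datetime objects.

    Hypothetical helper for sanity-checking time ranges; not part of Plotbot.
    """
    return tuple(datetime.fromisoformat(t.replace('/', ' ')) for t in trange)

trange1 = ['2021-01-19/02:00:00', '2021-01-19/03:00:00']
trange2 = ['2021-01-19/05:00:00', '2021-01-19/06:00:00']

start1, stop1 = parse_trange(trange1)
start2, stop2 = parse_trange(trange2)

# Standard interval-overlap test: two ranges overlap iff each starts before the other ends.
overlap = start1 < stop2 and start2 < stop1
gap = start2 - stop1

print(f"Overlap: {overlap}, gap: {gap}")
```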

In [None]:
# Show data cubby and status messages (detailed debug output stays off)
print_manager.show_data_cubby = True
print_manager.show_status = True
print_manager.show_debug = False

# FIRST PLOTBOT CALL
print("\n" + "="*80)
print("🔵 FIRST PLOTBOT CALL (trange1)")
print("="*80)

plotbot(trange1, mag_rtn_4sa.br, 1);


In [None]:
# Access data using different methods
print("\n📊 After trange1 - Testing Access Methods:")

# Method 1: .data property (time-clipped)
data_1 = mag_rtn_4sa.br.data
print(f"\n1️⃣ mag_rtn_4sa.br.data → {len(data_1):,} points (time-clipped)")

# Method 2: Direct array access (full underlying array)
direct_1 = np.array(mag_rtn_4sa.br)
print(f"2️⃣ np.array(mag_rtn_4sa.br) → {len(direct_1):,} points (full array)")

# Method 3: .all_data property (all accumulated data)
all_data_1 = mag_rtn_4sa.br.all_data
print(f"3️⃣ mag_rtn_4sa.br.all_data → {len(all_data_1):,} points (all data)")

print(f"\n   All methods return same length? {len(data_1) == len(direct_1) == len(all_data_1)}")
print(f"   ℹ️ After FIRST call, all methods return the same (only 1 trange loaded)")


In [None]:
# SECOND PLOTBOT CALL - DIFFERENT TIME RANGE!
print("\n" + "="*80)
print("🟢 SECOND PLOTBOT CALL (trange2 - DIFFERENT TIME!)")
print("="*80)
print("⚠️ Watch for: 'ULTIMATE MERGE ENGINE' and 'NO OVERLAP - Simple concatenation'")
print("   These messages mean the data cubby is MERGING both time ranges!\n")

fig2 = plotbot(trange2, mag_rtn_4sa.br, 1)


In [None]:
# THE KEY COMPARISON
print("\n" + "="*80)
print("📊 KEY DIFFERENCE: What Does Each Method Return After trange2?")
print("="*80)

# Method 1: .data property (time-clipped to requested trange)
data_2 = mag_rtn_4sa.br.data
times_2 = mag_rtn_4sa.br.datetime_array

print(f"\n1️⃣ mag_rtn_4sa.br.data:")
print(f"   Points: {len(data_2):,}")
print(f"   First timestamp: {times_2[0]}")
print(f"   Last timestamp: {times_2[-1]}")
print(f"   → Returns ONLY data for trange2 (time-clipped)")

# Method 2: Direct array access (returns underlying full array)
direct_2 = np.array(mag_rtn_4sa.br)

print(f"\n2️⃣ np.array(mag_rtn_4sa.br):")
print(f"   Points: {len(direct_2):,}")
print(f"   → {len(data_1):,} (trange1) + {len(data_2):,} (trange2) = {len(data_1) + len(data_2):,}; matches total? {len(direct_2) == len(data_1) + len(data_2)}")
print(f"   → Returns ALL data from both time ranges (merged)")

# Method 3: .all_data (explicitly returns all data)
all_data_2 = mag_rtn_4sa.br.all_data

print(f"\n3️⃣ mag_rtn_4sa.br.all_data:")
print(f"   Points: {len(all_data_2):,}")
print(f"   → Returns ALL cached data (same as direct access)")

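After the second call, `.data` only covers trange2. If you need the trange1 samples again, you can either call `plotbot(trange1, ...)` again or clip the full arrays yourself. The sketch below does the latter on synthetic arrays. It assumes you have a full-length timestamp array aligned element-by-element with `.all_data`; which Plotbot attribute provides such an array is not shown in this notebook. `clip_to_trange` is a hypothetical helper, and the half-open `[start, stop)` rule is an assumption.

```python
import numpy as np

def clip_to_trange(times, values, trange):
    """Return the (times, values) samples with start <= time < stop.

    Hypothetical helper. `times` must be a numpy datetime64 array aligned
    with `values`; trange uses the 'YYYY-MM-DD/HH:MM:SS' string format.
    """
    start, stop = (np.datetime64(t.replace('/', 'T')) for t in trange)
    mask = (times >= start) & (times < stop)
    return times[mask], values[mask]

# Synthetic "merged" arrays: two 1-hour windows, one sample every 10 minutes.
step = np.timedelta64(10, 'm')
all_times = np.concatenate([
    np.arange(np.datetime64('2021-01-19T02:00:00'), np.datetime64('2021-01-19T03:00:00'), step),
    np.arange(np.datetime64('2021-01-19T05:00:00'), np.datetime64('2021-01-19T06:00:00'), step),
])
all_values = np.arange(len(all_times), dtype=float)

# Recover only the trange1 portion from the accumulated arrays.
t1, v1 = clip_to_trange(all_times, all_values, ['2021-01-19/02:00:00', '2021-01-19/03:00:00'])
print(len(all_times), len(t1), t1[0], t1[-1])
```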

### 📋 Summary Table


In [None]:
print("\n" + "="*80)
print("📋 SUMMARY: Data Access Methods - What Each Returns")
print("="*80 + "\n")

summary = pd.DataFrame({
    'Access Method': [
        'mag_rtn_4sa.br.data',
        'mag_rtn_4sa.br.datetime_array',
        'np.array(mag_rtn_4sa.br)',
        'mag_rtn_4sa.br.all_data'
    ],
    'After trange1': [
        f"{len(data_1):,}",
        f"{len(data_1):,}",
        f"{len(direct_1):,}",
        f"{len(all_data_1):,}"
    ],
    'After trange2': [
        f"{len(data_2):,}",
        f"{len(times_2):,}",
        f"{len(direct_2):,}",
        f"{len(all_data_2):,}"
    ],
    'What It Returns': [
        'Time-clipped to trange',
        'Time-clipped to trange',
        'Full underlying array',
        'Full underlying array'
    ],
    'Use When': [
        'Want current trange only',
        'Want current trange times',
        'Want all accumulated data',
        'Want all accumulated data'
    ]
})

print(summary.to_string(index=False))

print("\n💡 KEY INSIGHT:")
print("   • If you want data for the CURRENT trange → use .data")
print("   • If you want ALL accumulated data → use np.array() or .all_data")
print("   • Neither is 'wrong' - they serve different purposes!")
