# CDF to Plotbot Integration - Wave Data Example

This notebook demonstrates how to build a modular `cdf_to_plotbot` function using PSP Wave data as a test case.

## Goals:
1. **Learn CDF file structure and metadata extraction**
2. **Build foundation for `cdf_to_plotbot()` function** 
3. **Automatically create plotbot-compatible classes**
4. **Extract units, labels, and plot formatting from metadata**
5. **Handle different datetime formats automatically**

## Test Files:
- `PSP_WaveAnalysis_2021-04-29_0600_v1.2.cdf` (1.5GB) - 2D spectra data
- `PSP_wavePower_2021-04-29_v1.3.cdf` (4.5MB) - 1D summed quantities (RH/LH wave power)

**Strategy**: Start with the smaller wave power file, then move to the complex spectra file.


In [1]:
# Import required packages
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from warnings import simplefilter
simplefilter(action='ignore', category=DeprecationWarning)

# CDF handling
import cdflib
from pytplot import cdf_to_tplot

# Add plotbot to path if running from this directory
if os.path.exists('../../../../plotbot'):
    sys.path.append('../../../../')
    from plotbot import print_manager
    print("✅ Plotbot modules available")
else:
    print("⚠️  Plotbot not in path - will create standalone functions")

print("📦 All imports successful")


initialized server_access
initialized global_tracker
initialized ploptions
initialized plot_manager
initialized epad class
initialized epad_hr class
initialized proton class
initialized proton_hr class
initialized ham_class
initialized psp_alpha class
initialized psp_qtn class
initialized psp_orbit class
initialized data_cubby.
initialized proton_fits class
initialized alpha_fits class
No save directory previously set. Defaulting to: ./audio_files
🔉 initialized audifier

Importing libraries, this may take a moment. Hold tight... 

✅ Imported standard libraries and utilities.
✅ Imported numpy, pandas, and scipy libraries.
✅ Imported matplotlib libraries.
✅ Imported cdflib, BeautifulSoup, requests, dateutil, and datetime libraries.

🤖 Plotbot Initialized
✨ Showdahodo initialized
Positional Data Helper Initialized
📈📉 Multiplot Initialized

🤖 Plotbot Initialized
📈📉 Multiplot Initialized
   Version: 2025_07_14_v2.86
   Commit: v2.86 CUSTOM VARIABLES FIX: Scalar operations now work - enables

In [2]:
# File setup and validation
current_dir = os.getcwd()
wave_analysis_file = 'PSP_WaveAnalysis_2021-04-29_0600_v1.2.cdf'
wave_power_file = 'PSP_wavePower_2021-04-29_v1.3.cdf'

# Check file availability
files_status = {}
for filename in [wave_analysis_file, wave_power_file]:
    filepath = os.path.join(current_dir, filename)
    exists = os.path.exists(filepath)
    files_status[filename] = {'path': filepath, 'exists': exists}
    
    if exists:
        file_size = os.path.getsize(filepath) / (1024**3)  # GB
        print(f"✅ {filename} ({file_size:.1f}GB)")
    else:
        print(f"❌ {filename} - NOT FOUND")

# Start with the smaller wave power file
test_file = wave_power_file if files_status[wave_power_file]['exists'] else wave_analysis_file
print(f"\n🎯 Starting analysis with: {test_file}")


✅ PSP_WaveAnalysis_2021-04-29_0600_v1.2.cdf (1.5GB)
✅ PSP_wavePower_2021-04-29_v1.3.cdf (0.0GB)

🎯 Starting analysis with: PSP_wavePower_2021-04-29_v1.3.cdf


In [3]:
def explore_cdf_structure(filepath):
    """
    Core function to explore CDF file structure and extract metadata.
    This is the foundation for the future cdf_to_plotbot() function.
    """
    print(f"\n🔍 Exploring CDF structure: {os.path.basename(filepath)}")
    print("-" * 60)
    
    # Open CDF file
    cdf_file = cdflib.CDF(filepath)
    
    # Get all variables
    info = cdf_file.cdf_info()
    zvars = info['zVariables']  # Most data variables
    rvars = info['rVariables']  # Record variables (often metadata)
    
    print(f"📊 Found {len(zvars)} zVariables and {len(rvars)} rVariables")
    
    # Store metadata for each variable
    variable_metadata = {}
    
    print("\n📋 Variable Summary:")
    for var_name in zvars:
        var_info = cdf_file.varinq(var_name)
        var_attrs = cdf_file.varattsget(var_name)
        
        # Extract key metadata
        metadata = {
            'data_type': var_info['Data_Type_Description'],
            'dimensions': var_info['Dim_Sizes'],
            'units': var_attrs.get('UNITS', var_attrs.get('units', 'Unknown')),
            'description': var_attrs.get('CATDESC', var_attrs.get('description', 'No description')),
            'depend_0': var_attrs.get('DEPEND_0', None),  # Time variable
            'display_type': var_attrs.get('DISPLAY_TYPE', 'time_series'),
            'scale_typ': var_attrs.get('SCALETYP', 'linear'),
            'var_type': var_attrs.get('VAR_TYPE', 'data')
        }
        
        variable_metadata[var_name] = metadata
        
        # Print summary
        dims_str = f"{metadata['dimensions']}" if metadata['dimensions'] else "scalar"
        print(f"  • {var_name:20} | {metadata['data_type']:15} | {dims_str:15} | {metadata['units']}")
    
    cdf_file.close()
    return variable_metadata

# Explore the test file
if files_status[test_file]['exists']:
    metadata = explore_cdf_structure(files_status[test_file]['path'])
else:
    print("❌ No CDF files available for exploration")



🔍 Exploring CDF structure: PSP_wavePower_2021-04-29_v1.3.cdf
------------------------------------------------------------


TypeError: 'CDFInfo' object is not subscriptable

In [4]:
def extract_datetime_info(cdf_file, metadata):
    """
    Extract and convert datetime information from CDF file.
    Different CDF files use different time formats - detect and convert automatically.
    """
    print("\n🕒 Analyzing datetime variables...")
    
    # Find time variables (usually DEPEND_0 or contain 'time' in name)
    time_vars = []
    for var_name, meta in metadata.items():
        if ('time' in var_name.lower() or 
            meta['var_type'] == 'support_data' and 'time' in meta['description'].lower()):
            time_vars.append(var_name)
    
    print(f"Found potential time variables: {time_vars}")
    
    datetime_info = {}
    for time_var in time_vars:
        try:
            time_data = cdf_file.varget(time_var)
            
            # Try different conversion methods
            if len(time_data) > 0:
                print(f"\n📅 Processing {time_var}:")
                print(f"   Raw data type: {type(time_data[0])}")
                print(f"   Shape: {time_data.shape}")
                print(f"   Sample values: {time_data[:3]}")
                
                # Attempt conversion using cdflib's epoch conversion
                try:
                    converted_times = cdflib.cdfepoch.to_datetime(time_data)
                    datetime_info[time_var] = {
                        'raw_data': time_data,
                        'datetime_array': converted_times,
                        'start_time': converted_times[0] if len(converted_times) > 0 else None,
                        'end_time': converted_times[-1] if len(converted_times) > 0 else None,
                        'conversion_method': 'cdfepoch.to_datetime'
                    }
                    print(f"   ✅ Converted successfully: {converted_times[0]} to {converted_times[-1]}")
                except Exception as e:
                    print(f"   ❌ CDF epoch conversion failed: {e}")
                    datetime_info[time_var] = {'error': str(e)}
                    
        except Exception as e:
            print(f"   ❌ Failed to process {time_var}: {e}")
    
    return datetime_info

# Extract datetime info from our test file
if files_status[test_file]['exists']:
    cdf_file = cdflib.CDF(files_status[test_file]['path'])
    datetime_info = extract_datetime_info(cdf_file, metadata)
    cdf_file.close()
else:
    datetime_info = {}


NameError: name 'metadata' is not defined

In [None]:
# Quick test of the generated class structure
if 'plotbot_class' in locals():
    print("🧪 Testing generated class structure...")
    print(f"Class name: {plotbot_class['class_name']}")
    print(f"Variables: {len(plotbot_class['variables'])}")
    print(f"Datetime vars: {len(plotbot_class['datetime_vars'])}")
    
    print("\n🔍 Sample variables:")
    for i, (var_name, config) in enumerate(list(plotbot_class['variables'].items())[:5]):
        print(f"  {i+1}. {var_name}: {config['plot_type']} plot, {config['units']}")
        
    # Show RH/LH color assignments (per Jaye's notes)
    rh_lh_vars = {k: v for k, v in plotbot_class['variables'].items() 
                  if 'RH' in k or 'LH' in k or 'rh' in k.lower() or 'lh' in k.lower()}
    
    if rh_lh_vars:
        print(f"\n🎨 RH/LH color assignments ({len(rh_lh_vars)} variables):")
        for var_name, config in rh_lh_vars.items():
            print(f"  • {var_name}: {config['color']}")
    
    print(f"\n✅ Prototype cdf_to_plotbot function working!")
    print("📋 Next steps:")
    print("  1. Generate actual .py and .pyi files")
    print("  2. Integration with plotbot data_classes/")
    print("  3. Test with larger CDF files")
    print("  4. Add CSV support (csv_to_plotbot)")
else:
    print("❌ Run previous cells to generate class structure")


In [5]:
def cdf_to_plotbot_prototype(filepath, class_name=None):
    """
    Prototype of the cdf_to_plotbot function based on Jaye's specifications.
    
    Input: 
    - filepath: '/path/to/file.cdf'
    - class_name: optional new class name
    
    Output:
    - Create new plotbot-compatible class with all CDF variables
    - Extract metadata for units, labels, plot types
    - Handle datetime conversion automatically
    - Generate .pyi file for type hints
    """
    print(f"\n🚀 CDF to Plotbot Conversion: {os.path.basename(filepath)}")
    print("=" * 70)
    
    # Auto-generate class name if not provided
    if class_name is None:
        base_name = os.path.splitext(os.path.basename(filepath))[0]
        class_name = f"PSP_{base_name.replace('-', '_').replace(' ', '_')}"
    
    print(f"📝 Creating class: {class_name}")
    
    # Open CDF and extract all information
    cdf_file = cdflib.CDF(filepath)
    metadata = explore_cdf_structure(filepath)
    datetime_info = extract_datetime_info(cdf_file, metadata)
    
    # Structure for the new plotbot class
    class_structure = {
        'class_name': class_name,
        'variables': {},
        'datetime_vars': datetime_info,
        'plot_configs': {}
    }
    
    print(f"\n🔧 Processing {len(metadata)} variables...")
    
    # Process each variable
    for var_name, meta in metadata.items():
        try:
            data = cdf_file.varget(var_name)
            
            # Create plotbot variable config
            var_config = {
                'data': data,
                'units': meta['units'],
                'ylabel': f"{var_name} ({meta['units']})" if meta['units'] != 'Unknown' else var_name,
                'description': meta['description'],
                'plot_type': 'line',  # Default, can be updated based on dimensions
                'color': None,  # Will assign rainbow colors
                'scale': 'log' if meta['scale_typ'] == 'log' else 'linear'
            }
            
            # Determine plot type based on dimensions
            if len(data.shape) == 1:
                var_config['plot_type'] = 'line'
            elif len(data.shape) == 2:
                var_config['plot_type'] = 'spectrogram'  # 2D contour
                
            # Handle special cases based on Jaye's notes
            if 'RH' in var_name or 'rh' in var_name.lower():
                var_config['color'] = 'red'
            elif 'LH' in var_name or 'lh' in var_name.lower():
                var_config['color'] = 'blue'
                
            class_structure['variables'][var_name] = var_config
            
            print(f"  ✅ {var_name:25} | {str(data.shape):15} | {meta['units']:15} | {var_config['plot_type']}")
            
        except Exception as e:
            print(f"  ❌ {var_name:25} | Error: {e}")
    
    cdf_file.close()
    
    # Generate rainbow colors for variables without specific colors
    import matplotlib.cm as cm
    variables_needing_colors = [v for v in class_structure['variables'].values() if v['color'] is None]
    if variables_needing_colors:
        colors = cm.rainbow(np.linspace(0, 1, len(variables_needing_colors)))
        for i, var_config in enumerate(variables_needing_colors):
            var_config['color'] = colors[i]
    
    print(f"\n🎨 Assigned colors to {len(variables_needing_colors)} variables")
    print(f"📊 Class structure complete: {len(class_structure['variables'])} variables ready")
    
    return class_structure

# Test the prototype function
if files_status[test_file]['exists']:
    plotbot_class = cdf_to_plotbot_prototype(files_status[test_file]['path'])
else:
    print("❌ No file available for conversion")



🚀 CDF to Plotbot Conversion: PSP_wavePower_2021-04-29_v1.3.cdf
📝 Creating class: PSP_PSP_wavePower_2021_04_29_v1.3

🔍 Exploring CDF structure: PSP_wavePower_2021-04-29_v1.3.cdf
------------------------------------------------------------


TypeError: 'CDFInfo' object is not subscriptable

## Summary & Integration Plan

This notebook demonstrates the core functionality for Jaye's `cdf_to_plotbot` vision:

### ✅ What We've Built:
1. **CDF structure exploration** - automatically extracts variables and metadata
2. **Datetime handling** - detects and converts different time formats 
3. **Metadata extraction** - units, descriptions, plot types from CDF attributes
4. **Color assignment** - RH=red, LH=blue per Jaye's specs, rainbow for others
5. **Plot type detection** - 1D=line plots, 2D=spectrograms
6. **Class structure generation** - foundation for plotbot integration

### 🔄 Integration into Plotbot System:

**File Location**: `plotbot/data_classes/cdf_integration.py`

```python
# In plotbot system:
from plotbot.data_classes.cdf_integration import cdf_to_plotbot

# Usage:
wave_data = cdf_to_plotbot('/path/to/wave_file.cdf', 'PSP_Waves')
# Creates: plotbot/data_classes/psp_waves.py 
#          plotbot/data_classes/psp_waves.pyi
```

**Benefits:**
- **Modular**: Works with any CDF file
- **Automatic**: Extracts all metadata without manual coding
- **Plotbot-compatible**: Generates classes that fit existing patterns
- **Extensible**: Foundation for `csv_to_plotbot`, `fits_to_plotbot`

### 🎯 Next Steps:
1. **Move to plotbot system** - implement in `data_classes/`
2. **Test with FITS/HAM CSV files** - extend to other formats
3. **Add to data import workflow** - integrate with existing download functions
4. **Generate proper .pyi files** - for type checking and IDE support

### 💡 Key Insight:
Instead of manually coding each data class, we can **automatically generate** them from metadata, making plotbot much more flexible for new data sources!
