# Batch Processing for All Burst Events

This notebook processes all manually annotated burst events from `burst_list_240330_240729.csv` and generates 128×128 training windows for GAN training, separated by burst type.

## Data Overview
- **Type 2**: 9 events (12%)
- **Type 3**: 63 events (85%)
- **Type 5**: 2 events (3%)
- **Total**: 74 burst events
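
Each percentage above is that type's share of the whole catalog. A minimal sketch of the calculation, assuming the burst types are available as a plain list; the `type_distribution` helper and the toy labels are hypothetical and not part of `batch_processing`:

```python
from collections import Counter


def type_distribution(types):
    """Return {burst_type: (count, percent_of_total)}, ordered by type."""
    counts = Counter(types)
    total = sum(counts.values())
    return {t: (n, 100.0 * n / total) for t, n in sorted(counts.items())}


# Toy labels for illustration only, not the real catalog
toy_types = [3, 3, 2, 3, 5, 3, 3, 2]
for burst_type, (n, pct) in type_distribution(toy_types).items():
    print(f"Type {burst_type}: {n} events ({pct:.1f}%)")
```

With a class imbalance like the one listed above, the per-type counts matter more than the total when deciding whether a type has enough data to train on by itself.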


In [1]:
import pandas as pd
import os
from batch_processing import process_all_bursts_by_type, load_burst_catalog

print("✅ Batch processing imports successful!")


✅ Batch processing imports successful!


In [3]:
# Configuration
CATALOG_PATH = "/Users/remiliascarlet/Desktop/MDP/transfer_learning/burst_data/csv/original/burst_list_240330_240729.csv"
ORIGINAL_CSV_DIR = "/Users/remiliascarlet/Desktop/MDP/transfer_learning/burst_data/csv/original"
OUTPUT_BASE_DIR = "/Users/remiliascarlet/Desktop/MDP/transfer_learning/burst_data/csv/gan_training_windows_128"

print(f"📋 Configuration:")
print(f"   Burst catalog: {os.path.basename(CATALOG_PATH)}")
print(f"   Original CSV dir: {ORIGINAL_CSV_DIR}")
print(f"   Output base dir: {OUTPUT_BASE_DIR}")

# Load and preview burst catalog
burst_df = load_burst_catalog(CATALOG_PATH)
print(f"\n📄 Sample burst entries:")
print(burst_df.head())


📋 Configuration:
   Burst catalog: burst_list_240330_240729.csv
   Original CSV dir: /Users/remiliascarlet/Desktop/MDP/transfer_learning/burst_data/csv/original
   Output base dir: /Users/remiliascarlet/Desktop/MDP/transfer_learning/burst_data/csv/gan_training_windows_128
📖 Loading burst catalog: /Users/remiliascarlet/Desktop/MDP/transfer_learning/burst_data/csv/original/burst_list_240330_240729.csv
   Total bursts: 74
   Date range: 240330 to 240729
   Locations: ['PeachMountian' 'SkylineHighSchool' 'MarquetteSeniorHig' 'Huron']
   Type distribution:
     Type 2: 9 events (12.2%)
     Type 3: 63 events (85.1%)
     Type 5: 2 events (2.7%)

📄 Sample burst entries:
                               file_name    date            location  \
0         240330182002-PeachMountain.csv  240330       PeachMountian   
1   240417180250-Skyline High School.csv  240417   SkylineHighSchool   
2  240419152923-Marquette Senior Hig.csv  240419  MarquetteSeniorHig   
3  240419152923-Marquette Senior Hig.cs

In [None]:
# 🚀 Execute batch processing for all burst types
print("🚀 Starting batch processing...")
print(f"This will generate 4-minute windows with 50% overlap for all {len(burst_df)} burst events")
print("⚠️  This may take 10-30 minutes depending on your system")

# Run the batch processing with fast mode
results = process_all_bursts_by_type(
    catalog_path=CATALOG_PATH,
    original_csv_dir=ORIGINAL_CSV_DIR,
    output_base_dir=OUTPUT_BASE_DIR,
    window_duration=4*60,    # 4 minutes
    overlap_ratio=0.5,       # 50% overlap
    apply_denoising=False,   # Skip noise removal (set True to apply it)
    cleaning_method="fast"   # Use fast mode (skip Step 4 for speed)
)

print("\n🎉 Batch processing completed!")
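
The `window_duration` and `overlap_ratio` arguments above set how far consecutive windows are shifted relative to each other. The placement logic inside `process_all_bursts_by_type` is not shown in this notebook, so the following is only a rough sketch: `window_starts` is a hypothetical helper that anchors the first window at the burst start, and it assumes the 100 ms sample period that the visualization cell uses for its time axis.

```python
SAMPLES_PER_SECOND = 10  # assumption: one sample every 100 ms


def window_starts(burst_start, burst_end, window_samples, overlap_ratio, n_samples):
    """Start indices of fixed-length windows stepped across a burst.

    Hypothetical illustration; the real pipeline may place windows differently.
    """
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    # 50% overlap means each window starts half a window after the previous one
    stride = max(1, round(window_samples * (1 - overlap_ratio)))
    starts = []
    pos = burst_start
    # Keep windows that begin inside the burst and fit inside the recording
    while pos < burst_end and pos + window_samples <= n_samples:
        starts.append(pos)
        pos += stride
    return starts


window_samples = 4 * 60 * SAMPLES_PER_SECOND  # 4 minutes of samples
print(window_starts(10_000, 13_000, window_samples, 0.5, 100_000))
```

Under these assumptions a 4-minute window holds 2400 samples and the stride is 1200 samples, so a longer burst yields proportionally more windows.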


In [10]:
# 🔍 Random Burst Visualization - Inspect Slicing Position and Results
print("🔍 Randomly selecting a burst for detailed slicing visualization...")

import random
import matplotlib.pyplot as plt
import numpy as np
from batch_processing import BurstSlicer, time_to_column_indices

# Randomly select a burst from the catalog
random_burst = burst_df.sample(1).iloc[0]
print(f"\n📋 Selected burst for visualization:")
print(f"   Date: {random_burst['Date']}")
print(f"   Start Time: {random_burst['Start Time']}")
print(f"   End Time: {random_burst['End Time']}")
print(f"   Type: {random_burst['Type']}")
print(f"   Location: {random_burst['Location']}")

# Construct file path from the catalog's file_name column (see the catalog preview above)
csv_filename = random_burst['file_name']
csv_file_path = os.path.join(ORIGINAL_CSV_DIR, csv_filename)

if not os.path.exists(csv_file_path):
    print(f"❌ File not found: {csv_filename}")
    print("   Trying alternative naming...")
    # Try alternative naming patterns
    alt_patterns = [
        f"{random_burst['Date']}_PeachMountain.csv",
        f"{random_burst['Date']}_SkylineHS_2020.csv",
        f"{random_burst['Date']}.csv"
    ]
    
    for pattern in alt_patterns:
        alt_path = os.path.join(ORIGINAL_CSV_DIR, pattern)
        if os.path.exists(alt_path):
            csv_file_path = alt_path
            csv_filename = pattern
            print(f"✅ Found alternative: {pattern}")
            break
    else:
        print("❌ No matching file found, skipping visualization")
        csv_file_path = None

if csv_file_path and os.path.exists(csv_file_path):
    print(f"✅ Processing: {csv_filename}")
    
    # Initialize slicer for 128x128 windows
    slicer = BurstSlicer(
        window_duration=4*60,  # 4 minutes
        overlap_ratio=0.5,     # 50% overlap
        target_size=(128, 128)
    )
    
    # Load and process the data
    print("📊 Loading original data for window position visualization...")
    original_data, times, raw_data = slicer.load_and_preprocess_csv(
        csv_file_path, 
        apply_denoising=False  # Show original data to see window placement
    )
    transposed_original = slicer.transpose_data(original_data)
    
    # Get burst time indices
    start_idx, end_idx = time_to_column_indices(times, random_burst['Start Time'], random_burst['End Time'])
    
    # Generate slicing windows
    result = slicer.slice_burst_region(
        transposed_original, 
        times, 
        random_burst['Start Time'], 
        random_burst['End Time']
    )
    
    print(f"\n📊 Slicing Results:")
    print(f"   Generated windows: {len(result['windows'])}")
    print(f"   Window positions: {result['positions']}")
    print(f"   Burst region: [{start_idx}, {end_idx}] (duration: {(end_idx-start_idx)*0.1:.1f}s)")
    
    # === VISUALIZATION 1: Window Position Overview ===
    vis_buffer = 2000  # Show 2000 samples (~3.3 minutes) before and after
    vis_start = max(0, start_idx - vis_buffer)
    vis_end = min(transposed_original.shape[1], end_idx + vis_buffer)
    vis_data = transposed_original[:, vis_start:vis_end]
    
    print(f"\nCreating window position visualization...")
    fig = plt.figure(figsize=(18, 8))
    
    # Show original data with window positions
    ax1 = plt.subplot(1, 1, 1)
    im1 = ax1.imshow(vis_data, aspect='auto', origin='lower', cmap='viridis')
    ax1.set_title(f'Original Data with 4-Minute Window Positions\nFile: {csv_filename} | Shape: {vis_data.shape}', fontsize=14)
    ax1.set_ylabel('Frequency Channel (411 total)')
    ax1.set_xlabel('Time Samples (100ms each)')
    
    # Mark burst boundaries
    burst_start_vis = start_idx - vis_start
    burst_end_vis = end_idx - vis_start
    ax1.axvline(burst_start_vis, color='red', linestyle='--', linewidth=3, alpha=0.9, label='Burst Start')
    ax1.axvline(burst_end_vis, color='red', linestyle='--', linewidth=3, alpha=0.9, label='Burst End')
    
    # Mark window positions
    colors = ['yellow', 'orange', 'cyan', 'magenta', 'lime', 'pink', 'lightblue']
    for i, pos in enumerate(result['positions']):
        if vis_start <= pos <= vis_end:
            window_start_vis = pos - vis_start
            window_end_vis = window_start_vis + slicer.window_samples
            
            # Window boundaries
            color = colors[i % len(colors)]
            ax1.axvline(window_start_vis, color=color, linestyle='-', linewidth=3, alpha=0.8)
            ax1.axvline(window_end_vis, color=color, linestyle='-', linewidth=3, alpha=0.8)
            
            # Window label
            ax1.text(window_start_vis + slicer.window_samples//2, vis_data.shape[0]*0.9, 
                    f'W{i+1}', ha='center', va='center', 
                    color=color, fontweight='bold', fontsize=14,
                    bbox=dict(boxstyle='round,pad=0.5', facecolor='black', alpha=0.8))
    
    ax1.legend(loc='upper right', fontsize=12)
    plt.colorbar(im1, ax=ax1, label='Power')
    
    plt.tight_layout()
    plt.show()
    
    # === VISUALIZATION 2: Final 128×128 Windows ===
    print("\n📊 Displaying final 128×128 windows for GAN training...")
    
    windows = result['windows']
    positions = result['positions']
    num_windows = len(windows)
    
    if num_windows > 0:
        # Create window display
        cols = min(4, num_windows)  # Max 4 columns
        rows = (num_windows + cols - 1) // cols
        # squeeze=False keeps axes as a 2-D array for any grid shape,
        # so every subplot can be indexed as axes[row, col]
        fig, axes = plt.subplots(rows, cols, figsize=(5*cols, 4*rows), squeeze=False)
        
        # Display each generated window
        for i in range(num_windows):
            ax = axes[i // cols, i % cols]
            
            im = ax.imshow(windows[i], aspect='auto', origin='lower', cmap='plasma')
            
            # Calculate time info
            start_time_s = positions[i] * 0.1  # Convert to seconds
            end_time_s = (positions[i] + slicer.window_samples) * 0.1
            
            ax.set_title(f'Window {i+1} (128×128)\nPosition: x{positions[i]}\nTime: {start_time_s:.1f}s - {end_time_s:.1f}s', 
                        fontsize=11)
            ax.set_xlabel('Time (compressed 18.75×)')
            ax.set_ylabel('Frequency (compressed 3.2×)')
            plt.colorbar(im, ax=ax, fraction=0.046)
        
        # Hide unused subplots
        for i in range(num_windows, rows * cols):
            axes[i // cols, i % cols].set_visible(False)
        
        plt.tight_layout()
        plt.show()
        
        # Detailed statistics
        print(f"\n📈 Final Data Statistics:")
        print(f"   Generated windows: {num_windows}")
        print(f"   Window size: {slicer.target_size}")
        print(f"   Compression ratios:")
        print(f"     Time: {slicer.window_samples}→128 ({slicer.window_samples/128:.1f}×)")
        print(f"     Frequency: 411→128 ({411/128:.1f}×)")
        print(f"     Total: {slicer.window_samples*411/(128*128):.1f}× data reduction")
        
        print(f"\n📋 Window Details for GAN Training:")
        for i, (window, pos) in enumerate(zip(windows, positions)):
            start_time = pos * 0.1
            end_time = (pos + slicer.window_samples) * 0.1
            print(f"   Window {i+1}: x{pos} ({start_time:.1f}s-{end_time:.1f}s)")
            print(f"     Shape: {window.shape} | Range: [{window.min():.3f}, {window.max():.3f}] | Mean: {window.mean():.3f}")
        
        print(f"\n🎯 This burst will contribute {num_windows} windows to GAN training data")
        
    else:
        print("⚠️  No windows generated for this burst (may be too short or outside time range)")

else:
    print("❌ Skipping visualization due to missing file")

print("\n" + "="*80)


🔍 Randomly selecting a burst for detailed slicing visualization...


ImportError: cannot import name 'BurstSlicer' from 'batch_processing' (/Users/remiliascarlet/Desktop/MDP/transfer_learning/radburst_tl/data_preprocessing_new/batch_processing.py)
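
The cell above reduces each 4-minute window, 2400 time samples by 411 frequency channels, to 128×128. How `BurstSlicer` resamples is not shown in this notebook. As a stand-in, the sketch below uses plain bin averaging; `resize_by_bin_mean` is a hypothetical name, and the real pipeline may well use a different interpolation.

```python
def resize_by_bin_mean(data, out_rows, out_cols):
    """Downsample a 2-D list by averaging the input cells that fall in each output bin."""
    in_rows, in_cols = len(data), len(data[0])
    if out_rows > in_rows or out_cols > in_cols:
        raise ValueError("this sketch only downsamples")
    # Integer bin edges; bins differ by at most one cell when sizes do not divide evenly
    row_edges = [i * in_rows // out_rows for i in range(out_rows + 1)]
    col_edges = [j * in_cols // out_cols for j in range(out_cols + 1)]
    out = []
    for i in range(out_rows):
        r0, r1 = row_edges[i], row_edges[i + 1]
        out_row = []
        for j in range(out_cols):
            c0, c1 = col_edges[j], col_edges[j + 1]
            total = sum(sum(data[r][c0:c1]) for r in range(r0, r1))
            out_row.append(total / ((r1 - r0) * (c1 - c0)))
        out.append(out_row)
    return out


# Compression ratios for the window geometry described in this notebook
window_samples, n_channels, target = 2400, 411, 128
print(f"time: {window_samples / target:.2f}x, frequency: {n_channels / target:.2f}x")

# Tiny demonstration: a 4x4 grid averaged down to 2x2
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(resize_by_bin_mean(grid, 2, 2))
```

Because 411 is not a multiple of 128, the frequency bins in this sketch hold either 3 or 4 channels. That is one reason a real implementation might prefer interpolation over fixed bins.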

In [11]:
# 📊 Analyze results and prepare for GAN training
print("📊 Analyzing generated training data...")

# Detailed analysis by type
for burst_type, result_info in results.items():
    print(f"\n🔍 Type {burst_type} Analysis:")
    print(f"   Successful processing: {result_info['successful_bursts']}/{result_info['total_bursts']} bursts")
    print(f"   Generated windows: {result_info['total_windows']}")
    print(f"   Output directory: {result_info['output_directory']}")
    print(f"   Average windows per burst: {result_info['total_windows']/result_info['successful_bursts'] if result_info['successful_bursts'] > 0 else 0:.1f}")
    
    # Check directory contents
    if os.path.exists(result_info['output_directory']):
        files = [f for f in os.listdir(result_info['output_directory']) if f.endswith('.csv')]
        print(f"   Files in directory: {len(files)}")
        if files:
            print(f"   Sample filenames:")
            for filename in files[:3]:
                print(f"     - {filename}")

# Total statistics
total_windows = sum(r['total_windows'] for r in results.values())
total_bursts = sum(r['successful_bursts'] for r in results.values())

print(f"\n🎯 GAN Training Data Ready:")
print(f"   Total windows: {total_windows}")
print(f"   Total bursts: {total_bursts}")
print(f"   Types available: {list(results.keys())}")

print(f"\n💡 Next Steps:")
print(f"   1. Train separate GANs for each type (recommended)")
print(f"   2. Or combine Type 3 + others for mixed training")
print(f"   3. Use the 128×128 CSV files directly in DCGAN training")

# Recommendations based on data size
print(f"\n📋 Training Recommendations:")
for burst_type, result_info in results.items():
    windows_count = result_info['total_windows']
    if windows_count >= 500:
        print(f"   Type {burst_type}: {windows_count} windows → ✅ Excellent for GAN training")
    elif windows_count >= 200:
        print(f"   Type {burst_type}: {windows_count} windows → ✅ Good for GAN training")
    elif windows_count >= 50:
        print(f"   Type {burst_type}: {windows_count} windows → ⚠️  Limited, consider data augmentation")
    else:
        print(f"   Type {burst_type}: {windows_count} windows → ❌ Too few, combine with others")


📊 Analyzing generated training data...

🔍 Type 3 Analysis:
   Successful processing: 62/63 bursts
   Generated windows: 218
   Output directory: /Users/remiliascarlet/Desktop/MDP/transfer_learning/burst_data/csv/gan_training_windows_128/type_3
   Average windows per burst: 3.5
   Files in directory: 218
   Sample filenames:
     - window_type3_240708143141-Skyline High School_x10336_burst_144056to144701.csv
     - window_type3_240519161711-Skyline High School_x36616_burst_171835to171942.csv
     - window_type3_240422122420-Skyline High School_x142219_burst_162440to162612.csv

🔍 Type 5 Analysis:
   Successful processing: 2/2 bursts
   Generated windows: 4
   Output directory: /Users/remiliascarlet/Desktop/MDP/transfer_learning/burst_data/csv/gan_training_windows_128/type_5
   Average windows per burst: 2.0
   Files in directory: 4
   Sample filenames:
     - window_type5_240417180250-Skyline High School_x71032_burst_200328to200501.csv
     - window_type5_240417180250-Skyline High School
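
The next-steps message above suggests feeding the 128×128 CSV windows directly into DCGAN training. A minimal sketch of loading one window and rescaling it, assuming each file is a header-less grid of numbers (the layout is not confirmed by this notebook). The names `load_window_csv` and `scale_to_unit_range` are hypothetical. The target range of [-1, 1] matches the common DCGAN convention of a `tanh` generator output and is not something the source pipeline specifies.

```python
import csv
import io


def load_window_csv(handle):
    """Read a header-less numeric CSV into a list of float rows."""
    return [[float(v) for v in row] for row in csv.reader(handle) if row]


def scale_to_unit_range(window):
    """Linearly rescale a 2-D list to [-1, 1]; a constant window maps to zeros."""
    flat = [v for row in window for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in window]
    return [[2.0 * (v - lo) / (hi - lo) - 1.0 for v in row] for row in window]


# In-memory stand-in for a window file; a real file would be opened with open(path, newline="")
demo = io.StringIO("0,5,10\n10,5,0\n")
window = load_window_csv(demo)
print(scale_to_unit_range(window))
```

Per-window min-max scaling discards absolute power levels. If those matter, scaling with dataset-wide bounds would be the alternative to consider.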