# SeeSpot API Query Example

This notebook demonstrates how to query the `get_real_spots_data` API endpoint from your SeeSpot FastAPI application and process the response data.

## Overview

The API endpoint `/api/real_spots_data` returns:
- **channel_pairs**: Available channel combinations for visualization
- **spots_data**: Individual spot records with intensities and metadata
- **spot_details**: Per-spot coordinates (`z`, `y`, `x`), `cell_id`, and `round`, keyed by spot ID, for Neuroglancer
- **fused_s3_paths**: S3 paths to fused image data
- **ratios** (optional): Spectral unmixing ratio matrix
- **summary_stats** (optional): Processing summary statistics
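Because `ratios` and `summary_stats` are optional, it helps to check which fields a response actually contains before indexing into it. The sketch below treats the four fields without the "(optional)" label as required. `check_response_fields` is a hypothetical helper, not part of SeeSpot, and the required/optional split is an assumption based on the list above.

```python
# Field names from the overview list; the required/optional split is an assumption
REQUIRED_FIELDS = ("channel_pairs", "spots_data", "spot_details", "fused_s3_paths")
OPTIONAL_FIELDS = ("ratios", "summary_stats")


def check_response_fields(data: dict) -> dict:
    """Report missing required fields, optional fields that carry data, and unknown keys."""
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    # An optional field only counts as present if its value is not None
    optional_present = [name for name in OPTIONAL_FIELDS if data.get(name) is not None]
    unexpected = sorted(set(data) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS))
    return {
        "missing_required": missing,
        "optional_present": optional_present,
        "unexpected": unexpected,
    }


example = {"channel_pairs": [], "spots_data": [], "spot_details": {}, "ratios": None}
print(check_response_fields(example))
```

After the request in section 3 succeeds, `check_response_fields(data)` gives a quick summary before you start unpacking the response.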

## 1. Import Required Libraries

Import necessary libraries for making API requests and data analysis.

In [3]:
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt       # used by the plots in section 6
from IPython.display import display  # built into Jupyter; explicit for scripts

# Set up plotting style
plt.style.use('default')

## 2. Set Up API Configuration

Configure the API endpoint and request parameters. Make sure your SeeSpot server is running on the specified port.

In [6]:
# API Configuration
BASE_URL = "http://localhost:9995"  # Adjust port if different
API_ENDPOINT = f"{BASE_URL}/api/real_spots_data"

# Request parameters
params = {
    "sample_size": 5000,        # Number of spots to sample
    "force_refresh": False      # Set to True to bypass cache
}

print(f"API Endpoint: {API_ENDPOINT}")
print(f"Parameters: {params}")

API Endpoint: http://localhost:9995/api/real_spots_data
Parameters: {'sample_size': 5000, 'force_refresh': False}
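If the request cell below fails with a connection error, the first thing to rule out is that nothing is listening on the configured port. This standard-library sketch parses the host and port out of a base URL and attempts a plain TCP connection. The helper name `server_is_listening` is hypothetical, and a successful connection only shows that *some* process is listening, not that it is the SeeSpot server.

```python
import socket
from urllib.parse import urlsplit


def server_is_listening(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host/port in base_url succeeds."""
    parts = urlsplit(base_url)
    host = parts.hostname or "localhost"
    # Fall back to the scheme's default port when the URL omits one
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # Open and immediately close a connection; no HTTP request is sent
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolvable host names
        return False
```

In the notebook, `server_is_listening(BASE_URL)` returning `False` means the server is not running or is on a different port than the one in `BASE_URL`.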


## 3. Make the API Request

Send a GET request to the API endpoint and handle the response.

In [7]:
try:
    # Make the API request
    print("Making API request...")
    response = requests.get(API_ENDPOINT, params=params, timeout=60)
    
    # Check if request was successful
    if response.status_code == 200:
        print(f"✅ Success! Status code: {response.status_code}")
        
        # Parse JSON response
        data = response.json()
        print(f"📊 Response data type: {type(data)}")
        
    else:
        print(f"❌ Error! Status code: {response.status_code}")
        print(f"Response: {response.text}")
        data = None
        
except requests.exceptions.RequestException as e:
    print(f"❌ Request failed: {e}")
    data = None

Making API request...
✅ Success! Status code: 200
📊 Response data type: <class 'dict'>
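The request above gives up after a single failure. If the server is still starting up or is briefly unavailable, a small retry loop with exponential backoff can help. This is a generic sketch, not something SeeSpot provides; `call_with_retries` is a hypothetical helper. It relies on the fact that `requests`' exception classes derive from `IOError` (an alias of `OSError` in Python 3), so the default `retry_on=(OSError,)` also catches connection errors and timeouts raised by `requests.get`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call fn(), retrying with exponential backoff when it raises one of retry_on."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            # Wait base_delay, then 2x, 4x, ... between attempts
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Usage in the notebook: `response = call_with_retries(lambda: requests.get(API_ENDPOINT, params=params, timeout=60))`. Note that `requests` does not raise for 4xx/5xx status codes by itself; if you call `response.raise_for_status()` inside `fn`, those errors will be retried too, which is rarely useful for a 4xx.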


## 4. Parse and Explore the Response

Examine the structure of the returned data and understand what's available.

In [8]:
if data is not None:
    print("🔍 Exploring API response structure...")
    print(f"Response keys: {list(data.keys())}")
    print()
    
    # Examine each component
    for key, value in data.items():
        if isinstance(value, list):
            print(f"📋 {key}: list with {len(value)} items")
            if len(value) > 0:
                print(f"   First item type: {type(value[0])}")
                if isinstance(value[0], dict):
                    print(f"   First item keys: {list(value[0].keys())[:5]}...")
        elif isinstance(value, dict):
            print(f"📚 {key}: dict with {len(value)} keys")
            print(f"   Sample keys: {list(value.keys())[:3]}...")
        else:
            print(f"📄 {key}: {type(value)} - {value}")
        print()
else:
    print("⚠️ No data available. Make sure the SeeSpot server is running!")

🔍 Exploring API response structure...
Response keys: ['channel_pairs', 'spots_data', 'spot_details', 'fused_s3_paths', 'ratios', 'summary_stats']

📋 channel_pairs: list with 10 items
   First item type: <class 'list'>

📋 spots_data: list with 5000 items
   First item type: <class 'dict'>
   First item keys: ['spot_id', 'chan', 'r', 'dist', 'unmixed_chan']...

📚 spot_details: dict with 5000 keys
   Sample keys: ['7966486', '5243252', '7691947']...

📋 fused_s3_paths: list with 5 items
   First item type: <class 'str'>

📋 ratios: list with 5 items
   First item type: <class 'list'>

📋 summary_stats: list with 5 items
   First item type: <class 'dict'>
   First item keys: ['min_dist', 'round', 'total_spots', 'kept_spots', 'reassigned_spots']...
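The 10 items in `channel_pairs` are exactly the unordered pairs of the five channels that appear in the `chan_<N>_intensity` columns, as the full listing in section 5 shows. Assuming the server builds them this way, `itertools.combinations` reproduces the list on the client side, for example to sanity-check a response or to build pairs for a subset of channels. The `CHANNELS` list here is taken from the sample output, not from an API call.

```python
from itertools import combinations

# Channel names taken from the chan_<N>_intensity columns in the sample output
CHANNELS = ["488", "514", "561", "594", "638"]

# combinations() keeps the input order and never yields (a, a) or both (a, b) and (b, a)
expected_pairs = [list(pair) for pair in combinations(CHANNELS, 2)]
print(len(expected_pairs))  # 10, i.e. 5 choose 2
print(expected_pairs[:3])   # [['488', '514'], ['488', '561'], ['488', '594']]
```

For the sample response in this notebook, `expected_pairs == data['channel_pairs']` should be `True`.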



## 5. Extract Specific Data Components

Convert the API response into useful data structures for analysis.

In [11]:
# Let's check the current memory usage and data types
if data is not None:
    spots_data = data.get('spots_data', [])
    if spots_data:
        df_spots = pd.DataFrame(spots_data)
        
        print("🔍 Current DataFrame Info:")
        print(f"Shape: {df_spots.shape}")
        print(f"Memory usage: {df_spots.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
        print("\n📋 Current Data Types:")
        print(df_spots.dtypes)
        
        # Show memory usage by column
        print("\n💾 Memory Usage by Column:")
        memory_usage = df_spots.memory_usage(deep=True)
        for col, usage in memory_usage.items():
            if usage > 0:
                print(f"  {col}: {usage / 1024:.1f} KB")
else:
    print("⚠️ No data available!")

🔍 Current DataFrame Info:
Shape: (5000, 12)
Memory usage: 0.89 MB

📋 Current Data Types:
spot_id                 int64
chan                   object
r                     float64
dist                  float64
unmixed_chan           object
reassigned               bool
unmixed_removed          bool
chan_488_intensity    float64
chan_514_intensity    float64
chan_561_intensity    float64
chan_594_intensity    float64
chan_638_intensity    float64
dtype: object

💾 Memory Usage by Column:
  Index: 0.1 KB
  spot_id: 39.1 KB
  chan: 293.0 KB
  r: 39.1 KB
  dist: 39.1 KB
  unmixed_chan: 293.0 KB
  reassigned: 4.9 KB
  unmixed_removed: 4.9 KB
  chan_488_intensity: 39.1 KB
  chan_514_intensity: 39.1 KB
  chan_561_intensity: 39.1 KB
  chan_594_intensity: 39.1 KB
  chan_638_intensity: 39.1 KB
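The per-column figures follow from how pandas stores each dtype (with 1 KB = 1024 B, as in the code above). Fixed-width columns cost bytes-per-value times the row count, while `deep=True` also counts the Python string object behind every entry of an `object` column:

$$
\begin{aligned}
\text{int64 / float64:}\quad & 5000 \times 8\ \text{B} = 40\,000\ \text{B} \approx 39.1\ \text{KB} \\
\text{bool:}\quad & 5000 \times 1\ \text{B} = 5\,000\ \text{B} \approx 4.9\ \text{KB} \\
\text{object (chan):}\quad & \frac{293.0 \times 1024\ \text{B}}{5000} \approx 60\ \text{B per row}
\end{aligned}
$$

So each short channel string costs roughly 60 bytes: an 8-byte pointer plus the string object it points to. A `category` column instead stores one small integer code per row (1 byte for a handful of categories) plus a single copy of each distinct string, which is why `chan` and `unmixed_chan` are the most promising targets for the dtype optimization in the next subsection.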


### Memory Optimization

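Let's optimize the data types to reduce memory usage. The two `object` columns (`chan` and `unmixed_chan`) account for about 586 KB of the 0.89 MB, so converting them to `category` should give the biggest win. We can also convert float64 to float32, which keeps roughly 7 significant digits (plenty for intensities such as 495.6748), and downcast integers to the smallest type that holds their range.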

In [None]:
def optimize_dataframe_dtypes(df):
    """Optimize DataFrame data types for memory efficiency."""
    
    # Make a copy to avoid modifying the original
    df_optimized = df.copy()
    
    # Define type mappings for specific columns
    dtype_map = {
        # Boolean columns
        'valid_spot': 'bool',
        'reassigned': 'bool', 
        'unmixed_removed': 'bool',
        
        # String columns (will convert to category if beneficial)
        'chan': 'category',
        'unmixed_chan': 'category',
        'cell_id': 'category',
        
        # Integer columns - use smaller types where possible
        'round': 'int8',  # rounds are typically 1-10
    }
    
    # Apply explicit type conversions
    for col, dtype in dtype_map.items():
        if col in df_optimized.columns:
            try:
                if dtype == 'category':
                    df_optimized[col] = df_optimized[col].astype('category')
                else:
                    df_optimized[col] = df_optimized[col].astype(dtype)
                print(f"✅ Converted {col} to {dtype}")
            except Exception as e:
                print(f"⚠️ Could not convert {col} to {dtype}: {e}")
    
    # Convert float64 to float32 for numeric columns
    float_cols = df_optimized.select_dtypes(include=['float64']).columns
    for col in float_cols:
        # Check if values fit in float32 range
        max_val = df_optimized[col].max()
        min_val = df_optimized[col].min()
        
        if pd.isna(max_val) or pd.isna(min_val):
            continue
            
        # Float32 range is approximately ±3.4e38
        if abs(max_val) < 3.4e38 and abs(min_val) < 3.4e38:
            df_optimized[col] = df_optimized[col].astype('float32')
            print(f"✅ Converted {col} from float64 to float32")
    
    # Downcast int64 columns to the smallest integer type that holds their range.
    # Check the narrowest type first: every int8 value also fits in int32, so
    # testing int32 first would never reach the smaller types.
    int_cols = df_optimized.select_dtypes(include=['int64']).columns
    for col in int_cols:
        if col in dtype_map:
            continue  # Already handled above
            
        max_val = df_optimized[col].max()
        min_val = df_optimized[col].min()
        
        for int_type in ('int8', 'int16', 'int32'):
            info = np.iinfo(int_type)
            if info.min <= min_val and max_val <= info.max:
                df_optimized[col] = df_optimized[col].astype(int_type)
                print(f"✅ Converted {col} from int64 to {int_type}")
                break
    
    return df_optimized

# Apply optimization if we have data
if data is not None and 'spots_data' in data:
    print("🚀 Optimizing DataFrame...")
    df_spots_optimized = optimize_dataframe_dtypes(df_spots)
    
    # Compare memory usage
    original_memory = df_spots.memory_usage(deep=True).sum() / 1024**2
    optimized_memory = df_spots_optimized.memory_usage(deep=True).sum() / 1024**2
    savings = original_memory - optimized_memory
    percentage_saved = (savings / original_memory) * 100
    
    print(f"\n📊 Memory Usage Comparison:")
    print(f"Original:  {original_memory:.2f} MB")
    print(f"Optimized: {optimized_memory:.2f} MB")
    print(f"Savings:   {savings:.2f} MB ({percentage_saved:.1f}% reduction)")
    
    # Show optimized dtypes
    print(f"\n🎯 Optimized Data Types:")
    print(df_spots_optimized.dtypes)
    
else:
    print("⚠️ No data available for optimization!")
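pandas can also pick the narrowest safe numeric type by itself: `pd.to_numeric(..., downcast=...)` tries progressively smaller dtypes and keeps the first one that holds every value (float downcasting stops at float32). A minimal alternative to the hand-written numeric loops above, assuming you only want to shrink numeric columns and will handle `category` conversions separately; `downcast_numeric` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with integer and float columns downcast by pandas itself."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")  # smallest signed int
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")    # float32 at the smallest
    return out


demo = pd.DataFrame({"spot_id": [7966486, 5243252], "r": [0.659797, 0.804026], "chan": ["638", "561"]})
print(downcast_numeric(demo).dtypes)
```

Here `spot_id` ends up as int32 because its values exceed the int16 range, `r` becomes float32, and the string column `chan` is left alone.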

### Memory Usage Analysis

Let's analyze the memory impact and see specific column optimizations.

In [None]:
if data is not None and 'spots_data' in data:
    # Create a detailed comparison
    print("📈 Detailed Memory Comparison by Column:")
    print("-" * 60)
    
    original_usage = df_spots.memory_usage(deep=True)
    optimized_usage = df_spots_optimized.memory_usage(deep=True)
    
    print(f"{'Column':<20} {'Original':>11} {'Optimized':>11} {'Savings':>9} {'%':>5}")
    print("-" * 60)
    
    total_original = 0
    total_optimized = 0
    
    for col in df_spots.columns:
        orig_mem = original_usage[col] / 1024  # KB
        opt_mem = optimized_usage[col] / 1024   # KB
        savings = orig_mem - opt_mem
        pct_savings = (savings / orig_mem * 100) if orig_mem > 0 else 0
        
        total_original += orig_mem
        total_optimized += opt_mem
        
        print(f"{col:<20} {orig_mem:>8.1f} KB {opt_mem:>8.1f} KB {savings:>6.1f} KB {pct_savings:>4.1f}%")
    
    print("-" * 60)
    total_savings = total_original - total_optimized
    total_pct = (total_savings / total_original * 100) if total_original > 0 else 0
    print(f"{'TOTAL':<20} {total_original:>8.1f} KB {total_optimized:>8.1f} KB {total_savings:>6.1f} KB {total_pct:>4.1f}%")
    
    # Show biggest memory savers
    print(f"\n🏆 Biggest Memory Savers:")
    savings_by_col = {}
    for col in df_spots.columns:
        orig_mem = original_usage[col]
        opt_mem = optimized_usage[col]
        savings_by_col[col] = orig_mem - opt_mem
    
    top_savers = sorted(savings_by_col.items(), key=lambda x: x[1], reverse=True)[:5]
    for col, savings in top_savers:
        if savings > 0:
            print(f"  {col}: {savings/1024:.1f} KB saved")
    
    # Update our working dataframe to the optimized version
    df_spots = df_spots_optimized
    print(f"\n✅ DataFrame updated to optimized version!")
    
else:
    print("⚠️ No data available for detailed analysis!")
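Downcasting to float32 keeps about 7 significant decimal digits. Before relying on the optimized frame, you can measure how far each converted column drifted from the float64 original. This is a sketch; `max_float32_error` is a hypothetical helper, and what counts as acceptable error is your call.

```python
import numpy as np
import pandas as pd


def max_float32_error(original: pd.DataFrame, optimized: pd.DataFrame) -> pd.Series:
    """Largest absolute difference for each column that went from float64 to float32."""
    errors = {}
    for col in original.columns:
        if original[col].dtype == np.float64 and optimized[col].dtype == np.float32:
            # Upcast back to float64 so the subtraction itself adds no rounding
            diff = (original[col] - optimized[col].astype(np.float64)).abs()
            errors[col] = diff.max()
    return pd.Series(errors, dtype="float64")


orig = pd.DataFrame({"chan_638_intensity": [495.6748, 415.8231, 19.414635]})
print(max_float32_error(orig, orig.astype(np.float32)))
```

Since the cell above replaced `df_spots` with the optimized version, compare against a fresh copy of the raw data: `max_float32_error(pd.DataFrame(data['spots_data']), df_spots_optimized)`.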

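### Extract All Response Components

This cell pulls the channel pairs, spot table, spot details, S3 paths, and optional ratio and summary data out of the response. Note that it rebuilds `df_spots` from the raw `spots_data`, replacing the optimized version; rerun the optimization cells afterward if you want to keep the compact dtypes.

In [9]: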
if data is not None:
    # Extract channel pairs
    channel_pairs = data.get('channel_pairs', [])
    print(f"📊 Available channel pairs: {channel_pairs}")
    
    # Convert spots data to DataFrame
    spots_data = data.get('spots_data', [])
    if spots_data:
        df_spots = pd.DataFrame(spots_data)
        print(f"\n📋 Spots DataFrame shape: {df_spots.shape}")
        print(f"Columns: {list(df_spots.columns)}")
        print(f"\nFirst few rows:")
        display(df_spots.head())
    
    # Extract spot details
    spot_details = data.get('spot_details', {})
    print(f"\n🎯 Spot details available for {len(spot_details)} spots")
    if spot_details:
        # Show example spot detail
        sample_spot_id = list(spot_details.keys())[0]
        print(f"Example spot {sample_spot_id}: {spot_details[sample_spot_id]}")
    
    # Extract S3 paths
    fused_s3_paths = data.get('fused_s3_paths', [])
    print(f"\n💾 Fused S3 paths ({len(fused_s3_paths)} files):")
    for path in fused_s3_paths[:3]:  # Show first 3
        print(f"  {path}")
    if len(fused_s3_paths) > 3:
        print(f"  ... and {len(fused_s3_paths) - 3} more")
    
    # Check for optional data (guard against the key being absent or null)
    if data.get('ratios') is not None:
        ratios = np.array(data['ratios'])
        print(f"\n🧮 Ratios matrix shape: {ratios.shape}")
    
    if data.get('summary_stats') is not None:
        summary_stats = data['summary_stats']
        print(f"\n📈 Summary stats: {len(summary_stats)} records")
else:
    print("⚠️ No data to extract!")

📊 Available channel pairs: [['488', '514'], ['488', '561'], ['488', '594'], ['488', '638'], ['514', '561'], ['514', '594'], ['514', '638'], ['561', '594'], ['561', '638'], ['594', '638']]

📋 Spots DataFrame shape: (5000, 12)
Columns: ['spot_id', 'chan', 'r', 'dist', 'unmixed_chan', 'reassigned', 'unmixed_removed', 'chan_488_intensity', 'chan_514_intensity', 'chan_561_intensity', 'chan_594_intensity', 'chan_638_intensity']

First few rows:


| | spot_id | chan | r | dist | unmixed_chan | reassigned | unmixed_removed | chan_488_intensity | chan_514_intensity | chan_561_intensity | chan_594_intensity | chan_638_intensity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7966486 | 638 | 0.659797 | 0.55472 | 638 | False | False | 24.28455 | 4.731705 | 15.53658 | 7.6748 | 495.6748 |
| 1 | 5243252 | 561 | 0.804026 | 0.573335 | 561 | False | False | 12.07317 | 7.17073 | 342.0169 | 39.49594 | 19.414635 |
| 2 | 7691947 | 638 | 0.540704 | 0.6764 | 638 | False | False | 8.25203 | 5.17886 | 49.23578 | 14.373985 | 415.8231 |
| 3 | 4108598 | 514 | 0.907056 | 0.289419 | 488 | True | False | 215.69172 | 241.91058 | 14.23578 | 51.88618 | 12.86992 |
| 4 | 4060257 | 514 | 0.874865 | 0.155913 | 488 | True | False | 207.57724 | 195.63416 | 13.26016 | 48.47154 | 15.861786 |



🎯 Spot details available for 5000 spots
Example spot 7966486: {'cell_id': 41951, 'round': '4', 'z': 1150, 'y': 5895, 'x': 2717}

💾 Fused S3 paths (5 files):
  s3://aind-open-data/HCR_749315_2025-05-08_14-00-00_processed_2025-05-17_22-15-31/image_tile_fusing/fused/channel_488.zarr
  s3://aind-open-data/HCR_749315_2025-05-08_14-00-00_processed_2025-05-17_22-15-31/image_tile_fusing/fused/channel_514.zarr
  s3://aind-open-data/HCR_749315_2025-05-08_14-00-00_processed_2025-05-17_22-15-31/image_tile_fusing/fused/channel_561.zarr
  ... and 2 more

🧮 Ratios matrix shape: (5, 5)

📈 Summary stats: 5 records
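`spot_details` is keyed by spot ID, but JSON object keys are always strings (`'7966486'`), while `spot_id` in `df_spots` is an integer column, and `round` arrives as a string (`'4'`). A sketch of joining the two tables, assuming every key in `spot_details` is a numeric spot ID; `attach_spot_details` is a hypothetical helper.

```python
import pandas as pd


def attach_spot_details(df_spots: pd.DataFrame, spot_details: dict) -> pd.DataFrame:
    """Left-join per-spot cell/coordinate info onto the spots table."""
    details = (
        pd.DataFrame.from_dict(spot_details, orient="index")  # one row per spot ID key
        .rename_axis("spot_id")
        .reset_index()
    )
    # JSON keys are strings, so convert them back to integers for the join
    details["spot_id"] = details["spot_id"].astype("int64")
    # 'round' is a string such as '4' in the sample output; make it numeric
    if "round" in details.columns:
        details["round"] = pd.to_numeric(details["round"], errors="coerce")
    # Match key dtypes on both sides (the optimized frame may use int32)
    left = df_spots.assign(spot_id=df_spots["spot_id"].astype("int64"))
    # validate='one_to_one' raises if either side has duplicate spot IDs
    return left.merge(details, on="spot_id", how="left", validate="one_to_one")
```

In the notebook: `df_spots_full = attach_spot_details(df_spots, spot_details)`. The left join keeps every row of `df_spots`; spots without details get `NaN` in the new columns instead of being dropped.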


## 6. Visualize Sample Data

Create basic visualizations to explore the spot data and understand the channel relationships.

In [None]:
import matplotlib.pyplot as plt

if data is not None and 'spots_data' in data:
    # Create scatter plot for first channel pair
    if len(channel_pairs) > 0 and len(df_spots) > 0:
        # Get first channel pair
        x_chan, y_chan = channel_pairs[0]
        x_col = f'chan_{x_chan}_intensity'
        y_col = f'chan_{y_chan}_intensity'
        
        # Create figure with subplots
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))
        fig.suptitle(f'SeeSpot Data Analysis: {x_chan} vs {y_chan} Channels', fontsize=16)
        
        # 1. Scatter plot of channel intensities
        colors = ['red' if x else 'blue' for x in df_spots.get('reassigned', [False] * len(df_spots))]
        axes[0,0].scatter(df_spots[x_col], df_spots[y_col], c=colors, alpha=0.6, s=20)
        axes[0,0].set_xlabel(f'{x_chan} Intensity')
        axes[0,0].set_ylabel(f'{y_chan} Intensity')
        axes[0,0].set_title('Channel Intensities (Red=Reassigned, Blue=Original)')
        axes[0,0].grid(True, alpha=0.3)
        
        # 2. Histogram of R values
        axes[0,1].hist(df_spots['r'], bins=30, alpha=0.7, color='green', edgecolor='black')
        axes[0,1].set_xlabel('R Value')
        axes[0,1].set_ylabel('Frequency')
        axes[0,1].set_title('Distribution of R Values')
        axes[0,1].grid(True, alpha=0.3)
        
        # 3. Histogram of distance values
        axes[1,0].hist(df_spots['dist'], bins=30, alpha=0.7, color='orange', edgecolor='black')
        axes[1,0].set_xlabel('Distance')
        axes[1,0].set_ylabel('Frequency')
        axes[1,0].set_title('Distribution of Distance Values')
        axes[1,0].grid(True, alpha=0.3)
        
        # 4. Channel distribution
        chan_counts = df_spots['chan'].value_counts()
        axes[1,1].bar(chan_counts.index, chan_counts.values, alpha=0.7, color='purple')
        axes[1,1].set_xlabel('Channel')
        axes[1,1].set_ylabel('Count')
        axes[1,1].set_title('Original Channel Distribution')
        axes[1,1].tick_params(axis='x', rotation=45)
        
        plt.tight_layout()
        plt.show()
        
        # Print summary statistics
        print("📊 Summary Statistics:")
        print(f"Total spots: {len(df_spots)}")
        n_reassigned = int(df_spots['reassigned'].sum()) if 'reassigned' in df_spots.columns else 'N/A'
        print(f"Reassigned spots: {n_reassigned}")
        print(f"Average {x_chan} intensity: {df_spots[x_col].mean():.2f}")
        print(f"Average {y_chan} intensity: {df_spots[y_col].mean():.2f}")
        print(f"Average R value: {df_spots['r'].mean():.3f}")
        print(f"Average distance: {df_spots['dist'].mean():.3f}")
        
else:
    print("⚠️ No data available for visualization!")
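The plots above cover only the first channel pair. To compare all pairs numerically, one option is a correlation coefficient per pair. This is a sketch; `pair_correlations` is a hypothetical helper that assumes the `chan_<N>_intensity` column naming seen in `spots_data`. `Series.corr` defaults to Pearson; the `spearman` and `kendall` methods may require SciPy to be installed.

```python
import pandas as pd


def pair_correlations(df: pd.DataFrame, channel_pairs, method: str = "pearson") -> pd.DataFrame:
    """Correlation between the two intensity columns of each channel pair."""
    rows = []
    for x_chan, y_chan in channel_pairs:
        x_col, y_col = f"chan_{x_chan}_intensity", f"chan_{y_chan}_intensity"
        if x_col in df.columns and y_col in df.columns:  # skip pairs without intensity columns
            rows.append({"pair": f"{x_chan}-{y_chan}", "corr": df[x_col].corr(df[y_col], method=method)})
    # Strongest positive relationships first
    return pd.DataFrame(rows, columns=["pair", "corr"]).sort_values("corr", ascending=False, ignore_index=True)


toy = pd.DataFrame({
    "chan_488_intensity": [1.0, 2.0, 3.0],
    "chan_514_intensity": [2.0, 4.0, 6.0],
    "chan_561_intensity": [3.0, 2.0, 1.0],
})
print(pair_correlations(toy, [["488", "514"], ["488", "561"]]))
```

With the real data: `pair_correlations(df_spots, channel_pairs)`.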

## Next Steps

You now have a working example of how to query the SeeSpot API! Here are some suggestions for further exploration:

### Additional API Endpoints
- **`/api/datasets`**: List available datasets
- **`/api/datasets/set-active`**: Switch to a different dataset  
- **`/api/create-neuroglancer-link`**: Generate Neuroglancer visualization links

### Data Analysis Ideas
- **Channel comparison**: Analyze intensity relationships across all channel pairs
- **Reassignment patterns**: Study which spots get reassigned and why
- **Spatial analysis**: Use spot coordinates for spatial clustering
- **Quality metrics**: Explore the relationship between R values and data quality
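For reassignment patterns, a cross-tabulation of the original channel (`chan`) against the channel after unmixing (`unmixed_chan`) shows where reassigned spots end up. In the sample rows shown earlier, for instance, spots called in 514 were reassigned to 488. A minimal sketch, assuming the `spots_data` column names shown earlier; `reassignment_summary` is a hypothetical helper.

```python
import pandas as pd


def reassignment_summary(df: pd.DataFrame):
    """Cross-tabulate original vs. unmixed channel and the reassigned fraction per channel."""
    # Rows: channel the spot was called in; columns: channel after unmixing; 'All' = totals
    table = pd.crosstab(df["chan"], df["unmixed_chan"], margins=True)
    # The mean of a boolean column is the fraction of True values
    rate = df.groupby("chan", observed=True)["reassigned"].mean()
    return table, rate


toy = pd.DataFrame({
    "chan": ["514", "514", "638"],
    "unmixed_chan": ["488", "514", "638"],
    "reassigned": [True, False, False],
})
table, rate = reassignment_summary(toy)
print(table)
print(rate)
```

Off-diagonal cells in `table` count spots that moved between channels; `observed=True` keeps a categorical `chan` column (after optimization) from listing empty categories.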

### Performance Tips
- Use `sample_size` parameter to control response size
- Set `force_refresh=False` to leverage server-side caching
- Consider pagination for very large datasets