# Behavioral EDA Comparison: Yasmin vs Fiona

This notebook creates side-by-side comparisons of key behavioral analyses between Yasmin and Fiona using the `BehavioralEDA` class.

We'll focus on three key plots:
1. **Signal Delay Performance** - Stop and Continue performance (Figure 1b replication)
2. **RT Scatter Plots** - Session mean RT comparisons
3. **RT Distributions** - Continue and error stop RT distributions (Figure 1d replication)

## Setup and Initialization

In [1]:
# Import required libraries
from behavioral_eda_class import BehavioralEDA
from pathlib import Path
import holoviews as hv
from holoviews import opts

# Enable Jupyter notebook display
from bokeh.io import output_notebook
output_notebook()
hv.extension('bokeh')

In [2]:
# Memory-efficient approach: Process one monkey at a time
# This avoids loading 25GB+ of data simultaneously

# Define processing function
def process_monkey_data(monkey_name):
    """Process a single monkey's data and return summary + plots"""
    try:
        data_dir = Path.cwd().parent / 'data'
        filepath = data_dir / 'csst_trials_pkls' / f'all_{monkey_name}_CSST_trials_df.pkl'
        
        print(f"Loading data for {monkey_name.title()}...")
        print(f"Path: {filepath}")
        print(f"File exists: {filepath.exists()}")
        
        if not filepath.exists():
            print(f"❌ Data file not found for {monkey_name.title()}")
            return None
        
        # Create EDA instance
        eda = BehavioralEDA(str(filepath))
        print(f"✓ Successfully loaded {monkey_name.title()}'s data")
        
        # Extract all needed data and plots
        results = {
            'basic_summary': eda.get_basic_summary(),
            'signal_delay_plot': eda.plot_signal_delay_performance(),
            'signal_delay_data': eda.get_signal_delay_performance_data(),
            'rt_scatter_plot': eda.plot_rt_scatter(),
            'rt_scatter_data': eda.get_rt_scatter_data(),
            'rt_distribution_plot': eda.plot_rt_distributions(),
            'rt_distribution_data': eda.get_rt_distribution_data()
        }
        
        print(f"✓ Extracted all plots and data for {monkey_name.title()}")
        
        # Drop this reference to the EDA instance so its raw data can be garbage-collected
        del eda
        print(f"✓ Freed memory for {monkey_name.title()}")
        
        return results
        
    except Exception as e:
        print(f"❌ Error processing {monkey_name.title()}'s data: {e}")
        return None

# Process monkeys sequentially
monkeys = ['yasmin', 'fiona']
monkey_results = {}

print("Processing monkeys sequentially to minimize memory usage...")
print("="*60)

for monkey in monkeys:
    print(f"\n{'='*20} PROCESSING {monkey.upper()} {'='*20}")
    result = process_monkey_data(monkey)
    if result:
        monkey_results[monkey] = result
        print(f"✓ {monkey.title()} processing complete")
    print()

print(f"Successfully processed data for: {list(monkey_results.keys())}")
print("Ready for analysis and plotting!")

Processing monkeys sequentially to minimize memory usage...

Loading data for Yasmin...
Path: /home/barak/Projects/population_analysis/data/csst_trials_pkls/all_yasmin_CSST_trials_df.pkl
File exists: True

Loaded data for yasmin
Total trials: 123,178
Date range: ya230501 to ya230904
✓ Reaction time data available, will add derived columns as needed
✓ Successfully loaded Yasmin's data
Processing reaction times and adding to original DataFrame...
✓ Using existing reaction_time column
✓ Reaction time processing completed and added to original DataFrame
1.0 :  24
2.0 :  96
3.0 :  168
4.0 :  264
1.0 :  24
2.0 :  96
3.0 :  168
4.0 :  264
1.0 :  24
2.0 :  96
3.0 :  168
4.0 :  264
✓ Extracted all plots and data for Yasmin
✓ Freed memory for Yasmin
✓ Yasmin processing complete


Loading data for Fiona...
Path: /home/barak/Projects/population_analysis/data/csst_trials_pkls/all_fiona_CSST_trials_df.pkl
File exists: True

Loaded data for fiona
Total trials: 110,358
Date range: fi210628 to fi211125
✓ Reaction time data available, will add derived columns as needed
✓ Successfully loaded Fiona's data
Processing reaction times and adding to original DataFrame...
✓ Using existing reaction_time column
✓ Reaction time processing completed and added to original DataFrame
1.0 :  48
2.0 :  108
3.0 :  168
4.0 :  228
1.0 :  48
2.0 :  108
3.0 :  168
4.0 :  228
✓ Extracted all plots and data for Fiona
✓ Freed memory for Fiona
✓ Fiona processing complete

Successfully processed data for: ['yasmin', 'fiona']
Ready for analysis and plotting!


## Basic Summary Comparison

In [3]:
# Print basic summaries for both monkeys
for monkey, results in monkey_results.items():
    if results and 'basic_summary' in results:
        print(f"{'='*60}")
        print(f"BASIC SUMMARY - {monkey.upper()}")
        print(f"{'='*60}")
        
        basic_summary = results['basic_summary']
        print(f"Total trials: {basic_summary['total_trials']:,}")
        print(f"Overall success rate: {basic_summary['overall_success_rate']:.1f}%")
        print("Trial types:")
        for trial_type, count in basic_summary['trial_types'].items():
            print(f"  {trial_type}: {count:,}")
        print()
    else:
        print(f"❌ No data available for {monkey.title()}")

BASIC SUMMARY - YASMIN
Total trials: 123,178
Overall success rate: 82.6%
Trial types:
  GO: 69,104
  CONT: 28,416
  STOP: 25,658

BASIC SUMMARY - FIONA
Total trials: 110,358
Overall success rate: 84.6%
Trial types:
  GO: 61,460
  CONT: 25,294
  STOP: 23,604



## 1. Signal Delay Performance Comparison

This replicates Figure 1b from the original paper, showing stop error rates and continue success rates as a function of signal delay.

In [4]:
# Signal delay performance plots are already created and stored
# Just extract them from our results
signal_delay_plots = {}

for monkey, results in monkey_results.items():
    if results and 'signal_delay_plot' in results:
        signal_delay_plots[monkey] = results['signal_delay_plot']
        print(f"✓ Signal delay plot available for {monkey.title()}")
    else:
        print(f"❌ No signal delay plot available for {monkey.title()}")

print(f"\nReady to display {len(signal_delay_plots)} signal delay plots")

✓ Signal delay plot available for Yasmin
✓ Signal delay plot available for Fiona

Ready to display 2 signal delay plots


In [5]:
# Display Yasmin's signal delay performance
if 'yasmin' in signal_delay_plots:
    print("YASMIN - Signal Delay Performance (Figure 1b)")
    display(signal_delay_plots['yasmin'])
else:
    print("❌ Yasmin's signal delay plot not available")

YASMIN - Signal Delay Performance (Figure 1b)


In [6]:
# Display Fiona's signal delay performance
if 'fiona' in signal_delay_plots:
    print("FIONA - Signal Delay Performance (Figure 1b)")
    display(signal_delay_plots['fiona'])
else:
    print("❌ Fiona's signal delay plot not available")

FIONA - Signal Delay Performance (Figure 1b)


In [None]:
# print(signal_delay_plots['yasmin'])
(signal_delay_plots['yasmin'].opts(
    opts.Curve(line_dash='dashed')
) * signal_delay_plots['fiona']).opts(
    legend_position='bottom_right',
    title='Stop and continue performance',
    xlabel='Stop/Continue signal delay (ms)',
    ylabel='Percentage of saccades (%)',
)

### Signal Delay Performance Analysis

Key patterns to look for:
- **Error stop rates should INCREASE** with longer signal delays (race model prediction)
- **Continue success rates should remain relatively STABLE** across different delays
- Compare the slopes and overall performance levels between the two monkeys
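One way to put a number on the slope comparison is an ordinary least-squares fit of error stop percentage against signal delay. The sketch below is only an illustration: the subject names and every value in `example_stop_perf` are made up, and `least_squares_slope` is a hypothetical helper, not part of `BehavioralEDA`.

```python
# Hypothetical per-delay stop performance (NOT the monkeys' real values).
# Each pair is (signal delay in ms, error stop percentage).
example_stop_perf = {
    "subject_a": [(50, 10.0), (100, 25.0), (150, 40.0), (200, 55.0)],
    "subject_b": [(50, 20.0), (100, 30.0), (150, 40.0), (200, 50.0)],
}


def least_squares_slope(points):
    """Return the ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# A positive slope means the error stop rate rises with delay.
slopes = {name: least_squares_slope(pts) for name, pts in example_stop_perf.items()}
for name, slope in slopes.items():
    print(f"{name}: {slope:.2f} percentage points per ms")
```

If the stop performance table exposes `ssd_len` and `error_percentage` columns, as the commented-out cell below suggests, the `(delay, percentage)` pairs could be built from those two columns instead of the made-up values.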

In [8]:
# # Get the underlying data for comparison
# print("Signal Delay Performance Data Comparison:")
# print("="*50)

# for monkey, results in monkey_results.items():
#     if results and 'signal_delay_data' in results:
#         stop_perf, cont_perf = results['signal_delay_data']
        
#         print(f"\n{monkey.upper()} - Stop Performance:")
#         print(stop_perf[['ssd_len', 'error_percentage', 'total_trials']])
        
#         print(f"\n{monkey.upper()} - Continue Performance:")
#         print(cont_perf[['ssd_len', 'correct_percentage', 'total_trials']])
#     else:
#         print(f"❌ No signal delay data available for {monkey.title()}")

## 2. RT Scatter Plot Comparison

These plots compare session mean reaction times across different trial types, showing consistency and relationships between GO, Continue, and Error Stop RTs.

In [9]:
# RT scatter plots are already created and stored
# Just extract them from our results
rt_scatter_plots = {}

for monkey, results in monkey_results.items():
    if results and 'rt_scatter_plot' in results:
        rt_scatter_plots[monkey] = results['rt_scatter_plot']
        print(f"✓ RT scatter plot available for {monkey.title()}")
    else:
        print(f"❌ No RT scatter plot available for {monkey.title()}")

print(f"\nReady to display {len(rt_scatter_plots)} RT scatter plots")

✓ RT scatter plot available for Yasmin
✓ RT scatter plot available for Fiona

Ready to display 2 RT scatter plots


In [10]:
# Display Yasmin's RT scatter plot
if 'yasmin' in rt_scatter_plots:
    print("YASMIN - Session Mean RT Scatter Plot")
    display(rt_scatter_plots['yasmin'])
else:
    print("❌ Yasmin's RT scatter plot not available")

YASMIN - Session Mean RT Scatter Plot


In [11]:
# Display Fiona's RT scatter plot
if 'fiona' in rt_scatter_plots:
    print("FIONA - Session Mean RT Scatter Plot")
    display(rt_scatter_plots['fiona'])
else:
    print("❌ Fiona's RT scatter plot not available")

FIONA - Session Mean RT Scatter Plot


In [25]:
# Print the overlay structure to inspect the element group/label names
print(rt_scatter_plots['fiona'])

(rt_scatter_plots['yasmin'].opts(
    opts.Scatter('Scatter.Continue_continue_RT_yasmin', color='blue'),
    opts.Scatter('Scatter.Error_stop_RT_yasmin', color='red'),
) * rt_scatter_plots['fiona']).opts(
    legend_position='bottom_right'
)


:Overlay
   .Scatter.Continue_continue_RT_fiona :Scatter   [GO_RT]   (Continue_RT)
   .Scatter.Error_stop_RT_fiona        :Scatter   [GO_RT]   (Error_Stop_RT)
   .Curve.I                            :Curve   [x]   (y)


### RT Scatter Analysis

Key patterns to examine:
- **Diagonal line** represents equal RTs between conditions
- **Continue RTs** (purple) vs GO RTs: Should be similar (points near diagonal)
- **Error Stop RTs** (green) vs GO RTs: Expected to be faster on average, i.e. points below the diagonal (race model prediction)
- **Session consistency**: Tight clustering indicates consistent performance
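Distance from the diagonal can be summarized as the mean per-session difference between an RT type and the GO RT. The following is a minimal sketch under stated assumptions: the session values are invented for illustration, and `mean_offset_from_diagonal` is a hypothetical helper. Only the `GO_RT`, `Continue_RT` and `Error_Stop_RT` labels are taken from this notebook.

```python
from statistics import mean

# Hypothetical session means in ms (NOT real data).
# Keys mirror the rt_type labels used in this notebook.
example_sessions = [
    {"GO_RT": 200.0, "Continue_RT": 215.0, "Error_Stop_RT": 150.0},
    {"GO_RT": 190.0, "Continue_RT": 210.0, "Error_Stop_RT": 145.0},
    {"GO_RT": 210.0, "Continue_RT": 220.0, "Error_Stop_RT": 160.0},
]


def mean_offset_from_diagonal(sessions, rt_type, reference="GO_RT"):
    """Mean of (rt_type - reference) over sessions that have both values.

    Zero means the points sit on the diagonal; negative means below it.
    """
    diffs = [s[rt_type] - s[reference]
             for s in sessions
             if rt_type in s and reference in s]
    return mean(diffs)


continue_offset = mean_offset_from_diagonal(example_sessions, "Continue_RT")
error_stop_offset = mean_offset_from_diagonal(example_sessions, "Error_Stop_RT")
print(f"Continue - GO:   {continue_offset:+.1f} ms")
print(f"Error stop - GO: {error_stop_offset:+.1f} ms")
```

Sessions missing one of the two RT types are skipped, which matters when the session counts differ between types.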

In [13]:
rt_scatter_plots['yasmin'] * rt_scatter_plots['fiona']

In [14]:
# Get RT scatter data for statistical comparison
print("RT Scatter Data Comparison:")
print("="*40)

for monkey, results in monkey_results.items():
    if results and 'rt_scatter_data' in results:
        rt_data = results['rt_scatter_data']
        
        print(f"\n{monkey.upper()} - RT Summary by Type:")
        rt_summary = rt_data.groupby('rt_type')['mean_rt'].agg(['count', 'mean', 'std']).round(1)
        print(rt_summary)
        
        # Calculate correlations between RT types
        rt_pivot = rt_data.pivot(index='trial_session', columns='rt_type', values='mean_rt')
        if len(rt_pivot.columns) > 1:
            print(f"\n{monkey.upper()} - RT Correlations:")
            correlations = rt_pivot.corr().round(3)
            print(correlations)
    else:
        print(f"❌ No RT scatter data available for {monkey.title()}")

RT Scatter Data Comparison:

YASMIN - RT Summary by Type:
               count   mean   std
rt_type                          
Continue_RT       57  214.9  25.6
Error_Stop_RT     57  151.6  19.6
GO_RT             57  198.6  25.7

YASMIN - RT Correlations:
rt_type        Continue_RT  Error_Stop_RT  GO_RT
rt_type                                         
Continue_RT          1.000          0.775  0.960
Error_Stop_RT        0.775          1.000  0.782
GO_RT                0.960          0.782  1.000

FIONA - RT Summary by Type:
               count   mean   std
rt_type                          
Continue_RT       88  254.0  30.2
Error_Stop_RT     87  181.1  16.7
GO_RT             88  208.3  18.0

FIONA - RT Correlations:
rt_type        Continue_RT  Error_Stop_RT  GO_RT
rt_type                                         
Continue_RT          1.000          0.602  0.776
Error_Stop_RT        0.602          1.000  0.559
GO_RT                0.776          0.559  1.000


## 3. RT Distribution Comparison

These plots replicate Figure 1d, showing the distribution of reaction times for successful continue trials and failed stop trials across different signal delays.

In [15]:
# RT distribution plots are already created and stored
# Just extract them from our results
rt_dist_plots = {}

for monkey, results in monkey_results.items():
    if results and 'rt_distribution_plot' in results:
        rt_dist_plots[monkey] = results['rt_distribution_plot']
        print(f"✓ RT distribution plot available for {monkey.title()}")
    else:
        print(f"❌ No RT distribution plot available for {monkey.title()}")

print(f"\nReady to display {len(rt_dist_plots)} RT distribution plots")

✓ RT distribution plot available for Yasmin
✓ RT distribution plot available for Fiona

Ready to display 2 RT distribution plots


In [16]:
# Display Yasmin's RT distribution plot
if 'yasmin' in rt_dist_plots:
    print("YASMIN - RT Distributions (Figure 1d)")
    display(rt_dist_plots['yasmin'])
else:
    print("❌ Yasmin's RT distribution plot not available")

YASMIN - RT Distributions (Figure 1d)


In [17]:
# Display Fiona's RT distribution plot
if 'fiona' in rt_dist_plots:
    print("FIONA - RT Distributions (Figure 1d)")
    display(rt_dist_plots['fiona'])
else:
    print("❌ Fiona's RT distribution plot not available")

FIONA - RT Distributions (Figure 1d)


### RT Distribution Analysis

Key patterns in Figure 1d replication:
- **Continue trials** (dashed lines): Should show consistent RT distributions across the continue signal delays (CSDs)
- **Error stop trials** (solid lines): RT distributions for failed stop trials across the stop signal delays (SSDs)
- **Overlap analysis**: The amount of overlap between continue and error stop distributions informs the race model
- **Peak positions**: Earlier peaks in error stop RTs suggest faster "escape" responses
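One common way to quantify overlap between two binned distributions is the overlap coefficient: the sum over bins of the smaller of the two normalized heights. The sketch below assumes each distribution is a mapping from RT bin (ms) to a percentage. The values are made up, and `overlap_coefficient` is a hypothetical helper rather than something `BehavioralEDA` provides.

```python
def overlap_coefficient(dist_a, dist_b):
    """Overlap of two binned distributions, from 0 (disjoint) to 1 (identical).

    Each argument maps a bin (e.g. RT in ms) to a non-negative weight.
    Weights are normalized first, so they need not sum to 100.
    """
    total_a = sum(dist_a.values())
    total_b = sum(dist_b.values())
    bins = set(dist_a) | set(dist_b)
    return sum(
        min(dist_a.get(b, 0.0) / total_a, dist_b.get(b, 0.0) / total_b)
        for b in bins
    )


# Hypothetical binned RT distributions (NOT real data): bin in ms -> percentage.
example_continue = {200: 20.0, 220: 30.0, 240: 30.0, 260: 20.0}
example_error_stop = {160: 25.0, 180: 25.0, 200: 25.0, 220: 25.0}

overlap = overlap_coefficient(example_continue, example_error_stop)
print(f"Overlap coefficient: {overlap:.2f}")
```

Because the distributions here are split by signal delay, the coefficient would most naturally be computed separately for each delay condition, using matching bin edges for both trial types.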

In [18]:
# Get RT distribution data for detailed comparison
print("RT Distribution Data Summary:")
print("="*40)

for monkey, results in monkey_results.items():
    if results and 'rt_distribution_data' in results:
        cont_dist, stop_dist = results['rt_distribution_data']
        
        print(f"\n{monkey.upper()} - Continue RT Distribution:")
        print(f"  Total data points: {len(cont_dist):,}")
        print(f"  RT range: {cont_dist['Reaction Time Bin'].min():.0f} - {cont_dist['Reaction Time Bin'].max():.0f} ms")
        print(f"  SSD conditions: {sorted(cont_dist['SSD Number'].unique())}")
        
        print(f"\n{monkey.upper()} - Stop RT Distribution:")
        print(f"  Total data points: {len(stop_dist):,}")
        print(f"  RT range: {stop_dist['Reaction Time Bin'].min():.0f} - {stop_dist['Reaction Time Bin'].max():.0f} ms")
        print(f"  SSD conditions: {sorted(stop_dist['SSD Number'].unique())}")
        
        # Peak RT analysis
        if len(cont_dist) > 0:
            peak_rt_cont = cont_dist.loc[cont_dist['percentage'].idxmax(), 'Reaction Time Bin']
            print(f"  Continue RT peak: ~{peak_rt_cont:.0f} ms")
        
        if len(stop_dist) > 0:
            peak_rt_stop = stop_dist.loc[stop_dist['percentage'].idxmax(), 'Reaction Time Bin']
            print(f"  Error Stop RT peak: ~{peak_rt_stop:.0f} ms")
    else:
        print(f"❌ No RT distribution data available for {monkey.title()}")

RT Distribution Data Summary:

YASMIN - Continue RT Distribution:
  Total data points: 89
  RT range: 0 - 580 ms
  SSD conditions: ['CSD1', 'CSD2', 'CSD3', 'CSD4']

YASMIN - Stop RT Distribution:
  Total data points: 143
  RT range: 0 - 880 ms
  SSD conditions: ['SSD1', 'SSD2', 'SSD3', 'SSD4']
  Continue RT peak: ~240 ms
  Error Stop RT peak: ~100 ms

FIONA - Continue RT Distribution:
  Total data points: 108
  RT range: 0 - 700 ms
  SSD conditions: ['CSD1', 'CSD2', 'CSD3', 'CSD4']

FIONA - Stop RT Distribution:
  Total data points: 142
  RT range: 0 - 880 ms
  SSD conditions: ['SSD1', 'SSD2', 'SSD3', 'SSD4']
  Continue RT peak: ~280 ms
  Error Stop RT peak: ~240 ms


## Cross-Monkey Comparison Summary

In [19]:
# Comprehensive comparison summary
print("YASMIN vs FIONA - BEHAVIORAL COMPARISON SUMMARY")
print("="*60)

if len(monkey_results) == 2:
    yasmin_summary = monkey_results.get('yasmin', {}).get('basic_summary')
    fiona_summary = monkey_results.get('fiona', {}).get('basic_summary')
    
    if yasmin_summary and fiona_summary:
        print("\n📊 DATASET SIZE COMPARISON:")
        print(f"  Yasmin: {yasmin_summary['total_trials']:,} trials")
        print(f"  Fiona:  {fiona_summary['total_trials']:,} trials")
        
        print("\n🎯 OVERALL SUCCESS RATE COMPARISON:")
        print(f"  Yasmin: {yasmin_summary['overall_success_rate']:.1f}%")
        print(f"  Fiona:  {fiona_summary['overall_success_rate']:.1f}%")
        
        print("\n📈 TRIAL TYPE DISTRIBUTION:")
        all_types = set(yasmin_summary['trial_types'].keys()) | set(fiona_summary['trial_types'].keys())
        for trial_type in sorted(all_types):
            y_count = yasmin_summary['trial_types'].get(trial_type, 0)
            f_count = fiona_summary['trial_types'].get(trial_type, 0)
            print(f"  {trial_type}:")
            print(f"    Yasmin: {y_count:,}")
            print(f"    Fiona:  {f_count:,}")
    
    print("\n🔍 KEY ANALYSES COMPLETED:")
    print("  ✓ Signal Delay Performance (Figure 1b replication)")
    print("  ✓ Session RT Scatter Analysis")
    print("  ✓ RT Distribution Analysis (Figure 1d replication)")
    
    print("\n💡 INTERPRETATION NOTES:")
    print("  • Compare error stop slopes between monkeys in signal delay plots")
    print("  • Look for RT consistency differences in scatter plots")
    print("  • Examine RT distribution peaks and overlaps for race model validation")
    print("  • Consider individual differences in overall performance levels")
    
    print("\n💾 MEMORY EFFICIENCY:")
    print("  • Sequential processing prevents simultaneous 25GB+ memory usage")
    print("  • Only plot objects and summary data retained in memory")
    print("  • Raw DataFrames freed after processing each monkey")

else:
    available_monkeys = list(monkey_results.keys())
    print(f"\n⚠️  Only {len(available_monkeys)} monkey(s) available for comparison: {available_monkeys}")
    print("   Check data file paths and availability for complete comparison.")

YASMIN vs FIONA - BEHAVIORAL COMPARISON SUMMARY

📊 DATASET SIZE COMPARISON:
  Yasmin: 123,178 trials
  Fiona:  110,358 trials

🎯 OVERALL SUCCESS RATE COMPARISON:
  Yasmin: 82.6%
  Fiona:  84.6%

📈 TRIAL TYPE DISTRIBUTION:
  CONT:
    Yasmin: 28,416
    Fiona:  25,294
  GO:
    Yasmin: 69,104
    Fiona:  61,460
  STOP:
    Yasmin: 25,658
    Fiona:  23,604

🔍 KEY ANALYSES COMPLETED:
  ✓ Signal Delay Performance (Figure 1b replication)
  ✓ Session RT Scatter Analysis
  ✓ RT Distribution Analysis (Figure 1d replication)

💡 INTERPRETATION NOTES:
  • Compare error stop slopes between monkeys in signal delay plots
  • Look for RT consistency differences in scatter plots
  • Examine RT distribution peaks and overlaps for race model validation
  • Consider individual differences in overall performance levels

💾 MEMORY EFFICIENCY:
  • Sequential processing prevents simultaneous 25GB+ memory usage
  • Only plot objects and summary data retained in memory
  • Raw DataFrames freed after processing

## Conclusion

This notebook provides a comprehensive comparison of key behavioral metrics between Yasmin and Fiona using the `BehavioralEDA` class.

### What We've Analyzed:

1. **Signal Delay Performance**: Race model validation through stop/continue performance curves
2. **RT Scatter Analysis**: Session-to-session consistency and RT relationships
3. **RT Distributions**: Detailed timing analysis supporting the race model framework

### Benefits of This Approach:

- **Standardized Analysis**: Both subjects analyzed with identical methods
- **Easy Comparison**: Side-by-side plots reveal individual differences
- **Comprehensive Coverage**: Key behavioral measures from the stop signal task
- **Extensible**: Easy to add more subjects or additional analyses

### Next Steps:

- **Statistical Testing**: Add formal statistical comparisons between subjects
- **Combined Plots**: Refine the overlay plots with consistent per-subject styling for direct visual comparison
- **Additional Metrics**: Include more behavioral measures (e.g., trial length, go cue timing)
- **Export Results**: Save comparison data for publication or further analysis
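As a possible starting point for the statistical testing item, Welch's t statistic can be computed directly from the session-level summary statistics printed in the RT scatter section. The sketch below uses the GO RT means, standard deviations and session counts reported there. It assumes sessions can be treated as independent samples, which may not hold, and `welch_t` is a hypothetical helper.

```python
import math


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = std_a ** 2 / n_a  # squared standard error of mean A
    var_b = std_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Session-mean GO RT summaries from the RT scatter output above:
# Yasmin: mean 198.6 ms, std 25.7, 57 sessions
# Fiona:  mean 208.3 ms, std 18.0, 88 sessions
t_stat, dof = welch_t(198.6, 25.7, 57, 208.3, 18.0, 88)
print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value requires a t-distribution CDF (for example from SciPy), and a test on the per-session values themselves would be preferable to one based on rounded summaries.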