# Analyzing Heusser et al. (2018) Feature-Rich Free Recall Data

This tutorial demonstrates analyzing the Feature-Rich Free Recall (FRFR) dataset from Heusser et al. (2018), which investigates how different word features affect memory organization during free recall.

The dataset contains **452 subjects** across **11 experimental conditions**, each varying which word features were made salient during encoding. Each subject studied **16 lists of 16 words**.

**Experimental conditions:**
- **Feature rich**: All features varied (color, location, category, size, etc.)
- **Category**: Only category information varied
- **Color**: Only color information varied
- **Word length**: Only word length varied
- **First letter**: Only first letter varied
- **Location**: Only spatial location varied
- **Size**: Only semantic size varied
- **Adaptive**: Features adapted based on participant performance
- **Reduced**: Minimal feature variation
- **Reduced early**: Reduced features in early lists (1-8)
- **Reduced late**: Reduced features in late lists (9-16)

We'll analyze recall performance using:
1. Probability of First Recall (PFR)
2. Lag-CRP (conditional response probability as a function of temporal lag)
3. Serial Position Curve (SPC)
4. Memory Fingerprint (clustering by multiple features)
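
To make the first three measures concrete, here is a minimal NumPy sketch that computes them from scratch on a tiny invented dataset. It is not quail's implementation. The function names and the toy recall sequences are assumptions made for illustration, and each sequence is assumed to contain no repeated recalls.

```python
import numpy as np

# Toy data (hypothetical): each entry is one list's recall sequence, given as
# 0-based serial positions of the recalled items, in output order.
list_length = 6
recalls = [
    [5, 4, 0, 1],
    [0, 1, 2, 5],
    [5, 0, 1],
]

def serial_position_curve(recalls, list_length):
    """Proportion of lists in which each serial position was recalled."""
    counts = np.zeros(list_length)
    for seq in recalls:
        for pos in set(seq):
            counts[pos] += 1
    return counts / len(recalls)

def prob_first_recall(recalls, list_length):
    """Proportion of lists in which each serial position was recalled first."""
    counts = np.zeros(list_length)
    for seq in recalls:
        if seq:
            counts[seq[0]] += 1
    return counts / len(recalls)

def lag_crp(recalls, list_length):
    """Actual transitions at each lag divided by the transitions that were possible."""
    lags = np.arange(-(list_length - 1), list_length)
    offset = list_length - 1  # maps a lag to its index in the arrays below
    actual = np.zeros(len(lags))
    possible = np.zeros(len(lags))
    for seq in recalls:
        recalled = set()
        for current, nxt in zip(seq[:-1], seq[1:]):
            recalled.add(current)
            # Every not-yet-recalled position was an available transition target
            for candidate in range(list_length):
                if candidate not in recalled:
                    possible[candidate - current + offset] += 1
            actual[nxt - current + offset] += 1
    # Lags that were never possible (including lag 0) come out as NaN
    with np.errstate(invalid='ignore', divide='ignore'):
        crp = actual / possible
    return lags, crp

spc = serial_position_curve(recalls, list_length)
pfr = prob_first_recall(recalls, list_length)
lags, crp = lag_crp(recalls, list_length)

print("SPC:", np.round(spc, 2))
print("PFR:", np.round(pfr, 2))
print("CRP at lag +1:", crp[lags == 1][0])
```

In this toy data, four of the five possible lag +1 transitions were actually made, so the lag-CRP at +1 is 0.8. Lag 0 is never a possible transition and is therefore undefined.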

**Reference:**
Heusser, A.C., Fitzpatrick, P.C., & Manning, J.R. (2018). How is experience transformed into memory? *bioRxiv*. https://doi.org/10.1101/409987

In [None]:
from collections import Counter

import quail
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Suppress all RuntimeWarnings (e.g., NumPy's "Mean of empty slice")
warnings.filterwarnings('ignore', category=RuntimeWarning)

# Create viridis palette for distinguishing 11 conditions
viridis_palette = sns.color_palette("viridis", n_colors=11)

## Load the dataset

The FRFR dataset is included with quail and can be loaded using `load_example_data()`.

In [None]:
# Load the FRFR dataset
egg = quail.load_example_data('frfr')

print(f"Loaded FRFR data: {egg.n_subjects} subjects, {egg.n_lists} lists, "
      f"{egg.list_length} items per list")

## Explore the data structure

Each item in the FRFR dataset has multiple features that can be used for fingerprint analysis.

In [None]:
# Look at the features available for the first item
first_item = egg.pres.iloc[0][0]
print("Available features:")
for key, value in first_item.items():
    print(f"  {key}: {value}")
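
Because each presented item is a dictionary of features, a whole list can be flattened into a table for quick inspection. The sketch below uses invented items. The `'item'` key and all feature values are assumptions for illustration, and only the feature names mirror those used later in this tutorial.

```python
import pandas as pd

# Hypothetical items for one list. The 'item' key and every value are invented
# for illustration; the real dataset may use different keys and values.
toy_list = [
    {'item': 'CAT', 'Category': 'animal', 'Size': 'small',
     'Word length': 3, 'First letter': 'C'},
    {'item': 'HORSE', 'Category': 'animal', 'Size': 'large',
     'Word length': 5, 'First letter': 'H'},
    {'item': 'CHAIR', 'Category': 'furniture', 'Size': 'large',
     'Word length': 5, 'First letter': 'C'},
]

# One row per item, one column per feature
feature_table = pd.DataFrame(toy_list).set_index('item')

# Number of distinct values per feature: a quick check of which features vary
n_unique = feature_table.nunique()

print(feature_table)
print(n_unique)
```

A feature with a single unique value within a list cannot drive clustering in that list, so a table like this is a quick sanity check before running the fingerprint analysis.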

## Set up subject groupings

Since each subject belongs to a single experimental condition, we create a `subjgroup` list that maps each subject to their condition. This allows us to plot separate curves for each condition.

In [None]:
# Build subjgroup: map each subject to its experimental condition
subjgroup = []
for subj_idx in range(egg.n_subjects):
    try:
        sample = egg.pres.loc[(subj_idx, 0)][0]
        if sample and 'Condition' in sample:
            subjgroup.append(sample['Condition'])
        else:
            subjgroup.append('Unknown')
    except (KeyError, IndexError, TypeError):
        subjgroup.append('Unknown')

# Define condition order for consistent plotting
condition_order = [
    'Feature rich',
    'Reduced early',
    'Reduced late',
    'Reduced',
    'Adaptive',
    'Category',
    'Size',
    'Color',
    'Location',
    'Word length',
    'First letter'
]

# Count subjects per condition
condition_counts = Counter(subjgroup)
print("Subjects per condition:")
for cond in condition_order:
    if cond in condition_counts:
        print(f"  {cond}: {condition_counts[cond]}")
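
The loop above samples only the first item of each subject's first list, which is safe only if every item a subject saw carries the same condition label. The sketch below shows one way to check that assumption on a toy table shaped like `egg.pres` (a `(Subject, List)` MultiIndex with one dictionary per cell). The toy data and the variable names are hypothetical.

```python
import pandas as pd

# Toy stand-in for egg.pres: rows indexed by (Subject, List), cells are item dicts.
# All items and labels here are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (1, 1)], names=['Subject', 'List'])
rows = [
    [{'item': 'A', 'Condition': 'Color'}, {'item': 'B', 'Condition': 'Color'}],
    [{'item': 'C', 'Condition': 'Color'}, {'item': 'D', 'Condition': 'Color'}],
    [{'item': 'E', 'Condition': 'Size'}, {'item': 'F', 'Condition': 'Size'}],
    [{'item': 'G', 'Condition': 'Size'}, {'item': 'H', 'Condition': 'Size'}],
]
toy_pres = pd.DataFrame(rows, index=index)

# Collect the set of condition labels seen by each subject
conditions = {}
for (subject, _list_idx), row in toy_pres.iterrows():
    for cell in row:
        if isinstance(cell, dict) and 'Condition' in cell:
            conditions.setdefault(subject, set()).add(cell['Condition'])

# Subjects with zero or multiple labels would need a closer look
mixed = {subj: labels for subj, labels in conditions.items() if len(labels) != 1}

print(conditions)
print("Subjects with inconsistent labels:", mixed)
```

If `mixed` is empty, taking the label from a single item per subject gives the same grouping as inspecting every item.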

## Split data into early and late lists

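The experimental design distinguishes "early" (lists 1-8) and "late" (lists 9-16) phases. Because the Reduced early and Reduced late conditions manipulate features in only one of these phases, we analyze the two halves separately. List indices in the code are zero-based, so the halves correspond to indices 0-7 and 8-15.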

In [None]:
# Split egg into early (lists 0-7) and late (lists 8-15) lists
egg_early = egg.crack(lists=list(range(8)))
egg_late = egg.crack(lists=list(range(8, 16)))

print(f"Early lists: {egg_early.n_lists} lists per subject")
print(f"Late lists: {egg_late.n_lists} lists per subject")

# Create listgroup for averaging across lists within each split
listgroup_early = ['average'] * egg_early.n_lists
listgroup_late = ['average'] * egg_late.n_lists
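
Conceptually, assigning every list the same `listgroup` label collapses the per-list results into one averaged row per subject. The pandas sketch below illustrates that idea on an invented table of per-list recall accuracy. It describes the intended result, not quail's internal code, and all names and values are hypothetical.

```python
import pandas as pd

# Toy per-list results: rows are (Subject, List), columns are serial positions,
# values are 1 if the item at that position was recalled and 0 otherwise.
index = pd.MultiIndex.from_product(
    [[0, 1], [0, 1, 2, 3]], names=['Subject', 'List'])
per_list = pd.DataFrame(
    {0: [1, 0, 1, 1, 1, 1, 0, 0],
     1: [0, 0, 1, 1, 1, 0, 0, 1]},
    index=index)

# Keep the "early" half (here, lists 0 and 1), then average within each subject
early = per_list[per_list.index.get_level_values('List') < 2]
per_subject = early.groupby(level='Subject').mean()

print(per_subject)
```

Each subject ends up with a single averaged curve, which is the unit that the plots below then group by condition.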

## Figure 1: PFR, Lag-CRP, and SPC by condition

We'll create a 2x3 figure showing the three basic analyses for early (top row) and late (bottom row) lists.

In [None]:
fig1, axes1 = plt.subplots(2, 3, figsize=(18, 10), sharey='col')

# --- Top row: Early lists ---

# PFR - Early
pfr_early = egg_early.analyze('pfr', listgroup=listgroup_early)
pfr_early.plot(ax=axes1[0, 0], subjgroup=subjgroup, plot_type='subject', legend=True,
               hue_order=condition_order, palette=viridis_palette)
axes1[0, 0].set_title('Probability of First Recall (Early Lists)')
axes1[0, 0].set_xlabel('Serial Position')
axes1[0, 0].set_ylabel('Probability')
axes1[0, 0].set_ylim([0, 0.3])
axes1[0, 0].legend(loc='upper right', fontsize=6, ncol=2)

# Lag-CRP - Early
lagcrp_early = egg_early.analyze('lagcrp', listgroup=listgroup_early)
lagcrp_early.plot(ax=axes1[0, 1], subjgroup=subjgroup, plot_type='subject', legend=False,
                  hue_order=condition_order, palette=viridis_palette)
axes1[0, 1].set_title('Lag-CRP (Early Lists)')
axes1[0, 1].set_xlabel('Lag')
axes1[0, 1].set_ylabel('Conditional Response Probability')
axes1[0, 1].set_xlim([-10, 10])
axes1[0, 1].axvline(x=0, color='gray', linestyle='--', alpha=0.5)

# SPC - Early
spc_early = egg_early.analyze('spc', listgroup=listgroup_early)
spc_early.plot(ax=axes1[0, 2], subjgroup=subjgroup, plot_type='subject', legend=False,
               hue_order=condition_order, palette=viridis_palette)
axes1[0, 2].set_title('Serial Position Curve (Early Lists)')
axes1[0, 2].set_xlabel('Serial Position')
axes1[0, 2].set_ylabel('Recall Probability')
axes1[0, 2].set_ylim([0, 1])

# --- Bottom row: Late lists ---

# PFR - Late
pfr_late = egg_late.analyze('pfr', listgroup=listgroup_late)
pfr_late.plot(ax=axes1[1, 0], subjgroup=subjgroup, plot_type='subject', legend=False,
              hue_order=condition_order, palette=viridis_palette)
axes1[1, 0].set_title('Probability of First Recall (Late Lists)')
axes1[1, 0].set_xlabel('Serial Position')
axes1[1, 0].set_ylabel('Probability')
axes1[1, 0].set_ylim([0, 0.3])

# Lag-CRP - Late
lagcrp_late = egg_late.analyze('lagcrp', listgroup=listgroup_late)
lagcrp_late.plot(ax=axes1[1, 1], subjgroup=subjgroup, plot_type='subject', legend=False,
                 hue_order=condition_order, palette=viridis_palette)
axes1[1, 1].set_title('Lag-CRP (Late Lists)')
axes1[1, 1].set_xlabel('Lag')
axes1[1, 1].set_ylabel('Conditional Response Probability')
axes1[1, 1].set_xlim([-10, 10])
axes1[1, 1].axvline(x=0, color='gray', linestyle='--', alpha=0.5)

# SPC - Late
spc_late = egg_late.analyze('spc', listgroup=listgroup_late)
spc_late.plot(ax=axes1[1, 2], subjgroup=subjgroup, plot_type='subject', legend=False,
              hue_order=condition_order, palette=viridis_palette)
axes1[1, 2].set_title('Serial Position Curve (Late Lists)')
axes1[1, 2].set_xlabel('Serial Position')
axes1[1, 2].set_ylabel('Recall Probability')
axes1[1, 2].set_ylim([0, 1])

plt.tight_layout()
fig1.suptitle('FRFR Dataset: Recall Analyses by Condition (Early vs Late Lists)',
              y=1.02, fontsize=14)
plt.show()

## Figure 2: Memory Fingerprints

The memory fingerprint summarizes, for each stimulus feature, how strongly subjects cluster their recalls by that feature. If feature salience shapes recall organization, the conditions should show different clustering patterns depending on which features were made salient during encoding.
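
As a rough illustration of what a clustering score measures, the sketch below ranks each recall transition against the alternatives that were still available: a score of 1 means the subject always moved to the nearest remaining item on that feature, and a randomly ordered recall sequence averages about 0.5. This is a simplified version written for this tutorial under stated assumptions (a numeric feature, absolute difference as the distance, half credit for ties, no repeated recalls). quail's own implementation may differ in these details.

```python
import numpy as np

def clustering_score(recall_order, feature_values):
    """Mean percentile rank of each transition on one numeric feature.

    recall_order: indices into feature_values, in output order (no repeats).
    feature_values: one numeric value per studied item.
    """
    scores = []
    remaining = set(range(len(feature_values)))
    for current, nxt in zip(recall_order[:-1], recall_order[1:]):
        remaining.discard(current)
        candidates = sorted(remaining)
        if len(candidates) < 2:
            continue  # a rank is undefined when only one candidate is left
        dists = np.array([abs(feature_values[c] - feature_values[current])
                          for c in candidates])
        chosen = abs(feature_values[nxt] - feature_values[current])
        others = np.delete(dists, candidates.index(nxt))
        # Fraction of alternatives that were farther away; ties get half credit
        score = (np.sum(others > chosen) + 0.5 * np.sum(others == chosen)) / len(others)
        scores.append(score)
    return float(np.mean(scores)) if scores else float('nan')

# Temporal clustering: the "feature" is simply the serial position
positions = list(range(6))
forward = clustering_score([0, 1, 2, 3, 4, 5], positions)   # always nearest neighbor
farthest = clustering_score([0, 5, 1, 4, 2, 3], positions)  # always farthest item

# Random recall orders for 16-item lists should average close to 0.5
rng = np.random.default_rng(0)
random_scores = [clustering_score(list(rng.permutation(16)), list(range(16)))
                 for _ in range(2000)]
random_mean = float(np.mean(random_scores))

print(forward, farthest, round(random_mean, 3))
```

The same function applies to any numeric feature, such as word length. Categorical features like color or category would need a different distance, for example 0 for a match and 1 for a mismatch.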

In [None]:
# Stacked layout: 2 rows, 1 column
fig2, axes2 = plt.subplots(2, 1, figsize=(8, 8), sharey=True)

# Features for fingerprint analysis
fingerprint_features = ['Category', 'Size', 'Color', 'Location',
                        'Word length', 'First letter', 'Temporal']

# Fingerprint - Early lists (top panel)
fp_early = egg_early.analyze('fingerprint', features=fingerprint_features,
                             listgroup=listgroup_early)
fp_early.plot(ax=axes2[0], subjgroup=subjgroup, plot_type='subject',
              plot_style='bar', ylim=[0.5, 0.81], legend=False,
              hue_order=condition_order, palette=viridis_palette)
axes2[0].set_title('Memory fingerprints\nEarly lists', fontsize=14)
axes2[0].set_xlabel('')  # Remove x-label on top panel
axes2[0].set_ylabel('Clustering score', fontsize=14)
axes2[0].tick_params(axis='x', labelbottom=False)  # Hide x tick labels on top

# Fingerprint - Late lists (bottom panel)
fp_late = egg_late.analyze('fingerprint', features=fingerprint_features,
                           listgroup=listgroup_late)
fp_late.plot(ax=axes2[1], subjgroup=subjgroup, plot_type='subject',
             plot_style='bar', ylim=[0.5, 0.81], legend=True,
             hue_order=condition_order, palette=viridis_palette)
axes2[1].set_title('Late lists', fontsize=14)
axes2[1].set_xlabel('Feature', fontsize=14)
axes2[1].set_ylabel('Clustering score', fontsize=14)
# Move legend to upper right of bottom panel
axes2[1].legend(loc='upper right', fontsize=6, ncol=3, title='Condition')

plt.tight_layout()
plt.show()

## Key findings

The FRFR dataset reveals several important findings about memory organization:

1. **Feature-specific clustering**: Conditions where a specific feature was made salient (e.g., Category, Color, Size) show elevated clustering scores for that feature compared to other conditions.

2. **Temporal contiguity**: All conditions show above-chance temporal clustering (scores above the 0.5 chance level), indicating that temporal proximity during encoding influences recall organization.

3. **Early vs Late effects**: The Reduced early and Reduced late conditions show different fingerprint patterns in the early versus late lists, indicating that changing feature salience partway through the experiment changes how recall is organized.

4. **Feature rich condition**: When all features vary (Feature rich condition), subjects show moderate clustering across multiple dimensions rather than strong clustering on any single feature.

5. **Serial position effects**: All conditions show classic primacy and recency effects in the SPC, with slight variations based on encoding condition.

These results demonstrate that memory organization is flexible and shaped by the features that are made salient during encoding.