TWO-SET PROACTIVE INTERFERENCE EXPERIMENT
Data Structure and Analysis Pipeline Documentation
Overview
This document describes the structure of the raw data files and the complete analysis workflow used in the Two-Set Proactive Interference (PI) experiment. Each participant completed one training block and two experimental blocks. Depending on the participant number, the experimental blocks followed either an ABAB or an ABBA sequence design. All data were recorded and processed in MATLAB (.mat format), and the final statistical analyses were conducted in R.
Raw Data Files
Each participant’s data are stored in a file named TwoSet_PI_Subject_<#>_Result.mat, where <#> is the participant number. If the participant number is odd, the first block was conducted under the ABAB design and the second block under the ABBA design.
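The naming convention can be sketched in a few lines. This is an illustrative Python sketch, not part of the MATLAB pipeline; the function names are hypothetical, and only the odd-numbered case is encoded because it is the only case stated here.

```python
def result_filename(subject: int) -> str:
    """Build the result file name for a participant number."""
    return f"TwoSet_PI_Subject_{subject}_Result.mat"


def block_designs(subject: int):
    """Return (first_block_design, second_block_design) for a participant.

    Only the odd-numbered case is documented, so even numbers
    return None rather than a guessed order.
    """
    if subject % 2 == 1:
        return ("ABAB", "ABBA")
    return None


print(result_filename(7))   # TwoSet_PI_Subject_7_Result.mat
print(block_designs(7))     # ('ABAB', 'ABBA')
```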
Each .mat file contains the following variables:
saveTraining: Training block stimuli and responses.
saveResponses: First experimental block stimuli and responses.
saveResponses2: Second experimental block stimuli and responses.
Results: Concatenated version of all three variables above.
Raw Data Structure
The data structure is identical for ABAB and ABBA blocks. Each variable (saveTraining, saveResponses, saveResponses2) is a 2 × 12 × 3 cell array. The first row contains the stimuli, and the second row contains the responses. Each cell holds either a 3 × 1 or a 1 × 3 cell array containing the three words or numbers presented to or produced by the participant. Each page along the third dimension (1 to 3) corresponds to a mini-block.
The first six columns of each page contain the stimuli and responses for the first tested lists, and the remaining six columns correspond to the second tested lists. For example, for an odd-numbered subject, the first six columns of the first row in the first page contain six successive stimulus lists of type A, and the next six columns contain six stimulus lists of type B. The second row of the same page contains the responses corresponding to each of those stimulus lists.
MATLAB Analysis Workflow
All main analyses are implemented in Analysis_Script.m, with several helper functions described below.
Step 1. DimensionReduction
This function reorganizes and cleans the original raw data structure. It takes saveTraining, saveResponses, and saveResponses2 as inputs and outputs three variables: Training, FirstBlock, and SecondBlock. Each output variable is a 2 × 36 × 3 cell array in which each column represents a single stimulus-response pair. Row 1 contains the stimulus, and Row 2 contains the participant’s response to that stimulus. This restructuring removes the arbitrary nesting produced by the original experimental script and ensures that successive trials are stored in the correct order.
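The reshaping from 2 × 12 × 3 (three items per cell) to 2 × 36 × 3 (one item per cell) can be illustrated as follows. This is a Python sketch of the idea only, not the DimensionReduction code itself; it assumes every cell holds exactly three items, uses nested lists indexed as block[row][column][page], and the name reduce_dimensions is hypothetical.

```python
ITEMS_PER_CELL = 3  # assumption: every cell holds exactly three items


def reduce_dimensions(block):
    """Flatten each 3-item cell into three consecutive columns.

    Input:  block[row][col][page]   -> list of 3 items (2 x 12 x 3)
    Output: reduced[row][col][page] -> single item     (2 x 36 x 3)
    """
    n_pages = len(block[0][0])
    reduced = []
    for row in block:
        flat_row = []
        for cell_pages in row:                  # one of the 12 original columns
            for item in range(ITEMS_PER_CELL):  # keep presentation order
                flat_row.append([cell_pages[page][item]
                                 for page in range(n_pages)])
        reduced.append(flat_row)
    return reduced


# Placeholder data standing in for saveResponses
toy = [[[[f"r{r}c{c}p{p}i{i}" for i in range(3)] for p in range(3)]
        for c in range(12)] for r in range(2)]
first_block = reduce_dimensions(toy)
print(len(first_block), len(first_block[0]), len(first_block[0][0]))
```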
Step 2. DataTable
This function takes FirstBlock, SecondBlock, and the participant ID as inputs and produces a long-format data table for that participant. Each row in the resulting table corresponds to a single trial, including participant ID, block number, mini-block number, list type (A or B), trial number, stimulus, and response. The tables for all participants are concatenated into one combined dataset.
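A long-format table of this kind could be assembled roughly as shown below. This Python sketch rests on several assumptions that are not confirmed by the scripts: the first 18 reduced columns belong to the first tested list type and the last 18 to the second; Trial is the position of a list within its half (1 to 6); Position (1 to 3) is an extra illustrative column; and the list_order parameter and the column names other than ID, List, and Trial are hypothetical.

```python
def build_long_table(first_block, second_block, participant_id,
                     list_order=("A", "B")):
    """Turn two reduced blocks (2 x 36 x 3) into a list of row dicts."""
    rows = []
    for block_number, block in enumerate((first_block, second_block), start=1):
        stimuli, responses = block[0], block[1]
        n_cols = len(stimuli)
        n_pages = len(stimuli[0])
        half = n_cols // 2            # assumed split between the two list types
        for page in range(n_pages):
            for col in range(n_cols):
                rows.append({
                    "ID": participant_id,
                    "Block": block_number,
                    "MiniBlock": page + 1,
                    "List": list_order[0] if col < half else list_order[1],
                    "Trial": (col % half) // 3 + 1,   # assumed: list index 1-6
                    "Position": col % 3 + 1,          # illustrative extra column
                    "Stimulus": stimuli[col][page],
                    "Response": responses[col][page],
                })
    return rows


# Placeholder reduced block: row 0 = stimuli, row 1 = responses
toy = [[[f"{kind}{col}_{page}" for page in range(3)] for col in range(36)]
       for kind in ("s", "r")]
table = build_long_table(toy, toy, participant_id=5)
print(len(table))   # 2 blocks x 3 mini-blocks x 36 columns
print(table[0])
```

Tables for several participants can then be combined by concatenating the returned lists.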
Step 3. LevenshteinDistance
This function (originally misspelled as LevenhsteinDistance) takes the combined long-format dataset as input and calculates response accuracy and response category based on the Levenshtein distance. It returns a final dataset called TwoSet_PI_Data, in which accuracy and category are added as new columns.
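For reference, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The Python sketch below shows the standard dynamic-programming computation; it is not the MATLAB function itself, and it stops at the distance because the thresholds used to derive accuracy and category are not stated in this document.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    if len(a) < len(b):
        a, b = b, a                      # keep the working row as short as possible
    previous = list(range(len(b) + 1))   # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("table", "table"))     # 0
```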
Step 4. Plotting and Extended Analyses
Analysis_Script.m includes the following analysis sections:
The “Mean Accuracy Across Trials” section computes and plots average recall accuracy across trials separately for ABAB and ABBA designs.
The “Mini-block Comparison” section averages performance within each mini-block to examine learning effects across repetitions.
The “Serial Recall Accuracy” section uses the SerialRecallCheck function to compute serial recall accuracy (SRA). This function evaluates whether participants recalled items in the correct order by calculating the Levenshtein distance between each response and the stimulus at the corresponding position. For responses previously scored as correct (accuracy = 1), serial recall is counted as correct if the Levenshtein distance is less than or equal to 2 for words or equal to 0 for numbers. The output of this function is stored in TSPID_SRA (TwoSet_PI_Data_Serial_Recall_Accuracy).
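The serial recall rule stated above can be written out as a small check. This Python sketch is not the SerialRecallCheck implementation; it assumes that a stimulus made up only of digits is a number and anything else is a word, and that responses with accuracy other than 1 are simply not counted as serially correct. The function name serial_recall_correct is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard unit-cost edit distance."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def serial_recall_correct(stimulus: str, response: str, accuracy: int) -> bool:
    """Apply the documented rule to the response given at the stimulus position."""
    if accuracy != 1:
        return False                  # assumption: only correct responses qualify
    distance = levenshtein(stimulus, response)
    if stimulus.isdigit():            # assumption: digit-only stimuli are numbers
        return distance == 0          # numbers must match exactly
    return distance <= 2              # words tolerate small spelling errors


print(serial_recall_correct("garden", "gardne", 1))  # True (distance 2)
print(serial_recall_correct("47", "74", 1))          # False (numbers must match)
```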
Verification and Reliability Checks
To confirm the accuracy of the analyses, two validation sections are included:
“Double-check for the reliability of serial recall adjustment” verifies that the SerialRecallCheck computations are correct.
“The Great Double-check” recomputes all averages using a direct method to confirm that previous analyses produce consistent results.
Performance Across Blocks
The “Performance Analysis Over Blocks” section compares recall performance between the two experimental blocks, both by design (ABAB vs ABBA) and by block order.
R Analysis
The final dataset, TwoSet_PI_Data, is analyzed in R using the script Linear_Statistical_Analyses.R. This script performs linear ANOVA and Bayesian analyses (Bayes Factors), with ID, List, and Trial defined as factors. It also generates publication-ready summary tables. Because tables built with the kableExtra package can display inconsistently in RStudio, exporting them directly to HTML or PDF is recommended.
Outputs
The full analysis produces the following outputs:
TwoSet_PI_Long.mat: Combined long-format dataset of all participants.
TwoSet_PI_Data.mat: Final dataset including categories and accuracy scores.
TSPID_SRA.mat: Dataset containing serial recall accuracy scores.
Additional figure files showing mean accuracy, mini-block effects, and performance over blocks.
Contact
For any questions about the data or analysis scripts, please contact: ycinceler@gmail.com