## Extended MFRM case study

This notebook contains the code to run both a global MFRM analysis (Linacre, 1994) and an extended (matrix) MFRM analysis (Elliott and Buttery, 2022) of a real-world case study, as presented in Elliott and Buttery (2025). The aim is to highlight how the inferences obtained from a global MFRM analysis differ from those obtained from an extended MFRM analysis, and how the inferences change when a different anchoring frame of raters is selected. The data come from a test of creativity in which participants wrote metaphors about boredom and disgust that were then scored by raters, originally published in Silvia & Beaty (2012) and later analysed using the standard (global) MFRM by Primi, Silvia, Jauk and Benedek (2019). The data set for the analyses is available for download at:

[http://www.labape.com.br/metaphor/df.xlsx](http://www.labape.com.br/metaphor/df.xlsx)

**References**

Elliott, M., & Buttery, P. J. (2022). Extended rater representations in the many-facet Rasch model. *Journal of Applied Measurement*, *22*(1), 133–160.

Elliott, M., & Buttery, P. J. (2025). *Addressing non-uniform rater effects with extended many-facet Rasch models: A case study*. Paper to be presented at the Nordic Educational Research Association Conference 2025, Helsinki, Finland, March 5–7.

Linacre, J. M. (1994). *Many-Facet Rasch Measurement*. MESA Press.

Primi, R., Silvia, P. J., Jauk, E., & Benedek, M. (2019). Applying many-facet Rasch modeling in the assessment of creativity. *Psychology of Aesthetics, Creativity, and the Arts*, *13*(2), 176–186.

Silvia, P. J., & Beaty, R. E. (2012). Making creative metaphors: The importance of fluid intelligence for creative thought. *Intelligence*, *40*(4), 343–351.

Import the packages and set the working directory (here called `my_working_directory`). Save the response file in this directory before starting; the output files will also be saved here.

In [None]:
import os
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import RaschPy as rp

# Replace this path with your own working directory (my_working_directory)
os.chdir('C:/Users/elliom/Downloads/Chapter_7')

Load the data and check the first 5 rows.

In [None]:
data = pd.read_excel('df.xlsx', header=0)
data.head(5)

Rescore the data to set the minimum score to 0 (currently 1), reformat the dataframe to the correct format for a *RaschPy* MFRM analysis, and check the first 5 rows.

In [None]:
data.set_index('subject', inplace=True)
data -= 1

df_1 = data[['met1_rater1', 'met2_rater1']]
df_1.columns = ['Boredom', 'Disgust']

df_2 = data[['met1_rater2', 'met2_rater2']]
df_2.columns = ['Boredom', 'Disgust']

df_3 = data[['met1_rater3', 'met2_rater3']]
df_3.columns = ['Boredom', 'Disgust']

data_dict = {'Rater_1': df_1, 'Rater_2': df_2, 'Rater_3': df_3}
data = pd.concat(data_dict.values(), keys=data_dict.keys())
data.index.set_names(['Rater', 'Person'], inplace=True)

data.head(5)
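To make the target layout concrete, the sketch below applies the same reshaping to a small made-up table. The three subjects, their scores and the use of only two raters are invented for illustration; only the `met{item}_rater{rater}` column naming pattern follows the real file. The result has a two-level (`Rater`, `Person`) row index and one column per item.

```python
import pandas as pd

# Invented wide-format scores on the original 1-based scale.
wide = pd.DataFrame(
    {
        'subject': [101, 102, 103],
        'met1_rater1': [1, 2, 4], 'met2_rater1': [2, 2, 3],
        'met1_rater2': [1, 3, 4], 'met2_rater2': [1, 2, 4],
    }
).set_index('subject')

wide -= 1  # shift so that the lowest category is 0

items = {'met1': 'Boredom', 'met2': 'Disgust'}
frames = {}
for r in (1, 2):
    cols = [f'{prefix}_rater{r}' for prefix in items]
    frame = wide[cols].copy()
    frame.columns = list(items.values())
    frames[f'Rater_{r}'] = frame

# The dictionary keys become the outer level of the row index.
long = pd.concat(frames)
long.index.set_names(['Rater', 'Person'], inplace=True)
print(long)
```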

Create a *RaschPy* MFRM object from the scores and generate unanchored parameter estimates under the global MFRM.

In [None]:
mfrm = rp.MFRM(data)
mfrm.calibrate_global()
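For orientation, the global MFRM in its standard rating-scale form (Linacre, 1994) can be written as below. The symbols are chosen here for illustration and are not taken from *RaschPy*: $\theta_n$ is the ability of person $n$, $\delta_i$ the difficulty of item $i$, $\tau_k$ the $k$-th threshold and $\lambda_r$ the single (global) severity of rater $r$.

$$
\ln\frac{P_{nirk}}{P_{nir(k-1)}} = \theta_n - \delta_i - \tau_k - \lambda_r
$$

Here $P_{nirk}$ is the probability that rater $r$ awards person $n$ category $k$ on item $i$. Under this form, each rater contributes one severity value that applies equally to every item and every threshold.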

View the item difficulty estimates, threshold estimates and rater severity estimates.

In [None]:
mfrm.diffs

In [None]:
mfrm.thresholds

In [None]:
mfrm.severities_global

Create item, threshold and rater stats tables, save to file and view.

In [None]:
%%time
mfrm.item_stats_df_global()
mfrm.item_stats_global.to_csv('item_stats_global_unanchored.csv')
mfrm.item_stats_global

In [None]:
%%time
mfrm.threshold_stats_df_global()
mfrm.threshold_stats_global.to_csv('threshold_stats_global_unanchored.csv')
mfrm.threshold_stats_global

In [None]:
%%time
mfrm.rater_stats_df_global()
mfrm.rater_stats_global.to_csv('rater_stats_global_unanchored.csv')
mfrm.rater_stats_global

Generate plots of the item characteristic curve (item response function) and the category response curves for *Boredom* and save to file.

In [None]:
mfrm.icc_global('Boredom', xmin=-2, xmax=2, title=None,
                filename='icc_boredom_rater1_global_unanchored', dpi=600)

In [None]:
mfrm.crcs_global('Boredom', xmin=-2, xmax=2, title=None,
                 filename='crcs_boredom_rater1_global_unanchored', dpi=600)
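As a rough illustration of what these two plots show, the sketch below computes category response curves and the item characteristic curve (the expected score) with NumPy. It assumes the standard rating-scale form of the MFRM, in which the log-odds of category k over category k-1 equal ability minus difficulty minus severity minus the k-th threshold. The function name `crcs` and all parameter values are invented for the example and are not part of *RaschPy*.

```python
import numpy as np

def crcs(abilities, difficulty, severity, thresholds):
    """Category probabilities (one row per ability, one column per category)."""
    abilities = np.asarray(abilities, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Log-odds of moving from category k-1 to category k.
    steps = abilities[:, None] - difficulty - severity - thresholds[None, :]
    # Cumulative sums give unnormalised log-probabilities; category 0 is fixed at 0.
    log_num = np.hstack([np.zeros((len(abilities), 1)), np.cumsum(steps, axis=1)])
    log_num -= log_num.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(log_num)
    return probs / probs.sum(axis=1, keepdims=True)

abilities = np.linspace(-2, 2, 201)
curves = crcs(abilities, difficulty=0.0, severity=0.0, thresholds=[-1.0, 0.0, 1.0])

# The item characteristic curve is the expected score at each ability.
icc = curves @ np.arange(curves.shape[1])
```

Each column of `curves` is one category response curve, and `icc` rises monotonically from near 0 towards the maximum score as ability increases.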

Generate two anchored rater stats tables, first anchored to Raters 1 and 2, then anchored to Rater 3; save each to file and view.

In [None]:
%%time
mfrm.rater_stats_df_global(anchor_raters=['Rater_1', 'Rater_2'])
mfrm.rater_stats_global.to_csv('rater_stats_global_anchored_rater1_rater2.csv')
mfrm.rater_stats_global

In [None]:
%%time
mfrm.rater_stats_df_global(anchor_raters=['Rater_3'])
mfrm.rater_stats_global.to_csv('rater_stats_global_anchored_rater3.csv')
mfrm.rater_stats_global
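One simple way to think about anchoring under the global representation is as a change of origin: the severities are shifted so that the mean severity of the anchor raters is zero, while the differences between raters stay the same. The sketch below illustrates that idea only. It is an assumption made for illustration, not a description of how *RaschPy* implements `anchor_raters`, and the function name and severity values are invented.

```python
import pandas as pd

def anchor_severities(severities, anchor_raters):
    """Shift severities so that the anchor raters' mean severity is zero."""
    shift = severities.loc[anchor_raters].mean()
    return severities - shift, shift

# Invented global severities in logits.
severities = pd.Series({'Rater_1': -0.30, 'Rater_2': 0.10, 'Rater_3': 0.20})

anchored_12, shift_12 = anchor_severities(severities, ['Rater_1', 'Rater_2'])
anchored_3, shift_3 = anchor_severities(severities, ['Rater_3'])

print(anchored_12)
print(anchored_3)
```

For the model's predictions to stay unchanged, the same shift has to be absorbed elsewhere in the model, for example in the person or item estimates.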

View the bootstrapped standard error estimates for the category widths, both unanchored and anchored. (For the global representation, these will be the same apart from the natural stochastic variation of the bootstrap procedure, since the threshold structure is unchanged by the anchoring process.)

In [None]:
mfrm.cat_width_se_global

In [None]:
mfrm.anchor_cat_width_se_global 
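The stochastic variation mentioned above is a general property of bootstrapping: two runs on the same data resample differently and so give slightly different standard errors. The generic sketch below shows this with the mean of an invented sample as a stand-in statistic. In the real analysis the statistic would be a re-estimated category width, and the resampling scheme used by *RaschPy* may differ from this one.

```python
import numpy as np

def bootstrap_se(values, statistic, n_boot=1000, seed=None):
    """Standard deviation of a statistic over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    reps = [statistic(values[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return float(np.std(reps, ddof=1))

# Invented sample standing in for the data being resampled.
sample = np.random.default_rng(0).normal(size=200)

se_run_1 = bootstrap_se(sample, np.mean, seed=1)
se_run_2 = bootstrap_se(sample, np.mean, seed=2)
print(se_run_1, se_run_2)
```

The two values are close but not identical, and the gap shrinks as `n_boot` grows.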

Generate unanchored matrix MFRM parameter estimates.

In [None]:
%%time
mfrm.calibrate_matrix()

Generate item, threshold and rater stats dataframes, save to file and view. The rater stats dataframe will, by default (as here), produce the marginal mean severity vectors by item and threshold rather than the full matrix of severities.

In [None]:
%%time
mfrm.item_stats_df_matrix()
mfrm.item_stats_matrix.to_csv('item_stats_matrix_unanchored.csv')
mfrm.item_stats_matrix

In [None]:
%%time
mfrm.threshold_stats_df_matrix()
mfrm.threshold_stats_matrix.to_csv('threshold_stats_matrix_unanchored.csv')
mfrm.threshold_stats_matrix

In [None]:
%%time
mfrm.rater_stats_df_matrix()
mfrm.rater_stats_matrix.to_csv('rater_stats_matrix_unanchored_marginal.csv')
mfrm.rater_stats_matrix.T

Generate a rater stats dataframe with the full matrix of unanchored severities.

In [None]:
%%time
mfrm.rater_stats_df_matrix(marginal=False)
mfrm.rater_stats_matrix.to_csv('rater_stats_matrix_unanchored_full.csv')
mfrm.rater_stats_matrix.T
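The relationship between the full matrix and the marginal vectors can be sketched with a small invented example. It assumes that, under the matrix representation, each rater has one severity for every item-threshold pair, and that the marginal vectors are the row and column means of that matrix. The values below are made up and the layout of the actual *RaschPy* output may differ.

```python
import pandas as pd

items = ['Boredom', 'Disgust']
thresholds = [1, 2, 3]

# Invented severities for a single rater: rows are items, columns are thresholds.
full = pd.DataFrame([[0.4, 0.1, -0.2],
                     [0.0, 0.3, 0.0]], index=items, columns=thresholds)

by_item = full.mean(axis=1)        # marginal mean severity for each item
by_threshold = full.mean(axis=0)   # marginal mean severity for each threshold
overall = full.to_numpy().mean()   # single summary severity for the rater

print(by_item)
print(by_threshold)
print(overall)
```

In this example both items have the same marginal severity even though the rater treats their thresholds quite differently, which is the kind of detail only the full matrix reveals.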

Generate unanchored item characteristic curve and category response curves for *Boredom*, rated by Rater 1.

In [None]:
mfrm.icc_matrix('Boredom', rater='Rater_1', xmin=-2, xmax=2, title=None,
                filename='icc_boredom_rater1_matrix_unanchored', dpi=600)

In [None]:
mfrm.crcs_matrix('Boredom', rater='Rater_1', xmin=-2, xmax=2, title=None,
                 filename='crcs_boredom_rater1_matrix_unanchored', dpi=600)

Run an anchored matrix calibration with Raters 1 and 2 as the reference frame; generate item, threshold and rater (with marginal severities) stats dataframes and save to file.

In [None]:
%%time
mfrm.calibrate_matrix_anchor(anchor_raters=['Rater_1', 'Rater_2'])
mfrm.item_stats_df_matrix(anchor_raters=['Rater_1', 'Rater_2'])
mfrm.item_stats_matrix.to_csv('item_stats_matrix_anchored_rater1_rater_2.csv')
mfrm.item_stats_matrix

In [None]:
%%time
mfrm.threshold_stats_df_matrix(anchor_raters=['Rater_1', 'Rater_2'])
mfrm.threshold_stats_matrix.to_csv('threshold_stats_matrix_anchored_rater1_rater_2.csv')
mfrm.threshold_stats_matrix

In [None]:
%%time
mfrm.rater_stats_df_matrix(anchor_raters=['Rater_1', 'Rater_2'])
mfrm.rater_stats_matrix.to_csv('rater_stats_matrix_anchored_rater1_rater2_marginal.csv')
mfrm.rater_stats_matrix

Generate 'neutral rater' category response curves under the matrix representation: first unanchored, then anchored (to Raters 1 and 2).

In [None]:
mfrm.crcs_matrix('Boredom', anchor=False, xmin=-2, xmax=2, title=None,
                 filename='crcs_boredom_matrix_unanchored', dpi=600)

In [None]:
mfrm.crcs_matrix('Boredom', anchor=True, xmin=-2, xmax=2, title=None,
                 filename='crcs_boredom_matrix_anchored', dpi=600)

Generate category count dataframes, save to file and view. Two dataframes are generated: overall and by rater.

In [None]:
%%time
mfrm.category_counts_df()
mfrm.category_counts.to_csv('category_counts.csv')
mfrm.category_counts

In [None]:
mfrm.category_counts_raters.to_csv('category_counts_raters.csv')
mfrm.category_counts_raters
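For readers who want to cross-check these tables, category counts can also be computed directly with pandas. The sketch below uses invented scores in the same (`Rater`, `Person`) layout and assumes four score categories (0 to 3). The helper `count_categories` is hypothetical and the layout of its output is not necessarily the one *RaschPy* produces.

```python
import pandas as pd

CATEGORIES = range(4)  # assumed score categories 0 to 3 after rescoring

def count_categories(frame):
    """Count the responses in each category for every item column."""
    return frame.apply(
        lambda col: col.value_counts().reindex(CATEGORIES, fill_value=0)
    )

# Invented scores: two raters, three persons, two items.
index = pd.MultiIndex.from_product(
    [['Rater_1', 'Rater_2'], [101, 102, 103]], names=['Rater', 'Person']
)
scores = pd.DataFrame(
    {'Boredom': [0, 1, 3, 0, 2, 3], 'Disgust': [1, 1, 2, 0, 1, 3]}, index=index
)

overall = count_categories(scores)

# One block of counts per rater, stacked under the rater's name.
by_rater = pd.concat(
    {rater: count_categories(group)
     for rater, group in scores.groupby(level='Rater')}
)

print(overall)
print(by_rater)
```

Sparse or empty categories for a particular rater are worth spotting at this stage, since thresholds next to them are estimated from very little data.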

Generate matrix MFRM parameter estimates anchored to Rater 3; produce item, threshold and rater stats dataframes, save to file and view.

In [None]:
%%time
mfrm.calibrate_matrix_anchor(anchor_raters=['Rater_3'])
mfrm.item_stats_df_matrix(anchor_raters=['Rater_3'])
mfrm.item_stats_matrix.to_csv('item_stats_matrix_anchored_rater3.csv')
mfrm.item_stats_matrix

In [None]:
%%time
mfrm.threshold_stats_df_matrix(anchor_raters=['Rater_3'])
mfrm.threshold_stats_matrix.to_csv('threshold_stats_matrix_anchored_rater3.csv')
mfrm.threshold_stats_matrix

In [None]:
%%time
mfrm.rater_stats_df_matrix(anchor_raters=['Rater_3'])
mfrm.rater_stats_matrix.to_csv('rater_stats_matrix_anchored_rater3_marginal.csv')
mfrm.rater_stats_matrix

Plot global person estimates versus matrix person estimates, across all raters for non-extreme scores.

In [None]:
fig, ax = plt.subplots()
scores = np.arange(17) + 1

global_data = [mfrm.score_abil_global(score, raters='all', anchor=True)
               for score in scores]
matrix_data = [mfrm.score_abil_matrix(score, raters='all', anchor=True)
               for score in scores]
ax.scatter(global_data, matrix_data, s=30, color='black')

plt.plot([-2, 4], [-2, 4], color='darkred', linestyle='dashed')

ax.set_aspect('equal', 'box')

plt.xticks(np.arange(-5, 5, step=1))
plt.yticks(np.arange(-5, 5, step=1))

plt.xlim(-2, 4)
plt.ylim(-2, 4)

plt.xlabel('Global',font='Times', fontsize=15)
plt.ylabel('Matrix',font='Times', fontsize=15)

fig.tight_layout()

plt.savefig('abils_global_v_matrix_all_raters.png', dpi=600)

plt.show()
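The plot is restricted to non-extreme scores because a zero or maximum raw score has no finite maximum likelihood ability estimate. For any other raw score, the estimate is the ability at which the expected total score equals the observed score. The sketch below finds that ability by bisection, assuming the standard rating-scale form of the MFRM. The function names and all parameter values are invented, and *RaschPy* may use a different estimator in `score_abil_global` and `score_abil_matrix`.

```python
import numpy as np

def expected_total(ability, locations, thresholds):
    """Expected total score over all ratings at a given ability.

    `locations` holds item difficulty plus rater severity for each rating.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    total = 0.0
    for loc in locations:
        steps = ability - loc - thresholds
        log_num = np.concatenate(([0.0], np.cumsum(steps)))
        probs = np.exp(log_num - log_num.max())
        probs /= probs.sum()
        total += probs @ np.arange(len(probs))
    return total

def score_to_ability(score, locations, thresholds, lo=-10.0, hi=10.0, tol=1e-8):
    """Ability at which the expected total score equals `score` (bisection)."""
    max_score = len(locations) * len(thresholds)
    if not 0 < score < max_score:
        raise ValueError('extreme scores have no finite estimate')
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_total(mid, locations, thresholds) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Invented parameters: 2 items x 3 raters gives 6 ratings, maximum score 18.
locations = [d + s for d in (-0.2, 0.2) for s in (-0.3, 0.0, 0.3)]
thresholds = [-1.0, 0.0, 1.0]

abilities = [score_to_ability(s, locations, thresholds) for s in range(1, 18)]
```

Because the expected total score increases strictly with ability, each non-extreme raw score maps to exactly one estimate, so the scatter plot needs only one point per raw score.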

Plot global person estimates versus matrix person estimates, by individual rater for non-extreme scores.

In [None]:
fig, ax = plt.subplots()
scores = np.arange(5) + 1

def get_data(rater):
    
    global_data = [mfrm.score_abil_global(score, anchor=True, raters=[rater])
                   for score in scores]  
    matrix_data = [mfrm.score_abil_matrix(score, anchor=True, raters=[rater])
                   for score in scores]
    
    return global_data, matrix_data

global_data_1, matrix_data_1 = get_data('Rater_1')
ax.scatter(global_data_1, matrix_data_1, marker='^', s=50, color='darkgrey')

global_data_2, matrix_data_2 = get_data('Rater_2')
ax.scatter(global_data_2, matrix_data_2, marker='x', s=50, color='black')

global_data_3, matrix_data_3 = get_data('Rater_3')
ax.scatter(global_data_3, matrix_data_3, marker='+', s=70, color='black')

plt.plot([-2, 4], [-2, 4], color='darkred', linestyle='dashed')

ax.set_aspect('equal', 'box')

plt.xticks(np.arange(-5, 5, step=1))
plt.yticks(np.arange(-5, 5, step=1))

plt.xlim(-2, 4)
plt.ylim(-2, 4)

plt.xlabel('Global',font='Times', fontsize=15)
plt.ylabel('Matrix',font='Times', fontsize=15)

ax.legend(['Rater 1', 'Rater 2', 'Rater 3'])

fig.tight_layout()

plt.savefig('abils_global_v_matrix_by_rater.png', dpi=600)

plt.show()

Define a function to calculate the root mean square (RMS) difference between two arrays of values.

In [None]:
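def rms(a, b):
    
    a = np.array(a)
    b = np.array(b)
    
    # Mean of the squared differences between paired values
    mean_sq_diff = ((a - b) ** 2).mean()
    
    # Square root of that mean, rounded to 3 decimal places
    return round(np.sqrt(mean_sq_diff), 3)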

Calculate RMS difference of global versus matrix person estimates across all raters.

In [None]:
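# Reset to the non-extreme scores across all three raters (1 to 17);
# `scores` was last set to 1 to 5 for the single-rater plot above.
scores = np.arange(17) + 1

global_data = [mfrm.score_abil_global(score, raters='all', anchor=True)
               for score in scores]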
matrix_data = [mfrm.score_abil_matrix(score, raters='all', anchor=True)
               for score in scores]

rms(global_data, matrix_data)

Calculate RMS difference of global versus matrix person estimates by individual rater.

In [None]:
scores = np.arange(5) + 1

def get_data(rater):
    
    global_data = [mfrm.score_abil_global(score, anchor=True, raters=[rater])
                   for score in scores]  
    matrix_data = [mfrm.score_abil_matrix(score, anchor=True, raters=[rater])
                   for score in scores]
    
    return global_data, matrix_data

global_data_1, matrix_data_1 = get_data('Rater_1')
print(f'Rater_1: {rms(global_data_1, matrix_data_1)}')

global_data_2, matrix_data_2 = get_data('Rater_2')
print(f'Rater_2: {rms(global_data_2, matrix_data_2)}')

global_data_3, matrix_data_3 = get_data('Rater_3')
print(f'Rater_3: {rms(global_data_3, matrix_data_3)}')

Plot category response curves for *Boredom* under the matrix MFRM, first anchored to Raters 1 and 2, then anchored to Rater 3.

In [None]:
mfrm.calibrate_matrix_anchor(anchor_raters=['Rater_1', 'Rater_2'])
mfrm.crcs_matrix('Boredom', anchor=True, xmin=-2, xmax=2, title=None,
                 filename='crcs_boredom_matrix_anchored_rater1_rater2', dpi=600)

In [None]:
mfrm.calibrate_matrix_anchor(anchor_raters=['Rater_3'])
mfrm.crcs_matrix('Boredom', anchor=True, xmin=-2, xmax=2, title=None,
                 filename='crcs_boredom_matrix_anchored_rater3', dpi=600)