## *RaschPy* simulation functionality

This notebook works through examples of generating simulated data sets with `RaschPy`. Simulated data are useful in experiments where the 'ground truth' of the generating parameters must be known, for example when comparing the efficacy of different estimation algorithms (as in Elliott & Buttery, 2022a) or exploring the effect of fitting different Rasch models to the same data set (as in Elliott & Buttery, 2022b). There is a separate class for each model: `SLM_Sim` for the simple logistic model, or dichotomous Rasch model (Rasch, 1960); `PCM_Sim` for the partial credit model (Masters, 1982); `RSM_Sim` for the rating scale model (Andrich, 1978); `MFRM_Sim_Global` for the many-facet Rasch model (Linacre, 1994); `MFRM_Sim_Items` for the vector-by-item extended MFRM (Elliott & Buttery, 2022b); `MFRM_Sim_Thresholds` for the vector-by-threshold extended MFRM (Elliott & Buttery, 2022b); and `MFRM_Sim_Matrix` for the matrix extended MFRM (Elliott & Buttery, 2022b). All data are generated to fit the chosen model.

**References**

&nbsp;&nbsp;&nbsp;&nbsp; Andrich, D. (1978). A rating formulation for ordered response categories. *Psychometrika*, *43*(4), 561–573.

&nbsp;&nbsp;&nbsp;&nbsp; Elliott, M., & Buttery, P. J. (2022a). Non-iterative conditional pairwise estimation for the rating scale model. *Educational and Psychological Measurement*, *82*(5), 989–1019.

&nbsp;&nbsp;&nbsp;&nbsp; Elliott, M., & Buttery, P. J. (2022b). Extended rater representations in the many-facet Rasch model. *Journal of Applied Measurement*, *22*(1), 133–160.

&nbsp;&nbsp;&nbsp;&nbsp; Linacre, J. M. (1994). *Many-Facet Rasch Measurement*. MESA Press.

&nbsp;&nbsp;&nbsp;&nbsp; Masters, G. N. (1982). A Rasch model for partial credit scoring. *Psychometrika*, *47*(2), 149–174.

&nbsp;&nbsp;&nbsp;&nbsp; Rasch, G. (1960). *Probabilistic models for some intelligence and attainment tests*. Danmarks Pædagogiske Institut.

Import the packages and set the working directory (here called `my_working_directory`), where your output files will be saved.

In [1]:
import RaschPy as rp
import numpy as np
import pandas as pd
import os
import pickle

# Set the working directory (my_working_directory); replace this path with your own
os.chdir('C:/Users/elliom/Downloads/sims')

### `SLM_Sim`

Create an object `slm_sim_1` of the class `SLM_Sim` with randomised item difficulties and person abilities. `SLM_Sim` does this automatically when you pass the `item_range`, `person_sd` and `offset` arguments: item difficulties are sampled from a uniform distribution and person abilities from a normal distribution. Here, we pass `item_range=4` so that the items cover a range of 4 logits, and `person_sd=2` and `offset=2` so that the persons have a mean ability 2 logits higher than the items, with a standard deviation of 2 logits. The simulation has 5,000 persons and 30 items, with no missing data.

In [2]:
slm_sim_1 = rp.SLM_Sim(no_of_items=30,
                       no_of_persons=5000,
                       item_range=4,
                       person_sd=2,
                       offset=2)
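For intuition about what a simulation with these settings produces, the following is a minimal NumPy sketch of dichotomous Rasch data generation. It is not RaschPy's implementation: the function name `simulate_slm`, the centring of item difficulties on zero and the `seed` argument are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd


def simulate_slm(no_of_items, no_of_persons, item_range, person_sd, offset, seed=None):
    """Illustrative stand-in for a dichotomous Rasch simulation (not RaschPy code)."""
    rng = np.random.default_rng(seed)

    # Assumption: difficulties are uniform over `item_range` logits, centred on zero.
    diffs = rng.uniform(-item_range / 2, item_range / 2, no_of_items)
    # Abilities are normal, with mean `offset` logits above the item centre.
    abilities = rng.normal(offset, person_sd, no_of_persons)

    # Rasch model: P(X = 1) = exp(ability - difficulty) / (1 + exp(ability - difficulty))
    logits = abilities[:, None] - diffs[None, :]
    probs = 1 / (1 + np.exp(-logits))

    # One Bernoulli draw per person-item pair.
    responses = (rng.random(probs.shape) < probs).astype(float)

    items = [f'Item_{i + 1}' for i in range(no_of_items)]
    persons = [f'Person_{n + 1}' for n in range(no_of_persons)]
    scores = pd.DataFrame(responses, index=persons, columns=items)
    return scores, pd.Series(diffs, index=items), pd.Series(abilities, index=persons)


scores, diffs, abilities = simulate_slm(30, 5000, item_range=4, person_sd=2,
                                        offset=2, seed=0)
```

Because the persons are, on average, more able than the items are difficult, most simulated responses under these settings are correct.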

Save the generated response dataframe, which is stored as the attribute `slm_sim_1.scores`, to file, and view the first 5 lines.

In [3]:
slm_sim_1.scores.to_csv('slm_sim_1_scores.csv')
slm_sim_1.scores.head(5)

Unnamed: 0,Item_1,Item_2,Item_3,Item_4,Item_5,Item_6,Item_7,Item_8,Item_9,Item_10,...,Item_21,Item_22,Item_23,Item_24,Item_25,Item_26,Item_27,Item_28,Item_29,Item_30
Person_1,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
Person_2,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0
Person_3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Person_4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0
Person_5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


Save the generating item and person parameters, which are also stored as attributes, `slm_sim_1.diffs` and `slm_sim_1.abilities`, to file, and view the first 5 lines of each.

In [4]:
slm_sim_1.diffs.to_csv('slm_sim_1_diffs.csv', header=None)
slm_sim_1.diffs.head(5)

Item_1    0.212071
Item_2   -0.862861
Item_3    0.286360
Item_4    1.324331
Item_5    1.586519
dtype: float64

In [5]:
slm_sim_1.abilities.to_csv('slm_sim_1_abilities.csv', header=None)
slm_sim_1.abilities.head(5)

Person_1    1.551758
Person_2    1.372744
Person_3    2.817531
Person_4    3.521883
Person_5    4.308773
dtype: float64
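Because these files are written without a header row, reading them back needs matching options. The sketch below round-trips a small stand-in Series through a temporary file using standard pandas calls; the values and file name are illustrative only, and `header=False` is used as the documented boolean form of suppressing the header.

```python
import os
import tempfile

import pandas as pd

# Stand-in for a Series of generating parameters such as `slm_sim_1.diffs`.
diffs = pd.Series([0.212071, -0.862861, 0.286360],
                  index=['Item_1', 'Item_2', 'Item_3'])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'diffs.csv')
    diffs.to_csv(path, header=False)

    # No header row was written, so read with header=None and take the
    # first column as the index; squeeze the one-column frame to a Series.
    reloaded = pd.read_csv(path, header=None, index_col=0).squeeze('columns')

# Drop the positional labels that pandas assigns to unnamed columns.
reloaded.index.name = None
reloaded.name = None
```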

Create an object `slm_1` of the class `SLM` from the response dataframe for analysis (see the manual and/or the `SLM` example notebook for details on conducting an analysis). Also, save the object `slm_sim_1` to file with `pickle`.

In [6]:
slm_1 = rp.SLM(slm_sim_1.scores)

with open('slm_sim_1.pickle', 'wb') as handle:
    pickle.dump(slm_sim_1, handle, protocol=pickle.HIGHEST_PROTOCOL)
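A pickled object can be restored in a later session with `pickle.load`. The sketch below shows the round trip with a plain dictionary standing in for the simulation object, so that it runs without `RaschPy`; the file name and dictionary contents are illustrative. When restoring a real `SLM_Sim` object, `RaschPy` must be importable in the session, since `pickle` needs the class definition to rebuild the object.

```python
import os
import pickle
import tempfile

# Stand-in object; in the notebook this would be the simulation object `slm_sim_1`.
sim_stand_in = {'no_of_items': 30, 'no_of_persons': 5000}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sim.pickle')

    # Save, as in the cell above.
    with open(path, 'wb') as handle:
        pickle.dump(sim_stand_in, handle, protocol=pickle.HIGHEST_PROTOCOL)

    # Restore: open in binary read mode and load.
    with open(path, 'rb') as handle:
        restored = pickle.load(handle)
```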

You may wish to create a simulation based on specified, known item difficulties and/or person abilities. This may be done by passing lists to the `manual_diffs` and/or `manual_abilities` arguments (in which case, there is no need to pass the corresponding `item_range`, `person_sd` or `offset` arguments). You may also customise the names of the items and/or persons by passing lists of the correct length to the `manual_item_names` and/or `manual_person_names` arguments; this option is available for all simulations.

The `manual_diffs` and `manual_abilities` arguments may also be used to generate random item difficulties and/or person abilities according to distributions other than the default uniform (for items) and normal (for persons). This is what is done in the example `slm_sim_2` below: a set of specified, fixed item difficulties (10 items of difficulty -1 logit and 10 of difficulty +1 logit) is passed together with person abilities drawn from a uniform distribution between -2 and 2 logits. For this simulation, we also set a proportion of 30% missing data (missing completely at random) by passing the argument `missing=0.3`.

In [7]:
slm_sim_2 = rp.SLM_Sim(no_of_items=20,
                       no_of_persons=5000,
                       missing=0.3,
                       manual_diffs=[-1] * 10 + [1] * 10,
                       manual_abilities=np.random.uniform(-2, 2, 5000))
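To illustrate what 'missing completely at random' means for a response matrix, the sketch below deletes each cell independently with probability 0.3. How `RaschPy` applies missingness internally is not shown here; the independent per-cell mask, the random stand-in responses and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Stand-in complete response matrix: 5,000 persons by 20 dichotomous items.
complete = pd.DataFrame(rng.integers(0, 2, size=(5000, 20)).astype(float),
                        index=[f'Person_{n + 1}' for n in range(5000)],
                        columns=[f'Item_{i + 1}' for i in range(20)])

# MCAR: every cell has the same deletion probability, regardless of the
# person, the item or the response itself.
missing = 0.3
mask = rng.random(complete.shape) < missing

# `DataFrame.mask` replaces the cells where the condition is True with NaN.
with_missing = complete.mask(mask)

observed_missing = with_missing.isna().to_numpy().mean()
```

With 100,000 cells, `observed_missing` should be close to the requested proportion, though it will not match it exactly because each cell is deleted at random.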

Save the generated response dataframe to file, and view the first 5 lines. Missing data is shown as `NaN`.

In [8]:
slm_sim_2.scores.to_csv('slm_sim_2_scores.csv')
slm_sim_2.scores.head(5)

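| | Item_1 | Item_2 | Item_3 | Item_4 | Item_5 | Item_6 | Item_7 | Item_8 | Item_9 | Item_10 | Item_11 | Item_12 | Item_13 | Item_14 | Item_15 | Item_16 | Item_17 | Item_18 | Item_19 | Item_20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Person_1 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | NaN | 1.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | NaN | 0.0 |
| Person_2 | 1.0 | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | 0.0 | NaN | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| Person_3 | NaN | NaN | 1.0 | 1.0 | NaN | 1.0 | 0.0 | NaN | 1.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | 0.0 | 1.0 | 0.0 | NaN | 0.0 | 1.0 |
| Person_4 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | 0.0 | 0.0 | NaN | 0.0 | 1.0 | 1.0 | 0.0 | NaN |
| Person_5 | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | NaN |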


Save the generating item and person parameters to file, and view all item difficulties and the first 5 lines of the person abilities.


In [9]:
slm_sim_2.diffs.to_csv('slm_sim_2_diffs.csv', header=None)
slm_sim_2.diffs

Item_1    -1
Item_2    -1
Item_3    -1
Item_4    -1
Item_5    -1
Item_6    -1
Item_7    -1
Item_8    -1
Item_9    -1
Item_10   -1
Item_11    1
Item_12    1
Item_13    1
Item_14    1
Item_15    1
Item_16    1
Item_17    1
Item_18    1
Item_19    1
Item_20    1
dtype: int32

In [10]:
slm_sim_2.abilities.to_csv('slm_sim_2_abilities.csv', header=None)
slm_sim_2.abilities.head(5)

Person_1   -0.426222
Person_2    1.671151
Person_3    1.130043
Person_4    0.781744
Person_5    0.436726
dtype: float64

Create an object `slm_2` of the class `SLM` from the response dataframe for analysis and save the object `slm_sim_2` to file with `pickle`.

In [11]:
slm_2 = rp.SLM(slm_sim_2.scores)

with open('slm_sim_2.pickle', 'wb') as handle:
    pickle.dump(slm_sim_2, handle, protocol=pickle.HIGHEST_PROTOCOL)