Go directly to:
- [**Start page**](https://github.com/coconeuro/remeta/)
- [**Installation**](https://github.com/coconeuro/remeta/blob/main/INSTALL.md)
- [**Basic Usage**](https://github.com/coconeuro/remeta/blob/main/demo/basic_usage.ipynb)
- [**Common use cases**](https://github.com/coconeuro/remeta/blob/main/demo/common_use_cases.ipynb)
- [**Group estimation and priors** (this page)](https://github.com/coconeuro/remeta/blob/main/demo/group_estimation_priors.ipynb)

## Group estimation and priors

Often, data from a single participant are not sufficient for precise parameter estimation. For such cases, ReMeta offers three options that inform parameter estimation with either group-level or prior information: fixed effects, random effects, and priors.

### Group estimation (fixed effect)

To use group-level information, we need to 1) pass 2d data arrays (nsubjects x nsamples) to ReMeta and 2) specify the `group` attribute of the relevant parameters.

The `fit` method of ReMeta accepts data as either 1d arrays (single participant) or 2d arrays (group data). To demonstrate this, we simulate type 1 data for 4 participants:


In [1]:
import remeta
import numpy as np
np.random.seed(0)
cfg = remeta.Configuration()
cfg.enable_type1_param_nonlinear_encoding_gain = 1
cfg.skip_type2 = True
params_true = dict(
    type1_noise=0.5,
    type1_bias=0,
    type1_nonlinear_encoding_gain=0.5
)
cfg.true_params = params_true
data = remeta.simu_data(nsubjects=4, nsamples=2000, params=params_true, cfg=cfg)

----------------------------------
..Generative parameters:
    type1_noise: 0.5
    type1_bias: 0
    type1_nonlinear_encoding_gain: 0.5
..Descriptive statistics:
    No. subjects: 4
    No. samples: 2000
    Performance: 83.7% correct
    Choice bias: -0.2%
----------------------------------
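As a rough illustration of the 1d versus 2d layout described above, the following NumPy sketch builds placeholder arrays of shape (nsubjects x nsamples). The variable names, the stimulus range and the 0/1 coding of choices are assumptions made for this illustration only; they are not taken from ReMeta's documentation.

```python
import numpy as np

rng = np.random.default_rng(0)
nsubjects, nsamples = 4, 2000

# Hypothetical group data: one row per participant, one column per trial
stimuli = rng.uniform(-1, 1, size=(nsubjects, nsamples))
# Placeholder choices: noisy sign of the stimulus, coded as 0/1 (assumed coding)
choices = (stimuli + rng.normal(0, 0.5, size=stimuli.shape) > 0).astype(int)

print(stimuli.shape)     # (4, 2000) -> 2d group data
print(stimuli[0].shape)  # (2000,)   -> 1d data of a single participant

# Per-participant proportion correct, computed along the trial axis
accuracy = (choices == (stimuli > 0)).mean(axis=1)
print(accuracy.shape)    # (4,)
```

Indexing a row of the 2d array yields the 1d array of a single participant, which is the shape one would pass for a single-subject fit.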


We simulated data with a slight non-linearity (`type1_nonlinear_encoding_gain=0.5`). Non-linearities in the encoding of stimulus intensity are notoriously sample-intensive to estimate, so estimates for single participants often vary widely even when all participants are generated from, and fitted with, the same ground truth model. This is exactly what we find, even though each participant has 2000 trials:

In [2]:
rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()


+++ Type 1 level +++
  Subject-level estimation (MLE)
    .. finished (1.4 secs).
  Final report
    Subject 1 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.419 (true: 0.5)
            [subject] type1_nonlinear_encoding_gain: 0.136 (true: 0.5)
            [subject] type1_bias: 0.0172 (true: 0)
        [subject] Neg. LL: 689.68
        [subject] Fitting time: 0.35 secs
        Neg. LL using true params: 1090.66
    Subject 2 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.495 (true: 0.5)
            [subject] type1_nonlinear_encoding_gain: 0.284 (true: 0.5)
            [subject] type1_bias: -0.0215 (true: 0)
        [subject] Neg. LL: 763.88
        [subject] Fitting time: 0.35 secs
        Neg. LL using true params: 1218.72
    Subject 3 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.509 (true: 0.5)
            [subject] type1_nonlinear_encoding_gain: 0.361 

When 2d (group) data are fitted, the result returned by the `summary()` method contains one entry per subject, e.g. `result.type1.params` is indexed by subject. We can print the final parameter estimates more cleanly as follows:

In [3]:
for s in range(result.nsubjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f}')

Subject 0
	type1_noise: 0.419
	type1_nonlinear_encoding_gain: 0.136
	type1_bias: 0.017
Subject 1
	type1_noise: 0.495
	type1_nonlinear_encoding_gain: 0.284
	type1_bias: -0.022
Subject 2
	type1_noise: 0.509
	type1_nonlinear_encoding_gain: 0.361
	type1_bias: -0.014
Subject 3
	type1_noise: 0.580
	type1_nonlinear_encoding_gain: 1.414
	type1_bias: 0.003


The fitted parameters for `type1_nonlinear_encoding_gain` vary strongly around the true value of `0.5`. This is not a fitting error (the empirical negative log-likelihood is always lower than the one for the true parameters); rather, non-linearities are very hard to infer from binary choice data. It is not impossible, but it takes a lot of samples to get anywhere near an acceptable level of precision.

One option to tackle this is to fit a single non-linearity parameter to the entire group. To do so, we set the `group` attribute of the parameter to `'fixed'`:

In [4]:
cfg.type1_param_nonlinear_encoding_gain.group = 'fixed'

We fit the model as usual:

In [5]:
rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()


+++ Type 1 level +++
  Subject-level estimation (MLE)
    .. finished (1.4 secs).

  Group-level optimization (MLE / MAP)
        [21:24:39] Iteration 1 / 30
        [21:24:39] Iteration 11 / 30
        [21:24:39] Iteration 21 / 30
    .. finished (0.2 secs).
  Final report
    Subject 1 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.419 (true: 0.5)
            [subject] type1_nonlinear_encoding_gain: 0.136 (true: 0.5)
            [subject] type1_bias: 0.0172 (true: 0)
        [subject] Neg. LL: 689.68
        [subject] Fitting time: 0.34 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.419 (true: 0.5)
            [group=fixed] type1_nonlinear_encoding_gain: 0.492 (true: 0.5)
            [subject] type1_bias: 0.0172 (true: 0)
        [final] Neg. LL: 692.04
        Neg. LL using true params: 1090.66
    Subject 2 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.4

Now, the parameter `type1_nonlinear_encoding_gain` is fitted to the entire group dataset and is therefore identical for each participant. The final estimate of `0.492` is much closer to the ground truth value of `0.5`. We once again print the final parameters more cleanly:

In [6]:
for s in range(result.nsubjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f}')

Subject 0
	type1_noise: 0.419
	type1_nonlinear_encoding_gain: 0.492
	type1_bias: 0.017
Subject 1
	type1_noise: 0.495
	type1_nonlinear_encoding_gain: 0.492
	type1_bias: -0.022
Subject 2
	type1_noise: 0.509
	type1_nonlinear_encoding_gain: 0.492
	type1_bias: -0.014
Subject 3
	type1_noise: 0.580
	type1_nonlinear_encoding_gain: 0.492
	type1_bias: 0.003


Note that even though parameter recovery improved, the negative log-likelihood of this group fit is worse (i.e. higher) than the single-subject fit:

In [7]:
for s in range(result.nsubjects):
    print(f'Subject {s}')
    print(f'\tnegll(subject fit): {result.type1.subject.negll[s]:.3f}')
    print(f'\tnegll(group fit): {result.type1.group.negll[s]:.3f}')

Subject 0
	negll(subject fit): 689.682
	negll(group fit): 692.040
Subject 1
	negll(subject fit): 763.884
	negll(group fit): 764.750
Subject 2
	negll(subject fit): 769.558
	negll(group fit): 769.894
Subject 3
	negll(subject fit): 721.552
	negll(group fit): 731.840


Yet in this case, the higher negative log-likelihood effectively means that the model is not overfit to random peculiarities of each subject's data and instead captures the broad trends in the group data.
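That a shared parameter can only match or worsen each subject's likelihood is a general property of constrained fits, not something specific to ReMeta. The sketch below shows it with a deliberately simple toy model (Gaussian data with unknown mean and unit variance) that is assumed here purely for illustration and is unrelated to ReMeta's actual type 1 model.

```python
import numpy as np

rng = np.random.default_rng(1)
nsubjects, nsamples = 4, 50

# Toy data: every subject shares the same true mean (0.5), unit variance
data = rng.normal(0.5, 1.0, size=(nsubjects, nsamples))

def negll(x, mu):
    """Negative log-likelihood of x under a Gaussian with mean mu and SD 1."""
    return 0.5 * np.sum((x - mu) ** 2) + 0.5 * x.size * np.log(2 * np.pi)

mu_subject = data.mean(axis=1)  # one maximum-likelihood estimate per subject
mu_group = data.mean()          # a single estimate for the pooled data ("fixed")

negll_subject = np.array([negll(data[s], mu_subject[s]) for s in range(nsubjects)])
negll_group = np.array([negll(data[s], mu_group) for s in range(nsubjects)])

for s in range(nsubjects):
    print(f'Subject {s}')
    print(f'\tnegll(subject fit): {negll_subject[s]:.3f}')
    print(f'\tnegll(group fit): {negll_group[s]:.3f}')
```

Because each subject-level estimate minimizes that subject's own negative log-likelihood, the group value can never do better on a per-subject basis; what it buys is an estimate based on all `nsubjects * nsamples` observations.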

### Group estimation (random effect)

Another possibility is to treat a parameter as a random effect. When modeled as a random effect, each subject is fitted to its own data, but the parameter estimate is informed / regularized by an estimate of the parameter's group distribution. If you assume that there is plausibly variability between participants (and there typically is), modeling parameters as random effects might be the better choice.

In [8]:
cfg.type1_param_nonlinear_encoding_gain.group = 'random'
rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()


+++ Type 1 level +++
  Subject-level estimation (MLE)
    .. finished (1.4 secs).

  Group-level optimization (MLE / MAP)
        [21:24:40] Iteration 1 / 30 (Convergence: 0.00007771)
        [21:24:41] Iteration 11 / 30 (Convergence: 0.00000035)
        [21:24:42] Iteration 21 / 30 (Convergence: 0.00000003)
    .. finished (1.8 secs).
  Final report
    Subject 1 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.419 (true: 0.5)
            [subject] type1_nonlinear_encoding_gain: 0.136 (true: 0.5)
            [subject] type1_bias: 0.0172 (true: 0)
        [subject] Neg. LL: 689.68
        [subject] Fitting time: 0.36 secs
        Parameters estimates (group-level fit)
            [subject] type1_noise: 0.455 (true: 0.5)
            [group=random] type1_nonlinear_encoding_gain: 0.485 (true: 0.5)
            [subject] type1_bias: 0.0185 (true: 0)
        [final] Neg. LL: 690.13
        Neg. LL using true params: 1090.66
    Subject 2 / 4
        

In the current example, all participant estimates for `type1_nonlinear_encoding_gain` are pretty similar, reflecting the fact that the data were in fact generated by the same ground truth model:

In [9]:
for s in range(result.nsubjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.6f}')

Subject 0
	type1_noise: 0.454689
	type1_nonlinear_encoding_gain: 0.485180
	type1_bias: 0.018546
Subject 1
	type1_noise: 0.519750
	type1_nonlinear_encoding_gain: 0.485182
	type1_bias: -0.022539
Subject 2
	type1_noise: 0.524893
	type1_nonlinear_encoding_gain: 0.485183
	type1_bias: -0.014263
Subject 3
	type1_noise: 0.482807
	type1_nonlinear_encoding_gain: 0.485190
	type1_bias: 0.002341


This observation is matched by an inspection of the population estimate for `type1_nonlinear_encoding_gain`:

In [10]:
print(f"Estimated population mean ± SD: {result.params_random_effect.mean['type1_nonlinear_encoding_gain']:.4f} ± {result.params_random_effect.std['type1_nonlinear_encoding_gain']:.4f}")

Estimated population mean ± SD: 0.4852 ± 0.0012
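To build intuition for how a group distribution regularizes subject-level estimates, here is a minimal sketch of shrinkage under a normal-normal model with a moment-based variance estimate. This is a textbook simplification and not ReMeta's group-level optimization; the function `shrink` and the standard error of `0.4` are hypothetical, while the four input values are the subject-level estimates printed earlier on this page.

```python
import numpy as np

def shrink(estimates, se):
    """Shrink subject-level estimates towards the group mean (normal-normal model).

    estimates: subject-level point estimates
    se: standard error of each estimate (assumed known and equal here)
    """
    estimates = np.asarray(estimates, dtype=float)
    group_mean = estimates.mean()
    # Between-subject variance: observed variance minus sampling variance, floored at 0
    group_var = max(estimates.var(ddof=1) - se ** 2, 0.0)
    # Weight on the subject's own estimate (0 = full pooling, 1 = no pooling)
    weight = group_var / (group_var + se ** 2)
    shrunk = group_mean + weight * (estimates - group_mean)
    return shrunk, group_mean, np.sqrt(group_var)

raw = np.array([0.136, 0.284, 0.361, 1.414])  # subject-level fits from above
shrunk, group_mean, group_sd = shrink(raw, se=0.4)  # se is a hypothetical value

for s in range(len(raw)):
    print(f'Subject {s}: raw={raw[s]:.3f}, shrunk={shrunk[s]:.3f}')
print(f'Group mean ± SD: {group_mean:.4f} ± {group_sd:.4f}')
```

The larger the assumed sampling error relative to the spread between subjects, the smaller the weight on each subject's own estimate. In the limit where the estimated between-subject variance is zero, all subjects collapse onto the group mean, which resembles the near-identical estimates in the output above.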


### Priors

Priors present another way to inform and regularize point estimates of participants. If there is good reason from prior literature or a prior study to assume a prior distribution for a parameter, one can perform Maximum A Posteriori estimation (MAP) instead of Maximum Likelihood estimation (MLE). In ReMeta this is possible by specifying the `prior` attribute of a parameter. In the following example, we remove the previous random effect for `type1_param_nonlinear_encoding_gain` and specify a prior instead, given as a tuple of the form `(prior_mean, prior_std)`.

In [11]:
cfg.type1_param_nonlinear_encoding_gain.group = None
cfg.type1_param_nonlinear_encoding_gain.prior = (0, 1)
rem = remeta.ReMeta(cfg=cfg)
rem.fit(data.stimuli, data.choices, data.confidence)
result = rem.summary()


+++ Type 1 level +++
  Subject-level estimation (MLE)
    .. finished (1.5 secs).
  Final report
    Subject 1 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.418 (true: 0.5)
            [subject+prior=N(0,1)] type1_nonlinear_encoding_gain: 0.122 (true: 0.5)
            [subject] type1_bias: 0.0172 (true: 0)
        [subject] Neg. LL: 689.69
        [subject] Fitting time: 0.45 secs
        Neg. LL using true params: 1090.66
    Subject 2 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.491 (true: 0.5)
            [subject+prior=N(0,1)] type1_nonlinear_encoding_gain: 0.251 (true: 0.5)
            [subject] type1_bias: -0.0214 (true: 0)
        [subject] Neg. LL: 763.92
        [subject] Fitting time: 0.40 secs
        Neg. LL using true params: 1218.72
    Subject 3 / 4
        Parameters estimates (subject-level fit)
            [subject] type1_noise: 0.504 (true: 0.5)
            [subject+prior=N(0,1)

According to our prior, a null effect for the nonlinearity parameter should be most likely. The precision of the prior is moderate with a standard deviation of `1`. As a consequence, the nonlinearity parameter is biased towards 0 compared to the original estimates without a prior:

In [12]:
for s in range(result.nsubjects):
    print(f'Subject {s}')
    for k, v in result.type1.params[s].items():
        print(f'\t{k}: {v:.3f}')

Subject 0
	type1_noise: 0.418
	type1_nonlinear_encoding_gain: 0.122
	type1_bias: 0.017
Subject 1
	type1_noise: 0.491
	type1_nonlinear_encoding_gain: 0.251
	type1_bias: -0.021
Subject 2
	type1_noise: 0.504
	type1_nonlinear_encoding_gain: 0.316
	type1_bias: -0.014
Subject 3
	type1_noise: 0.543
	type1_nonlinear_encoding_gain: 1.061
	type1_bias: 0.003
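The pull towards the prior mean can be made explicit in the simplest possible setting: a Gaussian likelihood around the unregularized estimate combined with a Gaussian prior, for which the MAP estimate is a precision-weighted average. The sketch below assumes this simplified setting; `map_estimate` and the standard error of `0.5` are hypothetical and do not describe ReMeta's internals, so the numbers will not reproduce the output above.

```python
import numpy as np

def map_estimate(mle, se, prior_mean, prior_std):
    """Posterior mode for a likelihood N(mle, se^2) and a prior N(prior_mean, prior_std^2)."""
    precision_data = 1.0 / se ** 2
    precision_prior = 1.0 / prior_std ** 2
    return (precision_data * mle + precision_prior * prior_mean) / (precision_data + precision_prior)

mle = np.array([0.136, 0.284, 0.361, 1.414])  # estimates without a prior (from above)
se = 0.5                                      # hypothetical standard error

# A wide prior barely changes the estimates; a narrow prior dominates them
for prior_std in (10.0, 1.0, 0.1):
    estimates = map_estimate(mle, se, prior_mean=0.0, prior_std=prior_std)
    print(f'prior_std={prior_std}: {np.round(estimates, 3)}')
```

In this simplified picture, the prior standard deviation acts as a regularization strength: the smaller it is relative to the uncertainty of the data, the more strongly estimates are drawn towards `prior_mean`.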
