# Usage


`modelframe` builds model matrices and response vectors from a data set and an `lme4`-style formula.
As an example, we use the sleep study data set from the [`lme4` package](https://cran.r-project.org/web/packages/lme4/index.html), which ships with `modelframe`.

In [1]:
from modelframe import model_frame, load_data

In [2]:
sleepstudy = load_data()
sleepstudy

Unnamed: 0,Reaction,Days,Subject
0,249.5600,0,308
1,258.7047,1,308
2,250.8006,2,308
3,321.4398,3,308
4,356.8519,4,308
...,...,...,...
175,329.6076,5,372
176,334.4818,6,372
177,343.2199,7,372
178,369.1417,8,372


The matrices can be computed from the data set like this:

In [3]:
frame = model_frame("Reaction ~ Days + (1 | Subject)", sleepstudy)
frame

<a model frame>

The model above builds a response vector for the variable `Reaction`, a fixed effects model matrix for the variable `Days` including an intercept, and a random effects model matrix with an intercept for every `Subject`.
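
To make the structure of these three pieces concrete, the following is a minimal hand-rolled sketch in plain Python. It is not how `modelframe` is implemented; the helper names `fixed_effects_matrix` and `random_intercept_matrix` are hypothetical, and the four rows are copied from the data set shown above.

```python
# Toy rows copied from the sleep study data: (Reaction, Days, Subject).
rows = [
    (249.5600, 0, 308),
    (258.7047, 1, 308),
    (329.6076, 5, 372),
    (334.4818, 6, 372),
]


def fixed_effects_matrix(rows):
    """Hypothetical helper: one row per observation, [Intercept, Days]."""
    return [[1.0, float(days)] for _, days, _ in rows]


def random_intercept_matrix(rows):
    """Hypothetical helper: one indicator column per subject level."""
    levels = sorted({subject for _, _, subject in rows})
    matrix = [
        [1.0 if subject == level else 0.0 for level in levels]
        for _, _, subject in rows
    ]
    return levels, matrix


response = [reaction for reaction, _, _ in rows]
X = fixed_effects_matrix(rows)
levels, Z = random_intercept_matrix(rows)
```

Each row of `Z` contains a single `1.0` in the column of the subject the observation belongs to, which is the pattern visible in the random effects model matrix of the full data set.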

The response variable is a `pandas.Series` and can be accessed via:

In [4]:
frame.response

0      249.5600
1      258.7047
2      250.8006
3      321.4398
4      356.8519
         ...   
175    329.6076
176    334.4818
177    343.2199
178    369.1417
179    364.1236
Name: Reaction, Length: 180, dtype: float64

The model matrices are `pandas.DataFrame`s and can be accessed like this:

In [5]:
frame.coef_model_matrix

Unnamed: 0,Intercept,Days
0,1.0,0
1,1.0,1
2,1.0,2
3,1.0,3
4,1.0,4
...,...,...
175,1.0,5
176,1.0,6
177,1.0,7
178,1.0,8


In [6]:
frame.ranef_model_matrix

Unnamed: 0_level_0,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept
group,308,309,310,330,331,332,333,334,335,337,349,350,351,352,369,370,371,372
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
176,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
177,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
178,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


It is, of course, also possible to construct more complex formulae:

In [7]:
frame = model_frame("~ 0 + Days + (0 + Days + Reaction | Subject)", sleepstudy)

This builds a fixed effects design matrix without an intercept:

In [8]:
frame.coef_model_matrix

Unnamed: 0,Days
0,0
1,1
2,2
3,3
4,4
...,...
175,5
176,6
177,7
178,8


The random effects design matrix also does not use an intercept, but uses slopes for `Days` and `Reaction` for every `Subject`.
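
The block layout of such a matrix can be sketched by hand. The snippet below is an illustration in plain Python, not `modelframe`'s implementation; the function `random_slopes_matrix` is a hypothetical helper, and the three rows are copied from the data set above.

```python
def random_slopes_matrix(rows, covariates, group="Subject"):
    """Hypothetical helper: place each row's covariate values in the
    columns of its own group level and zeros everywhere else."""
    levels = sorted({row[group] for row in rows})
    # Columns are grouped by level: (Days, 308), (Reaction, 308), ...
    columns = [(name, level) for level in levels for name in covariates]
    matrix = [
        [float(row[name]) if row[group] == level else 0.0
         for name, level in columns]
        for row in rows
    ]
    return columns, matrix


rows = [
    {"Reaction": 249.5600, "Days": 0, "Subject": 308},
    {"Reaction": 258.7047, "Days": 1, "Subject": 308},
    {"Reaction": 329.6076, "Days": 5, "Subject": 372},
]
columns, Z = random_slopes_matrix(rows, ["Days", "Reaction"])
```

Every subject contributes one column per slope, so a row is non-zero only in the block of columns that belongs to its own subject.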

In [9]:
frame.ranef_model_matrix

Unnamed: 0_level_0,Days,Reaction,Days,Reaction,Days,Reaction,Days,Reaction,Days,Reaction,...,Days,Reaction,Days,Reaction,Days,Reaction,Days,Reaction,Days,Reaction
group,308,308,309,309,310,310,330,330,331,331,...,352,352,369,369,370,370,371,371,372,372
0,0.0,249.5600,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000
1,1.0,258.7047,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000
2,2.0,250.8006,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000
3,3.0,321.4398,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000
4,4.0,356.8519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,329.6076
176,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,334.4818
177,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,343.2199
178,0.0,0.0000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,369.1417


## Random effects

For hierarchical modelling, it is important to know how many levels a random effects term has in order to construct priors. For instance, for the following formula, we would want to know the dimensionality of the random effects term:

In [10]:
formula = "~ 1 + (1 | Subject)"
formula

'~ 1 + (1 | Subject)'

In [11]:
frame = model_frame(formula, sleepstudy)

The `frame` object exposes its random effects terms as a list, from which the size of each term can be computed:

In [12]:
frame.ranef_list

[<random effects term>]

In the formula above we specified only one random effects term (grouped by `Subject`). Let's look at the model matrix of the term and then at its shape:

In [13]:
frame.ranef_list[0].Z

Unnamed: 0_level_0,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept,Intercept
group,308,309,310,330,331,332,333,334,335,337,349,350,351,352,369,370,371,372
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
176,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
177,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
178,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [14]:
frame.ranef_list[0].Z.shape

(180, 18)

In this case the matrix has 18 columns, one for every subject. The size of the random effect, here a single intercept per subject, is:

In [15]:
frame.ranef_list[0].n_terms

1.0
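
As a rough illustration of why these numbers matter, the sketch below sizes a prior draw from the column count and the number of terms. The standard normal prior, the seed, and the function `draw_random_effects` are assumptions made for this example and are not part of `modelframe`; the values 18 and 1 are copied from the outputs above, with `n_terms` converted to an integer because it is reported as a float.

```python
import random


def draw_random_effects(n_levels, n_terms, scale=1.0, seed=0):
    """Hypothetical helper: draw one vector of length n_terms per level
    from an (assumed) zero-mean normal prior."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, scale) for _ in range(n_terms)]
        for _ in range(n_levels)
    ]


n_columns = 18      # second entry of Z.shape above
n_terms = int(1.0)  # n_terms above, reported as a float
n_levels = n_columns // n_terms

effects = draw_random_effects(n_levels, n_terms)
```

The result has one entry per subject, each of length `n_terms`, which matches the layout of the columns of `Z`.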

The following example uses a two-dimensional random effect:

In [16]:
frame = model_frame("~ 1 + (Days | Subject)", sleepstudy)

In [17]:
frame.ranef_list[0].Z

Unnamed: 0_level_0,Intercept,Days,Intercept,Days,Intercept,Days,Intercept,Days,Intercept,Days,...,Intercept,Days,Intercept,Days,Intercept,Days,Intercept,Days,Intercept,Days
group,308,308,309,309,310,310,330,330,331,331,...,352,352,369,369,370,370,371,371,372,372
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,5.0
176,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,6.0
177,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,7.0
178,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,8.0


In [18]:
frame.ranef_list[0].Z.shape

(180, 36)

In [19]:
frame.ranef_list[0].n_terms

2.0
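
The two outputs are consistent with each other: 18 subjects times 2 terms per subject gives the 36 columns of `Z`. The short sketch below checks this arithmetic and shows how one row of such a matrix would combine with a vector of random effects. The coefficient values in `b` are made up for illustration, and the column layout is reduced to two subjects.

```python
# Column count implied by the outputs above.
n_levels, n_terms = 18, 2
n_columns = n_levels * n_terms

# Column layout as in the matrix above, reduced to two subjects.
columns = [("Intercept", 308), ("Days", 308), ("Intercept", 372), ("Days", 372)]

# Row of Z for an observation of subject 372 on day 6.
z_row = [0.0, 0.0, 1.0, 6.0]

# Hypothetical random effect values, one per column.
b = [10.0, 2.0, -5.0, 1.5]


def ranef_contribution(z_row, b):
    """Dot product of one row of Z with the random effects vector."""
    return sum(z * coef for z, coef in zip(z_row, b))


# Only the block of subject 372 contributes: -5.0 + 6.0 * 1.5
contribution = ranef_contribution(z_row, b)
```

Because all entries outside a subject's own block are zero, only that subject's intercept and slope enter the sum.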