This extension serves my narrow use case: writing, compiling, error-checking and validating Stan models within a Jupyter notebook. I am not a comfortable R user, and Stan currently leans towards R. I wrote this simple script because I wanted an uninterrupted workflow for data manipulation, Stan model creation and analysis, all in a Jupyter notebook. Once the CodeMirror support is merged in, syntax highlighting for Stan code could also be enabled.

## Installation

`pip install git+https://github.com/Arvinds-ds/stanmagic.git`

## Testing %%stan magic

In [32]:
import numpy as np
from collections import OrderedDict
import pystan

### Load stanmagic extension

In [33]:
%load_ext stanmagic 

The stanmagic extension is already loaded. To reload it, use:
  %reload_ext stanmagic


### Ensure stanc compiler is installed

If the compiler is not on your PATH, you may have to pass --stanc [compiler_path] to %%stan.

In [35]:
!stanc --version

stanc version 2.16.0
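
The same availability check can be sketched in Python, which is handy when a notebook should fail early with a clear message. This is a minimal sketch using only the standard library; `find_stanc` is a hypothetical helper name and is not part of stanmagic.

```python
import os
import shutil


def find_stanc(explicit_path=None):
    """Return the path of a usable stanc executable, or None if none is found.

    Hypothetical helper for illustration; not part of stanmagic.
    """
    # An explicit path wins; otherwise search the directories on PATH.
    candidate = os.path.expanduser(explicit_path) if explicit_path else "stanc"
    return shutil.which(candidate)


stanc_path = find_stanc()
if stanc_path is None:
    print("stanc not found on PATH; pass --stanc [compiler_path] to %%stan")
else:
    print("Found stanc at", stanc_path)
```

If this reports that stanc is missing, the --stanc option described in section 6 is the way to point the magic at a specific compiler.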


### Generate test data

In [19]:
X = np.random.randn(100,3)
beta = np.array([0.1,0.2,0.3])
alpha = 4
sigma = 1.7
y = X@beta + alpha + np.random.randn(100)*sigma
N=100
K=3

In [20]:
data = OrderedDict({'X':X, 'y': y, 'N':N, 'K': K})
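
Because the model's data block declares `matrix[N,K] X` and `vector[N] y`, a shape mismatch only surfaces once sampling starts. A quick check before that can save a compile-and-run cycle. The sketch below assumes the dictionary layout used above; `check_regression_data` is a hypothetical helper name, not something stanmagic or PyStan provides.

```python
import numpy as np


def check_regression_data(data):
    """Raise ValueError if the arrays disagree with the declared N and K.

    Hypothetical helper mirroring the data block:
    matrix[N,K] X; vector[N] y; with N >= 1 and K >= 1.
    """
    N, K = data["N"], data["K"]
    if N < 1 or K < 1:
        raise ValueError("N and K must both be at least 1")
    if np.shape(data["X"]) != (N, K):
        raise ValueError(f"X has shape {np.shape(data['X'])}, expected {(N, K)}")
    if np.shape(data["y"]) != (N,):
        raise ValueError(f"y has shape {np.shape(data['y'])}, expected {(N,)}")


# Same layout as the data dictionary above, filled with placeholder zeros.
example = {"X": np.zeros((100, 3)), "y": np.zeros(100), "N": 100, "K": 3}
check_regression_data(example)  # passes silently when shapes agree
```

In the notebook, `check_regression_data(data)` would be called once, right before sampling.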

### 1. %%stan -f [stan_file_name]

Saves the cell code to the file specified in [stan_file_name]. The file name can also be accessed via _stan_vars['stan_file'], which is created in the local namespace.

In [21]:
%%stan -f test.stan
data {
  
  int<lower=1> N;
  int<lower=1> K;
  
  matrix[N,K] X;
  vector[N] y;
}

parameters{
  real alpha;
  ordered[K] beta;
  real<lower=0> sigma;
  
}

model {
  
  alpha ~ normal(0,10);
  beta[1] ~ normal(.1,.1);
  beta[2] ~ normal(.2,.1);
  beta[3] ~ normal(.3,.1);
  sigma ~ exponential(1);
  for (n in 1:N) {
    y[n] ~ normal(X[n]*beta + alpha, sigma);
  } 
  
}

generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = normal_rng(X[n]*beta + alpha, sigma);
  }
}

Using stanc compiler:  /home/aravind/Downloads/cmdstan-2.16.0/bin/stanc
/home/aravind/Downloads/cmdstan-2.16.0/bin/stanc --o=/tmp/b74e4bf1-804a-4b51-aca1-1594b5b76d9a.cpp test.stan
Model name=test_model
Input file=test.stan
Output file=/tmp/b74e4bf1-804a-4b51-aca1-1594b5b76d9a.cpp


In [22]:
_stan_vars

{'model_name': 'test_model', 'stan_code': None, 'stan_file': 'test.stan'}

In [23]:
model = pystan.StanModel(file=_stan_vars['stan_file'])

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b64da7aba54424736fda036b359c5ae9 NOW.


In [24]:
model.sampling(data=data)

Inference for Stan model: anon_model_b64da7aba54424736fda036b359c5ae9.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha       4.14  2.9e-3   0.17    3.8   4.03   4.15   4.25   4.46   3303    1.0
beta[0]     0.07  1.6e-3   0.07  -0.08   0.02   0.07   0.12    0.2   2193    1.0
beta[1]     0.18  1.1e-3   0.06   0.06   0.14   0.19   0.23   0.31   3144    1.0
beta[2]     0.31  1.1e-3   0.07   0.18   0.26   0.31   0.36   0.46   4000    1.0
sigma       1.64  2.0e-3   0.12   1.43   1.56   1.63   1.71   1.88   3253    1.0
y_rep[0]    4.17    0.03   1.62   0.97   3.12   4.17   5.26   7.36   4000    1.0
y_rep[1]    3.89    0.03   1.69   0.51   2.78    3.9   5.02   7.19   3856    1.0
y_rep[2]    4.49    0.03   1.66   1.23   3.39   4.48   5.56   7.82   4000    1.0
y_rep[3]    4.62    0.03   1.67   1.48   3.47   4.59   5.74   7.93   3884    1.0
...

### 2. %%stan -f [stan_file_name] --save_only

Saves the cell code to the file specified in [stan_file_name] and skips the compile step.

In [25]:
%%stan -f test1.stan --save_only
data {
  
  int<lower=1> N;
  int<lower=1> K;
  
  matrix[N,K] X;
  vector[N] y;
}

parameters{
  real alpha;
  ordered[K] beta;
  real<lower=0> sigma;
  
}

model {
  
  alpha ~ normal(0,10);
  beta[1] ~ normal(.1,.1);
  beta[2] ~ normal(.2,.1);
  beta[3] ~ normal(.3,.1);
  sigma ~ exponential(1);
  for (n in 1:N) {
    y[n] ~ normal(X[n]*beta + alpha, sigma);
  } 
  
}

generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = normal_rng(X[n]*beta + alpha, sigma);
  }
}

File test1.stan saved..Skipping Compile


In [26]:
model = pystan.StanModel(file='test1.stan')

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b64da7aba54424736fda036b359c5ae9 NOW.


### 3. %%stan 

Saves the cell code to a code string. The code string can be accessed via _stan_vars['stan_code']

In [27]:
%%stan
data {
  
  int<lower=1> N;
  int<lower=1> K;
  
  matrix[N,K] X;
  vector[N] y;
}

parameters{
  real alpha;
  ordered[K] beta;
  real<lower=0> sigma;
  
}

model {
  
  alpha ~ normal(0,10);
  beta[1] ~ normal(.1,.1);
  beta[2] ~ normal(.2,.1);
  beta[3] ~ normal(.3,.1);
  sigma ~ exponential(1);
  for (n in 1:N) {
    y[n] ~ normal(X[n]*beta + alpha, sigma);
  } 
  
}

generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = normal_rng(X[n]*beta + alpha, sigma);
  }
}

Using stanc compiler:  /home/aravind/Downloads/cmdstan-2.16.0/bin/stanc
/home/aravind/Downloads/cmdstan-2.16.0/bin/stanc --o=/tmp/anon_2c3a5bf5-ccea-41d6-9352-67cbe3330b03_model.cpp /tmp/anon_2c3a5bf5-ccea-41d6-9352-67cbe3330b03.stan
Model name=anon_2c3a5bf5_ccea_41d6_9352_67cbe3330b03_model
Input file=/tmp/anon_2c3a5bf5-ccea-41d6-9352-67cbe3330b03.stan
Output file=/tmp/anon_2c3a5bf5-ccea-41d6-9352-67cbe3330b03_model.cpp


In [28]:
_stan_vars

{'model_name': None,
 'stan_code': 'data {\n  \n  int<lower=1> N;\n  int<lower=1> K;\n  \n  matrix[N,K] X;\n  vector[N] y;\n}\n\nparameters{\n  real alpha;\n  ordered[K] beta;\n  real<lower=0> sigma;\n  \n}\n\nmodel {\n  \n  alpha ~ normal(0,10);\n  beta[1] ~ normal(.1,.1);\n  beta[2] ~ normal(.2,.1);\n  beta[3] ~ normal(.3,.1);\n  sigma ~ exponential(1);\n  for (n in 1:N) {\n    y[n] ~ normal(X[n]*beta + alpha, sigma);\n  } \n  \n}\n\ngenerated quantities {\n  vector[N] y_rep;\n  for (n in 1:N) {\n    y_rep[n] = normal_rng(X[n]*beta + alpha, sigma);\n  }\n}',
 'stan_file': None}

In [29]:
model = pystan.StanModel(model_code=_stan_vars['stan_code'])

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b64da7aba54424736fda036b359c5ae9 NOW.
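
Since `_stan_vars` holds either a file name or a code string depending on how `%%stan` was invoked, downstream cells may want one way to get the program text. The following is a minimal sketch under that assumption; `stan_source` is a hypothetical helper, and the dictionary keys are the ones shown in the `_stan_vars` output above.

```python
def stan_source(stan_vars):
    """Return the Stan program text described by a _stan_vars-style dict.

    Hypothetical helper: prefers the in-memory 'stan_code' string and
    falls back to reading the file named by 'stan_file'.
    """
    if stan_vars.get("stan_code") is not None:
        return stan_vars["stan_code"]
    if stan_vars.get("stan_file") is not None:
        with open(stan_vars["stan_file"]) as handle:
            return handle.read()
    raise ValueError("neither 'stan_code' nor 'stan_file' is set")


# Same dict layout as _stan_vars, with a tiny placeholder program.
demo_vars = {"model_name": None, "stan_code": "parameters { real mu; }", "stan_file": None}
print(stan_source(demo_vars))
```

With this in place, `pystan.StanModel(model_code=stan_source(_stan_vars))` would work for both the `-f` and the plain `%%stan` forms.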


In [30]:
model.sampling(data=data)

Inference for Stan model: anon_model_b64da7aba54424736fda036b359c5ae9.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha       4.13  3.1e-3   0.16    3.8   4.02   4.14   4.24   4.45   2839    1.0
beta[0]     0.07  1.6e-3   0.07  -0.08   0.02   0.07   0.12    0.2   1982    1.0
beta[1]     0.18  1.0e-3   0.07   0.06   0.14   0.18   0.23   0.31   4000    1.0
beta[2]     0.32  1.2e-3   0.07   0.18   0.27   0.31   0.37   0.47   4000    1.0
sigma       1.64  2.2e-3   0.12   1.42   1.56   1.63   1.72   1.89   2830    1.0
y_rep[0]    4.13    0.03   1.67   0.89   3.01   4.14   5.25   7.39   3927    1.0
y_rep[1]    3.85    0.03   1.67   0.61   2.71   3.85   4.95   7.18   4000    1.0
y_rep[2]    4.45    0.03   1.65   1.36   3.28   4.49   5.57   7.64   4000    1.0
y_rep[3]    4.54    0.03   1.63    1.3   3.42   4.57    5.6   7.72   3597    1.0
...

### 4. %%stan -f [stan_file_name] -o [cpp_file_name]

Saves the cell code to the file specified in [stan_file_name] and writes the generated C++ file to the name specified by [cpp_file_name].


### 5. %%stan -f [stan_file_name] --allow_undefined

Passes the --allow_undefined argument to the stanc compiler.

### 6. %%stan -f [stan_file_name] --stanc [stanc_compiler_path]

Saves the cell code to the file specified in [stan_file_name] and compiles it using the Stan compiler specified in [stanc_compiler_path]. By default, the stanc compiler on your PATH is used. If your PATH does not include stanc, use this option (e.g. %%stan -f binom.stan --stanc "~/cmdstan-2.16.0/bin/stanc").
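
To make options 4 to 6 concrete, the sketch below shows one way such flags could map onto a stanc command line of the form seen in the logged output earlier (`stanc --o=<cpp file> <stan file>`). It is illustrative only: `build_stanc_command` and its parameter names are hypothetical, and stanmagic's actual implementation may differ.

```python
import os


def build_stanc_command(stan_file, cpp_file=None, allow_undefined=False, stanc="stanc"):
    """Assemble an argument list resembling the stanc calls logged above.

    Illustrative only; this is not stanmagic's own code.
    """
    # Expand "~" so a path such as "~/cmdstan-2.16.0/bin/stanc" resolves.
    command = [os.path.expanduser(stanc)]
    if allow_undefined:
        command.append("--allow_undefined")
    if cpp_file is not None:
        command.append("--o=" + cpp_file)
    # The model file goes last, after all options.
    command.append(stan_file)
    return command


print(build_stanc_command("test.stan", cpp_file="test.cpp"))
# prints ['stanc', '--o=test.cpp', 'test.stan']
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting concerns.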

In [36]:
assert True