This extension serves a narrow use case: writing, compiling, error-checking and validating Stan models within a Jupyter notebook. I am not a comfortable R user, and Stan currently leans towards R. I wrote this simple script because I wanted an uninterrupted workflow for data manipulation, Stan model creation and analysis, all in a Jupyter notebook. Once the CodeMirror support is merged in, syntax highlighting for Stan code could also be enabled. Since a few people were asking me for this, I have packaged it as an extension.

Tested on Linux and Mac with Python 2.7 and 3.6.

## Installation

`pip install git+https://github.com/Arvinds-ds/stanmagic.git`

## Testing %%stan magic

In [1]:
import numpy as np
from collections import OrderedDict
import pystan

### Load stanmagic extension

In [2]:
%load_ext stanmagic 

### Ensure stanc compiler is installed

To install cmdstan:

1. Download the latest cmdstan-xxxx.tar.gz from https://github.com/stan-dev/cmdstan/releases
2. Extract it to `<cmdstan_path>`
3. `cd <cmdstan_path>`
4. `make`
5. `make build`
6. `export PATH=$PATH:<cmdstan_path>/bin`


If the compiler is not in your PATH, or you don't want to edit your PATH, pass `--stanc [compiler_path]` to `%%stan` (see option 6 below).
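
To decide whether `--stanc` is needed, it can help to check from Python where (or whether) `stanc` can be found. The helper below is a minimal sketch and is not part of stanmagic: `find_stanc` and its `extra_dirs` argument are hypothetical names, and `shutil.which` requires Python 3.3 or later.

```python
import os
import shutil


def find_stanc(extra_dirs=(), name="stanc"):
    """Return a path to the stanc executable, or None if it cannot be found.

    Hypothetical helper for illustration; stanmagic may locate stanc differently.
    """
    # First look in the directories listed in PATH.
    found = shutil.which(name)
    if found:
        return found
    # Then look in caller-supplied directories such as <cmdstan_path>/bin.
    for directory in extra_dirs:
        candidate = os.path.join(os.path.expanduser(directory), name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If `find_stanc()` returns `None`, either extend PATH as described above or pass the full compiler path with `--stanc`.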

In [3]:
!stanc --version

stanc version 2.16.0


### Generate test data

In [5]:
X = np.random.randn(100,3)
beta = np.array([0.1,0.2,0.3])
alpha = 4
sigma = 1.7
y = np.dot(X,beta) + alpha + np.random.randn(100)*sigma
N=100
K=3

In [7]:
data = OrderedDict({'X':X, 'y': y, 'N':N, 'K': K})
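
Because the model's data block declares `matrix[N,K] X` and `vector[N] y`, a mismatch between `N`, `K` and the actual array shapes only surfaces when sampling starts. The check below is a small sketch for catching that earlier; `check_regression_data` is a hypothetical helper, not something provided by stanmagic or pystan. It relies only on `len`, so it works with NumPy arrays and with plain lists.

```python
def check_regression_data(data):
    """Raise ValueError if X, y, N and K disagree; return True otherwise."""
    n, k = data["N"], data["K"]
    # The Stan model declares int<lower=1> for both sizes.
    if n < 1 or k < 1:
        raise ValueError("N and K must both be at least 1")
    if len(data["y"]) != n:
        raise ValueError("y has %d entries, expected N=%d" % (len(data["y"]), n))
    if len(data["X"]) != n:
        raise ValueError("X has %d rows, expected N=%d" % (len(data["X"]), n))
    # Every row of X must supply one value per coefficient.
    for i, row in enumerate(data["X"]):
        if len(row) != k:
            raise ValueError("row %d of X has %d columns, expected K=%d" % (i, len(row), k))
    return True
```

Calling `check_regression_data(data)` before `model.sampling(data=data)` fails fast with a readable message.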

### 1. %%stan -f [stan_file_name]

Saves the cell code to the file specified in [stan_file_name]. The file name is also available as `_stan_vars['stan_file']`; the `_stan_vars` dictionary is created in the local namespace.

In [8]:
%%stan -f test.stan
data {
  
  int<lower=1> N;
  int<lower=1> K;
  
  matrix[N,K] X;
  vector[N] y;
}

parameters{
  real alpha;
  ordered[K] beta;
  real<lower=0> sigma;
  
}

model {
  
  alpha ~ normal(0,10);
  beta[1] ~ normal(.1,.1);
  beta[2] ~ normal(.2,.1);
  beta[3] ~ normal(.3,.1);
  sigma ~ exponential(1);
  for (n in 1:N) {
    y[n] ~ normal(X[n]*beta + alpha, sigma);
  } 
  
}

generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = normal_rng(X[n]*beta + alpha, sigma);
  }
}

Using stanc compiler:  /home/aravind/Downloads/cmdstan-2.16.0/bin/stanc
/home/aravind/Downloads/cmdstan-2.16.0/bin/stanc --o=/tmp/1c95a9d8-eb04-4ae2-b362-1561a582ddf2.cpp test.stan
Model name=test_model
Input file=test.stan
Output file=/tmp/1c95a9d8-eb04-4ae2-b362-1561a582ddf2.cpp


In [9]:
_stan_vars

{'model_name': 'test_model', 'stan_code': None, 'stan_file': 'test.stan'}

In [10]:
model = pystan.StanModel(file=_stan_vars['stan_file'])

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b64da7aba54424736fda036b359c5ae9 NOW.


In [11]:
model.sampling(data=data)

Inference for Stan model: anon_model_b64da7aba54424736fda036b359c5ae9.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha       3.85  3.0e-3   0.17   3.52   3.74   3.85   3.97   4.18 3324.0    1.0
beta[0]     0.08  1.5e-3   0.07  -0.06   0.03   0.09   0.13   0.22 2259.0    1.0
beta[1]      0.2  1.0e-3   0.07   0.07   0.15    0.2   0.24   0.32 4000.0    1.0
beta[2]     0.31  1.2e-3   0.07   0.17   0.25   0.31   0.36   0.45 4000.0    1.0
sigma        1.7  2.6e-3   0.12   1.48   1.62   1.69   1.78   1.95 2250.0    1.0
y_rep[0]    4.37    0.03   1.77   0.95   3.17   4.35   5.56   7.84 3813.0    1.0
y_rep[1]    3.82    0.03   1.73   0.46   2.68    3.8   4.97   7.26 3995.0    1.0
y_rep[2]    4.25    0.03   1.73    0.8   3.09   4.27   5.44   7.58 4000.0    1.0
y_rep[3]    3.98    0.03   1.72   0.52   2.87   4.01   5.13   7.34 3175.0    1.0
...

### 2. %%stan -f [stan_file_name] --save_only

Saves the cell code to the file specified in [stan_file_name] and skips the compile step.

In [12]:
%%stan -f test1.stan --save_only
data {
  
  int<lower=1> N;
  int<lower=1> K;
  
  matrix[N,K] X;
  vector[N] y;
}

parameters{
  real alpha;
  ordered[K] beta;
  real<lower=0> sigma;
  
}

model {
  
  alpha ~ normal(0,10);
  beta[1] ~ normal(.1,.1);
  beta[2] ~ normal(.2,.1);
  beta[3] ~ normal(.3,.1);
  sigma ~ exponential(1);
  for (n in 1:N) {
    y[n] ~ normal(X[n]*beta + alpha, sigma);
  } 
  
}

generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = normal_rng(X[n]*beta + alpha, sigma);
  }
}

File test1.stan saved..Skipping Compile
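
Outside a notebook, the effect of `--save_only` can be approximated with a few lines of plain Python. This is only a sketch of the idea (write the model text to a file, do not invoke stanc), assuming Python 3; `save_stan_code` is a hypothetical name and not the extension's actual implementation.

```python
def save_stan_code(code, stan_file):
    """Write Stan source text to stan_file and return the file name."""
    # No compilation or syntax check happens here; the text is written as-is.
    with open(stan_file, "w", encoding="utf-8") as handle:
        handle.write(code)
    return stan_file
```

The returned file name can then be passed to `pystan.StanModel(file=...)`, as in the next cell.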


In [13]:
model = pystan.StanModel(file='test1.stan')

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b64da7aba54424736fda036b359c5ae9 NOW.


### 3. %%stan 

Saves the cell code to a code string. The code string can be accessed via _stan_vars['stan_code']

In [14]:
%%stan
data {
  
  int<lower=1> N;
  int<lower=1> K;
  
  matrix[N,K] X;
  vector[N] y;
}

parameters{
  real alpha;
  ordered[K] beta;
  real<lower=0> sigma;
  
}

model {
  
  alpha ~ normal(0,10);
  beta[1] ~ normal(.1,.1);
  beta[2] ~ normal(.2,.1);
  beta[3] ~ normal(.3,.1);
  sigma ~ exponential(1);
  for (n in 1:N) {
    y[n] ~ normal(X[n]*beta + alpha, sigma);
  } 
  
}

generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = normal_rng(X[n]*beta + alpha, sigma);
  }
}

Using stanc compiler:  /home/aravind/Downloads/cmdstan-2.16.0/bin/stanc
/home/aravind/Downloads/cmdstan-2.16.0/bin/stanc --o=/tmp/anon_00728df0-14e8-411c-8ad8-6e5279e956c4_model.cpp /tmp/anon_00728df0-14e8-411c-8ad8-6e5279e956c4.stan
Model name=anon_00728df0_14e8_411c_8ad8_6e5279e956c4_model
Input file=/tmp/anon_00728df0-14e8-411c-8ad8-6e5279e956c4.stan
Output file=/tmp/anon_00728df0-14e8-411c-8ad8-6e5279e956c4_model.cpp


In [15]:
_stan_vars

{'model_name': None,
 'stan_code': u'data {\n  \n  int<lower=1> N;\n  int<lower=1> K;\n  \n  matrix[N,K] X;\n  vector[N] y;\n}\n\nparameters{\n  real alpha;\n  ordered[K] beta;\n  real<lower=0> sigma;\n  \n}\n\nmodel {\n  \n  alpha ~ normal(0,10);\n  beta[1] ~ normal(.1,.1);\n  beta[2] ~ normal(.2,.1);\n  beta[3] ~ normal(.3,.1);\n  sigma ~ exponential(1);\n  for (n in 1:N) {\n    y[n] ~ normal(X[n]*beta + alpha, sigma);\n  } \n  \n}\n\ngenerated quantities {\n  vector[N] y_rep;\n  for (n in 1:N) {\n    y_rep[n] = normal_rng(X[n]*beta + alpha, sigma);\n  }\n}',
 'stan_file': None}
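
As the two `_stan_vars` outputs show, `stan_file` is set when `-f` is used and `stan_code` is set otherwise. A notebook that switches between the two modes could pick the matching `pystan.StanModel` argument with a small helper. The function below is a sketch; `stanmodel_kwargs` is a hypothetical name and is not provided by stanmagic.

```python
def stanmodel_kwargs(stan_vars):
    """Map a _stan_vars-style dict to keyword arguments for pystan.StanModel."""
    # Prefer the saved file when the cell was run with -f.
    if stan_vars.get("stan_file"):
        return {"file": stan_vars["stan_file"]}
    # Otherwise fall back to the in-memory code string.
    if stan_vars.get("stan_code"):
        return {"model_code": stan_vars["stan_code"]}
    raise ValueError("neither 'stan_file' nor 'stan_code' is set")
```

With this in place, `model = pystan.StanModel(**stanmodel_kwargs(_stan_vars))` works after either form of the magic.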

In [16]:
model = pystan.StanModel(model_code=_stan_vars['stan_code'])

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b64da7aba54424736fda036b359c5ae9 NOW.


In [17]:
model.sampling(data=data)

Inference for Stan model: anon_model_b64da7aba54424736fda036b359c5ae9.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha       3.86  3.4e-3   0.18   3.51   3.74   3.86   3.98    4.2 2615.0    1.0
beta[0]     0.08  1.7e-3   0.07  -0.05   0.03   0.08   0.13   0.22 1814.0    1.0
beta[1]      0.2  1.1e-3   0.07   0.07   0.15    0.2   0.24   0.33 3719.0    1.0
beta[2]     0.31  1.1e-3   0.07   0.17   0.26   0.31   0.36   0.45 4000.0    1.0
sigma       1.71  2.6e-3   0.13   1.49   1.62    1.7   1.79   1.98 2412.0    1.0
y_rep[0]    4.34    0.03   1.73   1.03   3.17   4.34    5.5   7.81 4000.0    1.0
y_rep[1]    3.86    0.03   1.72   0.57   2.69   3.84   5.02   7.26 4000.0    1.0
y_rep[2]    4.26    0.03   1.74   0.91   3.09   4.26   5.42   7.78 4000.0    1.0
y_rep[3]     4.0    0.03   1.71   0.56   2.88    4.0   5.14   7.34 3983.0    1.0
...

### 4. %%stan -f [stan_file_name] -o [cpp_file_name]
Saves the cell code to the file specified in [stan_file_name] and writes the generated C++ code to the file specified in [cpp_file_name].


### 5. %%stan -f [stan_file_name] --allow_undefined
Passes the `--allow_undefined` argument to the stanc compiler.

### 6. %%stan -f [stan_file_name] --stanc [stanc_compiler_path]
Saves the cell code to the file specified in [stan_file_name] and compiles it using the stanc compiler specified in [stanc_compiler_path]. By default, the stanc compiler found in your PATH is used. If stanc is not in your PATH, use this option (e.g. `%%stan -f binom.stan --stanc "~/cmdstan-2.16.0/bin/stanc"`).
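
Options 4 to 6 all change the stanc command line that is echoed in the cell output (for example `stanc --o=<cpp_file> test.stan` in section 1). The sketch below shows one way such a command could be assembled. It is modelled on that echoed output, not taken from the extension's source; `build_stanc_command` and its parameters are hypothetical, and the position of `--allow_undefined` in the command is an assumption.

```python
def build_stanc_command(stan_file, cpp_file, stanc="stanc", allow_undefined=False):
    """Return the stanc invocation as an argument list (nothing is executed)."""
    # Mirrors the echoed form: <stanc> --o=<cpp_file> <stan_file>
    command = [stanc, "--o=" + cpp_file]
    if allow_undefined:
        # Assumed placement: before the input file.
        command.append("--allow_undefined")
    command.append(stan_file)
    return command
```

The resulting list can be handed to `subprocess.check_output`. Note that `subprocess` does not expand `~`, so a compiler path such as `~/cmdstan-2.16.0/bin/stanc` should go through `os.path.expanduser` first.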

In [18]:
assert True