From d68a8a8166d19cc734a8e3c1f85a0d77490b283b Mon Sep 17 00:00:00 2001
From: Drew Herren
Date: Wed, 12 Feb 2025 20:13:19 -0600
Subject: [PATCH] Updated writeup in the low-level interface notebook

---
 demo/notebooks/prototype_interface.ipynb | 34 +++++-----------------------------
 1 file changed, 5 insertions(+), 29 deletions(-)

diff --git a/demo/notebooks/prototype_interface.ipynb b/demo/notebooks/prototype_interface.ipynb
index ef220c6e..04a7af40 100644
--- a/demo/notebooks/prototype_interface.ipynb
+++ b/demo/notebooks/prototype_interface.ipynb
@@ -20,43 +20,19 @@
     "to the C++ code that doesn't require modifying any C++.\n",
     "\n",
     "To illustrate when such a prototype interface might be useful, consider\n",
-    "the classic BART algorithm:\n",
-    "\n",
-    "**INPUT**: $y$, $X$, $\\tau$, $\\nu$, $\\lambda$, $\\alpha$, $\\beta$\n",
-    "\n",
-    "**OUTPUT**: $m$ samples of a decision forest with $k$ trees and global variance parameter $\\sigma^2$\n",
-    "\n",
-    "Initialize $\\sigma^2$ via a default or a data-dependent calibration exercise\n",
-    "\n",
-    "Initialize \"forest 0\" with $k$ trees with a single root node, referring to tree $j$'s prediction vector as $f_{0,j}$\n",
-    "\n",
-    "Compute residual as $r = y - \\sum_{j=1}^k f_{0,j}$\n",
-    "\n",
-    "**FOR** $i$ **IN** $\\left\\{1,\\dots,m\\right\\}$:\n",
-    "\n",
-    "    Initialize forest $i$ from forest $i-1$\n",
-    "    \n",
-    "    **FOR** $j$ **IN** $\\left\\{1,\\dots,k\\right\\}$:\n",
-    "    \n",
-    "        Add predictions for tree $j$ to residual: $r = r + f_{i,j}$ \n",
-    "        \n",
-    "        Update tree $j$ via Metropolis-Hastings with $r$ and $X$ as data and tree priors depending on ($\\tau$, $\\sigma^2$, $\\alpha$, $\\beta$)\n",
-    "\n",
-    "        Sample leaf node parameters for tree $j$ via Gibbs (leaf node prior is $N\\left(0,\\tau\\right)$)\n",
-    "    \n",
-    "        Subtract (updated) predictions for tree $j$ from residual: $r = r - f_{i,j}$\n",
-    "\n",
-    "    Sample $\\sigma^2$ via Gibbs (prior is $IG(\\nu/2,\\nu\\lambda/2)$)\n",
+    "that the \"classic\" BART algorithm is essentially a Metropolis-within-Gibbs \n",
+    "sampler, in which the forest is sampled by MCMC, conditional on all of the \n",
+    "other model parameters, and then those parameters are updated by Gibbs.\n",
     "\n",
     "While the algorithm itself is conceptually simple, much of the core \n",
     "computation is carried out in low-level languages such as C or C++ \n",
-    "because of the tree data structure. As a result, any changes to this \n",
+    "because of the tree data structures. As a result, any changes to this \n",
     "algorithm, such as supporting heteroskedasticity and categorical outcomes (Murray 2021) \n",
     "or causal effect estimation (Hahn et al 2020) require modifying low-level code. \n",
     "\n",
     "The prototype interface exposes the core components of the \n",
     "loop above at the R level, thus making it possible to interchange \n",
-    "C++ computation for steps like \"update tree $j$ via Metropolis-Hastings\" \n",
+    "C++ computation for steps like \"update forest via Metropolis-Hastings\" \n",
     "with R computation for a custom variance model, other user-specified additive \n",
     "mean model components, and so on." ]
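The Metropolis-within-Gibbs structure that the revised writeup describes can be sketched in a few lines. The snippet below is a hypothetical, self-contained numpy illustration, not the package's actual interface: `sample_forest` is a stand-in (a single-stump fit) for the real Metropolis-Hastings forest update, and the hyperparameter names (`nu`, `lam`) mirror the $IG(\nu/2, \nu\lambda/2)$ prior from the deleted pseudocode.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: noisy step function
n = 200
X = rng.uniform(size=n)
y = np.where(X < 0.5, -1.0, 1.0) + rng.normal(scale=0.25, size=n)

def sample_forest(X, y, sigma2):
    # Placeholder for the Metropolis-Hastings forest update, conditional
    # on sigma^2. A fixed single-split stump keeps the sketch runnable;
    # the real step proposes and accepts/rejects tree modifications.
    left = X < 0.5
    pred = np.empty_like(y)
    pred[left] = y[left].mean()
    pred[~left] = y[~left].mean()
    return pred

nu, lam = 3.0, 0.1         # hyperparameters of the IG(nu/2, nu*lam/2) prior
sigma2 = float(np.var(y))  # data-dependent initialization
num_draws = 100
sigma2_draws = np.empty(num_draws)

for i in range(num_draws):
    # Step 1: sample the forest by MCMC, conditional on sigma^2
    forest_pred = sample_forest(X, y, sigma2)
    resid = y - forest_pred
    # Step 2: Gibbs update of sigma^2 given the forest, a conjugate
    # inverse-gamma draw: IG((nu + n)/2, (nu*lam + resid'resid)/2)
    shape = (nu + n) / 2.0
    scale = (nu * lam + resid @ resid) / 2.0
    sigma2 = scale / rng.gamma(shape)
    sigma2_draws[i] = sigma2

print(round(float(sigma2_draws.mean()), 3))
```

The point of the loop's shape is exactly what the prototype interface exploits: either step can be swapped out independently (an R-level variance model in place of Step 2, say) without touching the other.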