Description
This is from the Autoregressive Models chapter:
To see why, let us consider the conditional for the last dimension, given by $p(x_n|x_{\lt n})$. In order to fully specify this conditional, we need to specify a probability for $2^{n-1}$ configurations of the variables $x_1,x_2,\ldots,x_{n-1}$. Since the probabilities should sum to $1$, the total number of parameters for specifying this conditional is given by $2^{n-1}-1$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.
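As a sanity check on the count, here is a minimal Python sketch that tallies the free parameters by brute-force enumeration. It assumes every variable is binary and that the sum-to-one constraint applies separately to each configuration of $x_1,\ldots,x_{n-1}$. `count_free_parameters` is a hypothetical helper name, not something from the notes.

```python
from itertools import product


def count_free_parameters(n: int) -> int:
    """Count free parameters of a tabular p(x_n | x_1..x_{n-1}).

    Assumption: all variables are binary.
    """
    free = 0
    # Enumerate every configuration of the n-1 conditioning variables.
    for _context in product([0, 1], repeat=n - 1):
        # For a fixed context, the table holds p(x_n=0 | context) and
        # p(x_n=1 | context). They must sum to 1, so only one is free.
        entries = 2
        constraints = 1
        free += entries - constraints
    return free


for n in range(1, 6):
    print(n, count_free_parameters(n), 2 ** (n - 1))
```

Under those assumptions the enumeration matches $2^{n-1}$ for each $n$ tried, because normalization removes one entry per configuration rather than one entry from the whole table.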
Shouldn't it be