Commit

tweak README
martinjankowiak committed Nov 10, 2021
1 parent cd8ac3f commit 8cc9f49
Showing 2 changed files with 19 additions and 17 deletions.
11 changes: 6 additions & 5 deletions README.md
@@ -18,20 +18,21 @@ In the context of generalized linear models with `P` covariates `{X_1, ..., X_P}
Bayesian variable selection can be used to identify *sparse* subsets of covariates (i.e. far fewer than `P`)
that are sufficient for explaining the observed responses.

-In more detail, Bayesian variable selection can be understood as a model selection problem in which we consider
+In more detail, Bayesian variable selection is formulated as a model selection problem in which we consider
the space of `2^P` models in which some covariates are included and the rest are excluded.
For example, one particular model might be `Y = b_3 X_3 + b_9 X_9`.
A priori we assume that models with fewer included covariates are more likely than those with more included covariates.
-The models best supported by the data are encoded as a posterior distribution over the space of models.
+The set of parsimonious models best supported by the data then emerges from the posterior distribution over the space of models.

-What's especially appealing about Bayesian variable selection is that it provides us with an interpretable score
+What's especially appealing about Bayesian variable selection is that it provides an interpretable score
called the PIP (posterior inclusion probability) for each covariate `X_p`.
The PIP is a true probability and so it satisfies `0 <= PIP <= 1` by definition.
Covariates with large PIPs are good candidates for being explanatory of the response `Y`.

Being able to compute PIPs is particularly useful for high-dimensional datasets with large `P`.
For example, we might want to select a small number of covariates to include in a predictive model (i.e. feature selection).
Alternatively, in settings where it is implausible to subject all `P` covariates to
-some expensive downstream analysis (e.g. a lab experiment),
+some expensive downstream analysis (e.g. a laboratory experiment),
Bayesian variable selection can be used to select a small number of covariates for further analysis.


@@ -69,7 +70,7 @@ selector = NormalLikelihoodVariableSelector(dataframe, # pass in the data
S=1, # specify the expected number of covariates to include a priori
)

-# run the MCMC algorithm to compute posterior compusion probabilities and other posterior quantities of interest
+# run the MCMC algorithm to compute posterior inclusion probabilities and other posterior quantities of interest
selector.run(T=1000, T_burnin=500)

# inspect the results
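The README hunk above shows only a fragment of the usage example (the data setup and the start of the constructor call are collapsed in this diff). For orientation, here is a minimal self-contained sketch of how the pieces might fit together; the synthetic dataframe, the import path, the `'response'` column argument, and the `selector.summary` attribute are illustrative assumptions and are not shown in this diff.

```python
import numpy as np
import pandas as pd

from millipede import NormalLikelihoodVariableSelector  # assumed import path

# create a toy dataset in which only the first of three covariates explains the response
rng = np.random.RandomState(0)
X = rng.randn(250, 3)
Y = X[:, 0] + 0.1 * rng.randn(250)
dataframe = pd.DataFrame(np.column_stack([X, Y]),
                         columns=['causal', 'spurious1', 'spurious2', 'response'])

# create a VariableSelector object appropriate for continuous-valued responses
selector = NormalLikelihoodVariableSelector(dataframe,   # pass in the data
                                            'response',  # response column name (assumed argument)
                                            S=1,         # expected number of covariates to include a priori
                                            )

# run the MCMC algorithm to compute posterior inclusion probabilities
# and other posterior quantities of interest
selector.run(T=1000, T_burnin=500)

# inspect the results; if the API matches the README, the PIP column should be
# near 1 for 'causal' and near 0 for the spurious covariates
print(selector.summary)
```

If the actual constructor signature differs (for example, if the response is passed by keyword), only the selector construction line should need adjusting; the run and inspection steps mirror the snippet in the diff.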
25 changes: 13 additions & 12 deletions docs/source/getting_started.rst
@@ -5,26 +5,27 @@ What is Bayesian variable selection?
------------------------------------

Bayesian variable selection is a model-based approach for identifying parsimonious explanations of observed data.
In the context of generalized linear models with `P` covariates `{X_1, ..., X_P}` and responses `Y`,
Bayesian variable selection can be used to identify *sparse* subsets of covariates (i.e. far fewer than `P`)
that are sufficient for explaining the observed responses.

-In more detail, Bayesian variable selection can be understood as a model selection problem in which we consider
+In more detail, Bayesian variable selection is formulated as a model selection problem in which we consider
the space of `2^P` models in which some covariates are included and the rest are excluded.
For example, one particular model might be `Y = b_3 X_3 + b_9 X_9`.
A priori we assume that models with fewer included covariates are more likely than those with more included covariates.
-The models best supported by the data are encoded as a posterior distribution over the space of models.
+The set of parsimonious models best supported by the data then emerges from the posterior distribution over the space of models.

-What's especially appealing about Bayesian variable selection is that it provides us with an interpretable score
+What's especially appealing about Bayesian variable selection is that it provides an interpretable score
called the PIP (posterior inclusion probability) for each covariate `X_p`.
The PIP is a true probability and so it satisfies `0 <= PIP <= 1` by definition.
Covariates with large PIPs are good candidates for being explanatory of the response `Y`.

Being able to compute PIPs is particularly useful for high-dimensional datasets with large `P`.
For example, we might want to select a small number of covariates to include in a predictive model (i.e. feature selection).
Alternatively, in settings where it is implausible to subject all `P` covariates to
-some expensive downstream analysis (e.g. a lab experiment),
+some expensive downstream analysis (e.g. a laboratory experiment),
Bayesian variable selection can be used to select a small number of covariates for further analysis.


Requirements
-------------
@@ -73,7 +74,7 @@ Using millipede is easy:
S=1, # specify the expected number of covariates to include a priori
)

-# run the MCMC algorithm to compute posterior compusion probabilities and other posterior quantities of interest
+# run the MCMC algorithm to compute posterior inclusion probabilities and other posterior quantities of interest
selector.run(T=1000, T_burnin=500)

# inspect the results
