diff --git a/R/example_data.R b/R/example_data.R index 1730a0d..04717bf 100644 --- a/R/example_data.R +++ b/R/example_data.R @@ -1,8 +1,11 @@ -#' Generating test data with mtcars. +#' Generating example data with mtcars. #' -#' @description generates test data using base R's mtcars dataset +#' @description Generates test data using base R's mtcars dataset. +#' The response variable `y` is horsepower (`hp`), while the remaining variables +#' represent the predictive features `X`. #' -#' @param seed random seed to use. Defaults to 99. +#' @param seed random seed to use. +#' Defaults to 99. #' #' @return X_train, y_train, X_val, y_val (as a list of dataframes) #' diff --git a/README.md b/README.md index 07e3d9a..6a9f81b 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ [![Coverage status](https://codecov.io/gh/UBC-MDS/punisheR/branch/master/graph/badge.svg)](https://codecov.io/github/UBC-MDS/punisheR?branch=master) -PunisheR is a package for feature and model selection in R. Specifically, this package implements tools for +**punisheR** is a package for feature and model selection in R. Specifically, this package implements tools for forward and backward model selection (see [here](https://en.wikipedia.org/wiki/Stepwise_regression)). In order to measure model quality during the selection procedures, we have also implemented the Akaike and Bayesian Information Criterion (see below), both of which *punish* complex models -- hence this package's @@ -72,23 +72,21 @@ X_val <- data[[3]] y_val <- data[[4]] ``` -### Forward Selection using r-squared +### Forward selection ```r - forward(X_train, y_train, X_val, y_val, min_change=0.5, n_features=NULL, criterion='r-squared', verbose=FALSE) #> [1] 10 ``` -When implementing forward selection on the demo data, it returns a list of features for the best model. Here it -can be seen that the function correctly returns only 1 feature. 
+Running forward selection on the demo data returns a list of features for the best model. In this example, we use r-squared to determine the "best" model. Here it
+can be seen that the function correctly returns only 1 feature.

-### Backward Selection using r-squared
+### Backward selection

```r
-
backward(X_train, y_train, X_val, y_val,
         n_features=1, min_change=NULL,
         criterion='r-squared', verbose=FALSE)
@@ -100,7 +98,7 @@ backward(X_train, y_train, X_val, y_val,

When implementing backward selection on the demo data, it returns a list of features for the best model. Here it
can be seen that the function correctly returns only 1 feature.

-### Criterions
+### Scoring a model with AIC, BIC, and r-squared

```r
model <- lm(y_train ~ mpg + cyl + disp, data = X_train)
@@ -113,7 +111,7 @@ bic(model)
```

-When scoring the two the model using AIC and BIC, we can see that the penalty when using `bic` is greater
+When scoring the model using AIC and BIC, we can see that the penalty when using `bic` is greater
than the penalty obtained using `aic`.

```r
@@ -125,7 +123,7 @@ The value returned by the function `r_squared()` will be between 0 and 1.

## Vignette

-For a more comprehensive guide of PunisheR, you can read the vignette [here](vignettes/punisheR.md).
+For a more comprehensive guide to punisheR, you can read the vignette [here](vignettes/punisheR.md) or the HTML version [here](https://s3-us-west-2.amazonaws.com/punisherpkg/punisheR.html).
diff --git a/man/figures/logo.png b/man/figures/logo.png index cc4df3e..1cc4fcf 100644 Binary files a/man/figures/logo.png and b/man/figures/logo.png differ diff --git a/man/mtcars_data.Rd b/man/mtcars_data.Rd new file mode 100644 index 0000000..3f51aff --- /dev/null +++ b/man/mtcars_data.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/example_data.R +\name{mtcars_data} +\alias{mtcars_data} +\title{Generating example data with mtcars.} +\usage{ +mtcars_data(seed = 99) +} +\arguments{ +\item{seed}{random seed to use. +Defaults to 99.} +} +\value{ +X_train, y_train, X_val, y_val (as a list of dataframes) +} +\description{ +Generates test data using base R's mtcars dataset. +The response variable `y` is horsepower (`hp`), while the remaining variables +represent the predictive features `X`. +} diff --git a/vignettes/punisheR.Rmd b/vignettes/punisheR.Rmd index 96a97ad..b22bcc2 100644 --- a/vignettes/punisheR.Rmd +++ b/vignettes/punisheR.Rmd @@ -1,9 +1,7 @@ --- -title: "punisheR" +title: "A complete guide to punisheR" author: "Jill Cates, Tariq Hassan, Avinash Prabhakaran" -date: "`r Sys.Date()`" output: - github_document : default rmarkdown::html_vignette : default vignette: > %\VignetteIndexEntry{Vignette Title} @@ -12,13 +10,17 @@ vignette: > --- ```{r setup, include = FALSE} - knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` +```{r, include=FALSE} +library(knitr) +library(punisheR) +``` + ## Introduction [punisheR](https://github.com/UBC-MDS/punisheR) is a package for feature and model selection in R. Specifically, this package implements tools for forward and backward model selection. In order to measure model quality during the selection procedures, we have also implemented the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). 
@@ -36,14 +38,10 @@ Sources: https://en.wikipedia.org/wiki/Stepwise_regression

The package contains three metrics that evaluate model performance:

-- `aic()`: The [Akaike information criterion](https://en.wikipedia.org/wiki/Akaike_information_criterion) (AIC) adds a penalty term which penalizes more complex models. Its formal definition is:
-$$-2\ln(L)+2*k $$
-where $k$ is the number of features and $L$ is the maximized value of the likelihood function.
+- `aic()`: The [Akaike information criterion](https://en.wikipedia.org/wiki/Akaike_information_criterion) (AIC) adds a penalty term which penalizes more complex models. Its formal definition is: $-2\ln(L)+2k$, where $k$ is the number of features and $L$ is the maximized value of the likelihood function.

-- `bic()`: The [Bayesian information criterion](https://en.wikipedia.org/wiki/Bayesian_information_criterion) adds a penality term which penalizes complex models to a greater extent than AIC. Its formal definition is:
-$$-2*\ln(L)+\ln(n)*k$$
-where $k$ is the number of features, $n$ is the number of observations, and $L$ is the maximized value of the likelihood function.
+- `bic()`: The [Bayesian information criterion](https://en.wikipedia.org/wiki/Bayesian_information_criterion) adds a penalty term which penalizes complex models to a greater extent than AIC. Its formal definition is: $-2\ln(L)+\ln(n)k$, where $k$ is the number of features, $n$ is the number of observations, and $L$ is the maximized value of the likelihood function.

- `r_squared()`: The [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination) is the proportion of the variance in the response variable that can be predicted from the explanatory variable.

@@ -57,15 +55,14 @@ and [MASS](https://cran.r-project.org/web/packages/MASS/MASS.pdf) packages.
The former provides
[`ols_step_forward()`](https://www.rdocumentation.org/packages/olsrr/versions/0.4.0/topics/ols_step_forward) and
[`ols_step_backward()`](https://www.rdocumentation.org/packages/olsrr/versions/0.4.0/topics/ols_step_backward) for forward and backward stepwise selection, respectively. Both of these use p-value as a metric for feature selection. The latter, MASS, contains [`stepAIC()`](https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/stepAIC.html), which is complete with three modes: forward, backward or both. Other packages that provide subset selection for regression models are [leaps](https://cran.r-project.org/web/packages/leaps/leaps.pdf) and [bestglm](https://cran.r-project.org/web/packages/bestglm/bestglm.pdf).

+## Loading the demo data

-```{r}
-library(knitr)
-library(punisheR)
-```
+To demonstrate how punisheR's feature selection and criterion functions work, we will use our demo function `mtcars_data()`, which arranges [`mtcars`](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html) into the correct format for our use cases.

+`mtcars_data()` returns a list of 4 dataframes in the following order: X_train, y_train, X_val, and y_val. Horsepower (`hp`) is the response variable (`y`), while the remaining variables of `mtcars` are the predictive features (`X`). The data is split into training data, which is used to *train* the model, and validation data, which *validates* (scores) it.

```{r}
-#Loading the demo mtcars data
+# Loading the demo mtcars data
data <- mtcars_data()
X_train <- data[[1]]
y_train <- data[[2]]
@@ -74,17 +71,26 @@ y_val <- data[[4]]
```
The function stops when there are no features left that cause a change larger than the threshold `min_change`.
+
+Only one of `n_features` and `min_change` can be active at a time; the other must be set to `NULL`.
+
+Let's look at how `n_features` works within forward selection:
+
+###### a) Usage example with `aic` as criterion

```{r}
forward(X_train, y_train, X_val, y_val, min_change=NULL,
        n_features=2, criterion='aic', verbose=FALSE)
```

-When implementing forward selection on the mtcars dataset with `hp` as the explanatory variable , it returns a list of features that form the best model. In the above example, the desired number of features has been specified as 2 and the criterion being used is `aic`. The function returns a list of 2 features.
+Running forward selection on the mtcars dataset with `hp` as the response variable returns a list of features that form the best model. In the above example, the desired number of features has been specified as 2 and the criterion being used is `aic`. The function returns a list of 2 features.

-###### Usage example with `bic` as criterion
+###### b) Usage example with `bic` as criterion

```{r}
forward(X_train, y_train, X_val, y_val, min_change=NULL,
@@ -93,7 +99,7 @@ forward(X_train, y_train, X_val, y_val, min_change=NULL,

In the above example, the desired number of features has been specified as 3 and the criterion being used is `bic`. The function returns a list of 3 features.

-###### Usage example with `r-squared` as criterion
+###### c) Usage example with `r-squared` as criterion

```{r}
forward(X_train, y_train, X_val, y_val, min_change=NULL,
@@ -103,20 +109,24 @@ forward(X_train, y_train, X_val, y_val, min_change=NULL,

In the above example, the desired number of features has been specified as 4 and the criterion being used is `r-squared`. The function returns a list of 4 features.
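The two stopping rules described above (`n_features` and `min_change`) drive a simple greedy loop. The sketch below is illustrative only — it is Python rather than R, and the `score` callback and toy `gains` table are hypothetical stand-ins for the chosen criterion; punisheR's actual R implementation differs in detail:

```python
def forward_select(features, score, n_features=None, min_change=None):
    """Greedy forward selection: add the best-scoring feature each round.

    Exactly one of n_features / min_change may be set, mirroring the
    rule that the other must be NULL.
    """
    assert (n_features is None) != (min_change is None)
    selected, remaining = [], list(features)
    best = float("-inf")
    while remaining:
        # Score each candidate set formed by adding one remaining feature.
        cand = max(remaining, key=lambda f: score(selected + [f]))
        cand_score = score(selected + [cand])
        if min_change is not None and cand_score - best < min_change:
            break  # improvement fell below the threshold
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
        if n_features is not None and len(selected) == n_features:
            break
    return selected

# Hypothetical additive score: each feature contributes a fixed gain.
gains = {"mpg": 0.5, "cyl": 0.3, "disp": 0.05}
score = lambda feats: sum(gains[f] for f in feats)
print(forward_select(gains, score, min_change=0.1))  # ['mpg', 'cyl']
```

With this toy score, `min_change=0.1` stops after two features because the third would add only 0.05; `n_features=3` would instead run until all three are selected.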
-#### Forward Selection by specifying the smallest change in criterion
+Forward selection also works by specifying the smallest change in criterion, `min_change`:

```{r}
forward(X_train, y_train, X_val, y_val, min_change=0.5,
        n_features=NULL, criterion='r-squared', verbose=FALSE)
```

-In the example above, `forward` selction returns a list of 6 features when a minimum change of 0.5 is required in `r-squared` score for an additional feature to be selected.
+In the example above, `forward` selection returns a list of 6 features when a minimum change of 0.5 is required in the `r-squared` score for an additional feature to be selected.
+
+**Note**: When using the criterion as `aic` or `bic`, the value for `min_change` should be carefully selected as `aic` and `bic` tend to have much larger values than `r-squared`.
+

-**Note**: When using the criterion as `aic` or `bic`, the value for `min_change` should be carefully selected as `aic` and `bic` tends to have much larger values.

+## Backward Selection

+Backward selection works in the same way as forward selection in that you must configure `n_features` or `min_change`, as well as the `criterion` to score the model.

-#### Backward Selection by specifying the number of features
+###### a) Usage example with `aic` as criterion

```{r}
backward(X_train, y_train, X_val, y_val,
@@ -124,21 +134,25 @@ backward(X_train, y_train, X_val, y_val,
         verbose=FALSE)
```

+###### b) Usage example with `bic` as criterion
+
```{r}
backward(X_train, y_train, X_val, y_val,
         n_features=7, min_change=NULL, criterion='bic',
         verbose=FALSE)
```

+###### c) Usage example with `r-squared` as criterion
+
```{r}
backward(X_train, y_train, X_val, y_val,
         n_features=7, min_change=NULL, criterion='r-squared',
         verbose=FALSE)
```

-Similarly, for backward selection, the number of features are specified as 7 and the examples using all the three criterion are provided above.
+With `n_features` configured to 7, each example above returns the 7 best features based on model score. You can see above that changing the criterion can result in a different output of "best" features.

-#### Backward Selection by specifying the smallest change in criterion
+In the example below, `backward` selection returns a list of 10 features when the `min_change` in the `r-squared` criterion is specified as 0.5.

```{r}
backward(X_train, y_train, X_val, y_val,
@@ -146,21 +160,21 @@ backward(X_train, y_train, X_val, y_val,
         verbose=FALSE)
```

-In the example above, `backward` selection returns a list of 10 features when the minimum change in the `r-squared` criterion is specified as 0.5.
+## AIC, BIC & $R^2$

-#### AIC, BIC & $R^2$
+punisheR also provides three standalone functions to compute AIC, BIC, and $R^2$. For `aic()` and `bic()` you simply need to pass in the model (e.g., an `lm()` object). You can also pass in the validation data and response variable (`X_val`, `y_val`). By default, `X` and `y` are extracted from the model.

```{r}
model <- lm(y_train ~ mpg + cyl + disp, data = X_train)
-
-aic(model)
```

```{r}
-bic(model)
+aic(model, X_val, y_val)
+
+bic(model, X_val, y_val)
```

-When scoring the two the model using AIC and BIC, we can see that the penalty when using `bic` is greater than the penalty obtained using `aic`.
+When scoring the model using AIC and BIC, we can see that the penalty when using `bic` is greater than the penalty obtained using `aic`.
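That `bic > aic` ordering follows directly from the formal definitions given earlier: both start from `-2*ln(L)`, but BIC's per-feature penalty is `ln(n)` versus AIC's constant 2. A quick numeric check in Python (the log-likelihood below is made up for illustration, not punisheR's output):

```python
import math

def aic(log_l, k):
    # AIC = -2*ln(L) + 2*k  (log_l is already ln(L))
    return -2 * log_l + 2 * k

def bic(log_l, k, n):
    # BIC = -2*ln(L) + ln(n)*k
    return -2 * log_l + math.log(n) * k

# Hypothetical fit: 3 features, 32 observations (the size of mtcars)
log_l, k, n = -120.0, 3, 32
print(aic(log_l, k))     # 246.0
print(bic(log_l, k, n))  # ~250.4
```

Since `ln(n) > 2` whenever `n >= 8`, BIC penalizes harder than AIC on all but the smallest datasets; mtcars has n = 32.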
```{r} r_squared(model, X_val, y_val) diff --git a/vignettes/punisheR.md b/vignettes/punisheR.md index 35769d0..762a1cf 100644 --- a/vignettes/punisheR.md +++ b/vignettes/punisheR.md @@ -1,67 +1,47 @@ ---- -title: "punisheR" -author: "Jill Cates, Tariq Hassan, Avinash Prabhakaran" -date: "2018-03-17" -output: - github_document : default - rmarkdown::html_vignette : default -vignette: > - %\VignetteIndexEntry{Vignette Title} - %\VignetteEncoding{UTF-8} - %\VignetteEngine{knitr::rmarkdown} ---- +A complete guide to punisheR +================ +Jill Cates, Tariq Hassan, Avinash Prabhakaran - - -## Introduction +Introduction +------------ [punisheR](https://github.com/UBC-MDS/punisheR) is a package for feature and model selection in R. Specifically, this package implements tools for forward and backward model selection. In order to measure model quality during the selection procedures, we have also implemented the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). -## Functions included +Functions included +------------------ The package contains two stepwise feature selection techniques: -- `forward()`: [Forward selection](https://en.wikipedia.org/wiki/Stepwise_regression#Main_approaches) starts with one feature and iteratively adds the features with the best scores using a model fit criterion. The process of adding features is repeated until either the maximum number of features (`n_features`) is reached or the change in score is less than the `min_change` threshold. - +- `forward()`: [Forward selection](https://en.wikipedia.org/wiki/Stepwise_regression#Main_approaches) starts with one feature and iteratively adds the features with the best scores using a model fit criterion. The process of adding features is repeated until either the maximum number of features (`n_features`) is reached or the change in score is less than the `min_change` threshold. 
-- `backward()`: [Backward selection/elimination](https://en.wikipedia.org/wiki/Stepwise_regression#Main_approaches) starts with all features and iteratively deletes features with the worst scores using model fit criterion. The process of deleting features is repeated until either the maximum number of features (`n_features`) is reached or the change in score is less than the `min_change` threshold.
+- `backward()`: [Backward selection/elimination](https://en.wikipedia.org/wiki/Stepwise_regression#Main_approaches) starts with all features and iteratively deletes features with the worst scores using model fit criterion. The process of deleting features is repeated until either the maximum number of features (`n_features`) is reached or the change in score is less than the `min_change` threshold.

-Sources: https://en.wikipedia.org/wiki/Stepwise_regression
+Sources: <https://en.wikipedia.org/wiki/Stepwise_regression>

-The package contains three metrics that evaluate model performance:
+The package contains three metrics that evaluate model performance:

-- `aic()`: The [Akaike information criterion](https://en.wikipedia.org/wiki/Akaike_information_criterion) (AIC) adds a penalty term which penalizes more complex models. Its formal definition is:
-$$-2\ln(L)+2*k $$
-where $k$ is the number of features and $L$ is the maximized value of the likelihood function.
+- `aic()`: The [Akaike information criterion](https://en.wikipedia.org/wiki/Akaike_information_criterion) (AIC) adds a penalty term which penalizes more complex models. Its formal definition is: −2 \* ln(*L*)+2\**k* where *k* is the number of features and *L* is the maximized value of the likelihood function.
+- `bic()`: The [Bayesian information criterion](https://en.wikipedia.org/wiki/Bayesian_information_criterion) adds a penalty term which penalizes complex models to a greater extent than AIC.
Its formal definition is: −2 \* ln(*L*)+ln(*n*)\**k* where *k* is the number of features, *n* is the number of observations, and *L* is the maximized value of the likelihood function. -- `bic()`: The [Bayesian information criterion](https://en.wikipedia.org/wiki/Bayesian_information_criterion) adds a penality term which penalizes complex models to a greater extent than AIC. Its formal definition is: -$$-2*\ln(L)+\ln(n)*k$$ -where $k$ is the number of features, $n$ is the number of observations, and $L$ is the maximized value of the likelihood function. - -- `r_squared()`: The [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination) is the proportion of the variance in the response variable that can be predicted from the explanatory variable. +- `r_squared()`: The [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination) is the proportion of the variance in the response variable that can be predicted from the explanatory variable. These three criteria measure the relative quality of models within `forward()` and `backward()` and can be configured using the `criterion` parameter. In general, having more parameters in your model increases prediction accuracy but is highly susceptible to overfitting. AIC and BIC add a penalty for the number of features in a model. The lower the AIC and BIC score, the better the model. -## How does punisheR fit into the existing R ecosystem? +How does punisheR fit into the existing R ecosystem? +---------------------------------------------------- -In the R ecosystem, forward and backward selection are implemented in both the [olsrr](https://cran.r-project.org/web/packages/olsrr/) -and [MASS](https://cran.r-project.org/web/packages/MASS/MASS.pdf) packages. 
The former provides
-[`ols_step_forward()`](https://www.rdocumentation.org/packages/olsrr/versions/0.4.0/topics/ols_step_forward) and
-[`ols_step_backward()`](https://www.rdocumentation.org/packages/olsrr/versions/0.4.0/topics/ols_step_backward) for forward and backward stepwise selection, respectively. Both of these use p-value as a metric for feature selection. The latter, MASS, contains [`StepAIC()`](https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/stepAIC.html), which is complete with three modes: forward, backward or both. Other packages that provide subset selection for regression models are [leaps](https://cran.r-project.org/web/packages/leaps/leaps.pdf) and [bestglm](https://cran.r-project.org/web/packages/bestglm/bestglm.pdf).
+In the R ecosystem, forward and backward selection are implemented in both the [olsrr](https://cran.r-project.org/web/packages/olsrr/) and [MASS](https://cran.r-project.org/web/packages/MASS/MASS.pdf) packages. The former provides [`ols_step_forward()`](https://www.rdocumentation.org/packages/olsrr/versions/0.4.0/topics/ols_step_forward) and [`ols_step_backward()`](https://www.rdocumentation.org/packages/olsrr/versions/0.4.0/topics/ols_step_backward) for forward and backward stepwise selection, respectively. Both of these use p-value as a metric for feature selection. The latter, MASS, contains [`stepAIC()`](https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/stepAIC.html), which is complete with three modes: forward, backward or both. Other packages that provide subset selection for regression models are [leaps](https://cran.r-project.org/web/packages/leaps/leaps.pdf) and [bestglm](https://cran.r-project.org/web/packages/bestglm/bestglm.pdf).
+Loading the demo data
+---------------------
+To demonstrate how punisheR's feature selection and criterion functions work, we will use our demo function `mtcars_data()`, which arranges [`mtcars`](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html) into the correct format for our use cases.
+`mtcars_data()` returns a list of 4 dataframes in the following order: X\_train, y\_train, X\_val, and y\_val. Horsepower (`hp`) is the response variable (`y`), while the remaining variables of `mtcars` are the predictive features (`X`). The data is split into training data, which is used to *train* the model, and validation data, which *validates* (scores) it.

-```r
-library(knitr)
-library(punisheR)
-```
-
-
-```r
-#Loading the demo mtcars data
+``` r
+# Loading the demo mtcars data
data <- mtcars_data()
X_train <- data[[1]]
y_train <- data[[2]]
@@ -69,23 +49,31 @@ X_val <- data[[3]]
y_val <- data[[4]]
```

+Forward Selection
+-----------------
+
+There are two parameters that determine how features are selected in forward selection:

-## Forward Selection by specifying the number of features
+1. `n_features` specifies the number of features. If you set `n_features` to 3, the forward selection function will select the 3 best features for your model.
+2. `min_change` specifies the minimum change in score in order to proceed to the next iteration. The function stops when there are no features left that cause a change larger than the threshold `min_change`.

-###### Usage example with `aic` as criterion
+Only one of `n_features` and `min_change` can be active at a time; the other must be set to `NULL`.
+Let's look at how `n_features` works within forward selection:

-```r
+###### a) Usage example with `aic` as criterion
+
+``` r
forward(X_train, y_train, X_val, y_val, min_change=NULL,
        n_features=2, criterion='aic', verbose=FALSE)
#> [1] 9 4
```

-When implementing forward selection on the mtcars dataset with `hp` as the explanatory variable , it returns a list of features that form the best model.
-###### Usage example with `bic` as criterion
+Running forward selection on the mtcars dataset with `hp` as the response variable returns a list of features that form the best model. In the above example, the desired number of features has been specified as 2 and the criterion being used is `aic`. The function returns a list of 2 features.
+###### b) Usage example with `bic` as criterion

-```r
+``` r
forward(X_train, y_train, X_val, y_val, min_change=NULL,
        n_features=3, criterion='bic', verbose=FALSE)
#> [1] 9 4 8
```
@@ -93,10 +81,9 @@ forward(X_train, y_train, X_val, y_val, min_change=NULL,

In the above example, the desired number of features has been specified as 3 and the criterion being used is `bic`. The function returns a list of 3 features.

-###### Usage example with `r-squared` as criterion
-
+###### c) Usage example with `r-squared` as criterion

-```r
+``` r
forward(X_train, y_train, X_val, y_val, min_change=NULL,
        n_features=4, criterion='r-squared', verbose=FALSE)
#> [1] 2 1 6 3
@@ -104,82 +91,81 @@ forward(X_train, y_train, X_val, y_val, min_change=NULL,

In the above example, the desired number of features has been specified as 4 and the criterion being used is `r-squared`. The function returns a list of 4 features.
+Forward selection also works by specifying the smallest change in criterion, `min_change`:

-#### Forward Selection by specifying the smallest change in criterion
-
-
-```r
+``` r
forward(X_train, y_train, X_val, y_val, min_change=0.5,
        n_features=NULL, criterion='r-squared', verbose=FALSE)
#> [1] 2 1 6 3 7 5
```

-In the example above, `forward` selction returns a list of 6 features when a minimum change of 0.5 is required in `r-squared` score for an additional feature to be selected.
-
-**Note**: When using the criterion as `aic` or `bic`, the value for `min_change` should be carefully selected as `aic` and `bic` tends to have much larger values.
+In the example above, `forward` selection returns a list of 6 features when a minimum change of 0.5 is required in the `r-squared` score for an additional feature to be selected.
+**Note**: When using the criterion as `aic` or `bic`, the value for `min_change` should be carefully selected as `aic` and `bic` tend to have much larger values than `r-squared`.

+Backward Selection
+------------------

-#### Backward Selection by specifying the number of features
+Backward selection works in the same way as forward selection in that you must configure `n_features` or `min_change`, as well as the `criterion` to score the model.
+###### a) Usage example with `aic` as criterion

-```r
+``` r
backward(X_train, y_train, X_val, y_val,
         n_features=7, min_change=NULL, criterion='aic',
         verbose=FALSE)
#> [1] 1 4 5 7 8 9 10
```

+###### b) Usage example with `bic` as criterion

-```r
+``` r
backward(X_train, y_train, X_val, y_val,
         n_features=7, min_change=NULL, criterion='bic',
         verbose=FALSE)
#> [1] 1 4 5 7 8 9 10
```

+###### c) Usage example with `r-squared` as criterion

-```r
+``` r
backward(X_train, y_train, X_val, y_val,
         n_features=7, min_change=NULL, criterion='r-squared',
         verbose=FALSE)
#> [1] 1 2 3 5 6 7 9
```

-Similarly, for backward selection, the number of features are specified as 7 and the examples using all the three criterion are provided above.
+With `n_features` configured to 7, each example above returns the 7 best features based on model score. You can see above that changing the criterion can result in a different output of "best" features.

-#### Backward Selection by specifying the smallest change in criterion
+In the example below, `backward` selection returns a list of 10 features when the `min_change` in the `r-squared` criterion is specified as 0.5.

-```r
+``` r
backward(X_train, y_train, X_val, y_val,
         n_features=NULL, min_change=0.5, criterion='r-squared',
         verbose=FALSE)
#> [1] 1 2 3 4 5 6 7 8 9 10
```

-In the example above, `backward` selection returns a list of 10 features when the minimum change in the `r-squared` criterion is specified as 0.5.
-
-#### AIC, BIC & $R^2$
+AIC, BIC & *R*²
+--------------------------

+punisheR also provides three standalone functions to compute AIC, BIC, and *R*². For `aic()` and `bic()` you simply need to pass in the model (e.g., an `lm()` object). You can also pass in the validation data and response variable (`X_val`, `y_val`). By default, `X` and `y` are extracted from the model.
-```r +``` r model <- lm(y_train ~ mpg + cyl + disp, data = X_train) - -aic(model) -#> [1] 252.6288 ``` +``` r +aic(model, X_val, y_val) +#> [1] 217.1279 -```r -bic(model) -#> [1] 258.5191 +bic(model, X_val, y_val) +#> [1] 223.0182 ``` -When scoring the two the model using AIC and BIC, we can see that the penalty when using `bic` is greater than the penalty obtained using `aic`. - +When scoring the model using AIC and BIC, we can see that the penalty when using `bic` is greater than the penalty obtained using `aic`. -```r +``` r r_squared(model, X_val, y_val) #> [1] 0.7838625 ```