TuringLang · yebai · Aug 26, 2025
diff --git a/README.md b/README.md
@@ -11,7 +11,7 @@
 [AdvancedVI](https://github.com/TuringLang/AdvancedVI.jl) provides implementations of variational inference (VI) algorithms, which is a family of algorithms aiming for scalable approximate Bayesian inference by leveraging optimization.
 `AdvancedVI` is part of the [Turing](https://turinglang.org/stable/) probabilistic programming ecosystem.
 The purpose of this package is to provide a common accessible interface for various VI algorithms and utilities so that other packages, e.g. `Turing`, only need to write a light wrapper for integration.
-For example, integrating `Turing` with  `AdvancedVI.ADVI` only involves converting a `Turing.Model` into a [`LogDensityProblem`](https://github.com/tpapp/LogDensityProblems.jl) and extracting a corresponding `Bijectors.bijector`.
+For example, integrating `Turing` with `AdvancedVI.ADVI` only involves converting a `Turing.Model` into a [`LogDensityProblem`](https://github.com/tpapp/LogDensityProblems.jl) and extracting a corresponding `Bijectors.bijector`.
 
 ## Basic Example
 

diff --git a/docs/src/families.md b/docs/src/families.md
@@ -14,7 +14,7 @@ z \stackrel{d}{=} C u + m;\quad u \sim \varphi
 
 where ``C`` is the *scale*, ``m`` is the location, and ``\varphi`` is the *base distribution*.
 ``m`` and ``C`` form the variational parameters ``\lambda = (m, C)`` of ``q_{\lambda}``.
-The location-scale family encompases many practical variational families, which can be instantiated by setting the *base distribution* of ``u`` and the structure of ``C``.
+The location-scale family encompasses many practical variational families, which can be instantiated by setting the *base distribution* of ``u`` and the structure of ``C``.
 
 The probability density is given by
 
@@ -25,7 +25,7 @@ The probability density is given by
 the covariance is given as
 
 ```math
-  \mathrm{Var}\left(q_{\lambda}\right) = C \mathrm{Var}(q_{\lambda}) C^{\top}
+  \mathrm{Var}\left(q_{\lambda}\right) = C \mathrm{Var}(\varphi) C^{\top}
 ```
 
 and the entropy is given as
@@ -35,7 +35,7 @@ and the entropy is given as
 ```
 
 where ``\mathbb{H}(\varphi)`` is the entropy of the base distribution.
-Notice the ``\mathbb{H}(\varphi)`` does not depend on ``\log |C|``.
+Notice that ``\mathbb{H}(\varphi)`` does not depend on the variational parameters ``\lambda``.
 The derivative of the entropy with respect to ``\lambda`` is thus independent of the base distribution.
 
 ### API

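Both corrections in the `families.md` hunks follow from the location-scale definition ``z \stackrel{d}{=} C u + m``. The derivation below is a sketch under two assumptions that are not visible in this diff: that the elided entropy identity is the standard ``\mathbb{H}(q_{\lambda}) = \mathbb{H}(\varphi) + \log |C|``, and that ``C`` is an unstructured invertible matrix with ``|C|`` denoting the absolute value of its determinant.

```math
\begin{aligned}
\mathrm{Var}\left(q_{\lambda}\right)
  &= \mathbb{E}\left[ (z - \mathbb{E} z)(z - \mathbb{E} z)^{\top} \right]
   = C \, \mathbb{E}\left[ (u - \mathbb{E} u)(u - \mathbb{E} u)^{\top} \right] C^{\top}
   = C \, \mathrm{Var}(\varphi) \, C^{\top}, \\
\nabla_{m} \mathbb{H}\left(q_{\lambda}\right)
  &= 0, \qquad
\nabla_{C} \mathbb{H}\left(q_{\lambda}\right)
   = \nabla_{C} \log |C|
   = C^{-\top}.
\end{aligned}
```

The shift ``m`` cancels inside the centered terms, so only ``C`` and the covariance of the base distribution remain on the first line. Neither gradient on the second line involves ``\varphi``, which is why the entropy gradient does not depend on the base distribution. For a structured ``C`` (for example diagonal or triangular), the gradient would be restricted to the free entries.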
diff --git a/docs/src/tutorials/basic.md b/docs/src/tutorials/basic.md
@@ -11,7 +11,7 @@ y &\sim \mathcal{N}\left(\mu_y, \sigma_y^2\right)
 
 BBVI with `Bijectors.Exp` bijectors is able to infer this model exactly.
 
-Using the `LogDensityProblems` interface, we the model can be defined as follows:
+Using the `LogDensityProblems` interface, the model can be defined as follows:
 
 ```@example elboexample
 using LogDensityProblems
@@ -79,7 +79,7 @@ nothing
 ```
 
 Now, `KLMinRepGradDescent` requires the variational approximation and the target log-density to have the same support.
-Since `y` follows a log-normal prior, its support is bounded to be the positive half-space ``\mathbb{R}_+``.
+Since `x` follows a log-normal prior, its support is restricted to the positive half-line ``\mathbb{R}_+``.
 Thus, we will use [Bijectors](https://github.com/TuringLang/Bijectors.jl) to match the support of our target posterior and the variational approximation.
 
 ```@example elboexample
@@ -127,10 +127,10 @@ nothing
 For more information see [this section](@ref clipscale).
 
 `q_out` is the final output of the optimization procedure.
-If a parameter averaging strategy is used through the keyword argument `averager`, `q_out` is be the output of the averaging strategy.
+If a parameter averaging strategy is used through the keyword argument `averager`, `q_out` will be the output of the averaging strategy.
 
 The selected inference procedure stores per-iteration statistics into `stats`.
-For instance, the ELBO can be ploted as follows:
+For instance, the ELBO can be plotted as follows:
 
 ```@example elboexample
 using Plots

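The statement that an exponential bijector lets a Gaussian family match a log-normal target can be checked with a change of variables. As a sketch, assume a scalar ``\eta \sim \mathcal{N}(m, s^2)`` pushed through ``x = \exp(\eta)``. The symbols ``\eta``, ``m``, and ``s`` are introduced here for illustration and do not appear in the tutorial.

```math
q_x(x)
  = q_{\eta}(\log x) \left| \frac{\mathrm{d} \log x}{\mathrm{d} x} \right|
  = \frac{1}{x \, s \sqrt{2 \pi}} \exp\left( -\frac{(\log x - m)^2}{2 s^2} \right),
  \qquad x > 0
```

This is the log-normal density with parameters ``(m, s^2)``, supported on ``\mathbb{R}_+``. A Gaussian pushed through the exponential map therefore has the same support as the log-normal component `x`, and reproduces a log-normal target exactly when ``m`` and ``s`` equal that target's parameters.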
diff --git a/src/AdvancedVI.jl b/src/AdvancedVI.jl
@@ -91,7 +91,7 @@ This is an indirection for handling the type stability of `restructure`, as some
 
 # Arguments
 - `ad::ADTypes.AbstractADType`: Automatic differentiation backend. 
-- `restructure`: Callable for restructuring the varitional distribution from `params`.
+- `restructure`: Callable for restructuring the variational distribution from `params`.
 - `params`: Variational Parameters.
 """
 restructure_ad_forward(::ADTypes.AbstractADType, restructure, params) = restructure(params)
@@ -217,7 +217,7 @@ init(::Random.AbstractRNG, ::AbstractAlgorithm, ::Any, ::Any) = nothing
 """
     step(rng, alg, state, callback, objargs...; kwargs...)
 
-Perform a single step of `alg` given the previous `stat`.
+Perform a single step of `alg` given the previous `state`.
 
 # Arguments
 - `rng::Random.AbstractRNG`: Random number generator.

diff --git a/src/algorithms/paramspacesgd/repgradelbo.jl b/src/algorithms/paramspacesgd/repgradelbo.jl
@@ -137,12 +137,12 @@ AD-guaranteed forward path of the reparameterization gradient objective.
 - `aux`: Auxiliary information excluded from the AD path.
 
 # Auxiliary Information 
-`aux` should containt the following entries:
+`aux` should contain the following entries:
 - `rng`: Random number generator.
 - `obj`: The `RepGradELBO` objective.
 - `problem`: The target `LogDensityProblem`.
 - `adtype`: The `ADType` used for differentiating the forward path.
-- `restructure`: Callable for restructuring the varitional distribution from `params`.
+- `restructure`: Callable for restructuring the variational distribution from `params`.
 - `q_stop`: A copy of `restructure(params)` with its gradient "stopped" (excluded from the AD path).
 """
 function estimate_repgradelbo_ad_forward(params, aux)

diff --git a/src/algorithms/paramspacesgd/scoregradelbo.jl b/src/algorithms/paramspacesgd/scoregradelbo.jl
@@ -78,11 +78,11 @@ AD-guaranteed forward path of the score gradient objective.
 - `aux`: Auxiliary information excluded from the AD path.
 
 # Auxiliary Information 
-`aux` should containt the following entries:
+`aux` should contain the following entries:
 - `samples_stop`: Samples drawn from `q = restructure(params)` but with their gradients stopped (excluded from the AD path).
 - `logprob_stop`: Log-densities of the target `LogDensityProblem` evaluated over `samples_stop`.
 - `adtype`: The `ADType` used for differentiating the forward path.
-- `restructure`: Callable for restructuring the varitional distribution from `params`.
+- `restructure`: Callable for restructuring the variational distribution from `params`.
 """
 function estimate_scoregradelbo_ad_forward(params, aux)
     (; samples_stop, logprob_stop, adtype, restructure) = aux